Friday, October 17, 2014

Teiid Platform Sizing Guidelines and Limitations

Users and customers often ask us how to size a Data Virtualization infrastructure based on Teiid or the JDV product from Red Hat. This is an involved question, and not one that is easy to answer in plain terms, because it requires taking into consideration questions like:
  • What kinds of sources is the user working with? Relational, file, CRM, NoSQL, etc.
  • How many sources are they trying to integrate? 10, 20, 100?
  • What volumes of data are they working with? 10K, 100K, 1M+?
  • What are the query latency times from the sources?
  • How is Teiid being used to implement the data integration/virtualization solution? What kinds of queries is the user executing? Even small federated results may take a lot of server-side processing - especially if the plan needs tweaking.
  • Is materialization being used?
  • Are the queries written in an optimal way?
  • and so on.
Each of these questions affects performance profoundly, and when you have a mixture of them it becomes that much harder to recommend a specific configuration.

Before you start thinking about beefing up your DV infrastructure, the first things you want to check are:
  • Is my current infrastructure serving my current needs and future expectations?
  • What kinds of changes are you expecting?
  • Is there a change in the type of sources coming, such as using Hadoop or cloud-based solutions?
We need to build the DV infrastructure based on the available resources combined with the mandated requirements for your use case. Since Teiid is a real-time data virtualization engine, it depends heavily on the underlying sources for data retrieval (there are caching strategies to minimize this). If Teiid is working with slow data sources, no matter how much hardware you throw at it, you are still going to get a slow response. Where more memory and faster hardware can help DV is when the Teiid engine is doing lots of aggregation, filtering, grouping, and sorting over large sets of result rows as the result of a user query. That means each of the questions raised above can directly affect the CPU and memory cost of each individual query.

The Teiid engine itself has some limitations:

1. Hard limits, which break down along several lines: number of storage objects tracked, disk storage, streaming data size/row limits, etc.
  • Internal tables and result sets are limited to 2^31 rows.
  • The buffer manager has a maximum addressable space of 16 terabytes - but due to fragmentation you should expect the maximum usable space to be less (this is relatively easy to scale up with a larger block size when we need to). This is the maximum amount of storage available to Teiid for all temporary LOBs, internal tables, intermediate results, etc.
  • The maximum size of an object (batch or table page) that can be serialized by the buffer manager is 32 GB - but no one should ever get near that (the default limit is 8 MB). A batch is a set of rows flowing through the Teiid engine.
Handling a source that has terabytes or petabytes of data does not by itself impact Teiid in any way. What matters is the processing operations being performed and how much of that data needs to be stored on a temporary basis in Teiid. With a simple forward-only query, as long as the result row count is less than 2^31, Teiid will be perfectly happy to return a petabyte of data.
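
To make the hard limits above concrete, here is a minimal Python sketch that checks a workload estimate against them. The function name and its inputs are hypothetical (they are not part of Teiid), and the sketch assumes binary units (1 TB = 2^40 bytes), which the text does not specify.

```python
# Hard limits quoted in the text above (binary units are an assumption).
MAX_ROWS = 2**31                   # internal tables and result sets
MAX_BUFFER_BYTES = 16 * 2**40      # buffer manager addressable space (16 TB)
MAX_OBJECT_BYTES = 32 * 2**30      # largest serializable batch/table page (32 GB)
DEFAULT_OBJECT_BYTES = 8 * 2**20   # default object size limit (8 MB)


def exceeded_hard_limits(row_count, temp_bytes):
    """Return the names of the hard limits an estimate would exceed.

    row_count  -- rows in the largest internal table or result set
    temp_bytes -- data that must be held temporarily in Teiid
                  (not the size of the underlying source)
    """
    exceeded = []
    if row_count >= MAX_ROWS:
        exceeded.append("rows")
    if temp_bytes > MAX_BUFFER_BYTES:
        exceeded.append("buffer space")
    return exceeded
```

For example, `exceeded_hard_limits(10**9, 0)` returns an empty list: a forward-only query streaming a billion rows holds almost nothing temporarily, however large the source is. Because of fragmentation, the usable buffer space will be lower than the addressable figure used here.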

2. Soft limits, which depend on the configuration and can therefore impact sizing.

Each batch/table page requires an in-memory cache entry of approximately 128 bytes. The total number of tracked batches is therefore limited by the heap, which is also why we recommend increasing the processing batch size on larger-memory machines or in scenarios making use of large internal materializations. The actual batch/table data is managed by the buffer manager, which has a layered memory buffer structure with a spill-over facility to disk.
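
The effect of batch size on heap usage can be estimated with simple arithmetic. The Python sketch below assumes one cache entry of about 128 bytes per batch, as stated above, and assumes every batch is tracked at the same time. The helper name and the two batch sizes are illustrative choices, not Teiid settings or defaults.

```python
CACHE_ENTRY_BYTES = 128  # approximate heap cost per tracked batch (from the text)


def tracking_overhead_bytes(total_rows, batch_size):
    """Estimate the heap used just to track the batches of total_rows rows."""
    # Ceiling division: a partially filled last batch still needs an entry.
    batches = (total_rows + batch_size - 1) // batch_size
    return batches * CACHE_ENTRY_BYTES


small = tracking_overhead_bytes(1_000_000_000, 256)
large = tracking_overhead_bytes(1_000_000_000, 2048)
print(small, large)
```

With these illustrative numbers, a billion materialized rows need about 500 MB of heap for tracking entries at 256 rows per batch, and about 62.5 MB at 2048 rows per batch. This is only the bookkeeping cost; the row data itself lives in the buffer manager.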

3. There are open file handle and other resource considerations (such as buffers allocated by drivers) that are somewhat indirect from Teiid's point of view; depending on the particulars of the data source configurations, these can have an impact as well.


4. Internal materialization is built on the buffer manager, so it is directly dependent upon it.

5. When using XA, source access is serialized; otherwise source access happens in parallel. This can be controlled using the number of source threads per user query.

Some scenarios may not be appropriate for Teiid. Something contrived, such as a 1M x 1M row cross join in Teiid, may not be a good fit for the virtualization layer. But is that a real use case, where you are going to cursor over a trillion rows to find what you are looking for? Is there a better targeted query? These are the kinds of questions you need to ask yourself when designing a data virtualization layer.
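
As a quick sanity check on the contrived example, this short Python sketch compares the cross join's row count with the 2^31 row limit on internal tables and result sets mentioned earlier. The helper name is hypothetical.

```python
MAX_ROWS = 2**31  # row limit for internal tables and result sets


def cross_join_rows(left_rows, right_rows):
    """A cross join pairs every left row with every right row."""
    return left_rows * right_rows


rows = cross_join_rows(1_000_000, 1_000_000)
print(rows)               # 1000000000000 (one trillion)
print(rows >= MAX_ROWS)   # True
```

The result is several hundred times over the row limit before any filtering is applied, which is why a better targeted query is worth looking for.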

Take a look at the query plan and the command log, and record the source latencies for a given query, to see whether your Teiid instance is performing optimally for your use case. Is it CPU bound or IO bound (large source results and long source wait times)? See whether your submitted queries have been waiting in the queue (you can check the queue depth). Wherever you see the bottleneck is where you may need additional resources.
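
One rough way to organize those measurements is sketched below in Python. Everything here is hypothetical: the function, its inputs, and the labels are not a Teiid API. The timings are assumed to have been collected by hand (for example from the command log), and the source wait is assumed to be wall-clock time, since parallel source requests can overlap.

```python
def classify_bottleneck(elapsed_ms, source_wait_ms, queue_wait_ms):
    """Label where most of one query's wall-clock time went.

    Time not spent waiting on sources or in the queue is attributed
    to the engine, which is only an approximation of CPU time.
    """
    engine_ms = elapsed_ms - source_wait_ms - queue_wait_ms
    if engine_ms < 0:
        raise ValueError("wait times exceed the elapsed time")
    shares = {"io": source_wait_ms, "queue": queue_wait_ms, "cpu": engine_ms}
    # Pick the largest share; ties resolve in the order listed above.
    return max(shares, key=shares.get)
```

A query that took 1000 ms with 800 ms of source wait would be labeled "io", pointing at the sources rather than at the Teiid server. A result of "queue" suggests queries are waiting for threads, and "cpu" suggests engine-side processing is the cost.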

Our basic hardware recommendation for a smaller departmental use case is (double it if you need HA or disaster recovery):
  • 16-core processor
  • Minimum of 32 GB RAM
  • 100+ GB of buffer manager temp disk (an SSD-based device may give better results when there are a lot of cache misses or swapping of results)
  • Red Hat Enterprise Linux 6+
  • Gigabit Ethernet
Then do a simple pilot with your own use case(s) and your own data, in your own infrastructure, under the anticipated load. If you find that a DV server is totally CPU bound and queries are being delayed because of that, you can consider adding cores to the server or nodes to a cluster. Note again: make sure your source infrastructure is built to handle the load that DV is executing against it.

It would be really great if you could share the hardware profiles you selected for your Teiid environments and the techniques you used to reach that decision.

Thank you.

Ramesh & Steve.

Tuesday, October 14, 2014

Teiid 8.9 CR1 Posted

After a small delay Teiid 8.9 CR1 has been posted to the maven repository and the downloads: http://teiid.jboss.org/downloads/

A recap of the feature highlights so far with Teiid 8.9:
  • TEIID-3009 WITH project minimization - common table expressions will have their project columns minimized.
  • TEIID-3038 geoSpatial support for MongoDB translator
  • TEIID-3050 Increased Insert Performance with sources that support batching or insert with iterator.
  • TEIID-3044 Function Metadata is available through system tables and DatabaseMetaData.
  • TEIID-1910 TeiidPlatform for EclipseLink integration is now provided via the teiid-eclipselink-platform jar in maven.
  • TEIID-3119 Performance improvements in grouping and duplicate removal as well as general improvements to memory management.
  • TEIID-3156 Collation aware prevention of order by pushdown via the collationLocale translator property and the org.teiid.requireTeiidCollation system property.
In addition, Ramesh has been hard at work getting in the initial commit of an OData 4 compliant layer utilizing the not yet fully released Olingo 4.  Ramesh has worked several issues for Olingo and may end up being a committer on the project - so congrats and let's hope for more cross project synergy with them as we move forward.

Expect another blog posting and more documentation around the new OData layer in the near future.

We are closing in on 100 issues addressed for this release - and should have time to address anything of importance you can find in the next 10 days or so.  More than likely there will be a CR, so the final should be just in time for Halloween.

Thanks,
Steve / The Teiid Team

Thursday, September 25, 2014

8.9 Beta2 Available

8.9 Beta2 has been posted to the downloads: http://teiid.jboss.org/downloads/

This pre-release builds upon Beta1 to provide performance improvements for dup removal and any usage of our internal table structure and many new fixes.  In fact we are up to ~75 issues being addressed and will likely get close to 100 before the end of the release.

Special thanks to Joseph Chidiac for uncovering a variety of issues from Salesforce to windowing and general planning issues.  Community usage of the later pre-releases is vital to producing a quality final release.  So be sure to engage the forums and log issues as soon as you experience them.

Expect a candidate release to follow in approximately 2 weeks, at which time we can start on the 8.10 and ideally the 9.0 WildFly efforts.

Thanks,

Steve / The Teiid Team

Tuesday, September 23, 2014

Teiid Designer 8.6 Final Released

Teiid Designer 8.6 Final is now available from our update site or as a downloadable archive.

This release focused on improving existing features and bug fixes.

Primary drivers for this release:
  • Enhance and upgrade Teiid Connection Importer
  • Added support for defining Native Query Procedure
  • REST Importer Enhancements
    • Added Dynamic Parameters
    • JSON WS support
  • Improved Security Definition for Data Roles

Check out our What's New for 8.6 article for details...

Barry LaFond
Teiid Designer Team

Monday, September 8, 2014

Teiid 8.9 Beta1

8.9 Beta1 is now available from the downloads and maven.  Feature highlights since Alpha1 include:
  • TEIID-3050 Increased Insert Performance with sources that support batching or insert with iterator.
  • TEIID-3044 Function Metadata is available through system tables and DatabaseMetaData.
  • TEIID-1910 TeiidPlatform for EclipseLink integration is now provided via the teiid-eclipselink-platform jar in maven.
With a total of 59 issues addressed so far, this continues to be primarily a fix release - but there's still time to vote for or log enhancements to make this release.

Ideally we'll still get OData 4 support via Olingo in the near term.  However it is becoming doubtful that it will make it into the 8.9 release.

Thanks again for all the community support, especially when there are infrastructure issues. We try to ensure that every forum posting gets attention, but notifications are still inconsistent at best. If something hasn't been addressed in a couple of days, feel free to re-comment on your topic.

Thanks,
Steve

Thursday, August 21, 2014

Red Hat JBoss Data Virtualization & Hortonworks Big Data Webinar Series


Discover Red Hat and Apache Hadoop for the Open Modern Data architecture

As the enterprise's big data program matures and Apache Hadoop becomes more deeply embedded in critical operations, the ability to support and operate it efficiently and reliably becomes increasingly important. To aid enterprises in operating a modern data architecture at scale, Red Hat and Hortonworks have collaborated to integrate Hortonworks Data Platform with Red Hat's proven platform technologies.
Join us in this interactive series, as we'll demonstrate how Red Hat JBoss Data Virtualization can integrate with Hadoop through Hive and provide users easy access to data.
Here's what you'll be signing up for:
  • Webinar 1: Delivering the Open Modern Data Architecture
  • Webinar 2: Red Hat JBoss and HDP: Turn your data into strategic asset (demo/deep dive)
  • Webinar 3: Red Hat and JBoss and HDP: Delivers the Data Lake (demo/deep dive)

We hope you can join us!

Wednesday, August 20, 2014

Teiid Market Place Git Repository

Introducing the Teiid marketplace Git repository, where you are free to share your project with the whole Teiid community of users (maybe even sell it?). Check out https://github.com/teiid-marketplace

If you are a Teiid user, maybe you have written a translator for a source that Teiid does not offer, or maybe you gave a presentation you want to share, a slide deck, a video, or you wrote a cool utility over Teiid. Whatever it may be, now you can share it with the rest of the Teiid community in one marketplace.

Over time, if a certain project is popular and its licensing terms are acceptable to the Teiid project, then we can accept it as a contribution to be maintained as a main Teiid project, on a case by case basis. Or you can think of this as an incubator space. When you contribute, I will make you the owner of that repository for any management purposes.

The only restriction is this needs to be a Teiid-specific project to be acceptable. I am not putting any license restrictions on what it needs to be; it can be of any license of your choosing (unless I am told otherwise). The acceptance of your project does NOT mean it is automatically promoted by Teiid developers; it is only an acknowledgement that it is a Teiid-specific project. Use these projects at your own RISK, as they have not been validated by Teiid project members unless they were written by members of the Teiid project.

I intend to provide a web page on http://teiid.org to link this repo's contents with descriptions soon. I can use some help here :)  So, if you have any Teiid utilities/projects to share, let me know.

I am still planning out what this needs to be, so if you have suggestions please let me know.

Thanks.

Ramesh..