Wednesday, July 1, 2015

Teiid on top of HP Vertica Analytic Database Server

HP Vertica Analytic Database Server is designed to manage large, fast-growing volumes of data and provide very fast query performance when used for data warehouses and other query-intensive applications.

Starting from Teiid 8.12, with Vertica Data Source(Vertica supply JDBC Driver) Teiid support connect/transform/federate data exist in Vertica table.

vertica-as-a-datasource is example demonstrates using the Vertica Translator to access data in Vertica Server, this can be a quick start for implementing your usecase. 

Saturday, June 20, 2015

8.11 Final Released, Support for OData V4 and SAP HANA

Keeping up with tradition of releasing a minor release every 3 months, the Teiid community is proud to announce yet another feature rich release - 8.11.0.Final. We have resolved 117 JIRA issues (features, enhancements, bugs) in this release. You can download the latest version from http://teiid.jboss.org/downloads

JBoss EAP & Teiid:
Note that in this release the base environment to deploy Teiid is moved JBoss EAP 6.4 platform. For your convenience, we are also providing a download link where EAP 6.4 Alpha + Teiid 8.11 + Teiid Web-Console bundled all in single downloadable file. You also have choice of using Docker image if you choose to at Docker Hub.

Embedded Teiid:
If you are working with Embedded Teiid checkout our new examples repository at https://github.com/teiid/teiid-embedded-examples For embedded we are changing from providing a kit, to just relying on maven and the examples for 8.12.  We want provide as many example scenarios as possible, so that you can pick a example and start implementing your usecase.

Top highlights:


  • TEIID-2973 OData V4 Access your VDB using OData V4 protocol
  • TEIID-3151 SAP Hana A new translator integrate data from SAP Hana.
  • TEIID-3372 Multiple metadata elements can now be used to configure a vdb.
  • TEIID-3369 Custom query rewrite - via a configurable PreParser.
  • TEIID-3389 Kerberos Delegation JDBC driver can now participate in kerberos delegation based authentication
  • TEIID-3390 SQL/XML improvements in performance and disk utilization.
  • TEIID-3362 ARRAYITERATE system procedure for iterating over an array.
  • TEIID-3393 Embedded AdminApi is now available from EmbeddedServer.getAdmin.
  • TEIID-3270 Virtual Functions can not be defined with procedure syntax.
  • TEIID-3434 More caching control over ttls in the result set cache down to the schema/table level.
  • TEIID-3412 MS Access support via the UCanAccess driver.
  • TEIID-3371 EAP 6.4 as the target platform.
  • TEIID-3491 REGEXP_REPLACE function added


  • Community:
    We had great community support from Ben Kibler, Bram Gadeyne, Bruuno Macho, Burek Serder, Gautam Benerjee, Harrison Gentry, Jaime Merritt, Mark Addleman, Monica Ahuja, Pranav K, Salvatore R, Sanjeev Gour, Shiveeta Matto and Tom Arnold for participating in the community in terms of submitting JIRAs and code contributions. Special thanks to Tom Arnold for submitting regexp_replace and to Ted Jones for developing the Hana support.

    Teiid is also seeing a large number of community members that are asking questions in out forums, we sincerely thank you all. Keep those questions and comments rolling in. We need to keep the momentum going. Write a blog, share your story.

    What's Coming:
    Lot of exciting stuff happening, we started to work on 8.12 and also planning on 9.0 simultaneously. We hope to bring in WildFly support in Teiid 8.12 along with EAP 6.4 and also move permanently to WildFly in Teiid 9.0 releases. We are gathering requirements. So, challenge for you is "How is your enterprise IT changing? What kind of changes you want to see from Teiid Data Virtualization platform?" come engage us in conversation that is important for you. We highly encourage it.

    Thank you again for your continued support.

    Ramesh & Steve

    Wednesday, June 10, 2015

    SQL on MongoDB using Teiid - Part 2

    In the previous article SQL on MongoDB using Teiid - Part 1 I showed a simple example how one can start using SQL based queries on MongoDB using Teiid. I have showed how to install and configure the Teiid and test.

    The previous example was very simple that generated a MongoDB collection for defined relational table in Teiid. One of many advantages of NoSQL based stores like MongoDB is, user's ability to de-normalize data such that data for related entities is co-located. When the data is co-located there is no requirement for defining the relationships across entities, and queries will be naturally faster to execute. However, in relational databases data is more often very normalized, the naive implementation of relational to MongoDB where one table is mapped to a collection will result in very poorly performing queries. The solution is, utilize the MongoDB's nested document feature to represent a relationships in Relational database.

    In this article, I will show how to define relationships across different tables and it's representation in MongoDB document, where the performance is not compromised.

    ONE-2-ONE

     CREATE FOREIGN TABLE Customer (  
       CustomerId integer PRIMARY KEY,  
       FirstName varchar(25),  
       LastName varchar(25)  
     ) OPTIONS(UPDATABLE 'TRUE');  
    
     CREATE FOREIGN TABLE Address (  
       CustomerId integer,  
       Street varchar(50),  
       City varchar(25),  
       State varchar(25),  
       Zipcode varchar(6),  
       FOREIGN KEY (CustomerId) REFERENCES Customer (CustomerId)  
     ) OPTIONS(UPDATABLE 'TRUE');  
    

    When you issue INSERT statements against above tables, by default it will produce the documents like

     Customer  
     {  
      _id: 1,  
      FirstName: "John",  
      LastName: "Doe"  
     }  
     Address  
     {   
      _id: ObjectID("..."),   
       CustomerId: 1,  
       Street: "123 Lane"  
       City: "New York",  
       State: "NY"  
       Zipcode: "12345"  
     }  
    

    You can enhance the storage in MongoDB to a single collection by using "teiid_mongo:MERGE' extension property on the table's OPTIONS clause


     CREATE FOREIGN TABLE Customer (  
       CustomerId integer PRIMARY KEY,  
       FirstName varchar(25),  
       LastName varchar(25)  
     ) OPTIONS(UPDATABLE 'TRUE');  
    
     CREATE FOREIGN TABLE Address (  
       CustomerId integer PRIMARY KEY,  
       Street varchar(50),  
       City varchar(25),  
       State varchar(25),  
       Zipcode varchar(6),  
       FOREIGN KEY (CustomerId) REFERENCES Customer (CustomerId)  
      ) OPTIONS(UPDATABLE 'TRUE', "teiid_mongo:MERGE" 'Customer');  
    

    this will produce single collection in MongoDB
     Customer  
     {  
      _id: 1,  
      FirstName: "John",  
      LastName: "Doe",  
      Address:   
        {   
         Street: "123 Lane",  
         City: "New York",  
         State: "NY",  
         Zipcode: "12345"  
        }  
     }  
    

    You can issue queries like

     SELECT * FROM Customer JOIN Address ON Customer.CustomerId = Address.CustomerId  
    

    ONE-2-MANY

    Following the similar strategy using "teiid_mongo:MERGE" property, the below shows the one to many relationship  .


     CREATE FOREIGN TABLE Customer (  
       CustomerId integer PRIMARY KEY,  
       FirstName varchar(25),  
       LastName varchar(25)  
     ) OPTIONS(UPDATABLE 'TRUE');  
    
     CREATE FOREIGN TABLE Order (      
       OrderID integer PRIMARY KEY,  
       CustomerId integer,  
       OrderDate date,  
       Status integer,  
       FOREIGN KEY (CustomerId) REFERENCES Customer (CustomerId)  
     ) OPTIONS(UPDATABLE 'TRUE', "teiid_mongo:MERGE" 'Customer');  
    

    Will generate MongoDB collection like below

     {  
      _id: 1,  
      FirstName: "John",  
      LastName: "Doe",  
      Order:   
      [  
        {   
         _id: 100,   
         OrderDate: ISODate("2000-01-01T06:00:00Z")  
         Status: 2  
        },  
        {   
         _id: 101,   
         OrderDate: ISODate("2001-03-06T06:00:00Z")  
         Status: 5  
        }  
        ...  
       ]  
     }  
    

    You can issue SQL queries like

    SELECT * FROM Customer JOIN Orders ON Customer.CustomerId = Orders.CustomerId 
      WHERE OrderDate = <date>  
    

    Note that above I showed the nesting of the documents at single level, however you can define even define deeper nesting. For example, Order table has OrderItems table

      CREATE FOREIGN TABLE Customer (   
       CustomerId integer PRIMARY KEY,   
       FirstName varchar(25),   
       LastName varchar(25)   
      ) OPTIONS(UPDATABLE 'TRUE');  
     
      CREATE FOREIGN TABLE Order (     
       OrderID integer PRIMARY KEY,   
       CustomerId integer,   
       OrderDate date,   
       Status integer,   
       FOREIGN KEY (CustomerId) REFERENCES Customer (CustomerId)   
      ) OPTIONS(UPDATABLE 'TRUE', "teiid_mongo:MERGE" 'Customer'); 
      
     CREATE FOREIGN TABLE OrderDetails (  
      OrderID integer NOT NULL,  
      UnitPrice double default '0',  
      Quantity integer default '1',  
      Discount float default '0',  
      FOREIGN KEY (OrderID) REFERENCES Orders (OrderID),  
     ) OPTIONS ("teiid_mongo:MERGE" 'Orders', UPDATABLE 'TRUE');  
    

    after inserting some data, the result MongoDB document will look like

     {  
      _id: 1,  
      FirstName: "John",  
      LastName: "Doe",  
      Order:   
      [  
        {   
         _id: 100,   
         OrderDate: ISODate("2000-01-01T06:00:00Z")  
         Status: 2,  
         OrderDetails:  
         [  
           {  
             UnitPrice: 23.12  
             Quantity: 5  
             Discount: 34.2  
           },  
           {  
             UnitPrice: 51.17  
             Quantity: 10  
             Discount: 15.0  
           }  
           ...  
         ]  
        },  
        {   
         _id: 101,   
         OrderDate: ISODate("2001-03-06T06:00:00Z")  
         Status: 5,  
         OrderDetails:  
         [  
           {  
             UnitPrice: 78.00  
             Quantity: 4  
             Discount: 20  
           },  
           {  
             UnitPrice: 43.12  
             Quantity: 100  
             Discount: 30  
           }  
           ...  
         ]      
        }  
        ...  
       ]  
     }  
    

    So, as you can see you can create pretty complex nested documents in MongoDB using Teiid, to save query performance, at the same time you can use your expertise in SQL to design the queries using the JOINS to  read the data that you are interested in or run BI tool on top it to generate reports etc.

    For more information please read https://docs.jboss.org/author/display/TEIID/MongoDB+Translator 

    If you are working with existing MongoDB database, Teiid can import the existing collections and their document structure and automatically create the relational tables in Teiid. For that please refer to article https://developer.jboss.org/wiki/ConnectToAMongoDBSource

    I have showed in these articles how to create and use MongoDB database and access the MongoDB using SQL. This is great tool, if you looking to replace your relational database like Oracle, SQLServer etc with MongoDB. Most of the concepts in your relational world map one to one using Teiid.

    Thank you, If you have any questions please let us know.

    Ramesh..

    Tuesday, June 9, 2015

    Teiid 8.11 CR1

    Teiid 8.11 CR1 is now available from the downloads and maven.  

    There are no new features since Beta3 as we were focused on address as many regular issues as possible, including further refinements to the materialization logic.  The feature set for the entire release is looking good:
    • TEIID-3372 Multiple metadata elements can now be used to configure a vdb.
    • TEIID-3369 Custom query rewrite - via a configurable PreParser.
    • TEIID-3389 Kerberos Delegation JDBC driver can now participate in kerberos delegation based authentication
    • TEIID-3390 SQL/XML improvements in performance and disk utilization.
    • TEIID-3362 ARRAYITERATE system procedure for iterating over an array.
    • TEIID-3393 Embedded AdminApi is now available from EmbeddedServer.getAdmin.
    • TEIID-3270 Virtual Functions can not be defined with procedure syntax.
    • TEIID-3434 More caching control over ttls in the result set cache down to the schema/table level.
    • TEIID-3412 MS Access support via the UCanAccess driver.
    • TEIID-3371 EAP 6.4 as the target platform.
    • TEIID-3491 REGEXP_REPLACE function added
    • TEIID-2973 OData V4 support based on Apache Olingo
    The Final release should be produced by June 18th.

    There are now quite a few issues that are in the 8.12 release, which we will work somewhat in parallel with Teiid 9.0.  If possible we'll try to bridge the gap by offering a Wildfly based 8.12 as well.  Please continue to log issues/requests so that we can get them in the appropriate release.

    Thanks again for all of the community efforts.

    Steve

    Sunday, June 7, 2015

    SQL on MongoDB using Teiid - Part 1

    In this article series, I will showcase how you can use SQL based queries with MongoDB. In general there seems to be resurgence of SQL based access to all NoSQL based stores in the market space, take for example Hive, Impala, Spark, Apache Drill etc. The main reason for this shift is there are abundant amount of talent pool out there for SQL based developers, and even today (and years to come) our strong dependence on the relational stores for the enterprise data. I do not see either of them fading away any time soon. So utilizing the known skills on new types of data stores will save you lot of time and provide better integration with rest of your enterprise applications.

    I know there are plenty of folks offering SQL based access to MongoDB, why you should choose Teiid?
    • Teiid provides full ANSI compatible SQL based access to MongoDB. This includes full SQL-92, and most SQL-99 and SQL-2003 support.
    • Provides JDBC/ODBC access to execute SQL queries.
    • Provides ODATA based access to MongoDB.
    • Provides framework to easily create custom REST based services over MongoDB or any other data source
    • Teiid is a data virtualization system, that means you can integrate data from MongoDB with other enterprise sources like Orcale, DB2, SQL Server etc.
    • Complex nested document support up to three+ levels of embedded documents. I have not seen any other project support more than one level of nested documents.
    • Supports full range of SELECT, INSERT, UPDATE and DELETE calls.
    • Array support, Blob Support, GEO functions support.  
    • A JBoss community Open Source project.
    Depending upon your architecture and needs there are different ways you use Teiid with MongoDB
    • Standalone Teiid server
    • Embedded Teiid
    • Teiid Docker 
    However, in all the above are only different deployment scenarios, as per defining your interaction with MongoDB in-terms of SQL is exactly same. In this article I will focus on Standalone Teiid Server. I will not go in any details about installation of Teiid or MongoDB. Also, I assume user already has familiarity with Teiid concepts, if not please read http://teiid.jboss.org/basics/ and go through a quick start example here https://docs.jboss.org/author/display/teiidexamples/Teiid+Quick+Starts

    So, let's get started. 

      <resource-adapters>   
       <resource-adapter id="mongodb">   
        <module slot="main" id="org.jboss.teiid.resource-adapter.mongodb"/>   
        <transaction-support>NoTransaction</transaction-support>   
        <connection-definitions>   
         <connection-definition class-name="org.teiid.resource.adapter.mongodb.MongoDBManagedConnectionFactory"   
           jndi-name="java:/mongoDS"   
           enabled="true"   
           use-java-context="true"   
           pool-name="teiid-mongodb-ds">   
          <!-- MongoDB server list (host:port[;host:port...]) -->   
          <config-property name="RemoteServerList">localhost:27017</config-property>   
          <!-- Database Name in the MongoDB -->   
          <config-property name="Database">test</config-property>   
           <!--    
            Uncomment these properties to supply user name and password   
           <config-property name="Username">user</config-property>   
           <config-property name="Password">user</config-property>   
           -->    
         </connection-definition>   
        </connection-definitions>   
       </resource-adapter>   
      </resource-adapters>  
    

    • The above will create a JCA connection pool to the MongoDB. Save it and start the JBoss EAP.
    • Now it is time to create VDB. The following shows a very simple VDB

     <vdb name="nothwind" version="1">  
       <model name="northwind">  
         <source name="local" translator-name="mongodb" connection-jndi-name="java:/mongoDS"/>  
           <metadata type="DDL"><![CDATA[  
             CREATE FOREIGN TABLE Customer (  
               customer_id integer,  
               FirstName varchar(25),  
               LastName varchar(25)  
             ) OPTIONS(UPDATABLE 'TRUE');  
           ]]> </metadata>  
       </model>  
     <vdb>  
    

    Now deploy this VDB using either web-console or CLI into Teiid. Make sure it is deployed in ACTIVE state. Now using any JDBC client like SquirreL, and issue SQL queries like


     INSERT INTO Customer(customer_id, FirstName, LastName) VALUES (1, 'John', 'Doe');  
    

    This will produce a document like below in the MongoDB database.

     {  
      _id: ObjectID("509a8fb2f3f4948bd2f983a0"),  
      customer_id: 1,  
      FirstName: "John",  
      LastName: "Doe"  
     }  
    

    If you issue a query like

     SELECT * FROM Customer  
    

    you will see the result like

     customer_id  FirstName  LastName  
     1           John       Doe  
    

    This is a illustration of what it takes to use SQL on MongoDB using Teiid. But we only scratched SQL capabilities. In the next article I will show case, how to create complex nested documents and resulting SQL/DDL for it.

    Thanks

    Ramesh..

    Wednesday, June 3, 2015

    Tech Tip: Teiid SQL Language MAKEDEP Hint Explained

    In this article I will explain what a MAKEDEP hint is, how and when, why it should be used in Teiid.

    What: MAKEDEP is query hint.  When a query hint is defined in the SQL query it influences the Teiid query planner to optimize the query in a way that is driven by the user. MAKEDEP means "make this as a dependent join".

    What is a Dependent Join?

    For example if we have query like:

     SELECT * FROM X INNER JOIN Y ON X.PK = Y.FK  
    

    Where the data for X, and Y are coming from two different sources like Oracle and WebService in Teiid, so in relational algebra you can represent above query as


    Here the result tuples from node X and node Y are being simultaneously fetched by Teiid query engine, then it joins the both the results inside Teiid engine based on the specified X.PK = Y.PK condition and returns the filtered resulted to the user. simple..

    Now, what if, if X table has 5 rows and Y table has 100K rows? In order to do the JOIN naively Teiid need sto read all the 5 rows from X side and 100K rows from Y side then proceed with the JOIN. That is where MAKEDEP comes to rescue if the planner cannot use statistics to automatically determine a better plan for you.

    Let's modify query like this and provide a MAKEDEP hint:

     SELECT * FROM X INNER JOIN /*+ MAKEDEP */ Y ON X.PK = Y.FK  
    

    here what you are suggesting to the query planner is make the node Y as dependent node on X. That means the data from Y is dependent on X data. In this scenario, the execution then be like


    The query planner will do the operations in sequence this time
    (1) Fetch 5 rows from X
    (2) Push the distinct equi-join values from X into Y side using IN clause(2)
    (3) Fetch resultant rows from Y that matches the JOIN condition

    The SQL executed are

    SELECT * FROM X;
    SELECT * FROM Y WHERE Y.FK IN (X Values);

    here the Y node will return ONLY relevant data, which can be significantly less that the full relation. By doing this, you avoided lot of network traffic in retrieving the rows, and also processing inside the Teiid for match the X.PK = Y.FK condition. This results in match faster query.

    Now what if X has 10K rows? Imagine fetching all the 10K rows, and sending them to node Y node in "IN" clause? The issues are some databases do not allow SQL statement bigger than certain size, have limits to values in the IN clause, limits to prepared bindings, etc.  To compensate multiple queries must be issued.  Generally the processing will suffer with a larger numbers as the number of source queries increases. Teiid offers another solution for this - to create a temporary table with the relevant values from X, then issue a join query based on temporary table and Y.

    To do this, for JDBC translators you need to add translator override property called "EnableDependentJoins" to "true".  Then when you submit the query

     SELECT * FROM X INNER JOIN /*+ MAKEDEP */ Y ON X.PK = Y.FK  
    

    The processing will be

    The query planner will do the operations in sequence this time
    (1) Fetch 10K rows from X
    (2) Insert the distinct equi-join values from X into Y side using batched inserts
    (4) Fetch resultant rows from Y that matches the JOIN condition
    (5) send rows back to user

    The executed sudo SQL statements are

    SELECT * FROM X;
    CREATE TABLE #TEIID_XXX (XPK coltype);
    INSERT INTO #TEIID_XXX (X key values);
    SELECT * FROM Y JOIN #TEIID_XXX ON Y.FK = #TEIID_XXX.XPK;

    Depending upon number of values that is being pushed, this can result even faster query.

    You can also customize MAKEDEP to force additional behavior.  The planner will choose when to back off of (if there are too many independent values) a dependent join based upon statistics.  This behavior can be forced with the MAX option:

     SELECT * FROM X INNER JOIN Y MAKEDEP(MAX:5000) ON X.PK = Y.FK  

    That means only create the dependent join when there are less then 5000 rows from X side.

    The planner based upon the plan and the source support for dependent joins can also choose to instead send all of the relevant rows from X over to Y.  This can be forced with the JOIN option:

     SELECT * FROM X INNER JOIN Y MAKEDEP(JOIN) ON X.PK = Y.FK  

    This is same as above temp table scenario, Using a temp table for the entire independent side.  This option is best suited to situations where more of plan can be pushed, for example aggregation and other processing above the join:

     select
         grouping
              join
                  access
                  access

     can become:
       access (performing the join via data shipment)
            select
               grouping

    If there isn't any additional processing and a wide set of values (or something that is using lobs), then the best you can do is just creating a temporary table for the key set (the previous example only uses key values), in which case the plan still looks the same as the default dependent join.

    You can read more about MAKEDEP here https://docs.jboss.org/author/display/TEIID/Federated+Optimizations

    There also MAKEIND hint, which opposite of MAKEDEP which is placed on independent side of a dependent query. MAKENOTDEP forces the query engine not to plan a dependent query.

    Hopefully this gave a good material as to how, when to use the hints to write better performing queries. Note that, when costing information is defined on the tables, most of these decisions are made automatically, if the Teiid not doing it you now know how to force it to use :)

    Thanks

    Ramesh.. 

    Monday, June 1, 2015

    Teiid 8.11 Beta3 Posted

    Teiid 8.11 Beta3 is now available from the downloads and maven.  

    The main feature highlight since Beta2 is a community contribution from Tom Arnold to add support for the REGEXP_REPLACE function - a more powerful version of the REPLACE function that supports regular expressions.

    There were a lot of late changes, thus the decision to release another beta rather than progress to CR1.  It is expected that we'll produce a CR1 by June 10th - with the final to follow at least a week later.

    Also we're looking at starting on Teiid 9.0 soon.  If you have any major features you'd like to see, please start a forum topic of log/vote for a JIRA issue.  Another parallel effort could be to produce a Wildfly enabled 8.x release -please let us know if there is any interest in that.

    For 8.11 we've addressed 90+ issues so far - with your help we'll close out quite a few more before the final release.

    Thanks again for all of the community efforts.

    Steve