Skip to main content

Tech Tip: Teiid SQL Language MAKEDEP Hint Explained

In this article I will explain what a MAKEDEP hint is, how and when, why it should be used in Teiid.

What: MAKEDEP is query hint.  When a query hint is defined in the SQL query it influences the Teiid query planner to optimize the query in a way that is driven by the user. MAKEDEP means "make this as a dependent join".

What is a Dependent Join?

For example if we have query like:

 SELECT * FROM X INNER JOIN Y ON X.PK = Y.FK  

Where the data for X, and Y are coming from two different sources like Oracle and WebService in Teiid, so in relational algebra you can represent above query as


Here the result tuples from node X and node Y are being simultaneously fetched by Teiid query engine, then it joins the both the results inside Teiid engine based on the specified X.PK = Y.PK condition and returns the filtered resulted to the user. simple..

Now, what if, if X table has 5 rows and Y table has 100K rows? In order to do the JOIN naively Teiid need sto read all the 5 rows from X side and 100K rows from Y side then proceed with the JOIN. That is where MAKEDEP comes to rescue if the planner cannot use statistics to automatically determine a better plan for you.

Let's modify query like this and provide a MAKEDEP hint:

 SELECT * FROM X INNER JOIN /*+ MAKEDEP */ Y ON X.PK = Y.FK  

here what you are suggesting to the query planner is make the node Y as dependent node on X. That means the data from Y is dependent on X data. In this scenario, the execution then be like


The query planner will do the operations in sequence this time
(1) Fetch 5 rows from X
(2) Push the distinct equi-join values from X into Y side using IN clause(2)
(3) Fetch resultant rows from Y that matches the JOIN condition

The SQL executed are

SELECT * FROM X;
SELECT * FROM Y WHERE Y.FK IN (X Values);

here the Y node will return ONLY relevant data, which can be significantly less that the full relation. By doing this, you avoided lot of network traffic in retrieving the rows, and also processing inside the Teiid for match the X.PK = Y.FK condition. This results in match faster query.

Now what if X has 10K rows? Imagine fetching all the 10K rows, and sending them to node Y node in "IN" clause? The issues are some databases do not allow SQL statement bigger than certain size, have limits to values in the IN clause, limits to prepared bindings, etc.  To compensate multiple queries must be issued.  Generally the processing will suffer with a larger numbers as the number of source queries increases. Teiid offers another solution for this - to create a temporary table with the relevant values from X, then issue a join query based on temporary table and Y.

To do this, for JDBC translators you need to add translator override property called "EnableDependentJoins" to "true".  Then when you submit the query

 SELECT * FROM X INNER JOIN /*+ MAKEDEP */ Y ON X.PK = Y.FK  

The processing will be

The query planner will do the operations in sequence this time
(1) Fetch 10K rows from X
(2) Insert the distinct equi-join values from X into Y side using batched inserts
(4) Fetch resultant rows from Y that matches the JOIN condition
(5) send rows back to user

The executed sudo SQL statements are

SELECT * FROM X;
CREATE TABLE #TEIID_XXX (XPK coltype);
INSERT INTO #TEIID_XXX (X key values);
SELECT * FROM Y JOIN #TEIID_XXX ON Y.FK = #TEIID_XXX.XPK;

Depending upon number of values that is being pushed, this can result even faster query.

You can also customize MAKEDEP to force additional behavior.  The planner will choose when to back off of (if there are too many independent values) a dependent join based upon statistics.  This behavior can be forced with the MAX option:

 SELECT * FROM X INNER JOIN Y MAKEDEP(MAX:5000) ON X.PK = Y.FK  

That means only create the dependent join when there are less then 5000 rows from X side.

The planner based upon the plan and the source support for dependent joins can also choose to instead send all of the relevant rows from X over to Y.  This can be forced with the JOIN option:

 SELECT * FROM X INNER JOIN Y MAKEDEP(JOIN) ON X.PK = Y.FK  

This is same as above temp table scenario, Using a temp table for the entire independent side.  This option is best suited to situations where more of plan can be pushed, for example aggregation and other processing above the join:

 select
     grouping
          join
              access
              access

 can become:
   access (performing the join via data shipment)
        select
           grouping

If there isn't any additional processing and a wide set of values (or something that is using lobs), then the best you can do is just creating a temporary table for the key set (the previous example only uses key values), in which case the plan still looks the same as the default dependent join.

You can read more about MAKEDEP here https://docs.jboss.org/author/display/TEIID/Federated+Optimizations

There also MAKEIND hint, which opposite of MAKEDEP which is placed on independent side of a dependent query. MAKENOTDEP forces the query engine not to plan a dependent query.

Hopefully this gave a good material as to how, when to use the hints to write better performing queries. Note that, when costing information is defined on the tables, most of these decisions are made automatically, if the Teiid not doing it you now know how to force it to use :)

Thanks

Ramesh.. 

Comments

Popular posts from this blog

Teiid Spring Boot 1.7.0 Released

Teiid Spring Boot version 1.7.0 to support Teiid 16.0 has been released. This release is mainly to support the Teiid's latest version.  In this release, the support for OpenAPI code generation based on VDB has been removed as there is no community interest and moreover it was at OpenAPI 2.0, and the industry has moved to 3.0 and beyond. There are no plans to further pursue this feature. VDB maven plugin was also removed, which was intended to be a replacement for the VDB importing feature was to be used when working on OpenShift, however, since it requires the Maven repository and does not completely represent the feature as defined on the WildFly based deployments this is also removed. You can still use the VDB import feature with Teiid Spring Boot, simply define the VDB with your "IMPORT DATABASE" statements and provide the additional files along with the main VDB file. During the start of the application, Teiid Spring Boot will load all the necessary DDL files for the

Teiid 8.13.3 Released

Teiid 8.13.3 is now  available .  In total 8.13.3 addresses 10 issues since 8.13.2: [ TEIID-4028 ] - adding salesforce-34 resource adapter does not work through the cli [ TEIID-4066 ] - Odata translator ClassNotFoundException: com.sun.ws.rs.ext.RuntimeDelegateImpl [ TEIID-4070 ] - Issues with resource adapters with api modules in wildfly [ TEIID-4089 ] - Teiid JDBC driver does not reset the update count when calling getMoreResults(int) [ TEIID-4093 ] - OData authentication fails with NPE when gss-pattern related properties are included in VDB [ TEIID-4096 ] - AssertionError with independent side of a dependent join that has an ordered limit [ TEIID-3050 ] - allow for more incremental insert with iterator [ TEIID-4075 ] - Netezza translator to support common table expressions [ TEIID-4098 ] - Always preserve columns order in google spreadsheets models [ TEIID-4046 ] - OData - $skip is beeing ignor

Access Teiid from node.js

Are you writing a "node.js" application and would like to access Teiid VDB from it? If "yes", this is currently possible using NPM package "pg" .  Since, Teiid supports the PG transport, you can use this PostgreSQL client for "node.js" for accessing the Teiid. For example if you have VDB called "northwind" deployed on your Teiid server, and it has table called "customers" and you are using default configuration such as user = 'user' password = 'user' host = 127.0.0.1 port = 35432 then you can use following to write simple access program to Teiid var pg = require('pg'); var connectionString = "pg://user:user@localhost:35432/northwind" pg.connect(connectionString, function(err, client) { client.query('SELECT CustomerID, ContactName, ContactTitle FROM Customers', function(err, result) { console.log(result.rows) }); }); If you want