Tech Tip: Teiid SQL Language MAKEDEP Hint Explained

In this article I will explain what the MAKEDEP hint is, and how, when, and why it should be used in Teiid.

What: MAKEDEP is a query hint. When a query hint is specified in a SQL query, it influences the Teiid query planner to optimize the query in a way that is driven by the user. MAKEDEP means "make this a dependent join".

What is a Dependent Join?

For example, if we have a query like:

 SELECT * FROM X INNER JOIN Y ON X.PK = Y.FK  

Where the data for X and Y comes from two different sources in Teiid, such as Oracle and a web service, the query can be represented in relational algebra as a join node with the access nodes for X and Y as its two children.

Here the result tuples from node X and node Y are fetched simultaneously by the Teiid query engine, which then joins both results inside the engine based on the specified X.PK = Y.FK condition and returns the joined result to the user. Simple.

Now, what if table X has 5 rows and table Y has 100K rows? To do the JOIN naively, Teiid needs to read all 5 rows from the X side and all 100K rows from the Y side before proceeding with the JOIN. This is where MAKEDEP comes to the rescue if the planner cannot use statistics to automatically determine a better plan for you.

Let's modify the query to provide a MAKEDEP hint:

 SELECT * FROM X INNER JOIN /*+ MAKEDEP */ Y ON X.PK = Y.FK  

Here you are suggesting to the query planner that it make node Y dependent on node X, meaning the data fetched from Y depends on the data from X.

The query planner will do the operations in sequence this time:
(1) Fetch 5 rows from X
(2) Push the distinct equi-join values from X to the Y side using an IN clause
(3) Fetch the resulting rows from Y that match the JOIN condition

The SQL statements executed are:

SELECT * FROM X;
SELECT * FROM Y WHERE Y.FK IN (X Values);

Here the Y node returns ONLY the relevant data, which can be significantly less than the full relation. By doing this, you avoid a lot of network traffic in retrieving the rows, and also the processing inside Teiid to match the X.PK = Y.FK condition. This results in a much faster query.
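To make the difference concrete, here is a toy Python simulation of the two strategies. It is only a sketch, not Teiid code: the lists `X` and `Y`, the row shapes, and the function names are all made up for illustration, and "fetched" simply counts rows that would cross the network.

```python
# Toy simulation of a naive join vs. a dependent join. Not Teiid code.
# X and Y stand in for two remote sources; every name here is hypothetical.

X = [{"pk": i} for i in range(5)]                          # small independent side
Y = [{"fk": i % 1000, "val": i} for i in range(100_000)]   # large dependent side


def hash_join(x_rows, y_rows):
    """Join inside the engine on x.pk = y.fk."""
    by_pk = {x["pk"]: x for x in x_rows}
    return [{**by_pk[y["fk"]], **y} for y in y_rows if y["fk"] in by_pk]


def naive_join(x_rows, y_rows):
    """Fetch both sides in full, then join in the engine."""
    fetched = len(x_rows) + len(y_rows)
    return hash_join(x_rows, y_rows), fetched


def dependent_join(x_rows, y_rows):
    """Fetch X first, then ask Y only for rows whose FK is IN the X keys."""
    keys = {x["pk"] for x in x_rows}                    # distinct equi-join values
    y_subset = [y for y in y_rows if y["fk"] in keys]   # filtering done at source Y
    fetched = len(x_rows) + len(y_subset)
    return hash_join(x_rows, y_subset), fetched


naive_rows, naive_fetched = naive_join(X, Y)
dep_rows, dep_fetched = dependent_join(X, Y)
print(naive_fetched, dep_fetched)  # 100005 505
```

Both strategies return the same joined rows; only the number of rows pulled from Y changes.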

Now what if X has 10K rows? Imagine fetching all 10K rows and sending them to node Y in an "IN" clause. The issue is that some databases do not allow SQL statements larger than a certain size, or have limits on the number of values in an IN clause, limits on prepared statement bindings, etc. To compensate, multiple queries must be issued. Generally, processing suffers with larger numbers of values as the number of source queries increases. Teiid offers another solution for this: create a temporary table with the relevant values from X, then issue a join query between the temporary table and Y.
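As a rough illustration of why the IN-clause approach degrades, the sketch below splits a key set into several source queries. The limit of 1000 values per IN list is a hypothetical number chosen for the example, and the helper names are invented; this is not how Teiid builds its queries internally.

```python
def in_clause_batches(values, max_in_size):
    """Split the distinct join values into batches, one source query per batch."""
    distinct = sorted(set(values))
    return [distinct[i:i + max_in_size] for i in range(0, len(distinct), max_in_size)]


def build_queries(values, max_in_size=1000):
    """Return (sql, bind_values) pairs, one per batch of keys."""
    queries = []
    for batch in in_clause_batches(values, max_in_size):
        placeholders = ", ".join("?" for _ in batch)
        queries.append((f"SELECT * FROM Y WHERE Y.FK IN ({placeholders})", batch))
    return queries


# 10K keys from X with a hypothetical limit of 1000 values per IN list.
queries = build_queries(range(10_000), max_in_size=1000)
print(len(queries))  # 10
```

Ten keys' worth of batches means ten round trips to Y, which is the overhead the temporary-table approach avoids.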

To enable the temporary-table approach for JDBC translators, you need to set the translator override property "EnableDependentJoins" to "true". Then when you submit the query

 SELECT * FROM X INNER JOIN /*+ MAKEDEP */ Y ON X.PK = Y.FK  

The query planner will do the operations in sequence this time:
(1) Fetch 10K rows from X
(2) Insert the distinct equi-join values from X into a temporary table on the Y side using batched inserts
(3) Fetch the resulting rows from Y that match the JOIN condition
(4) Send the rows back to the user

The pseudo-SQL statements executed are:

SELECT * FROM X;
CREATE TABLE #TEIID_XXX (XPK coltype);
INSERT INTO #TEIID_XXX (X key values);
SELECT * FROM Y JOIN #TEIID_XXX ON Y.FK = #TEIID_XXX.XPK;

Depending upon the number of values being pushed, this can result in an even faster query.
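The same sequence can be tried out with Python's built-in `sqlite3` module. This is a sketch under assumptions: an in-memory SQLite database stands in for the source that holds Y, the table contents are invented, and the batch size of 2000 is arbitrary. It only mirrors the pseudo-SQL above and says nothing about what Teiid actually sends to a given database.

```python
import sqlite3

# In-memory SQLite stands in for the source holding Y (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Y (FK INTEGER, VAL TEXT)")
conn.executemany(
    "INSERT INTO Y (FK, VAL) VALUES (?, ?)",
    [(i % 50_000, f"row-{i}") for i in range(100_000)],
)

# Step 1: rows fetched from X (a different source); only the join keys matter here.
x_keys = list(range(10_000))

# Step 2: create a temp table at the Y source and load the distinct keys in batches.
conn.execute("CREATE TEMP TABLE TEIID_XXX (XPK INTEGER PRIMARY KEY)")
batch_size = 2_000  # arbitrary batch size for the sketch
distinct_keys = sorted(set(x_keys))
for start in range(0, len(distinct_keys), batch_size):
    batch = distinct_keys[start:start + batch_size]
    conn.executemany("INSERT INTO TEIID_XXX (XPK) VALUES (?)", [(k,) for k in batch])

# Step 3: the join runs at the source, so only matching Y rows come back.
rows = conn.execute(
    "SELECT Y.FK, Y.VAL FROM Y JOIN TEIID_XXX ON Y.FK = TEIID_XXX.XPK"
).fetchall()
print(len(rows))  # 20000 of the 100000 rows in Y
```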

You can also customize MAKEDEP to force additional behavior. The planner will choose when to back off from a dependent join (if there are too many independent values) based upon statistics. This behavior can be forced with the MAX option:

 SELECT * FROM X INNER JOIN Y MAKEDEP(MAX:5000) ON X.PK = Y.FK  

That means the dependent join is only created when there are fewer than 5000 rows from the X side.
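The back-off rule can be pictured as a simple threshold check. The function below is a hypothetical sketch, not the planner's real logic, and it assumes the limit is compared against the number of distinct join values from the independent side.

```python
def choose_join_strategy(independent_keys, max_values):
    """Pick 'dependent' only when the distinct key count is below max_values."""
    distinct = len(set(independent_keys))
    return "dependent" if distinct < max_values else "regular"


print(choose_join_strategy(range(5), 5000))        # dependent
print(choose_join_strategy(range(10_000), 5000))   # regular
```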

Based upon the plan and the source's support for dependent joins, the planner can also choose to instead send all of the relevant rows from X over to Y. This can be forced with the JOIN option:

 SELECT * FROM X INNER JOIN Y MAKEDEP(JOIN) ON X.PK = Y.FK  

This is similar to the temp table scenario above, but uses a temp table for the entire independent side rather than just the key values. This option is best suited to situations where more of the plan can be pushed down, for example aggregation and other processing above the join:

 select
     grouping
          join
              access
              access

 can become:
   access (performing the join via data shipment)
        select
           grouping

If there isn't any additional processing and there is a wide set of values (or something that uses LOBs), then the best you can do is create a temporary table for just the key set (the previous example only uses key values), in which case the plan still looks the same as the default dependent join.

You can read more about MAKEDEP here https://docs.jboss.org/author/display/TEIID/Federated+Optimizations

There is also the MAKEIND hint, the opposite of MAKEDEP, which is placed on the independent side of a dependent join. MAKENOTDEP forces the query engine not to plan a dependent join.

Hopefully this gave you good material on how and when to use the hints to write better performing queries. Note that when costing information is defined on the tables, most of these decisions are made automatically; if Teiid is not doing it, you now know how to force it :)

Thanks

Ramesh.. 
