In this article I will explain what a MAKEDEP hint is, how and when, why it should be used in Teiid.
What: MAKEDEP is query hint. When a query hint is defined in the SQL query it influences the Teiid query planner to optimize the query in a way that is driven by the user. MAKEDEP means "make this as a dependent join".
What is a Dependent Join?
For example if we have query like:
Where the data for X, and Y are coming from two different sources like Oracle and WebService in Teiid, so in relational algebra you can represent above query as
Here the result tuples from node X and node Y are being simultaneously fetched by Teiid query engine, then it joins the both the results inside Teiid engine based on the specified X.PK = Y.PK condition and returns the filtered resulted to the user. simple..
Now, what if, if X table has 5 rows and Y table has 100K rows? In order to do the JOIN naively Teiid need sto read all the 5 rows from X side and 100K rows from Y side then proceed with the JOIN. That is where MAKEDEP comes to rescue if the planner cannot use statistics to automatically determine a better plan for you.
Let's modify query like this and provide a MAKEDEP hint:
here what you are suggesting to the query planner is make the node Y as dependent node on X. That means the data from Y is dependent on X data. In this scenario, the execution then be like
The query planner will do the operations in sequence this time
(1) Fetch 5 rows from X
(2) Push the distinct equi-join values from X into Y side using IN clause(2)
(3) Fetch resultant rows from Y that matches the JOIN condition
The SQL executed are
here the Y node will return ONLY relevant data, which can be significantly less that the full relation. By doing this, you avoided lot of network traffic in retrieving the rows, and also processing inside the Teiid for match the X.PK = Y.FK condition. This results in match faster query.
Now what if X has 10K rows? Imagine fetching all the 10K rows, and sending them to node Y node in "IN" clause? The issues are some databases do not allow SQL statement bigger than certain size, have limits to values in the IN clause, limits to prepared bindings, etc. To compensate multiple queries must be issued. Generally the processing will suffer with a larger numbers as the number of source queries increases. Teiid offers another solution for this - to create a temporary table with the relevant values from X, then issue a join query based on temporary table and Y.
To do this, for JDBC translators you need to add translator override property called "EnableDependentJoins" to "true". Then when you submit the query
The processing will be
The query planner will do the operations in sequence this time
(1) Fetch 10K rows from X
(2) Insert the distinct equi-join values from X into Y side using batched inserts
(4) Fetch resultant rows from Y that matches the JOIN condition
(5) send rows back to user
The executed sudo SQL statements are
What: MAKEDEP is query hint. When a query hint is defined in the SQL query it influences the Teiid query planner to optimize the query in a way that is driven by the user. MAKEDEP means "make this as a dependent join".
What is a Dependent Join?
For example if we have query like:
SELECT * FROM X INNER JOIN Y ON X.PK = Y.FK
Where the data for X, and Y are coming from two different sources like Oracle and WebService in Teiid, so in relational algebra you can represent above query as
Here the result tuples from node X and node Y are being simultaneously fetched by Teiid query engine, then it joins the both the results inside Teiid engine based on the specified X.PK = Y.PK condition and returns the filtered resulted to the user. simple..
Now, what if, if X table has 5 rows and Y table has 100K rows? In order to do the JOIN naively Teiid need sto read all the 5 rows from X side and 100K rows from Y side then proceed with the JOIN. That is where MAKEDEP comes to rescue if the planner cannot use statistics to automatically determine a better plan for you.
Let's modify query like this and provide a MAKEDEP hint:
SELECT * FROM X INNER JOIN /*+ MAKEDEP */ Y ON X.PK = Y.FK
here what you are suggesting to the query planner is make the node Y as dependent node on X. That means the data from Y is dependent on X data. In this scenario, the execution then be like
The query planner will do the operations in sequence this time
(1) Fetch 5 rows from X
(2) Push the distinct equi-join values from X into Y side using IN clause(2)
(3) Fetch resultant rows from Y that matches the JOIN condition
The SQL executed are
SELECT * FROM X;
SELECT * FROM Y WHERE Y.FK IN (X Values);
here the Y node will return ONLY relevant data, which can be significantly less that the full relation. By doing this, you avoided lot of network traffic in retrieving the rows, and also processing inside the Teiid for match the X.PK = Y.FK condition. This results in match faster query.
Now what if X has 10K rows? Imagine fetching all the 10K rows, and sending them to node Y node in "IN" clause? The issues are some databases do not allow SQL statement bigger than certain size, have limits to values in the IN clause, limits to prepared bindings, etc. To compensate multiple queries must be issued. Generally the processing will suffer with a larger numbers as the number of source queries increases. Teiid offers another solution for this - to create a temporary table with the relevant values from X, then issue a join query based on temporary table and Y.
To do this, for JDBC translators you need to add translator override property called "EnableDependentJoins" to "true". Then when you submit the query
SELECT * FROM X INNER JOIN /*+ MAKEDEP */ Y ON X.PK = Y.FK
The processing will be
The query planner will do the operations in sequence this time
(1) Fetch 10K rows from X
(2) Insert the distinct equi-join values from X into Y side using batched inserts
(4) Fetch resultant rows from Y that matches the JOIN condition
(5) send rows back to user
The executed sudo SQL statements are
SELECT * FROM X;
CREATE TABLE #TEIID_XXX (XPK coltype);
INSERT INTO #TEIID_XXX (X key values);
SELECT * FROM Y JOIN #TEIID_XXX ON Y.FK = #TEIID_XXX.XPK;
Depending upon number of values that is being pushed, this can result even faster query.
You can also customize MAKEDEP to force additional behavior. The planner will choose when to back off of (if there are too many independent values) a dependent join based upon statistics. This behavior can be forced with the MAX option:
That means only create the dependent join when there are less then 5000 rows from X side.
The planner based upon the plan and the source support for dependent joins can also choose to instead send all of the relevant rows from X over to Y. This can be forced with the JOIN option:
This is same as above temp table scenario, Using a temp table for the entire independent side. This option is best suited to situations where more of plan can be pushed, for example aggregation and other processing above the join:
select
grouping
join
access
access
can become:
access (performing the join via data shipment)
select
grouping
If there isn't any additional processing and a wide set of values (or something that is using lobs), then the best you can do is just creating a temporary table for the key set (the previous example only uses key values), in which case the plan still looks the same as the default dependent join.
You can read more about MAKEDEP here https://docs.jboss.org/author/display/TEIID/Federated+Optimizations
There also MAKEIND hint, which opposite of MAKEDEP which is placed on independent side of a dependent query. MAKENOTDEP forces the query engine not to plan a dependent query.
Hopefully this gave a good material as to how, when to use the hints to write better performing queries. Note that, when costing information is defined on the tables, most of these decisions are made automatically, if the Teiid not doing it you now know how to force it to use :)
Thanks
Ramesh..
You can also customize MAKEDEP to force additional behavior. The planner will choose when to back off of (if there are too many independent values) a dependent join based upon statistics. This behavior can be forced with the MAX option:
SELECT * FROM X INNER JOIN Y
MAKEDEP(MAX:5000) ON X.PK = Y.FK
That means only create the dependent join when there are less then 5000 rows from X side.
The planner based upon the plan and the source support for dependent joins can also choose to instead send all of the relevant rows from X over to Y. This can be forced with the JOIN option:
SELECT * FROM X INNER JOIN Y
MAKEDEP(JOIN) ON X.PK = Y.FK
This is same as above temp table scenario, Using a temp table for the entire independent side. This option is best suited to situations where more of plan can be pushed, for example aggregation and other processing above the join:
select
grouping
join
access
access
can become:
access (performing the join via data shipment)
select
grouping
If there isn't any additional processing and a wide set of values (or something that is using lobs), then the best you can do is just creating a temporary table for the key set (the previous example only uses key values), in which case the plan still looks the same as the default dependent join.
You can read more about MAKEDEP here https://docs.jboss.org/author/display/TEIID/Federated+Optimizations
There also MAKEIND hint, which opposite of MAKEDEP which is placed on independent side of a dependent query. MAKENOTDEP forces the query engine not to plan a dependent query.
Hopefully this gave a good material as to how, when to use the hints to write better performing queries. Note that, when costing information is defined on the tables, most of these decisions are made automatically, if the Teiid not doing it you now know how to force it to use :)
Thanks
Ramesh..
Comments
Post a Comment