
Teiid User Defined Aggregate Support

Building upon our support for pushdown and non-pushdown user defined scalar functions, Teiid 8.0 CR1 also offers the ability to specify user defined aggregate functions (UDAFs).

UDAFs are useful for extending Teiid's already extensive library of aggregate functions. They also allow Teiid to implement the final processing over pushdown aggregation. For partitioned datasets there is no additional work for Teiid to do, but a UDAF may be needed to express custom aggregation to the participating data sources. For non-partitioned datasets, single-argument UDAFs may be marked as decomposable. This indicates that the initial aggregation can be pushed down and that Teiid will then perform the final integration/aggregation of the results itself. The result is map-reduce-like behavior, where processing can be performed in parallel across a large number of source systems.
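To make the decomposable case concrete, here is a conceptual sketch in plain Java. It is not Teiid's code. It assumes that "decomposable" means applying the single-argument aggregate to each source's partial results gives the same answer as applying it to all rows at once, i.e. agg(agg(x)) over subsets equals agg(x). The class and method names are hypothetical.

```java
import java.util.List;
import java.util.function.BinaryOperator;
import java.util.stream.Collectors;

// Conceptual sketch only: models decomposable aggregation, not Teiid internals.
public class DecomposableSketch {

    // The "pushed down" step: one source aggregates its own rows.
    static long aggregate(List<Long> values, BinaryOperator<Long> op) {
        return values.stream().reduce(op).orElseThrow();
    }

    // Aggregate each partition separately, then aggregate the partial results
    // (the final step the integration layer would perform).
    static long decomposed(List<List<Long>> partitions, BinaryOperator<Long> op) {
        List<Long> partials = partitions.stream()
                .map(p -> aggregate(p, op))
                .collect(Collectors.toList());
        return aggregate(partials, op);
    }

    public static void main(String[] args) {
        // Hypothetical values held by three sources
        List<List<Long>> sources = List.of(List.of(3L, 9L), List.of(4L), List.of(7L, 1L, 8L));
        List<Long> all = sources.stream().flatMap(List::stream).collect(Collectors.toList());

        // For MAX and SUM both approaches agree
        System.out.println(aggregate(all, Math::max) + " " + decomposed(sources, Math::max)); // 9 9
        System.out.println(aggregate(all, Long::sum) + " " + decomposed(sources, Long::sum)); // 32 32
    }
}
```

AVG, by contrast, is not decomposable this way, because an average of per-source averages generally differs from the overall average.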

UDAFs are defined similarly to UDFs and share most of the same metadata properties. Building on the prior post about Dynamic VDB DDL metadata, we can add a user defined aggregate that mimics the commonly supported GROUP_CONCAT aggregate function:

dynamic-portfolio-vdb.xml

 <vdb name="DynamicPortfolio" version= "1">  
   <model name="MarketData">  
     <source name="text-connector" translator-name="file" connection-jndi-name="java:/marketdata-file"/>  
   </model>  
   <model visible = "true" type = "VIRTUAL" name = "portfolio">  
      <metadata type = "DDL"><![CDATA[  
        CREATE VIEW stock (  
         symbol varchar,  
         price decimal  
         ) AS   
          select stock.* from (call MarketData.getTextFiles('*.txt')) f,   
          TEXTTABLE(f.file COLUMNS symbol string, price bigdecimal HEADER) stock;

        CREATE VIRTUAL FUNCTION GROUP_CONCAT(val STRING, sep CHAR) RETURNS STRING 
          OPTIONS (AGGREGATE 'TRUE', "NULL-ON-NULL" 'TRUE', JAVA_CLASS 'example.GroupConcat',
            "ALLOWS-ORDERBY" 'TRUE', JAVA_METHOD 'addInput');
      ]]>  
      </metadata>  
   </model>  
 </vdb> 

The Java class shown below, which backs the non-pushdown function, is straightforward. The class, designated by the JAVA_CLASS option, must implement the org.teiid.UserDefinedAggregate interface and provide a method, designated by the JAVA_METHOD option, whose signature accepts the expected inputs.


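package example;

import org.teiid.CommandContext;
import org.teiid.UserDefinedAggregate;

/*
 * Non-pushdown implementation of GROUP_CONCAT(val, sep).
 * addInput accumulates the values of a group, getResult produces the
 * group's value, and reset clears the state so the instance can be reused.
 */
public class GroupConcat implements UserDefinedAggregate<String> {

  private boolean first = true;   // no value appended yet in this group
  private boolean isNull = true;  // stays true if the group received no input
  private StringBuilder buffer = new StringBuilder();

  // Designated by JAVA_METHOD 'addInput'; parameters match (val STRING, sep CHAR)
  public void addInput(String val, char separator) {
    if (!first) {
      buffer.append(separator);
    }
    buffer.append(val);
    first = false;
    isNull = false;
  }

  @Override
  public String getResult(CommandContext commandContext) {
    // A group with no input yields NULL rather than an empty string
    if (isNull) {
      return null;
    }
    return buffer.toString();
  }

  @Override
  public void reset() {
    first = true;
    isNull = true;
    buffer = new StringBuilder();
  }
}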
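To see the aggregate contract in action without a running server, here is a self-contained sketch that drives a GROUP_CONCAT-style aggregate through reset, repeated addInput calls, and getResult, reusing one instance for several groups. The Aggregate interface and the runGroup helper are hypothetical stand-ins. The real org.teiid.UserDefinedAggregate.getResult takes a CommandContext, and the actual call sequence is controlled by Teiid's engine. Skipping null inputs in runGroup is an assumption modeled on the NULL-ON-NULL 'TRUE' option.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical harness: mimics an aggregate lifecycle without Teiid on the classpath.
public class GroupConcatLifecycle {

    // Local stand-in for org.teiid.UserDefinedAggregate<T> (no CommandContext here).
    interface Aggregate<T> {
        void reset();
        T getResult();
    }

    static class GroupConcat implements Aggregate<String> {
        private StringBuilder buffer; // null until the first value of a group arrives

        public void addInput(String val, char separator) {
            if (buffer == null) {
                buffer = new StringBuilder();
            } else {
                buffer.append(separator);
            }
            buffer.append(val);
        }

        @Override
        public String getResult() {
            return buffer == null ? null : buffer.toString();
        }

        @Override
        public void reset() {
            buffer = null;
        }
    }

    // One group: reset, addInput per non-null value (assumed NULL-ON-NULL behavior), getResult.
    static String runGroup(GroupConcat agg, List<String> values, char separator) {
        agg.reset();
        for (String v : values) {
            if (v != null) {
                agg.addInput(v, separator);
            }
        }
        return agg.getResult();
    }

    public static void main(String[] args) {
        GroupConcat agg = new GroupConcat(); // the same instance serves every group
        System.out.println(runGroup(agg, Arrays.asList("IBM", null, "RHT"), ',')); // IBM,RHT
        System.out.println(runGroup(agg, List.of("MSFT"), ','));                 // MSFT
        System.out.println(runGroup(agg, List.of(), ','));                       // null
    }
}
```

Here a single nullable buffer replaces the first/isNull flags of the class above; either approach works, as long as reset returns the instance to its empty state so results from one group never leak into the next.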

You can follow the instructions in the Developer's Guide for more on coding and deploying user defined functions. With the VDB and the UDAF module deployed, you should now be able to use the aggregate in your queries.

For example:
SELECT GROUP_CONCAT(SYMBOL, ',' ORDER BY PRICE) AS STOCK_LIST FROM STOCK
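Because the function is declared with "ALLOWS-ORDERBY" 'TRUE', the query can order the aggregate's input. The sketch below shows in plain Java the result such a query is expected to produce: rows sorted by price, then their symbols joined with the separator. The Stock record and its sample prices are hypothetical, and the code only models the semantics; it does not call Teiid.

```java
import java.math.BigDecimal;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

// Models GROUP_CONCAT(SYMBOL, ',' ORDER BY PRICE) over hypothetical rows.
public class OrderedGroupConcat {

    record Stock(String symbol, BigDecimal price) {}

    static String groupConcatOrderedByPrice(List<Stock> rows, char separator) {
        // ORDER BY PRICE: sort the inputs before they reach the aggregate
        List<String> ordered = rows.stream()
                .sorted(Comparator.comparing(Stock::price))
                .map(Stock::symbol)
                .collect(Collectors.toList());
        // Mirror the UDAF: no input means NULL, not an empty string
        return ordered.isEmpty() ? null : String.join(String.valueOf(separator), ordered);
    }

    public static void main(String[] args) {
        List<Stock> rows = List.of(
                new Stock("IBM", new BigDecimal("120.50")),
                new Stock("RHT", new BigDecimal("55.10")),
                new Stock("MSFT", new BigDecimal("30.25")));
        System.out.println(groupConcatOrderedByPrice(rows, ',')); // MSFT,RHT,IBM
    }
}
```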

For now, though, this feature is for leading-edge users, as Teiid Designer support is still in progress. Designer 8.0 will overhaul its use of the Function model, which has traditionally been used for UDFs. So there will be more to come from both the Teiid and Teiid Designer sides shortly.

Steve
