SQL-based ingestion using the multi-stage query task engine is a preview feature available starting in Imply Enterprise and Imply Hybrid 2022.06. It is not available in Polaris yet. Preview features enable early adopters to benefit from new functionality while providing ongoing feedback to help shape and evolve the feature. All functionality documented on this page is subject to change or removal in future releases. Preview features are provided "as is" and are not subject to Imply SLAs.
Multi-stage query task runtime
Fault tolerance is not implemented. If any task fails, the entire query fails.
Worker task stage outputs are stored in the working directory given by druid.indexer.task.baseDir. Stages that generate a large amount of output data may exhaust all available disk space. In this case, the query fails with an UnknownError whose message includes "No space left on device".
SELECT from a Druid datasource does not include unpublished real-time data.
GROUPING SETS and UNION ALL are not implemented. Queries using these features return a QueryNotSupported error.
The numeric varieties of the EARLIEST and LATEST aggregators do not work properly. Attempting to use them leads to an error like java.lang.ClassCastException: class java.lang.Double cannot be cast to class org.apache.druid.collections.SerializablePair. The string varieties, however, do work properly.
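In Druid SQL the two varieties differ in their arguments: the string variety takes a second maxBytesPerString argument, and the numeric variety takes only the expression. The sketch below assumes a hypothetical datasource "wikipedia" with a string column "channel", a numeric column "added", and a grouping column "page". It shows the string form that works, with the numeric form left commented out because it is expected to fail with the error above.

```sql
-- Hypothetical datasource "wikipedia" with columns "page", "channel" (string)
-- and "added" (numeric).

-- String variety EARLIEST(expr, maxBytesPerString): works.
SELECT "page", EARLIEST("channel", 128) AS first_channel
FROM "wikipedia"
GROUP BY "page"

-- Numeric variety EARLIEST(expr): expected to fail with a ClassCastException.
-- SELECT "page", EARLIEST("added") AS first_added
-- FROM "wikipedia"
-- GROUP BY "page"
```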
INSERT and REPLACE
INSERT and REPLACE with column lists, like INSERT INTO tbl (a, b, c) SELECT ..., are not implemented.
INSERT ... SELECT and REPLACE ... SELECT insert columns from the SELECT statement based on column name. This differs from SQL standard behavior, where columns are inserted based on position.
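Because column lists are not supported and columns are matched by name, one way to control which target column each value lands in is to alias every SELECT expression to the target column name. A minimal sketch, assuming a hypothetical source datasource source_tbl with columns __time, x, y, and z, a target datasource tbl that should receive columns a, b, and c, and day partitioning (the PARTITIONED BY DAY choice is also an assumption):

```sql
-- Hypothetical: source_tbl has __time, x, y, z; tbl should receive a, b, c.
-- Instead of the unsupported "INSERT INTO tbl (a, b, c) SELECT ...",
-- name each output column with an alias. Columns are matched by these names,
-- not by their position in the SELECT list.
INSERT INTO tbl
SELECT
  __time,
  x AS a,
  y AS b,
  z AS c
FROM source_tbl
PARTITIONED BY DAY
```

The same aliasing applies to REPLACE ... SELECT.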
EXTERN with input sources that match large numbers of files may exhaust available memory on the controller task.
EXTERN does not accept druid input sources. Use FROM instead.
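A minimal sketch of reading an existing Druid datasource with FROM rather than EXTERN, assuming hypothetical datasources old_tbl (source) and new_tbl (target) and day partitioning. Keep in mind that SELECT from a Druid datasource does not include unpublished real-time data, so rows not yet published are not copied.

```sql
-- Hypothetical datasources: copy all rows from old_tbl into new_tbl.
-- The source is read directly with FROM, because EXTERN does not accept
-- an input source of type "druid".
INSERT INTO new_tbl
SELECT *
FROM old_tbl
PARTITIONED BY DAY
```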