Migrate lookups from Druid

When you migrate lookups from open source Apache Druid® or Imply's distribution of Apache Druid® to Imply Polaris, you can keep your existing queries unchanged by using lookup aliases. On the Polaris side, you create the following:

  • A lookup source. Polaris only supports tables as a lookup source.
  • A lookup. The lookup references the table as its source.
  • One or more aliases for the lookup. A lookup alias is optional and provided for query compatibility.

This topic presents a high-level overview of configuring lookups in Polaris when you're migrating lookups from Druid.

For information about lookups in Polaris, see Lookups.

Differences from Druid

You can only source lookup information from a table in Polaris. Lookups in Polaris don't support JDBC or streaming-based data sources.

Polaris doesn't automatically poll for updates to the lookup source. To update a lookup's data, reingest data into the table that the lookup references. When the ingestion job completes, the lookup is updated. To periodically update your lookup data, you can automate a recurring ingestion job into the table using the Jobs API. For more information, see Refresh a lookup.

Migration process

The general process to migrate lookups from Druid involves the following steps:

  1. Prepare your lookup data for ingestion. Ensure your data complies with one of the supported formats.

  2. Create a table and ingest the lookup data. Ensure your table meets the requirements to be a lookup source, such as having string-typed columns and ALL partitioning granularity. For details, see Lookup sources.

  3. Define a lookup that references the table. A lookup definition takes a name to identify it and an existing table that serves as the lookup source. To learn how to create a lookup in Polaris, see Create a lookup.

  4. (optional) Create a lookup alias for the lookup. You only need to create an alias when you want to retain the syntax of existing queries. A lookup alias is associated with an existing lookup. The alias presets which column supplies the keys and which column supplies the values. For more details, see Lookup aliases.

  5. (optional) Refresh your lookup. Update the data in your lookup table to get the most up-to-date information for your lookup queries. When the table has the updated data, the lookup is also updated. For more information, see Refresh a lookup.

    To update your lookup regularly, you can write a script that reingests data on a recurring schedule, such as at a fixed interval of minutes. Alternatively, you can trigger reingestion automatically in response to an external event, such as a change detected in the source data.

Lookup aliases

Lookup aliases allow you to retain your existing query syntax from Druid. Use lookup aliases to migrate your queries as is without having to update your lookup references.

If you have an ordinary lookup called my_lookup, you reference it in a query by specifying its key and value columns. For example, my_lookup[user_ids][comment_count] replaces each user ID from the table being queried with that user's comment count.

With a lookup alias, you wouldn't need to specify user_ids and comment_count. For the same example, you can create a lookup alias called my_lookup_ids_count that points to my_lookup and sets the key and value columns to user_ids and comment_count, respectively. You use the lookup alias the same way that you use a lookup. The alias my_lookup_ids_count is equivalent to specifying my_lookup[user_ids][comment_count].
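
As an illustration of this equivalence, the following sketch shows the same lookup referenced directly and then through the alias. The table name user_events and its column user_id are hypothetical and only serve the example. The lookup, alias, and column names come from the example above. Run the two statements separately.

```sql
-- Direct reference: the key and value columns are spelled out in the lookup name.
-- "user_events" and "user_id" are assumed names for the table being queried.
SELECT
  "user_id",
  LOOKUP("user_id", 'my_lookup[user_ids][comment_count]') AS "comment_count"
FROM "user_events"

-- Alias reference: my_lookup_ids_count presets user_ids as the key column
-- and comment_count as the value column, so the query names only the alias.
SELECT
  "user_id",
  LOOKUP("user_id", 'my_lookup_ids_count') AS "comment_count"
FROM "user_events"
```

Under these assumptions, both statements should return the same rows, because the alias carries the key and value column specification.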

Keep in mind the following details regarding lookups and lookup aliases:

  • A lookup may have multiple aliases associated with it.
  • Any column of a lookup table may serve as the key or value column, provided it's a string column.
  • Lookups and lookup aliases share the same namespace. You can't create a lookup or lookup alias with the same name as an existing lookup or lookup alias.
  • Lookup aliases have a slight performance overhead compared to using lookups directly. Use lookup aliases only for query migration.

Create a lookup alias

Currently, you can only create lookup aliases through the API. For details, see Create or update aliases for a lookup.

Syntax

Use a lookup alias the same way you would use a lookup in a query:

LOOKUP("COLUMN_NAME", 'LOOKUP_ALIAS_NAME', ['OPTIONAL_DEFAULT_VALUE'])

Replace the following values in the LOOKUP function:

  • COLUMN_NAME: Column name in the table that you're querying.
  • LOOKUP_ALIAS_NAME: Name that identifies the lookup alias.
    The alias includes the specification for the key and value columns.
  • OPTIONAL_DEFAULT_VALUE: Constant string to use as the default when the lookup source doesn't have a value for the provided key. If you don't provide the default value, Polaris returns NULL.
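
For instance, a query that supplies the optional default value might look like the following sketch. The table user_events and the column user_id are hypothetical. The alias my_lookup_ids_count is the one described under Lookup aliases. The default is written as the string '0' because the argument is a constant string.

```sql
-- Return '0' instead of NULL when the lookup source has no value for a user ID.
-- "user_events" and "user_id" are assumed names for the table being queried.
SELECT
  "user_id",
  LOOKUP("user_id", 'my_lookup_ids_count', '0') AS "comment_count"
FROM "user_events"
```

Omitting the third argument leaves unmatched keys as NULL, as described above.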

Learn more

For information about lookups in Polaris, see Lookups.

To learn how to create and manage lookups and lookup aliases by API, see Create lookups.