Skip to main content

Lookups tutorial

info

Lookups are a beta feature available to select customers. Imply must enable the feature for you. Contact Polaris support to find out more.

You can use lookups to enrich data from a table with values from a lookup source at query time. A lookup source in Polaris is a specialized table that meets specific requirements.

This tutorial teaches you how to set up and use lookups in queries. The tutorial takes you through the following steps:

In this tutorial, you also learn how to apply lookups at ingestion time to supplement your source data. To skip to this example, see How to use lookups at ingestion. To learn more about strategies for using lookups, see Lookup strategies.

Prerequisites

To follow along, download the following datasets, and upload them to the file staging area.

You need the following permissions to create and use lookups:

  • ManageTables: Create a lookup table and the lookup itself.
  • ManageIngestionJobs: Ingest data into a lookup table.
  • AccessQueries: Run SQL queries.

For information on permissions, see Permissions reference.

How to use lookups in queries

In this section, you create a table, car-trips, that represents all the trips taken by a fleet of cars. You create a second table, car-makes, that contains information on each car's make, model, and color. Both tables have a column for vehicle ID. You use the car-makes table as a lookup source to provide more information when querying the data in car-trips.

Create a table

In this section, you create a table, car-trips, that contains quantitative information about trips taken by a fleet of cars. The data contains vehicle IDs that identify which vehicle made the trip.

  1. From the left navigation menu, access the Jobs page.

  2. Click Create job > New table. Enter car-trips for the table name.

  3. Select the Files source. Select car-trips.json and click Next > Continue.

  4. In the step to Map source to table, select Edit for the vid column. Cast the column to string, since lookups only operate on string-typed values. To do so, enter the input expression CAST("vid" AS VARCHAR).

    Cast type when ingesting car trips

  5. Click Apply > Start ingestion.

Create a lookup source

In this section, you create a lookup source, a table called car-makes. The table contains qualitative information that describes the car associated with each vehicle ID.

  1. From the left navigation menu, access the Jobs page.

  2. Click Create job > New table. Enter car-makes for the table name.

  3. Select the Files source. Select car-makes.json and click Next > Continue.

  4. In the step to Map source to table, cast vid and year to string as follows:

    1. Select Edit for the vid column. Cast it to string using the input expression CAST("vid" AS VARCHAR).
    2. Select Edit for the year column. Cast it to string using the input expression CAST("year" AS VARCHAR).
  5. Click Partitioning in the menu bar above the column editor.

  6. Select the All time partitioning since a lookup table requires the partitioning set to All.
    If you don't set this, Polaris applies the default value, Day. You can still edit the table to change its partitioning.

    Select all partitioning for car makes

  7. Start the ingestion.

Create a lookup

In this section, you create a lookup that references the lookup source.

  1. Click Lookups from the left navigation menu.
  2. Click Create lookup.
  3. For the name, enter vid-makes.
  4. For the source table, select car-makes.
  5. Click Create.

Query the data

Now you query the car-trips data. In the first query, you select data that's only in the table itself. In the second query, you supplement the query results using the lookup.

  1. Click SQL in the left navigation menu to access the SQL workbench.

  2. Run the following query to identify trips longer than 10 miles and with an average fuel economy above 30 miles per hour:

    SELECT
    "vid",
    TIMESTAMPDIFF(MINUTE, TIME_PARSE("start_time"), "__time") AS "elapsed_minutes",
    "len_miles",
    "avg_mph"
    FROM "car-trips"
    WHERE "len_miles" > 10 AND avg_mph > 30

    Polaris returns query results from data within this table only:

    Query with table only

  3. Use the lookup you just created, vid-makes, to get car information for the identified trips.

    SELECT
    "vid",
    TIMESTAMPDIFF(MINUTE, TIME_PARSE("start_time"), "__time") AS "elapsed_minutes",
    "len_miles",
    "avg_mph",
    LOOKUP("vid", 'vid-makes[vid][make]') AS "make",
    LOOKUP("vid", 'vid-makes[vid][model]') AS "model"
    FROM "car-trips"
    WHERE "len_miles" > 10 AND avg_mph > 30

    The query results now include information from car-makes:

    Query using lookups

How to use lookups at ingestion

You can use lookups at ingestion time to reduce the query burden of using a separate lookup table in every query. The following example uses the car-makes table and the vid-makes lookup from the previous example. You ingest the car trips data with columns added to identify details about each car.

To skip the first five steps and use SQL-based ingestion, submit the following query using the SQL code editor or the API:

INSERT INTO "car-trips-makes"
SELECT
TIME_PARSE("end_time") AS "__time",
"start_time" AS "start_time",
"vid" AS "vid",
"len_miles" AS "len_miles",
"avg_mph" AS "avg_mph",
LOOKUP("vid", 'vid-makes[vid][make]') AS "make",
LOOKUP("vid", 'vid-makes[vid][model]') AS "model"
FROM TABLE(POLARIS_SOURCE('{"fileList":["car-trips.json"],"inputSchema":[{"dataType":"string","name":"vid"},{"dataType":"string","name":"end_time"},{"dataType":"string","name":"start_time"},{"dataType":"double","name":"len_miles"},{"dataType":"double","name":"avg_mph"}],"formatSettings":{"format":"nd-json"},"type":"uploaded"}'))
PARTITIONED BY DAY
  1. From the left navigation menu, access the Job page.

  2. Click Create job > New table. Enter car-trips-makes for the table name.

  3. Select the Files source. Select car-trips.json and click Next > Continue.

  4. In the step to Map source to table, cast the vid column to string using the input expression CAST("vid" AS VARCHAR).

  5. Add columns with the following names and input expressions:

    1. make: LOOKUP("vid", 'vid-makes[vid][make]', 'NA')
    2. model: LOOKUP("vid", 'vid-makes[vid][model]', 'NA')

    The optional third argument, NA, substitutes for any missing keys. If you don't set this value, Polaris returns NULL for missing keys.

  6. Click Start ingestion.

  7. When the ingestion completes, query the resulting table using the following SQL:

    SELECT
    "vid",
    TIMESTAMPDIFF(MINUTE, TIME_PARSE("start_time"), "__time") AS "elapsed_minutes",
    "len_miles",
    "avg_mph",
    "make",
    "model"
    FROM "car-trips-makes"
    WHERE "len_miles" > 10 AND avg_mph > 30

    Notice that the results are the same as the query using lookups at ingestion time. Since you ingested the data from both tables together, you no longer need to use lookups at query time.

Learn more

For information about lookups in Polaris, see Lookups.

To learn how to create and manage lookups and lookup aliases by API, see Create lookups.