Lookups tutorial
You can use lookups to enrich data from a table with values from a lookup source at query time. A lookup source in Polaris is a specialized table that meets specific requirements.
This tutorial teaches you how to set up and use lookups in queries. The tutorial takes you through the following steps:
In this tutorial, you also learn how to apply lookups at ingestion time to supplement your source data. To skip to this example, see How to use lookups at ingestion. To learn more about strategies for using lookups, see Lookup strategies.
Prerequisites
To follow along, download the following datasets, and upload them to the file staging area.
You need the following permissions to create and use lookups:
ManageTables
: Create a lookup table and the lookup itself.ManageIngestionJobs
: Ingest data into a lookup table.AccessQueries
: Run SQL queries.
For information on permissions, see Permissions reference.
How to use lookups in queries
In this section, you create a table, car-trips
, that represents all the trips taken by a fleet of cars.
You create a second table, car-makes
, that contains information on each car's make, model, and color.
Both tables have a column for vehicle ID.
You use the car-makes
table as a lookup source to provide more information when querying the data in car-trips
.
Ingest data
In this section, you ingest data into a table named car-trips
that contains quantitative information about trips taken by a fleet of cars.
The data contains vehicle IDs that identify which vehicle made the trip.
From the left navigation menu, access the Jobs page.
Click Create job > New table. Enter
car-trips
for the table name.Select the Files source. Select
car-trips.json
and click Next > Continue.In the step to Map source to table, select Edit for the
vid
column. Cast the column to string, since lookups only operate on string-typed values. To do so, enter the input expressionCAST("vid" AS VARCHAR)
.Click Apply > Start ingestion.
Ingest lookup data
In this section, you ingest a lookup source, a table called car-makes
.
The table contains qualitative information that describes the car associated with each vehicle ID.
From the left navigation menu, access the Jobs page.
Click Create job > New table. Enter
car-makes
for the table name.Select the Files source. Select
car-makes.json
and click Next > Continue.In the step to Map source to table, cast
vid
andyear
to string as follows:- Select Edit for the
vid
column. Cast it to string using the input expressionCAST("vid" AS VARCHAR)
. - Select Edit for the
year
column. Cast it to string using the input expressionCAST("year" AS VARCHAR)
.
- Select Edit for the
Click Partitioning in the menu bar above the column editor.
Select the
All
time partitioning since a lookup table requires the partitioning set toAll
.
If you don't set this, Polaris applies the default value,Day
. You can still edit the table to change its partitioning.Start the ingestion.
After initial ingestion, you may later need to update or refresh data used in the lookup.
When ingesting data into an active lookup table, make sure to use the replace
ingestion mode
to replace all data in the table.
This example used the insert
ingestion mode, since the table wasn't referenced by a lookup yet.
For more information, see the Lookup guardrails.
Create a lookup
In this section, you create a lookup that references the lookup source.
- Click Lookups from the left navigation menu.
- Click Create lookup.
- For the name, enter
vid-makes
. - For the source table, select
car-makes
. - Click Create.
Query the data
Now you query the car-trips
data.
In the first query, you select data that's only in the table itself.
In the second query, you supplement the query results using the lookup.
Click SQL in the left navigation menu to access the SQL workbench.
Run the following query to identify trips longer than 10 miles and with an average fuel economy above 30 miles per hour:
SELECT
"vid",
TIMESTAMPDIFF(MINUTE, TIME_PARSE("start_time"), "__time") AS "elapsed_minutes",
"len_miles",
"avg_mph"
FROM "car-trips"
WHERE "len_miles" > 10 AND avg_mph > 30Polaris returns query results from data within this table only:
Use the lookup you just created,
vid-makes
, to get car information for the identified trips.SELECT
"vid",
TIMESTAMPDIFF(MINUTE, TIME_PARSE("start_time"), "__time") AS "elapsed_minutes",
"len_miles",
"avg_mph",
LOOKUP("vid", 'vid-makes[vid][make]') AS "make",
LOOKUP("vid", 'vid-makes[vid][model]') AS "model"
FROM "car-trips"
WHERE "len_miles" > 10 AND avg_mph > 30The query results now include information from
car-makes
:
How to use lookups at ingestion
You can use lookups at ingestion time to reduce the query burden of using a separate lookup table in every query.
The following example uses the car-makes
table and the vid-makes
lookup from the previous example.
You ingest the car trips data with columns added to identify details about each car.
To skip the first five steps and use SQL-based ingestion, submit the following query using the SQL code editor or the API:
INSERT INTO "car-trips-makes"
SELECT
TIME_PARSE("end_time") AS "__time",
"start_time" AS "start_time",
"vid" AS "vid",
"len_miles" AS "len_miles",
"avg_mph" AS "avg_mph",
LOOKUP("vid", 'vid-makes[vid][make]') AS "make",
LOOKUP("vid", 'vid-makes[vid][model]') AS "model"
FROM TABLE(POLARIS_SOURCE('{"fileList":["car-trips.json"],"inputSchema":[{"dataType":"string","name":"vid"},{"dataType":"string","name":"end_time"},{"dataType":"string","name":"start_time"},{"dataType":"double","name":"len_miles"},{"dataType":"double","name":"avg_mph"}],"formatSettings":{"format":"nd-json"},"type":"uploaded"}'))
PARTITIONED BY DAY
From the left navigation menu, access the Job page.
Click Create job > New table. Enter
car-trips-makes
for the table name.Select the Files source. Select
car-trips.json
and click Next > Continue.In the step to Map source to table, cast the
vid
column to string using the input expressionCAST("vid" AS VARCHAR)
.Add columns with the following names and input expressions:
make
:LOOKUP("vid", 'vid-makes[vid][make]', 'NA')
model
:LOOKUP("vid", 'vid-makes[vid][model]', 'NA')
The optional third argument,
NA
, substitutes for any missing keys. If you don't set this value, Polaris returns NULL for missing keys.Click Start ingestion.
When the ingestion completes, query the resulting table using the following SQL:
SELECT
"vid",
TIMESTAMPDIFF(MINUTE, TIME_PARSE("start_time"), "__time") AS "elapsed_minutes",
"len_miles",
"avg_mph",
"make",
"model"
FROM "car-trips-makes"
WHERE "len_miles" > 10 AND avg_mph > 30Notice that the results are the same as the query using lookups at ingestion time. Since you ingested the data from both tables together, you no longer need to use lookups at query time.
Learn more
For information about lookups in Polaris, see Lookups.
To learn how to create and manage lookups and lookup aliases by API, see Create lookups.