Ingest earliest or latest values into a table measure
Imply Polaris supports EARLIEST_BY
and LATEST_BY
in input expressions to ingest
the earliest or latest value from a string input field.
Consider a use case in which you want to combine rows with the same timestamp
and dimension values, but for each bucket, you want to store the earliest value of a certain input field.
To do this, you create a measure in an aggregate table
and use the EARLIEST_BY
aggregation function in the
input expression for the measure.
This topic provides a description of EARLIEST_BY
and LATEST_BY
and walks you through an example of
using these functions during ingestion.
Aggregation functions
You can use the following functions in input expressions.
Function | Description |
---|---|
EARLIEST_BY(expr, timestampExpr, maxBytesPerString) | For the earliest time value in timestampExpr , returns the associated value of the string expr . |
LATEST_BY(expr, timestampExpr, maxBytesPerString) | For the latest time value in timestampExpr , returns the associated value of the string expr . |
Keep the following details in mind when using EARLIEST_BY
or LATEST_BY
in an input expression:
EARLIEST_BY
andLATEST_BY
only work on string input fields. Applying these functions on a non-string input results in an invalid expression.- Generally,
timestampExpr
should correspond to the same timestamp value of the primary timestamp,__time
. - Set
maxBytesPerString
to the number of bytes to allocate for aggregating strings. Polaris truncates any string beyond this limit; however, too high a value may lead to wasted memory. A reasonable value formaxBytesPerString
is 1000 or 1024.
Example
The following example shows how to use input expressions during ingestion to accomplish the following:
- Aggregate data by
Minute
granularity. - Populate the
first_IP
column based on the earliest occurringsrcIP
value of the aggregated values. - Populate the
last_IP
column based on the latest occurringdstIP
value of the aggregated values.
This example is based on the network flow event data used in the rollup example.
Follow the steps from the rollup example, but do not start the ingestion.
On the Map source to table page, add a string measure named
first_IP
with the following input expression:EARLIEST_BY("srcIP", TIME_PARSE("timestamp"), 1000)
Add a second string measure named
last_IP
with the following input expression:LATEST_BY("dstIP", TIME_PARSE("timestamp"), 1000)
Your table should look similar to the following. The
lhs
field lists the timestamp value in milliseconds, and therhs
field lists the first or last value.Click Start ingestion.
Learn more
For more information, see the following topics:
- Introduction to tables for details on aggregate tables.
- Introduction to data rollup for how Polaris aggregates raw data during ingestion.
- Map and transform data with input expressions for details on input expressions and supported aggregation functions.