Skip to main content

Ingest earliest or latest values into a table measure

Imply Polaris supports EARLIEST_BY and LATEST_BY in input expressions to ingest the earliest or latest value from a string input field. Consider a use case in which you want to combine rows with the same timestamp and dimension values, but for each bucket, you want to store the earliest value of a certain input field. To do this, you create a measure in an aggregate table and use the EARLIEST_BY aggregation function in the input expression for the measure.

This topic provides a description of EARLIEST_BY and LATEST_BY and walks you through an example of using these functions during ingestion.

Aggregation functions

You can use the following functions in input expressions.

FunctionDescription
EARLIEST_BY(expr, timestampExpr[, maxBytesPerValue])For the earliest time value in timestampExpr, returns the associated value of the string expr.
LATEST_BY(expr, timestampExpr[, maxBytesPerValue])For the latest time value in timestampExpr, returns the associated value of the string expr.

Keep the following details in mind when using EARLIEST_BY or LATEST_BY in an input expression:

  • EARLIEST_BY and LATEST_BY only work on string input fields. Applying these functions on a non-string input results in an invalid expression.
  • Generally, timestampExpr should correspond to the same timestamp value of the primary timestamp, __time.
  • The optional parameter maxBytesPerValue determines the number of bytes to allocate for aggregating strings. The default value is 1024 bytes. Polaris truncates any string beyond this limit; however, too high a value may lead to wasted memory.

Example

The following example shows how to use input expressions during ingestion to accomplish the following:

  • Aggregate data by Minute granularity.
  • Populate the first_IP column based on the earliest occurring srcIP value of the aggregated values.
  • Populate the last_IP column based on the latest occurring dstIP value of the aggregated values.

This example is based on the network flow event data used in the rollup example.

  1. Follow the steps from the rollup example, but do not start the ingestion.

  2. On the Map source to table page, add a measure as follows:

    • Name: first_IP
    • Data type: Long String Pair
    • Input expression:
      EARLIEST_BY("srcIP", TIME_PARSE("timestamp"))
  3. Add a second measure as follows:

    • Name: last_IP
    • Data type: Long String Pair
    • Input expression:
      LATEST_BY("dstIP", TIME_PARSE("timestamp"))

    Your table should look similar to the following. The lhs field lists the timestamp value in milliseconds, and the rhs field lists the first or last value.

    Earliest and latest by aggregations

  4. Click Start ingestion.

Learn more

For more information, see the following topics: