
High-precision geospatial filters

Preview feature

High-precision geospatial filters for native queries are available as a preview feature.

Apache Druid supports two query languages: Druid SQL and native queries. This document describes a feature that is only available in native queries.

High-precision geospatial filters operate on a geo dimension. They provide the same filter and bound types as Druid's standard spatial dimensions and filters, but with higher precision.

Prerequisites

To ingest high-precision geospatial data and query it with high-precision geospatial filters, make sure the imply-utility-belt extension is loaded. Geospatial data ingested before the high-precision geospatial feature became available (2024.04 STS) is not compatible with high-precision filters.

When the imply-utility-belt extension is loaded, the high-precision geospatial filters replace Druid's built-in, lower-precision geospatial filters.

Migrate to high-precision geospatial filters

Because high-precision geospatial filters are not compatible with standard spatial dimensions, Imply recommends the following steps when you switch to high-precision filters:

  1. Ingest your geospatial data into a new column next to your existing geospatial data column. You can use JSON-based or SQL-based ingestion for this.
  2. Test your queries. Update them to use high-precision geospatial data and filters. Compare the results to your queries that used the lower-precision geospatial filters.
  3. If the results are as expected, switch to using the updated queries that use your high-precision geo dimension with high-precision filters.
  4. Update your ingestions so that newly ingested data no longer includes the old lower-precision column.
  5. Optionally, delete the previous geospatial column that stored your lower-precision geospatial data in your existing data.
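For step 2, one simple way to compare the old and new queries is to diff the rows each one returns. The following Python sketch assumes you have already fetched both result sets as lists of dicts and that each row carries a unique key column; the `id` column in the usage example is hypothetical.

```python
from typing import Any, Dict, Iterable, Set, Tuple

Row = Dict[str, Any]


def compare_results(
    old_rows: Iterable[Row], new_rows: Iterable[Row], key: str
) -> Tuple[Set[Any], Set[Any]]:
    """Return (keys only in the old result, keys only in the new result)."""
    old_keys = {row[key] for row in old_rows}
    new_keys = {row[key] for row in new_rows}
    return old_keys - new_keys, new_keys - old_keys


# Usage with hypothetical rows keyed by an "id" column:
old = [{"id": 1}, {"id": 2}]
new = [{"id": 2}, {"id": 3}]
only_old, only_new = compare_results(old, new, "id")
print(only_old, only_new)  # {1} {3}
```

Two empty sets mean both queries matched exactly the same rows; any keys that remain are the rows worth inspecting by hand.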

Ingest high-precision geospatial data

You can ingest high-precision geospatial data through classic JSON-based batch ingestion or SQL-based ingestion.

JSON-based ingestion

When creating your ingestion spec, you'll need to include the following:

  • The geo column type in dimensionsSpec.dimensions for the column that you'll store your high-precision geospatial data in:

    "dimensionsSpec": {
      "dimensions": [
        ...
        {
          "name": "high_precision_coords",
          "type": "geo"
        }
        ...
      ]
    }
  • A transform that concatenates your latitude and longitude columns and assigns the result to the geo column you defined in your dimensions:

    "transformSpec": {
      "filter": null,
      "transforms": [
        {
          "type": "expression",
          "name": "high_precision_coords",
          "expression": "concat(my_lat,',', my_lon)"
        }
      ]
    }
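The two fragments above go together: the transform writes to the geo dimension with the same name. The following Python sketch is a hypothetical helper, not part of any Imply tooling. It builds both fragments from your column names and previews the string that the concat expression produces for one row. It assumes latitude and longitude arrive as strings; if they're parsed as numbers first, the textual form of the concatenated value may differ.

```python
import json


def geo_spec_fragments(geo_column: str, lat_column: str, lon_column: str) -> dict:
    """Build the dimensionsSpec entry and matching transform for one geo column."""
    dimension = {"name": geo_column, "type": "geo"}
    transform = {
        "type": "expression",
        "name": geo_column,  # same name as the geo dimension it populates
        "expression": f"concat({lat_column},',', {lon_column})",
    }
    return {"dimension": dimension, "transform": transform}


def concat_preview(lat: str, lon: str) -> str:
    """Mimic the value concat(lat, ',', lon) yields for string inputs."""
    return lat + "," + lon


fragments = geo_spec_fragments("high_precision_coords", "my_lat", "my_lon")
print(json.dumps(fragments, indent=2))
print(concat_preview("39.09535002", "-84.51699812"))  # 39.09535002,-84.51699812
```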

The following sample ingestion spec ingests data from a local CSV file named ingest.100 containing data that resembles this snippet with a timestamp, latitude, and longitude:

2023-12-04 19:34:38.414,39.09535002,-84.51699812
{
  "type": "index_parallel",
  "spec": {
    "dataSchema": {
      "dataSource": "geoTestHP",
      "timestampSpec": {
        "format": "auto",
        "column": "ts"
      },
      "dimensionsSpec": {
        "dimensions": [
          {
            "name": "status",
            "type": "string"
          },
          {
            "name": "primary_coords",
            "type": "geo"
          }
        ]
      },
      "metricsSpec": [],
      "granularitySpec": {
        "type": "uniform",
        "segmentGranularity": "HOUR",
        "queryGranularity": "MINUTE",
        "rollup": true
      },
      "transformSpec": {
        "filter": null,
        "transforms": [
          {
            "type": "expression",
            "name": "primary_coords",
            "expression": "concat(primary_lat,',', primary_lon)"
          }
        ]
      }
    },
    "ioConfig": {
      "type": "index_parallel",
      "inputSource": {
        "type": "local",
        "baseDir": "/Users/geo_data/",
        "filter": "ingest.100"
      },
      "inputFormat": {
        "type": "csv",
        "listDelimiter": "^A",
        "columns": ["ts", "primary_lat", "primary_lon", "status"]
      },
      "appendToExisting": false
    },
    "tuningConfig": {
      "type": "index_parallel",
      "maxRowsPerSegment": 500000,
      "maxNumConcurrentSubTasks": 5,
      "indexSpec": {
        "geoComplexIndexingSpecs": {}
      }
    }
  }
}
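Because the transform must write to the geo dimension it populates, a quick pre-flight check can catch a name mismatch before you submit the task. The following Python sketch is a hypothetical helper (not part of Druid or Imply). It reads a task spec shaped like the sample spec and lists any geo-typed dimension that has no expression transform of the same name.

```python
from typing import List


def missing_geo_transforms(task: dict) -> List[str]:
    """Return geo-typed dimensions that no expression transform populates."""
    schema = task["spec"]["dataSchema"]
    dimensions = (schema.get("dimensionsSpec") or {}).get("dimensions") or []
    # Dimensions may be plain strings or objects; only objects carry a "geo" type.
    geo_dims = {
        d["name"] for d in dimensions if isinstance(d, dict) and d.get("type") == "geo"
    }
    transforms = (schema.get("transformSpec") or {}).get("transforms") or []
    produced = {t["name"] for t in transforms if t.get("type") == "expression"}
    return sorted(geo_dims - produced)


# A deliberately broken spec: the geo column has no transform.
broken = {
    "spec": {
        "dataSchema": {
            "dimensionsSpec": {"dimensions": [{"name": "primary_coords", "type": "geo"}]},
            "transformSpec": {"transforms": []},
        }
    }
}
print(missing_geo_transforms(broken))  # ['primary_coords']
```

Running it against the full sample spec should return an empty list, because primary_coords has a matching transform.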

SQL-based ingestion

For SQL-based ingestion, use the BUILD_GEO function to create a high-precision geospatial data column from your latitude and longitude data:

BUILD_GEO(LATITUDE_COLUMN, LONGITUDE_COLUMN) as HIGH_PRECISION_COLUMN

The following example creates a high-precision column named geo_column from primary_lat (a column that contains latitude data) and primary_lon (a column that contains longitude data) in a datasource named geo_data:

SELECT
  BUILD_GEO(primary_lat, primary_lon) AS geo_column
FROM geo_data

The following sample ingestion query uses a local CSV file named ingest.100 that contains data like the following snippet, with a timestamp, latitude, and longitude:

2023-12-04 19:34:38.414,39.09535002,-84.51699812

The query ingests the CSV file into a datasource named geo_data:

INSERT INTO "geo_data"
WITH geo_data AS (
  SELECT * FROM TABLE(
    EXTERN(
      '{"type": "local", "baseDir": "/Users/geo_data/", "filter": "ingest.100"}',
      '{"type": "csv", "listDelimiter": "^A", "columns": ["ts", "primary_lat", "primary_lon", "status"]}',
      '[
        {"name": "primary_lat", "type": "string"},
        {"name": "primary_lon", "type": "string"},
        {"name": "status", "type": "string"}
      ]'
    )
  )
)
SELECT
  BUILD_GEO(primary_lat, primary_lon) AS geo_column,
  status
FROM geo_data
PARTITIONED BY ALL
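To run an INSERT query like this outside the web console, you can POST it to Druid's SQL-based ingestion endpoint, /druid/v2/sql/task. The following Python sketch assumes your deployment exposes that standard Druid API through a Router at http://localhost:8888; the URL and the placeholder query string are assumptions to replace with your own. The sketch only builds the request, so it runs without a cluster.

```python
import json
import urllib.request


def build_sql_task_request(base_url: str, sql: str) -> urllib.request.Request:
    """Build, but do not send, a POST to the SQL-based ingestion task endpoint."""
    body = json.dumps({"query": sql}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/druid/v2/sql/task",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Placeholder query text; paste the full INSERT statement here.
req = build_sql_task_request("http://localhost:8888", 'INSERT INTO "geo_data" ... PARTITIONED BY ALL')

# To submit for real (requires a running cluster), uncomment:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp))  # the response should identify the ingestion task
```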

High-precision spatial filters

A filter is a JSON object indicating which rows of data should be included in the computation for a query. You can filter on spatial structures, such as rectangles and polygons, using the spatial filter. The high-precision versions of these spatial filters replace the lower-precision geospatial filters that come out of the box.

Bounds let you filter on ranges of dimension values. A high-precision spatial filter and its bound have the following structure:

"filter": {
  "type": "and",
  "fields": [
    {
      "type": "spatial",
      "dimension": NAME_OF_HI_PRECISION_GEO_DIM,
      "bound": {
        "type": BOUND_TYPE,
        BOUND_FIELDS: ...,
        ...
      }
    },
    ...
  ]
}
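If you generate native queries programmatically, a small builder keeps the nesting of this structure consistent. This Python sketch is a hypothetical helper; it returns the object that goes under "filter" in a native query, and the rectangle coordinates in the usage example are placeholders.

```python
from typing import Any, Dict


def spatial_filter(dimension: str, bound: Dict[str, Any]) -> Dict[str, Any]:
    """Wrap a single high-precision bound in the and/spatial filter structure."""
    if "type" not in bound:
        raise ValueError("bound needs a 'type', such as 'rectangularHP' or 'radiusHP'")
    return {
        "type": "and",
        "fields": [{"type": "spatial", "dimension": dimension, "bound": bound}],
    }


# Usage with placeholder coordinates:
flt = spatial_filter(
    "primary_coords",
    {"type": "rectangularHP", "minCoords": [39.0, -84.6], "maxCoords": [39.2, -84.4]},
)
```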

You can define rectangular, radius, or polygon filter bounds.

The following example shows a high-precision geospatial filter with a radiusHP bound on the geo-typed column primary_coords. It matches rows whose coordinates lie within 10 meters of the origin point:

"filter": {
  "type": "and",
  "fields": [
    {
      "type": "spatial",
      "dimension": "primary_coords",
      "bound": {
        "type": "radiusHP",
        "coords": [39.09497261, -84.51623447],
        "radius": 10,
        "radiusUnit": "meters"
      },
      "filterTuning": {
        "useBitmapIndex": true
      }
    }
  ]
}
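Because this filter works only in native queries, it has to sit inside a native query object that is POSTed to Druid's /druid/v2/ endpoint. The following Python sketch wraps the radius filter in a scan query against the geoTestHP datasource from the ingestion example. The Router URL, the all-time interval, the selected columns, and the row limit are illustrative assumptions, and the sketch only builds the request.

```python
import json
import urllib.request
from typing import Any, Dict, List

# The radiusHP filter from the example, as a Python dict.
RADIUS_FILTER: Dict[str, Any] = {
    "type": "and",
    "fields": [
        {
            "type": "spatial",
            "dimension": "primary_coords",
            "bound": {
                "type": "radiusHP",
                "coords": [39.09497261, -84.51623447],
                "radius": 10,
                "radiusUnit": "meters",
            },
            "filterTuning": {"useBitmapIndex": True},
        }
    ],
}


def build_scan_request(
    base_url: str, datasource: str, filter_obj: Dict[str, Any], columns: List[str], limit: int = 100
) -> urllib.request.Request:
    """Build, but do not send, a native scan query that applies filter_obj."""
    query = {
        "queryType": "scan",
        "dataSource": datasource,
        "intervals": ["1000-01-01/3000-01-01"],  # effectively all time
        "filter": filter_obj,
        "columns": columns,
        "limit": limit,
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/druid/v2/",
        data=json.dumps(query).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_scan_request("http://localhost:8888", "geoTestHP", RADIUS_FILTER, ["__time", "status"])
```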

Rectangular

The rectangularHP bound has the following elements:

| Property | Description | Required |
|---|---|---|
| minCoords | The list of minimum dimension coordinates in the form [x, y] | yes |
| maxCoords | The list of maximum dimension coordinates in the form [x, y] | yes |
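Conceptually, a rectangular bound matches a point when each coordinate falls between the corresponding minCoords and maxCoords values. The following Python sketch illustrates that idea under two assumptions not stated in this document: edges are inclusive, and coordinates follow the same order as the ingested "lat,lon" value. It also doesn't handle rectangles that cross the antimeridian.

```python
from typing import Sequence


def in_rectangle(
    point: Sequence[float], min_coords: Sequence[float], max_coords: Sequence[float]
) -> bool:
    """True if every coordinate of point lies within [min, max], edges included."""
    if not len(point) == len(min_coords) == len(max_coords):
        raise ValueError("point, minCoords, and maxCoords must have the same length")
    if any(lo > hi for lo, hi in zip(min_coords, max_coords)):
        raise ValueError("each minCoords value must be <= the matching maxCoords value")
    return all(lo <= p <= hi for p, lo, hi in zip(point, min_coords, max_coords))


# The sample row from the ingestion examples, checked against placeholder bounds:
print(in_rectangle([39.09535002, -84.51699812], [39.0, -84.6], [39.2, -84.4]))  # True
```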

Radius

The radiusHP bound has the following elements:

| Property | Description | Required |
|---|---|---|
| coords | Origin coordinates in the form [x, y] | yes |
| radius | The radius, as a float | yes |
| radiusUnit | The unit to use when determining the radius. One of meters, kilometers, miles, or euclidean (default). | no |
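To make the radiusUnit options concrete, here's a minimal Python sketch of how a radius check might behave. It assumes, rather than confirms, that the distance-based units use a great-circle (haversine) distance on a spherical Earth with coordinates ordered [latitude, longitude] as ingested earlier, and that euclidean compares straight-line distance in raw coordinate units. The real implementation may use a different Earth model, so treat results very close to the boundary as approximate.

```python
import math
from typing import Sequence

EARTH_RADIUS_M = 6_371_008.8  # mean Earth radius in meters (spherical-model assumption)
UNIT_TO_METERS = {"meters": 1.0, "kilometers": 1000.0, "miles": 1609.344}


def haversine_m(a: Sequence[float], b: Sequence[float]) -> float:
    """Great-circle distance in meters between two [lat, lon] points in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (
        math.sin((lat2 - lat1) / 2) ** 2
        + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2
    )
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))


def within_radius(
    point: Sequence[float], origin: Sequence[float], radius: float, unit: str = "euclidean"
) -> bool:
    """Approximate a radiusHP check for one point under the assumptions above."""
    if unit == "euclidean":
        return math.dist(point, origin) <= radius  # raw coordinate units
    if unit not in UNIT_TO_METERS:
        raise ValueError(f"unknown radiusUnit: {unit}")
    return haversine_m(point, origin) <= radius * UNIT_TO_METERS[unit]


sample = [39.09535002, -84.51699812]  # sample CSV row
origin = [39.09497261, -84.51623447]  # origin from the radiusHP example
print(round(haversine_m(sample, origin)))  # roughly 78 meters apart
```

Under this model, the sample row is about 78 meters from the example origin, so the 10-meter filter wouldn't match it, while a 100-meter radius would.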

Polygon

The polygonHP bound has the following elements:

| Property | Description | Required |
|---|---|---|
| abscissa | Horizontal coordinates for the corners of the polygon | yes |
| ordinate | Vertical coordinates for the corners of the polygon | yes |
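The abscissa and ordinate lists describe the same corners in parallel: the i-th corner is (abscissa[i], ordinate[i]). The following Python sketch shows a standard ray-casting point-in-polygon test over those two lists. It illustrates the general technique, not Imply's implementation. This document doesn't specify how abscissa and ordinate map onto the ingested latitude and longitude, or how points exactly on an edge are treated, so verify both against your own data.

```python
from typing import Sequence


def in_polygon(x: float, y: float, abscissa: Sequence[float], ordinate: Sequence[float]) -> bool:
    """Ray casting: count edge crossings of a ray from (x, y) toward +x."""
    if len(abscissa) != len(ordinate) or len(abscissa) < 3:
        raise ValueError("abscissa and ordinate need the same length, at least 3")
    inside = False
    j = len(abscissa) - 1  # previous corner; closes the polygon back to the start
    for i in range(len(abscissa)):
        xi, yi = abscissa[i], ordinate[i]
        xj, yj = abscissa[j], ordinate[j]
        # Only edges that straddle the ray's y value can cross it (so yi != yj here).
        if (yi > y) != (yj > y):
            x_cross = xi + (y - yi) * (xj - xi) / (yj - yi)
            if x < x_cross:
                inside = not inside
        j = i
    return inside


# A 10 x 10 square with placeholder corners:
print(in_polygon(5, 5, [0, 10, 10, 0], [0, 0, 10, 10]))  # True
```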