High-precision geospatial filters
High-precision geospatial filters for native queries are available as a preview feature.
Apache Druid supports two query languages: Druid SQL and native queries. This document describes a feature that is only available in the native language.
High-precision geospatial filters use a geo dimension to provide the same filters and bound types as standard spatial dimensions and filters, but at a higher level of precision. This gives you more options for working with your geospatial data.
Prerequisites
To ingest high-precision geospatial data and query it with high-precision geospatial filters, make sure the imply-utility-belt extension is loaded. Geospatial data ingested before the high-precision geospatial feature became available (2024.04 STS) is not compatible with high-precision filters.
If you load the imply-utility-belt extension, the high-precision geospatial filters replace Druid's less precise out-of-the-box geospatial filters.
Migrate to high-precision geospatial filters
Since high-precision geospatial filters are not compatible with the standard spatial dimensions, Imply recommends that you take the following steps when switching to high-precision:
- Ingest your geospatial data into a new column next to your existing geospatial data column. You can use JSON-based or SQL-based ingestion for this.
- Test your queries. Update them to use high-precision geospatial data and filters. Compare the results to your queries that used the lower-precision geospatial filters.
- If the results are as expected, switch to the updated queries that use your high-precision geo dimension with high-precision filters.
- Update your ingestions to remove the old lower-precision column so that newly ingested data no longer includes it.
- Optionally, delete the previous geospatial column that stored your lower-precision geospatial data in your existing data.
Ingest high-precision geospatial data
You can ingest high-precision geospatial data through classic JSON-based batch ingestion or SQL-based ingestion.
JSON-based ingestion
When creating your ingestion spec, you'll need to include the following:
- The geo column type in dimensionsSpec.dimensions for the column that stores your high-precision geospatial data:
"dimensionsSpec": {
  "dimensions": [
    ...
    {
      "name": "high_precision_coords",
      "type": "geo"
    }
    ...
  ]
}
- A transform that concatenates your latitude and longitude columns and assigns the result to the geo column you defined in your dimensions:
"transformSpec": {
  "filter": null,
  "transforms": [
    {
      "type": "expression",
      "name": "high_precision_coords",
      "expression": "concat(my_lat,',', my_lon)"
    }
  ]
}
The following sample ingestion spec ingests data from a local CSV file named ingest.100 that contains data resembling this snippet, with a timestamp, latitude, and longitude:
2023-12-04 19:34:38.414,39.09535002,-84.51699812
Sample ingestion spec:
{
  "type": "index_parallel",
  "spec": {
    "dataSchema": {
      "dataSource": "geoTestHP",
      "timestampSpec": {
        "format": "auto",
        "column": "ts"
      },
      "dimensionsSpec": {
        "dimensions": [
          {
            "name": "status",
            "type": "string"
          },
          {
            "name": "primary_coords",
            "type": "geo"
          }
        ]
      },
      "metricsSpec": [],
      "granularitySpec": {
        "type": "uniform",
        "segmentGranularity": "HOUR",
        "queryGranularity": "MINUTE",
        "rollup": true
      },
      "transformSpec": {
        "filter": null,
        "transforms": [
          {
            "type": "expression",
            "name": "primary_coords",
            "expression": "concat(primary_lat,',', primary_lon)"
          }
        ]
      }
    },
    "ioConfig": {
      "type": "index_parallel",
      "inputSource": {
        "type": "local",
        "baseDir": "/Users/geo_data/",
        "filter": "ingest.100"
      },
      "inputFormat": {
        "type": "csv",
        "listDelimiter": "^A",
        "columns": ["ts", "primary_lat", "primary_lon", "status"]
      },
      "appendToExisting": false
    },
    "tuningConfig": {
      "type": "index_parallel",
      "maxRowsPerSegment": 500000,
      "maxNumConcurrentSubTasks": 5,
      "indexSpec": {
        "geoComplexIndexingSpecs": {}
      }
    }
  }
}
SQL-based ingestion
For SQL-based ingestion, use the BUILD_GEO function to create a high-precision geospatial data column from your latitude and longitude data:
BUILD_GEO(LATITUDE_COLUMN, LONGITUDE_COLUMN) as HIGH_PRECISION_COLUMN
The following example creates a high-precision column named geo_column based on primary_lat (a column that contains latitude data) and primary_lon (a column that contains longitude data) from a datasource named geo_data:
SELECT
  BUILD_GEO(primary_lat, primary_lon) AS geo_column
FROM geo_data
The following sample ingestion query uses a local CSV file named ingest.100 containing data that looks like the following snippet, with a timestamp, latitude, and longitude:
2023-12-04 19:34:38.414,39.09535002,-84.51699812
The query ingests the CSV file into a datasource named geo_data:
INSERT INTO "geo_data"
WITH geo_data AS (
  SELECT * FROM TABLE(
    EXTERN(
      '{"type":"local", "baseDir": "/Users/geo_data/","filter": "ingest.100"}',
      '{"type":"csv", "listDelimiter": "^A", "columns": ["ts", "primary_lat", "primary_lon", "status"]}',
      '[
        {"name":"primary_lat","type":"string"},
        {"name":"primary_lon","type":"string"},
        {"name":"status","type":"string"}
      ]'
    )
  )
)
SELECT
  BUILD_GEO(primary_lat, primary_lon) AS geo_column,
  status AS status
FROM geo_data
PARTITIONED BY ALL
High-precision spatial filters
A filter is a JSON object that indicates which rows of data to include in the computation for a query. You can filter on spatial structures, such as rectangles and polygons, using the spatial filter. The high-precision versions of these spatial filters replace the lower-precision geospatial filters that come out of the box.
Bounds let you filter on ranges of dimension values.
High-precision spatial filters have the following structure:
"filter": {
  "type": "and",
  "fields": [
    {
      "type": "spatial",
      "dimension": NAME_OF_HI_PRECISION_GEO_DIM,
      "bound": {
        "type": BOUND_TYPE,
        BOUND_FIELDS: ...
        ...
        ...
      }
    }
  ]
  ...
}
You can define rectangular, radius, or polygon filter bounds.
The following example shows a high-precision geospatial filter that applies a radiusHP bound with a radius of 10 meters to the geo-typed column primary_coords:
"filter": {
  "type": "and",
  "fields": [
    {
      "type": "spatial",
      "dimension": "primary_coords",
      "bound": {
        "type": "radiusHP",
        "coords": [
          39.09497261, -84.51623447
        ],
        "radius": 10,
        "radiusUnit": "meters"
      },
      "filterTuning": {
        "useBitmapIndex": true
      }
    }
  ]
}
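To show where the filter object sits in a complete request, here is a minimal sketch of a native scan query that wraps the radius filter from the example above. It assumes the geoTestHP datasource and the primary_coords and status columns from the sample ingestion spec. The interval, the column list, and the limit are illustrative values chosen for this sketch, not part of the original example.

```json
{
  "queryType": "scan",
  "dataSource": "geoTestHP",
  "intervals": ["2023-12-04/2023-12-05"],
  "columns": ["__time", "status"],
  "limit": 10,
  "filter": {
    "type": "spatial",
    "dimension": "primary_coords",
    "bound": {
      "type": "radiusHP",
      "coords": [39.09497261, -84.51623447],
      "radius": 10,
      "radiusUnit": "meters"
    }
  }
}
```

A query like this would typically be posted to Druid's native query endpoint. Adjust the interval so that it covers the time range of your own data.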
Rectangular
The rectangularHP bound has the following elements:
Property | Description | Required |
---|---|---|
minCoords | The list of minimum dimension coordinates in the form [x, y] | yes |
maxCoords | The list of maximum dimension coordinates in the form [x, y] | yes |
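As an illustration, a filter with a rectangularHP bound might look like the following sketch. The dimension name comes from the sample ingestion spec. The corner values are hypothetical and simply enclose the sample data point. The sketch assumes the coordinates follow the same latitude-then-longitude order as the coords array in the radius example.

```json
{
  "type": "spatial",
  "dimension": "primary_coords",
  "bound": {
    "type": "rectangularHP",
    "minCoords": [39.09, -84.52],
    "maxCoords": [39.10, -84.51]
  }
}
```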
Radius
The radiusHP bound has the following elements:
Property | Description | Required |
---|---|---|
coords | Origin coordinates in the form [x, y] | yes |
radius | The float radius value | yes |
radiusUnit | The units to use when determining the radius. Can be any one of the following: meters, kilometers, miles, or euclidean (default). | no |
Polygon
The polygonHP bound has the following elements:
Property | Description | Required |
---|---|---|
abscissa | Horizontal coordinates for the corners of the polygon | yes |
ordinate | Vertical coordinates for the corners of the polygon | yes |
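As an illustration, a filter with a polygonHP bound might look like the following sketch of a triangle around the sample data point. The corner values are hypothetical. The sketch assumes that abscissa and ordinate are parallel arrays paired by index, one entry per corner, and that abscissa holds the first stored coordinate (latitude in the sample data) while ordinate holds the second (longitude). Verify that mapping against your own data before relying on it.

```json
{
  "type": "spatial",
  "dimension": "primary_coords",
  "bound": {
    "type": "polygonHP",
    "abscissa": [39.09, 39.10, 39.09],
    "ordinate": [-84.52, -84.52, -84.51]
  }
}
```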