Skip to main content

Enhanced IP support

IPv6 support is a beta feature. For more information, see Preview features.

Enhanced IP support in Imply's distribution of Apache Druid® includes support for Internet Protocol version 6 (IPv6) in addition to Internet Protocol version 4 (IPv4). You can use the enhanced IP support feature to store a dimension as a complex data type representing an IP address or an IP prefix. Druid queries that use the functions in this topic require IP addresses and prefixes to be stored as a complex type.

Setup and ingestion

Support for IPv6 is disabled by default. To enable this feature, load the imply-utility-belt extension in the common.runtime.properties, for example:

druid.extensions.loadList=["imply-utility-belt"]

To configure a dimension as a complex IP data type, include a dimension object in the dimensionsSpec property of your ingestion spec. Specify one of the following complex IP data types:

  • ipAddress: Stores an IP address in a 128-bit binary format.
  • ipPrefix: Stores an IP prefix in a 136-bit binary format, comprised of a 128-bit address and an 8-bit prefix value.

For example, the following snippet of the ingestion spec instructs Druid to ingest the following:

  • page and language as string-typed dimensions
  • remote_address as a complex IP address dimension
  • network_prefix as a complex IP prefix dimension
"dimensionsSpec" : {
"dimensions": [
"page",
"language",
{"type": "ipAddress", "name": "remote_address"},
{"type": "ipPrefix", "name": "network_prefix"}
]
}

See Ingestion spec reference for more information on configuring dimensions during ingestion.

Querying on IP addresses

Apply these functions in Druid SQL or native queries.

Convert between string and complex types

The following functions translate between strings and complex IP data types in Druid.

FunctionNotes
IP_PARSE(expr)Parses the string expr into an IP address complex type. Druid doesn't retain any prefix length specified in expr. Raises ParseException if the input string is not a valid IP address.
IP_TRY_PARSE(expr)Parses the string expr into an IP address complex type. Druid doesn't retain any prefix length specified in expr. Returns null if the input string is not a valid IP address.
IP_PREFIX_PARSE(expr)Parses the string expr into an IP prefix complex type. Raises ParseException if the input string is not a valid IP prefix. If you don't specify a prefix length, Druid assigns the maximum prefix length—32 for IPv4 addresses and 128 for IPv6 addresses.
IP_PREFIX_TRY_PARSE(expr)Parses the string expr into an IP prefix complex type. Returns null if the input string is not a valid IP prefix. If you don't specify a prefix length, Druid assigns the maximum prefix length—32 for IPv4 addresses and 128 for IPv6 addresses.
IP_STRINGIFY(expr, [compact])Converts a complex IP data type expr to a string. expr may be an IP address complex type or an IP prefix complex type. The optional parameter compact defaults to true, providing a compact form of the IP address with zero compression applied. Specify false for compact to obtain the full, expanded IP address. IPv4 addresses are not affected by compact.

Transform IP addresses and prefixes

The following functions transform between complex IP addresses and complex IP prefixes.

FunctionNotes
IP_PREFIX(expr, prefixLength)Given an IP address expr, creates a new IP prefix complex type. The address portion of the complex IP prefix consists of prefixLength number of bits of the original address. The prefix value portion of the complex IP prefix contains the value prefixLength.
IP_HOST(expr)Truncates an IP prefix complex type expr. Druid removes the prefix and returns the corresponding host IP address in the form of an IP address complex data type.

Analyze IP addresses

The following functions calculate whether a specified IP address or IP prefix is within a given IP address range.

FunctionNotes
IP_MATCH(expr1, expr2)Checks whether the IP address expr1 is equal to or located within the IP address or range expr2.

expr1 may be a string or complex IP address type. expr2 may be a complex IP prefix, a string IP address, or a string CIDR range. Exactly one of the two parameters must be a complex IP data type.
IP_SEARCH(expr1, expr2)Checks whether the IP address expr1 is equal to or located within the IP range expr2.

expr1 may be a partial IP address string or an IP address complex type. expr2 may be a partial IP address string or IP prefix complex type. Exactly one of the two parameters must be a complex IP data type.

For IP address strings, specify IPv4 addresses in dotted decimal notation without leading zeroes. Specify IPv6 addresses in hexadecimal notation. Leading zeroes are accepted for IPv6 addresses.

Druid expands any partial IP address into all possible IP addresses that starts with the incomplete string. If the partial IP address string contains ., it is only expanded into the IPv4 range. If the partial IP string contains :, it is only expanded into the IPv6 range. If the partial IP address string contains neither . nor :, then it is expanded into all possible IPv4 and IPv6 ranges.

The difference between IP_MATCH and IP_SEARCH lies in the use of partial IP addresses in IP_SEARCH. IP_MATCH finds exact matches to a reference IP address and returns an error when supplied with a partial IP address. IP_SEARCH expands the IP address string argument unless the string ends with one of the following patterns:

  • /
  • /x for integer x
  • ::

To match the exact IP address of 0:0:0:0:0:0:0:1 against possible values expr2, the following three queries are equivalent:

  • IP_MATCH('0:0:0:0:0:0:0:1', expr2)
  • IP_SEARCH('0:0:0:0:0:0:0:1/', expr2)
  • IP_SEARCH('0:0:0:0:0:0:0:1/128', expr2)

In IP_SEARCH, Druid expands the IP address string into all possible IP addresses starting with the provided string. The following examples demonstrate how Druid expands a partial IP address string:

  • The string 0:0:0:0:0:0:0:1 is expanded to signify any address from 0:0:0:0:0:0:0:1000 to 0:0:0:0:0:0:0:1fff—a total of 4096 possible addresses with 16 possible hexadecimal values for each of the last three fields.
  • The string 2001:4d98:bffb::1 is expanded to signify any address from 2001:4d98:bffb::1000 to 2001:4d98:bffb::1fff. Druid represents :: in an IPv6 address with the maximum series of zeroes possible.
  • The string eeee:1122:0 is expanded with 0 fixed as the first value in the third group, resulting in addresses from eeee:1122:0000:0000:0000:0000:0000:0000 to eeee:1122:0fff:ffff:ffff:ffff:ffff:ffff.

Examples

The examples in this section use the following table that contains a mix of IPv4 addresses, IPv6 addresses, and null values:

timeipv4ipv6ipmixprefix
2014102000192.0.2.42001:db8:3:4:5:6:7:8192.0.2.4192.0.2.0/24
2014102000192.0.2.52001:db8:33:44:55:66:77:882001:db8:33:44:55:66:77:88192.0.2.0/24
2014102000198.51.100.32001:db8:3:4:50:60:70:80198.51.100.11192.0.2.0/24
2014102000192.0.2.72001:db8:6:5:4:3:2:12001:db8:6:5:5:6:7:8192.0.2.1/32
2014102000203.0.113.52001:db8:6:5:5:6:7:82001:db8:33:44:55:66:77:88192.0.2.1/32
2014102001192.0.2.42001:db8:3:4:5:6:7:8192.0.2.4192.0.2.2/32
20141020012001:db8:33:44:55:66:77:88192.0.2.2/32
2014102001198.51.100.22001:db8:3:4:50:60:70:80203.0.113.122001:db8:3:4::/64
2014102001192.0.2.42001:db8:33:44::/64
2014102001203.0.113.42001:db8:6:5:5:6:7:8192.0.2.52001:db8:6:5::/64

Ingest data with IP address types

The following ingestion spec loads the example data and specifies the columns ipv4, ipv6, and ipmix as IP address types. prefix is ingested as an IP prefix complex type.

{
"type": "index_parallel",
"spec": {
"ioConfig": {
"type": "index_parallel",
"inputSource": {
"type": "inline",
"data": "2014102000\t192.0.2.4\t2001:db8:3:4:5:6:7:8\t192.0.2.4\t192.0.2.0/24\n2014102000\t192.0.2.5\t2001:db8:33:44:55:66:77:88\t2001:db8:33:44:55:66:77:88\t192.0.2.0/24\n2014102000\t198.51.100.3\t2001:db8:3:4:50:60:70:80\t198.51.100.11\t192.0.2.0/24\n2014102000\t192.0.2.7\t2001:db8:6:5:4:3:2:1\t2001:db8:6:5:5:6:7:8\t192.0.2.1/32\n2014102000\t203.0.113.5\t2001:db8:6:5:5:6:7:8\t2001:db8:33:44:55:66:77:88\t192.0.2.1/32\n2014102001\t192.0.2.4\t2001:db8:3:4:5:6:7:8\t192.0.2.4\t192.0.2.2/32\n2014102001\t\t2001:db8:33:44:55:66:77:88\t\t192.0.2.2/32\n2014102001\t198.51.100.2\t2001:db8:3:4:50:60:70:80\t203.0.113.12\t2001:db8:3:4::/64\n2014102001\t192.0.2.4\t\t\t2001:db8:33:44::/64\n2014102001\t203.0.113.4\t2001:db8:6:5:5:6:7:8\t192.0.2.5\t2001:db8:6:5::/64"
},
"inputFormat": {
"type": "tsv",
"findColumnsFromHeader": false,
"columns": [
"time",
"ipv4",
"ipv6",
"ipmix",
"prefix"
]
}
},
"tuningConfig": {
"type": "index_parallel",
"partitionsSpec": {
"type": "dynamic"
}
},
"dataSchema": {
"dataSource": "ip_addresses",
"timestampSpec": {
"column": "time",
"format": "yyyyMMddHH"
},
"transformSpec": {},
"dimensionsSpec": {
"dimensions": [
{
"type": "ipAddress",
"name": "ipv4"
},
{
"type": "ipAddress",
"name": "ipv6"
},
{
"type": "ipAddress",
"name": "ipmix"
},
{
"type": "ipPrefix",
"name": "prefix"
}
]
},
"granularitySpec": {
"queryGranularity": "none",
"rollup": false,
"segmentGranularity": "hour"
}
}
}
}

Select and view IP address columns

IP address columns store the addresses in binary format. Convert them to string values to display a human-readable format, whether in dotted decimal form for IPv4 or colon-separated hexadecimal notation for IPv6.

Call IP_STRINGIFY with optional argument compact, which controls the formatting of the string representation of IPv6 addresses. compact defaults to true to return compressed IPv6 addresses, applying zero compression and removing leading zeroes. Set compact to false to return fully exapnded IPv6 addresses. IPv4 addresses are not affected by the compact parameter.

The following example displays the raw and string-converted values of ipmix and prefix.

Query:

SELECT ipmix,
prefix,
IP_STRINGIFY(ipmix) AS compactmix,
IP_STRINGIFY(ipmix, false) AS fullmix,
IP_STRINGIFY(prefix) AS compactprefix,
IP_STRINGIFY(prefix, false) AS fullprefix
FROM ip_addresses

Output:

ipmixprefixcompactmixfullmixcompactprefixfullprefix
AAAAAAAAAAAAAP//wAACBA==AAAAAAAAAAAAAP//wAACAHg=192.0.2.4192.0.2.4192.0.2.0/24192.0.2.0/24
IAENuAAzAEQAVQBmAHcAiA==AAAAAAAAAAAAAP//wAACAHg=2001:db8:33:44:55:66:77:882001:0db8:0033:0044:0055:0066:0077:0088192.0.2.0/24192.0.2.0/24
IAENuAAGAAUABQAGAAcACA==AAAAAAAAAAAAAP//wAACAYA=2001:db8:6:5:5:6:7:82001:0db8:0006:0005:0005:0006:0007:0008192.0.2.1/32192.0.2.1/32
AAAAAAAAAAAAAP//xjNkCw==AAAAAAAAAAAAAP//wAACAHg=198.51.100.11198.51.100.11192.0.2.0/24192.0.2.0/24
IAENuAAzAEQAVQBmAHcAiA==AAAAAAAAAAAAAP//wAACAYA=2001:db8:33:44:55:66:77:882001:0db8:0033:0044:0055:0066:0077:0088192.0.2.1/32192.0.2.1/32
nullAAAAAAAAAAAAAP//wAACAoA=nullnull192.0.2.2/32192.0.2.2/32
nullIAENuAAzAEQAAAAAAAAAAEA=nullnull2001:db8:33:44::/642001:0db8:0033:0044:0000:0000:0000:0000/64
AAAAAAAAAAAAAP//wAACBA==AAAAAAAAAAAAAP//wAACAoA=192.0.2.4192.0.2.4192.0.2.2/32192.0.2.2/32
AAAAAAAAAAAAAP//ywBxDA==IAENuAADAAQAAAAAAAAAAEA=203.0.113.12203.0.113.122001:db8:3:4::/642001:0db8:0003:0004:0000:0000:0000:0000/64
AAAAAAAAAAAAAP//wAACBQ==IAENuAAGAAUAAAAAAAAAAEA=192.0.2.5192.0.2.52001:db8:6:5::/642001:0db8:0006:0005:0000:0000:0000:0000/64

Count distinct values of an IP address column

Translate IP addresses into their string forms to perform operations such as grouping or distinct counting.

The following example calculates the unique values in ipv6.

Query:

SELECT COUNT(DISTINCT IP_STRINGIFY(ipv6)) AS approx_count
FROM ip_addresses

Output:

approx_count
6

Transform IP address and prefix types

Transform between complex IP address types and complex IP prefix types using IP_PREFIX and IP_HOST.

The following example generates a prefix for each ipv4 address and a host address for each prefix.

Query:

SELECT IP_STRINGIFY(IP_PREFIX(ipv4, 16)) as ipv4prefix,
IP_STRINGIFY(IP_HOST(prefix)) as host
FROM ip_addresses

Output:

ipv4prefixhost
192.0.0.0/16192.0.2.0
192.0.0.0/16192.0.2.0
192.0.0.0/16192.0.2.1
198.51.0.0/16192.0.2.0
203.0.0.0/16192.0.2.1
null192.0.2.2
192.0.0.0/162001:db8:33:44::
192.0.0.0/16192.0.2.2
198.51.0.0/162001:db8:3:4::
203.0.0.0/162001:db8:6:5::

Group IP addresses by prefix

Truncate IP addresses by a specified number of bits in the prefix with IP_PREFIX. Use the truncated addresses for grouping all addresses that fall within a certain range. Convert the results of IP_PREFIX to string type before use in other operations.

The following example generates a range for each ipv6 address and counts the number of ipv6 addresses within each range.

Query:

SELECT IP_STRINGIFY(IP_PREFIX(ipv6, 64))  AS prefix,
COUNT(DISTINCT IP_STRINGIFY(ipv6)) AS address_count
FROM ip_addresses
GROUP BY 1

Output:

prefixaddress_count
null1
2001:db8:33:44::/641
2001:db8:3:4::/642
2001:db8:6:5::/642

Match IP addresses to a subnet

Match an exact IP address to another address or CIDR range with IP_MATCH.

Match on IPv4

The following example lists all ipv4 addresses that are within the range 192.0.2.0/16.

SELECT __time,
IP_STRINGIFY(ipv4) AS stringv4
FROM ip_addresses
WHERE IP_MATCH(ipv4, '192.0.2.0/16')

Output:

__timestringv4
2014-10-20T00:00:00.000Z192.0.2.4
2014-10-20T00:00:00.000Z192.0.2.5
2014-10-20T00:00:00.000Z192.0.2.7
2014-10-20T01:00:00.000Z192.0.2.4
2014-10-20T01:00:00.000Z192.0.2.4

Match on IPv6

The following example lists all ipv6 addresses that are within the range 2001:db8:6::/48.

SELECT __time,
IP_STRINGIFY(ipv6) AS stringv6
FROM ip_addresses
WHERE IP_MATCH(ipv6, '2001:db8:6::/48')

Output:

__timestringv6
2014-10-20T00:00:00.000Z2001:db8:6:5:4:3:2:1
2014-10-20T00:00:00.000Z2001:db8:6:5:5:6:7:8
2014-10-20T01:00:00.000Z2001:db8:6:5:5:6:7:8

Search against partial IP addresses

Use IP_SEARCH to determine whether an exact or partial IP address falls within an IP address prefix.

The following example lists the prefix values that contain or are equal to a range of IP addresses starting with 2001:db8. In this example, 2001:db8 is a partial IP address string that is expanded into all possible IPv6 ranges that begin with 2001:db8.

Query:

SELECT __time,
IP_STRINGIFY(prefix) AS prefix
FROM ip_addresses
WHERE IP_SEARCH('2001:db8', prefix)

Output:

__timeprefix
2014-10-20T01:00:00.000Z2001:db8:33:44::/64
2014-10-20T01:00:00.000Z2001:db8:3:4::/64
2014-10-20T01:00:00.000Z2001:db8:6:5::/64

Known issues

  • Druid queries may throw a null pointer exception when there's unparseable data in the IP address complex type or the IP prefix complex type.
  • The Druid web console does not display tooltips with documentation for the IP address functions.
  • Druid displays the IP address complex type as Base64-encoded strings. Translate these strings to a human-readable format using the IP_STRINGIFY function described above.
  • Most Druid operations—including grouping, distinct count, and JOIN operations—require string-typed IP addresses. Call IP_STRINGIFY to convert an IP address data type to a string type.
  • The IPv4 address functions described in SQL scalar functions and Expressions do not support IP address columns.
  • Enhanced IP support has not been optimized for performance.