Enhanced IP support
IPv6 support is a beta feature. For more information, see Preview features.
Enhanced IP support in Imply's distribution of Apache Druid® includes support for Internet Protocol version 6 (IPv6) in addition to Internet Protocol version 4 (IPv4). You can use the enhanced IP support feature to store a dimension as a complex data type representing an IP address or an IP prefix. Druid queries that use the functions in this topic require IP addresses and prefixes to be stored as a complex type.
Setup and ingestion
Support for IPv6 is disabled by default.
To enable this feature, load the imply-utility-belt
extension in the common.runtime.properties
, for example:
druid.extensions.loadList=["imply-utility-belt"]
To configure a dimension as a complex IP data type, include a dimension object in the dimensionsSpec
property of your ingestion spec.
Specify one of the following complex IP data types:
ipAddress
: Stores an IP address in a 128-bit binary format.ipPrefix
: Stores an IP prefix in a 136-bit binary format, comprised of a 128-bit address and an 8-bit prefix value.
For example, the following snippet of the ingestion spec instructs Druid to ingest the following:
page
andlanguage
as string-typed dimensionsremote_address
as a complex IP address dimensionnetwork_prefix
as a complex IP prefix dimension
"dimensionsSpec" : {
"dimensions": [
"page",
"language",
{"type": "ipAddress", "name": "remote_address"},
{"type": "ipPrefix", "name": "network_prefix"}
]
}
See Ingestion spec reference for more information on configuring dimensions during ingestion.
Querying on IP addresses
Apply these functions in Druid SQL or native queries.
Convert between string and complex types
The following functions translate between strings and complex IP data types in Druid.
Function | Notes |
---|---|
IP_PARSE(expr) | Parses the string expr into an IP address complex type. Druid doesn't retain any prefix length specified in expr . Raises ParseException if the input string is not a valid IP address. |
IP_TRY_PARSE(expr) | Parses the string expr into an IP address complex type. Druid doesn't retain any prefix length specified in expr . Returns null if the input string is not a valid IP address. |
IP_PREFIX_PARSE(expr) | Parses the string expr into an IP prefix complex type. Raises ParseException if the input string is not a valid IP prefix. If you don't specify a prefix length, Druid assigns the maximum prefix length—32 for IPv4 addresses and 128 for IPv6 addresses. |
IP_PREFIX_TRY_PARSE(expr) | Parses the string expr into an IP prefix complex type. Returns null if the input string is not a valid IP prefix. If you don't specify a prefix length, Druid assigns the maximum prefix length—32 for IPv4 addresses and 128 for IPv6 addresses. |
IP_STRINGIFY(expr, [compact]) | Converts a complex IP data type expr to a string. expr may be an IP address complex type or an IP prefix complex type. The optional parameter compact defaults to true , providing a compact form of the IP address with zero compression applied. Specify false for compact to obtain the full, expanded IP address. IPv4 addresses are not affected by compact . |
Transform IP addresses and prefixes
The following functions transform between complex IP addresses and complex IP prefixes.
Function | Notes |
---|---|
IP_PREFIX(expr, prefixLength) | Given an IP address expr , creates a new IP prefix complex type. The address portion of the complex IP prefix consists of prefixLength number of bits of the original address. The prefix value portion of the complex IP prefix contains the value prefixLength . |
IP_HOST(expr) | Truncates an IP prefix complex type expr . Druid removes the prefix and returns the corresponding host IP address in the form of an IP address complex data type. |
Analyze IP addresses
The following functions calculate whether a specified IP address or IP prefix is within a given IP address range.
Function | Notes |
---|---|
IP_MATCH(expr1, expr2) | Checks whether the IP address expr1 is equal to or located within the IP address or range expr2 .expr1 may be a string or complex IP address type. expr2 may be a complex IP prefix, a string IP address, or a string CIDR range. Exactly one of the two parameters must be a complex IP data type. |
IP_SEARCH(expr1, expr2) | Checks whether the IP address expr1 is equal to or located within the IP range expr2 .expr1 may be a partial IP address string or an IP address complex type. expr2 may be a partial IP address string or IP prefix complex type. Exactly one of the two parameters must be a complex IP data type.For IP address strings, specify IPv4 addresses in dotted decimal notation without leading zeroes. Specify IPv6 addresses in hexadecimal notation. Leading zeroes are accepted for IPv6 addresses. Druid expands any partial IP address into all possible IP addresses that starts with the incomplete string. If the partial IP address string contains . , it is only expanded into the IPv4 range. If the partial IP string contains : , it is only expanded into the IPv6 range. If the partial IP address string contains neither . nor : , then it is expanded into all possible IPv4 and IPv6 ranges. |
The difference between IP_MATCH
and IP_SEARCH
lies in the use of partial IP addresses in IP_SEARCH
.
IP_MATCH
finds exact matches to a reference IP address and returns an error when supplied with a partial IP address.
IP_SEARCH
expands the IP address string argument unless the string ends with one of the following patterns:
/
/x
for integer x::
To match the exact IP address of 0:0:0:0:0:0:0:1
against possible values expr2
, the following three queries are equivalent:
IP_MATCH('0:0:0:0:0:0:0:1', expr2)
IP_SEARCH('0:0:0:0:0:0:0:1/', expr2)
IP_SEARCH('0:0:0:0:0:0:0:1/128', expr2)
In IP_SEARCH
, Druid expands the IP address string into all possible IP addresses starting with the provided string.
The following examples demonstrate how Druid expands a partial IP address string:
- The string
0:0:0:0:0:0:0:1
is expanded to signify any address from0:0:0:0:0:0:0:1000
to0:0:0:0:0:0:0:1fff
—a total of 4096 possible addresses with 16 possible hexadecimal values for each of the last three fields. - The string
2001:4d98:bffb::1
is expanded to signify any address from2001:4d98:bffb::1000
to2001:4d98:bffb::1fff
. Druid represents::
in an IPv6 address with the maximum series of zeroes possible. - The string
eeee:1122:0
is expanded with0
fixed as the first value in the third group, resulting in addresses fromeeee:1122:0000:0000:0000:0000:0000:0000
toeeee:1122:0fff:ffff:ffff:ffff:ffff:ffff
.
Examples
The examples in this section use the following table that contains a mix of IPv4 addresses, IPv6 addresses, and null values:
time | ipv4 | ipv6 | ipmix | prefix |
---|---|---|---|---|
2014102000 | 192.0.2.4 | 2001:db8:3:4:5:6:7:8 | 192.0.2.4 | 192.0.2.0/24 |
2014102000 | 192.0.2.5 | 2001:db8:33:44:55:66:77:88 | 2001:db8:33:44:55:66:77:88 | 192.0.2.0/24 |
2014102000 | 198.51.100.3 | 2001:db8:3:4:50:60:70:80 | 198.51.100.11 | 192.0.2.0/24 |
2014102000 | 192.0.2.7 | 2001:db8:6:5:4:3:2:1 | 2001:db8:6:5:5:6:7:8 | 192.0.2.1/32 |
2014102000 | 203.0.113.5 | 2001:db8:6:5:5:6:7:8 | 2001:db8:33:44:55:66:77:88 | 192.0.2.1/32 |
2014102001 | 192.0.2.4 | 2001:db8:3:4:5:6:7:8 | 192.0.2.4 | 192.0.2.2/32 |
2014102001 | 2001:db8:33:44:55:66:77:88 | 192.0.2.2/32 | ||
2014102001 | 198.51.100.2 | 2001:db8:3:4:50:60:70:80 | 203.0.113.12 | 2001:db8:3:4::/64 |
2014102001 | 192.0.2.4 | 2001:db8:33:44::/64 | ||
2014102001 | 203.0.113.4 | 2001:db8:6:5:5:6:7:8 | 192.0.2.5 | 2001:db8:6:5::/64 |
Ingest data with IP address types
The following ingestion spec loads the example data and specifies the columns ipv4
, ipv6
, and ipmix
as IP address types. prefix
is ingested as an IP prefix complex type.
{
"type": "index_parallel",
"spec": {
"ioConfig": {
"type": "index_parallel",
"inputSource": {
"type": "inline",
"data": "2014102000\t192.0.2.4\t2001:db8:3:4:5:6:7:8\t192.0.2.4\t192.0.2.0/24\n2014102000\t192.0.2.5\t2001:db8:33:44:55:66:77:88\t2001:db8:33:44:55:66:77:88\t192.0.2.0/24\n2014102000\t198.51.100.3\t2001:db8:3:4:50:60:70:80\t198.51.100.11\t192.0.2.0/24\n2014102000\t192.0.2.7\t2001:db8:6:5:4:3:2:1\t2001:db8:6:5:5:6:7:8\t192.0.2.1/32\n2014102000\t203.0.113.5\t2001:db8:6:5:5:6:7:8\t2001:db8:33:44:55:66:77:88\t192.0.2.1/32\n2014102001\t192.0.2.4\t2001:db8:3:4:5:6:7:8\t192.0.2.4\t192.0.2.2/32\n2014102001\t\t2001:db8:33:44:55:66:77:88\t\t192.0.2.2/32\n2014102001\t198.51.100.2\t2001:db8:3:4:50:60:70:80\t203.0.113.12\t2001:db8:3:4::/64\n2014102001\t192.0.2.4\t\t\t2001:db8:33:44::/64\n2014102001\t203.0.113.4\t2001:db8:6:5:5:6:7:8\t192.0.2.5\t2001:db8:6:5::/64"
},
"inputFormat": {
"type": "tsv",
"findColumnsFromHeader": false,
"columns": [
"time",
"ipv4",
"ipv6",
"ipmix",
"prefix"
]
}
},
"tuningConfig": {
"type": "index_parallel",
"partitionsSpec": {
"type": "dynamic"
}
},
"dataSchema": {
"dataSource": "ip_addresses",
"timestampSpec": {
"column": "time",
"format": "yyyyMMddHH"
},
"transformSpec": {},
"dimensionsSpec": {
"dimensions": [
{
"type": "ipAddress",
"name": "ipv4"
},
{
"type": "ipAddress",
"name": "ipv6"
},
{
"type": "ipAddress",
"name": "ipmix"
},
{
"type": "ipPrefix",
"name": "prefix"
}
]
},
"granularitySpec": {
"queryGranularity": "none",
"rollup": false,
"segmentGranularity": "hour"
}
}
}
}
Select and view IP address columns
IP address columns store the addresses in binary format. Convert them to string values to display a human-readable format, whether in dotted decimal form for IPv4 or colon-separated hexadecimal notation for IPv6.
Call IP_STRINGIFY
with optional argument compact
, which controls the formatting of the string representation of IPv6 addresses.
compact
defaults to true to return compressed IPv6 addresses, applying zero compression and removing leading zeroes.
Set compact
to false to return fully exapnded IPv6 addresses.
IPv4 addresses are not affected by the compact
parameter.
The following example displays the raw and string-converted values of ipmix
and prefix
.
Query:
SELECT ipmix,
prefix,
IP_STRINGIFY(ipmix) AS compactmix,
IP_STRINGIFY(ipmix, false) AS fullmix,
IP_STRINGIFY(prefix) AS compactprefix,
IP_STRINGIFY(prefix, false) AS fullprefix
FROM ip_addresses
Output:
ipmix | prefix | compactmix | fullmix | compactprefix | fullprefix |
---|---|---|---|---|---|
AAAAAAAAAAAAAP//wAACBA== | AAAAAAAAAAAAAP//wAACAHg= | 192.0.2.4 | 192.0.2.4 | 192.0.2.0/24 | 192.0.2.0/24 |
IAENuAAzAEQAVQBmAHcAiA== | AAAAAAAAAAAAAP//wAACAHg= | 2001:db8:33:44:55:66:77:88 | 2001:0db8:0033:0044:0055:0066:0077:0088 | 192.0.2.0/24 | 192.0.2.0/24 |
IAENuAAGAAUABQAGAAcACA== | AAAAAAAAAAAAAP//wAACAYA= | 2001:db8:6:5:5:6:7:8 | 2001:0db8:0006:0005:0005:0006:0007:0008 | 192.0.2.1/32 | 192.0.2.1/32 |
AAAAAAAAAAAAAP//xjNkCw== | AAAAAAAAAAAAAP//wAACAHg= | 198.51.100.11 | 198.51.100.11 | 192.0.2.0/24 | 192.0.2.0/24 |
IAENuAAzAEQAVQBmAHcAiA== | AAAAAAAAAAAAAP//wAACAYA= | 2001:db8:33:44:55:66:77:88 | 2001:0db8:0033:0044:0055:0066:0077:0088 | 192.0.2.1/32 | 192.0.2.1/32 |
null | AAAAAAAAAAAAAP//wAACAoA= | null | null | 192.0.2.2/32 | 192.0.2.2/32 |
null | IAENuAAzAEQAAAAAAAAAAEA= | null | null | 2001:db8:33:44::/64 | 2001:0db8:0033:0044:0000:0000:0000:0000/64 |
AAAAAAAAAAAAAP//wAACBA== | AAAAAAAAAAAAAP//wAACAoA= | 192.0.2.4 | 192.0.2.4 | 192.0.2.2/32 | 192.0.2.2/32 |
AAAAAAAAAAAAAP//ywBxDA== | IAENuAADAAQAAAAAAAAAAEA= | 203.0.113.12 | 203.0.113.12 | 2001:db8:3:4::/64 | 2001:0db8:0003:0004:0000:0000:0000:0000/64 |
AAAAAAAAAAAAAP//wAACBQ== | IAENuAAGAAUAAAAAAAAAAEA= | 192.0.2.5 | 192.0.2.5 | 2001:db8:6:5::/64 | 2001:0db8:0006:0005:0000:0000:0000:0000/64 |
Count distinct values of an IP address column
Translate IP addresses into their string forms to perform operations such as grouping or distinct counting.
The following example calculates the unique values in ipv6
.
Query:
SELECT COUNT(DISTINCT IP_STRINGIFY(ipv6)) AS approx_count
FROM ip_addresses
Output:
approx_count |
---|
6 |
Transform IP address and prefix types
Transform between complex IP address types and complex IP prefix types using IP_PREFIX
and IP_HOST
.
The following example generates a prefix for each ipv4
address and a host address for each prefix
.
Query:
SELECT IP_STRINGIFY(IP_PREFIX(ipv4, 16)) as ipv4prefix,
IP_STRINGIFY(IP_HOST(prefix)) as host
FROM ip_addresses
Output:
ipv4prefix | host |
---|---|
192.0.0.0/16 | 192.0.2.0 |
192.0.0.0/16 | 192.0.2.0 |
192.0.0.0/16 | 192.0.2.1 |
198.51.0.0/16 | 192.0.2.0 |
203.0.0.0/16 | 192.0.2.1 |
null | 192.0.2.2 |
192.0.0.0/16 | 2001:db8:33:44:: |
192.0.0.0/16 | 192.0.2.2 |
198.51.0.0/16 | 2001:db8:3:4:: |
203.0.0.0/16 | 2001:db8:6:5:: |
Group IP addresses by prefix
Truncate IP addresses by a specified number of bits in the prefix with IP_PREFIX
.
Use the truncated addresses for grouping all addresses that fall within a certain range.
Convert the results of IP_PREFIX
to string type before use in other operations.
The following example generates a range for each ipv6
address and counts the number of ipv6
addresses within each range.
Query:
SELECT IP_STRINGIFY(IP_PREFIX(ipv6, 64)) AS prefix,
COUNT(DISTINCT IP_STRINGIFY(ipv6)) AS address_count
FROM ip_addresses
GROUP BY 1
Output:
prefix | address_count |
---|---|
null | 1 |
2001:db8:33:44::/64 | 1 |
2001:db8:3:4::/64 | 2 |
2001:db8:6:5::/64 | 2 |
Match IP addresses to a subnet
Match an exact IP address to another address or CIDR range with IP_MATCH
.
Match on IPv4
The following example lists all ipv4
addresses that are within the range 192.0.2.0/16.
SELECT __time,
IP_STRINGIFY(ipv4) AS stringv4
FROM ip_addresses
WHERE IP_MATCH(ipv4, '192.0.2.0/16')
Output:
__time | stringv4 |
---|---|
2014-10-20T00:00:00.000Z | 192.0.2.4 |
2014-10-20T00:00:00.000Z | 192.0.2.5 |
2014-10-20T00:00:00.000Z | 192.0.2.7 |
2014-10-20T01:00:00.000Z | 192.0.2.4 |
2014-10-20T01:00:00.000Z | 192.0.2.4 |
Match on IPv6
The following example lists all ipv6
addresses that are within the range 2001:db8:6::/48.
SELECT __time,
IP_STRINGIFY(ipv6) AS stringv6
FROM ip_addresses
WHERE IP_MATCH(ipv6, '2001:db8:6::/48')
Output:
__time | stringv6 |
---|---|
2014-10-20T00:00:00.000Z | 2001:db8:6:5:4:3:2:1 |
2014-10-20T00:00:00.000Z | 2001:db8:6:5:5:6:7:8 |
2014-10-20T01:00:00.000Z | 2001:db8:6:5:5:6:7:8 |
Search against partial IP addresses
Use IP_SEARCH
to determine whether an exact or partial IP address falls within an IP address prefix.
The following example lists the prefix
values that contain or are equal to a range of IP addresses starting with 2001:db8.
In this example, 2001:db8 is a partial IP address string that is expanded into all possible IPv6 ranges that begin with 2001:db8
.
Query:
SELECT __time,
IP_STRINGIFY(prefix) AS prefix
FROM ip_addresses
WHERE IP_SEARCH('2001:db8', prefix)
Output:
__time | prefix |
---|---|
2014-10-20T01:00:00.000Z | 2001:db8:33:44::/64 |
2014-10-20T01:00:00.000Z | 2001:db8:3:4::/64 |
2014-10-20T01:00:00.000Z | 2001:db8:6:5::/64 |
Known issues
- Druid queries may throw a null pointer exception when there's unparseable data in the IP address complex type or the IP prefix complex type.
- The Druid web console does not display tooltips with documentation for the IP address functions.
- Druid displays the IP address complex type as Base64-encoded strings. Translate these strings to a human-readable format using the
IP_STRINGIFY
function described above. - Most Druid operations—including grouping, distinct count, and JOIN operations—require string-typed IP addresses. Call
IP_STRINGIFY
to convert an IP address data type to a string type. - The IPv4 address functions described in SQL scalar functions and Expressions do not support IP address columns.
- Enhanced IP support has not been optimized for performance.