Tutorial: Querying data
After you have ingested data into Imply, you can query the data using Druid SQL.
The Quickstart takes you through the steps to get started with Imply, add sample data, and run a simple query against it. This tutorial builds upon the quickstart introduction by showing you how to build more complex queries in the Druid console and run them in Pivot.
Before starting, make sure you have access to a running Imply instance with data loaded. This tutorial uses the Wikipedia data set introduced in the Quickstart.
Query SQL from the Druid console
The Druid console serves as the data management interface for an Imply cluster. It's where you can load and manage data. While its primary purpose is data management, it also gives data administrators an interface for building data queries, as you'll explore in the following steps.
You'll build the query in two steps, starting simple and then extending the query.
Build a simple query
To build a simple query in Druid console, follow these steps:
Open the Druid console in your web browser. From Pivot, you can open the console from the Data tab by clicking the Load Data button. For on-prem, navigate to http://localhost:8888.
Click the Query button in the top menu to open the Query page:
In the Query page, you can you enter Druid SQL queries directly in the query editor. Instead, however, you'll build a query using query builder tools.
Expand the wikipedia node in the data sources tree.
Scroll down until you see the
page
node and click it to show the query menu.Click Show:page:
A SELECT query appears in the query editor and runs automatically. In this case, the query should return no data, since the query filters for data from the last day, but our data is considerably older than that. Let's remove the filter.
In the datasource tree, click
__time
and Remove Filter.Click Run to run the query.
You should now see two columns of data, a page name and the count:
The console limits the results to about a hundred by default because of the Smart query limit feature. This limit helps you avoid inadvertently overwhelming the system by preventing queries from returning an excessive amounts of data.
Extend the query
Now you'll edit the query directly, and explore a few more query building features of the editor:
Click in the query edit pane and add a line after the first column,
"page"
, and start typing the name of a new column,"countryName"
.Notice that the autocomplete menu suggests column names, functions, keywords, and more.
Choose "countryName" and add the new column to the GROUP BY clause as well, either by name or by reference to its position,
2
.For readability, replace the
Count
column name withEdits
, since theCOUNT()
function actually returns the number of edits for the page. Make the same column name change in the ORDER BY clause as well.The `COUNT()` function is one of many functions available for use in Druid SQL queries. You can mouse over a function name
in the autocomplete menu to see a brief description of a function. Also, you can find more information in the Druid
documentation. For example, the `COUNT()` function is documented in
[Aggregation functions](/latest/druid/querying/sql-aggregations).
The query should now be:SELECT
"page",
"countryName",
COUNT(*) AS "Edits"
FROM "wikipedia"
GROUP BY 1, 2
ORDER BY "Edits" DESCWhen you run the query again, notice that you're getting the new dimension,
countryName
, but for most of the rows, its value is null. Show only rows with acountryName
value as follows.Click the countryName dimension and choose the first filtering option,
countryName IS NOT NULL
. The following clause should be added to the query:WHERE "countryName" IS NOT NULL
Run the query again. You should now see the top edits by country:
Under the covers, every Druid SQL query is translated into a query in the JSON-based Druid native query format before it runs on data nodes. You can view the native query for this query by clicking
...
and Explain SQL Query.While you can use Druid SQL for most purposes, familiarity with native query is useful for composing complex queries and for troubleshooting performance issues. For more information, see Native queries.
> Another way to view the explain plan is by adding EXPLAIN PLAN FOR to the front of your query, as follows:
>
>```sql
>EXPLAIN PLAN FOR
>SELECT
> "page",
> "countryName",
> COUNT(*) AS "Edits"
>FROM "wikipedia"
>WHERE "countryName" IS NOT NULL
>GROUP BY 1, 2
>ORDER BY "Edits" DESC
>```
>This is particularly useful when running queries
from the command line or over HTTP.
- Finally, click
...
and Edit context to see how you can add additional parameters controlling the execution of the query execution. In the field, enter query context options as JSON key-value pairs, as described in Context flags.
That's it! you've built a simple query using some of the query builder features built into the Druid Console. The following sections provide a few more example queries you can try. Also, see Other ways to invoke SQL queries to learn how to run Druid SQL from the command line or over HTTP.
Other ways to run SQL queries
The following sections describe a few other ways you can run queries with Imply.
Querying in Pivot
Any query you build in the Druid Console can be run in Pivot as well. The SQL page lets you build and run Druid SQL queries against data sources:
From there, you can download results and check query execution time, and save a named query (as shown above), making it available to other Pivot users.
Querying over HTTP
You can also submit queries directly to the Druid Broker over HTTP.
Given a file quickstart/tutorial/wikipedia-top-pages-sql.json
with the query above, run the query as follows:
curl -X 'POST' -H 'Content-Type:application/json' -d @quickstart/tutorial/wikipedia-top-pages-sql.json http://localhost:8888/druid/v2/sql
The results are:
[
{
"page": "Wikipedia:Vandalismusmeldung",
"Edits": 33
},
{
"page": "User:Cyde/List of candidates for speedy deletion/Subpage",
"Edits": 28
},
{
"page": "Jeremy Corbyn",
"Edits": 27
},
{
"page": "Wikipedia:Administrators' noticeboard/Incidents",
"Edits": 21
},
{
"page": "Flavia Pennetta",
"Edits": 20
},
{
"page": "Total Drama Presents: The Ridonculous Race",
"Edits": 18
},
{
"page": "User talk:Dudeperson176123",
"Edits": 18
},
{
"page": "Wikipédia:Le Bistro/12 septembre 2015",
"Edits": 18
},
{
"page": "Wikipedia:In the news/Candidates",
"Edits": 17
},
{
"page": "Wikipedia:Requests for page protection",
"Edits": 17
}
]
More information
See the Druid SQL documentation for more information on using Druid SQL queries.