Quickstart

In this quickstart, we'll spin up an Imply cluster, load some example data, and visualize the data.

Prerequisites

You will need an Imply account for https://cloud.imply.io/. Sign up for a free account if you do not have one.

Please note that the configurations used for this quickstart are tuned to be light on resource usage and are not meant for load testing large production data sets.

Launch a cluster

After you log into Imply Cloud, you'll be presented with the Main Menu:

Main Menu

Select "Manager" from the list of options. You'll be taken to the Clusters view.

Clusters View

In this view, click the "New cluster" button in the top right-hand corner.

New Cluster

Choose a name for your cluster, and use the default values for the version and the instance role.

Let's spin up a basic cluster that uses one data server (used for storing and aggregating data), one query server (used for merging partial results from data servers), and one master server (used for cluster coordination). We will only use t2.small instances.

The cluster we are creating in this quickstart is not highly available. A highly available cluster requires, at a minimum, 2 data servers, 2 query servers, and 3 master servers.

Click "Create cluster" to launch a cluster in your AWS VPC.

Load data file

It may take up to 30 minutes before the cluster is available. Once the cluster is in the "RUNNING" state, select it to begin loading data.

Load Data

We've included a sample of Wikipedia edits from September 12, 2015, to get you started with loading data files via batch ingestion.

Load Batch Wikipedia

Click on "Wikipedia Edits".

Imply's data loader wizard will guide you through the steps to load our sample data. On the first screen, you will specify the cluster to load data onto, the name of the datasource (database table name), the data location, and the data format.

In our example, you can use the default values and enter "Wikipedia" as the datasource name.

Load Batch Wikipedia
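
Behind the scenes, the wizard assembles these choices into a Druid batch ingestion spec. Below is a minimal sketch of the I/O portion, written as a Python dict for readability; the "local" firehose and file path are assumptions based on the standalone Druid quickstart, and Imply Cloud may stage the sample file elsewhere (for example, in S3):

io_config = {
    "type": "index",
    "firehose": {
        "type": "local",                                # assumption: Cloud may use an S3 firehose instead
        "baseDir": "quickstart",                        # directory containing the sample file
        "filter": "wikiticker-2015-09-12-sampled.json"  # hypothetical file name for the sample
    }
}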

The next screen shows you a sample of the data.

Load Batch Wikipedia

Click "Yes, this is the data I wanted" to continue.

The next screen shows you various configuration options for loading your data.

Load Batch Wikipedia

Let's use the defaults and continue by clicking "Next".

Next, we will tell Imply which column in our data is the time column, the format of that column, and how to partition the data.

Load Batch Wikipedia

The defaults are fine, so click "Next" to continue.
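
For reference, the defaults you just accepted correspond to the timestamp and granularity sections of the ingestion spec. Here is a minimal sketch, assuming the sample's ISO 8601 "time" column and daily partitioning (the wizard's exact defaults may differ):

timestamp_spec = {
    "column": "time",   # which column holds the event time
    "format": "iso"     # ISO 8601, e.g. 2015-09-12T00:46:58.771Z
}

granularity_spec = {
    "type": "uniform",
    "segmentGranularity": "day",             # partition the data into one segment per day
    "queryGranularity": "none",              # keep full timestamp precision at query time
    "intervals": ["2015-09-12/2015-09-13"]   # the time range covered by the sample
}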

Next, we need to configure the data schema by setting the type of each column.

Load Batch Wikipedia

We can once again use the defaults and click "Next".
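
In spec terms, string columns become dimensions, and numeric columns are typically rolled up into metrics. A minimal sketch with a subset of the sample's columns (the full list shown in the wizard is longer):

dimensions_spec = {
    "dimensions": ["channel", "cityName", "comment", "namespace", "page", "user"]  # string columns
}

metrics_spec = [
    {"type": "count", "name": "count"},                             # number of rows rolled up per group
    {"type": "longSum", "name": "added", "fieldName": "added"},     # lines added per edit
    {"type": "longSum", "name": "deleted", "fieldName": "deleted"}  # lines deleted per edit
]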

The final summary screen displays all the options we've set so far. Click "Start loading data" to begin loading data.

Load Batch Wikipedia

On the next screen, click on "Ingestion tasks" to view the status of the data load.

Load Batch Wikipedia

Once the "STATUS" field changes to "Success", our ingest task is finished.

After your ingestion task finishes, the data will be available for querying within a minute or two.
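
If you'd rather check the status programmatically than in the UI, the Druid Overlord (running on the master server) exposes a task status endpoint. A minimal sketch; the Overlord URL and task ID are placeholders you would copy from your own cluster:

import requests

OVERLORD_URL = "http://your-master-server:8090"  # assumption: substitute your master server address
TASK_ID = "your-task-id"                         # copy this from the "Ingestion tasks" view

resp = requests.get(OVERLORD_URL + "/druid/indexer/v1/task/" + TASK_ID + "/status")
resp.raise_for_status()
print(resp.json()["status"]["status"])  # e.g. RUNNING or SUCCESS; field names may vary by Druid version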

This section showed you how to load data from files, but Druid also supports streaming ingestion. Druid's streaming ingestion can load data with virtually no delay between events occurring and being available for queries.
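
For example, streaming from Kafka is set up by submitting a supervisor spec to the Overlord rather than a one-off task. A minimal sketch, assuming the Kafka indexing service is available on your cluster; the datasource name, topic, broker address, and column names are all placeholders:

import requests

OVERLORD_URL = "http://your-master-server:8090"  # assumption: substitute your master server address

supervisor_spec = {
    "type": "kafka",
    "dataSchema": {
        "dataSource": "wikipedia-stream",  # hypothetical datasource name
        "parser": {
            "type": "string",
            "parseSpec": {
                "format": "json",
                "timestampSpec": {"column": "time", "format": "iso"},
                "dimensionsSpec": {"dimensions": ["channel", "page", "user"]}
            }
        },
        "metricsSpec": [{"type": "count", "name": "count"}],
        "granularitySpec": {
            "type": "uniform",
            "segmentGranularity": "hour",
            "queryGranularity": "none"
        }
    },
    "ioConfig": {
        "topic": "wikipedia",  # hypothetical Kafka topic
        "consumerProperties": {"bootstrap.servers": "your-kafka-broker:9092"},
        "taskCount": 1
    }
}

resp = requests.post(OVERLORD_URL + "/druid/indexer/v1/supervisor", json=supervisor_spec)
resp.raise_for_status()  # on success, Druid begins managing ingestion tasks for the topic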

Query data

We've included several different ways you can interact with the data you've just ingested.

Pivot

Pivot is a web-based exploratory visualization UI for Druid built on top of Plywood.

home view

With Pivot, you explore a dataset by filtering and splitting it across any dimension. For each filtered split of your data, Pivot can show you the aggregate value of any of your measures. For example, on the Wikipedia dataset, you can see the most frequently edited pages by splitting on "page" (drag "Page" to the "Split" bar) and sorting by "Edits" (this is the default sort; you can also click on any column to sort by it).

Pivot offers different visualizations based on how you split your data. If you split on a string column, you will generally see a table. If you split on time, you can see either a timeseries plot or a table.

For example, try dragging the Time dimension into the visualization.

cube view

You can fine-tune the available dimensions in Pivot by editing the data cube.

SQL

The SQL interface allows you to run Druid SQL queries and download or iterate on the results.

sql

Try running the query:

SELECT page, SUM("count") AS Edits
FROM Wikipedia
WHERE TIMESTAMP '2015-09-12 00:00:00' <= __time AND __time < TIMESTAMP '2015-09-13 00:00:00'
GROUP BY page
ORDER BY Edits DESC
LIMIT 5

See the Druid SQL documentation for more details about making SQL queries with Druid.
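
You can also issue the same query over HTTP: the Broker (query server) exposes a SQL endpoint at /druid/v2/sql. A minimal sketch in Python; the Broker URL is a placeholder for your own query server:

import requests

BROKER_URL = "http://your-query-server:8082"  # assumption: substitute your query server address

query = """
SELECT page, SUM("count") AS Edits
FROM Wikipedia
WHERE TIMESTAMP '2015-09-12 00:00:00' <= __time
  AND __time < TIMESTAMP '2015-09-13 00:00:00'
GROUP BY page
ORDER BY Edits DESC
LIMIT 5
"""

resp = requests.post(BROKER_URL + "/druid/v2/sql", json={"query": query})
resp.raise_for_status()
for row in resp.json():  # results arrive as one JSON object per row by default
    print(row["page"], row["Edits"])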

Next steps

So far, you've loaded a sample data file into your newly launched Imply cluster. Next, you can:

Cloud