In this getting started guide, we'll spin up an Imply cluster, load some example data, and visualize it.
Please note that the configurations used for this quickstart are tuned to be light on resource usage and are not meant for load testing large production data sets.
After you log in to Imply Cloud, you'll see the main menu. Select "Manager" from the list of options to open the Clusters view.
In this view, click the "New cluster" button in the top-right corner.
Choose a name for your cluster, and use the default values for the version and the instance role.
Let's spin up a basic cluster that uses one data server (used for storing and aggregating data), one query server (used for merging partial results from data servers), and one master server (used for cluster coordination). All three servers use t2.small instances.
The cluster we are creating in this quickstart is not highly available. A highly available cluster requires, at a minimum, 2 data servers, 2 query servers, and 3 master servers.
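To make the sizing rule concrete, here is a minimal Python sketch that compares a cluster's server counts against the high-availability minimums listed above (2 data, 2 query, 3 master). `ClusterTopology` and `missing_for_ha` are hypothetical names for illustration; they are not part of any Imply API.

```python
from dataclasses import dataclass
from typing import Dict

# Minimum server counts for a highly available cluster, as stated in this guide.
HA_MINIMUMS = {"data": 2, "query": 2, "master": 3}


@dataclass
class ClusterTopology:
    """Hypothetical description of how many servers a cluster has per role."""
    data: int
    query: int
    master: int


def missing_for_ha(topology: ClusterTopology) -> Dict[str, int]:
    """Return how many extra servers each role needs for HA (empty dict if none)."""
    counts = {"data": topology.data, "query": topology.query, "master": topology.master}
    return {
        role: minimum - counts[role]
        for role, minimum in HA_MINIMUMS.items()
        if counts[role] < minimum
    }


quickstart = ClusterTopology(data=1, query=1, master=1)
print(missing_for_ha(quickstart))  # {'data': 1, 'query': 1, 'master': 2}
print(missing_for_ha(ClusterTopology(data=2, query=2, master=3)))  # {}
```

The quickstart cluster falls short on every role, which is why it is not highly available.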
Click "Create cluster" to launch a cluster in your AWS VPC.
It may take up to 30 minutes for the cluster to become available. Once the cluster reaches the "RUNNING" state, select it to begin loading data.
We've included a sample of Wikipedia edits from Sept 12, 2015 to get you started with loading data files via batch ingestion.
Click on "Wikipedia Edits".
Imply's data loader wizard will guide you through the steps to load our sample data. On the first screen, you specify the cluster to load data onto, the name of the datasource (the equivalent of a database table), the data location, and the data format.
For this example, keep the default values and enter "wikipedia" as the datasource name.
The next screen shows you a sample of the data.
Click "Yes, this is the data I wanted" to continue.
The next screen shows you various configuration options for loading your data.
Let's use the defaults and continue by clicking "Next".
Next, we tell Imply which column in our data holds the timestamp, the format of that column, and how to partition the data.
The defaults are fine, so click "Next" to continue.
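The time column settings come down to two questions: how to turn each raw value into a timestamp, and which time chunk that timestamp lands in (Druid partitions data by time first). The sketch below models three of Druid's timestamp formats (`iso`, `millis`, `posix`) and assumes day-sized time chunks purely for illustration; check the data loader screen for the granularity actually selected. `parse_timestamp` and `day_interval` are hypothetical helpers.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def parse_timestamp(value, fmt: str) -> datetime:
    """Turn a raw time value into an aware UTC datetime.

    fmt models three of Druid's timestamp formats:
      "iso"    -> ISO 8601 text, e.g. "2015-09-12T00:46:58.771Z"
      "millis" -> milliseconds since the Unix epoch
      "posix"  -> seconds since the Unix epoch
    """
    if fmt == "iso":
        # fromisoformat() accepts a trailing "Z" only from Python 3.11, so swap it for +00:00.
        text = value[:-1] + "+00:00" if value.endswith("Z") else value
        parsed = datetime.fromisoformat(text)
        # Treat timestamps without an offset as UTC.
        if parsed.tzinfo is None:
            parsed = parsed.replace(tzinfo=timezone.utc)
        return parsed.astimezone(timezone.utc)
    if fmt == "millis":
        return EPOCH + timedelta(milliseconds=value)
    if fmt == "posix":
        return EPOCH + timedelta(seconds=value)
    raise ValueError(f"unsupported format: {fmt}")


def day_interval(ts: datetime) -> str:
    """Return the one-day UTC time chunk containing ts, written as start/end."""
    start = ts.astimezone(timezone.utc).replace(hour=0, minute=0, second=0, microsecond=0)
    end = start + timedelta(days=1)
    pattern = "%Y-%m-%dT%H:%M:%S.000Z"
    return f"{start.strftime(pattern)}/{end.strftime(pattern)}"


ts = parse_timestamp("2015-09-12T00:46:58.771Z", "iso")
print(ts == parse_timestamp(1442018818771, "millis"))  # True
print(day_interval(ts))  # 2015-09-12T00:00:00.000Z/2015-09-13T00:00:00.000Z
```

The same instant written as ISO text or as epoch milliseconds parses to the same timestamp, so it lands in the same time chunk either way.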
Next, we need to configure the data schema by setting the type of each column.
We can once again use the defaults and click "Next".
The final summary screen displays all the options we've set so far. Click "Start loading data" to begin.
On the next screen, click "Ingestion tasks" to view the status of the data load.
Once the "STATUS" field changes to "Success", our ingest task is finished.
After your ingestion task finishes, the data will be available for querying within a minute or two.
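If you'd rather check the task from a script than from the "Ingestion tasks" screen, the sketch below polls Druid's Overlord task status endpoint (`/druid/indexer/v1/task/{taskId}/status`) until the task stops running. It assumes your cluster exposes that endpoint to you; the base URL, task ID, and any authentication are placeholders, and the helper names are hypothetical. Note that the API reports states in upper case (`SUCCESS`, `FAILED`) rather than the "Success" shown in the UI.

```python
import json
import time
import urllib.parse
import urllib.request
from typing import Callable

# Druid task states that mean the task has stopped running.
TERMINAL_STATES = {"SUCCESS", "FAILED"}


def fetch_task_state(overlord_url: str, task_id: str) -> str:
    """Return a task's state from Druid's task status endpoint.

    overlord_url is a placeholder for however your cluster exposes the Overlord API.
    """
    task = urllib.parse.quote(task_id, safe="")
    url = f"{overlord_url.rstrip('/')}/druid/indexer/v1/task/{task}/status"
    with urllib.request.urlopen(url) as resp:
        status = json.load(resp)["status"]
    # Depending on the Druid version, the state is under "statusCode", "status", or both.
    return status.get("statusCode") or status["status"]


def wait_for_task(get_state: Callable[[], str], poll_seconds: float = 10.0,
                  timeout_seconds: float = 1800.0,
                  sleep: Callable[[float], None] = time.sleep) -> str:
    """Call get_state() until it returns a terminal state, then return that state."""
    waited = 0.0
    while True:
        state = get_state()
        if state in TERMINAL_STATES:
            return state
        if waited >= timeout_seconds:
            raise TimeoutError(f"task still {state} after {waited:.0f} seconds")
        sleep(poll_seconds)
        waited += poll_seconds


# Usage (hypothetical URL and task ID):
# final = wait_for_task(lambda: fetch_task_state("https://overlord.example.com", "<task-id>"))
```

Passing the state lookup in as a function keeps the polling loop independent of how you reach the cluster, and lets you test it without a network.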
This section showed you how to load data from files, but Druid also supports streaming ingestion. Druid's streaming ingestion can load data with virtually no delay between events occurring and being available for queries.
We've included several different ways you can interact with the data you've just ingested.
Imply includes a web-based, exploratory visualization interface for Druid.
With the data cube view, you explore a dataset by filtering and splitting it across any dimension. For each filtered split of your data, Imply shows you the aggregate value of any of your measures. For example, on the wikipedia dataset, you can see the most frequently edited pages by splitting on "page" (drag "Page" to the "Show" bar) and sorting by "Edits" (this is the default sort; you can also click on any column to sort by it).
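Conceptually, a filtered split sorted by a measure is a filter, a group-by, and a descending sort. The sketch below mimics that over a few made-up rows; the page names, channels, and counts are invented, the `split_by` helper is hypothetical, and the "Edits" measure is assumed to be the sum of a `count` column, as in the Druid SQL example in this guide.

```python
from collections import Counter
from typing import Callable, Dict, List, Optional, Tuple

# Made-up rows standing in for rolled-up Wikipedia edit data.
ROWS = [
    {"page": "Page A", "channel": "#en.wikipedia", "count": 3},
    {"page": "Page B", "channel": "#en.wikipedia", "count": 5},
    {"page": "Page A", "channel": "#fr.wikipedia", "count": 4},
    {"page": "Page C", "channel": "#en.wikipedia", "count": 1},
]


def split_by(rows: List[Dict], dimension: str, measure: str,
             keep: Callable[[Dict], bool] = lambda row: True,
             limit: Optional[int] = None) -> List[Tuple[str, int]]:
    """Filter rows, group by one dimension, sum a measure, and sort descending."""
    totals: Counter = Counter()
    for row in rows:
        if keep(row):
            totals[row[dimension]] += row[measure]
    return totals.most_common(limit)


print(split_by(ROWS, "page", "count", limit=2))
# [('Page A', 7), ('Page B', 5)]
print(split_by(ROWS, "page", "count", keep=lambda r: r["channel"] == "#en.wikipedia"))
# [('Page B', 5), ('Page A', 3), ('Page C', 1)]
```

Adding a filter can change the ranking: "Page A" leads overall but drops to second once only one channel is kept.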
The data cube view offers different visualizations based on how you split your data. If you split on a string column, you will generally see a table. If you split on time, you can see either a timeseries plot or a table.
For example, try dragging the Time dimension into the visualization.
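When you split on time, each point in the timeseries is a measure aggregated over one time bucket. The sketch below shows that idea with hourly buckets over a few invented events; the hourly granularity and the `hourly_series` helper are assumptions for illustration, not how the data cube view is implemented.

```python
from collections import defaultdict
from datetime import datetime, timezone
from typing import Iterable, List, Tuple


def hourly_series(events: Iterable[Tuple[datetime, int]]) -> List[Tuple[datetime, int]]:
    """Sum (timestamp, count) pairs into hourly UTC buckets, returned in time order."""
    buckets = defaultdict(int)
    for ts, count in events:
        hour = ts.astimezone(timezone.utc).replace(minute=0, second=0, microsecond=0)
        buckets[hour] += count
    # Only hours that contain events appear; empty hours are not filled in.
    return sorted(buckets.items())


events = [
    (datetime(2015, 9, 12, 0, 46, tzinfo=timezone.utc), 2),
    (datetime(2015, 9, 12, 1, 10, tzinfo=timezone.utc), 4),
    (datetime(2015, 9, 12, 0, 59, tzinfo=timezone.utc), 1),
]
for hour, total in hourly_series(events):
    print(hour.strftime("%Y-%m-%d %H:00"), total)
# 2015-09-12 00:00 3
# 2015-09-12 01:00 4
```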
You can fine tune the available dimensions in the data cube view by editing the data cube.
The SQL interface allows you to run Druid SQL queries and download or iterate on the results.
Try running the query:
SELECT page, SUM("count") AS Edits
FROM wikipedia
WHERE TIMESTAMP '2015-09-12 00:00:00' <= __time AND __time < TIMESTAMP '2015-09-13 00:00:00'
GROUP BY page
ORDER BY Edits DESC
LIMIT 5
See the Druid SQL documentation for more details about making SQL queries with Druid.
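You can also send Druid SQL over HTTP instead of through the SQL interface. The sketch below builds the top-pages query for any single UTC day, using the same half-open interval (`start <= __time < end`) so each event belongs to exactly one day, and prepares a POST to Druid's SQL endpoint (`/druid/v2/sql/`) asking for results as a JSON array of objects. The broker URL is a placeholder, authentication is not shown, the helper names are hypothetical, and whether your Imply Cloud cluster exposes this endpoint to you directly is an assumption.

```python
import json
import urllib.request
from datetime import date, timedelta


def top_pages_sql(day: date, limit: int = 5) -> str:
    """Build the top-edited-pages query for one UTC day (half-open interval)."""
    start = day.isoformat()
    end = (day + timedelta(days=1)).isoformat()
    return (
        'SELECT page, SUM("count") AS Edits FROM wikipedia '
        f"WHERE TIMESTAMP '{start} 00:00:00' <= __time "
        f"AND __time < TIMESTAMP '{end} 00:00:00' "
        f"GROUP BY page ORDER BY Edits DESC LIMIT {int(limit)}"
    )


def build_sql_request(broker_url: str, sql: str) -> urllib.request.Request:
    """Prepare a POST to Druid's SQL HTTP endpoint (broker_url is a placeholder)."""
    body = json.dumps({"query": sql, "resultFormat": "object"}).encode("utf-8")
    return urllib.request.Request(
        f"{broker_url.rstrip('/')}/druid/v2/sql/",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_sql(broker_url: str, sql: str) -> list:
    """Send the query and return the rows as a list of dicts."""
    with urllib.request.urlopen(build_sql_request(broker_url, sql)) as resp:
        return json.load(resp)


# Usage (hypothetical host):
# rows = run_sql("https://query.example.com:8082", top_pages_sql(date(2015, 9, 12)))
```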
So far, you've loaded a sample data file into an Imply Cloud cluster. Next, you can: