This guide introduces you to Imply. Learn to install Imply, load and query sample data, and create visualizations.
There are multiple ways to try out Imply. This quickstart covers two of them: Imply Enterprise Hybrid on AWS (formerly Imply Cloud) and unmanaged Imply Enterprise (formerly Imply Private).
Imply Hybrid is a managed service that deploys and manages scalable Imply clusters for you in your AWS account. To try Imply Hybrid, sign up for an Imply Enterprise Hybrid (AWS) Free Trial.
Unmanaged Imply Enterprise (formerly Imply Private) runs on a single machine using a quickstart configuration. Imply Manager does not manage this type of installation. If you want a quickstart installation that uses Imply Manager, see the Kubernetes quickstart.
Don't use a quickstart instance for production. For information about installing Imply for production, see Production-ready installation instructions.
Set up Imply
Decide on the method to try out Imply:
Regardless of which method you choose, your cluster must be able to reach static.imply.io, where the sample data for the quickstart is hosted.
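One way to confirm this requirement is to request the headers of the sample file from a machine in the cluster. The following is a minimal sketch, assuming curl is installed and that the host answers HEAD requests. The function name can_reach is illustrative and not part of Imply.

```shell
# Hypothetical helper (not part of Imply): succeeds when the URL answers
# an HTTP HEAD request with a non-error status within 10 seconds.
can_reach() {
  curl --silent --head --fail --max-time 10 --output /dev/null "$1"
}

# Usage, run from a host in the cluster:
#   can_reach "https://static.imply.io/data/wikipedia.json.gz" \
#     && echo "sample data reachable" \
#     || echo "cannot reach static.imply.io"
```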
Use Imply Hybrid
To use Imply Hybrid, make sure you meet the following requirements:
- You have a free Imply Hybrid account.
- You have access to an AWS environment, specifically a VPC that you can deploy EC2 instances to.
Start Imply Hybrid
To create a new Imply Hybrid cluster, perform the following steps:
Log in to Imply Hybrid. You start at the Clusters view.
Click New cluster.
Choose a name for your cluster, and use the default values for the remainder of the settings.
Click Create cluster to launch a cluster in your AWS VPC. Note that clusters can take 20-30 minutes to launch.
Although the Imply Hybrid quickstart is not meant for production use, you can configure it to be Highly Available (HA) to see how Imply Hybrid performs in HA scenarios. A minimal HA instance of the Imply Hybrid trial has the following hardware profile:
- Three m5.large instances for master servers
- Two c5.large instances for query servers
- Three i3.xlarge instances for data servers
Congratulations! Now it's time to load data.
Use unmanaged Imply Enterprise
This section describes how to install and start Imply on a single machine using the quickstart configuration.
To run a single-machine Imply instance with the quickstart configuration, make sure you meet the following requirements:
- Java 8 (8u92 or higher). Imply builds and certifies its releases using OpenJDK. Select a distribution that provides long-term support and open-source licensing, such as Amazon Corretto or Azul Zulu.
- Linux, Mac OS X, or other Unix-like OS. Windows and ARM-based CPUs are not supported. For more information, see Run the quickstart on a VM.
- At least 4 GB of RAM
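Some of these requirements can be checked from a terminal before you download anything. The sketch below is illustrative only: the function names are hypothetical, the Java check accepts only version strings in the Java 8 form 1.8.0_<update>, and the architecture check assumes that ARM machines report a name starting with arm or aarch64.

```shell
# Hypothetical helpers (not part of Imply) for checking the requirements above.

# Succeeds when a Java version string such as "1.8.0_292" is Java 8u92 or higher.
java8_update_ok() {
  case "$1" in
    1.8.0_*) ;;          # Java 8 reports itself as 1.8.0_<update>
    *) return 1 ;;
  esac
  jv_update="${1#1.8.0_}"              # keep the part after the underscore
  jv_update="${jv_update%%[!0-9]*}"    # drop any suffix such as "-b14"
  [ -n "$jv_update" ] && [ "$jv_update" -ge 92 ]
}

# Succeeds when the machine architecture name looks ARM-based.
is_arm_arch() {
  case "$1" in
    arm*|aarch64*) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage:
#   java8_update_ok "$(java -version 2>&1 | awk -F'"' '/version/ {print $2; exit}')" \
#     || echo "Java 8u92 or higher not found"
#   is_arm_arch "$(uname -m)" && echo "ARM CPU detected: use a VM"
```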
Run the quickstart on a VM
The quickstart isn't supported for Windows machines or ARM-based CPUs like Apple M1. If you want to run the quickstart in one of these environments, use a Virtual Machine (VM) that runs a supported OS and CPU architecture, such as an Ubuntu VM on EC2.
When using a VM, consider the following additional requirements:
- You need a way to transfer the Imply package to the VM, such as with
- The Imply UI, known as Pivot, uses port 9095 by default. Make sure that port is accessible.
Download Imply Enterprise
A new, unlicensed Imply Enterprise installation comes with a free 30-day trial period.
Sign up and download Imply 2023.03.1 from imply.io/get-started.
Unpack the release archive:
tar -xzf imply-2023.03.1.tar.gz
Note the version number in the command. You may have to adjust it for your version of the download.
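To make the version adjustment explicit, you can keep the version in a variable and derive the archive name from it. This is a small sketch; the variable names IMPLY_VERSION and ARCHIVE are illustrative, and it assumes the archive follows the imply-<version>.tar.gz naming shown above.

```shell
# Set this to the version you downloaded (illustrative variable names).
IMPLY_VERSION="2023.03.1"
ARCHIVE="imply-${IMPLY_VERSION}.tar.gz"

# Unpack only if the archive is present in the current directory.
if [ -f "$ARCHIVE" ]; then
  tar -xzf "$ARCHIVE"
else
  echo "Archive $ARCHIVE not found in $(pwd)" >&2
fi
```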
The package contains the following files:
- bin/*: run scripts for included software
- conf/*: template configurations for a clustered setup
- conf-quickstart/*: configurations for this quickstart
- dist/*: all included software
- quickstart/*: files useful for this quickstart
If you have a license from Imply, apply it by adding the path to the license file to conf-quickstart/pivot/config.yaml as the licenseFile property, as follows:
Start Imply Enterprise
Next, start the Imply services, which include Druid, Pivot, and ZooKeeper. The supervise script starts Imply and other required services with a single command.
Go to the download you unpacked previously:
Run the startup script:
bin/supervise -c conf/supervise/quickstart.conf
If you encounter the error /usr/bin/env: 'python' not found, create a symbolic link to point /usr/bin/python to your Python installation. For example:
sudo ln -s /usr/bin/python3 /usr/bin/python
Then, run the script again.
Imply logs a message for each service that starts up. You can view detailed logs for any service in the var/sv/ directory using another terminal.
Optionally, verify that var/sv/pivot/current shows your license if you applied one.
To stop the Imply instance and its related services, interrupt the supervise program in your terminal with SIGINT (control + c).
For a clean start after stopping the services, remove the var/ directory before running the supervise script again.
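The clean start can be scripted so that the var/ directory is removed only from a directory that looks like an unpacked Imply package. This is a cautious sketch: reset_imply_state is a hypothetical helper, not an Imply command, and it treats the presence of bin/supervise as the sign of an Imply directory.

```shell
# Hypothetical helper (not part of Imply): removes <imply_home>/var so the
# next start begins with no stored state.
reset_imply_state() {
  imply_home="$1"
  # Refuse to act unless the directory looks like an unpacked Imply package.
  if [ -z "$imply_home" ] || [ ! -f "$imply_home/bin/supervise" ]; then
    echo "not an Imply directory: '$imply_home'" >&2
    return 1
  fi
  rm -rf "$imply_home/var"
}

# Usage, from the unpacked Imply directory after stopping the services:
#   reset_imply_state . && bin/supervise -c conf/supervise/quickstart.conf
```

The guard matters because an empty argument would otherwise turn the removal target into /var.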
Now that Imply is running, load data.
Load a data file
This section walks you through loading data from an HTTP(s) source (static.imply.io) using the Druid data loader. The sample data represents Wikipedia edits from June 27, 2016.
To access Pivot, the UI for Imply:
- Imply Hybrid: Click the Open button from the cluster list or cluster overview page.
- Imply Enterprise: Go to http://localhost:9095
If you get a connection refused error, your Imply cluster may not be ready yet. Wait a few seconds and refresh the page.
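Instead of refreshing by hand, you can poll the Pivot address until it responds. The following sketch assumes curl is installed; wait_for_url is a hypothetical helper, and the attempt count and delay are arbitrary defaults rather than values recommended by Imply.

```shell
# Hypothetical helper (not part of Imply): polls a URL until it answers with
# a non-error HTTP status, or gives up after a number of attempts.
#   $1 = URL, $2 = attempts (default 30), $3 = seconds between attempts (default 2)
wait_for_url() {
  wf_url="$1"
  wf_attempts="${2:-30}"
  wf_delay="${3:-2}"
  wf_i=0
  while [ "$wf_i" -lt "$wf_attempts" ]; do
    if curl --silent --fail --max-time 5 --output /dev/null "$wf_url"; then
      return 0
    fi
    wf_i=$((wf_i + 1))
    sleep "$wf_delay"
  done
  return 1
}

# Usage for a local Imply Enterprise quickstart:
#   wait_for_url "http://localhost:9095" && echo "Pivot is ready"
```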
Open the Druid console data loader by clicking Load data.
The Druid Console, which is part of the Imply stack, lets you ingest data from static and streaming sources:
Note the row preceding the datasource icons. This row shows you what step you're on.
Select HTTP(s). This option allows you to load data from an online source like the sample data in this quickstart.
On the right side of the screen, select Connect data. Enter the URL https://static.imply.io/data/wikipedia.json.gz and apply the change.
When you see data appear on the left, click Next: Parse data.
Review the data. The data loader automatically detects the parser type for the data and presents a preview of the parsed output. In this case, it suggests the json parser based on the dataset.
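If you want to look at the raw records yourself, you can download the sample file and print its first lines. This is an optional sketch; preview_gz_lines is a hypothetical helper, and it only assumes that the file is gzip-compressed text, as its .json.gz name suggests.

```shell
# Hypothetical helper (not part of Imply): prints the first N lines
# (default 1) of a gzip-compressed text file.
preview_gz_lines() {
  gunzip -c "$1" | head -n "${2:-1}"
}

# Usage:
#   curl --silent --fail -O "https://static.imply.io/data/wikipedia.json.gz"
#   preview_gz_lines wikipedia.json.gz 2
```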
When ready, click Next: Parse time to continue.
Configure the time column parsing. Druid uses a timestamp column to partition data. On this page, identify which column should be used as the primary time column and how to format the timestamp.
In this case, the loader automatically detects the timestamp column and chooses the timestamp format.
Click Next: Transform to continue.
Click Next through the transform, filter, and configure schema steps until you reach the partition settings. The quickstart uses the defaults for transform and filters:
- Transform: modify columns at ingestion time and create new derived columns.
- Filters: exclude unwanted rows from ingested data.
Configure the partition:
- Primary partitioning (by time) > Segment granularity: Choose day for the granularity of the time intervals. Primary partitioning is always time based.
- Secondary partitioning > Partitioning type: Choose dynamic, which configures secondary partitioning based on the number of rows in a segment.
For more information about partitioning in Imply and Druid, see partitionsSpec.
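In a Druid native batch ingestion spec, these two choices correspond to the segment granularity under dataSchema.granularitySpec and the partitioning type under tuningConfig.partitionsSpec. The sketch below prints only that fragment. It is an illustration, not the full spec: the spec that the data loader generates contains more fields, and print_partition_fragment is a hypothetical helper name.

```shell
# Hypothetical helper (not part of Imply): prints the part of a native batch
# ingestion spec that reflects the partition choices in this quickstart.
print_partition_fragment() {
  cat <<'EOF'
{
  "dataSchema": {
    "granularitySpec": { "segmentGranularity": "day" }
  },
  "tuningConfig": {
    "type": "index_parallel",
    "partitionsSpec": { "type": "dynamic" }
  }
}
EOF
}

# Usage: compare with the JSON shown later on the Edit spec page.
#   print_partition_fragment
```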
Click Next: Tune.
Use the defaults for the tune and publish steps. Click Next until you're on the Edit spec page.
The publish step is where you specify the datasource name. You use this name when you manage or query the data, such as when you create a data cube.
- Review the final spec. The last page of the data loader provides an overview of the ingestion spec (in JSON) that gets submitted. Advanced users can manually adjust the spec to configure functionality not available through the data loader.
- When you are ready, click Submit to begin the ingestion.
Wait for the data to finish loading. The console opens the task screen where you can see your task run.
Wait for the task status to change to SUCCESS. You may need to scroll to the right to see the Status column.
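You can also read the task status from a terminal through the Druid task status API (GET /druid/indexer/v1/task/<taskId>/status). The sketch below makes several assumptions, so treat it as a starting point: that the Druid API is reachable at http://localhost:8888 in your installation, that the response carries a statusCode field, and that you copy the task ID from the task screen. task_status_code is a hypothetical helper that extracts the field with sed, so no JSON tooling is required.

```shell
# Hypothetical helper (not part of Imply): reads a task status JSON document
# on standard input and prints the value of its "statusCode" field.
task_status_code() {
  sed -n 's/.*"statusCode"[[:space:]]*:[[:space:]]*"\([A-Z]*\)".*/\1/p' | head -n 1
}

# Usage (the host, port, and TASK_ID are assumptions to adapt):
#   TASK_ID="<task ID copied from the task screen>"
#   curl --silent "http://localhost:8888/druid/indexer/v1/task/${TASK_ID}/status" \
#     | task_status_code
```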
After the task succeeds, you can start to define a data cube and visualize your data.
Create a data cube
Pivot's data cubes are highly configurable and give you the flexibility to represent your dataset, as well as derived and custom columns, in many different ways.
The documentation on dimensions and measures is a good starting point for learning how to configure a data cube.
Create a data cube from the Wikipedia data you ingested:
Go back to Pivot and make sure that your newly ingested datasource appears in the list. It might take a few seconds for it to show up.
Go to the Visuals tab. From here, you can create data cubes to model your data, explore these cubes, and organize views into dashboards.
Create a new data cube:
- Pivot SQL: leave it selected
- Source: select wikipedia, which is the datasource name from the publish step in the Druid data loader.
- Auto-fill dimensions and measures: leave it selected to let Imply inspect the columns in your datasource and determine possible dimensions and measures automatically.
Explore the Edit data cube pages.
From here, you can configure the aspects of your data cube, including defining and customizing the cube's dimensions and measures.
Click Save when you're ready.
Visualize a data cube
There is a 2.0 view and a Classic view for data cubes. This section uses the 2.0 view.
The data cube view for a new data cube automatically loads when you save the cube. You can also view existing data cubes on the Visuals page.
With a data cube, you can explore a dataset by filtering and splitting it across any dimension. For each filtered split of your data, you can see the aggregate value of your selected measures.
On the Wikipedia dataset for the quickstart, you can see the most frequently edited pages by splitting on Page. Drag Page to the Show bar and keep the default sort (by Number of Events):
The data cube view suggests different visualizations based on how you split data. You can change the visualization manually by choosing your preferred visualization from the dropdown. If the shown dimensions are not appropriate for a particular visualization, the data cube view recommends alternative dimensions.
For more information on visualizing data, refer to Data cubes.
Imply includes an interface for issuing Druid SQL queries. To access the SQL editor, go to the SQL page from the Pivot home page. Once there, try running the following query that returns the most edited Wikipedia pages:
SELECT page, COUNT(*) AS Edits
FROM wikipedia
WHERE "__time" BETWEEN TIMESTAMP '2016-06-27 00:00:00' AND TIMESTAMP '2016-06-28 00:00:00'
GROUP BY page
ORDER BY Edits DESC
LIMIT 5
You should see results like the following:
For more details on making SQL queries with Druid, see the Druid SQL documentation.
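The same query can be sent from a terminal to the Druid SQL HTTP endpoint (POST /druid/v2/sql), which accepts a JSON body with a query field. The sketch below builds that body. It assumes curl is installed and that the Druid API is reachable at http://localhost:8888, which may differ in your installation; sql_payload is a hypothetical helper name.

```shell
# Hypothetical helper (not part of Imply): prints the JSON request body for
# the quickstart query. The heredoc is quoted so that the escaped double
# quotes around __time are passed through unchanged.
sql_payload() {
  cat <<'EOF'
{"query": "SELECT page, COUNT(*) AS Edits FROM wikipedia WHERE \"__time\" BETWEEN TIMESTAMP '2016-06-27 00:00:00' AND TIMESTAMP '2016-06-28 00:00:00' GROUP BY page ORDER BY Edits DESC LIMIT 5"}
EOF
}

# Usage (the host and port are assumptions to adapt):
#   sql_payload | curl --silent -X POST \
#     -H 'Content-Type: application/json' \
#     --data-binary @- "http://localhost:8888/druid/v2/sql"
```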
Congratulations! You deployed an Imply cluster, loaded a sample dataset, defined a data cube, explored some simple visualizations, and executed a query using Druid SQL.
Next, you can:
- Configure a data cube to customize its dimensions and measures.
- Create a dashboard with your favorite views and share it.
- Read more about supported query methods, including visualization and SQL.
Production-ready installation instructions
The configuration described in this quickstart is intended for exploring and learning about Imply. It's not meant for production workloads.
For information about using Imply in production, start by reviewing the Deployment overview to familiarize yourself with the options.
For information about installing Imply in a specific environment, refer to the following guides:
- For a distributed cluster that uses Kubernetes as the orchestration layer, see:
- For a distributed cluster environment without Kubernetes, see Install Imply without Kubernetes.