Imply quickstart

This quickstart covers unmanaged Imply Enterprise only. It shows you how to install Imply Enterprise, load and query sample data, and create visualizations. There are multiple ways to try Imply:

  • Imply Polaris: Easy to access through a browser.
  • Unmanaged Imply Enterprise: Download and run the Imply software locally. You can try it with your own data, and your data stays on your machine.
  • Imply Hybrid: Connect data in your cloud account to Imply Hybrid. Contact Imply for a trial account; once you have one, you can start loading data.
  • Kubernetes: Deploy with the Imply Helm chart. If you already have a Kubernetes cluster, add the Imply repository and run helm install to get started (see the sketch after this list).
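
For example, a Helm deployment can be as short as the following sketch. The repository URL and chart name here are assumptions, so confirm them against Imply's Kubernetes documentation before running:

helm repo add imply https://static.imply.io/onprem/helm   # assumed repository URL
helm repo update
helm install my-imply imply/imply                          # assumed chart name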

Don't use a quickstart instance for production. For information about installing Imply for production, see Production-ready installation instructions.

Prerequisites

This section describes how to install and start Imply on a single machine using the quickstart configuration.

To run a single-machine Imply instance with the quickstart configuration, make sure you meet the following requirements (a quick way to check them from a terminal follows this list):

  • Java 8 (8u92 or higher). Imply builds and certifies its releases using OpenJDK. Select a distribution that provides long-term support and open source licensing, such as Amazon Corretto or Azul Zulu.
  • Python. The supervise script expects a python executable on the PATH; see the troubleshooting note in Start Imply Enterprise if it's missing.
  • Linux, Mac OS X, or other Unix-like OS. Windows and ARM-based CPUs are not supported. For more information, see Running on a VM.
  • At least 4 GB of RAM
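
For example, a quick way to check the software requirements (exact output varies by distribution):

java -version      # look for version 1.8.0_92 or later
python3 --version  # any recent Python 3 works; see the troubleshooting note below if python isn't found
free -g            # Linux: total RAM should be at least 4 GB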

Run the quickstart on a VM

The quickstart isn't supported for Windows machines or ARM-based CPUs like Apple M1. If you want to run the quickstart in one of these environments, use a Virtual Machine (VM) that runs a supported OS and CPU architecture, such as an Ubuntu VM on EC2.

When using a VM, consider the following additional requirements:

  • You need a way to transfer the Imply package to the VM, such as with scp.
  • The Imply UI, known as Pivot, uses port 9095 by default. Make sure that port is accessible (see the sketch after this list).
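
For example, assuming an Ubuntu VM on EC2 (the hostname and key path are placeholders):

# Copy the Imply package to the VM.
scp -i ~/.ssh/my-key.pem imply-2024.07.tar.gz ubuntu@your-ec2-host:~
# Tunnel Pivot's port 9095 to your local machine instead of opening it publicly.
ssh -i ~/.ssh/my-key.pem -L 9095:localhost:9095 ubuntu@your-ec2-host

With the tunnel open, you can browse to http://localhost:9095 on your local machine once Imply is running.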

Download Imply Enterprise

A new, unlicensed Imply Enterprise installation comes with a free 30-day trial period.

  1. Sign up and download Imply 2024.07 from imply.io/get-started.

  2. Unpack the release archive:

    tar -xzf imply-2024.07.tar.gz

    Note the version number in the command; you may have to adjust it to match the version you downloaded.

The package contains the following files:

  • bin/* - run scripts for included software
  • conf/* - template configurations for a clustered setup
  • conf-quickstart/* - configurations for this quickstart
  • dist/* - all included software
  • quickstart/* - files useful for this quickstart

If you have a license from Imply, apply it by adding the path to the license file to conf-quickstart/pivot/config.yaml as the licenseFile property, as follows:

licenseFile: /Users/implyuser/Downloads/pivot-license

Start Imply Enterprise

Start the Imply services, which include Druid, Pivot, and ZooKeeper. The included supervise script starts Imply and other required services with a single command.

  1. Navigate to the directory where you unpacked the Imply files.

    cd imply-2024.07
  2. Run the startup script:

    bin/supervise -c conf/supervise/quickstart.conf

    If you encounter the error /usr/bin/env: 'python' not found, create a symbolic link to point /usr/bin/python to your Python installation. For example:

    sudo ln -s /usr/bin/python3 /usr/bin/python

    Then, run the script again.

    Imply logs a message for each service that starts up. You can view detailed logs for any service in the var/sv/ directory using another terminal (see the example after these steps).

  3. Optionally, verify that var/sv/pivot/current shows your license if you applied one.
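
For example, to follow Pivot's log from a second terminal (the other services have similar current log files under var/sv/):

tail -f var/sv/pivot/current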

To stop the Imply instance and its related services, interrupt the supervise program in your terminal with SIGINT (Control+C).

For a clean start after stopping the services, remove the var/ directory before running the supervise script again.
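
For example, from the imply-2024.07 directory:

# Remove all quickstart state, then start fresh.
rm -rf var/
bin/supervise -c conf/supervise/quickstart.conf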

Now that Imply is running, load data.

Load data

This section walks you through loading data from an HTTP source using the Druid data loader. The sample data represents Wikipedia edits from June 27, 2016.


  1. Click the Open button from within the cluster list or cluster overview page.

    If you get a connection refused error, your Imply cluster may not be ready yet. Wait a few seconds and refresh the page.

  2. Click Load data to open the Druid console data loader.

    The Druid console, which is part of the Imply stack, lets you ingest data from static and streaming sources.

  3. In the top navigation bar, click Load data > Batch - SQL.

  4. Select HTTP(s) to load data from an online source, like the sample data in this quickstart.

  5. In the URI field, enter https://static.imply.io/data/wikipedia.json.gz and click Connect data. You can also inspect the sample file from a terminal; see the example after these steps.

    Leave HTTP auth username and HTTP auth password blank for this quickstart.

  6. The data loader automatically detects the parser type for the data and previews the parsed output in JSON format. When ready, click Next.

  7. The Configure schema page provides a user interface for you to modify your ingestion setup. You can:

    • Add, remove, and rename columns and metrics.
    • Apply filters to your incoming data.
    • Configure partitioning and clustering.
    • Enable rollup.
    • Specify the destination datasource.

    For this quickstart, accept the defaults. Click the SQL tab.

  8. The SQL tab shows the ingestion statement in Druid SQL:

    REPLACE INTO "wikipedia" OVERWRITE ALL
    WITH "ext" AS (
      SELECT *
      FROM TABLE(
        EXTERN(
          '{"type":"http","uris":["https://static.imply.io/data/wikipedia.json.gz"]}',
          '{"type":"json"}'
        )
      ) EXTEND ("isRobot" VARCHAR, "channel" VARCHAR, "timestamp" VARCHAR, "flags" VARCHAR, "isUnpatrolled" VARCHAR, "page" VARCHAR, "diffUrl" VARCHAR, "added" BIGINT, "comment" VARCHAR, "commentLength" BIGINT, "isNew" VARCHAR, "isMinor" VARCHAR, "delta" BIGINT, "isAnonymous" VARCHAR, "user" VARCHAR, "deltaBucket" BIGINT, "deleted" BIGINT, "namespace" VARCHAR, "cityName" VARCHAR, "countryName" VARCHAR, "regionIsoCode" VARCHAR, "metroCode" BIGINT, "countryIsoCode" VARCHAR, "regionName" VARCHAR)
    )
    SELECT
      TIME_PARSE("timestamp") AS "__time",
      "isRobot",
      "channel",
      "flags",
      "isUnpatrolled",
      "page",
      "diffUrl",
      "added",
      "comment",
      "commentLength",
      "isNew",
      "isMinor",
      "delta",
      "isAnonymous",
      "user",
      "deltaBucket",
      "deleted",
      "namespace",
      "cityName",
      "countryName",
      "regionIsoCode",
      "metroCode",
      "countryIsoCode",
      "regionName"
    FROM "ext"
    PARTITIONED BY DAY

    Note the following:

    • The REPLACE INTO statement tells Druid to run an ingestion task that writes the query result to the Druid datasource wikipedia.
    • OVERWRITE ALL configures the ingestion task to replace all data in any existing wikipedia datasource with the result of the query.
    • The EXTERN function assigns https://static.imply.io/data/wikipedia.json.gz as the HTTP input source and json as the input format.
    • The EXTEND clause describes the input schema.
    • The SELECT statement defines the transformations and schema of the resulting Druid table.
    • PARTITIONED BY sets the datasource partitioning to DAY.

    To learn more, see SQL-based ingestion reference.

  9. When ready, click Start loading data to ingest the data to Druid.

  10. Once the data has been loaded, click Query: wikipedia.
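
If you want to peek at the raw sample data outside the UI, the file is newline-delimited JSON, so the following prints the first record (assuming curl and gzip are available):

curl -s https://static.imply.io/data/wikipedia.json.gz | gzip -dc | head -n 1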

After this step succeeds, you can start to define a data cube and visualize your data.

Create a data cube

Pivot's data cubes are highly configurable and give you the flexibility to represent your dataset, as well as derived and custom columns, in many different ways.

The documentation on dimensions and measures is a good starting point for learning how to configure a data cube.

Create a data cube from the Wikipedia data you ingested:

  1. Go back to Pivot and make sure that your newly ingested datasource appears in the list. It might take a few seconds for it to show up.


  2. Go to the Visuals tab. From here, you can create data cubes to model your data, explore those cubes, and organize views into dashboards.

  3. Create a new data cube:

    • Pivot SQL: leave it selected
    • Source: select wikipedia, which is the datasource name from the publish step in the Druid data loader.
    • Auto-fill dimensions and measures: leave it selected to allow Imply to intelligently inspect the columns in your datasource and determine possible dimensions and measures automatically.
  4. Explore the Edit data cube pages.

    From here, you can configure the aspects of your data cube, including defining and customizing the cube's dimensions and measures.

  5. Click Save when you're ready.

Visualize a data cube

There is a 2.0 view and a Classic view for data cubes. This section uses the 2.0 view.

The data cube view for a new data cube loads automatically when you save the cube. You can also view existing data cubes on the Visuals page.


With a data cube, you can explore a dataset by filtering and splitting it across any dimension. For each filtered split of your data, you can see the aggregate value of your selected measures.

On the Wikipedia dataset for the quickstart, you can see the most frequently edited pages by splitting on Page. Drag Page to the Show bar and keep the default sort (by Number of Events).


The data cube view suggests different visualizations based on how you split data. You can change the visualization manually by choosing your preferred visualization from the dropdown. If the shown dimensions are not appropriate for a particular visualization, the data cube view recommends alternative dimensions.

For more information on visualizing data, refer to Data cubes.

Run SQL

Imply includes an interface for issuing Druid SQL queries. To access the SQL editor, go to the SQL page from the Pivot home page. Once there, try running the following query that returns the most edited Wikipedia pages:

SELECT page, COUNT(*) AS Edits
FROM wikipedia
WHERE "__time" BETWEEN TIMESTAMP '2016-06-27 00:00:00' AND TIMESTAMP '2016-06-28 00:00:00'
GROUP BY page
ORDER BY Edits DESC
LIMIT 5

The results list the five most frequently edited pages.

For more details on making SQL queries with Druid, see the Druid SQL documentation.
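
If you prefer the command line, Druid also accepts SQL over HTTP at its /druid/v2/sql endpoint. A minimal sketch, assuming the quickstart's Druid router is listening on its default port 8888:

curl -X POST http://localhost:8888/druid/v2/sql \
  -H 'Content-Type: application/json' \
  -d '{"query": "SELECT page, COUNT(*) AS Edits FROM wikipedia GROUP BY page ORDER BY Edits DESC LIMIT 5"}'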

Next steps

Congratulations! You deployed an Imply cluster, loaded a sample dataset, defined a data cube, explored some simple visualizations, and executed a query using Druid SQL.

Learn more

Next, you can review the production-ready installation instructions below.

Production-ready installation instructions

The configuration described in this quickstart is intended for exploring and learning about Imply. It's not meant for production workloads.

For information about using Imply in production, start by reviewing the Deployment overview to familiarize yourself with the options.

For information about installing Imply in a specific environment, refer to the following guides: