Execute a Proof of Concept

Once you've signed up for the 30-day free trial of Imply Polaris and checked out the Quickstart, you might want to execute a Proof of Concept (POC) to evaluate whether Polaris meets your requirements and expectations.

This guide helps you make the most of Polaris and is organized around key test milestones so you can use the full 30 days of your trial effectively. During the trial, you can use Polaris continuously for the entire period without incurring any additional charges.

If you need more help to analyze your business case and requirements, contact Imply. You can also join Polaris support on Slack to get help and advice.

Define your POC requirements

A modern analytics application must provide all of the following:

  • Interactive analytics at any scale
  • High concurrency at best value
  • Insights on batch and/or streaming data

The following guidelines provide a blueprint for structuring a POC to validate these requirements.

We recommend that you take the following approach:

  1. Define your success criteria
  2. Define the scope of the data
  3. Ingest a small dataset using streaming or batch ingestion
  4. Query and analyze the data
  5. Test and monitor the data
  6. Use the Polaris API
  7. Iterate
  8. Evaluate POC success
  9. Consider next steps

Define your success criteria

Specify the desired outcomes of the trial. Some examples of success criteria are as follows:

  • Streaming ingestion:
    • Ingest JSON and Avro data with latency less than x seconds
    • Consume data directly from Confluent Cloud
    • Surface ingested data in the Polaris UI within x seconds
  • Querying and analytics:
    • Execute x% of queries in x seconds or less
    • Run queries over x records for a variety of time periods
    • Create alerts for slow-running queries and ingestion
    • Set up resource-based access control at a row and column level
  • Monitoring:
    • Monitor for slow-running queries and ingestion
  • API:
    • Use the API to create and update users and groups

Define the scope of the data

Determine the attributes and formats of the data you want to test in Polaris. Create synthetic data or select production data based on these requirements.

During your trial, you can store up to 200 GB of data (input size can be up to 1 TB of JSON, CSV, and TSV data) in Polaris.

Ingest data

Choose your ingestion method: batch, streaming, or both.

Streaming ingestion

Evaluate the throughput needed to support your analytics applications. Consider the number of messages per second or per minute (x) and the size of messages (y).
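
For example, an illustrative workload of 5,000 messages per second with an average message size of 1 KB works out to roughly 5 MB of raw event data per second, or about 430 GB per day before rollup. Working through this arithmetic with your own values for x and y helps you confirm that your test fits within the trial limits.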

Polaris supports real-time ingestion through the Push ingestion API, Apache Kafka (and Kafka-compatible APIs), and Amazon Kinesis. Use one of these methods to ingest real-time data and observe it appearing in Polaris tables almost immediately.

See the following pages to learn more about streaming ingestion options in Polaris:

  • Ingest events with the Kafka connector
  • Ingest data from Apache Kafka and Amazon MSK by API
  • Ingest data from Amazon Kinesis by API
  • Ingest data from Confluent Cloud by API
  • Push event data by API
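
For example, the following Python sketch pushes a small batch of events to a push-enabled connection. The endpoint URL, connection name, authorization header, and event fields are placeholders to adapt to your own setup; see Push event data by API and the authentication documentation for the exact request format.

    import json
    import time

    import requests  # third-party HTTP client

    # Placeholders: substitute your organization's Polaris endpoint, a push-enabled
    # connection name, and an API key with permission to push events.
    PUSH_URL = "https://ORGANIZATION.REGION.CLOUD.api.imply.io/v1/events/CONNECTION_NAME"
    HEADERS = {
        "Authorization": "Basic YOUR_API_KEY",  # confirm the scheme in the auth docs
        "Content-Type": "application/json",
    }

    def push_events(events):
        # Send events as newline-delimited JSON in a single request.
        payload = "\n".join(json.dumps(event) for event in events)
        response = requests.post(PUSH_URL, data=payload, headers=HEADERS, timeout=10)
        response.raise_for_status()

    # Illustrative events with an explicit millisecond timestamp field.
    now_ms = int(time.time() * 1000)
    push_events([
        {"timestamp": now_ms, "channel": "web", "clicks": 3},
        {"timestamp": now_ms, "channel": "mobile", "clicks": 1},
    ])

Wrapping a loop and a timer around push_events gives a rough measurement of messages per second against your streaming criterion.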

Criteria for POC

Ingest x number of messages per second or per minute with an average message size of y bytes.

Batch ingestion

If batch ingestion is key to your application, consider the following questions:

  • What is the format of the data? (x)
  • How much data do you need to ingest in a batch? (y)
  • How often will you need to ingest each batch of data? (z)

See the following pages to learn more about batch ingestion options in Polaris:

  • Ingest data from files by API
  • Ingest data from Amazon S3 by API
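
To exercise a batch criterion end to end, a script along the lines of the sketch below can upload a file and then poll the resulting ingestion job until it finishes. The base URL, endpoint paths, and job payload are illustrative rather than the authoritative API reference; see Ingest data from files by API for the exact request bodies.

    import time

    import requests

    # Placeholders: substitute your Polaris API base URL, project ID, and API key.
    BASE = "https://ORGANIZATION.REGION.CLOUD.api.imply.io/v1/projects/PROJECT_ID"
    HEADERS = {"Authorization": "Basic YOUR_API_KEY"}

    def upload_file(path):
        # Upload a local batch file so that an ingestion job can reference it.
        with open(path, "rb") as f:
            response = requests.post(f"{BASE}/files", headers=HEADERS, files={"file": f})
        response.raise_for_status()

    def start_batch_job(table_name, file_name):
        # Minimal illustrative job spec; the real payload schema is defined in the
        # ingestion job API reference and may differ from this sketch.
        job = {
            "type": "batch",
            "target": {"type": "table", "tableName": table_name},
            "source": {"type": "uploaded", "fileList": [file_name]},
        }
        response = requests.post(f"{BASE}/jobs", headers=HEADERS, json=job)
        response.raise_for_status()
        return response.json()["id"]

    def wait_for_job(job_id, poll_seconds=30):
        # Poll the job until it reaches a terminal state, then return that state.
        while True:
            response = requests.get(f"{BASE}/jobs/{job_id}", headers=HEADERS)
            response.raise_for_status()
            status = response.json().get("status")
            if status in ("COMPLETED", "FAILED", "CANCELED"):
                return status
            time.sleep(poll_seconds)

Timing how long the job takes for a y GB file, repeated every z hours, gives you direct evidence against the batch criterion.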

Criteria for POC

Ingest data with format x and size y GB every z hours. Validate that the data appears in your Polaris tables.

Query and analyze the data

Interactive drill-down analytics are a key feature of an analytics application. Polaris gives you the ability to query, explore, and analyze your data with visually rich, intuitive tools. See Analytics overview for more information.

Before you validate this requirement, decide what's most important for your users to see, what kind of interactive exploration you want to enable, and a suitable period of time for which to query and analyze data.

Criteria for POC

Create data cubes, dashboards, and other resources as required to analyze data for a defined period of time. Investigate and troubleshoot anomalies.

Test and monitor the data

Develop a good understanding of the most frequently used query patterns for your application. You can obtain these queries from the monitoring pages in the Polaris UI—see Monitoring for details.

You might want to build a workload that includes a representative mix of your query patterns in a load testing tool such as Apache JMeter or Locust. These tools are designed to load test functional behavior and measure performance.
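
As a minimal sketch of such a workload in Locust, the class below replays one representative SQL query per simulated user. The host, query endpoint, SQL statement, and authorization header are placeholders for your own environment.

    from locust import HttpUser, between, task

    class PolarisQueryUser(HttpUser):
        # Placeholder: substitute your organization's Polaris API base URL.
        host = "https://ORGANIZATION.REGION.CLOUD.api.imply.io"
        # Each simulated user waits 1 to 2 seconds between queries.
        wait_time = between(1, 2)

        @task
        def representative_query(self):
            # Illustrative query endpoint and SQL; replace them with a query taken
            # from your own monitoring pages.
            self.client.post(
                "/v1/projects/PROJECT_ID/query/sql",
                json={"query": 'SELECT channel, COUNT(*) FROM "my_table" GROUP BY 1'},
                headers={"Authorization": "Basic YOUR_API_KEY"},
            )

Running Locust with a fixed number of users and watching request rates and response times gives a first approximation of how your query mix behaves under concurrency.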

Define the number of queries (x) your application should complete in a specified time period, typically one or two seconds (y). Try to be realistic—it's common to overestimate the required query frequency, which impacts query patterns and data model design.

Your trial has limited resources. If you need to execute more than 10 queries per second, contact Imply.

Criteria for POC

Ensure that Polaris can support x representative queries per y seconds.

Use the Polaris API

APIs enable you to programmatically build an application around your database, and automate ingestion and other processes.

See the following pages to learn more about the Polaris API:

  • API overview
  • API authentication
  • Polaris API reference
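
As a simple smoke test of programmatic access, a sketch like the following authenticates with an API key and lists the tables in a project. The base URL, path, and response shape are illustrative; confirm them against the API authentication and OpenAPI reference pages.

    import requests

    # Placeholders: substitute your organization's Polaris API base URL, project ID,
    # and an API key with the appropriate permissions.
    BASE = "https://ORGANIZATION.REGION.CLOUD.api.imply.io"
    HEADERS = {"Authorization": "Basic YOUR_API_KEY"}

    # List the tables in a project to verify authentication and connectivity.
    response = requests.get(f"{BASE}/v1/projects/PROJECT_ID/tables", headers=HEADERS)
    response.raise_for_status()
    for table in response.json().get("values", []):
        print(table.get("name"))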

Criteria for POC

Confirm that the Polaris API supports the tasks you need for your application.

Guidance on iteration

When you execute the POC, plan an initial iteration followed by one or more subsequent iterations. Iterations should have the following characteristics:

First iteration:

  • Test Polaris functionality with a small amount of data.
  • Start with a sample dataset of 1 GB or less.

Second and subsequent iterations:

  • Ingest a significant portion of test data using your preferred ingestion methods—1-5% of your estimated production data volume is a good starting point.
  • Ingest data that covers the full timeframe of your production needs.
  • For batch ingestion:
    • Create incremental batch files.
    • Use cron, Apache Airflow, or another scheduling tool to upload the incremental files and schedule ingestion API requests (see the scheduling sketch after this list).
    • Replace existing data, if that's one of your requirements. See Replace data for more information.
  • For streaming ingestion:
    • Increase your throughput by pushing event data by API, or ingest events from Kafka or Kinesis.
  • Test and tune queries produced in the first iteration until they meet your success criteria. See the following topics for details:
    • Data rollup for information on aggregating raw data during ingestion.
    • Data partitioning for information on partitioning, sorting, and clustering.
    • Approximation algorithms for information on using sketches to trade accuracy for reduced storage and improved performance.
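
To illustrate the scheduling step for incremental batch files, here is a minimal Apache Airflow sketch that runs an hourly upload-and-ingest script. The DAG name, schedule, and script path are hypothetical, and cron or another scheduler works just as well.

    from datetime import datetime

    from airflow import DAG
    from airflow.operators.bash import BashOperator

    # Hypothetical script that uploads the latest incremental file and starts an
    # ingestion job through the Polaris API (for example, as sketched earlier).
    INGEST_COMMAND = "python /opt/polaris/upload_and_ingest.py"

    with DAG(
        dag_id="polaris_incremental_ingest",
        start_date=datetime(2023, 1, 1),
        schedule_interval="@hourly",  # newer Airflow releases use the schedule argument
        catchup=False,
    ) as dag:
        BashOperator(task_id="upload_and_ingest", bash_command=INGEST_COMMAND)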

Evaluate POC success

Evaluate how well your completed POC has satisfied your success criteria.

Next steps

If you're satisfied with what Polaris can provide at the end of your free trial, you can continue to use your Polaris database with pay-as-you-go billing. You can continue to test larger datasets and workloads once you transfer to pay-as-you-go.

If you have questions or need more time to evaluate Polaris, contact Imply for help.

Learn more

See the following pages for further information:

  • Polaris documentation
  • Polaris billing plans