Execute a Proof of Concept
Once you've signed up for the 30-day free trial of Imply Polaris and checked out the Quickstart, you might want to execute a Proof of Concept (POC) to evaluate whether Polaris meets your requirements and expectations.
This guide helps you make the most of Polaris and is organized around key test milestones so you can maximize your 30-day trial. During the trial, you can use Polaris continuously for the entire period without incurring any charges.
If you need more help to analyze your business case and requirements, contact Imply. You can also join Polaris support on Slack to get help and advice.
Define your POC requirements
A modern analytics application must provide all of the following:
- Interactive analytics at any scale
- High concurrency at best value
- Insights on batch and/or streaming data
The following guidelines provide a blueprint for structuring a POC that validates these requirements.
We recommend that you take the following approach:
- Define your success criteria
- Define the scope of the data
- Ingest a small dataset using streaming or batch ingestion
- Query and analyze the data
- Test and monitor the data
- Use the Polaris API
- Iterate
- Evaluate POC success
- Consider next steps
Define your success criteria
Specify the desired outcomes of the trial. Some examples of success criteria are as follows:
- Streaming ingestion:
  - Ingest JSON and Avro data with latency less than x seconds
  - Consume data directly from Confluent Cloud
  - Surface ingested data in the Polaris UI within x seconds
- Querying and analytics:
  - Execute x% of queries in y seconds or less
  - Run queries over x records for a variety of time periods
  - Create alerts for slow-running queries and ingestion
  - Set up resource-based access control at a row and column level
- Monitoring:
  - Monitor for slow-running queries and ingestion
- API:
  - Use the API to create and update users and groups
Define the scope of the data
Determine the attributes and formats of the data you want to test in Polaris. Create synthetic data or select production data based on these requirements.
During your trial, you can store up to 200 GB of data (input size can be up to 1 TB of JSON, CSV, and TSV data) in Polaris.
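If you don't have production data you can readily export, a short script can generate synthetic events. The following sketch uses only the Python standard library; the field names are hypothetical placeholders, so replace them with the attributes from your data scope. It writes newline-delimited JSON, which works for both batch upload and streaming replay:

```python
import json
import random
from datetime import datetime, timedelta, timezone

# Hypothetical event shape for illustration; replace the fields with
# the attributes you defined in your data scope.
CHANNELS = ["web", "mobile", "api"]


def synthetic_events(count, start):
    """Yield synthetic events with timestamps spread over one hour."""
    for _ in range(count):
        offset = timedelta(seconds=random.uniform(0, 3600))
        yield {
            "__time": (start + offset).isoformat(),
            "session_id": f"session-{random.randint(1, 10_000)}",
            "channel": random.choice(CHANNELS),
            "latency_ms": round(random.lognormvariate(3, 1), 2),
        }


start = datetime.now(timezone.utc)
with open("sample-events.json", "w") as f:
    for event in synthetic_events(100_000, start):
        f.write(json.dumps(event) + "\n")
```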
Ingest data
Choose your ingestion method: batch, streaming, or both.
Streaming ingestion
Evaluate the throughput needed to support your analytics applications. Consider the number of messages per second or per minute (x) and the size of messages (y).
Polaris supports real-time ingestion through the Push ingestion API, Apache Kafka (and Kafka-compatible APIs), and Amazon Kinesis. Use one of these methods to ingest real-time data and observe it appearing in Polaris tables almost immediately. A minimal push example follows the list below.
See the following pages to learn more about streaming ingestion options in Polaris:
- Ingest events with the Kafka connector
- Ingest data from Apache Kafka by API
- Ingest data from Amazon Kinesis by API
- Ingest data from Confluent Cloud by API
- Push event data by API
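As a quick way to exercise the Push ingestion API, here is a minimal Python sketch. The organization name, project ID, table name, and API key are placeholders, and the endpoint pattern and Basic authorization scheme are assumptions drawn from the Polaris documentation; verify both against the current API reference before use:

```python
import json

import requests

# Placeholder identifiers; substitute your own organization, project,
# table, and API key. The endpoint pattern and Basic auth scheme are
# assumptions drawn from the Polaris docs; verify in the API reference.
ORG = "example-org"
PROJECT_ID = "12345678-abcd-ef00-1234-56789abcdef0"
TABLE = "demo-events"
API_KEY = "pok_..."  # truncated placeholder

URL = f"https://{ORG}.api.imply.io/v1/projects/{PROJECT_ID}/events/{TABLE}"


def push_events(events):
    """Push a batch of events as newline-delimited JSON."""
    body = "\n".join(json.dumps(event) for event in events)
    response = requests.post(
        URL,
        data=body,
        headers={
            "Authorization": f"Basic {API_KEY}",
            "Content-Type": "application/json",
        },
        timeout=30,
    )
    response.raise_for_status()


push_events([{"__time": "2024-01-01T00:00:00Z", "channel": "web"}])
```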
Criteria for POC
Ingest x messages per second or per minute with an average message size of y bytes.
Batch ingestion
If ingesting data through batch is key to your application needs, consider the following questions:
- What is the format of the data? (x)
- How much data do you need to ingest in a batch? (y)
- How often will you need to ingest each batch of data? (z)
See the following pages to learn more about batch ingestion options in Polaris:
Criteria for POC
Ingest data with format x and size y GB every z hours. Validate that the data appears in your Polaris tables.
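To make the batch flow concrete, here is a hedged Python sketch of the two-step upload-then-ingest pattern. All identifiers are placeholders, and the endpoint paths and job spec shape are assumptions based on the Polaris batch ingestion documentation; treat this as an outline rather than a verified implementation:

```python
import requests

# Placeholder identifiers; substitute your own. The endpoint paths and
# job spec shape are assumptions based on the Polaris batch ingestion
# docs; confirm them against the API reference before relying on this.
ORG = "example-org"
PROJECT_ID = "12345678-abcd-ef00-1234-56789abcdef0"
API_KEY = "pok_..."  # truncated placeholder
BASE = f"https://{ORG}.api.imply.io/v1/projects/{PROJECT_ID}"
HEADERS = {"Authorization": f"Basic {API_KEY}"}

# Step 1: upload the batch file to Polaris staging.
with open("sample-events.json", "rb") as f:
    upload = requests.post(
        f"{BASE}/files",
        headers=HEADERS,
        files={"file": ("sample-events.json", f)},
        timeout=300,
    )
upload.raise_for_status()

# Step 2: start a batch ingestion job that reads the uploaded file.
# The spec below is a simplified assumption of the job schema.
job_spec = {
    "type": "batch",
    "target": {"type": "table", "tableName": "demo-events"},
    "source": {
        "type": "uploaded",
        "fileList": ["sample-events.json"],
        "formatSettings": {"format": "nd-json"},
    },
}
job = requests.post(f"{BASE}/jobs", headers=HEADERS, json=job_spec, timeout=30)
job.raise_for_status()
print("Created ingestion job:", job.json().get("id"))
```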
Query and analyze the data
Interactive drill-down analytics are a key feature of an analytics application. Polaris gives you the ability to query, explore, and analyze your data with visually rich, intuitive tools. See Analytics overview for more information.
Before you validate this requirement, decide what's most important for your users to see, what kind of interactive exploration you want to enable, and a suitable period of time for which to query and analyze data.
Criteria for POC
Create data cubes, dashboards, and other resources as required to analyze data for a defined period of time. Investigate and troubleshoot anomalies.
Test and monitor the data
Develop a good understanding of the most frequently used query patterns for your application. You can obtain these queries from the monitoring pages in the Polaris UI—see Monitoring for details.
You might want to build a workload that mixes your most common query patterns in a load-testing tool such as Apache JMeter or Locust. These tools are designed to load-test functional behavior and measure performance.
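For example, a minimal Locust workload might look like the following sketch. The SQL endpoint path, authorization scheme, and queries are assumptions for illustration; substitute query patterns captured from Monitoring:

```python
# locustfile.py: a minimal workload that replays a representative query
# mix against Polaris. The SQL endpoint path and Basic auth scheme are
# assumptions; verify them against the Polaris API reference.
import random

from locust import HttpUser, between, task

PROJECT_ID = "12345678-abcd-ef00-1234-56789abcdef0"  # placeholder
API_KEY = "pok_..."  # truncated placeholder

# Hypothetical query mix; replace with patterns captured from the
# Monitoring pages in the Polaris UI.
QUERIES = [
    'SELECT channel, COUNT(*) FROM "demo-events" '
    "WHERE __time > CURRENT_TIMESTAMP - INTERVAL '1' HOUR GROUP BY 1",
    'SELECT APPROX_COUNT_DISTINCT(session_id) FROM "demo-events"',
]


class PolarisQueryUser(HttpUser):
    wait_time = between(1, 2)  # seconds between queries per user

    @task
    def run_query(self):
        self.client.post(
            f"/v1/projects/{PROJECT_ID}/query/sql",
            json={"query": random.choice(QUERIES)},
            headers={"Authorization": f"Basic {API_KEY}"},
        )
```

Run it with `locust -f locustfile.py --host https://example-org.api.imply.io` (substituting your organization) and ramp up simulated users until you reach your target query rate.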
Define the number of queries (x) your application should complete in a specified time period, typically one or two seconds (y). Try to be realistic: it's common to overestimate the required query frequency, which impacts query patterns and data model design.
Your trial has limited resources. If you need to execute more than 10 queries per second, contact Imply.
Criteria for POC
Ensure that Polaris can support x representative queries every y seconds.
Use the Polaris API
APIs enable you to programmatically build an application around your database, and automate ingestion and other processes.
See the following pages to learn more about the Polaris API:
Criteria for POC
Confirm that the Polaris API supports the tasks you need for your application.
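As a starting point, a small connectivity check like the following can confirm that authentication and basic API access work before you script your real workflows. The endpoint path, authorization scheme, and response shape are assumptions based on the Polaris documentation; confirm them in the API reference:

```python
import requests

# Smoke test: authenticate and list tables to confirm API access.
# Placeholder identifiers; the endpoint path, auth scheme, and response
# shape are assumptions based on the Polaris docs; verify them in the
# API reference.
ORG = "example-org"
PROJECT_ID = "12345678-abcd-ef00-1234-56789abcdef0"
API_KEY = "pok_..."  # truncated placeholder

response = requests.get(
    f"https://{ORG}.api.imply.io/v1/projects/{PROJECT_ID}/tables",
    headers={"Authorization": f"Basic {API_KEY}"},
    timeout=30,
)
response.raise_for_status()
for table in response.json().get("values", []):  # assumed response key
    print(table.get("name"))
```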
Guidance on iteration
When you execute the POC, perform initial and subsequent iterations. Iterations should have the following characteristics:
First iteration:
- Test Polaris functionality with a small amount of data.
- Start with a sample dataset of 1 GB or less.
Second and subsequent iterations:
- Ingest a significant portion of test data using your preferred ingestion methods—1-5% of your estimated production data volume is a good starting point.
- Ingest data that covers the full timeframe of your production needs.
- For batch ingestion:
- Create incremental batch files.
- Use cron, Apache Airflow, or another scheduling tool to upload the incremental files and schedule ingestion API requests (see the sketch after this list).
- Replace existing data, if that's one of your requirements. See Replace data for more information.
- For streaming ingestion:
- Increase your throughput by pushing event data by API, or ingest events from Kafka or Kinesis.
- Test and tune queries produced in the first iteration until they meet your success criteria. See the following topics for details:
- Data rollup for information on aggregating raw data during ingestion.
- Data partitioning for information on partitioning, sorting, and clustering.
- Compute results with cardinality sketches for information on using sketches to trade accuracy for reduced storage and improved performance.
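To illustrate the scheduling bullet above, here is a minimal Apache Airflow sketch that triggers an incremental batch ingestion every hour. The `upload_and_ingest` function is a hypothetical wrapper around the upload-and-ingest calls sketched in the batch ingestion section of this guide:

```python
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator


def upload_and_ingest():
    # Hypothetical wrapper: call the Polaris files and jobs endpoints
    # here, as sketched in the batch ingestion section of this guide.
    ...


with DAG(
    dag_id="polaris_incremental_batch",
    start_date=datetime(2024, 1, 1),
    schedule="0 * * * *",  # hourly; align with your z-hour requirement
    # (use schedule_interval instead on Airflow versions before 2.4)
    catchup=False,
    default_args={"retries": 2, "retry_delay": timedelta(minutes=5)},
):
    PythonOperator(
        task_id="upload_and_ingest",
        python_callable=upload_and_ingest,
    )
```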
Evaluate POC success
Evaluate how well your completed POC has satisfied your success criteria.
Next steps
If you're satisfied with what Polaris can provide at the end of your free trial, you can continue to use your Polaris database with pay-as-you-go billing. You can continue to test larger datasets and workloads once you transfer to pay-as-you-go.
If you have questions or need more time to evaluate Polaris, contact Imply for help.
Learn more
See the following pages for further information: