Lumi Enterprise

Imply Lumi Enterprise is a deployment of Lumi that lives in your AWS environment, keeping your data within your infrastructure. If you have strict data sovereignty, security, and compliance requirements, Lumi Enterprise may be a better fit than the Imply-hosted Lumi Cloud.

Imply provides a Terraform module that simplifies Lumi Enterprise deployment. The module handles all the deployment steps, including setting up the AWS infrastructure and dependent PaaS services, such as Imply's distribution of Apache® Druid, Amazon Managed Streaming for Apache® Kafka (Amazon MSK), Amazon Elastic Kubernetes Service (Amazon EKS), Amazon S3, and Amazon RDS Aurora MySQL. You don't need to perform any additional installation or configuration outside of the Terraform module.

Lumi uses Imply’s distribution of Druid to power its services and MSK to stream event data from various sources.

Lumi infrastructure

The following table provides a basic overview of the Lumi nodes you deploy:

| Node | Description |
| --- | --- |
| Coordinator | Manages data availability and distribution, enforces data retention rules, and oversees ingestion tasks. |
| Broker | Routes queries from external clients to the appropriate nodes. Note that Lumi Brokers are unrelated to MSK brokers. |
| Data | Serves frequently queried data. Data nodes consist of hot storage, a persistent cache, and hot compute for serving low-latency, high-performance queries. |
| Virtual | Serves infrequently accessed data at lower cost. Virtual nodes consist of virtual storage, a transient cache, and virtual compute. When needed, Lumi loads data from object storage on demand, trading some query performance for cost savings. |
| Ingestion | Reads data from source systems, then transforms, indexes, and partitions it into files. MSK streams event data to ingestion nodes, where the Indexer service processes it. |

Event data ingestion

MSK streams your event data to Lumi. The Indexer service on ingestion nodes runs the ingestion tasks. After ingestion, Lumi stores the data in object storage.

As part of the ingestion process, Lumi indexes and pre-aggregates data to help speed up queries.
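
To illustrate the general idea of pre-aggregation, the following Python sketch collapses raw events into one row per minute and dimension combination. It is not Lumi's implementation: the field names (`ts`, `service`, `status`, `bytes`), the one-minute granularity, and the count and sum aggregations are all assumptions chosen for the example.

```python
from collections import defaultdict
from datetime import datetime, timezone


def truncate_to_minute(ts: datetime) -> datetime:
    """Drop seconds and microseconds so events in the same minute share a key."""
    return ts.replace(second=0, microsecond=0)


def pre_aggregate(events):
    """Collapse raw events into one row per (minute, service, status).

    Each output row keeps an event count and a summed byte total
    instead of the individual events.
    """
    rollup = defaultdict(lambda: {"count": 0, "bytes": 0})
    for event in events:
        key = (truncate_to_minute(event["ts"]), event["service"], event["status"])
        rollup[key]["count"] += 1
        rollup[key]["bytes"] += event["bytes"]
    return dict(rollup)


# Hypothetical sample events (field names are assumptions for this sketch).
UTC = timezone.utc
events = [
    {"ts": datetime(2024, 1, 1, 12, 0, 5, tzinfo=UTC), "service": "api", "status": 200, "bytes": 120},
    {"ts": datetime(2024, 1, 1, 12, 0, 40, tzinfo=UTC), "service": "api", "status": 200, "bytes": 80},
    {"ts": datetime(2024, 1, 1, 12, 1, 2, tzinfo=UTC), "service": "api", "status": 500, "bytes": 30},
]

rows = pre_aggregate(events)
for key, metrics in sorted(rows.items()):
    print(key, metrics)
```

The first two events fall in the same minute with the same dimensions, so they become a single stored row. A query that only needs counts or totals then reads fewer rows than were ingested.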

Event data storage

Any data that you ingest into Lumi is stored in object storage and managed in units known as segments, each of which covers an interval of time. During segment creation, Lumi pre-indexes the data to speed up query processing. Depending on your retention policies, Lumi either loads segments onto Data nodes or keeps them in object storage to be loaded onto Virtual nodes when needed.

Lumi stores frequently queried segments in hot storage, a persistent cache on Data nodes. The hot storage provides high performance for regularly accessed data.

When you query data that isn't in hot storage, Lumi loads the required segments on demand from object storage into virtual storage, a transient cache on Virtual nodes.
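
The following Python sketch shows one way an on-demand, transient cache can behave: a segment is fetched from object storage the first time it's requested and may be evicted later to make room for others. The class name, the `fetch` callable, the capacity limit, and the least-recently-used eviction policy are all assumptions for illustration. This page doesn't describe how Lumi decides what to evict.

```python
from collections import OrderedDict


class TransientSegmentCache:
    """On-demand cache: fetch a segment on first use, evict the least recently used."""

    def __init__(self, fetch, capacity):
        self._fetch = fetch        # callable: segment_id -> segment bytes
        self._capacity = capacity  # maximum number of cached segments
        self._cache = OrderedDict()
        self.loads = 0             # how many times object storage was read

    def get(self, segment_id):
        if segment_id in self._cache:
            # Cache hit: mark the segment as most recently used.
            self._cache.move_to_end(segment_id)
            return self._cache[segment_id]
        # Cache miss: load from object storage on demand.
        data = self._fetch(segment_id)
        self.loads += 1
        self._cache[segment_id] = data
        if len(self._cache) > self._capacity:
            # Evict the least recently used segment.
            self._cache.popitem(last=False)
        return data


# Hypothetical stand-in for object storage.
object_storage = {"seg-a": b"A", "seg-b": b"B", "seg-c": b"C"}
cache = TransientSegmentCache(object_storage.__getitem__, capacity=2)

for segment_id in ["seg-a", "seg-a", "seg-b", "seg-c", "seg-a"]:
    cache.get(segment_id)

print(cache.loads)
```

The second request for `seg-a` is served from the cache, while the last one has to go back to object storage because `seg-a` was evicted when `seg-c` arrived. This is the trade-off between query performance and cost that virtual storage makes.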

This segment-based storage is one of the features that helps Lumi store and retrieve data efficiently. When serving queries, Lumi accesses only the data that falls within the requested time interval.
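
As a rough illustration of time-based pruning, the following Python sketch selects only the segments whose interval overlaps a query interval. The `Segment` class, the hourly segment size, and the storage paths are hypothetical and not taken from Lumi.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Segment:
    start: datetime  # inclusive
    end: datetime    # exclusive
    path: str        # hypothetical object storage key


def segments_for_query(segments, query_start, query_end):
    """Return the segments whose [start, end) overlaps [query_start, query_end)."""
    return [s for s in segments if s.start < query_end and query_start < s.end]


# Hypothetical catalog: one segment per hour for a single day.
day = datetime(2024, 1, 1)
catalog = [
    Segment(
        start=day + timedelta(hours=h),
        end=day + timedelta(hours=h + 1),
        path=f"segments/2024-01-01T{h:02d}",
    )
    for h in range(24)
]

hits = segments_for_query(
    catalog,
    query_start=datetime(2024, 1, 1, 10, 30),
    query_end=datetime(2024, 1, 1, 12, 15),
)
for segment in hits:
    print(segment.path)
```

In this example, a query for 10:30 to 12:15 touches 3 of the 24 segments. The remaining segments are never read.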

Queries

If the queried time interval falls within the hot data retention period, the Lumi Broker routes your query to the Data nodes that contain the relevant segments.

If the queried time interval falls outside of the hot data retention period, the Broker routes your query to the Virtual nodes.

If you query data that's actively streaming to Lumi, the Broker routes your query to the Indexer.
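
The routing rules above can be sketched in Python as follows. This is a simplified model, not the Broker's actual logic: the function name, the `published_until` cutoff used to represent data that is still streaming, and the seven-day hot retention period are assumptions made for the example.

```python
from datetime import datetime, timedelta
from enum import Enum


class Target(Enum):
    INDEXER = "indexer"
    DATA = "data"
    VIRTUAL = "virtual"


def route_interval(interval_start, interval_end, now, hot_retention, published_until):
    """Pick a target for one queried time interval.

    published_until is a hypothetical cutoff: data newer than this is
    assumed to be still streaming and therefore served by the Indexer.
    """
    if interval_end > published_until:
        return Target.INDEXER
    if interval_start >= now - hot_retention:
        return Target.DATA
    return Target.VIRTUAL


# Hypothetical settings for this sketch.
now = datetime(2024, 1, 31, 12, 0)
hot_retention = timedelta(days=7)
published_until = now - timedelta(hours=1)
one_hour = timedelta(hours=1)

recent = route_interval(now - timedelta(minutes=30), now, now, hot_retention, published_until)
hot_start = now - timedelta(days=2)
hot = route_interval(hot_start, hot_start + one_hour, now, hot_retention, published_until)
old_start = now - timedelta(days=30)
old = route_interval(old_start, old_start + one_hour, now, hot_retention, published_until)

print(recent, hot, old)
```

A real query can span more than one of these ranges. The sketch classifies a single interval at a time to keep the three cases easy to see.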

Lumi Management Console

After you deploy Lumi Enterprise, you can monitor your deployment through the Lumi Management Console hosted by Imply.

As you monitor your deployment, use the Terraform module to scale your cluster to better meet your needs.

Get started

When you're ready to get started, learn how to install Lumi Enterprise using Terraform.