
Data tiers overview

AI summary
Describes data tiers in Imply Lumi for balancing query performance and cost. Explains the hot tier for fast access and the virtual tier for cost-efficient storage. Covers virtual compute sizing and how Lumi loads data on demand.

Early access

This is an early access feature. Contact your Imply support representative to request access.

By default, Imply Lumi copies ingested data to the hot tier, which delivers consistent query performance with low latency and high concurrency. To balance price and performance, Lumi also provides a virtual tier that loads data and spins up compute resources on demand at a lower cost.

This topic describes data tiers in Lumi. For details on how to use a virtual tier, see Configure tiering rules.

Hot tier

The hot tier is the default location where Lumi caches data. The hot tier consists of the following components:

  • Hot storage: High-performance persistent cache.
  • Hot compute: Persistent compute resources that serve queries against hot storage.

Virtual tier

The virtual tier is a cost-efficient alternative to the hot tier that is suitable for infrequently accessed data, such as historical investigations, audits, and archival data. To provision a virtual tier, you create a virtual storage rule and configure the size of your virtual compute pool. For more information, see Configure tiering rules.

Similar to the hot tier, the virtual tier caches data, so frequently accessed data may deliver performance comparable to the hot tier, depending on the eviction rate. To monitor the data eviction rate for your virtual tier, see the Virtual tier churn ratio metric.

Virtual storage

Virtual storage is a transient cache that loads data from object storage on demand. Lumi caches the data only when a query needs it and evicts it when idle. Until the data is evicted, subsequent queries can read from the same cache.
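The load-on-demand and evict-when-idle behavior described above can be illustrated with a small model. This is a minimal sketch, not Lumi's implementation: the class `OnDemandCache`, the `idle_ttl` parameter, and the dictionary standing in for object storage are all hypothetical, and how Lumi actually decides that data is idle is not stated in this topic.

```python
class OnDemandCache:
    """Toy model of a transient cache: load on first access, evict when idle.

    All names here are illustrative assumptions, not part of any Lumi API.
    """

    def __init__(self, idle_ttl, object_store):
        self.idle_ttl = idle_ttl          # hypothetical idle window, in seconds
        self.object_store = object_store  # dict standing in for object storage
        self._entries = {}                # key -> (data, last access time)
        self.loads = 0                    # number of fetches from object storage

    def evict_idle(self, now):
        """Drop every entry that has not been read for at least idle_ttl."""
        idle_keys = [
            key
            for key, (_, last_access) in self._entries.items()
            if now - last_access >= self.idle_ttl
        ]
        for key in idle_keys:
            del self._entries[key]

    def read(self, key, now):
        """Return data for key, loading it from object storage only on a miss."""
        self.evict_idle(now)
        if key in self._entries:
            data, _ = self._entries[key]
        else:
            data = self.object_store[key]  # simulated fetch from object storage
            self.loads += 1
        self._entries[key] = (data, now)   # record the latest access time
        return data


cache = OnDemandCache(idle_ttl=60.0, object_store={"segment-1": b"rows"})
cache.read("segment-1", now=0.0)    # miss: loaded from object storage
cache.read("segment-1", now=30.0)   # hit: served from the cache
cache.read("segment-1", now=200.0)  # idle for too long, so loaded again
```

In this model, the second read is served from the cache, while the third triggers a reload because the entry sat idle longer than the assumed window. Frequent reloads of this kind are what a high eviction rate would look like.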

Virtual compute

Virtual compute provides dedicated compute resources that serve queries against virtual storage. When a query targets data outside hot storage, Lumi spins up a virtual compute pool to serve the query. When you create a tiering rule, you set the maximum time the virtual compute pool can remain idle before shutting down. Instead of paying for continuous availability across all data, you pay for on-demand compute only when a query needs to access data from virtual storage.

Size

Lumi measures the size of a virtual compute pool in t-shirt sizes, such as Small, Medium, and Large. The size determines query concurrency and latency. The default size is Large.

The following table shows the available t-shirt sizes and their corresponding query concurrency:

Size                          Max concurrency
X-small, Small, Medium        4
Large                         8
X-large, 2X-large, 3X-large   12

We recommend starting with the Large size and scaling up or down based on observed performance. Consider the following factors when selecting a t-shirt size:

  • Number of concurrent queries: The more queries you run concurrently, the more compute resources you need.
  • Data size and composition: Larger and more complex datasets require more compute resources to query.
  • Query latency: If you need low latency, you might need to configure a larger t-shirt size.
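As a starting point for the concurrency factor alone, the size table can be expressed as a lookup. This is a minimal sketch: the function name `smallest_size_for` is hypothetical, and the ordering of sizes from smallest to largest is assumed from their names. It ignores data size and latency, either of which might still justify a larger size within the same concurrency band.

```python
# Max concurrency per t-shirt size, taken from the table in this topic.
MAX_CONCURRENCY = {
    "X-small": 4,
    "Small": 4,
    "Medium": 4,
    "Large": 8,
    "X-large": 12,
    "2X-large": 12,
    "3X-large": 12,
}

# Assumed ordering from smallest to largest, inferred from the size names.
SIZES_IN_ORDER = list(MAX_CONCURRENCY)

DEFAULT_SIZE = "Large"


def smallest_size_for(concurrent_queries):
    """Return the smallest size whose max concurrency covers the workload."""
    if concurrent_queries < 1:
        raise ValueError("concurrent_queries must be at least 1")
    for size in SIZES_IN_ORDER:
        if MAX_CONCURRENCY[size] >= concurrent_queries:
            return size
    raise ValueError(
        f"No size supports {concurrent_queries} concurrent queries; "
        f"the maximum is {max(MAX_CONCURRENCY.values())}"
    )


print(smallest_size_for(6))  # Large
```

Because several sizes share the same concurrency limit, the lookup only gives a lower bound. Observed latency is what tells you whether to move up within a band.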

Status

Because virtual compute resources spin up on demand and shut down when idle, the status of the virtual compute pool changes based on its current state:

  • Starting: Virtual compute pool is starting. Lumi is provisioning compute resources to serve queries.
  • Running: Virtual compute pool is running and serving queries.
  • Stopped: Virtual compute pool is inactive. All compute resources are shut down.

To see the current status of your virtual compute pool, go to Data > Virtual compute.
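The status changes described in this section can be pictured as a small state machine. This is a minimal sketch under stated assumptions: the class `VirtualComputePoolModel`, its method names, and the `max_idle_seconds` parameter are hypothetical stand-ins for the idle limit you set in a tiering rule, and the exact conditions Lumi uses for each transition are not documented here.

```python
from enum import Enum


class PoolStatus(Enum):
    STARTING = "Starting"
    RUNNING = "Running"
    STOPPED = "Stopped"


class VirtualComputePoolModel:
    """Illustrative model of the pool lifecycle; not a Lumi API."""

    def __init__(self, max_idle_seconds):
        self.max_idle_seconds = max_idle_seconds  # assumed idle limit
        self.status = PoolStatus.STOPPED
        self.last_query_at = None

    def on_query(self, now):
        """A query against virtual storage starts a stopped pool."""
        if self.status is PoolStatus.STOPPED:
            self.status = PoolStatus.STARTING
        self.last_query_at = now

    def on_provisioned(self):
        """Provisioning finished, so the pool can serve queries."""
        if self.status is PoolStatus.STARTING:
            self.status = PoolStatus.RUNNING

    def tick(self, now):
        """Shut the pool down once it has been idle for the configured limit."""
        if (
            self.status is PoolStatus.RUNNING
            and self.last_query_at is not None
            and now - self.last_query_at >= self.max_idle_seconds
        ):
            self.status = PoolStatus.STOPPED


pool = VirtualComputePoolModel(max_idle_seconds=300)
pool.on_query(now=0)    # Stopped -> Starting
pool.on_provisioned()   # Starting -> Running
pool.tick(now=100)      # still Running: idle for less than 300 seconds
pool.tick(now=300)      # Running -> Stopped
```

In this model, a query that arrives while the pool is stopped has to wait through the Starting state, which is the latency trade-off you accept in exchange for not paying for idle compute.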

Data residency

Configuring tiering rules doesn't change where your data resides:

  • In Lumi Cloud, data resides in the Lumi AWS environment.
  • In Lumi Enterprise, data resides in your AWS environment.

Learn more

For more information, see the following topics: