Data tiers overview
This is an early access feature. Contact your Imply support representative to request access.
By default, Imply Lumi copies ingested data to the hot tier, which delivers consistent query performance with low latency and high concurrency. For a better balance of price and performance, Lumi also provides a virtual tier that loads data and spins up compute resources on demand at a lower cost.
This topic describes data tiers in Lumi. For details on how to use a virtual tier, see Configure tiering rules.
Hot tier
The hot tier is the default location where Lumi caches data. The hot tier consists of the following components:
- Hot storage: High-performance persistent cache.
- Hot compute: Persistent compute resources that serve queries against hot storage.
Virtual tier
The virtual tier is a cost-efficient alternative to the hot tier that is suitable for infrequently accessed data, such as historical investigations, audits, and archival data. To provision a virtual tier, you create a virtual storage rule and configure the size of your virtual compute pool. For more information, see Configure tiering rules.
Like the hot tier, the virtual tier caches data, so queries on frequently accessed data may perform comparably to the hot tier, depending on the eviction rate. To monitor the data eviction rate for your virtual tier, use the Virtual tier churn ratio metric.
Virtual storage
Virtual storage is a transient cache that loads data from object storage on demand. Lumi caches data only when a query needs it and evicts the data when it is idle. Subsequent queries can read from the same cache.
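To make the load-on-demand and evict-when-idle behavior concrete, here is a toy Python model. It is a hypothetical illustration, not Lumi's implementation or API: the class `OnDemandCache`, the `load` callback standing in for an object-storage read, the `idle_seconds` threshold, and the simulated clock are all assumptions made for the example.

```python
import time
from typing import Callable, Dict, Tuple


class OnDemandCache:
    """Toy model of a transient cache: load on first access, evict after an idle period.

    Hypothetical illustration only; not Lumi's implementation or API.
    """

    def __init__(self, load: Callable[[str], bytes], idle_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._load = load              # stands in for a read from object storage
        self._idle_seconds = idle_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[bytes, float]] = {}  # key -> (data, last access time)
        self.loads = 0                 # number of reads from object storage

    def get(self, key: str) -> bytes:
        self.evict_idle()
        now = self._clock()
        if key in self._entries:
            data = self._entries[key][0]   # cache hit: reuse the cached copy
        else:
            data = self._load(key)         # cache miss: load on demand
            self.loads += 1
        self._entries[key] = (data, now)   # record the latest access
        return data

    def evict_idle(self) -> None:
        now = self._clock()
        idle = [k for k, (_, last) in self._entries.items() if now - last >= self._idle_seconds]
        for k in idle:
            del self._entries[k]


# Simulated clock so the example is deterministic.
t = [0.0]
cache = OnDemandCache(load=lambda key: key.encode(), idle_seconds=60, clock=lambda: t[0])
cache.get("segment-1")   # miss: loaded from object storage
cache.get("segment-1")   # hit: served from the cache
t[0] = 120.0             # 120 s without access exceeds the 60 s idle period
cache.get("segment-1")   # evicted while idle, so it is loaded again
print(cache.loads)       # 2
```

The repeated load after the idle gap is the kind of churn that, at scale, pushes virtual tier performance away from hot tier performance.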
Virtual compute
Virtual compute provides dedicated compute resources that serve queries against virtual storage. When a query targets data outside hot storage, Lumi spins up a virtual compute pool to serve the query. When you create a tiering rule, you set the maximum time the virtual compute pool can remain idle before it shuts down. Instead of paying for continuous availability across all data, you pay for on-demand compute when you need to access data in virtual storage.
Size
Lumi measures the size of a virtual compute pool in t-shirt sizes, such as Small, Medium, and Large. The size determines query concurrency and latency. The default size is Large.
The following table shows the available t-shirt sizes and their corresponding query concurrency:
| Size | Max concurrency |
|---|---|
| X-small, Small, Medium | 4 |
| Large | 8 |
| X-large, 2X-large, 3X-large | 12 |
We recommend starting with the Large size and scaling up or down based on observed performance. Consider the following factors when selecting a t-shirt size:
- Number of concurrent queries: The more queries that run concurrently, the more compute resources you need.
- Data size and composition: Larger and more complex datasets require more compute resources to query.
- Query latency: If you need low latency, you might need to configure a larger t-shirt size.
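The first factor, concurrency, can be checked directly against the size table. The following Python sketch encodes that table and lists the sizes whose maximum concurrency covers an expected number of concurrent queries. The names `MAX_CONCURRENCY`, `SIZE_ORDER`, and `sizes_covering` are hypothetical helpers for illustration, not part of Lumi, and the check deliberately ignores data size, data complexity, and latency, which still need to be judged from observed performance.

```python
from typing import Dict, List

# Maximum concurrency per virtual compute pool size, taken from the size table.
MAX_CONCURRENCY: Dict[str, int] = {
    "X-small": 4, "Small": 4, "Medium": 4,
    "Large": 8,
    "X-large": 12, "2X-large": 12, "3X-large": 12,
}
SIZE_ORDER: List[str] = ["X-small", "Small", "Medium", "Large", "X-large", "2X-large", "3X-large"]


def sizes_covering(expected_concurrent_queries: int) -> List[str]:
    """Return the sizes, smallest first, whose max concurrency covers the expected load.

    Considers concurrency only; data size, data complexity, and latency
    targets still need to be judged separately.
    """
    return [s for s in SIZE_ORDER if MAX_CONCURRENCY[s] >= expected_concurrent_queries]


print(sizes_covering(6))   # ['Large', 'X-large', '2X-large', '3X-large']
print(sizes_covering(13))  # []
```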
Status
Because virtual compute resources spin up on demand and shut down when idle, the status of the virtual compute pool changes based on its current state:
- Starting: Virtual compute pool is starting. Lumi is provisioning compute resources to serve queries.
- Running: Virtual compute pool is running and serving queries.
- Stopped: Virtual compute pool is inactive. All compute resources are shut down.
To see the current status of your virtual compute pool, go to Data > Virtual compute.
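The status transitions, together with the idle limit you set in the tiering rule, can be pictured as a small state machine. The Python model below is a hypothetical sketch, not a Lumi API: the class `VirtualComputePool`, its methods `on_query`, `on_provisioned`, and `tick`, and the `max_idle_seconds` parameter are assumptions, and the model simplifies idleness to the time since the last query arrived.

```python
from enum import Enum
from typing import Optional


class PoolStatus(Enum):
    STARTING = "Starting"
    RUNNING = "Running"
    STOPPED = "Stopped"


class VirtualComputePool:
    """Toy model of virtual compute pool status transitions; not a Lumi API."""

    def __init__(self, max_idle_seconds: float) -> None:
        self.max_idle_seconds = max_idle_seconds   # the idle limit set in the tiering rule
        self.status = PoolStatus.STOPPED
        self.last_query_at: Optional[float] = None

    def on_query(self, now: float) -> None:
        """A query targets data outside hot storage."""
        if self.status is PoolStatus.STOPPED:
            self.status = PoolStatus.STARTING      # begin provisioning compute resources
        self.last_query_at = now

    def on_provisioned(self) -> None:
        """Compute resources are ready to serve queries."""
        if self.status is PoolStatus.STARTING:
            self.status = PoolStatus.RUNNING

    def tick(self, now: float) -> None:
        """Shut down once idle time exceeds the limit (query duration is ignored in this model)."""
        if (self.status is PoolStatus.RUNNING
                and self.last_query_at is not None
                and now - self.last_query_at > self.max_idle_seconds):
            self.status = PoolStatus.STOPPED


pool = VirtualComputePool(max_idle_seconds=300)
pool.on_query(now=0)       # Stopped -> Starting
pool.on_provisioned()      # Starting -> Running
pool.tick(now=200)         # idle for 200 s: still Running
pool.tick(now=400)         # idle for 400 s > 300 s: Stopped
print(pool.status.value)   # Stopped
```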
Data residency
Configuring tiering rules doesn't change where your data resides:
- In Lumi Cloud, data resides in the Lumi AWS environment.
- In Lumi Enterprise, data resides in your AWS environment.
Learn more
For more information, see the following topics:
- Configure tiering rules to add and manage tiering rules.
- Configure deletion rules to set up automatic data deletion.
- Enterprise storage metrics to learn about storage metrics for Lumi Enterprise.
- Cloud storage metrics to learn about storage metrics for Lumi Cloud.