Cold Tier Hydration
Cold tier hydration promotes data from S3 Parquet (cold tier) to DuckDB (hot tier) for low-latency queries.
Overview
Streaming DuckLakes use two storage tiers:
| Tier | Storage | Query Latency |
|---|---|---|
| Hot | DuckDB | ~1 second |
| Cold | S3 Parquet | Seconds to minutes |
Hydration queues cold tier Parquet files for promotion to the hot tier, so queries over historical data run at hot tier latency instead of cold tier latency.
CLI Commands
Service Status
```bash
# Global hydration statistics
boilstream-admin hydration stats

# Queue status (queued, in-flight, failed counts)
boilstream-admin hydration queue

# Service health check
boilstream-admin hydration health
```
Hydrate a Table
```bash
# Hydrate a table with the default (normal) priority
boilstream-admin hydration table -d <ducklake_id> -n <table_name>

# Hydrate with high priority
boilstream-admin hydration table -d <ducklake_id> -n <table_name> -p high
```
Monitor Progress
```bash
# List hydrated tables for a DuckLake
boilstream-admin hydration list -d <ducklake_id>

# Check the status of a specific table (by topic_id)
boilstream-admin hydration status -d <ducklake_id> -t <topic_id>

# List pending hydration jobs
boilstream-admin hydration pending -d <ducklake_id>
```
Eviction and Cleanup
```bash
# Flush (evict) a specific table from the hot tier
boilstream-admin hydration flush -d <ducklake_id> -t <topic_id>

# Flush all hydrated tables for a DuckLake
boilstream-admin hydration flush-all -d <ducklake_id> --yes

# Cancel pending hydration jobs for a table
boilstream-admin hydration cancel -d <ducklake_id> -t <topic_id>
```
REST API
User Endpoints (authenticated; own DuckLakes only)
| Method | Endpoint | Description |
|---|---|---|
| POST | /auth/api/hydration/table | Hydrate a table |
| GET | /auth/api/hydration/status | Get hydration status |
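The sketch below shows roughly how the two user endpoints could be called. It uses Python's standard `urllib.request` and only builds the requests; it does not send them. Several details are assumptions modeled on the CLI flags, not a documented request schema:

- the server URL
- the Bearer-token `Authorization` header
- the JSON body fields (`ducklake_id`, `table_name`, `priority`)
- the status query parameters (`ducklake_id`, `topic_id`)

```python
import json
import urllib.parse
import urllib.request

# Placeholders (hypothetical): replace with your server and credentials.
BASE_URL = "https://boilstream.example.com"
TOKEN = "<session-token>"
VALID_PRIORITIES = ("low", "normal", "high")


def build_hydrate_request(ducklake_id: str, table_name: str,
                          priority: str = "normal") -> urllib.request.Request:
    """Build (but do not send) POST /auth/api/hydration/table.

    The JSON field names mirror the CLI flags -d, -n and -p; they are assumptions.
    """
    if priority not in VALID_PRIORITIES:
        raise ValueError(f"priority must be one of {VALID_PRIORITIES}, got {priority!r}")
    body = json.dumps({
        "ducklake_id": ducklake_id,
        "table_name": table_name,
        "priority": priority,
    }).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/auth/api/hydration/table",
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {TOKEN}",  # assumed auth scheme
        },
    )


def build_status_request(ducklake_id: str, topic_id: str) -> urllib.request.Request:
    """Build (but do not send) GET /auth/api/hydration/status.

    The query parameter names mirror the CLI's -d and -t flags; they are assumptions.
    """
    query = urllib.parse.urlencode({"ducklake_id": ducklake_id, "topic_id": topic_id})
    return urllib.request.Request(
        f"{BASE_URL}/auth/api/hydration/status?{query}",
        headers={"Authorization": f"Bearer {TOKEN}"},  # assumed auth scheme
    )
```

To submit a request, pass it to `urllib.request.urlopen()` after adjusting the field names to the server's actual schema.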
Admin Endpoints (superadmin)
| Method | Endpoint | Description |
|---|---|---|
| POST | /auth/api/admin/hydration/table | Hydrate any table |
| GET | /auth/api/admin/hydration/stats | Global statistics |
| GET | /auth/api/admin/hydration/queue | Queue statistics |
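A similar standard-library sketch for the admin statistics and queue endpoints follows. The following are hypothetical:

- the base URL
- the Bearer token scheme
- the response field names `queued`, `in_flight` and `failed`, which are borrowed from the counts the `hydration queue` CLI command reports

```python
import json
import urllib.request

# Placeholders (hypothetical): replace with your server and superadmin credentials.
BASE_URL = "https://boilstream.example.com"
ADMIN_TOKEN = "<superadmin-token>"


def build_admin_get(resource: str) -> urllib.request.Request:
    """Build (but do not send) GET /auth/api/admin/hydration/{stats|queue}."""
    if resource not in ("stats", "queue"):
        raise ValueError(f"unsupported resource: {resource!r}")
    return urllib.request.Request(
        f"{BASE_URL}/auth/api/admin/hydration/{resource}",
        headers={"Authorization": f"Bearer {ADMIN_TOKEN}"},  # assumed auth scheme
    )


def fetch_json(req: urllib.request.Request) -> dict:
    """Send the request and decode the JSON body (needs a reachable server)."""
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)


def summarize_queue(payload: dict) -> str:
    """Render queue counts; the field names are hypothetical."""
    return " ".join(f"{key}={payload.get(key, 0)}" for key in ("queued", "in_flight", "failed"))
```

Against a live server this would be used as `print(summarize_queue(fetch_json(build_admin_get("queue"))))`.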
How It Works
- A user requests hydration of a table via the CLI or the REST API
- The system queries the DuckLake catalog to find the table's cold tier Parquet files
- The files are queued with the requested priority
- Worker threads download the files and promote them to the hot tier (DuckDB)
- Queries over pgwire automatically see the hydrated data
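The flow above can be modeled as a priority queue drained by worker threads. This is a toy sketch, not BoilStream's implementation:

- a dict stands in for the DuckLake catalog
- another dict stands in for the hot tier DuckDB
- the numeric priority ranks are assumed
- the S3 download and DuckDB load are reduced to a comment

```python
import queue
import threading

# Assumed ordering: lower rank is dequeued first.
PRIORITY_RANK = {"high": 0, "normal": 1, "low": 2}


class HydrationService:
    """Toy model of the hydration flow; not BoilStream's implementation."""

    def __init__(self, catalog: dict[str, list[str]]):
        self.catalog = catalog                # stand-in catalog: table -> cold tier files
        self.work = queue.PriorityQueue()     # (rank, seq, table, path)
        self.hot_tier: dict[str, str] = {}    # stand-in hot tier: path -> table
        self.lock = threading.Lock()
        self.seq = 0                          # FIFO tie-break within one priority

    def request(self, table: str, priority: str = "normal") -> int:
        """Steps 1-3: look up the table's cold tier files and queue them."""
        files = self.catalog.get(table, [])
        for path in files:
            self.work.put((PRIORITY_RANK[priority], self.seq, table, path))
            self.seq += 1
        return len(files)

    def _worker(self) -> None:
        """Step 4: drain the queue; a real worker would download from S3 into DuckDB."""
        while True:
            try:
                _, _, table, path = self.work.get_nowait()
            except queue.Empty:
                return  # this sketch exits when the queue is empty
            with self.lock:
                self.hot_tier[path] = table

    def run(self, workers: int = 2) -> None:
        threads = [threading.Thread(target=self._worker) for _ in range(workers)]
        for t in threads:
            t.start()
        for t in threads:
            t.join()

    def hot_files(self, table: str) -> list[str]:
        """Step 5: queries see whichever of the table's files are in the hot tier."""
        with self.lock:
            return sorted(p for p, t in self.hot_tier.items() if t == table)
```

A long-running service would block on `get()` instead of exiting when the queue is empty. The pre-filled, drain-and-exit form just keeps the sketch deterministic.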
Priorities
| Priority | Use Case |
|---|---|
| low | Background hydration |
| normal | Default |
| high | User-initiated, time-sensitive |
Automatic Eviction
Hydrated data is evicted when:
- The user logs out (eviction at the DuckLake level)
- A manual flush is issued via the CLI or API
- The hot tier is under memory pressure (least recently used data is evicted first)
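The three eviction triggers can be illustrated with a small LRU model built on `collections.OrderedDict`. The byte-based capacity, the per-table sizes, and the method names are hypothetical. The real hot tier may measure memory pressure differently. The two flush methods loosely mirror the CLI `flush` and `flush-all` commands.

```python
from collections import OrderedDict


class HotTierCache:
    """Toy LRU model of the hot tier; not BoilStream's implementation."""

    def __init__(self, capacity_bytes: int):
        self.capacity = capacity_bytes
        self.used = 0
        # (ducklake_id, table) -> size in bytes, ordered least to most recently used
        self.entries: OrderedDict = OrderedDict()

    def hydrate(self, ducklake_id: str, table: str, size: int) -> list:
        """Add a table; evict LRU tables while over capacity. Returns evicted keys."""
        key = (ducklake_id, table)
        if key in self.entries:
            self.used -= self.entries.pop(key)
        self.entries[key] = size
        self.used += size
        evicted = []
        # Memory pressure: never evicts the table that was just hydrated.
        while self.used > self.capacity and len(self.entries) > 1:
            old_key, old_size = self.entries.popitem(last=False)
            self.used -= old_size
            evicted.append(old_key)
        return evicted

    def touch(self, ducklake_id: str, table: str) -> None:
        """Mark a hydrated table as most recently used (e.g. after a query)."""
        self.entries.move_to_end((ducklake_id, table))

    def flush(self, ducklake_id: str, table: str) -> None:
        """Manual flush of one table."""
        self.used -= self.entries.pop((ducklake_id, table), 0)

    def flush_ducklake(self, ducklake_id: str) -> None:
        """Evict every table of one DuckLake (logout or flush-all)."""
        for key in [k for k in self.entries if k[0] == ducklake_id]:
            self.used -= self.entries.pop(key)
```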
Next Steps
- DuckLake Integration - Hot/cold tier architecture
- boilstream-admin CLI - All CLI commands