Cold Tier Hydration

Cold tier hydration promotes data from S3 Parquet (cold tier) to DuckDB (hot tier) for low-latency queries.

Overview

Streaming DuckLakes use two storage tiers:

| Tier | Storage | Query Latency |
|------|---------|---------------|
| Hot | DuckDB | ~1 second |
| Cold | S3 Parquet | Seconds to minutes |

Hydration queues cold tier files for promotion to the hot tier, so that queries over historical data run at hot tier latency.

CLI Commands

Service Status

```bash
# Global hydration statistics
boilstream-admin hydration stats

# Queue status (queued, in-flight, failed counts)
boilstream-admin hydration queue

# Service health check
boilstream-admin hydration health
```

Hydrate a Table

```bash
# Hydrate table with default priority
boilstream-admin hydration table -d <ducklake_id> -n <table_name>

# Hydrate with high priority
boilstream-admin hydration table -d <ducklake_id> -n <table_name> -p high
```

Monitor Progress

```bash
# List hydrated tables for a DuckLake
boilstream-admin hydration list -d <ducklake_id>

# Check status of specific table (by topic_id)
boilstream-admin hydration status -d <ducklake_id> -t <topic_id>

# List pending hydration jobs
boilstream-admin hydration pending -d <ducklake_id>
```

Eviction and Cleanup

```bash
# Flush (evict) a specific table from hot tier
boilstream-admin hydration flush -d <ducklake_id> -t <topic_id>

# Flush all hydrated tables for a DuckLake
boilstream-admin hydration flush-all -d <ducklake_id> --yes

# Cancel pending hydration jobs for a table
boilstream-admin hydration cancel -d <ducklake_id> -t <topic_id>
```

REST API

User Endpoints (authenticated, own DuckLakes)

| Method | Endpoint | Description |
|--------|----------|-------------|
| POST | /auth/api/hydration/table | Hydrate a table |
| GET | /auth/api/hydration/status | Get hydration status |

Admin Endpoints (superadmin)

| Method | Endpoint | Description |
|--------|----------|-------------|
| POST | /auth/api/admin/hydration/table | Hydrate any table |
| GET | /auth/api/admin/hydration/stats | Global statistics |
| GET | /auth/api/admin/hydration/queue | Queue statistics |
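
The endpoint paths listed above can be assembled and called from a shell script. The following is a minimal sketch, not a reference client: the function names `hydration_url` and `hydration_get`, the `BOILSTREAM_TOKEN` variable, and the use of bearer-token authentication are all assumptions made for illustration. The request body for the POST endpoints and any query parameters are not specified here, so the sketch only covers the GET endpoints.

```bash
# Build the URL for one of the hydration endpoints listed above.
# Usage: hydration_url <base_url> <user|admin> <table|status|stats|queue>
# Only scope/resource pairs that appear in the endpoint lists are accepted.
hydration_url() {
  base_url=${1%/}   # drop a single trailing slash, if present
  case "$2/$3" in
    user/table|user/status)
      printf '%s/auth/api/hydration/%s\n' "$base_url" "$3" ;;
    admin/table|admin/stats|admin/queue)
      printf '%s/auth/api/admin/hydration/%s\n' "$base_url" "$3" ;;
    *)
      echo "no such endpoint: $2/$3" >&2
      return 1 ;;
  esac
}

# GET one of the read-only endpoints (status, stats, queue).
# ASSUMPTION: the server accepts a bearer token taken from the hypothetical
# BOILSTREAM_TOKEN environment variable; adapt to the real auth scheme.
hydration_get() {
  if [ "$3" = "table" ]; then
    echo "the table endpoints are POST-only" >&2
    return 1
  fi
  url=$(hydration_url "$1" "$2" "$3") || return 1
  curl --fail --silent --show-error \
    -H "Authorization: Bearer $BOILSTREAM_TOKEN" \
    "$url"
}
```

For example, `hydration_get https://boilstream.example.com admin queue` would request the queue statistics, where the host name is a placeholder.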

How It Works

  1. User requests table hydration via CLI or API
  2. System queries DuckLake catalog to find cold tier Parquet files
  3. Files are queued with specified priority
  4. Worker threads download and promote files to hot tier DuckDB
  5. Queries via pgwire automatically see hydrated data
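
The steps above map onto the CLI commands documented earlier. A minimal sketch of that sequence follows, assuming `boilstream-admin` is on the `PATH`. The `BOILSTREAM_ADMIN` variable and the `hydrate_and_inspect` function are hypothetical names introduced for this example; setting `BOILSTREAM_ADMIN=echo` prints the commands instead of running them.

```bash
# Hypothetical wrapper around the documented hydration commands.
# BOILSTREAM_ADMIN is an illustrative override, not a CLI feature.
BOILSTREAM_ADMIN="${BOILSTREAM_ADMIN:-boilstream-admin}"

# Usage: hydrate_and_inspect <ducklake_id> <table_name> [priority]
hydrate_and_inspect() {
  ducklake_id=$1
  table_name=$2
  priority=${3:-}

  # Pass -p only when a priority was given; otherwise the default applies.
  if [ -n "$priority" ]; then
    set -- -p "$priority"
  else
    set --
  fi

  # Steps 1-3: request hydration; the service finds the table's cold tier
  # files in the catalog and queues them.
  "$BOILSTREAM_ADMIN" hydration table -d "$ducklake_id" -n "$table_name" "$@" || return 1

  # Step 4: workers drain the queue; show the jobs that are still pending.
  "$BOILSTREAM_ADMIN" hydration pending -d "$ducklake_id" || return 1

  # Step 5: list the tables that are now hydrated for this DuckLake.
  "$BOILSTREAM_ADMIN" hydration list -d "$ducklake_id"
}
```

A dry run such as `BOILSTREAM_ADMIN=echo; hydrate_and_inspect <ducklake_id> <table_name> high` shows the three commands in order without contacting the service.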

Priorities

| Priority | Use Case |
|----------|----------|
| low | Background hydration |
| normal | Default |
| high | User-initiated, time-sensitive |
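
To illustrate how priorities could affect processing order, the sketch below sorts a list of jobs the way a simple priority queue would. It assumes that higher-priority jobs are served first and that jobs of equal priority keep their arrival order; the actual scheduling policy of the hydration service is not described here. The function names `priority_rank` and `order_jobs` are hypothetical.

```bash
# Map a priority name to a sort rank (a lower rank is served first).
priority_rank() {
  case "$1" in
    high)   echo 0 ;;
    normal) echo 1 ;;
    low)    echo 2 ;;
    *)      echo "unknown priority: $1" >&2; return 1 ;;
  esac
}

# Read "priority table_name" lines on stdin and print them in service order:
# high, then normal, then low, keeping arrival order within one priority.
# Lines with an unknown priority are reported on stderr and skipped.
order_jobs() {
  seq_no=0
  while read -r priority table_name; do
    rank=$(priority_rank "$priority") || continue
    seq_no=$((seq_no + 1))
    # Prefix each job with its rank and arrival number so sort can order it.
    printf '%s %s %s %s\n' "$rank" "$seq_no" "$priority" "$table_name"
  done | sort -k1,1n -k2,2n | cut -d ' ' -f 3-
}
```

Piping a job list such as `printf '%s\n' 'low archive' 'high orders' | order_jobs` prints the `high` job before the `low` one.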

Automatic Eviction

Hydrated data is evicted when:

  • User logs out (DuckLake-level eviction)
  • Manual flush via CLI/API
  • Memory pressure (LRU eviction)
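
The LRU (least recently used) policy mentioned above can be illustrated with a toy tracker. This is a generic sketch of LRU ordering, not the service's implementation; `LRU_ORDER`, `LRU_VICTIM`, `lru_touch`, and `lru_evict` are hypothetical names, and table names are assumed to contain no newlines or backslashes.

```bash
# LRU_ORDER holds one table name per line, least recently used first.
LRU_ORDER=""
LRU_VICTIM=""

# Record an access: move the table to the most-recently-used end.
lru_touch() {
  LRU_ORDER=$(printf '%s\n' "$LRU_ORDER" | awk -v name="$1" '
    $0 != "" && $0 != name { print }   # keep every other table in place
    END { print name }                 # append the touched table last
  ')
}

# Evict the least recently used table: store its name in LRU_VICTIM and
# remove it from LRU_ORDER. Returns 1 when nothing is tracked.
lru_evict() {
  [ -n "$LRU_ORDER" ] || return 1
  LRU_VICTIM=$(printf '%s\n' "$LRU_ORDER" | head -n 1)
  LRU_ORDER=$(printf '%s\n' "$LRU_ORDER" | tail -n +2)
}
```

After touching `orders`, `events`, `users`, and then `orders` again, the first call to `lru_evict` selects `events`, because it has gone longest without an access.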

Next Steps