Skip to content

Cluster Mode

BoilStream supports horizontal scaling through S3-based leader election and distributed catalog management.

Beta

Cluster mode is in beta. The API and behavior may change in future releases.

Architecture

Leader Election

Leader election uses S3 conditional writes for distributed consensus:

  1. Nodes compete to write leader claim to S3
  2. Winner becomes leader, manages catalog operations
  3. Other nodes become brokers, redirect admin requests to leader
  4. Heartbeats maintain leadership; stale leaders are replaced

Configuration

yaml
cluster_mode:
  enabled: true
  node_id_path: "./node_id"              # Persistent ID (over restarts)
  advertised_host: "node1.example.com"   # Externally reachable hostname
  internal_api_port: 8443                # Inter-node communication
  leader_heartbeat_interval_secs: 30
  leader_stale_threshold_secs: 120
  broker_heartbeat_interval_secs: 30
  broker_stale_threshold_secs: 120

Node Roles

Leader Node

  • Handles all catalog management operations (create/delete DuckLakes, user management)
  • Coordinates cluster state
  • Performs catalog backups to S3
  • Processes admin API requests

Broker Nodes

  • Handle data ingestion (FlightRPC, Kafka, HTTP/2)
  • Serve read queries via PostgreSQL interface
  • Redirect admin requests to leader
  • Can be promoted to leader if current leader fails

Broker States

StateDescription
activeNormal operation
drainingFinishing work, rejecting new connections (for rolling updates)
retiringPreparing for shutdown
shutdownNode shutting down

Operations

View Cluster Status

bash
boilstream-admin cluster status

Rolling Updates

bash
# 1. Drain node
boilstream-admin cluster broker update node-123 --state draining

# 2. Wait for connections to finish, then stop/update/restart

# 3. Re-activate
boilstream-admin cluster broker update node-123 --state active

Leader Stepdown

bash
# Graceful stepdown (automatic leader election)
boilstream-admin cluster leader stepdown

# Stepdown with preferred successor
boilstream-admin cluster leader stepdown --preferred node-456

High Availability

  • Automatic failover: If leader becomes stale, brokers compete for leadership
  • No single point of failure: Any node can become leader
  • Consistent state: S3 provides durable cluster state storage
  • Graceful degradation: Ingestion continues even during leader transitions

Cluster Identity

Nodes belong to the same cluster when they share the same primary storage backend configuration. The cluster state is stored in S3 at:

s3://{storage.bucket}/{storage.prefix}cluster_state/

Configure in config.yaml:

yaml
storage:
  primary_backend:
    type: s3
    bucket: "my-cluster-bucket"
    prefix: "prod/"                    # Cluster state at: prod/cluster_state/
    region: "us-east-1"
    access_key: "..."
    secret_key: "..."

All nodes must use identical storage backend settings to join the same cluster.

S3 State Structure

s3://{bucket}/{prefix}cluster_state/
├── leader.json           # Current leader info
├── cluster_secret.json   # Shared auth secret (auto-generated)
├── brokers/
│   ├── node-abc.json     # Broker heartbeats
│   ├── node-def.json
│   └── node-ghi.json
└── catalogs/
    └── backups/          # DuckLake catalog backups

Circuit Breaker

Admin operations are protected by a circuit breaker that blocks changes if:

  • Backup system is unhealthy
  • S3 is unreachable
  • Leader election is in progress

This prevents data loss during infrastructure issues.

Next Steps