Configuration
BoilStream supports flexible configuration through YAML files and environment variables. This page covers all configuration options and how to use them.
Configuration Loading Priority
Configuration is loaded in the following priority order (highest to lowest):
- Command line --config file
- CONFIG_FILE environment variable
- Built-in defaults (if no config file specified)
Environment variables always override YAML file settings regardless of the config file source.
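For example, even if the YAML file sets aws.region, an environment variable on the command line takes precedence:
# my-config.yaml sets aws.region: "eu-west-1", but the environment value wins
AWS_REGION=us-west-2 boilstream --config my-config.yaml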
Specifying Configuration Files
Command Line Option
# Specify config file via command line
boilstream --config my-config.yaml
# Alternative syntax
boilstream --config=my-config.yaml
Environment Variable
# Specify config file via environment variable
CONFIG_FILE=my-config.yaml boilstream
No Configuration File
# Use only built-in defaults + environment variables
S3_BUCKET=my-bucket boilstream
YAML Configuration Format
Here's a complete example configuration file:
# AWS Configuration
aws:
  region: "eu-west-1"
  # access_key_id: "your-access-key"      # Optional - can use AWS CLI/IAM roles
  # secret_access_key: "your-secret-key"  # Optional - can use AWS CLI/IAM roles
  https_conn_pool_size: 100

# DuckDB Persistence Configuration (Optional High-Performance Local Storage)
duckdb_persistence:
  enabled: true                        # Enable high-performance local DuckDB persistence (10M+ rows/s)
  storage_path: "/tmp/duckdb/topics"   # Directory for shared DuckDB database files
  max_writers: 10                      # Number of concurrent database writers for optimal performance

# Storage Configuration
storage:
  # Multiple storage backends can be configured simultaneously
  backends:
    - name: "primary-s3"
      backend_type: "s3"
      enabled: true
      primary: true                    # Primary backend - operations must succeed here
      # S3-specific configuration
      endpoint: "http://localhost:9000"  # For MinIO/custom S3
      bucket: "ingestion-data"
      prefix: "/"
      access_key: "minioadmin"
      secret_key: "minioadmin"
      region: "us-east-1"
      use_path_style: true             # Required for MinIO
      max_concurrent_uploads: 10
      upload_id_pool_capacity: 100
      max_retries: 3
      initial_backoff_ms: 100
      max_retry_attempts: 3
      flush_interval_ms: 250
      max_multipart_object_size: 104857600  # 100 MB

    - name: "backup-filesystem"
      backend_type: "filesystem"
      enabled: true
      primary: false                   # Secondary backend - failures are logged but not fatal
      # Filesystem-specific configuration
      prefix: "/tmp/storage"

    # - name: "debug-noop"
    #   backend_type: "noop"
    #   enabled: false
    #   primary: false                 # For testing/benchmarking without actual storage

# Server Configuration
server:
  # tokio_worker_threads: 16           # Optional - defaults to system CPU count
  flight_thread_count: 1
  flight_base_port: 50050
  admin_flight_port: 50160
  consumer_flight_port: 50250
  valkey_url: "redis://localhost:6379"

# Data Processing Configuration
processing:
  data_processing_threads: 8
  buffer_pool_max_size: 50
  window_queue_capacity: 30000
  window_ms: 10000
  include_metadata_columns: true
  schema_validation_enabled: true
  parquet:
    compression: "ZSTD"
    dictionary_enabled: true

# Rate Limiting Configuration
rate_limiting:
  disabled: false
  max_requests: 15000000
  burst_limit: 20000000
  global_limit: 150000000
  base_size_bytes: 4096

# TLS Configuration
tls:
  disabled: true                       # Disabled for development
  # cert_path: "/path/to/cert.pem"
  # key_path: "/path/to/key.pem"
  # cert_pem: "-----BEGIN CERTIFICATE-----\n..."
  # key_pem: "-----BEGIN PRIVATE KEY-----\n..."
  # grpc_default_ssl_roots_file_path: "/path/to/ca-certificates.crt"

# Authentication Configuration
auth:
  providers: []                        # Empty for development - no auth
  authorization_enabled: false
  admin_groups: []
  read_only_groups: []
  cognito:
    # user_pool_id: "us-east-1_example"
    # region: "us-east-1"
    # audience: "client-id"
  azure:
    # tenant_id: "tenant-id"
    # client_id: "client-id"
    allow_multi_tenant: false
  gcp:
    # client_id: "client-id"
    # project_id: "project-id"
    require_workspace_domain: false
  auth0:
    # tenant: "your-tenant.auth0.com"
    # audience: "your-api-identifier"
    # groups_namespace: "https://your-app.com/groups"
  okta:
    # org_domain: "your-org.okta.com"
    # audience: "api://your-audience"
    # auth_server_id: "your-auth-server"

# Metrics Configuration
metrics:
  port: 8081
  flush_interval_ms: 1000

# Logging Configuration
logging:
  rust_log: "info"
Configuration Sections
AWS Configuration
Configure AWS credentials for S3 backends:
Field | Type | Default | Description |
---|---|---|---|
aws.region | string | "us-east-1" | AWS region |
aws.access_key_id | string | null | AWS access key (optional) |
aws.secret_access_key | string | null | AWS secret key (optional) |
aws.https_conn_pool_size | number | 100 | HTTP connection pool size |
Note: S3 configuration is now done per-backend in the storage.backends section.
DuckDB Persistence Configuration
BoilStream provides optional high-performance local DuckDB persistence alongside its diskless S3 pipeline. When enabled, data is simultaneously written to both S3 (diskless) and local DuckDB databases (shared across topics).
Field | Type | Default | Description |
---|---|---|---|
duckdb_persistence.enabled | boolean | false | Enable DuckDB persistence (10M+ rows/s) |
duckdb_persistence.storage_path | string | "/tmp/duckdb/topics" | Directory for shared DuckDB database files |
duckdb_persistence.max_writers | number | 10 | Number of concurrent database writers |
duckdb_persistence.dry_run | boolean | false | Process Arrow data but skip actual writes |
duckdb_persistence.super_dry_run | boolean | false | Completely skip DuckDB processing |
DuckDB Persistence Benefits
High Performance:
- 10+ million rows/second ingestion rate into local databases
- Shared database files allow cross-topic queries and joins
- No backup infrastructure needed - S3 provides automatic replication
Architecture Integration:
- Roadmap: Source for window queries and time-series analysis
- Roadmap: FlightSQL integration for direct BI tool connectivity
- Roadmap: Live queries over past N hours of ingested data
Example Configuration:
# High-performance dual storage: S3 (diskless) + DuckDB (local)
duckdb_persistence:
  enabled: true
  storage_path: "/data/duckdb"
  max_writers: 16                  # Scale with CPU cores

# Continues to write to S3 backends simultaneously
storage:
  backends:
    - name: "primary-s3"
      backend_type: "s3"
      enabled: true
      # ... S3 configuration
DuckLake Configuration
BoilStream integrates with DuckLake: after each successful upload to storage, Parquet files are automatically registered in the configured DuckLake catalogs.
Field | Type | Default | Description |
---|---|---|---|
ducklake[].name | string | - | Unique identifier for this DuckLake catalog |
ducklake[].data_path | string | - | S3 path where Parquet files are stored |
ducklake[].attach | string | - | SQL statements for DuckLake attachment and setup |
ducklake[].topics | array | all topics | Optional: Specify which topics to include |
ducklake[].reconciliation.on_startup | boolean | true | Run reconciliation when application starts |
ducklake[].reconciliation.interval_minutes | number | 60 | Check for missing files every N minutes |
ducklake[].reconciliation.max_concurrent_registrations | number | 10 | Parallel registration limit |
Example Configuration:
ducklake:
  - name: my_ducklake
    data_path: "s3://ingestion-data/"
    attach: |
      INSTALL ducklake; INSTALL postgres; INSTALL aws;
      LOAD ducklake; LOAD postgres; LOAD aws;
      CREATE SECRET s3_access (TYPE S3, KEY_ID 'key', SECRET 'secret');
      CREATE SECRET postgres (TYPE POSTGRES, HOST 'localhost', DATABASE 'catalog');
      CREATE SECRET pg_secret (TYPE DUCKLAKE, DATA_PATH 's3://ingestion-data/',
        METADATA_PARAMETERS MAP {'TYPE': 'postgres', 'SECRET': 'postgres'});
      ATTACH 'ducklake:pg_secret' AS my_ducklake;
    reconciliation:
      on_startup: true
      interval_minutes: 60
DuckLake Integration with Storage Backends:
Storage backends can automatically register files with DuckLake catalogs:
storage:
  backends:
    - name: "primary-s3"
      backend_type: "s3"
      # ... S3 configuration
      ducklake: ["my_ducklake"]    # Auto-register files with this catalog
See the DuckLake Integration guide for detailed setup instructions.
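Since the attach block is ordinary DuckDB SQL, an external DuckDB session can reach the same catalog by running equivalent statements. The sketch below uses the DuckDB CLI with the placeholder secrets from the example above; the table name events is hypothetical (registered tables correspond to your topics):
# Attach the same catalog from an external DuckDB CLI session and query it
duckdb -c "
  INSTALL ducklake; INSTALL postgres; INSTALL aws;
  LOAD ducklake; LOAD postgres; LOAD aws;
  CREATE SECRET s3_access (TYPE S3, KEY_ID 'key', SECRET 'secret');
  CREATE SECRET postgres (TYPE POSTGRES, HOST 'localhost', DATABASE 'catalog');
  CREATE SECRET pg_secret (TYPE DUCKLAKE, DATA_PATH 's3://ingestion-data/',
    METADATA_PARAMETERS MAP {'TYPE': 'postgres', 'SECRET': 'postgres'});
  ATTACH 'ducklake:pg_secret' AS my_ducklake;
  SELECT count(*) FROM my_ducklake.events;  -- 'events' is a hypothetical topic table
"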
Storage Configuration
BoilStream supports multiple concurrent storage backends, allowing you to write data to several destinations simultaneously. This enables scenarios like:
- Primary + backup storage (S3 + filesystem)
- Multi-cloud redundancy (S3 + another cloud provider)
- Testing and auditing (production storage + debug/noop storage)
- DuckLake integration (automatic catalog registration)
Storage Backends
Configure multiple storage backends in the storage.backends array:
Field | Type | Default | Description |
---|---|---|---|
storage.backends[].name | string | - | Unique identifier for this backend |
storage.backends[].backend_type | string | - | Backend type: "s3", "filesystem", "noop" |
storage.backends[].enabled | boolean | - | Whether this backend is active |
storage.backends[].primary | boolean | - | If true, operations must succeed on this backend |
storage.backends[].ducklake | array | [] | List of DuckLake catalogs to register files with |
Backend Types:
- s3 - AWS S3 or S3-compatible storage (MinIO, etc.)
- filesystem - Local or network filesystem storage
- noop - No-operation storage for testing/benchmarking
Primary vs Secondary Backends:
- Primary backends (primary: true) must succeed for the operation to be considered successful
- Secondary backends (primary: false) are best-effort; failures are logged but don't fail the operation
Backend-Specific Configuration
S3 Backend Configuration:
Field | Type | Default | Description |
---|---|---|---|
endpoint | string | null | S3 endpoint URL (required for S3 backends) |
bucket | string | null | S3 bucket name (required for S3 backends) |
prefix | string | "" | Base prefix for S3 uploads (optional) |
access_key | string | null | S3 access key (required for S3 backends) |
secret_key | string | null | S3 secret key (required for S3 backends) |
region | string | "us-east-1" | AWS region (optional for S3 backends) |
use_path_style | boolean | auto-detected | Use path-style addressing (auto-detects MinIO) |
max_concurrent_uploads | number | 10 | Maximum concurrent uploads |
upload_id_pool_capacity | number | 100 | Upload ID pool capacity |
max_retries | number | 3 | Maximum retry attempts |
initial_backoff_ms | number | 100 | Initial backoff in milliseconds |
max_retry_attempts | number | 3 | Maximum retry attempts |
flush_interval_ms | number | 250 | Data sync interval in milliseconds |
max_multipart_object_size | number | 104857600 | Maximum multipart object size (100MB) |
Filesystem Backend Configuration:
Field | Type | Default | Description |
---|---|---|---|
prefix | string | "./storage" | Base directory path prefix for filesystem storage (required for filesystem backends) |
MinIO Configuration
MinIO is supported through the S3 backend type. To configure MinIO, use backend_type: "s3" with these specific settings:
storage:
  backends:
    - name: "minio-storage"
      backend_type: "s3"                 # Use S3 backend type for MinIO
      enabled: true
      primary: true
      endpoint: "http://localhost:9000"  # MinIO endpoint
      bucket: "your-bucket-name"
      prefix: "/"
      access_key: "minioadmin"
      secret_key: "minioadmin"
      region: "us-east-1"
      use_path_style: true               # Required for MinIO
MinIO-specific notes:
- Always set use_path_style: true for MinIO compatibility
- Use backend_type: "s3" (not a separate MinIO type)
- The system automatically detects MinIO endpoints and sets path-style addressing if not explicitly configured
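Note that the target bucket typically has to exist before uploads can succeed. A minimal sketch for creating it on a local MinIO instance, assuming the AWS CLI is installed and the default credentials from the example above:
# Create the MinIO bucket used by the backend (adjust the name to your config)
AWS_ACCESS_KEY_ID=minioadmin AWS_SECRET_ACCESS_KEY=minioadmin AWS_DEFAULT_REGION=us-east-1 \
  aws s3 mb s3://your-bucket-name --endpoint-url http://localhost:9000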
Environment Variable Configuration
You can configure multiple backends via the STORAGE_BACKENDS environment variable:
# Enable S3 and filesystem backends (S3 primary, filesystem secondary)
STORAGE_BACKENDS="s3,filesystem" boilstream
# Enable only filesystem storage
STORAGE_BACKENDS="filesystem" boilstream
# Enable S3, filesystem, and noop for testing
STORAGE_BACKENDS="s3,filesystem,noop" boilstream
Example Configurations
Primary S3 + Backup Filesystem:
storage:
  backends:
    - name: "primary-s3"
      backend_type: "s3"
      enabled: true
      primary: true
      endpoint: "https://s3.amazonaws.com"
      bucket: "my-production-bucket"
      prefix: ""
      access_key: "${AWS_ACCESS_KEY_ID}"
      secret_key: "${AWS_SECRET_ACCESS_KEY}"
      region: "us-east-1"
      use_path_style: false

    - name: "backup-filesystem"
      backend_type: "filesystem"
      enabled: true
      primary: false
      prefix: "/backup/storage"
MinIO Development Setup:
storage:
  backends:
    - name: "minio-dev"
      backend_type: "s3"
      enabled: true
      primary: true
      endpoint: "http://localhost:9000"
      bucket: "ingestion-data"
      prefix: "/"
      access_key: "minioadmin"
      secret_key: "minioadmin"
      region: "us-east-1"
      use_path_style: true             # Required for MinIO
Development with NoOp for Performance Testing:
storage:
  backends:
    - name: "main-s3"
      backend_type: "s3"
      enabled: true
      primary: true
      endpoint: "http://localhost:9000"
      bucket: "test-bucket"
      prefix: ""
      access_key: "minioadmin"
      secret_key: "minioadmin"
      use_path_style: true

    - name: "perf-test"
      backend_type: "noop"
      enabled: true
      primary: false
Filesystem Only (Local Development):
storage:
  backends:
    - name: "local-dev"
      backend_type: "filesystem"
      enabled: true
      primary: true
      prefix: "./local-storage"
Server Configuration
Configure server ports and threading:
Field | Type | Default | Description |
---|---|---|---|
server.tokio_worker_threads | number | null | Number of Tokio worker threads (defaults to system CPU count) |
server.flight_thread_count | number | 1 | Number of FlightRPC threads |
server.flight_base_port | number | 50050 | Base port for FlightRPC servers |
server.admin_flight_port | number | 50160 | Admin service port |
server.consumer_flight_port | number | 50250 | Consumer service port |
server.valkey_url | string | "redis://localhost:6379" | Valkey/Redis connection URL |
Processing Configuration
Configure data processing behavior:
Field | Type | Default | Description |
---|---|---|---|
processing.data_processing_threads | number | 8 | Number of data processing threads |
processing.buffer_pool_max_size | number | 50 | Maximum buffer pool size |
processing.window_queue_capacity | number | 30000 | Window queue capacity |
processing.window_ms | number | 10000 | Window duration in milliseconds |
processing.dry_run | boolean | false | Enable dry run mode |
processing.include_metadata_columns | boolean | true | Include metadata columns |
processing.schema_validation_enabled | boolean | true | Enable schema validation |
Parquet Configuration
Field | Type | Default | Description |
---|---|---|---|
processing.parquet.compression | string | "ZSTD" | Parquet compression algorithm |
processing.parquet.dictionary_enabled | boolean | true | Enable dictionary encoding |
Rate Limiting Configuration
Configure request rate limiting:
Field | Type | Default | Description |
---|---|---|---|
rate_limiting.disabled | boolean | false | Disable rate limiting |
rate_limiting.max_requests | number | 15000000 | Max requests per second per producer |
rate_limiting.burst_limit | number | 20000000 | Burst limit |
rate_limiting.global_limit | number | 150000000 | Global requests per second |
rate_limiting.base_size_bytes | number | 4096 | Base size for rate limiting tokens |
TLS Configuration
Configure TLS encryption (Pro tier only):
Field | Type | Default | Description |
---|---|---|---|
tls.disabled | boolean | false | Disable TLS |
tls.cert_path | string | null | Path to certificate file |
tls.key_path | string | null | Path to private key file |
tls.cert_pem | string | null | Certificate as PEM string |
tls.key_pem | string | null | Private key as PEM string |
Authentication Configuration
Configure authentication providers (Pro tier only):
Field | Type | Default | Description |
---|---|---|---|
auth.providers | array | [] | List of authentication providers |
auth.authorization_enabled | boolean | false | Enable authorization |
auth.admin_groups | array | [] | Admin group names |
auth.read_only_groups | array | [] | Read-only group names |
See the Authentication & Authorization section for detailed provider configuration.
Metrics Configuration
Configure metrics collection:
Field | Type | Default | Description |
---|---|---|---|
metrics.port | number | 8081 | Metrics server port |
metrics.flush_interval_ms | number | 1000 | Metrics flush interval |
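A quick way to confirm the metrics server is reachable; only the port is documented here, so the /metrics path below is an assumption and may differ in your deployment:
# Probe the metrics server on the configured port (path is an assumption)
curl -s http://localhost:8081/metrics | head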
PGWire Server Configuration
BoilStream includes a built-in PostgreSQL wire protocol server that enables BI tools like DBeaver, Tableau, and psql to connect directly to your streaming data through the standard PostgreSQL protocol.
Field | Type | Default | Description |
---|---|---|---|
pgwire.enabled | boolean | true | Enable PGWire PostgreSQL protocol server |
pgwire.port | number | 5432 | Port for PostgreSQL protocol connections |
pgwire.username | string | "boilstream" | Username for PostgreSQL authentication |
pgwire.password | string | "boilstream" | Password for PostgreSQL authentication |
pgwire.refresh_interval_seconds | number | 5 | Database refresh interval in seconds |
pgwire.initialization_sql | string | "" | SQL commands to execute on DuckDB init |
pgwire.tls.enabled | boolean | false | Enable TLS for PostgreSQL connections (Pro tier only) |
pgwire.tls.cert_path | string | null | Path to TLS certificate file (Pro tier only) |
pgwire.tls.key_path | string | null | Path to TLS private key file (Pro tier only) |
pgwire.tls.cert_pem | string | null | TLS certificate as PEM string (Pro tier only) |
pgwire.tls.key_pem | string | null | TLS private key as PEM string (Pro tier only) |
Key Features:
- Full PostgreSQL Protocol Support: Compatible with any PostgreSQL client
- Cursor Support: Handles large result sets efficiently through extended query protocol
- Text and Binary Encoding: Supports both text and binary data formats
- Prepared Statements: Full prepared statement support with parameter binding
- Query Cancellation: Standard PostgreSQL query cancellation support
- TLS Encryption: Optional TLS encryption for secure connections (Pro tier only)
Example Configuration:
# PostgreSQL Protocol Server
pgwire:
  enabled: true
  port: 5432
  username: "analyst"
  password: "secure_password"
  refresh_interval_seconds: 10
  initialization_sql: |
    INSTALL icu;
    LOAD icu;
    SET timezone = 'UTC';
  tls:
    enabled: true                             # Pro tier only
    cert_path: "/etc/ssl/certs/pgwire.crt"    # Pro tier only
    key_path: "/etc/ssl/private/pgwire.key"   # Pro tier only
Integration with DuckDB Persistence:
The PGWire server automatically integrates with DuckDB persistence when enabled, providing:
- Live Query Access: Query streaming data through PostgreSQL protocol
- Cross-Topic Joins: Join data across different topics using standard SQL
- BI Tool Compatibility: Connect any PostgreSQL-compatible BI tool directly
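With the server enabled, any PostgreSQL client can connect using the configured credentials. A minimal sketch with psql and the default credentials from the table above; the database name is not documented here, so psql's default (the username) is used, and the queried table name is hypothetical (tables correspond to your topics):
# Connect with the default PGWire credentials
PGPASSWORD=boilstream psql -h localhost -p 5432 -U boilstream
# Then query ingested data with ordinary SQL, e.g.:
#   SELECT * FROM my_topic LIMIT 10;   -- "my_topic" is a hypothetical topic table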
Environment Variable Overrides:
# Enable PGWire server
PGWIRE_ENABLED=true
PGWIRE_PORT=5432
PGWIRE_USERNAME=analyst
PGWIRE_PASSWORD=secure_password
# TLS Configuration (Pro tier only)
PGWIRE_TLS_ENABLED=true
PGWIRE_TLS_CERT_PATH=/etc/ssl/certs/pgwire.crt
PGWIRE_TLS_KEY_PATH=/etc/ssl/private/pgwire.key
# Or use PEM strings directly (Pro tier only)
PGWIRE_TLS_CERT_PEM="-----BEGIN CERTIFICATE-----..."
PGWIRE_TLS_KEY_PEM="-----BEGIN PRIVATE KEY-----..."
See the PGWire Server Guide for detailed setup instructions and BI tool integration examples.
Logging Configuration
Configure logging levels:
Field | Type | Default | Description |
---|---|---|---|
logging.rust_log | string | "info" | Log level configuration |
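The value uses standard Rust log-filter syntax, so per-module levels are possible. The example below assumes the filter is also honored via the conventional RUST_LOG environment variable and that the crate is named boilstream; if in doubt, set the same string in logging.rust_log instead:
# Global "info" level with verbose logging for the (assumed) boilstream crate
RUST_LOG="info,boilstream=debug" boilstream --config prod-config.yaml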
Environment Variable Override
All YAML configuration fields can be overridden with environment variables. The environment variable names follow this pattern:
- Nested fields are joined with underscores
- All uppercase
- Boolean values: "true", "false", "1", "0"
- Arrays: comma-separated values
Examples
# Override AWS region
AWS_REGION=us-west-2
# Override S3 bucket
S3_BUCKET=my-production-bucket
# Override server port
FLIGHT_BASE_PORT=8080
# Override processing settings
DATA_PROCESSING_THREADS=16
INCLUDE_METADATA_COLUMNS=false
# Override storage backends (comma-separated)
STORAGE_BACKENDS=s3,filesystem
STORAGE_FILESYSTEM_PREFIX=/data/storage
# Override Valkey/Redis connection
VALKEY_URL=redis://production-redis:6379
# Override authentication providers (comma-separated)
AUTH_PROVIDERS=cognito,azure
ADMIN_GROUPS=admin,superuser
Development vs Production
Development Configuration
For local development, create a dev-config.yaml:
aws:
  region: "us-east-1"

s3:
  bucket: "my-dev-bucket"
storage:
  backends:
    - name: "local-filesystem"
      backend_type: "filesystem"
      enabled: true
      primary: true
      prefix: "./dev-storage"

    # Optional: Add S3 for testing cloud integration
    # - name: "dev-s3"
    #   backend_type: "s3"
    #   enabled: false
    #   primary: false
    #   endpoint: "http://localhost:9000"
    #   bucket: "dev-bucket"
    #   prefix: ""
    #   access_key: "minioadmin"
    #   secret_key: "minioadmin"
    #   use_path_style: true

server:
  tokio_worker_threads: 16
  valkey_url: "redis://localhost:6379"

tls:
  disabled: true

auth:
  providers: []

logging:
  rust_log: "debug"
Production Configuration
For production, create a prod-config.yaml:
aws:
  region: "eu-west-1"

storage:
  backends:
    - name: "primary-s3"
      backend_type: "s3"
      enabled: true
      primary: true
      endpoint: "https://s3.amazonaws.com"
      bucket: "my-production-bucket"
      prefix: ""
      access_key: "${AWS_ACCESS_KEY_ID}"
      secret_key: "${AWS_SECRET_ACCESS_KEY}"
      region: "eu-west-1"
      use_path_style: false

    - name: "backup-filesystem"
      backend_type: "filesystem"
      enabled: true
      primary: false                   # Secondary for backup/audit
      prefix: "/data/backup-storage"

server:
  tokio_worker_threads: 16
  valkey_url: "redis://localhost:6379"

processing:
  data_processing_threads: 16
  window_queue_capacity: 100000

rate_limiting:
  max_requests: 50000000
  burst_limit: 75000000

tls:
  disabled: false
  cert_path: "/etc/ssl/certs/server.crt"
  key_path: "/etc/ssl/private/server.key"

auth:
  providers: ["cognito"]
  authorization_enabled: true
  admin_groups: ["admin"]

logging:
  rust_log: "info"
Usage Examples
Basic Development Setup
# Create config file
cat > dev-config.yaml << EOF
aws:
  region: "us-east-1"
storage:
  backends:
    - name: "dev-s3"
      backend_type: "s3"
      enabled: true
      primary: true
      endpoint: "http://localhost:9000"
      bucket: "my-dev-bucket"
      prefix: ""
      access_key: "minioadmin"
      secret_key: "minioadmin"
      region: "us-east-1"
      use_path_style: true
server:
  valkey_url: "redis://localhost:6379"
logging:
  rust_log: "debug"
EOF
# Run with config file
boilstream --config dev-config.yaml
Production with Environment Overrides
# Use production config but override bucket via environment
S3_BUCKET=production-bucket-2024 boilstream --config prod-config.yaml
Environment Variables Only
# No config file, all via environment
AWS_REGION=eu-west-1 \
S3_BUCKET=my-bucket \
TLS_DISABLED=true \
boilstream
Multi-Backend Examples
# Primary S3 + backup filesystem
STORAGE_BACKENDS="s3,filesystem" \
STORAGE_FILESYSTEM_PREFIX="/backup" \
S3_BUCKET=my-bucket \
VALKEY_URL=redis://localhost:6379 \
boilstream
# Filesystem only for local development
STORAGE_BACKENDS="filesystem" \
STORAGE_FILESYSTEM_PREFIX="./local-storage" \
VALKEY_URL=redis://localhost:6379 \
boilstream
# S3 + NoOp for performance testing
STORAGE_BACKENDS="s3,noop" \
S3_BUCKET=perf-test-bucket \
boilstream
Validation
BoilStream validates configuration on startup and will exit with an error if:
- Required fields are missing (e.g., S3_BUCKET)
- Invalid values are provided (e.g., port 0)
- Referenced files don't exist (e.g., TLS certificates)
Check the logs for detailed validation error messages.