Configuration
BoilStream uses a YAML configuration file for all settings. On first run, it automatically generates a config.yaml file with sensible defaults, making it easy to get started.
Auto-Generated Configuration
When you run BoilStream for the first time:
# First run - automatically generates config.yaml
./boilstream
This creates a config.yaml file in the current directory with:
- Default ports for all interfaces
- Local filesystem storage backend
- Development-friendly settings (no TLS, no auth)
- Sensible performance defaults
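For orientation, here is a minimal sketch of what the generated file typically looks like, assembled from the defaults documented on this page; the exact contents of your auto-generated config.yaml may differ between versions:
# config.yaml (auto-generated on first run) - illustrative sketch only
storage:
  backends:
    - name: "local-filesystem"
      backend_type: "filesystem"
      enabled: true
      primary: true
      prefix: "./storage"
server:
  flight_base_port: 50050
  admin_flight_port: 50160
  consumer_flight_port: 50250
tls:
  disabled: true        # no TLS in development
auth:
  providers: []         # no auth in development
logging:
  rust_log: "info"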
Using Configuration Files
Default Configuration
# Uses config.yaml from current directory
./boilstream
Custom Configuration File
# Use a specific configuration file
./boilstream --config production.yaml
# Or use environment variable
CONFIG_FILE=production.yaml ./boilstream
Configuration Priority
Settings are applied in this order (later overrides earlier):
- Built-in defaults
- Configuration file (YAML)
- Environment variables (for specific overrides)
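As an illustration, a value set in config.yaml can be replaced at launch time without editing the file, here using the PGWIRE_PORT override documented later on this page; the environment variable wins because it is applied last:
# config.yaml sets the default
pgwire:
  port: 5432
# environment variable applied last, so it takes precedence
PGWIRE_PORT=5433 ./boilstream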
Configuration File Structure
The config.yaml file is organized into logical sections. Here's a complete example with all available options:
# AWS Configuration
aws:
region: "eu-west-1"
# access_key_id: "your-access-key" # Optional - can use AWS CLI/IAM roles
# secret_access_key: "your-secret-key" # Optional - can use AWS CLI/IAM roles
https_conn_pool_size: 100
# DuckDB Persistence Configuration (Optional High-Performance Local Storage)
duckdb_persistence:
enabled: true # Enable high-performance local DuckDB persistence (10M+ rows/s)
storage_path: "/tmp/duckdb/topics" # Directory for shared DuckDB database files
max_writers: 10 # Number of concurrent database writers for optimal performance
# Storage Configuration
storage:
# Multiple storage backends can be configured simultaneously
backends:
- name: "primary-s3"
backend_type: "s3"
enabled: true
primary: true # Primary backend - operations must succeed here
# S3-specific configuration
endpoint: "http://localhost:9000" # For MinIO/custom S3
bucket: "ingestion-data"
prefix: "/"
access_key: "minioadmin"
secret_key: "minioadmin"
region: "us-east-1"
use_path_style: true # Required for MinIO
max_concurrent_uploads: 10
upload_id_pool_capacity: 100
max_retries: 3
initial_backoff_ms: 100
max_retry_attempts: 3
flush_interval_ms: 250
max_multipart_object_size: 104857600 # 100 MB
- name: "backup-filesystem"
backend_type: "filesystem"
enabled: true
primary: false # Secondary backend - failures are logged but not fatal
# Filesystem-specific configuration
prefix: "/tmp/storage"
# - name: "debug-noop"
# backend_type: "noop"
# enabled: false
# primary: false # For testing/benchmarking without actual storage
# Server Configuration
server:
# tokio_worker_threads: 16 # Optional - defaults to system CPU count
flight_thread_count: 1
flight_base_port: 50050
admin_flight_port: 50160
consumer_flight_port: 50250
# Data Processing Configuration
processing:
data_processing_threads: 8
buffer_pool_max_size: 50
window_queue_capacity: 30000
window_ms: 10000
include_metadata_columns: true
schema_validation_enabled: true
parquet:
compression: "ZSTD"
dictionary_enabled: true
# Rate Limiting Configuration
rate_limiting:
disabled: false
max_requests: 15000000
burst_limit: 20000000
global_limit: 150000000
base_size_bytes: 4096
# TLS Configuration
tls:
disabled: true # Disabled for development
# cert_path: "/path/to/cert.pem"
# key_path: "/path/to/key.pem"
# cert_pem: "-----BEGIN CERTIFICATE-----\n..."
# key_pem: "-----BEGIN PRIVATE KEY-----\n..."
# grpc_default_ssl_roots_file_path: "/path/to/ca-certificates.crt"
# Authentication Configuration
auth:
providers: [] # Empty for development - no auth
authorization_enabled: false
admin_groups: []
read_only_groups: []
cognito:
# user_pool_id: "us-east-1_example"
# region: "us-east-1"
# audience: "client-id"
azure:
# tenant_id: "tenant-id"
# client_id: "client-id"
allow_multi_tenant: false
gcp:
# client_id: "client-id"
# project_id: "project-id"
require_workspace_domain: false
auth0:
# tenant: "your-tenant.auth0.com"
# audience: "your-api-identifier"
# groups_namespace: "https://your-app.com/groups"
okta:
# org_domain: "your-org.okta.com"
# audience: "api://your-audience"
# auth_server_id: "your-auth-server"
# Metrics Configuration
metrics:
port: 8081
flush_interval_ms: 1000
# Logging Configuration
logging:
rust_log: "info"Configuration Sections
AWS Configuration
Configure AWS credentials for S3 backends:
| Field | Type | Default | Description |
|---|---|---|---|
aws.region | string | "us-east-1" | AWS region |
aws.access_key_id | string | null | AWS access key (optional) |
aws.secret_access_key | string | null | AWS secret key (optional) |
aws.https_conn_pool_size | number | 100 | HTTP connection pool size |
Note: S3 configuration is now done per-backend in the storage.backends section.
DuckDB Persistence Configuration
BoilStream provides optional high-performance local DuckDB persistence alongside its diskless S3 pipeline. When enabled, data is simultaneously written to both S3 (diskless) and local DuckDB databases (shared across topics).
| Field | Type | Default | Description |
|---|---|---|---|
duckdb_persistence.enabled | boolean | false | Enable DuckDB persistence (10M+ rows/s) |
duckdb_persistence.storage_path | string | "/tmp/duckdb/topics" | Directory for shared DuckDB database files |
duckdb_persistence.max_writers | number | 10 | Number of concurrent database writers |
duckdb_persistence.dry_run | boolean | false | Process Arrow data but skip actual writes |
duckdb_persistence.super_dry_run | boolean | false | Completely skip DuckDB processing |
DuckDB Persistence Benefits
High Performance:
- 10+ million rows/second ingestion rate into local databases
- Shared database files allow cross-topic queries and joins
- No backup infrastructure needed - S3 provides automatic replication
Architecture Integration:
- BI Tool Integration: PostgreSQL interface and FlightSQL for direct connectivity
- Future Roadmap: Source for window queries and time-series analysis
- Future Roadmap: Live queries over past N hours of ingested data
Example Configuration:
# High-performance dual storage: S3 (diskless) + DuckDB (local)
duckdb_persistence:
enabled: true
storage_path: "/data/duckdb"
max_writers: 16 # Scale with CPU cores
# Continues to write to S3 backends simultaneously
storage:
backends:
- name: "primary-s3"
backend_type: "s3"
enabled: true
# ... S3 configuration
DuckLake Configuration
BoilStream integrates with DuckLake. DuckLake automatically registers Parquet files in catalogs after successful upload to storage.
| Field | Type | Default | Description |
|---|---|---|---|
ducklake[].name | string | - | Unique identifier for this DuckLake catalog |
ducklake[].data_path | string | - | S3 path where Parquet files are stored |
ducklake[].attach | string | - | SQL statements for DuckLake attachment and setup |
ducklake[].topics | array | all topics | Optional: Specify which topics to include |
ducklake[].reconciliation.on_startup | boolean | true | Run reconciliation when application starts |
ducklake[].reconciliation.interval_minutes | number | 60 | Check for missing files every N minutes |
ducklake[].reconciliation.max_concurrent_registrations | number | 10 | Parallel registration limit |
Example Configuration:
ducklake:
- name: my_ducklake
data_path: "s3://ingestion-data/"
attach: |
INSTALL ducklake; INSTALL postgres; INSTALL aws;
LOAD ducklake; LOAD postgres; LOAD aws;
CREATE SECRET s3_access (TYPE S3, KEY_ID 'key', SECRET 'secret');
CREATE SECRET postgres (TYPE POSTGRES, HOST 'localhost', DATABASE 'catalog');
CREATE SECRET pg_secret (TYPE DUCKLAKE, DATA_PATH 's3://ingestion-data/',
METADATA_PARAMETERS MAP {'TYPE': 'postgres', 'SECRET': 'postgres'});
ATTACH 'ducklake:pg_secret' AS my_ducklake;
reconciliation:
on_startup: true
interval_minutes: 60
DuckLake Integration with Storage Backends:
Storage backends can automatically register files with DuckLake catalogs:
storage:
backends:
- name: "primary-s3"
backend_type: "s3"
# ... S3 configuration
ducklake: ["my_ducklake"] # Auto-register files with this catalogSee the DuckLake Integration guide for detailed setup instructions.
Storage Configuration
BoilStream supports multiple concurrent storage backends, allowing you to write data to several destinations simultaneously. This enables scenarios like:
- Primary + backup storage (S3 + filesystem)
- Multi-cloud redundancy (S3 + another cloud provider)
- Testing and auditing (production storage + debug/noop storage)
- DuckLake integration (automatic catalog registration)
Storage Backends
Configure multiple storage backends in the storage.backends array:
| Field | Type | Default | Description |
|---|---|---|---|
storage.backends[].name | string | - | Unique identifier for this backend |
storage.backends[].backend_type | string | - | Backend type: "s3", "filesystem", "noop" |
storage.backends[].enabled | boolean | - | Whether this backend is active |
storage.backends[].primary | boolean | - | If true, operations must succeed on this backend |
storage.backends[].ducklake | array | [] | List of DuckLake catalogs to register files with |
Backend Types:
- `s3` - AWS S3 or S3-compatible storage (MinIO, etc.)
- `filesystem` - Local or network filesystem storage
- `noop` - No-operation storage for testing/benchmarking
Primary vs Secondary Backends:
- Primary backends (`primary: true`) must succeed for the operation to be considered successful
- Secondary backends (`primary: false`) are best-effort; failures are logged but do not fail the operation
Backend-Specific Configuration
S3 Backend Configuration:
| Field | Type | Default | Description |
|---|---|---|---|
endpoint | string | null | S3 endpoint URL (required for S3 backends) |
bucket | string | null | S3 bucket name (required for S3 backends) |
prefix | string | "" | Base prefix for S3 uploads (optional) |
access_key | string | null | S3 access key (required for S3 backends) |
secret_key | string | null | S3 secret key (required for S3 backends) |
region | string | "us-east-1" | AWS region (optional for S3 backends) |
use_path_style | boolean | auto-detected | Use path-style addressing (auto-detects MinIO) |
max_concurrent_uploads | number | 10 | Maximum concurrent uploads |
upload_id_pool_capacity | number | 100 | Upload ID pool capacity |
max_retries | number | 3 | Maximum retry attempts |
initial_backoff_ms | number | 100 | Initial backoff in milliseconds |
max_retry_attempts | number | 3 | Maximum retry attempts |
flush_interval_ms | number | 250 | Data sync interval in milliseconds |
max_multipart_object_size | number | 104857600 | Maximum multipart object size (100MB) |
Filesystem Backend Configuration:
| Field | Type | Default | Description |
|---|---|---|---|
prefix | string | "./storage" | Base directory path prefix for filesystem storage (required for filesystem backends) |
MinIO Configuration
MinIO is supported through the S3 backend type. To configure MinIO, use backend_type: "s3" with these specific settings:
storage:
backends:
- name: "minio-storage"
backend_type: "s3" # Use S3 backend type for MinIO
enabled: true
primary: true
endpoint: "http://localhost:9000" # MinIO endpoint
bucket: "your-bucket-name"
prefix: "/"
access_key: "minioadmin"
secret_key: "minioadmin"
region: "us-east-1"
use_path_style: true # Required for MinIO
MinIO-specific notes:
- Always set `use_path_style: true` for MinIO compatibility
- Use `backend_type: "s3"` (there is no separate MinIO type)
- The system automatically detects MinIO endpoints and sets path-style addressing if not explicitly configured
Environment Variable Configuration
You can configure multiple backends via the STORAGE_BACKENDS environment variable:
# Enable S3 and filesystem backends (S3 primary, filesystem secondary)
STORAGE_BACKENDS="s3,filesystem" boilstream
# Enable only filesystem storage
STORAGE_BACKENDS="filesystem" boilstream
# Enable S3, filesystem, and noop for testing
STORAGE_BACKENDS="s3,filesystem,noop" boilstreamExample Configurations
Primary S3 + Backup Filesystem:
storage:
backends:
- name: "primary-s3"
backend_type: "s3"
enabled: true
primary: true
endpoint: "https://s3.amazonaws.com"
bucket: "my-production-bucket"
prefix: ""
access_key: "${AWS_ACCESS_KEY_ID}"
secret_key: "${AWS_SECRET_ACCESS_KEY}"
region: "us-east-1"
use_path_style: false
- name: "backup-filesystem"
backend_type: "filesystem"
enabled: true
primary: false
prefix: "/backup/storage"MinIO Development Setup:
storage:
backends:
- name: "minio-dev"
backend_type: "s3"
enabled: true
primary: true
endpoint: "http://localhost:9000"
bucket: "ingestion-data"
prefix: "/"
access_key: "minioadmin"
secret_key: "minioadmin"
region: "us-east-1"
use_path_style: true # Required for MinIO
Development with NoOp for Performance Testing:
storage:
backends:
- name: "main-s3"
backend_type: "s3"
enabled: true
primary: true
endpoint: "http://localhost:9000"
bucket: "test-bucket"
prefix: ""
access_key: "minioadmin"
secret_key: "minioadmin"
use_path_style: true
- name: "perf-test"
backend_type: "noop"
enabled: true
primary: false
Filesystem Only (Local Development):
storage:
backends:
- name: "local-dev"
backend_type: "filesystem"
enabled: true
primary: true
prefix: "./local-storage"Server Configuration
Configure server ports and threading:
| Field | Type | Default | Description |
|---|---|---|---|
server.tokio_worker_threads | number | null | Number of Tokio worker threads |
server.flight_thread_count | number | 1 | Number of FlightRPC threads |
server.flight_base_port | number | 50050 | Base port for FlightRPC servers |
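A typical server block, using the ports shown in the complete example earlier on this page:
server:
  # tokio_worker_threads: 16   # Optional - defaults to the system CPU count
  flight_thread_count: 1
  flight_base_port: 50050
  admin_flight_port: 50160
  consumer_flight_port: 50250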
Processing Configuration
Configure data processing behavior:
| Field | Type | Default | Description |
|---|---|---|---|
processing.data_processing_threads | number | 8 | Number of data processing threads |
processing.buffer_pool_max_size | number | 50 | Maximum buffer pool size |
processing.window_queue_capacity | number | 30000 | Window queue capacity |
processing.window_ms | number | 10000 | Window duration in milliseconds |
processing.dry_run | boolean | false | Enable dry run mode |
processing.include_metadata_columns | boolean | true | Include metadata columns |
processing.schema_validation_enabled | boolean | true | Enable schema validation |
Parquet Configuration
| Field | Type | Default | Description |
|---|---|---|---|
processing.parquet.compression | string | "ZSTD" | Parquet compression algorithm |
processing.parquet.dictionary_enabled | boolean | true | Enable dictionary encoding |
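As a sketch, a processing block tuned for a larger machine (the values are taken from the production example later on this page):
processing:
  data_processing_threads: 16
  window_queue_capacity: 100000
  window_ms: 10000
  include_metadata_columns: true
  schema_validation_enabled: true
  parquet:
    compression: "ZSTD"
    dictionary_enabled: true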
Rate Limiting Configuration
Configure request rate limiting:
| Field | Type | Default | Description |
|---|---|---|---|
rate_limiting.disabled | boolean | false | Disable rate limiting |
rate_limiting.max_requests | number | 15000000 | Max requests per second per producer |
rate_limiting.burst_limit | number | 20000000 | Burst limit |
rate_limiting.global_limit | number | 150000000 | Global requests per second |
rate_limiting.base_size_bytes | number | 4096 | Base size for rate limiting tokens |
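For example, rate limiting can be switched off entirely for local benchmarking, or the limits raised for high-throughput producers (the raised values below come from the production example later on this page):
rate_limiting:
  disabled: false        # set to true to bypass rate limiting during benchmarks
  max_requests: 50000000
  burst_limit: 75000000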
TLS Configuration
Configure TLS encryption (Pro tier only):
| Field | Type | Default | Description |
|---|---|---|---|
tls.disabled | boolean | false | Disable TLS |
tls.cert_path | string | null | Path to certificate file |
tls.key_path | string | null | Path to private key file |
tls.cert_pem | string | null | Certificate as PEM string |
tls.key_pem | string | null | Private key as PEM string |
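A minimal production TLS block using certificate files; PEM strings are an alternative, as the table above shows:
tls:
  disabled: false
  cert_path: "/etc/ssl/certs/server.crt"
  key_path: "/etc/ssl/private/server.key"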
Authentication Configuration
Configure authentication providers (Pro tier only):
| Field | Type | Default | Description |
|---|---|---|---|
auth.providers | array | [] | List of authentication providers |
auth.authorization_enabled | boolean | false | Enable authorization |
auth.admin_groups | array | [] | Admin group names |
auth.read_only_groups | array | [] | Read-only group names |
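A sketch of a Cognito-backed setup, combining the fields above with the provider block from the complete example; the group names are placeholders:
auth:
  providers: ["cognito"]
  authorization_enabled: true
  admin_groups: ["admin"]
  read_only_groups: ["analysts"]   # example group name
  cognito:
    user_pool_id: "us-east-1_example"
    region: "us-east-1"
    audience: "client-id"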
PostgreSQL Web Authentication Server
Configure the built-in web authentication server for PostgreSQL access:
| Field | Type | Default | Description |
|---|---|---|---|
auth_server.enabled | boolean | false | Enable PostgreSQL web authentication server |
auth_server.port | number | 443 | HTTPS port for web UI |
auth_server.static_dir | string | "static/auth" | Directory containing web UI files |
auth_server.session_ttl_hours | number | 8 | PostgreSQL session TTL in hours |
auth_server.users_db_path | string | "data/users.duckdb" | Path to encrypted users database |
auth_server.encryption_key_path | string | null | Path to encryption key file (optional) |
auth_server.tls_cert | string | null | Path to TLS certificate (falls back to http_ingestion) |
auth_server.tls_key | string | null | Path to TLS private key (falls back to http_ingestion) |
auth_server.webauthn_rp_id | string | "localhost" | WebAuthn Relying Party ID (domain) |
auth_server.webauthn_rp_origin | string | "https://localhost" | WebAuthn origin URL |
auth_server.cors_allowed_origins | array | [] | CORS allowed origins for web UI |
auth_server.users_backup_backend | string | null | Storage backend name for automatic backups |
auth_server.users_backup_interval_seconds | number | 60 | Minimum seconds between automatic backups |
auth_server.users_backup_path | string | "auth/users.duckdb" | Path in storage backend for backup file |
auth_server.email_encryption_pgp_public_key_path | string | null | Path to PGP public key for email encryption (optional) |
auth_server.email_encryption_pgp_public_key | string | null | PGP public key as inline string (optional) |
Initial Setup Flow
On first startup with auth_server.enabled: true:
Encryption Key Prompt:
- If `encryption_key_path` is set and the file doesn't exist: prompts for the key, then saves it
- If `encryption_key_path` is set and the file exists: loads the key automatically (no prompt)
- If `encryption_key_path` is not set: prompts for the key on every startup (not saved)
- If stdin is piped: reads the key from the pipe (never saved)
Superadmin Password Prompt:
- Only on first run after the encryption key is set
- Creates the superadmin account with username "boilstream"
- Subsequent runs: no prompt (password stored encrypted)
Automated Startup:
- For production deployments: set `encryption_key_path` to an existing file
- The server starts automatically after initial setup (no manual intervention)
Database Encryption
User databases are always encrypted when auth_server is enabled. The encryption key is required to decrypt users.duckdb and superadmin.duckdb.
Key Management:
- Development: `encryption_key_path: "encryption.key"` (auto-generated on first run)
- Production: store the key in a secrets manager and mount it as a file or pipe it via stdin
- High Security: no `encryption_key_path` (manual entry on every startup)
See PostgreSQL Web Authentication for detailed setup instructions.
OAuth Providers Configuration
Configure OAuth providers for PostgreSQL web authentication. This section is separate from FlightRPC JWT authentication.
GitHub OAuth
oauth_providers:
github:
client_id: "your-github-app-client-id"
client_secret: "your-github-app-client-secret"
redirect_uri: "https://your-domain/auth/callback"
# Organization access control
allowed_orgs:
- "your-company"
- "partner-org"
# GitHub team to database role mappings
team_role_mappings:
"your-company/platform-admins": "admin"
"your-company/data-engineers": "write"
"your-company/analysts": "read"
"partner-org/integration-team": "write"
# Audit team membership even without RBAC
audit_org_teams: false
| Field | Type | Default | Description |
|---|---|---|---|
client_id | string | null | GitHub OAuth App Client ID |
client_secret | string | null | GitHub OAuth App Client Secret |
redirect_uri | string | null | OAuth callback URL (must match GitHub app settings) |
allowed_orgs | array | [] | GitHub organizations allowed to login (empty = all) |
team_role_mappings | object | {} | Map GitHub teams to roles (admin/write/read) |
audit_org_teams | boolean | false | Fetch org/team membership for audit logging |
Role Mapping Format: "org-name/team-slug": "role"
- Admin role: Full database access (DDL + DML)
- Write role: Create topics, insert data, query
- Read role: Query data only (no DDL)
Google OAuth
oauth_providers:
google:
client_id: "your-client-id.apps.googleusercontent.com"
client_secret: "your-google-client-secret"
redirect_uri: "https://your-domain/auth/callback"
# Email domain restrictions
allowed_domains:
- "yourcompany.com"
- "partner.com"| Field | Type | Default | Description |
|---|---|---|---|
client_id | string | null | Google OAuth Client ID |
client_secret | string | null | Google OAuth Client Secret |
redirect_uri | string | null | OAuth callback URL |
allowed_domains | array | [] | Allowed email domains (empty = all domains) |
SAML SSO
oauth_providers:
saml:
- name: "aws-sso"
enabled: true
sp_entity_id: "https://your-domain"
sp_acs_url: "https://your-domain/auth/saml/acs"
sp_slo_url: "https://your-domain/auth/saml/logout"
idp_entity_id: "arn:aws:siam::123456789012:saml-provider/YourProvider"
idp_sso_url: "https://portal.sso.region.amazonaws.com/saml/assertion/..."
idp_certificate: |
-----BEGIN CERTIFICATE-----
...
-----END CERTIFICATE-----
sp_certificate: "/path/to/sp-cert.pem"
sp_private_key: "/path/to/sp-key.pem"
| Field | Type | Default | Description |
|---|---|---|---|
name | string | null | SAML provider identifier |
enabled | boolean | false | Enable this SAML provider |
sp_entity_id | string | null | Service Provider entity ID (your domain) |
sp_acs_url | string | null | Assertion Consumer Service URL |
idp_entity_id | string | null | Identity Provider entity ID |
idp_sso_url | string | null | Identity Provider SSO endpoint |
idp_certificate | string | null | IDP certificate (PEM format) |
sp_certificate | string | null | SP certificate path or PEM string |
sp_private_key | string | null | SP private key path or PEM string |
PostgreSQL Web Auth vs FlightRPC Auth
These OAuth providers are for PostgreSQL web authentication only (web UI login → temporary PostgreSQL credentials).
For FlightRPC/DuckDB Airport authentication, use the auth section with JWT providers (AWS Cognito, Azure AD, etc.).
See Authentication Systems for the distinction.
Complete Configuration Example:
# PostgreSQL Web Authentication
auth_server:
enabled: true
port: 443
session_ttl_hours: 8
users_db_path: "data/users.duckdb"
encryption_key_path: "/etc/boilstream/encryption.key"
webauthn_rp_id: "boilstream.example.com"
webauthn_rp_origin: "https://boilstream.example.com"
cors_allowed_origins:
- "https://boilstream.example.com"
users_backup_backend: "primary-s3"
users_backup_interval_seconds: 300
# Optional PGP encryption for deleted account emails (GDPR compliance)
email_encryption_pgp_public_key_path: "/etc/boilstream/pgp/public.asc"
# Or provide key directly:
# email_encryption_pgp_public_key: |
# -----BEGIN PGP PUBLIC KEY BLOCK-----
# ...
# -----END PGP PUBLIC KEY BLOCK-----
oauth_providers:
github:
client_id: "${GITHUB_CLIENT_ID}"
client_secret: "${GITHUB_CLIENT_SECRET}"
redirect_uri: "https://boilstream.example.com/auth/callback"
allowed_orgs: ["mycompany"]
team_role_mappings:
"mycompany/admins": "admin"
"mycompany/engineers": "write"
"mycompany/analysts": "read"
google:
client_id: "${GOOGLE_CLIENT_ID}"
client_secret: "${GOOGLE_CLIENT_SECRET}"
redirect_uri: "https://boilstream.example.com/auth/callback"
allowed_domains: ["mycompany.com"]See PostgreSQL Web Authentication for detailed setup instructions and usage examples.
Metrics Configuration
Configure metrics collection:
| Field | Type | Default | Description |
|---|---|---|---|
metrics.port | number | 8081 | Metrics server port |
metrics.flush_interval_ms | number | 1000 | Metrics flush interval |
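For example, to move the metrics endpoint off the default port (any free port works; 8081 is the default):
metrics:
  port: 9090
  flush_interval_ms: 1000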
PGWire Server Configuration
BoilStream includes a built-in PostgreSQL wire protocol server that enables BI tools like DBeaver, Tableau, and psql to connect directly to your streaming data through the standard PostgreSQL protocol.
| Field | Type | Default | Description |
|---|---|---|---|
pgwire.enabled | boolean | true | Enable PGWire PostgreSQL protocol server |
pgwire.port | number | 5432 | Port for PostgreSQL protocol connections |
pgwire.username | string | "boilstream" | Username for PostgreSQL authentication |
pgwire.password | string | "boilstream" | Password for PostgreSQL authentication |
pgwire.refresh_interval_seconds | number | 5 | Database refresh interval in seconds |
pgwire.initialization_sql | string | "" | SQL commands to execute on DuckDB init |
pgwire.tls.enabled | boolean | false | Enable TLS for PostgreSQL connections (Pro tier only) |
pgwire.tls.cert_path | string | null | Path to TLS certificate file (Pro tier only) |
pgwire.tls.key_path | string | null | Path to TLS private key file (Pro tier only) |
pgwire.tls.cert_pem | string | null | TLS certificate as PEM string (Pro tier only) |
pgwire.tls.key_pem | string | null | TLS private key as PEM string (Pro tier only) |
Key Features:
- Full PostgreSQL Protocol Support: Compatible with any PostgreSQL client
- Cursor Support: Handles large result sets efficiently through extended query protocol
- Text and Binary Encoding: Supports both text and binary data formats
- Prepared Statements: Full prepared statement support with parameter binding
- Query Cancellation: Standard PostgreSQL query cancellation support
- TLS Encryption: Optional TLS encryption for secure connections (Pro tier only)
Example Configuration:
# PostgreSQL Protocol Server
pgwire:
enabled: true
port: 5432
username: "analyst"
password: "secure_password"
refresh_interval_seconds: 10
initialization_sql: |
INSTALL icu;
LOAD icu;
SET timezone = 'UTC';
tls:
enabled: true # Pro tier only
cert_path: "/etc/ssl/certs/pgwire.crt" # Pro tier only
key_path: "/etc/ssl/private/pgwire.key" # Pro tier onlyIntegration with DuckDB Persistence:
The PGWire server automatically integrates with DuckDB persistence when enabled, providing:
- Live Query Access: Query streaming data through PostgreSQL protocol
- Cross-Topic Joins: Join data across different topics using standard SQL
- BI Tool Compatibility: Connect any PostgreSQL-compatible BI tool directly
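A sketch of the two sections enabled together, so that locally persisted topics are queryable over the PostgreSQL protocol (paths and ports as in the examples above):
duckdb_persistence:
  enabled: true
  storage_path: "/data/duckdb"
pgwire:
  enabled: true
  port: 5432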
Environment Variable Overrides:
# Enable PGWire server
PGWIRE_ENABLED=true
PGWIRE_PORT=5432
PGWIRE_USERNAME=analyst
PGWIRE_PASSWORD=secure_password
# TLS Configuration (Pro tier only)
PGWIRE_TLS_ENABLED=true
PGWIRE_TLS_CERT_PATH=/etc/ssl/certs/pgwire.crt
PGWIRE_TLS_KEY_PATH=/etc/ssl/private/pgwire.key
# Or use PEM strings directly (Pro tier only)
PGWIRE_TLS_CERT_PEM="-----BEGIN CERTIFICATE-----..."
PGWIRE_TLS_KEY_PEM="-----BEGIN PRIVATE KEY-----..."
See the PostgreSQL Interface Guide for detailed setup instructions and BI tool integration examples.
Logging Configuration
Configure logging levels:
| Field | Type | Default | Description |
|---|---|---|---|
logging.rust_log | string | "info" | Log level configuration |
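For example, the development configuration later on this page raises the level to debug:
logging:
  rust_log: "debug"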
Environment Variables (Advanced)
While the YAML configuration file is the recommended approach, environment variables can be used for specific overrides, particularly useful for:
- Secrets that shouldn't be stored in files
- Container deployments where environment injection is standard
- Quick testing of different values
Environment variables follow the pattern: uppercase field names joined with underscores.
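For instance, the overrides documented elsewhere on this page follow that pattern; the last mapping is an assumed example derived from the same rule rather than a documented variable:
# tls.disabled          -> TLS_DISABLED=true
# pgwire.tls.cert_path  -> PGWIRE_TLS_CERT_PATH=/etc/ssl/certs/pgwire.crt
# metrics.port          -> METRICS_PORT=8081   (assumed from the pattern)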
Recommendation
Use the YAML configuration file for most settings and environment variables only for secrets or deployment-specific overrides.
Common Overrides
# Override sensitive credentials (don't store in config.yaml)
export AWS_ACCESS_KEY_ID="your-key"
export AWS_SECRET_ACCESS_KEY="your-secret"
# Override storage bucket for different environments
export S3_BUCKET="production-bucket"Development vs Production
Development Configuration
The auto-generated config.yaml is already optimized for development. You can customize it further:
aws:
region: "us-east-1"
s3:
bucket: "my-dev-bucket"
storage:
backends:
- name: "local-filesystem"
backend_type: "filesystem"
enabled: true
primary: true
prefix: "./dev-storage"
# Optional: Add S3 for testing cloud integration
# - name: "dev-s3"
# backend_type: "s3"
# enabled: false
# primary: false
# endpoint: "http://localhost:9000"
# bucket: "dev-bucket"
# prefix: ""
# access_key: "minioadmin"
# secret_key: "minioadmin"
# use_path_style: true
server:
tokio_worker_threads: 16
tls:
disabled: true
auth:
providers: []
logging:
rust_log: "debug"Production Configuration
For production, copy and modify the auto-generated config:
aws:
region: "eu-west-1"
storage:
backends:
- name: "primary-s3"
backend_type: "s3"
enabled: true
primary: true
endpoint: "https://s3.amazonaws.com"
bucket: "my-production-bucket"
prefix: ""
access_key: "${AWS_ACCESS_KEY_ID}"
secret_key: "${AWS_SECRET_ACCESS_KEY}"
region: "eu-west-1"
use_path_style: false
- name: "backup-filesystem"
backend_type: "filesystem"
enabled: true
primary: false # Secondary for backup/audit
prefix: "/data/backup-storage"
server:
tokio_worker_threads: 16
processing:
data_processing_threads: 16
window_queue_capacity: 100000
rate_limiting:
max_requests: 50000000
burst_limit: 75000000
tls:
disabled: false
cert_path: "/etc/ssl/certs/server.crt"
key_path: "/etc/ssl/private/server.key"
auth:
providers: ["cognito"]
authorization_enabled: true
admin_groups: ["admin"]
logging:
rust_log: "info"Usage Examples
Basic Development Setup
# Create config file
cat > dev-config.yaml << EOF
aws:
region: "us-east-1"
storage:
backends:
- name: "dev-s3"
backend_type: "s3"
enabled: true
primary: true
endpoint: "http://localhost:9000"
bucket: "my-dev-bucket"
prefix: ""
access_key: "minioadmin"
secret_key: "minioadmin"
region: "us-east-1"
use_path_style: true
server:
logging:
rust_log: "debug"
EOF
# Run with config file
boilstream --config dev-config.yaml
Production with Environment Overrides
# Use production config but override bucket via environment
S3_BUCKET=production-bucket-2024 boilstream --config prod-config.yaml
Environment Variables Only
# No config file, all via environment
AWS_REGION=eu-west-1 \
S3_BUCKET=my-bucket \
TLS_DISABLED=true \
boilstream
Multi-Backend Examples
# Primary S3 + backup filesystem
STORAGE_BACKENDS="s3,filesystem" \
STORAGE_FILESYSTEM_PREFIX="/backup" \
S3_BUCKET=my-bucket \
boilstream
# Filesystem only for local development
STORAGE_BACKENDS="filesystem" \
STORAGE_FILESYSTEM_PREFIX="./local-storage" \
boilstream
# S3 + NoOp for performance testing
STORAGE_BACKENDS="s3,noop" \
S3_BUCKET=perf-test-bucket \
boilstream
Validation
BoilStream validates configuration on startup and will exit with an error if:
- Required fields are missing (e.g., `S3_BUCKET`)
- Invalid values are provided (e.g., port 0)
- Referenced files don't exist (e.g., TLS certificates)
Check the logs for detailed validation error messages.