
Configuration

BoilStream uses a YAML configuration file for all settings. On first run, it automatically generates a config.yaml file with sensible defaults, making it easy to get started.

Auto-Generated Configuration

When you run BoilStream for the first time:

bash
# First run - automatically generates config.yaml
./boilstream

This creates a config.yaml file in the current directory with:

  • Default ports for all interfaces
  • Local filesystem storage backend
  • Development-friendly settings (no TLS, no auth)
  • Sensible performance defaults

Using Configuration Files

Default Configuration

bash
# Uses config.yaml from current directory
./boilstream

Custom Configuration File

bash
# Use a specific configuration file
./boilstream --config production.yaml

# Or use environment variable
CONFIG_FILE=production.yaml ./boilstream

Configuration Priority

Settings are applied in this order (later overrides earlier):

  1. Built-in defaults
  2. Configuration file (YAML)
  3. Environment variables (for specific overrides)
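For example, a bucket defined in config.yaml can be overridden at launch with the S3_BUCKET environment variable described later in this page:

bash
# The value from config.yaml is used unless the environment variable is set
S3_BUCKET=production-bucket ./boilstream --config production.yaml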

Configuration File Structure

The config.yaml file is organized into logical sections. Here's a complete example with all available options:

yaml
# AWS Configuration
aws:
  region: "eu-west-1"
  # access_key_id: "your-access-key"  # Optional - can use AWS CLI/IAM roles
  # secret_access_key: "your-secret-key"  # Optional - can use AWS CLI/IAM roles
  https_conn_pool_size: 100

# DuckDB Persistence Configuration (Optional High-Performance Local Storage)
duckdb_persistence:
  enabled: true # Enable high-performance local DuckDB persistence (10M+ rows/s)
  storage_path: "/tmp/duckdb/topics" # Directory for shared DuckDB database files
  max_writers: 10 # Number of concurrent database writers for optimal performance

# Storage Configuration
storage:
  # Multiple storage backends can be configured simultaneously
  backends:
    - name: "primary-s3"
      backend_type: "s3"
      enabled: true
      primary: true # Primary backend - operations must succeed here
      # S3-specific configuration
      endpoint: "http://localhost:9000" # For MinIO/custom S3
      bucket: "ingestion-data"
      prefix: "/"
      access_key: "minioadmin"
      secret_key: "minioadmin"
      region: "us-east-1"
      use_path_style: true # Required for MinIO
      max_concurrent_uploads: 10
      upload_id_pool_capacity: 100
      max_retries: 3
      initial_backoff_ms: 100
      max_retry_attempts: 3
      flush_interval_ms: 250
      max_multipart_object_size: 104857600 # 100 MB
    - name: "backup-filesystem"
      backend_type: "filesystem"
      enabled: true
      primary: false # Secondary backend - failures are logged but not fatal
      # Filesystem-specific configuration
      prefix: "/tmp/storage"
    # - name: "debug-noop"
    #   backend_type: "noop"
    #   enabled: false
    #   primary: false  # For testing/benchmarking without actual storage

# Server Configuration
server:
  # tokio_worker_threads: 16  # Optional - defaults to system CPU count
  flight_thread_count: 1
  flight_base_port: 50050
  admin_flight_port: 50160
  consumer_flight_port: 50250

# Data Processing Configuration
processing:
  data_processing_threads: 8
  buffer_pool_max_size: 50
  window_queue_capacity: 30000
  window_ms: 10000
  include_metadata_columns: true
  schema_validation_enabled: true
  parquet:
    compression: "ZSTD"
    dictionary_enabled: true

# Rate Limiting Configuration
rate_limiting:
  disabled: false
  max_requests: 15000000
  burst_limit: 20000000
  global_limit: 150000000
  base_size_bytes: 4096

# TLS Configuration
tls:
  disabled: true # Disabled for development
  # cert_path: "/path/to/cert.pem"
  # key_path: "/path/to/key.pem"
  # cert_pem: "-----BEGIN CERTIFICATE-----\n..."
  # key_pem: "-----BEGIN PRIVATE KEY-----\n..."
  # grpc_default_ssl_roots_file_path: "/path/to/ca-certificates.crt"

# Authentication Configuration
auth:
  providers: [] # Empty for development - no auth
  authorization_enabled: false
  admin_groups: []
  read_only_groups: []
  cognito:
    # user_pool_id: "us-east-1_example"
    # region: "us-east-1"
    # audience: "client-id"
  azure:
    # tenant_id: "tenant-id"
    # client_id: "client-id"
    allow_multi_tenant: false
  gcp:
    # client_id: "client-id"
    # project_id: "project-id"
    require_workspace_domain: false
  auth0:
    # tenant: "your-tenant.auth0.com"
    # audience: "your-api-identifier"
    # groups_namespace: "https://your-app.com/groups"
  okta:
    # org_domain: "your-org.okta.com"
    # audience: "api://your-audience"
    # auth_server_id: "your-auth-server"

# Metrics Configuration
metrics:
  port: 8081
  flush_interval_ms: 1000

# Logging Configuration
logging:
  rust_log: "info"

Configuration Sections

AWS Configuration

Configure AWS credentials for S3 backends:

Field | Type | Default | Description
aws.region | string | "us-east-1" | AWS region
aws.access_key_id | string | null | AWS access key (optional)
aws.secret_access_key | string | null | AWS secret key (optional)
aws.https_conn_pool_size | number | 100 | HTTP connection pool size

Note: S3 configuration is now done per-backend in the storage.backends section.
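A minimal aws section mirroring the defaults above (credentials are omitted so AWS CLI or IAM role credentials are used):

yaml
aws:
  region: "eu-west-1"
  https_conn_pool_size: 100
  # access_key_id / secret_access_key are optional - omit to use AWS CLI/IAM roles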

DuckDB Persistence Configuration

BoilStream provides optional high-performance local DuckDB persistence alongside its diskless S3 pipeline. When enabled, data is simultaneously written to both S3 (diskless) and local DuckDB databases (shared across topics).

Field | Type | Default | Description
duckdb_persistence.enabled | boolean | false | Enable DuckDB persistence (10M+ rows/s)
duckdb_persistence.storage_path | string | "/tmp/duckdb/topics" | Directory for shared DuckDB database files
duckdb_persistence.max_writers | number | 10 | Number of concurrent database writers
duckdb_persistence.dry_run | boolean | false | Process Arrow data but skip actual writes
duckdb_persistence.super_dry_run | boolean | false | Completely skip DuckDB processing

DuckDB Persistence Benefits

High Performance:

  • 10+ million rows/second ingestion rate into local databases
  • Shared database files allow cross-topic queries and joins
  • No backup infrastructure needed - S3 provides automatic replication

Architecture Integration:

  • BI Tool Integration: PostgreSQL interface and FlightSQL for direct connectivity
  • Future Roadmap: Source for window queries and time-series analysis
  • Future Roadmap: Live queries over past N hours of ingested data

Example Configuration:

yaml
# High-performance dual storage: S3 (diskless) + DuckDB (local)
duckdb_persistence:
  enabled: true
  storage_path: "/data/duckdb"
  max_writers: 16 # Scale with CPU cores

# Continues to write to S3 backends simultaneously
storage:
  backends:
    - name: "primary-s3"
      backend_type: "s3"
      enabled: true
      # ... S3 configuration

DuckLake Configuration

BoilStream integrates with DuckLake: after a successful upload to storage, Parquet files are automatically registered in the configured DuckLake catalogs.

Field | Type | Default | Description
ducklake[].name | string | - | Unique identifier for this DuckLake catalog
ducklake[].data_path | string | - | S3 path where Parquet files are stored
ducklake[].attach | string | - | SQL statements for DuckLake attachment and setup
ducklake[].topics | array | all topics | Optional: specify which topics to include
ducklake[].reconciliation.on_startup | boolean | true | Run reconciliation when the application starts
ducklake[].reconciliation.interval_minutes | number | 60 | Check for missing files every N minutes
ducklake[].reconciliation.max_concurrent_registrations | number | 10 | Parallel registration limit

Example Configuration:

yaml
ducklake:
  - name: my_ducklake
    data_path: "s3://ingestion-data/"
    attach: |
      INSTALL ducklake; INSTALL postgres; INSTALL aws;
      LOAD ducklake; LOAD postgres; LOAD aws;
      CREATE SECRET s3_access (TYPE S3, KEY_ID 'key', SECRET 'secret');
      CREATE SECRET postgres (TYPE POSTGRES, HOST 'localhost', DATABASE 'catalog');
      CREATE SECRET pg_secret (TYPE DUCKLAKE, DATA_PATH 's3://ingestion-data/', 
                               METADATA_PARAMETERS MAP {'TYPE': 'postgres', 'SECRET': 'postgres'});
      ATTACH 'ducklake:pg_secret' AS my_ducklake;
    reconciliation:
      on_startup: true
      interval_minutes: 60

DuckLake Integration with Storage Backends:

Storage backends can automatically register files with DuckLake catalogs:

yaml
storage:
  backends:
    - name: "primary-s3"
      backend_type: "s3"
      # ... S3 configuration
      ducklake: ["my_ducklake"] # Auto-register files with this catalog

See the DuckLake Integration guide for detailed setup instructions.

Storage Configuration

BoilStream supports multiple concurrent storage backends, allowing you to write data to several destinations simultaneously. This enables scenarios like:

  • Primary + backup storage (S3 + filesystem)
  • Multi-cloud redundancy (S3 + another cloud provider)
  • Testing and auditing (production storage + debug/noop storage)
  • DuckLake integration (automatic catalog registration)

Storage Backends

Configure multiple storage backends in the storage.backends array:

Field | Type | Default | Description
storage.backends[].name | string | - | Unique identifier for this backend
storage.backends[].backend_type | string | - | Backend type: "s3", "filesystem", "noop"
storage.backends[].enabled | boolean | - | Whether this backend is active
storage.backends[].primary | boolean | - | If true, operations must succeed on this backend
storage.backends[].ducklake | array | [] | List of DuckLake catalogs to register files with

Backend Types:

  • s3 - AWS S3 or S3-compatible storage (MinIO, etc.)
  • filesystem - Local or network filesystem storage
  • noop - No-operation storage for testing/benchmarking

Primary vs Secondary Backends:

  • Primary backends (primary: true) must succeed for the operation to be considered successful
  • Secondary backends (primary: false) are best-effort; failures are logged but don't fail the operation

Backend-Specific Configuration

S3 Backend Configuration:

Field | Type | Default | Description
endpoint | string | null | S3 endpoint URL (required for S3 backends)
bucket | string | null | S3 bucket name (required for S3 backends)
prefix | string | "" | Base prefix for S3 uploads (optional)
access_key | string | null | S3 access key (required for S3 backends)
secret_key | string | null | S3 secret key (required for S3 backends)
region | string | "us-east-1" | AWS region (optional for S3 backends)
use_path_style | boolean | auto-detected | Use path-style addressing (auto-detects MinIO)
max_concurrent_uploads | number | 10 | Maximum concurrent uploads
upload_id_pool_capacity | number | 100 | Upload ID pool capacity
max_retries | number | 3 | Maximum retry attempts
initial_backoff_ms | number | 100 | Initial backoff in milliseconds
max_retry_attempts | number | 3 | Maximum retry attempts
flush_interval_ms | number | 250 | Data sync interval in milliseconds
max_multipart_object_size | number | 104857600 | Maximum multipart object size (100 MB)

Filesystem Backend Configuration:

Field | Type | Default | Description
prefix | string | "./storage" | Base directory path prefix for filesystem storage (required for filesystem backends)

MinIO Configuration

MinIO is supported through the S3 backend type. To configure MinIO, use backend_type: "s3" with these specific settings:

yaml
storage:
  backends:
    - name: "minio-storage"
      backend_type: "s3" # Use S3 backend type for MinIO
      enabled: true
      primary: true
      endpoint: "http://localhost:9000" # MinIO endpoint
      bucket: "your-bucket-name"
      prefix: "/"
      access_key: "minioadmin"
      secret_key: "minioadmin"
      region: "us-east-1"
      use_path_style: true # Required for MinIO

MinIO-specific notes:

  • Always set use_path_style: true for MinIO compatibility
  • Use backend_type: "s3" (not a separate MinIO type)
  • The system automatically detects MinIO endpoints and sets path-style addressing if not explicitly configured

Environment Variable Configuration

You can configure multiple backends via the STORAGE_BACKENDS environment variable:

bash
# Enable S3 and filesystem backends (S3 primary, filesystem secondary)
STORAGE_BACKENDS="s3,filesystem" boilstream

# Enable only filesystem storage
STORAGE_BACKENDS="filesystem" boilstream

# Enable S3, filesystem, and noop for testing
STORAGE_BACKENDS="s3,filesystem,noop" boilstream

Example Configurations

Primary S3 + Backup Filesystem:

yaml
storage:
  backends:
    - name: "primary-s3"
      backend_type: "s3"
      enabled: true
      primary: true
      endpoint: "https://s3.amazonaws.com"
      bucket: "my-production-bucket"
      prefix: ""
      access_key: "${AWS_ACCESS_KEY_ID}"
      secret_key: "${AWS_SECRET_ACCESS_KEY}"
      region: "us-east-1"
      use_path_style: false
    - name: "backup-filesystem"
      backend_type: "filesystem"
      enabled: true
      primary: false
      prefix: "/backup/storage"

MinIO Development Setup:

yaml
storage:
  backends:
    - name: "minio-dev"
      backend_type: "s3"
      enabled: true
      primary: true
      endpoint: "http://localhost:9000"
      bucket: "ingestion-data"
      prefix: "/"
      access_key: "minioadmin"
      secret_key: "minioadmin"
      region: "us-east-1"
      use_path_style: true # Required for MinIO

Development with NoOp for Performance Testing:

yaml
storage:
  backends:
    - name: "main-s3"
      backend_type: "s3"
      enabled: true
      primary: true
      endpoint: "http://localhost:9000"
      bucket: "test-bucket"
      prefix: ""
      access_key: "minioadmin"
      secret_key: "minioadmin"
      use_path_style: true
    - name: "perf-test"
      backend_type: "noop"
      enabled: true
      primary: false

Filesystem Only (Local Development):

yaml
storage:
  backends:
    - name: "local-dev"
      backend_type: "filesystem"
      enabled: true
      primary: true
      prefix: "./local-storage"

Server Configuration

Configure server ports and threading:

Field | Type | Default | Description
server.tokio_worker_threads | number | null | Number of Tokio worker threads
server.flight_thread_count | number | 1 | Number of FlightRPC threads
server.flight_base_port | number | 50050 | Base port for FlightRPC servers
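A minimal server section using the defaults from the complete example above:

yaml
server:
  # tokio_worker_threads: 16  # Optional - defaults to system CPU count
  flight_thread_count: 1
  flight_base_port: 50050
  admin_flight_port: 50160
  consumer_flight_port: 50250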

Processing Configuration

Configure data processing behavior:

Field | Type | Default | Description
processing.data_processing_threads | number | 8 | Number of data processing threads
processing.buffer_pool_max_size | number | 50 | Maximum buffer pool size
processing.window_queue_capacity | number | 30000 | Window queue capacity
processing.window_ms | number | 10000 | Window duration in milliseconds
processing.dry_run | boolean | false | Enable dry run mode
processing.include_metadata_columns | boolean | true | Include metadata columns
processing.schema_validation_enabled | boolean | true | Enable schema validation

Parquet Configuration

Field | Type | Default | Description
processing.parquet.compression | string | "ZSTD" | Parquet compression algorithm
processing.parquet.dictionary_enabled | boolean | true | Enable dictionary encoding
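Together, the processing and parquet settings might look like this (values drawn from the examples on this page):

yaml
processing:
  data_processing_threads: 16
  window_queue_capacity: 100000
  window_ms: 10000
  include_metadata_columns: true
  schema_validation_enabled: true
  parquet:
    compression: "ZSTD"
    dictionary_enabled: true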

Rate Limiting Configuration

Configure request rate limiting:

Field | Type | Default | Description
rate_limiting.disabled | boolean | false | Disable rate limiting
rate_limiting.max_requests | number | 15000000 | Max requests per second per producer
rate_limiting.burst_limit | number | 20000000 | Burst limit
rate_limiting.global_limit | number | 150000000 | Global requests per second
rate_limiting.base_size_bytes | number | 4096 | Base size for rate limiting tokens
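For example, raising the per-producer and burst limits for a high-throughput deployment (values from the production example later on this page):

yaml
rate_limiting:
  disabled: false
  max_requests: 50000000
  burst_limit: 75000000
  global_limit: 150000000
  base_size_bytes: 4096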

TLS Configuration

Configure TLS encryption (Pro tier only):

Field | Type | Default | Description
tls.disabled | boolean | false | Disable TLS
tls.cert_path | string | null | Path to certificate file
tls.key_path | string | null | Path to private key file
tls.cert_pem | string | null | Certificate as PEM string
tls.key_pem | string | null | Private key as PEM string
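A typical production tls section using certificate files (the paths are illustrative):

yaml
tls:
  disabled: false
  cert_path: "/etc/ssl/certs/server.crt"
  key_path: "/etc/ssl/private/server.key"
  # Alternatively, provide the certificate and key inline:
  # cert_pem: "-----BEGIN CERTIFICATE-----\n..."
  # key_pem: "-----BEGIN PRIVATE KEY-----\n..."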

Authentication Configuration

Configure authentication providers (Pro tier only):

Field | Type | Default | Description
auth.providers | array | [] | List of authentication providers
auth.authorization_enabled | boolean | false | Enable authorization
auth.admin_groups | array | [] | Admin group names
auth.read_only_groups | array | [] | Read-only group names
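For example, enabling Cognito-backed JWT authentication with authorization (the identifiers are placeholders taken from the commented defaults above):

yaml
auth:
  providers: ["cognito"]
  authorization_enabled: true
  admin_groups: ["admin"]
  cognito:
    user_pool_id: "us-east-1_example"
    region: "us-east-1"
    audience: "client-id"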

PostgreSQL Web Authentication Server

Configure the built-in web authentication server for PostgreSQL access:

Field | Type | Default | Description
auth_server.enabled | boolean | false | Enable PostgreSQL web authentication server
auth_server.port | number | 443 | HTTPS port for web UI
auth_server.static_dir | string | "static/auth" | Directory containing web UI files
auth_server.session_ttl_hours | number | 8 | PostgreSQL session TTL in hours
auth_server.users_db_path | string | "data/users.duckdb" | Path to encrypted users database
auth_server.encryption_key_path | string | null | Path to encryption key file (optional)
auth_server.tls_cert | string | null | Path to TLS certificate (falls back to http_ingestion)
auth_server.tls_key | string | null | Path to TLS private key (falls back to http_ingestion)
auth_server.webauthn_rp_id | string | "localhost" | WebAuthn Relying Party ID (domain)
auth_server.webauthn_rp_origin | string | "https://localhost" | WebAuthn origin URL
auth_server.cors_allowed_origins | array | [] | CORS allowed origins for web UI
auth_server.users_backup_backend | string | null | Storage backend name for automatic backups
auth_server.users_backup_interval_seconds | number | 60 | Minimum seconds between automatic backups
auth_server.users_backup_path | string | "auth/users.duckdb" | Path in storage backend for backup file
auth_server.email_encryption_pgp_public_key_path | string | null | Path to PGP public key for email encryption (optional)
auth_server.email_encryption_pgp_public_key | string | null | PGP public key as inline string (optional)

Initial Setup Flow

On first startup with auth_server.enabled: true:

  1. Encryption Key Prompt:

    • If encryption_key_path is set and file doesn't exist: Prompts for key, then saves it
    • If encryption_key_path is set and file exists: Loads key automatically (no prompt)
    • If encryption_key_path not set: Prompts for key every startup (not saved)
    • If stdin is piped: Reads key from pipe (never saved)
  2. Superadmin Password Prompt:

    • Only on first run after encryption key is set
    • Creates superadmin account with username "boilstream"
    • Subsequent runs: No prompt (password stored encrypted)
  3. Automated Startup:

    • For production deployments: Set encryption_key_path to existing file
    • Server starts automatically after initial setup (no manual intervention)

Database Encryption

User databases are always encrypted when auth_server is enabled. The encryption key is required to decrypt users.duckdb and superadmin.duckdb.

Key Management:

  • Development: encryption_key_path: "encryption.key" (auto-generates on first run)
  • Production: Store key in secrets manager, mount as file or pipe via stdin (see the sketch below)
  • High Security: No encryption_key_path (manual entry every startup)
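A minimal sketch of an unattended production startup that pipes the encryption key via stdin (the secrets path is an assumption):

bash
# Key is read from the pipe and never written to disk
cat /run/secrets/boilstream-encryption.key | ./boilstream --config production.yaml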

See PostgreSQL Web Authentication for detailed setup instructions.

OAuth Providers Configuration

Configure OAuth providers for PostgreSQL web authentication. This section is separate from FlightRPC JWT authentication.

GitHub OAuth

yaml
oauth_providers:
  github:
    client_id: "your-github-app-client-id"
    client_secret: "your-github-app-client-secret"
    redirect_uri: "https://your-domain/auth/callback"

    # Organization access control
    allowed_orgs:
      - "your-company"
      - "partner-org"

    # GitHub team to database role mappings
    team_role_mappings:
      "your-company/platform-admins": "admin"
      "your-company/data-engineers": "write"
      "your-company/analysts": "read"
      "partner-org/integration-team": "write"

    # Audit team membership even without RBAC
    audit_org_teams: false

Field | Type | Default | Description
client_id | string | null | GitHub OAuth App Client ID
client_secret | string | null | GitHub OAuth App Client Secret
redirect_uri | string | null | OAuth callback URL (must match GitHub app settings)
allowed_orgs | array | [] | GitHub organizations allowed to login (empty = all)
team_role_mappings | object | {} | Map GitHub teams to roles (admin/write/read)
audit_org_teams | boolean | false | Fetch org/team membership for audit logging

Role Mapping Format: "org-name/team-slug": "role"

  • Admin role: Full database access (DDL + DML)
  • Write role: Create topics, insert data, query
  • Read role: Query data only (no DDL)

Google OAuth

yaml
oauth_providers:
  google:
    client_id: "your-client-id.apps.googleusercontent.com"
    client_secret: "your-google-client-secret"
    redirect_uri: "https://your-domain/auth/callback"

    # Email domain restrictions
    allowed_domains:
      - "yourcompany.com"
      - "partner.com"

Field | Type | Default | Description
client_id | string | null | Google OAuth Client ID
client_secret | string | null | Google OAuth Client Secret
redirect_uri | string | null | OAuth callback URL
allowed_domains | array | [] | Allowed email domains (empty = all domains)

SAML SSO

yaml
oauth_providers:
  saml:
    - name: "aws-sso"
      enabled: true
      sp_entity_id: "https://your-domain"
      sp_acs_url: "https://your-domain/auth/saml/acs"
      sp_slo_url: "https://your-domain/auth/saml/logout"
      idp_entity_id: "arn:aws:iam::123456789012:saml-provider/YourProvider"
      idp_sso_url: "https://portal.sso.region.amazonaws.com/saml/assertion/..."
      idp_certificate: |
        -----BEGIN CERTIFICATE-----
        ...
        -----END CERTIFICATE-----
      sp_certificate: "/path/to/sp-cert.pem"
      sp_private_key: "/path/to/sp-key.pem"

Field | Type | Default | Description
name | string | null | SAML provider identifier
enabled | boolean | false | Enable this SAML provider
sp_entity_id | string | null | Service Provider entity ID (your domain)
sp_acs_url | string | null | Assertion Consumer Service URL
idp_entity_id | string | null | Identity Provider entity ID
idp_sso_url | string | null | Identity Provider SSO endpoint
idp_certificate | string | null | IdP certificate (PEM format)
sp_certificate | string | null | SP certificate path or PEM string
sp_private_key | string | null | SP private key path or PEM string

PostgreSQL Web Auth vs FlightRPC Auth

These OAuth providers are for PostgreSQL web authentication only (web UI login → temporary PostgreSQL credentials).

For FlightRPC/DuckDB Airport authentication, use the auth section with JWT providers (AWS Cognito, Azure AD, etc.).

See Authentication Systems for the distinction.

Complete Configuration Example:

yaml
# PostgreSQL Web Authentication
auth_server:
  enabled: true
  port: 443
  session_ttl_hours: 8
  users_db_path: "data/users.duckdb"
  encryption_key_path: "/etc/boilstream/encryption.key"
  webauthn_rp_id: "boilstream.example.com"
  webauthn_rp_origin: "https://boilstream.example.com"
  cors_allowed_origins:
    - "https://boilstream.example.com"
  users_backup_backend: "primary-s3"
  users_backup_interval_seconds: 300

  # Optional PGP encryption for deleted account emails (GDPR compliance)
  email_encryption_pgp_public_key_path: "/etc/boilstream/pgp/public.asc"
  # Or provide key directly:
  # email_encryption_pgp_public_key: |
  #   -----BEGIN PGP PUBLIC KEY BLOCK-----
  #   ...
  #   -----END PGP PUBLIC KEY BLOCK-----

oauth_providers:
  github:
    client_id: "${GITHUB_CLIENT_ID}"
    client_secret: "${GITHUB_CLIENT_SECRET}"
    redirect_uri: "https://boilstream.example.com/auth/callback"
    allowed_orgs: ["mycompany"]
    team_role_mappings:
      "mycompany/admins": "admin"
      "mycompany/engineers": "write"
      "mycompany/analysts": "read"

  google:
    client_id: "${GOOGLE_CLIENT_ID}"
    client_secret: "${GOOGLE_CLIENT_SECRET}"
    redirect_uri: "https://boilstream.example.com/auth/callback"
    allowed_domains: ["mycompany.com"]

See PostgreSQL Web Authentication for detailed setup instructions and usage examples.

Metrics Configuration

Configure metrics collection:

Field | Type | Default | Description
metrics.port | number | 8081 | Metrics server port
metrics.flush_interval_ms | number | 1000 | Metrics flush interval
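A minimal metrics section matching the defaults above:

yaml
metrics:
  port: 8081
  flush_interval_ms: 1000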

PGWire Server Configuration

BoilStream includes a built-in PostgreSQL wire protocol server that enables BI tools like DBeaver, Tableau, and psql to connect directly to your streaming data through the standard PostgreSQL protocol.

Field | Type | Default | Description
pgwire.enabled | boolean | true | Enable PGWire PostgreSQL protocol server
pgwire.port | number | 5432 | Port for PostgreSQL protocol connections
pgwire.username | string | "boilstream" | Username for PostgreSQL authentication
pgwire.password | string | "boilstream" | Password for PostgreSQL authentication
pgwire.refresh_interval_seconds | number | 5 | Database refresh interval in seconds
pgwire.initialization_sql | string | "" | SQL commands to execute on DuckDB init
pgwire.tls.enabled | boolean | false | Enable TLS for PostgreSQL connections (Pro tier only)
pgwire.tls.cert_path | string | null | Path to TLS certificate file (Pro tier only)
pgwire.tls.key_path | string | null | Path to TLS private key file (Pro tier only)
pgwire.tls.cert_pem | string | null | TLS certificate as PEM string (Pro tier only)
pgwire.tls.key_pem | string | null | TLS private key as PEM string (Pro tier only)

Key Features:

  • Full PostgreSQL Protocol Support: Compatible with any PostgreSQL client
  • Cursor Support: Handles large result sets efficiently through extended query protocol
  • Text and Binary Encoding: Supports both text and binary data formats
  • Prepared Statements: Full prepared statement support with parameter binding
  • Query Cancellation: Standard PostgreSQL query cancellation support
  • TLS Encryption: Optional TLS encryption for secure connections (Pro tier only)

Example Configuration:

yaml
# PostgreSQL Protocol Server
pgwire:
  enabled: true
  port: 5432
  username: "analyst"
  password: "secure_password"
  refresh_interval_seconds: 10
  initialization_sql: |
    INSTALL icu;
    LOAD icu;
    SET timezone = 'UTC';
  tls:
    enabled: true # Pro tier only
    cert_path: "/etc/ssl/certs/pgwire.crt" # Pro tier only
    key_path: "/etc/ssl/private/pgwire.key" # Pro tier only

Integration with DuckDB Persistence:

The PGWire server automatically integrates with DuckDB persistence when enabled, providing:

  • Live Query Access: Query streaming data through PostgreSQL protocol
  • Cross-Topic Joins: Join data across different topics using standard SQL
  • BI Tool Compatibility: Connect any PostgreSQL-compatible BI tool directly

Environment Variable Overrides:

bash
# Enable PGWire server
PGWIRE_ENABLED=true
PGWIRE_PORT=5432
PGWIRE_USERNAME=analyst
PGWIRE_PASSWORD=secure_password

# TLS Configuration (Pro tier only)
PGWIRE_TLS_ENABLED=true
PGWIRE_TLS_CERT_PATH=/etc/ssl/certs/pgwire.crt
PGWIRE_TLS_KEY_PATH=/etc/ssl/private/pgwire.key

# Or use PEM strings directly (Pro tier only)
PGWIRE_TLS_CERT_PEM="-----BEGIN CERTIFICATE-----..."
PGWIRE_TLS_KEY_PEM="-----BEGIN PRIVATE KEY-----..."

See the PostgreSQL Interface Guide for detailed setup instructions and BI tool integration examples.

Logging Configuration

Configure logging levels:

Field | Type | Default | Description
logging.rust_log | string | "info" | Log level configuration
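For example, the development configuration later in this page raises the level to debug:

yaml
logging:
  rust_log: "debug"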

Environment Variables (Advanced)

While the YAML configuration file is the recommended approach, environment variables can be used for specific overrides. They are particularly useful for:

  • Secrets that shouldn't be stored in files
  • Container deployments where environment injection is standard
  • Quick testing of different values

Environment variables follow the pattern: uppercase field names joined with underscores.
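For example (mappings taken from the overrides used elsewhere in this page):

bash
# config.yaml key         environment variable
# aws.region          ->  AWS_REGION
# tls.disabled        ->  TLS_DISABLED
# pgwire.tls.enabled  ->  PGWIRE_TLS_ENABLED
AWS_REGION=eu-west-1 TLS_DISABLED=true ./boilstream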

Recommendation

Use the YAML configuration file for most settings and environment variables only for secrets or deployment-specific overrides.

Common Overrides

bash
# Override sensitive credentials (don't store in config.yaml)
export AWS_ACCESS_KEY_ID="your-key"
export AWS_SECRET_ACCESS_KEY="your-secret"

# Override storage bucket for different environments
export S3_BUCKET="production-bucket"

Development vs Production

Development Configuration

The auto-generated config.yaml is already optimized for development. You can customize it further:

yaml
aws:
  region: "us-east-1"
  # S3 settings such as the bucket are configured per backend under storage.backends

storage:
  backends:
    - name: "local-filesystem"
      backend_type: "filesystem"
      enabled: true
      primary: true
      prefix: "./dev-storage"
    # Optional: Add S3 for testing cloud integration
    # - name: "dev-s3"
    #   backend_type: "s3"
    #   enabled: false
    #   primary: false
    #   endpoint: "http://localhost:9000"
    #   bucket: "dev-bucket"
    #   prefix: ""
    #   access_key: "minioadmin"
    #   secret_key: "minioadmin"
    #   use_path_style: true

server:
  tokio_worker_threads: 16

tls:
  disabled: true

auth:
  providers: []

logging:
  rust_log: "debug"

Production Configuration

For production, copy and modify the auto-generated config:

yaml
aws:
  region: "eu-west-1"

storage:
  backends:
    - name: "primary-s3"
      backend_type: "s3"
      enabled: true
      primary: true
      endpoint: "https://s3.amazonaws.com"
      bucket: "my-production-bucket"
      prefix: ""
      access_key: "${AWS_ACCESS_KEY_ID}"
      secret_key: "${AWS_SECRET_ACCESS_KEY}"
      region: "eu-west-1"
      use_path_style: false
    - name: "backup-filesystem"
      backend_type: "filesystem"
      enabled: true
      primary: false # Secondary for backup/audit
      prefix: "/data/backup-storage"

server:
  tokio_worker_threads: 16

processing:
  data_processing_threads: 16
  window_queue_capacity: 100000

rate_limiting:
  max_requests: 50000000
  burst_limit: 75000000

tls:
  disabled: false
  cert_path: "/etc/ssl/certs/server.crt"
  key_path: "/etc/ssl/private/server.key"

auth:
  providers: ["cognito"]
  authorization_enabled: true
  admin_groups: ["admin"]

logging:
  rust_log: "info"

Usage Examples

Basic Development Setup

bash
# Create config file
cat > dev-config.yaml << EOF
aws:
  region: "us-east-1"
storage:
  backends:
    - name: "dev-s3"
      backend_type: "s3"
      enabled: true
      primary: true
      endpoint: "http://localhost:9000"
      bucket: "my-dev-bucket"
      prefix: ""
      access_key: "minioadmin"
      secret_key: "minioadmin"
      region: "us-east-1"
      use_path_style: true
logging:
  rust_log: "debug"
EOF

# Run with config file
boilstream --config dev-config.yaml

Production with Environment Overrides

bash
# Use production config but override bucket via environment
S3_BUCKET=production-bucket-2024 boilstream --config prod-config.yaml

Environment Variables Only

bash
# No config file, all via environment
AWS_REGION=eu-west-1 \
S3_BUCKET=my-bucket \
TLS_DISABLED=true \
boilstream

Multi-Backend Examples

bash
# Primary S3 + backup filesystem
STORAGE_BACKENDS="s3,filesystem" \
STORAGE_FILESYSTEM_PREFIX="/backup" \
S3_BUCKET=my-bucket \
boilstream

# Filesystem only for local development
STORAGE_BACKENDS="filesystem" \
STORAGE_FILESYSTEM_PREFIX="./local-storage" \
boilstream

# S3 + NoOp for performance testing
STORAGE_BACKENDS="s3,noop" \
S3_BUCKET=perf-test-bucket \
boilstream

Validation

BoilStream validates configuration on startup and will exit with an error if:

  • Required fields are missing (e.g., S3_BUCKET)
  • Invalid values are provided (e.g., port 0)
  • Referenced files don't exist (e.g., TLS certificates)

Check the logs for detailed validation error messages.