Web Auth GUI

BoilStream includes a built-in Web Auth GUI at https://your-domain/auth for user authentication, credential vending, and administration.

User features:

  • Authenticate via OAuth (GitHub, Google), SAML SSO (Entra ID, Okta), or email/password
  • Obtain PostgreSQL credentials for DuckLake access
  • Obtain bootstrap tokens for DuckDB extension authentication
  • Obtain JWT tokens for the HTTP/2 ingestion APIs
  • Select a role (for users with multiple roles)

Admin features (Superadmin GUI):

  • S3 bucket and access role management
  • User role assignments
  • SAML/OAuth provider configuration
  • DuckLake catalog provisioning

Quick Start

1. Enable Auth Server

yaml
# config.yaml
auth_server:
  enabled: true
  port: 443
  # Optional: path to save/load encryption key
  encryption_key_path: "encryption.key"
  session_ttl_hours: 8
  tls_cert: "/path/to/cert.pem"
  tls_key: "/path/to/key.pem"

2. First Startup - Database Encryption

BoilStream encrypts user and superadmin databases with AES-256-GCM (industry standard authenticated encryption).

First run behavior:

bash
./boilstream --config config.yaml

# Prompt 1: Encryption key for user/superadmin databases
Enter encryption key (press Enter to generate random):
# [Press Enter to auto-generate 32-byte key]
# ✓ Generated encryption key
# ✓ Saved to: encryption.key

# Prompt 2: Superadmin password
Set superadmin password (min 12 characters):
# [Type password - not echoed]
Confirm password:
# ✓ Superadmin account created (username: boilstream)

Key storage options:

Config                                   Behavior
encryption_key_path: "encryption.key"    Auto-generates a key on first run and saves it to the file. Subsequent starts load it from the file (no prompt).
No encryption_key_path specified         Prompts on every startup. The key is never saved to disk (highest security).
Pipe via stdin                           echo "$KEY" | ./boilstream (for CI/CD, K8s secrets)

Critical

Losing the encryption key means permanent data loss. User and superadmin databases cannot be decrypted.

  • Back up encryption.key to secure location (vault, secrets manager)
  • User database: data/users.duckdb (encrypted)
  • Superadmin database: data/superadmin.duckdb (encrypted)

3. Configure Authentication

SAML SSO: Configure via the Superadmin GUI at /auth → SAML Providers (see the Entra ID guide).

GitHub OAuth:

yaml
oauth_providers:
  github:
    client_id: "${GITHUB_CLIENT_ID}"
    client_secret: "${GITHUB_CLIENT_SECRET}"
    allowed_orgs: ["your-org"]

Google OAuth:

yaml
oauth_providers:
  google:
    client_id: "${GOOGLE_CLIENT_ID}"
    client_secret: "${GOOGLE_CLIENT_SECRET}"
    allowed_domains: ["yourcompany.com"]

Email/Password: Enabled when no SAML provider is configured. Users can sign up at /auth.

SAML SSO Mode

When SAML SSO is enabled, local email/password authentication is disabled for regular users. Only the superadmin account can use local login. New user registration is also disabled - users must authenticate via SAML.
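The rule above can be read as a small decision function. The following is a minimal sketch only: the function and field names (`localLoginPolicy`, `samlEnabled`, `isSuperadmin`) are hypothetical and are not part of BoilStream's API. It models only local login and self-signup, since the text does not say how OAuth behaves in SAML mode.

```javascript
// Sketch of the SAML SSO mode rule. All names here are illustrative assumptions.
function localLoginPolicy({ samlEnabled, isSuperadmin }) {
  if (!samlEnabled) {
    // No SAML provider configured: email/password login and signup are available.
    return { localLogin: true, signup: true };
  }
  // SAML configured: only the superadmin keeps local login; signup is disabled.
  return { localLogin: Boolean(isSuperadmin), signup: false };
}

console.log(localLoginPolicy({ samlEnabled: true, isSuperadmin: false }));
```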

4. User Access

  1. Navigate to https://your-domain/auth
  2. Log in with the chosen method (SAML/GitHub/Google/Email)
  3. Dashboard shows:
    • PostgreSQL credentials (host, port, username, password)
    • JWT token (for HTTP ingestion API)

Authorization (RBAC)

BoilStream uses a hierarchical RBAC model managed through the Superadmin GUI at /auth. Access control is based on three core components:

Core Components

Component         Description
S3 Buckets        Storage locations for DuckLake data
Access Roles      Link cloud IAM credentials to S3 buckets
Role Assignments  Map users or SAML groups to access roles

Access Modes

DuckLake catalogs support hierarchical access levels:

Mode    Permissions
reader  Read-only access to catalog data
writer  Read + write access
admin   Full control over catalog
owner   Catalog creator (full access)
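One way to picture a hierarchical check is to rank the modes and compare ranks. This is an illustrative sketch, not BoilStream's implementation: the ordering reader < writer < admin < owner is an assumption inferred from the descriptions above, and `modeSatisfies` is a hypothetical helper name.

```javascript
// Assumed ordering, lowest to highest privilege (not confirmed by the source).
const ACCESS_MODES = ["reader", "writer", "admin", "owner"];

// Returns true when the granted mode is at least as privileged as the required one.
function modeSatisfies(granted, required) {
  const g = ACCESS_MODES.indexOf(granted);
  const r = ACCESS_MODES.indexOf(required);
  if (g === -1 || r === -1) {
    throw new Error(`Unknown access mode: ${g === -1 ? granted : required}`);
  }
  return g >= r;
}

console.log(modeSatisfies("writer", "reader"));
```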

Setting Up Access (Superadmin GUI)

  1. Create S3 Bucket (/auth → S3 Buckets → Add)
    • Configure bucket name, region, cloud provider
    • Link to cloud account credentials
  2. Create Access Role (/auth → Access Roles → Add)
    • Link cloud IAM role ARN to S3 bucket
    • Optionally map to PostgreSQL user for write access
  3. Assign Roles to Users (/auth → Role Assignments)
    • Assign to individual users by email
    • Assign to SAML groups for automatic provisioning

SAML Group Mapping

When users authenticate via SAML (Entra ID, Okta), their group memberships are automatically mapped to role assignments:

  • User logs in via SAML SSO
  • SAML assertion includes group claims
  • BoilStream matches groups to role assignments
  • User receives access to DuckLakes linked to their roles
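The matching step above can be sketched as a lookup over role assignments. The record shape (`kind`, `subject`, `role`), the sample data, and the `resolveRoles` helper are all hypothetical. Real assignments are managed in the Superadmin GUI, and the actual matching rules may differ.

```javascript
// Hypothetical role-assignment records: each targets a user email or a SAML group.
const assignments = [
  { kind: "user", subject: "alice@company.com", role: "analytics-writer" },
  { kind: "group", subject: "data-engineers", role: "lake-admin" },
  { kind: "group", subject: "analysts", role: "analytics-reader" },
];

// Collect the roles that apply to a user, given the group claims from the SAML assertion.
function resolveRoles(email, samlGroups, roleAssignments) {
  const groups = new Set(samlGroups);
  const roles = roleAssignments
    .filter((a) =>
      (a.kind === "user" && a.subject === email) ||
      (a.kind === "group" && groups.has(a.subject)))
    .map((a) => a.role);
  return [...new Set(roles)].sort(); // de-duplicate, stable order
}

console.log(resolveRoles("alice@company.com", ["analysts"], assignments));
```

In this sketch a user assigned both directly and through a group receives the union of the matching roles.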

DuckLake Catalog Grants (Coming Soon)

Planned Feature

Fine-grained catalog grants are not yet available in the Superadmin GUI. The backend support exists but admin API endpoints are pending implementation.

For fine-grained access control, superadmins will be able to grant specific users access to individual catalogs:

  • Grant access: Specify user email, access mode, optional expiration
  • Revoke access: Remove previously granted access
  • Time-limited grants: Automatic expiration for temporary access
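Since this feature is not yet available, the following is purely a sketch of how a time-limited grant could be evaluated. The grant fields (`email`, `mode`, `expiresAt`, `revoked`) and the `isGrantActive` helper are assumptions for illustration, not a published schema.

```javascript
// Hypothetical grant record; field names are illustrative only.
const exampleGrant = {
  email: "contractor@company.com",
  mode: "reader",
  expiresAt: "2030-01-01T00:00:00Z", // null would mean "no expiration"
  revoked: false,
};

// A grant is active if it has not been revoked and has not passed its expiration.
function isGrantActive(grant, now = new Date()) {
  if (grant.revoked) return false;
  if (grant.expiresAt == null) return true;
  return now.getTime() < new Date(grant.expiresAt).getTime();
}

console.log(isGrantActive(exampleGrant, new Date("2029-06-01T00:00:00Z")));
```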

Security Features

Encrypted Storage (Industry Standard)

  • AES-256-GCM authenticated encryption for users.duckdb and superadmin.duckdb
  • 32-byte encryption key generated on first run (or manually provided)
  • Key storage options: Save to file path, manual entry, or pipe via stdin
  • No plaintext credentials - All passwords stored as SCRAM-SHA-256 hashes

MFA Support

  • TOTP: Authenticator apps (Google Authenticator, Authy)
  • Passkeys: WebAuthn (Face ID, Touch ID, YubiKey)
  • Backup codes: 10 one-time codes

Session Security

  • HttpOnly secure cookies
  • CSRF protection
  • Configurable TTL (default: 8 hours)
  • Automatic cleanup

Password Security

  • SCRAM-SHA-256 hashing (PBKDF2 with HMAC-SHA-256, 4096 iterations)
  • Minimum 12 characters
  • Salted hashes

Using Credentials

PostgreSQL Clients

bash
# psql
psql -h your-domain -p 5432 -U user@company.com -d boilstream

# DBeaver: New Connection → PostgreSQL → Use credentials from dashboard
# Power BI: Get Data → PostgreSQL → Enter connection details
# Tableau: Connect → PostgreSQL → Use dashboard credentials

HTTP Ingestion (JWT Token)

javascript
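// Flechette.js browser ingestion
const JWT_TOKEN = '...'; // From dashboard

// arrowData: Arrow IPC stream bytes (e.g. a Uint8Array built with Flechette)
fetch('https://your-domain/ingest/events', {
  method: 'POST',
  headers: {
    'Authorization': `Bearer ${JWT_TOKEN}`,
    'Content-Type': 'application/vnd.apache.arrow.stream'
  },
  body: arrowData
})
  .then((res) => {
    // fetch() resolves on HTTP error statuses too, so check explicitly
    if (!res.ok) throw new Error(`Ingestion failed: HTTP ${res.status}`);
  })
  .catch((err) => console.error(err));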
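Before sending data it can help to check whether the token has already expired. The sketch below decodes the payload of a compact JWT and inspects the standard `exp` claim (RFC 7519). Whether BoilStream's ingestion tokens carry an `exp` claim is an assumption, and the helper names are hypothetical. The signature is not verified here; that remains the server's job.

```javascript
// Decode the (unverified) payload segment of a compact JWT.
function jwtPayload(token) {
  const parts = token.split(".");
  if (parts.length !== 3) throw new Error("Not a compact JWT");
  // Convert base64url to standard base64 and restore padding.
  const b64 = parts[1].replace(/-/g, "+").replace(/_/g, "/");
  const padded = b64 + "=".repeat((4 - (b64.length % 4)) % 4);
  // atob yields a binary string; adequate for ASCII-only claims such as exp.
  return JSON.parse(atob(padded));
}

// True when the token has an exp claim that is at or before the given time.
function isExpired(token, nowSeconds = Math.floor(Date.now() / 1000)) {
  const { exp } = jwtPayload(token);
  return typeof exp === "number" && nowSeconds >= exp;
}
```

A client could call `isExpired(JWT_TOKEN)` before each upload and prompt the user to fetch a fresh token from the dashboard when it returns true.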

Troubleshooting

"Encryption key required": Specify encryption_key_path in config or pipe key via stdin

GitHub OAuth "Redirect URI mismatch": Verify redirect URI in GitHub App matches https://your-domain/auth/github/callback

"User not authorized": Check user has been assigned an access role via Superadmin GUI (/auth → Role Assignments)

PostgreSQL "Invalid password": Credentials expire after session_ttl_hours; log in again to get fresh credentials

"Cannot create DuckLake": User's assigned role must have write access (linked to a PostgreSQL user in the access role configuration)
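For the expired-credentials case, a client can estimate when a fresh login will be needed. This sketch assumes credentials are issued at login time and expire exactly session_ttl_hours later. The helper names are hypothetical and the server remains the source of truth.

```javascript
// Estimated expiry of dashboard credentials, given the login time and configured TTL.
function credentialExpiry(issuedAt, sessionTtlHours = 8) {
  return new Date(issuedAt.getTime() + sessionTtlHours * 60 * 60 * 1000);
}

// True when the credentials should be refreshed by logging in again.
function needsRelogin(issuedAt, sessionTtlHours = 8, now = new Date()) {
  return now.getTime() >= credentialExpiry(issuedAt, sessionTtlHours).getTime();
}

const loginTime = new Date("2024-01-01T00:00:00Z");
console.log(credentialExpiry(loginTime, 8).toISOString()); // 2024-01-01T08:00:00.000Z
```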

Configuration Reference

yaml
auth_server:
  enabled: true
  port: 443
  encryption_key_path: "encryption.key"  # Optional: if omitted, the key is prompted for on every startup
  session_ttl_hours: 8  # Session lifetime
  tls_cert: "/path/to/cert.pem"
  tls_key: "/path/to/key.pem"
  app_domain: "boilstream.company.com"  # For OAuth callbacks

oauth_providers:
  github:
    client_id: "${GITHUB_CLIENT_ID}"
    client_secret: "${GITHUB_CLIENT_SECRET}"
    redirect_uri: "https://your-domain/auth/github/callback"
    allowed_orgs: ["org1", "org2"]  # Optional: restrict to specific orgs

  google:
    client_id: "${GOOGLE_CLIENT_ID}"
    client_secret: "${GOOGLE_CLIENT_SECRET}"
    redirect_uri: "https://your-domain/auth/google/callback"
    allowed_domains: ["yourcompany.com"]  # Optional: restrict to specific domains
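Both callback URLs in this reference follow the pattern https://<domain>/auth/<provider>/callback. A small helper can build the expected value from app_domain and compare it with the configured redirect_uri, which is a quick way to catch a "Redirect URI mismatch" before deploying. The helper names are hypothetical, and the strict string comparison is a deliberately conservative choice rather than a statement of how each provider matches URIs.

```javascript
// Build the callback URL using the pattern shown in the configuration above.
function expectedRedirectUri(appDomain, provider) {
  return `https://${appDomain}/auth/${provider}/callback`;
}

// Strict comparison between the configured value and the expected pattern.
function redirectUriMatches(configuredUri, appDomain, provider) {
  return configuredUri === expectedRedirectUri(appDomain, provider);
}

console.log(expectedRedirectUri("boilstream.company.com", "github"));
// https://boilstream.company.com/auth/github/callback
```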

Role-Based Access Control

RBAC configuration (S3 buckets, access roles, role assignments) is managed through the Superadmin GUI at /auth, not via YAML configuration. See the Authorization (RBAC) section above.

Next Steps