PostgreSQL Interface
BoilStream provides a PostgreSQL interface that lets any PostgreSQL-compatible client connect directly to your streaming data. The implementation includes full catalog support and both text and binary type encodings, making it compatible with enterprise BI tools such as Tableau and Power BI.
Overview
The PostgreSQL interface provides a complete PostgreSQL server implementation (except COPY protocol):
- Full PostgreSQL Wire Protocol: Complete implementation of Simple and Extended query protocols
- PostgreSQL System Catalogs: Full pg_catalog and information_schema support for BI tools
- Text and Binary Type Support: All PostgreSQL data types with both encoding formats
- Power BI & Tableau Compatible: Extensive catalog and type system support required by enterprise BI tools
- Real-time Data Access: Query streaming data as it arrives
- Cross-Topic Analysis: Join data across different topics using standard SQL
- High Performance: Leverages DuckDB's analytical query engine
Authentication
All PostgreSQL connections authenticate through the Web Auth GUI.
Getting Credentials
- Navigate to https://your-server/auth
- Log in with your OAuth provider (GitHub, Google), SAML SSO, or email/password
- Your PostgreSQL credentials are displayed on the dashboard:
- Username: Your email address
- Password: Temporary password (copy from dashboard)
Credential Types
The Web Auth GUI provides several credential types:
- PostgreSQL credentials: For DuckLake/PGWire access (this page)
- Ingest tokens: For Kafka SASL authentication
- Bootstrap tokens: For DuckDB extension authentication
- HTTP/2 tokens: For Arrow ingestion API
Superadmin Access
The superadmin account is created during first startup:
- Username: boilstream
- Password: Set during initial setup
Superadmin credentials are also managed through the Web Auth GUI at /auth.
Key Features
Protocol Support
- Simple Query Protocol: Standard SQL queries with immediate execution
- Extended Query Protocol: Prepared statements with parameter binding
- Cursor Support: Efficient handling of large result sets
- Text and Binary Encoding: Full support for both data formats
Data Types
- Comprehensive Type Mapping: All PostgreSQL data types supported
- UUID Support: Native UUID handling with proper metadata
- Array Types: Full array support including complex nested arrays
- JSON Support: Query JSON data with PostgreSQL JSON functions
Advanced Features
- Prepared Statements: Full parameter binding and type inference
- Query Cancellation: Standard PostgreSQL CancelRequest support
- Connection Pooling: Efficient per-connection resource management
- Automatic Schema Discovery: Dynamic database and table discovery
Quick Start
Basic Configuration
Enable the PGWire server in your configuration:
pgwire:
  enabled: true
  port: 5432
No Static Credentials
Unlike traditional PostgreSQL, BoilStream doesn't use static username/password in config. All users authenticate through the Web Auth GUI.
Environment Variables
# Enable PGWire server
PGWIRE_ENABLED=true
PGWIRE_PORT=5432
# Start BoilStream
boilstream --config your-config.yaml
BI Tool Compatibility
Why BoilStream Works with Enterprise BI Tools
BoilStream's PostgreSQL interface is specifically designed to meet the extensive requirements of enterprise BI tools:
- Complete System Catalogs: Both pg_catalog and information_schema are fully implemented
- Type System Compatibility: All PostgreSQL types with proper OID mappings
- Text and Binary Encodings: Both formats supported for all data types
- Metadata Discovery: Full support for schema, table, and column introspection
- Prepared Statements: Complete parameter binding and type inference
- Cursor Support: Efficient handling of large result sets
This comprehensive implementation ensures compatibility with:
- Power BI: Full DirectQuery and Import modes
- Tableau: Complete data source connectivity
- DBeaver: Full IDE functionality
- Grafana: PostgreSQL data source plugin
- Metabase: Native PostgreSQL connector
- Any JDBC/ODBC PostgreSQL driver
Connecting with BI Tools
Power BI
Power BI requires extensive PostgreSQL catalog support, which BoilStream fully provides:
- Get Data → Database → PostgreSQL database
- Enter connection details:
  - Server: localhost:5432
  - Database: your_ducklake_name
- Enter credentials from Web Auth GUI:
  - Username: Your email address
  - Password: Temporary password from dashboard
- Choose DirectQuery for real-time data or Import for snapshots
- Browse and select tables from the navigator
Power BI specific features supported:
- Schema discovery via information_schema
- Type mapping for all PostgreSQL types
- Query folding for performance
- Incremental refresh policies
Tableau
Tableau's PostgreSQL connector works seamlessly with BoilStream:
- Connect → To a Server → PostgreSQL
- Enter connection details:
  - Server: localhost
  - Port: 5432
  - Database: your_ducklake_name
  - Authentication: Username/Password from Web Auth GUI
- Browse schemas and tables in the data source tab
Tableau specific features supported:
- Initial SQL support
- Custom SQL queries
- Live connections and extracts
- Calculated fields with PostgreSQL functions
DBeaver (Recommended for Managing BoilStream)
DBeaver is the recommended tool for managing BoilStream since all administrative operations are performed through SQL commands. BoilStream has been extensively tested with DBeaver, providing full schema browsing and query execution capabilities.
Why DBeaver is Recommended:
- Topic Management: Create and manage topics using SQL DDL commands
- Schema Evolution: Modify topic schemas through ALTER TABLE statements
- Data Exploration: Browse topics, view schemas, and query data
- SQL IDE Features: Syntax highlighting, auto-completion, query history
- Visual Tools: ER diagrams, data export, query explain plans
- Cross-Platform: Works on Windows, macOS, and Linux
Connection Setup:
- Create a new PostgreSQL connection
- Set the following parameters:
  - Host: localhost (or your server address)
  - Port: 5432 (or your configured port)
  - Database: your_ducklake_name
  - Username: Your email from Web Auth GUI
  - Password: Temporary password from Web Auth GUI
Features Available:
- Schema browsing and table discovery
- Query execution with syntax highlighting
- Data export and visualization
- Prepared statement support
- Large result set handling with cursors
psql Command Line
Connect using the standard PostgreSQL command-line client:
# Get your credentials from Web Auth GUI first, then:
# Basic connection with your email as username
psql -h localhost -p 5432 -U your.email@company.com -d your_ducklake
# With password prompt
psql -h localhost -p 5432 -U your.email@company.com -d your_ducklake -W
# Connection string format
psql "postgresql://your.email%40company.com:YOUR_PASSWORD@localhost:5432/your_ducklake"
URL Encoding
When using email addresses in connection strings, encode the @ symbol as %40.
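Rather than encoding by hand, the connection URL can be built with the standard library. A minimal sketch (the email, password, and database name are placeholders, not BoilStream defaults):

```python
from urllib.parse import quote

def pg_url(user: str, password: str, host: str, port: int, database: str) -> str:
    """Build a PostgreSQL connection URL, percent-encoding the
    username and password so characters like '@' become '%40'."""
    return (
        f"postgresql://{quote(user, safe='')}:{quote(password, safe='')}"
        f"@{host}:{port}/{database}"
    )

# '@' in the email is encoded as '%40' in the resulting URL
url = pg_url("your.email@company.com", "s3cret", "localhost", 5432, "your_ducklake")
```

The same URL works with psql, SQLAlchemy, or any libpq-based driver.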
JDBC/ODBC Connections
JDBC Connection String:
// Get credentials from Web Auth GUI
String url = "jdbc:postgresql://localhost:5432/your_ducklake";
Properties props = new Properties();
props.setProperty("user", "your.email@company.com");
props.setProperty("password", "password_from_dashboard");
Connection conn = DriverManager.getConnection(url, props);
ODBC Connection String:
Driver=PostgreSQL Unicode;Server=localhost;Port=5432;Database=your_ducklake;Uid=your.email@company.com;Pwd=password_from_dashboard;
Managing BoilStream with SQL
All BoilStream administrative operations are performed through SQL commands via the PostgreSQL interface. This makes DBeaver or any PostgreSQL client a complete management interface.
Topic Management
-- Create a new topic with schema definition
CREATE TABLE my_topic (
timestamp TIMESTAMP,
event_type VARCHAR,
user_id BIGINT,
value DOUBLE
);
-- Create a derived view for streaming transformations (filtering and column transformations)
CREATE TABLE filtered_events AS
SELECT
timestamp,
event_type,
user_id,
value * 1.1 as adjusted_value
FROM my_topic
WHERE event_type = 'purchase';
-- Drop a topic
DROP TABLE my_topic;
-- List all topics (through DuckDB)
SHOW ALL TABLES;
Schema Discovery
-- View table schema via information_schema
SELECT column_name, data_type, is_nullable
FROM information_schema.columns
WHERE table_name = 'my_topic';
-- List all tables
SELECT table_name, table_schema
FROM information_schema.tables
WHERE table_schema = 'public';
-- Check table metadata via pg_catalog
SELECT * FROM pg_catalog.pg_class
WHERE relname = 'my_topic';
Querying Your Data
Basic Queries
Once connected, you can query your streaming data using standard SQL:
-- List all available topics (tables)
SELECT table_name FROM information_schema.tables
WHERE table_schema = 'public';
-- Query a specific topic
SELECT * FROM your_topic_name
ORDER BY event_time DESC
LIMIT 100;
-- Aggregate analysis
SELECT
date_trunc('hour', event_time) as hour,
count(*) as event_count,
avg(value) as avg_value
FROM sensor_data
WHERE event_time >= now() - interval '24 hours'
GROUP BY hour
ORDER BY hour;
Cross-Topic Joins
Join data across different topics:
-- Join sensor data with device metadata
SELECT
s.device_id,
d.device_name,
s.temperature,
s.humidity,
s.event_time
FROM sensor_readings s
JOIN device_metadata d ON s.device_id = d.device_id
WHERE s.event_time >= now() - interval '1 hour'
ORDER BY s.event_time DESC;
Advanced Analysis
-- Time-series analysis with window functions
SELECT
device_id,
temperature,
event_time,
avg(temperature) OVER (
PARTITION BY device_id
ORDER BY event_time
ROWS BETWEEN 10 PRECEDING AND CURRENT ROW
) as moving_avg
FROM sensor_readings
WHERE event_time >= now() - interval '6 hours'
ORDER BY device_id, event_time;
TLS Configuration
Pro Tier Feature
TLS encryption for the PGWire server is available in the Pro tier only. Free tier users can connect without TLS encryption.
Free Tier Behavior:
- TLS configuration is ignored in free tier builds
- Warning messages are logged if TLS environment variables are set
- All connections use unencrypted PostgreSQL protocol
- Upgrade to Pro tier for TLS encryption support
Enable TLS
For production deployments with Pro tier, enable TLS encryption:
pgwire:
  enabled: true
  port: 5432
  tls:
    enabled: true
    cert_path: "/etc/ssl/certs/pgwire.crt"
    key_path: "/etc/ssl/private/pgwire.key"
Generate Self-Signed Certificates (Development)
# Generate self-signed certificate for development
mkdir -p certs
openssl req -x509 -newkey rsa:4096 \
-keyout certs/pgwire.key \
-out certs/pgwire.crt \
-days 365 -nodes \
-subj "/CN=localhost"
# Update configuration
PGWIRE_TLS_ENABLED=true
PGWIRE_TLS_CERT_PATH=certs/pgwire.crt
PGWIRE_TLS_KEY_PATH=certs/pgwire.key
Connecting with TLS
psql with TLS:
psql "postgresql://your.email%40company.com:PASSWORD@localhost:5432/your_ducklake?sslmode=require"
DBeaver with TLS:
- Enable SSL in connection settings
- Set SSL Mode to "require"
- For self-signed certificates, disable certificate validation
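Programmatic clients can request TLS the same way psql does, via `sslmode` in the DSN. A sketch of assembling a libpq keyword/value DSN with quoting for values that contain spaces (all values here are placeholders; any libpq-based driver such as psycopg2 accepts this format):

```python
def libpq_dsn(**params: str) -> str:
    """Assemble a libpq keyword/value DSN, single-quoting values
    that contain spaces, quotes, or backslashes as libpq requires."""
    def fmt(v: str) -> str:
        if any(c in v for c in " '\\"):
            return "'" + v.replace("\\", "\\\\").replace("'", "\\'") + "'"
        return v
    return " ".join(f"{k}={fmt(v)}" for k, v in params.items())

dsn = libpq_dsn(
    host="localhost",
    port="5432",
    dbname="your_ducklake",
    user="your.email@company.com",
    password="password from dashboard",  # contains spaces, so it gets quoted
    sslmode="require",                   # refuse unencrypted connections
)
```

With `sslmode=require` the driver refuses to fall back to plaintext; use `verify-full` in production when the server certificate is signed by a trusted CA.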
Database Integration
DuckDB Persistence Integration
The PGWire server automatically integrates with DuckDB persistence:
# Enable both DuckDB persistence and PGWire
duckdb_persistence:
  enabled: true
  storage_path: "/data/duckdb"
  max_writers: 16
pgwire:
  enabled: true
  port: 5432
  initialization_sql: |
    -- Custom initialization SQL
    INSTALL icu;
    LOAD icu;
    SET timezone = 'UTC';
Automatic Database Discovery
The server automatically discovers and attaches:
- Topic Databases: Individual databases for each topic
- Unified Views: Cross-database views for each topic
- System Catalogs: PostgreSQL-compatible system tables
Database Refresh
The server periodically refreshes database connections:
- Automatic Discovery: New databases are automatically attached
- View Updates: Topic views are refreshed to include new data
- Configurable Interval: Set via refresh_interval_seconds
Performance Optimization
Connection Management
- Per-Connection Processors: Each connection gets its own DuckDB processor
- Resource Cleanup: Automatic cleanup of cursors and prepared statements
- Connection Pooling: Efficient resource management
Query Optimization
-- Use LIMIT for exploratory queries
SELECT * FROM large_topic LIMIT 1000;
-- Use time-based filtering
SELECT * FROM events
WHERE event_time >= now() - interval '1 hour';
-- Leverage DuckDB's analytical capabilities
SELECT
date_trunc('minute', event_time) as minute,
count(*) as events,
approx_quantile(value, 0.5) as median_value
FROM sensor_data
WHERE event_time >= now() - interval '24 hours'
GROUP BY minute
ORDER BY minute;
Memory Management
- Cursor Streaming: Large result sets are streamed efficiently
- Batch Processing: Respects DuckDB's internal batching (2048 rows)
- Automatic Cleanup: Connection state is cleaned up properly
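The batch size above (2048 rows) can be mirrored on the client side when consuming large result sets. A sketch of chunked iteration that holds only one batch in memory at a time (the row source here is a stand-in for a driver cursor):

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")

def in_batches(rows: Iterable[T], batch_size: int = 2048) -> Iterator[List[T]]:
    """Yield successive fixed-size batches from any row iterator,
    so only one batch is materialized at a time."""
    it = iter(rows)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch

# e.g. a 5000-row result arrives as batches of 2048, 2048, and 904 rows
sizes = [len(b) for b in in_batches(range(5000))]
```

Most PostgreSQL drivers expose the same idea natively (e.g. server-side cursors with a fetch size), which pairs well with the cursor streaming described above.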
Troubleshooting
Common Connection Issues
Connection Refused:
# Check if PGWire server is enabled
grep -i pgwire your-config.yaml
# Check server logs
tail -f boilstream.log | grep -i pgwire
Authentication Failures:
# Verify you're using credentials from Web Auth GUI
# Navigate to https://your-server/auth and copy fresh credentials
# Check that your session hasn't expired
# Credentials have a TTL (default 8 hours)
Query Issues
No Tables Found:
-- Check if DuckDB persistence is enabled
SELECT name FROM duckdb_databases();
-- Check available schemas
SELECT schema_name FROM information_schema.schemata;
-- Force refresh (if needed)
-- The server automatically refreshes every refresh_interval_seconds
Type Conversion Errors:
-- Check column types
SELECT column_name, data_type
FROM information_schema.columns
WHERE table_name = 'your_table';
-- Use explicit casting if needed
SELECT cast(column_name as integer) FROM your_table;
Performance Issues
Slow Queries:
-- Check query execution plan
EXPLAIN SELECT * FROM your_table WHERE condition;
-- Add appropriate filters
SELECT * FROM large_table
WHERE event_time >= now() - interval '1 hour'
LIMIT 10000;
Memory Issues:
- Reduce result set size with LIMIT
- Use streaming with cursors for large results
- Filter data by time ranges
Security Considerations
Authentication
- Web Auth GUI: All credentials vended through secure web interface
- OAuth/SAML Support: Enterprise SSO integration
- Temporary Passwords: Credentials expire based on session TTL
- No Static Credentials: No hardcoded passwords in configuration
Network Security
- TLS Encryption: Always use TLS in production
- Firewall Rules: Restrict access to PGWire port
- Network Segmentation: Consider running on internal networks only
Data Security
- Role-Based Access: Users see only data they're authorized to access
- Time-Based Filtering: Implement time-based access controls
- Audit Logging: Monitor query patterns and access
Advanced Configuration
Custom Initialization SQL
pgwire:
  initialization_sql: |
    -- Install extensions
    INSTALL icu;
    INSTALL json;
    LOAD icu;
    LOAD json;
    -- Set timezone
    SET timezone = 'UTC';
    -- Configure memory settings
    SET memory_limit = '8GB';
    SET threads = 8;
    -- Create custom functions
    CREATE MACRO format_timestamp(ts) AS strftime(ts, '%Y-%m-%d %H:%M:%S');
Connection Settings
pgwire:
  enabled: true
  port: 5432
  refresh_interval_seconds: 30   # Longer intervals for production
  query_timeout_seconds: 300     # 5 minute query timeout
  max_connections: 100           # Connection limit
  idle_timeout_seconds: 900      # 15 minute idle timeout
Integration Examples
Grafana Integration
# Grafana PostgreSQL data source configuration
apiVersion: 1
datasources:
  - name: BoilStream
    type: postgres
    url: localhost:5432
    database: your_ducklake
    user: your.email@company.com   # From Web Auth GUI
    password: $GRAFANA_PG_PASSWORD # Store securely
    sslmode: require
Tableau Integration
- Connect to PostgreSQL data source
- Server: localhost
- Port: 5432
- Database: your_ducklake
- Username/Password: From Web Auth GUI
Power BI Integration
Use the PostgreSQL connector with credentials from the Web Auth GUI.
Best Practices
Query Patterns
-- Efficient time-based queries
SELECT * FROM events
WHERE event_time >= now() - interval '1 hour'
AND event_type = 'sensor_reading';
-- Aggregation with time bucketing
SELECT
date_trunc('minute', event_time) as time_bucket,
count(*) as event_count
FROM events
WHERE event_time >= now() - interval '24 hours'
GROUP BY time_bucket
ORDER BY time_bucket;
Connection Management
- Reuse Connections: Don't create new connections for each query
- Close Connections: Properly close connections when done
- Refresh Credentials: Get new credentials before TTL expires
- Monitor Usage: Track connection patterns and usage
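The reuse advice above can be sketched as a tiny pool. This is illustrative only, not a BoilStream API: the connect callable is injected, and a real pool would also close idle connections on shutdown and enforce a size limit.

```python
from contextlib import contextmanager
from typing import Any, Callable, List

class ConnectionPool:
    """Minimal reuse pool: hand out an idle connection if one exists,
    otherwise open a new one; return it to the pool on release."""
    def __init__(self, connect: Callable[[], Any]) -> None:
        self._connect = connect
        self._idle: List[Any] = []

    @contextmanager
    def connection(self):
        conn = self._idle.pop() if self._idle else self._connect()
        try:
            yield conn
        finally:
            self._idle.append(conn)  # reuse instead of reconnecting

# demonstrate reuse with a counting stand-in for the real driver connect
opened = 0
def fake_connect():
    global opened
    opened += 1
    return object()

pool = ConnectionPool(fake_connect)
with pool.connection():
    pass
with pool.connection():
    pass  # reuses the first connection; no second connect happens
```

Production code would typically reach for an established pooler (e.g. a driver's built-in pool or PgBouncer) rather than hand-rolling this.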
Data Modeling
- Time-Based Partitioning: Leverage time-based filtering
- Appropriate Data Types: Use efficient data types
- Indexing Strategy: Understand DuckDB's indexing capabilities
This guide covers all aspects of using the PGWire server with BoilStream. The server lets you connect your favorite BI tools and analytical applications directly to your streaming data infrastructure.