PGWire Server
BoilStream includes a built-in PostgreSQL wire protocol server that enables any PostgreSQL-compatible client to connect directly to your streaming data. This allows BI tools like DBeaver, Tableau, Power BI, and command-line tools like psql to query your real-time data using familiar PostgreSQL syntax.
Overview
The PGWire server acts as a bridge between PostgreSQL clients and DuckDB, providing:
- Full PostgreSQL Protocol Compatibility: Works with any PostgreSQL client
- Real-time Data Access: Query streaming data as it arrives
- Cross-Topic Analysis: Join data across different topics using standard SQL
- High Performance: Leverages DuckDB's analytical query engine
- Enterprise Ready: TLS encryption (Pro tier), authentication, and query cancellation
Key Features
Protocol Support
- Simple Query Protocol: Standard SQL queries with immediate execution
- Extended Query Protocol: Prepared statements with parameter binding
- Cursor Support: Efficient handling of large result sets
- Text and Binary Encoding: Full support for both data formats
Data Types
- Comprehensive Type Mapping: All PostgreSQL data types supported
- UUID Support: Native UUID handling with proper metadata
- Array Types: Full array support including complex nested arrays
- JSON Support: Query JSON data with PostgreSQL JSON functions
Advanced Features
- Prepared Statements: Full parameter binding and type inference
- Query Cancellation: Standard PostgreSQL CancelRequest support
- Connection Pooling: Efficient per-connection resource management
- Automatic Schema Discovery: Dynamic database and table discovery
Quick Start
Basic Configuration
Enable the PGWire server in your configuration:
# Enable PGWire server
pgwire:
enabled: true
port: 5432
username: "boilstream"
password: "boilstream"
Environment Variables
# Enable PGWire server
PGWIRE_ENABLED=true
PGWIRE_PORT=5432
PGWIRE_USERNAME=boilstream
PGWIRE_PASSWORD=boilstream
# Start BoilStream
boilstream --config your-config.yaml
Connecting with BI Tools
DBeaver (Recommended)
BoilStream has been extensively tested with DBeaver, providing full schema browsing and query execution capabilities.
Connection Setup:
- Create a new PostgreSQL connection
- Set the following parameters:
- Host:
localhost
(or your server address) - Port:
5432
(or your configured port) - Database:
boilstream
- Username:
boilstream
(or your configured username) - Password:
boilstream
(or your configured password)
- Host:
Features Available:
- Schema browsing and table discovery
- Query execution with syntax highlighting
- Data export and visualization
- Prepared statement support
- Large result set handling with cursors
psql Command Line
Connect using the standard PostgreSQL command-line client:
# Basic connection
psql -h localhost -p 5432 -U boilstream -d boilstream
# With password prompt
psql -h localhost -p 5432 -U boilstream -d boilstream -W
# Connection string format
psql "postgresql://boilstream:boilstream@localhost:5432/boilstream"
JDBC/ODBC Connections
JDBC Connection String:
String url = "jdbc:postgresql://localhost:5432/boilstream";
Properties props = new Properties();
props.setProperty("user", "boilstream");
props.setProperty("password", "boilstream");
Connection conn = DriverManager.getConnection(url, props);
ODBC Connection String:
Driver=PostgreSQL Unicode;Server=localhost;Port=5432;Database=boilstream;Uid=boilstream;Pwd=boilstream;
Querying Your Data
Basic Queries
Once connected, you can query your streaming data using standard SQL:
-- List all available topics (tables)
SELECT table_name FROM information_schema.tables
WHERE table_schema = 'public';
-- Query a specific topic
SELECT * FROM your_topic_name
ORDER BY event_time DESC
LIMIT 100;
-- Aggregate analysis
SELECT
date_trunc('hour', event_time) as hour,
count(*) as event_count,
avg(value) as avg_value
FROM sensor_data
WHERE event_time >= now() - interval '24 hours'
GROUP BY hour
ORDER BY hour;
Cross-Topic Joins
Join data across different topics:
-- Join sensor data with device metadata
SELECT
s.device_id,
d.device_name,
s.temperature,
s.humidity,
s.event_time
FROM sensor_readings s
JOIN device_metadata d ON s.device_id = d.device_id
WHERE s.event_time >= now() - interval '1 hour'
ORDER BY s.event_time DESC;
Advanced Analysis
-- Time-series analysis with window functions
SELECT
device_id,
temperature,
event_time,
avg(temperature) OVER (
PARTITION BY device_id
ORDER BY event_time
ROWS BETWEEN 10 PRECEDING AND CURRENT ROW
) as moving_avg
FROM sensor_readings
WHERE event_time >= now() - interval '6 hours'
ORDER BY device_id, event_time;
TLS Configuration
Pro Tier Feature
TLS encryption for the PGWire server is available in the Pro tier only. Free tier users can connect without TLS encryption.
Free Tier Behavior:
- TLS configuration is ignored in free tier builds
- Warning messages are logged if TLS environment variables are set
- All connections use unencrypted PostgreSQL protocol
- Upgrade to Pro tier for TLS encryption support
Enable TLS
For production deployments with Pro tier, enable TLS encryption:
pgwire:
enabled: true
port: 5432
username: "analyst"
password: "secure_password"
tls:
enabled: true
cert_path: "/etc/ssl/certs/pgwire.crt"
key_path: "/etc/ssl/private/pgwire.key"
Generate Self-Signed Certificates (Development)
# Generate self-signed certificate for development
mkdir -p certs
openssl req -x509 -newkey rsa:4096 \
-keyout certs/pgwire.key \
-out certs/pgwire.crt \
-days 365 -nodes \
-subj "/CN=localhost"
# Update configuration
PGWIRE_TLS_ENABLED=true
PGWIRE_TLS_CERT_PATH=certs/pgwire.crt
PGWIRE_TLS_KEY_PATH=certs/pgwire.key
Connecting with TLS
psql with TLS:
psql "postgresql://boilstream:boilstream@localhost:5432/boilstream?sslmode=require"
DBeaver with TLS:
- Enable SSL in connection settings
- Set SSL Mode to "require"
- For self-signed certificates, disable certificate validation
Database Integration
DuckDB Persistence Integration
The PGWire server automatically integrates with DuckDB persistence:
# Enable both DuckDB persistence and PGWire
duckdb_persistence:
enabled: true
storage_path: "/data/duckdb"
max_writers: 16
pgwire:
enabled: true
port: 5432
initialization_sql: |
-- Custom initialization SQL
INSTALL icu;
LOAD icu;
SET timezone = 'UTC';
Automatic Database Discovery
The server automatically discovers and attaches:
- Topic Databases: Individual databases for each topic
- Unified Views: Cross-database views for each topic
- System Catalogs: PostgreSQL-compatible system tables
Database Refresh
The server periodically refreshes database connections:
- Automatic Discovery: New databases are automatically attached
- View Updates: Topic views are refreshed to include new data
- Configurable Interval: Set via
refresh_interval_seconds
Performance Optimization
Connection Management
- Per-Connection Processors: Each connection gets its own DuckDB processor
- Resource Cleanup: Automatic cleanup of cursors and prepared statements
- Connection Pooling: Efficient resource management
Query Optimization
-- Use LIMIT for exploratory queries
SELECT * FROM large_topic LIMIT 1000;
-- Use time-based filtering
SELECT * FROM events
WHERE event_time >= now() - interval '1 hour';
-- Leverage DuckDB's analytical capabilities
SELECT
date_trunc('minute', event_time) as minute,
count(*) as events,
approx_quantile(value, 0.5) as median_value
FROM sensor_data
WHERE event_time >= now() - interval '24 hours'
GROUP BY minute
ORDER BY minute;
Memory Management
- Cursor Streaming: Large result sets are streamed efficiently
- Batch Processing: Respects DuckDB's internal batching (2048 rows)
- Automatic Cleanup: Connection state is cleaned up properly
Troubleshooting
Common Connection Issues
Connection Refused:
# Check if PGWire server is enabled
grep -i pgwire your-config.yaml
# Check server logs
tail -f boilstream.log | grep -i pgwire
Authentication Failures:
# Verify credentials
psql -h localhost -p 5432 -U boilstream -d boilstream
# Check configuration
echo "Username: $PGWIRE_USERNAME"
echo "Password: $PGWIRE_PASSWORD"
Query Issues
No Tables Found:
-- Check if DuckDB persistence is enabled
SELECT name FROM duckdb_databases();
-- Check available schemas
SELECT schema_name FROM information_schema.schemata;
-- Force refresh (if needed)
-- The server automatically refreshes every refresh_interval_seconds
Type Conversion Errors:
-- Check column types
SELECT column_name, data_type
FROM information_schema.columns
WHERE table_name = 'your_table';
-- Use explicit casting if needed
SELECT cast(column_name as integer) FROM your_table;
Performance Issues
Slow Queries:
-- Check query execution plan
EXPLAIN SELECT * FROM your_table WHERE condition;
-- Add appropriate filters
SELECT * FROM large_table
WHERE event_time >= now() - interval '1 hour'
LIMIT 10000;
Memory Issues:
- Reduce result set size with LIMIT
- Use streaming with cursors for large results
- Filter data by time ranges
Security Considerations
Authentication
- Change Default Credentials: Always use strong passwords in production
- Separate Authentication: PGWire uses separate auth from main BoilStream auth
- Connection Limits: Consider implementing connection limits
Network Security
- TLS Encryption: Always use TLS in production
- Firewall Rules: Restrict access to PGWire port
- Network Segmentation: Consider running on internal networks only
Data Security
- Read-Only Access: PGWire provides read-only access to data
- Time-Based Filtering: Implement time-based access controls
- Audit Logging: Monitor query patterns and access
Advanced Configuration
Custom Initialization SQL
pgwire:
initialization_sql: |
-- Install extensions
INSTALL icu;
INSTALL json;
LOAD icu;
LOAD json;
-- Set timezone
SET timezone = 'UTC';
-- Configure memory settings
SET memory_limit = '8GB';
SET threads = 8;
-- Create custom functions
CREATE MACRO format_timestamp(ts) AS strftime(ts, '%Y-%m-%d %H:%M:%S');
Connection Pooling
pgwire:
enabled: true
port: 5432
refresh_interval_seconds: 30 # Longer intervals for production
username: "analyst"
password: "secure_password"
Multi-Port Configuration
# Different ports for different purposes
pgwire:
enabled: true
port: 5432 # Main port
# You can run multiple instances with different configs
# by using different configuration files
Integration Examples
Grafana Integration
# Grafana PostgreSQL data source configuration
apiVersion: 1
datasources:
- name: BoilStream
type: postgres
url: localhost:5432
database: boilstream
user: boilstream
password: boilstream
sslmode: disable
Tableau Integration
- Connect to PostgreSQL data source
- Server:
localhost
- Port:
5432
- Database:
boilstream
- Username/Password: As configured
Power BI Integration
Use the PostgreSQL connector with your BoilStream PGWire server configuration.
Best Practices
Query Patterns
-- Efficient time-based queries
SELECT * FROM events
WHERE event_time >= now() - interval '1 hour'
AND event_type = 'sensor_reading';
-- Aggregation with time bucketing
SELECT
date_trunc('minute', event_time) as time_bucket,
count(*) as event_count
FROM events
WHERE event_time >= now() - interval '24 hours'
GROUP BY time_bucket
ORDER BY time_bucket;
Connection Management
- Reuse Connections: Don't create new connections for each query
- Close Connections: Properly close connections when done
- Monitor Usage: Track connection patterns and usage
Data Modeling
- Time-Based Partitioning: Leverage time-based filtering
- Appropriate Data Types: Use efficient data types
- Indexing Strategy: Understand DuckDB's indexing capabilities
This comprehensive guide covers all aspects of using the PGWire server with BoilStream. The server provides a powerful way to connect your favorite BI tools and analytical applications directly to your streaming data infrastructure.