PGWire Server

BoilStream includes a built-in PostgreSQL wire protocol server that lets any PostgreSQL-compatible client connect directly to your streaming data. BI tools such as DBeaver, Tableau, and Power BI, as well as command-line tools like psql, can query your real-time data using familiar PostgreSQL syntax.

Overview

The PGWire server acts as a bridge between PostgreSQL clients and DuckDB, providing:

  • Full PostgreSQL Protocol Compatibility: Works with any PostgreSQL client
  • Real-time Data Access: Query streaming data as it arrives
  • Cross-Topic Analysis: Join data across different topics using standard SQL
  • High Performance: Leverages DuckDB's analytical query engine
  • Enterprise Ready: TLS encryption (Pro tier), authentication, and query cancellation

Key Features

Protocol Support

  • Simple Query Protocol: Standard SQL queries with immediate execution
  • Extended Query Protocol: Prepared statements with parameter binding (see the JDBC sketch after this list)
  • Cursor Support: Efficient handling of large result sets
  • Text and Binary Encoding: Full support for both data formats
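
For example, parameter binding over the extended query protocol is what a JDBC client exercises when it prepares a statement. A minimal sketch, reusing the sensor_data table from the query examples later in this guide:

java
import java.sql.*;

// Prepared statement -> extended query protocol: parse once, bind parameters, execute
try (Connection conn = DriverManager.getConnection(
         "jdbc:postgresql://localhost:5432/boilstream", "boilstream", "boilstream");
     PreparedStatement ps = conn.prepareStatement(
         "SELECT * FROM sensor_data WHERE value > ? LIMIT 100")) {
    ps.setDouble(1, 42.0);  // bound parameter; its type is inferred on the server
    try (ResultSet rs = ps.executeQuery()) {
        while (rs.next()) {
            System.out.println(rs.getDouble("value"));
        }
    }
}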

Data Types

  • Comprehensive Type Mapping: All PostgreSQL data types supported
  • UUID Support: Native UUID handling with proper metadata
  • Array Types: Full array support including complex nested arrays (see the JDBC sketch after this list)
  • JSON Support: Query JSON data with PostgreSQL JSON functions
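
From a JDBC client, UUID and array columns map onto the driver's standard accessors. A minimal sketch; the table name and its columns (id UUID, tags TEXT[]) are hypothetical:

java
import java.sql.*;
import java.util.UUID;

// Hypothetical table: events_typed(id UUID, tags TEXT[])
try (Connection conn = DriverManager.getConnection(
         "jdbc:postgresql://localhost:5432/boilstream", "boilstream", "boilstream");
     Statement st = conn.createStatement();
     ResultSet rs = st.executeQuery("SELECT id, tags FROM events_typed LIMIT 10")) {
    while (rs.next()) {
        UUID id = rs.getObject("id", UUID.class);                   // native UUID mapping
        String[] tags = (String[]) rs.getArray("tags").getArray();  // array column
        System.out.println(id + " " + java.util.Arrays.toString(tags));
    }
}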

Advanced Features

  • Prepared Statements: Full parameter binding and type inference
  • Query Cancellation: Standard PostgreSQL CancelRequest support (see the JDBC sketch after this list)
  • Connection Pooling: Efficient per-connection resource management
  • Automatic Schema Discovery: Dynamic database and table discovery
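
Cancellation is exposed through the JDBC driver's standard cancel call, which sends a PostgreSQL CancelRequest. A minimal sketch, assuming conn is an open connection and using the large_topic table from the examples later in this guide:

java
import java.sql.*;

// Cancel a long-running query from another thread
try (Statement st = conn.createStatement()) {
    Thread canceller = new Thread(() -> {
        try {
            Thread.sleep(5_000);   // let the query run for 5 seconds
            st.cancel();           // sends a standard PostgreSQL CancelRequest
        } catch (SQLException | InterruptedException ignored) {}
    });
    canceller.start();
    try (ResultSet rs = st.executeQuery("SELECT * FROM large_topic")) {
        while (rs.next()) { /* consume rows */ }
    } catch (SQLException e) {
        System.out.println("Query cancelled: " + e.getMessage());
    }
}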

Quick Start

Basic Configuration

Enable the PGWire server in your configuration:

yaml
# Enable PGWire server
pgwire:
  enabled: true
  port: 5432
  username: "boilstream"
  password: "boilstream"

Environment Variables

bash
# Enable PGWire server
PGWIRE_ENABLED=true
PGWIRE_PORT=5432
PGWIRE_USERNAME=boilstream
PGWIRE_PASSWORD=boilstream

# Start BoilStream
boilstream --config your-config.yaml

Connecting with BI Tools

BoilStream has been extensively tested with DBeaver, providing full schema browsing and query execution capabilities.

Connection Setup:

  1. Create a new PostgreSQL connection
  2. Set the following parameters:
    • Host: localhost (or your server address)
    • Port: 5432 (or your configured port)
    • Database: boilstream
    • Username: boilstream (or your configured username)
    • Password: boilstream (or your configured password)

Features Available:

  • Schema browsing and table discovery
  • Query execution with syntax highlighting
  • Data export and visualization
  • Prepared statement support
  • Large result set handling with cursors

psql Command Line

Connect using the standard PostgreSQL command-line client:

bash
# Basic connection
psql -h localhost -p 5432 -U boilstream -d boilstream

# With password prompt
psql -h localhost -p 5432 -U boilstream -d boilstream -W

# Connection string format
psql "postgresql://boilstream:boilstream@localhost:5432/boilstream"

JDBC/ODBC Connections

JDBC Connection String:

java
import java.sql.*;
import java.util.Properties;

String url = "jdbc:postgresql://localhost:5432/boilstream";
Properties props = new Properties();
props.setProperty("user", "boilstream");
props.setProperty("password", "boilstream");
Connection conn = DriverManager.getConnection(url, props);
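
Once connected, queries go through the standard JDBC API, for example:

java
// List the topics exposed as tables in the public schema
try (Statement st = conn.createStatement();
     ResultSet rs = st.executeQuery(
         "SELECT table_name FROM information_schema.tables WHERE table_schema = 'public'")) {
    while (rs.next()) {
        System.out.println(rs.getString("table_name"));
    }
}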

ODBC Connection String:

Driver=PostgreSQL Unicode;Server=localhost;Port=5432;Database=boilstream;Uid=boilstream;Pwd=boilstream;

Querying Your Data

Basic Queries

Once connected, you can query your streaming data using standard SQL:

sql
-- List all available topics (tables)
SELECT table_name FROM information_schema.tables 
WHERE table_schema = 'public';

-- Query a specific topic
SELECT * FROM your_topic_name 
ORDER BY event_time DESC 
LIMIT 100;

-- Aggregate analysis
SELECT 
    date_trunc('hour', event_time) as hour,
    count(*) as event_count,
    avg(value) as avg_value
FROM sensor_data 
WHERE event_time >= now() - interval '24 hours'
GROUP BY hour
ORDER BY hour;

Cross-Topic Joins

Join data across different topics:

sql
-- Join sensor data with device metadata
SELECT 
    s.device_id,
    d.device_name,
    s.temperature,
    s.humidity,
    s.event_time
FROM sensor_readings s
JOIN device_metadata d ON s.device_id = d.device_id
WHERE s.event_time >= now() - interval '1 hour'
ORDER BY s.event_time DESC;

Advanced Analysis

sql
-- Time-series analysis with window functions
SELECT 
    device_id,
    temperature,
    event_time,
    avg(temperature) OVER (
        PARTITION BY device_id 
        ORDER BY event_time 
        ROWS BETWEEN 10 PRECEDING AND CURRENT ROW
    ) as moving_avg
FROM sensor_readings
WHERE event_time >= now() - interval '6 hours'
ORDER BY device_id, event_time;

TLS Configuration

Pro Tier Feature

TLS encryption for the PGWire server is available in the Pro tier only. Free tier users can connect without TLS encryption.

Free Tier Behavior:

  • TLS configuration is ignored in free tier builds
  • Warning messages are logged if TLS environment variables are set
  • All connections use unencrypted PostgreSQL protocol
  • Upgrade to Pro tier for TLS encryption support

Enable TLS

For production deployments with Pro tier, enable TLS encryption:

yaml
pgwire:
  enabled: true
  port: 5432
  username: "analyst"
  password: "secure_password"
  tls:
    enabled: true
    cert_path: "/etc/ssl/certs/pgwire.crt"
    key_path: "/etc/ssl/private/pgwire.key"

Generate Self-Signed Certificates (Development)

bash
# Generate self-signed certificate for development
mkdir -p certs
openssl req -x509 -newkey rsa:4096 \
  -keyout certs/pgwire.key \
  -out certs/pgwire.crt \
  -days 365 -nodes \
  -subj "/CN=localhost"

# Update configuration
PGWIRE_TLS_ENABLED=true
PGWIRE_TLS_CERT_PATH=certs/pgwire.crt
PGWIRE_TLS_KEY_PATH=certs/pgwire.key

Connecting with TLS

psql with TLS:

bash
psql "postgresql://boilstream:boilstream@localhost:5432/boilstream?sslmode=require"

DBeaver with TLS:

  1. Enable SSL in connection settings
  2. Set SSL Mode to "require"
  3. For self-signed certificates, disable certificate validation
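
JDBC with TLS:

The sslmode connection property is part of the standard PostgreSQL JDBC driver; a minimal sketch:

java
String url = "jdbc:postgresql://localhost:5432/boilstream";
Properties props = new Properties();
props.setProperty("user", "boilstream");
props.setProperty("password", "boilstream");
props.setProperty("sslmode", "require");  // use "verify-full" with a CA-signed certificate
Connection conn = DriverManager.getConnection(url, props);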

Database Integration

DuckDB Persistence Integration

The PGWire server automatically integrates with DuckDB persistence:

yaml
# Enable both DuckDB persistence and PGWire
duckdb_persistence:
  enabled: true
  storage_path: "/data/duckdb"
  max_writers: 16

pgwire:
  enabled: true
  port: 5432
  initialization_sql: |
    -- Custom initialization SQL
    INSTALL icu;
    LOAD icu;
    SET timezone = 'UTC';

Automatic Database Discovery

The server automatically discovers and attaches:

  • Topic Databases: Individual databases for each topic
  • Unified Views: Cross-database views for each topic
  • System Catalogs: PostgreSQL-compatible system tables
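
From the client side, this discovery surfaces through ordinary catalog metadata. A minimal JDBC sketch, assuming conn is an open connection (whether a given topic appears as a table or a view depends on the server's setup):

java
import java.sql.*;

// Enumerate discovered tables and views via standard JDBC metadata
DatabaseMetaData meta = conn.getMetaData();
try (ResultSet tables = meta.getTables(null, "public", "%", new String[] {"TABLE", "VIEW"})) {
    while (tables.next()) {
        System.out.println(tables.getString("TABLE_SCHEM") + "." + tables.getString("TABLE_NAME"));
    }
}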

Database Refresh

The server periodically refreshes database connections:

  • Automatic Discovery: New databases are automatically attached
  • View Updates: Topic views are refreshed to include new data
  • Configurable Interval: Set via refresh_interval_seconds

Performance Optimization

Connection Management

  • Per-Connection Processors: Each connection gets its own DuckDB processor
  • Resource Cleanup: Automatic cleanup of cursors and prepared statements
  • Connection Pooling: Efficient resource management

Query Optimization

sql
-- Use LIMIT for exploratory queries
SELECT * FROM large_topic LIMIT 1000;

-- Use time-based filtering
SELECT * FROM events 
WHERE event_time >= now() - interval '1 hour';

-- Leverage DuckDB's analytical capabilities
SELECT 
    date_trunc('minute', event_time) as minute,
    count(*) as events,
    approx_quantile(value, 0.5) as median_value
FROM sensor_data
WHERE event_time >= now() - interval '24 hours'
GROUP BY minute
ORDER BY minute;

Memory Management

  • Cursor Streaming: Large result sets are streamed efficiently (see the JDBC sketch after this list)
  • Batch Processing: Respects DuckDB's internal batching (2048 rows)
  • Automatic Cleanup: Connection state is cleaned up properly
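
On the client side, cursor streaming is opt-in: the PostgreSQL JDBC driver only fetches through a cursor when autocommit is off and a fetch size is set. A sketch, assuming conn is an open connection:

java
conn.setAutoCommit(false);         // pgjdbc streams via a cursor only when autocommit is off
try (Statement st = conn.createStatement()) {
    st.setFetchSize(2048);         // aligns with DuckDB's 2048-row batches
    try (ResultSet rs = st.executeQuery("SELECT * FROM large_topic")) {
        while (rs.next()) {
            // rows arrive in batches instead of one fully buffered result
        }
    }
}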

Troubleshooting

Common Connection Issues

Connection Refused:

bash
# Check if PGWire server is enabled
grep -i pgwire your-config.yaml

# Check server logs
tail -f boilstream.log | grep -i pgwire

Authentication Failures:

bash
# Verify credentials
psql -h localhost -p 5432 -U boilstream -d boilstream

# Check configuration
echo "Username: $PGWIRE_USERNAME"
echo "Password: $PGWIRE_PASSWORD"

Query Issues

No Tables Found:

sql
-- Check if DuckDB persistence is enabled
SELECT name FROM duckdb_databases();

-- Check available schemas
SELECT schema_name FROM information_schema.schemata;

-- Force refresh (if needed)
-- The server automatically refreshes every refresh_interval_seconds

Type Conversion Errors:

sql
-- Check column types
SELECT column_name, data_type 
FROM information_schema.columns 
WHERE table_name = 'your_table';

-- Use explicit casting if needed
SELECT cast(column_name as integer) FROM your_table;

Performance Issues

Slow Queries:

sql
-- Check query execution plan
EXPLAIN SELECT * FROM your_table WHERE condition;

-- Add appropriate filters
SELECT * FROM large_table 
WHERE event_time >= now() - interval '1 hour'
LIMIT 10000;

Memory Issues:

  • Reduce result set size with LIMIT
  • Use streaming with cursors for large results
  • Filter data by time ranges

Security Considerations

Authentication

  • Change Default Credentials: Always use strong passwords in production
  • Separate Authentication: PGWire uses separate auth from main BoilStream auth
  • Connection Limits: Consider implementing connection limits

Network Security

  • TLS Encryption: Always use TLS in production
  • Firewall Rules: Restrict access to PGWire port
  • Network Segmentation: Consider running on internal networks only

Data Security

  • Read-Only Access: PGWire provides read-only access to data
  • Time-Based Filtering: Implement time-based access controls
  • Audit Logging: Monitor query patterns and access

Advanced Configuration

Custom Initialization SQL

yaml
pgwire:
  initialization_sql: |
    -- Install extensions
    INSTALL icu;
    INSTALL json;
    LOAD icu;
    LOAD json;
    
    -- Set timezone
    SET timezone = 'UTC';
    
    -- Configure memory settings
    SET memory_limit = '8GB';
    SET threads = 8;
    
    -- Create custom functions
    CREATE MACRO format_timestamp(ts) AS strftime(ts, '%Y-%m-%d %H:%M:%S');

Connection Pooling

yaml
pgwire:
  enabled: true
  port: 5432
  refresh_interval_seconds: 30  # Longer intervals for production
  username: "analyst"
  password: "secure_password"
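
Because each server-side connection gets its own DuckDB processor, reusing connections through a client-side pool is cheaper than opening a new connection per query. A minimal sketch using HikariCP (the pool library is an assumption; any JDBC pool works, and BoilStream does not ship one):

java
import com.zaxxer.hikari.HikariConfig;
import com.zaxxer.hikari.HikariDataSource;

// Client-side connection pool; each pooled connection maps to one server-side processor
HikariConfig cfg = new HikariConfig();
cfg.setJdbcUrl("jdbc:postgresql://localhost:5432/boilstream");
cfg.setUsername("analyst");
cfg.setPassword("secure_password");
cfg.setMaximumPoolSize(8);
HikariDataSource pool = new HikariDataSource(cfg);

try (java.sql.Connection conn = pool.getConnection()) {
    // run queries; closing returns the connection to the pool instead of dropping the socket
}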

Multi-Port Configuration

yaml
# Different ports for different purposes
pgwire:
  enabled: true
  port: 5432  # Main port
  
# You can run multiple instances with different configs
# by using different configuration files

Integration Examples

Grafana Integration

yaml
# Grafana PostgreSQL data source configuration
apiVersion: 1
datasources:
  - name: BoilStream
    type: postgres
    url: localhost:5432
    database: boilstream
    user: boilstream
    password: boilstream
    sslmode: disable

Tableau Integration

  1. Connect to PostgreSQL data source
  2. Server: localhost
  3. Port: 5432
  4. Database: boilstream
  5. Username/Password: As configured

Power BI Integration

Use the PostgreSQL connector with your BoilStream PGWire server configuration.

Best Practices

Query Patterns

sql
-- Efficient time-based queries
SELECT * FROM events 
WHERE event_time >= now() - interval '1 hour'
AND event_type = 'sensor_reading';

-- Aggregation with time bucketing
SELECT 
    date_trunc('minute', event_time) as time_bucket,
    count(*) as event_count
FROM events
WHERE event_time >= now() - interval '24 hours'
GROUP BY time_bucket
ORDER BY time_bucket;

Connection Management

  • Reuse Connections: Don't create new connections for each query
  • Close Connections: Properly close connections when done
  • Monitor Usage: Track connection patterns and usage

Data Modeling

  • Time-Based Partitioning: Leverage time-based filtering
  • Appropriate Data Types: Use efficient data types
  • Indexing Strategy: Understand DuckDB's indexing capabilities

This guide covers configuring, securing, and querying the PGWire server. It gives your BI tools and analytical applications a direct, PostgreSQL-compatible path into your streaming data infrastructure.