PGWire Server

BoilStream includes a built-in PostgreSQL wire protocol server that lets any PostgreSQL-compatible client connect directly to your streaming data. BI tools such as DBeaver, Tableau, and Power BI, as well as command-line tools like psql, can query your real-time data using familiar PostgreSQL syntax.

Overview

The PGWire server acts as a bridge between PostgreSQL clients and DuckDB, providing:

  • Full PostgreSQL Protocol Compatibility: Works with any PostgreSQL client
  • Real-time Data Access: Query streaming data as it arrives
  • Cross-Topic Analysis: Join data across different topics using standard SQL
  • High Performance: Leverages DuckDB's analytical query engine
  • Enterprise Ready: TLS encryption (Pro tier), authentication, and query cancellation

Key Features

Protocol Support

  • Simple Query Protocol: Standard SQL queries with immediate execution
  • Extended Query Protocol: Prepared statements with parameter binding (see the JDBC sketch after this list)
  • Cursor Support: Efficient handling of large result sets
  • Text and Binary Encoding: Full support for both data formats
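
For example, parameter binding over the extended query protocol is what a JDBC client exercises when it prepares a statement. A minimal sketch, reusing the sensor_data table from the query examples later in this guide:

java
import java.sql.*;

// Prepared statement -> extended query protocol: parse once, bind parameters, execute
try (Connection conn = DriverManager.getConnection(
         "jdbc:postgresql://localhost:5432/boilstream", "boilstream", "boilstream");
     PreparedStatement ps = conn.prepareStatement(
         "SELECT * FROM sensor_data WHERE value > ? LIMIT 100")) {
    ps.setDouble(1, 42.0);  // bound parameter; its type is inferred on the server
    try (ResultSet rs = ps.executeQuery()) {
        while (rs.next()) {
            System.out.println(rs.getDouble("value"));
        }
    }
}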

Data Types

  • Comprehensive Type Mapping: All PostgreSQL data types supported
  • UUID Support: Native UUID handling with proper metadata
  • Array Types: Full array support including complex nested arrays (see the JDBC sketch after this list)
  • JSON Support: Query JSON data with PostgreSQL JSON functions
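
From a JDBC client, UUID and array columns map onto the driver's standard accessors. A minimal sketch; the table name and its columns (id UUID, tags TEXT[]) are hypothetical:

java
import java.sql.*;
import java.util.UUID;

// Hypothetical table: events_typed(id UUID, tags TEXT[])
try (Connection conn = DriverManager.getConnection(
         "jdbc:postgresql://localhost:5432/boilstream", "boilstream", "boilstream");
     Statement st = conn.createStatement();
     ResultSet rs = st.executeQuery("SELECT id, tags FROM events_typed LIMIT 10")) {
    while (rs.next()) {
        UUID id = rs.getObject("id", UUID.class);                   // native UUID mapping
        String[] tags = (String[]) rs.getArray("tags").getArray();  // array column
        System.out.println(id + " " + java.util.Arrays.toString(tags));
    }
}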

Advanced Features

  • Prepared Statements: Full parameter binding and type inference
  • Query Cancellation: Standard PostgreSQL CancelRequest support (see the JDBC sketch after this list)
  • Connection Pooling: Efficient per-connection resource management
  • Automatic Schema Discovery: Dynamic database and table discovery
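
Cancellation is exposed through the JDBC driver's standard cancel call, which sends a PostgreSQL CancelRequest. A minimal sketch, assuming conn is an open connection and using the large_topic table from the examples later in this guide:

java
import java.sql.*;

// Cancel a long-running query from another thread
try (Statement st = conn.createStatement()) {
    Thread canceller = new Thread(() -> {
        try {
            Thread.sleep(5_000);   // let the query run for 5 seconds
            st.cancel();           // sends a standard PostgreSQL CancelRequest
        } catch (SQLException | InterruptedException ignored) {}
    });
    canceller.start();
    try (ResultSet rs = st.executeQuery("SELECT * FROM large_topic")) {
        while (rs.next()) { /* consume rows */ }
    } catch (SQLException e) {
        System.out.println("Query cancelled: " + e.getMessage());
    }
}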

Quick Start

Basic Configuration

Enable the PGWire server in your configuration:

yaml
# Enable PGWire server
pgwire:
  enabled: true
  port: 5432
  username: "boilstream"
  password: "boilstream"

Environment Variables

bash
# Enable PGWire server
PGWIRE_ENABLED=true
PGWIRE_PORT=5432
PGWIRE_USERNAME=boilstream
PGWIRE_PASSWORD=boilstream

# Start BoilStream
boilstream --config your-config.yaml

Connecting with BI Tools

BoilStream has been extensively tested with DBeaver, providing full schema browsing and query execution capabilities.

Connection Setup:

  1. Create a new PostgreSQL connection
  2. Set the following parameters:
    • Host: localhost (or your server address)
    • Port: 5432 (or your configured port)
    • Database: boilstream
    • Username: boilstream (or your configured username)
    • Password: boilstream (or your configured password)

Features Available:

  • Schema browsing and table discovery
  • Query execution with syntax highlighting
  • Data export and visualization
  • Prepared statement support
  • Large result set handling with cursors

psql Command Line

Connect using the standard PostgreSQL command-line client:

bash
# Basic connection
psql -h localhost -p 5432 -U boilstream -d boilstream

# With password prompt
psql -h localhost -p 5432 -U boilstream -d boilstream -W

# Connection string format
psql "postgresql://boilstream:boilstream@localhost:5432/boilstream"

JDBC/ODBC Connections

JDBC Connection String:

java
import java.sql.*;
import java.util.Properties;

String url = "jdbc:postgresql://localhost:5432/boilstream";
Properties props = new Properties();
props.setProperty("user", "boilstream");
props.setProperty("password", "boilstream");
Connection conn = DriverManager.getConnection(url, props);
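
Once connected, queries go through the standard JDBC API, for example:

java
// List the topics exposed as tables in the public schema
try (Statement st = conn.createStatement();
     ResultSet rs = st.executeQuery(
         "SELECT table_name FROM information_schema.tables WHERE table_schema = 'public'")) {
    while (rs.next()) {
        System.out.println(rs.getString("table_name"));
    }
}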

ODBC Connection String:

Driver=PostgreSQL Unicode;Server=localhost;Port=5432;Database=boilstream;Uid=boilstream;Pwd=boilstream;

Querying Your Data

Basic Queries

Once connected, you can query your streaming data using standard SQL:

sql
-- List all available topics (tables)
SELECT table_name FROM information_schema.tables 
WHERE table_schema = 'public';

-- Query a specific topic
SELECT * FROM your_topic_name 
ORDER BY event_time DESC 
LIMIT 100;

-- Aggregate analysis
SELECT 
    date_trunc('hour', event_time) as hour,
    count(*) as event_count,
    avg(value) as avg_value
FROM sensor_data 
WHERE event_time >= now() - interval '24 hours'
GROUP BY hour
ORDER BY hour;

Cross-Topic Joins

Join data across different topics:

sql
-- Join sensor data with device metadata
SELECT 
    s.device_id,
    d.device_name,
    s.temperature,
    s.humidity,
    s.event_time
FROM sensor_readings s
JOIN device_metadata d ON s.device_id = d.device_id
WHERE s.event_time >= now() - interval '1 hour'
ORDER BY s.event_time DESC;

Advanced Analysis

sql
-- Time-series analysis with window functions
SELECT 
    device_id,
    temperature,
    event_time,
    avg(temperature) OVER (
        PARTITION BY device_id 
        ORDER BY event_time 
        ROWS BETWEEN 10 PRECEDING AND CURRENT ROW
    ) as moving_avg
FROM sensor_readings
WHERE event_time >= now() - interval '6 hours'
ORDER BY device_id, event_time;

TLS Configuration

Pro Tier Feature

TLS encryption for the PGWire server is available in the Pro tier only. Free tier users can connect without TLS encryption.

Free Tier Behavior:

  • TLS configuration is ignored in free tier builds
  • Warning messages are logged if TLS environment variables are set
  • All connections use unencrypted PostgreSQL protocol
  • Upgrade to Pro tier for TLS encryption support

Enable TLS

For production deployments with Pro tier, enable TLS encryption:

yaml
pgwire:
  enabled: true
  port: 5432
  username: "analyst"
  password: "secure_password"
  tls:
    enabled: true
    cert_path: "/etc/ssl/certs/pgwire.crt"
    key_path: "/etc/ssl/private/pgwire.key"

Generate Self-Signed Certificates (Development)

bash
# Generate self-signed certificate for development
mkdir -p certs
openssl req -x509 -newkey rsa:4096 \
  -keyout certs/pgwire.key \
  -out certs/pgwire.crt \
  -days 365 -nodes \
  -subj "/CN=localhost"

# Update configuration
PGWIRE_TLS_ENABLED=true
PGWIRE_TLS_CERT_PATH=certs/pgwire.crt
PGWIRE_TLS_KEY_PATH=certs/pgwire.key

Connecting with TLS

psql with TLS:

bash
psql "postgresql://boilstream:boilstream@localhost:5432/boilstream?sslmode=require"

DBeaver with TLS:

  1. Enable SSL in connection settings
  2. Set SSL Mode to "require"
  3. For self-signed certificates, disable certificate validation
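
JDBC with TLS:

The sslmode connection property is part of the standard PostgreSQL JDBC driver; a minimal sketch:

java
String url = "jdbc:postgresql://localhost:5432/boilstream";
Properties props = new Properties();
props.setProperty("user", "boilstream");
props.setProperty("password", "boilstream");
props.setProperty("sslmode", "require");  // use "verify-full" with a CA-signed certificate
Connection conn = DriverManager.getConnection(url, props);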

Database Integration

DuckDB Persistence Integration

The PGWire server automatically integrates with DuckDB persistence:

yaml
# Enable both DuckDB persistence and PGWire
duckdb_persistence:
  enabled: true
  storage_path: "/data/duckdb"
  max_writers: 16

pgwire:
  enabled: true
  port: 5432
  initialization_sql: |
    -- Custom initialization SQL
    INSTALL icu;
    LOAD icu;
    SET timezone = 'UTC';

Automatic Database Discovery

The server automatically discovers and attaches:

  • Topic Databases: Individual databases for each topic
  • Unified Views: Cross-database views for each topic
  • System Catalogs: PostgreSQL-compatible system tables
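
From the client side, this discovery surfaces through ordinary catalog metadata. A minimal JDBC sketch, assuming conn is an open connection (whether a given topic appears as a table or a view depends on the server's setup):

java
import java.sql.*;

// Enumerate discovered tables and views via standard JDBC metadata
DatabaseMetaData meta = conn.getMetaData();
try (ResultSet tables = meta.getTables(null, "public", "%", new String[] {"TABLE", "VIEW"})) {
    while (tables.next()) {
        System.out.println(tables.getString("TABLE_SCHEM") + "." + tables.getString("TABLE_NAME"));
    }
}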

Database Refresh

The server periodically refreshes database connections:

  • Automatic Discovery: New databases are automatically attached
  • View Updates: Topic views are refreshed to include new data
  • Configurable Interval: Set via refresh_interval_seconds

Performance Optimization

Connection Management

  • Per-Connection Processors: Each connection gets its own DuckDB processor
  • Resource Cleanup: Automatic cleanup of cursors and prepared statements
  • Connection Pooling: Efficient resource management

Query Optimization

sql
-- Use LIMIT for exploratory queries
SELECT * FROM large_topic LIMIT 1000;

-- Use time-based filtering
SELECT * FROM events 
WHERE event_time >= now() - interval '1 hour';

-- Leverage DuckDB's analytical capabilities
SELECT 
    date_trunc('minute', event_time) as minute,
    count(*) as events,
    approx_quantile(value, 0.5) as median_value
FROM sensor_data
WHERE event_time >= now() - interval '24 hours'
GROUP BY minute
ORDER BY minute;

Memory Management

  • Cursor Streaming: Large result sets are streamed efficiently (see the JDBC sketch after this list)
  • Batch Processing: Respects DuckDB's internal batching (2048 rows)
  • Automatic Cleanup: Connection state is cleaned up properly
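
On the client side, cursor streaming is opt-in: the PostgreSQL JDBC driver only fetches through a cursor when autocommit is off and a fetch size is set. A sketch, assuming conn is an open connection:

java
conn.setAutoCommit(false);         // pgjdbc streams via a cursor only when autocommit is off
try (Statement st = conn.createStatement()) {
    st.setFetchSize(2048);         // aligns with DuckDB's 2048-row batches
    try (ResultSet rs = st.executeQuery("SELECT * FROM large_topic")) {
        while (rs.next()) {
            // rows arrive in batches instead of one fully buffered result
        }
    }
}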

Troubleshooting

Common Connection Issues

Connection Refused:

bash
# Check if PGWire server is enabled
grep -i pgwire your-config.yaml

# Check server logs
tail -f boilstream.log | grep -i pgwire

Authentication Failures:

bash
# Verify credentials
psql -h localhost -p 5432 -U boilstream -d boilstream

# Check configuration
echo "Username: $PGWIRE_USERNAME"
echo "Password: $PGWIRE_PASSWORD"

Query Issues

No Tables Found:

sql
-- Check if DuckDB persistence is enabled
SELECT name FROM duckdb_databases();

-- Check available schemas
SELECT schema_name FROM information_schema.schemata;

-- Force refresh (if needed)
-- The server automatically refreshes every refresh_interval_seconds

Type Conversion Errors:

sql
-- Check column types
SELECT column_name, data_type 
FROM information_schema.columns 
WHERE table_name = 'your_table';

-- Use explicit casting if needed
SELECT cast(column_name as integer) FROM your_table;

Performance Issues

Slow Queries:

sql
-- Check query execution plan
EXPLAIN SELECT * FROM your_table WHERE condition;

-- Add appropriate filters
SELECT * FROM large_table 
WHERE event_time >= now() - interval '1 hour'
LIMIT 10000;

Memory Issues:

  • Reduce result set size with LIMIT
  • Use streaming with cursors for large results
  • Filter data by time ranges

Security Considerations

Authentication

  • Change Default Credentials: Always use strong passwords in production
  • Separate Authentication: PGWire uses separate auth from main BoilStream auth
  • Connection Limits: Consider implementing connection limits

Network Security

  • TLS Encryption: Always use TLS in production
  • Firewall Rules: Restrict access to PGWire port
  • Network Segmentation: Consider running on internal networks only

Data Security

  • Read-Only Access: PGWire provides read-only access to data
  • Time-Based Filtering: Implement time-based access controls
  • Audit Logging: Monitor query patterns and access

Advanced Configuration

Custom Initialization SQL

yaml
pgwire:
  initialization_sql: |
    -- Install extensions
    INSTALL icu;
    INSTALL json;
    LOAD icu;
    LOAD json;
    
    -- Set timezone
    SET timezone = 'UTC';
    
    -- Configure memory settings
    SET memory_limit = '8GB';
    SET threads = 8;
    
    -- Create custom functions
    CREATE MACRO format_timestamp(ts) AS strftime(ts, '%Y-%m-%d %H:%M:%S');

Connection Pooling

yaml
pgwire:
  enabled: true
  port: 5432
  refresh_interval_seconds: 30  # Longer intervals for production
  username: "analyst"
  password: "secure_password"
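
Because each server-side connection gets its own DuckDB processor, reusing connections through a client-side pool is cheaper than opening a new connection per query. A minimal sketch using HikariCP (the pool library is an assumption; any JDBC pool works, and BoilStream does not ship one):

java
import com.zaxxer.hikari.HikariConfig;
import com.zaxxer.hikari.HikariDataSource;

// Client-side connection pool; each pooled connection maps to one server-side processor
HikariConfig cfg = new HikariConfig();
cfg.setJdbcUrl("jdbc:postgresql://localhost:5432/boilstream");
cfg.setUsername("analyst");
cfg.setPassword("secure_password");
cfg.setMaximumPoolSize(8);
HikariDataSource pool = new HikariDataSource(cfg);

try (java.sql.Connection conn = pool.getConnection()) {
    // run queries; closing returns the connection to the pool instead of dropping the socket
}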

Multi-Port Configuration

yaml
# Different ports for different purposes
pgwire:
  enabled: true
  port: 5432  # Main port
  
# You can run multiple instances with different configs
# by using different configuration files

Integration Examples

Grafana Integration

yaml
# Grafana PostgreSQL data source configuration
apiVersion: 1
datasources:
  - name: BoilStream
    type: postgres
    url: localhost:5432
    database: boilstream
    user: boilstream
    password: boilstream
    sslmode: disable

Tableau Integration

  1. Connect to PostgreSQL data source
  2. Server: localhost
  3. Port: 5432
  4. Database: boilstream
  5. Username/Password: As configured

Power BI Integration

Use the PostgreSQL connector with your BoilStream PGWire server configuration.

Best Practices

Query Patterns

sql
-- Efficient time-based queries
SELECT * FROM events 
WHERE event_time >= now() - interval '1 hour'
AND event_type = 'sensor_reading';

-- Aggregation with time bucketing
SELECT 
    date_trunc('minute', event_time) as time_bucket,
    count(*) as event_count
FROM events
WHERE event_time >= now() - interval '24 hours'
GROUP BY time_bucket
ORDER BY time_bucket;

Connection Management

  • Reuse Connections: Don't create new connections for each query
  • Close Connections: Properly close connections when done
  • Monitor Usage: Track connection patterns and usage

Data Modeling

  • Time-Based Partitioning: Leverage time-based filtering
  • Appropriate Data Types: Use efficient data types
  • Indexing Strategy: Understand DuckDB's indexing capabilities

This guide covers configuring, securing, and querying the PGWire server. It gives your BI tools and analytical applications a direct, PostgreSQL-compatible path into your streaming data infrastructure.