Skip to content

Materialized Views (Derived Topics)

Coming Soon

User-facing materialized view creation is not yet available. This page describes the planned functionality using __stream DuckLakes.

BoilStream supports materialized views (also known as derived topics) that provide streaming SQL transformations of your data in real-time. These views automatically process and transform incoming data as it streams through your base topics.

How It Works

Materialized views in BoilStream:

  • Process data in real-time as it arrives in your base topics
  • Create derived output streams based on your SQL logic
  • Automatically update as new data flows through the system
  • Support physical streaming operators including filtering, projections, and transformations

Unlike traditional views that are computed on-demand, materialized views continuously process streaming data and maintain their results as separate topics.

Streaming DuckLakes

To use streaming features, create a DuckLake catalog with the __stream suffix:

my_analytics__stream

Tables created in __stream catalogs automatically become streaming topics with:

  • Real-time ingestion support
  • Automatic __boils_meta_* column injection (timestamps, partition info)
  • Direct Parquet streaming to cloud storage

Creating Materialized Views (Planned)

Once implemented, you'll create materialized views using standard SQL CREATE VIEW syntax on __stream DuckLakes:

sql
-- Connect to your streaming DuckLake
-- (via PostgreSQL interface or boilstream extension)

-- Base table (streaming topic)
CREATE TABLE events (
    user_id VARCHAR,
    event_type VARCHAR,
    timestamp TIMESTAMP,
    payload JSON
);

-- Materialized view: filtered stream
CREATE VIEW click_events AS
SELECT * FROM events WHERE event_type = 'click';

-- Materialized view: transformed stream
CREATE VIEW user_activity AS
SELECT
    user_id,
    event_type,
    timestamp,
    CASE
        WHEN event_type IN ('purchase', 'checkout') THEN 'conversion'
        WHEN event_type IN ('click', 'view') THEN 'engagement'
        ELSE 'other'
    END as activity_category
FROM events;

Views on __stream catalogs automatically become materialized - processing data in real-time as it flows through the base topic.

Supported SQL Operations

Materialized views support physical streaming operators - SQL operations that can process data row-by-row without buffering:

Filtering

sql
CREATE VIEW active_users AS
SELECT * FROM users WHERE status = 'active';

Projections

sql
CREATE VIEW user_summary AS
SELECT user_id, email, created_at FROM users;

Transformations

sql
CREATE VIEW enriched_events AS
SELECT
    *,
    UPPER(event_type) as event_type_upper,
    DATE_TRUNC('hour', timestamp) as event_hour
FROM events;

CASE Expressions

sql
CREATE VIEW categorized_users AS
SELECT
    user_id,
    CASE
        WHEN age >= 65 THEN 'Senior'
        WHEN age >= 18 THEN 'Adult'
        ELSE 'Minor'
    END as age_category
FROM users;

Limitations

Physical Streaming Operators Only

Materialized views only support SQL operations that DuckDB can execute as physical streaming operators. The following are not supported:

  • Window functions (ROW_NUMBER(), LAG(), LEAD(), etc.)
  • Aggregations (GROUP BY, COUNT(), SUM(), etc.)
  • Joins (require buffering both sides)
  • Subqueries (may require materialization)
  • ORDER BY (requires full result set)

These operations require buffering data, which conflicts with real-time streaming semantics. Support for windowed aggregations (tumbling/sliding windows) is on the roadmap.

Real-time Processing

Materialized views process data in real-time:

  1. Data arrives in your base topic
  2. Views are automatically triggered to process new data
  3. Transformed results stream to derived topics
  4. No polling or batch processing - everything happens as data flows

Use Cases

Materialized views are ideal for:

  • Data Filtering: Create focused streams for specific segments
  • Data Enrichment: Add calculated fields or categorizations
  • Format Transformation: Normalize or restructure data
  • Multi-tenant Routing: Split streams by customer or organization
  • Event Processing: Transform raw events into business-meaningful streams

Next Steps