Materialized Views (Derived Topics)
Coming Soon
User-facing materialized view creation is not yet available. This page describes the planned functionality using __stream DuckLakes.
BoilStream supports materialized views (also known as derived topics) that provide streaming SQL transformations of your data in real-time. These views automatically process and transform incoming data as it streams through your base topics.
How It Works
Materialized views in BoilStream:
- Process data in real-time as it arrives in your base topics
- Create derived output streams based on your SQL logic
- Automatically update as new data flows through the system
- Support physical streaming operators including filtering, projections, and transformations
Unlike traditional views, which are computed on demand, materialized views continuously process streaming data and maintain their results as separate topics.
Streaming DuckLakes
To use streaming features, create a DuckLake catalog with the `__stream` suffix, for example `my_analytics__stream`.

Tables created in `__stream` catalogs automatically become streaming topics with:
- Real-time ingestion support
- Automatic `__boils_meta_*` column injection (timestamps, partition info)
- Direct Parquet streaming to cloud storage
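As a sketch of what this looks like in practice (assuming DuckDB's standard `ducklake` `ATTACH` syntax; the catalog path, catalog name, and table here are illustrative, not prescribed by BoilStream):

```sql
-- Attach a DuckLake catalog whose name carries the __stream suffix
-- (path and alias are illustrative)
ATTACH 'ducklake:metadata.ducklake' AS my_analytics__stream;
USE my_analytics__stream;

-- Any table created here becomes a streaming topic;
-- BoilStream injects the __boils_meta_* columns automatically
CREATE TABLE page_views (
    user_id VARCHAR,
    url VARCHAR,
    viewed_at TIMESTAMP
);
```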
Creating Materialized Views (Planned)
Once implemented, you'll create materialized views using standard SQL CREATE VIEW syntax on __stream DuckLakes:
```sql
-- Connect to your streaming DuckLake
-- (via PostgreSQL interface or boilstream extension)

-- Base table (streaming topic)
CREATE TABLE events (
    user_id VARCHAR,
    event_type VARCHAR,
    timestamp TIMESTAMP,
    payload JSON
);

-- Materialized view: filtered stream
CREATE VIEW click_events AS
SELECT * FROM events WHERE event_type = 'click';

-- Materialized view: transformed stream
CREATE VIEW user_activity AS
SELECT
    user_id,
    event_type,
    timestamp,
    CASE
        WHEN event_type IN ('purchase', 'checkout') THEN 'conversion'
        WHEN event_type IN ('click', 'view') THEN 'engagement'
        ELSE 'other'
    END AS activity_category
FROM events;
```

Views on `__stream` catalogs automatically become materialized, processing data in real-time as it flows through the base topic.
Supported SQL Operations
Materialized views support physical streaming operators - SQL operations that can process data row-by-row without buffering:
Filtering
```sql
CREATE VIEW active_users AS
SELECT * FROM users WHERE status = 'active';
```

Projections
```sql
CREATE VIEW user_summary AS
SELECT user_id, email, created_at FROM users;
```

Transformations
```sql
CREATE VIEW enriched_events AS
SELECT
    *,
    UPPER(event_type) AS event_type_upper,
    DATE_TRUNC('hour', timestamp) AS event_hour
FROM events;
```

CASE Expressions
```sql
CREATE VIEW categorized_users AS
SELECT
    user_id,
    CASE
        WHEN age >= 65 THEN 'Senior'
        WHEN age >= 18 THEN 'Adult'
        ELSE 'Minor'
    END AS age_category
FROM users;
```

Limitations
Physical Streaming Operators Only
Materialized views only support SQL operations that DuckDB can execute as physical streaming operators. The following are not supported:
- Window functions (`ROW_NUMBER()`, `LAG()`, `LEAD()`, etc.)
- Aggregations (`GROUP BY`, `COUNT()`, `SUM()`, etc.)
- Joins (require buffering both sides)
- Subqueries (may require materialization)
- `ORDER BY` (requires the full result set)
These operations require buffering data, which conflicts with real-time streaming semantics. Support for windowed aggregations (tumbling/sliding windows) is on the roadmap.
Real-time Processing
Materialized views process data in real-time:
- Data arrives in your base topic
- Views are automatically triggered to process new data
- Transformed results stream to derived topics
- No polling or batch processing - everything happens as data flows
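Assuming the `events` base topic and `click_events` view shown earlier, the flow looks like this from a client's perspective (a sketch; exact latency and delivery semantics depend on the deployment):

```sql
-- 1. Data arrives in the base topic
INSERT INTO events VALUES
    ('u1', 'click', NOW(), '{"page": "/home"}');

-- 2. The materialized view is triggered automatically;
--    no REFRESH statement or polling is needed

-- 3. The filtered result streams into the derived topic
SELECT user_id, event_type FROM click_events;
```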
Use Cases
Materialized views are ideal for:
- Data Filtering: Create focused streams for specific segments
- Data Enrichment: Add calculated fields or categorizations
- Format Transformation: Normalize or restructure data
- Multi-tenant Routing: Split streams by customer or organization
- Event Processing: Transform raw events into business-meaningful streams
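For example, multi-tenant routing could be expressed as one filtered view per tenant (a sketch; the `orders` topic and tenant identifiers are illustrative):

```sql
-- Split a shared orders topic into per-tenant derived topics
CREATE VIEW orders_acme AS
SELECT * FROM orders WHERE tenant_id = 'acme';

CREATE VIEW orders_globex AS
SELECT * FROM orders WHERE tenant_id = 'globex';
```

Each view becomes its own streaming topic, so downstream consumers can subscribe only to their tenant's data.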
Next Steps
- Learn about DuckLake Integration for catalog management
- Explore the Ingestion Interfaces for data streaming
- Check out Getting Started for a complete setup walkthrough