# Views & Materialized Views
BoilStream supports three types of views on `__stream` DuckLake catalogs:
| Syntax | Purpose | Output | When it runs |
|---|---|---|---|
| `CREATE VIEW` | Query-time SQL expansion | None (virtual) | On every query |
| `CREATE STREAMING VIEW` | Continuous row-by-row transform | Derived topic | On every row |
| `CREATE MATERIALIZED VIEW ... WITH (...)` | Windowed aggregation | Output topic | On window close |
## Streaming DuckLakes
All three require a DuckLake catalog with the `__stream` suffix:

```
my_analytics__stream
```

Tables in `__stream` catalogs are streaming topics with real-time ingestion, automatic metadata column injection, and Parquet streaming to cloud storage.
## CREATE VIEW
Plain `CREATE VIEW` passes through to DuckLake unchanged. The view is expanded at query time — no output topic, no streaming processing.
```sql
CREATE VIEW recent_events AS
SELECT * FROM events WHERE ts > NOW() - INTERVAL '1 hour';
```

## CREATE STREAMING VIEW
Streaming views process every row as it arrives, applying filters and transformations continuously. Each streaming view has its own derived topic with hot and cold tiers.
```sql
-- Filtered stream
CREATE STREAMING VIEW click_events AS
SELECT * FROM events WHERE event_type = 'click';

-- Transformed stream
CREATE STREAMING VIEW enriched_events AS
SELECT
    user_id,
    event_type,
    UPPER(event_type) AS event_type_upper,
    DATE_TRUNC('hour', ts) AS event_hour,
    CASE
        WHEN event_type IN ('purchase', 'checkout') THEN 'conversion'
        WHEN event_type IN ('click', 'view') THEN 'engagement'
        ELSE 'other'
    END AS activity_category
FROM events;
```

### Supported Operations
Streaming views support SQL operations that can execute row-by-row without buffering:
- Filtering: `WHERE` clauses
- Projections: column selection, aliases
- Transformations: `CASE`, `UPPER()`, `LOWER()`, `DATE_TRUNC()`, casts
- Scalar functions: any function that operates on a single row
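Because every supported operation is row-local, a streaming view behaves like a pure function applied to each arriving row. The following Python sketch mirrors the `enriched_events` view above; the dict-based row shape and the `transform` function are illustrative, not BoilStream's internal API:

```python
from datetime import datetime

def transform(row: dict) -> dict:
    """Row-local equivalent of the enriched_events streaming view:
    projections, UPPER(), DATE_TRUNC('hour', ...), and a CASE expression."""
    event_type = row["event_type"]
    # CASE ... END AS activity_category
    if event_type in ("purchase", "checkout"):
        category = "conversion"
    elif event_type in ("click", "view"):
        category = "engagement"
    else:
        category = "other"
    ts: datetime = row["ts"]
    return {
        "user_id": row["user_id"],
        "event_type": event_type,
        "event_type_upper": event_type.upper(),          # UPPER(event_type)
        "event_hour": ts.replace(minute=0, second=0,     # DATE_TRUNC('hour', ts)
                                 microsecond=0),
        "activity_category": category,
    }

row = {"user_id": 1, "event_type": "click", "ts": datetime(2024, 5, 1, 9, 30, 15)}
out = transform(row)
# The output depends only on this single row, so no buffering is needed.
```

This is exactly why the operations in the next list are rejected: they cannot be expressed as a function of one row.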
### Not Supported in Streaming Views
These require buffering and are rejected at creation time with a clear error:
- Aggregations (`GROUP BY`, `COUNT()`, `SUM()`, etc.)
- Window functions (`ROW_NUMBER()`, `LAG()`, `LEAD()`)
- JOINs
- Subqueries, CTEs
- `ORDER BY`, `DISTINCT`, `LIMIT`/`OFFSET`
- `UNION`/`INTERSECT`/`EXCEPT`
Use `CREATE MATERIALIZED VIEW` (below) for aggregations.
### Error Handling
| Scenario | Result |
|---|---|
| Non-streaming SQL (`GROUP BY`, `JOIN`, etc.) | Error with specific reason |
| Base topic doesn't exist | Error: "Base topic 'X' does not exist" |
| View name already exists | Error: "Streaming view 'X' already exists" |
| Not in a `__stream` catalog | Error: "Streaming views require a `__stream` catalog" |
If the view is created in DuckLake but metadata registration fails, the DuckLake view is rolled back automatically.
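The rollback behavior amounts to a two-step create where the first step is undone if the second fails. A minimal sketch of that pattern, with toy stand-ins for both stores (the class and function names are illustrative, not the actual server code):

```python
class DuckLake:
    """Toy stand-in for the DuckLake catalog (illustrative only)."""
    def __init__(self):
        self.views = {}
    def create_view(self, name, sql):
        self.views[name] = sql
    def drop_view(self, name):
        self.views.pop(name, None)

class MetadataRegistry:
    """Toy stand-in for the streaming-view metadata store."""
    def __init__(self, healthy=True):
        self.healthy = healthy
        self.entries = {}
    def register(self, name, sql):
        if not self.healthy:
            raise RuntimeError("metadata registration failed")
        self.entries[name] = sql

def create_streaming_view(lake, registry, name, sql):
    """Create the DuckLake view first, then register metadata.
    If registration fails, roll the DuckLake view back so the two
    stores never disagree about which views exist."""
    lake.create_view(name, sql)
    try:
        registry.register(name, sql)
    except Exception:
        lake.drop_view(name)  # automatic rollback
        return False
    return True
```

The invariant is that a view is either visible in both DuckLake and the metadata store, or in neither.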
### Dropping Streaming Views
```sql
DROP STREAMING VIEW click_events;
DROP STREAMING VIEW IF EXISTS enriched_events;
```

`DROP STREAMING VIEW` fails if the view doesn't exist. Use `IF EXISTS` for idempotent drops.
## CREATE MATERIALIZED VIEW
Materialized views aggregate data over time windows. They require a `WITH` clause specifying window parameters.
```sql
-- Tumbling window: 1-minute SUM/COUNT aggregation
CREATE MATERIALIZED VIEW sales_per_minute AS
SELECT SUM(amount) AS total, COUNT(*) AS cnt
FROM orders
WITH (window_type='tumbling', window_size='1 minute', timestamp_column='ts');

-- Sliding window: 5-minute average sliding every 30 seconds
CREATE MATERIALIZED VIEW avg_price_5m AS
SELECT AVG(price) AS avg_price, COUNT(*) AS cnt
FROM trades
WITH (window_type='sliding', window_size='5 minutes',
      slide_interval='30 seconds', timestamp_column='created_at');
```

### Ingestion Timestamp Mode
If your source data doesn't have a timestamp column, you can omit `timestamp_column` to window by ingestion timestamp (`__boils_meta_timestamp`) instead of event time:
```sql
-- Windows based on when data was ingested by the server
CREATE MATERIALIZED VIEW events_per_minute AS
SELECT COUNT(*) AS cnt
FROM events
WITH (window_type='tumbling', window_size='1 minute');
```

This is useful for sources without timestamps (Kafka streams, API events) or when ingestion time is the relevant time dimension.
### WITH Parameters
| Parameter | Required | Description |
|---|---|---|
| `window_type` | Yes | `'tumbling'` or `'sliding'` |
| `window_size` | Yes | Window duration: `'<N> <unit>'` where unit is seconds, minutes, hours, or days |
| `timestamp_column` | No | Column used for windowing. If omitted, uses server ingestion timestamp (`__boils_meta_timestamp`) |
| `slide_interval` | Sliding only | How often the window advances. Must evenly divide `window_size` |
| `output_topic` | No | Output topic name. Defaults to `{view_name}_matview` |
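With a sliding window of size *w* advancing by slide *s*, each row belongs to *w/s* overlapping windows, which is why `slide_interval` must evenly divide `window_size`. A conceptual sketch of the implied window assignment, using plain epoch seconds rather than BoilStream's actual internals:

```python
def sliding_windows(ts: int, window_size: int, slide_interval: int):
    """Return (start, end) for every sliding window containing timestamp ts.
    Windows are half-open [start, end); slide_interval must divide window_size."""
    if window_size % slide_interval != 0:
        raise ValueError("slide_interval must evenly divide window_size")
    # The latest window containing ts starts at the slide boundary <= ts.
    start = (ts // slide_interval) * slide_interval
    windows = []
    while start > ts - window_size:
        windows.append((start, start + window_size))
        start -= slide_interval
    return sorted(windows)

# window_size='5 minutes' (300s), slide_interval='30 seconds':
# every row lands in 300 / 30 = 10 overlapping windows.
```

A tumbling window is the degenerate case where the slide equals the window size, so each row lands in exactly one window.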
### Output Schema
The output topic contains the columns from your SELECT plus two automatic columns:

- `window_start` — start of the aggregation window (TIMESTAMP)
- `window_end` — end of the aggregation window (TIMESTAMP)
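For a tumbling window, both columns follow directly from the row timestamp and `window_size`: `window_start` is the timestamp floored to a window boundary and `window_end` is one window later. A small sketch of that math in epoch seconds (illustrative only):

```python
def tumbling_window(ts: int, window_size: int) -> tuple:
    """Compute the (window_start, window_end) pair emitted with each output row."""
    window_start = (ts // window_size) * window_size
    return (window_start, window_start + window_size)

# With window_size='1 minute', a row at second 125 falls in [120, 180).
```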
### Supported Operations
Materialized views run as batch queries over each window's data, so they support any SQL that DuckDB can execute — aggregations, `GROUP BY`, scalar functions, `CASE`, etc. Existing `WHERE` clauses are preserved (ANDed with the window time filter).
### Dropping Materialized Views
```sql
DROP MATERIALIZED VIEW sales_per_minute;
DROP MATERIALIZED VIEW IF EXISTS avg_price_5m;
```

This stops window processing and deregisters the view. The output topic and its data remain.
## How It Works

### Write Paths
```
Base Topic Ingestion
 │
 ├──► CREATE VIEW: no write path (query-time only)
 │
 ├──► CREATE STREAMING VIEW: derived view processor (row-by-row)
 │     └──► Derived topic with own hot + cold tiers
 │           └──► Can cascade (streaming views on streaming views)
 │     ╳ Does NOT re-enter main ingestion queue
 │
 └──► CREATE MATERIALIZED VIEW: executor fires on window close
       └──► INSERT via FlightRPC loopback → main ingestion queue
             └──► Output topic with own hot + cold tiers
                   └──► Can trigger CDC, derived views, etc.
```

Streaming views process rows continuously through the derived view processor. Output goes directly to the derived topic's hot tier (DuckDB chunks), then cold tier (Parquet/S3). This path does not re-enter the main ingestion queue.
Materialized views fire on window boundary close. The aggregation query executes through the pgwire interface via FlightRPC loopback, which re-enters the main ingestion pipeline. This means matview output gets full treatment: routing, hot tier, cold tier, CDC, and can trigger derived views.
### Query Paths
All view types resolve through DuckLake. Hot tier (DuckDB chunks) and cold tier (Parquet) are combined transparently. This works identically for pgwire clients and remote DuckDB clients (via Airport/FlightSQL).
### Matview Execution Details
A background executor checks window boundaries every second. Watermarks are persisted to PostgreSQL for crash recovery — completed windows are not re-executed on restart. Up to 4 concurrent window aggregations run in parallel, with fast windows prioritized over slow ones.
`window_size` must not exceed 80% of hot tier retention to ensure data availability.
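The executor loop above can be sketched as: on every tick, fire all windows whose end has passed, then advance the persisted watermark so completed windows are never re-executed after a restart. A conceptual sketch in epoch seconds (the function name and shape are illustrative, not the actual scheduler):

```python
def due_windows(watermark: int, now: int, window_size: int):
    """Return the tumbling windows completed since the persisted watermark,
    plus the new watermark. Restarting from the stored watermark after a
    crash means finished windows are simply skipped."""
    windows = []
    start = watermark
    while start + window_size <= now:
        windows.append((start, start + window_size))
        start += window_size
    # New watermark = start of the first window that has not yet closed.
    return windows, start

fired, wm = due_windows(watermark=0, now=185, window_size=60)
# Three complete 60s windows have closed; the watermark advances to 180.
# Calling again with watermark=180 fires nothing: no re-execution.
```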
## Next Steps
- Learn about DuckLake Integration for catalog management
- Explore the Ingestion Interfaces for data streaming
- Check out Getting Started for a complete setup walkthrough