
Views & Materialized Views

BoilStream supports three types of views on __stream DuckLake catalogs:

| Syntax | Purpose | Output | When it runs |
| --- | --- | --- | --- |
| `CREATE VIEW` | Query-time SQL expansion | None (virtual) | On every query |
| `CREATE STREAMING VIEW` | Continuous row-by-row transform | Derived topic | On every row |
| `CREATE MATERIALIZED VIEW ... WITH (...)` | Windowed aggregation | Output topic | On window close |

Streaming DuckLakes

All three require a DuckLake catalog with the __stream suffix:

my_analytics__stream

Tables in __stream catalogs are streaming topics with real-time ingestion, automatic metadata column injection, and Parquet streaming to cloud storage.

CREATE VIEW

Plain CREATE VIEW passes through to DuckLake unchanged. The view is expanded at query time — no output topic, no streaming processing.

```sql
CREATE VIEW recent_events AS
  SELECT * FROM events WHERE ts > NOW() - INTERVAL '1 hour';
```

CREATE STREAMING VIEW

Streaming views process every row as it arrives, applying filters and transformations continuously. Each streaming view has its own derived topic with hot and cold tiers.

```sql
-- Filtered stream
CREATE STREAMING VIEW click_events AS
  SELECT * FROM events WHERE event_type = 'click';

-- Transformed stream
CREATE STREAMING VIEW enriched_events AS
  SELECT
    user_id,
    event_type,
    UPPER(event_type) AS event_type_upper,
    DATE_TRUNC('hour', ts) AS event_hour,
    CASE
      WHEN event_type IN ('purchase', 'checkout') THEN 'conversion'
      WHEN event_type IN ('click', 'view') THEN 'engagement'
      ELSE 'other'
    END AS activity_category
  FROM events;
```

Supported Operations

Streaming views support SQL operations that can execute row-by-row without buffering:

  • Filtering: WHERE clauses
  • Projections: column selection, aliases
  • Transformations: CASE, UPPER(), LOWER(), DATE_TRUNC(), casts
  • Scalar functions: any function that operates on a single row
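
A single streaming view can combine all of these in one statement. A sketch, assuming an `events` topic with `user_id`, `amount`, and `event_type` columns:

```sql
-- Combines a WHERE filter, projection, a cast, and a scalar function
CREATE STREAMING VIEW purchase_amounts AS
  SELECT
    user_id,
    CAST(amount AS DECIMAL(10, 2)) AS amount_usd,
    LOWER(event_type) AS event_type
  FROM events
  WHERE event_type = 'purchase';
```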

Not Supported in Streaming Views

These require buffering and are rejected at creation time with a clear error:

  • Aggregations (GROUP BY, COUNT(), SUM(), etc.)
  • Window functions (ROW_NUMBER(), LAG(), LEAD())
  • JOINs
  • Subqueries, CTEs
  • ORDER BY, DISTINCT, LIMIT/OFFSET
  • UNION / INTERSECT / EXCEPT
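
For example, a streaming view containing an aggregation is rejected when you try to create it (illustrative sketch — the exact error wording may differ):

```sql
-- Rejected at creation time: GROUP BY + COUNT() require buffering
CREATE STREAMING VIEW clicks_per_user AS
  SELECT user_id, COUNT(*) AS clicks
  FROM events
  GROUP BY user_id;
```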

Use CREATE MATERIALIZED VIEW (below) for aggregations.

Error Handling

| Scenario | Result |
| --- | --- |
| Non-streaming SQL (GROUP BY, JOIN, etc.) | Error with specific reason |
| Base topic doesn't exist | Error: "Base topic 'X' does not exist" |
| View name already exists | Error: "Streaming view 'X' already exists" |
| Not in a `__stream` catalog | Error: "Streaming views require a `__stream` catalog" |

If the view is created in DuckLake but metadata registration fails, the DuckLake view is rolled back automatically.

Dropping Streaming Views

```sql
DROP STREAMING VIEW click_events;
DROP STREAMING VIEW IF EXISTS enriched_events;
```

DROP STREAMING VIEW fails if the view doesn't exist. Use IF EXISTS for idempotent drops.

CREATE MATERIALIZED VIEW

Materialized views aggregate data over time windows. They require a WITH clause specifying window parameters.

```sql
-- Tumbling window: 1-minute SUM/COUNT aggregation
CREATE MATERIALIZED VIEW sales_per_minute AS
  SELECT SUM(amount) AS total, COUNT(*) AS cnt
  FROM orders
  WITH (window_type='tumbling', window_size='1 minute', timestamp_column='ts');
```

```sql
-- Sliding window: 5-minute average sliding every 30 seconds
CREATE MATERIALIZED VIEW avg_price_5m AS
  SELECT AVG(price) AS avg_price, COUNT(*) AS cnt
  FROM trades
  WITH (window_type='sliding', window_size='5 minutes',
        slide_interval='30 seconds', timestamp_column='created_at');
```

Ingestion Timestamp Mode

If your source data doesn't have a timestamp column, you can omit timestamp_column to window by ingestion timestamp (__boils_meta_timestamp) instead of event time:

```sql
-- Windows based on when data was ingested by the server
CREATE MATERIALIZED VIEW events_per_minute AS
  SELECT COUNT(*) AS cnt
  FROM events
  WITH (window_type='tumbling', window_size='1 minute');
```

This is useful for sources without timestamps (Kafka streams, API events) or when ingestion time is the relevant time dimension.

WITH Parameters

| Parameter | Required | Description |
| --- | --- | --- |
| `window_type` | Yes | `'tumbling'` or `'sliding'` |
| `window_size` | Yes | Window duration: `'<N> <unit>'` where unit is seconds, minutes, hours, or days |
| `timestamp_column` | No | Column used for windowing. If omitted, uses server ingestion timestamp (`__boils_meta_timestamp`) |
| `slide_interval` | Sliding only | How often the window advances. Must evenly divide `window_size` |
| `output_topic` | No | Output topic name. Defaults to `{view_name}_matview` |
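
Putting the optional parameters together, a sketch that overrides the default output topic name (the `orders` topic and its columns are assumptions):

```sql
CREATE MATERIALIZED VIEW hourly_revenue AS
  SELECT SUM(amount) AS revenue
  FROM orders
  WITH (window_type='tumbling', window_size='1 hour',
        timestamp_column='ts', output_topic='revenue_hourly');
```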

Output Schema

The output topic contains the columns from your SELECT plus two automatic columns:

  • window_start — start of the aggregation window (TIMESTAMP)
  • window_end — end of the aggregation window (TIMESTAMP)
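
The output topic is queryable like any other topic. For the `sales_per_minute` example, whose output topic defaults to `sales_per_minute_matview`, a query over recent windows might look like:

```sql
SELECT window_start, window_end, total, cnt
FROM sales_per_minute_matview
ORDER BY window_start DESC
LIMIT 10;
```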

Supported Operations

Materialized views run as batch queries over each window's data, so they support any SQL that DuckDB can execute — aggregations, GROUP BY, scalar functions, CASE, etc. Existing WHERE clauses are preserved (ANDed with the window time filter).
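
Because each window executes as a batch query, grouped and filtered aggregations work as expected. A sketch, assuming an `orders` topic with `region`, `amount`, and `ts` columns:

```sql
-- Per-region totals each 5-minute window; the WHERE clause is
-- ANDed with the window time filter at execution time
CREATE MATERIALIZED VIEW regional_sales AS
  SELECT region, SUM(amount) AS total
  FROM orders
  WHERE amount > 0
  GROUP BY region
  WITH (window_type='tumbling', window_size='5 minutes', timestamp_column='ts');
```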

Dropping Materialized Views

```sql
DROP MATERIALIZED VIEW sales_per_minute;
DROP MATERIALIZED VIEW IF EXISTS avg_price_5m;
```

This stops window processing and deregisters the view. The output topic and its data remain.

How It Works

Write Paths

```
Base Topic Ingestion
  │
  ├──► CREATE VIEW: no write path (query-time only)
  │
  ├──► CREATE STREAMING VIEW: derived view processor (row-by-row)
  │      ├──► Derived topic with own hot + cold tiers
  │      ├──► Can cascade (streaming views on streaming views)
  │      ╳    Does NOT re-enter main ingestion queue
  │
  └──► CREATE MATERIALIZED VIEW: executor fires on window close
         ├──► INSERT via FlightRPC loopback → main ingestion queue
         ├──► Output topic with own hot + cold tiers
         └──► Can trigger CDC, derived views, etc.
```

Streaming views process rows continuously through the derived view processor. Output goes directly to the derived topic's hot tier (DuckDB chunks), then cold tier (Parquet/S3). This path does not re-enter the main ingestion queue.
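
Because a derived topic behaves like a base topic, streaming views can cascade. A sketch building on the `click_events` view defined earlier (the `device` column is an assumption):

```sql
-- Second-level streaming view reading from a derived topic
CREATE STREAMING VIEW mobile_clicks AS
  SELECT * FROM click_events WHERE device = 'mobile';
```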

Materialized views fire on window boundary close. The aggregation query executes through the pgwire interface via FlightRPC loopback, which re-enters the main ingestion pipeline. This means matview output gets full treatment: routing, hot tier, cold tier, CDC, and can trigger derived views.

Query Paths

All view types resolve through DuckLake. Hot tier (DuckDB chunks) and cold tier (Parquet) are combined transparently. This works identically for pgwire clients and remote DuckDB clients (via Airport/FlightSQL).
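
In practice this means the same query runs unchanged whether the rows it touches are still in DuckDB chunks or already flushed to Parquet. Using view names from the examples above:

```sql
-- Spans hot and cold tiers transparently
SELECT COUNT(*) FROM click_events;

SELECT *
FROM sales_per_minute_matview
WHERE window_start > NOW() - INTERVAL '1 day';
```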

Matview Execution Details

A background executor checks window boundaries every second. Watermarks are persisted to PostgreSQL for crash recovery — completed windows are not re-executed on restart. Up to 4 concurrent window aggregations run in parallel, with fast windows prioritized over slow ones.

window_size must not exceed 80% of hot tier retention, so that a window's data is still available when the window closes. For example, with 1 hour of hot tier retention, the maximum window_size is 48 minutes.

Next Steps