Model Kinds

Choose data behavior with one directive — not pipeline logic

Model kinds define how your data behaves. With one directive, you choose how data is stored, how changes are handled, and how updates are applied.

Mental Model

Same SQL — different behavior:

SELECT * FROM raw.orders

-- @kind: append    → only adds new rows
-- @kind: merge     → updates existing, inserts new
-- @kind: scd2      → keeps full history of every change
-- @kind: table     → rebuilds when something changed

Which Kind Should You Use?

Kind	Use for
`table`	Aggregates, marts, lookup tables
`view`	Lightweight transforms, staging
`append`	Event logs, fact tables, audit trails
`merge`	User/customer tables (upsert)
`scd2`	History tracking, dimension tables
`partition`	Large partitioned fact tables
`events`	HTTP event ingestion

view

No storage. Always up-to-date. Resolves at query time.

-- @kind: view

SELECT order_id, customer_id, total
FROM raw.orders WHERE total > 0

Recreated if definition changes. Skipped otherwise.

table (default)

Rebuilds only when something changed. Tracks upstream dependencies — if nothing changed upstream, nothing runs.

-- @kind: table

SELECT * FROM raw.products WHERE is_active = true

Condition	What happens
First run	Full build
SQL changed	Rebuild
Upstream changed	Rebuild
Nothing changed	Skip — no query, no I/O

append

Add new data. Never updates existing rows. Smart CDC on incremental runs.

-- @kind: append

SELECT * FROM raw.events

Source tables are rewritten to change-only queries via DuckLake time-travel.

merge

Like an UPSERT. Updates existing rows, inserts new ones. Smart CDC on incremental runs.

-- @kind: merge
-- @unique_key: customer_id

SELECT customer_id, name, email, tier
FROM raw.customers

Requires @unique_key.

scd2

Keeps every version of a row. Full history, permanent. Uses full-state comparison — not Smart CDC.

-- @kind: scd2
-- @unique_key: product_id

SELECT product_id, name, price, category
FROM raw.products

OndatraSQL adds valid_from_snapshot, valid_to_snapshot, and is_current automatically. Changed rows get closed and a new version inserted.

Requires @unique_key. History survives snapshot expiration.

partition

Replaces only affected partitions. Smart CDC on incremental runs.

-- @kind: partition
-- @unique_key: year, month

SELECT order_id, total,
    EXTRACT(YEAR FROM order_date) AS year,
    EXTRACT(MONTH FROM order_date) AS month
FROM raw.orders

Requires @unique_key (supports multiple columns).

events

Send events via HTTP. No Kafka.

-- @kind: events

event_name VARCHAR NOT NULL,
page_url VARCHAR,
user_id VARCHAR,
received_at TIMESTAMPTZ

Defines a column schema, not a SELECT. See Event Collection.

How Changes Are Detected

Kind	Strategy
`view`	Definition hash
`table`	Dependency tracking — skip if unchanged
`append` / `merge` / `partition`	Smart CDC via time-travel
`scd2`	Full-state comparison
`events`	Always flush

Storage Hints

DuckLake optimizations for any materializing kind:

-- @sorted_by: updated_at       → better statistics, faster queries
-- @partitioned_by: region       → automatic file organization

Why Model Kinds Exist

In most tools, materialization logic is spread across SQL, orchestration config, and custom scripts. Model kinds encode it directly in the model — one line replaces pipeline logic.

Need	Traditional stack	OndatraSQL
Append data	Custom SQL + scheduler	`@kind: append`
Upsert rows	MERGE + orchestration	`@kind: merge`
Track history	Complex SCD logic	`@kind: scd2`
Collect events	Kafka + consumer	`@kind: events`