Directives

Directives turn SQL into a data pipeline

Directives turn SQL into a data pipeline. They are comments that control how data is materialized, how changes are detected, and how data is validated.

You write SQL. Directives define everything else.

Mental Model

SQL → what the data is
Directives → how it behaves

Together, they define a complete pipeline step.

Before vs After

Without directives:

SELECT * FROM raw.orders

Just a query.

With directives:

-- @kind: merge
-- @unique_key: order_id
-- @incremental: updated_at
-- @constraint: total >= 0
-- @audit: row_count > 0

SELECT * FROM raw.orders

A fully managed pipeline — updates existing rows, blocks invalid data, verifies the result, tracks changes automatically.

Grouped Reference

Core

Control how data is materialized:

Directive	What it does
`@kind`	Materialization strategy — `table`, `merge`, `append`, `scd2`, `partition`, `view`, `events`
`@unique_key`	Row matching key (required for merge, scd2, partition)
`@incremental`	Cursor column for incremental state
`@incremental_initial`	Starting cursor value (default: `1970-01-01T00:00:00Z`)

Storage & Performance

Directive	What it does
`@partitioned_by`	DuckLake file partitioning — organizes data into separate files per key
`@sorted_by`	DuckLake sorted table — better statistics for faster queries

Validation

Directive	What it does
`@constraint`	Block bad data before insert (26 patterns)
`@audit`	Verify results after insert, rollback on failure (17 patterns)
`@warning`	Log issues without failing

Metadata

Directive	What it does
`@description`	Table comment in DuckLake catalog
`@column`	Column comment + masking tags

Runtime

Directive	What it does
`@extension`	Load a DuckDB extension before execution

Syntax by File Type

Same directives, all formats:

-- @kind: merge
-- @unique_key: order_id
-- @constraint: order_id NOT NULL

SELECT * FROM raw.orders

# @kind: append
# @incremental: updated_at

resp = http.get("https://api.example.com/orders")
for order in resp.json:
    save.row(order)

kind: merge
unique_key: order_id
constraints:
  - order_id NOT NULL

Column Masking

Tags in @column directives trigger automatic masking during materialization:

-- @column: email = Contact email | mask_email | PII
-- @column: ssn = SSN | mask_ssn | sensitive

Define masking macros in config/macros.sql:

CREATE OR REPLACE MACRO mask_email(val) AS
    regexp_replace(val::VARCHAR, '(.).*@', '\1***@');

CREATE OR REPLACE MACRO mask_ssn(val) AS
    '***-**-' || val[-4:];

Tags with mask, hash, or redact prefixes are applied as macro calls. Other tags (PII, sensitive) are metadata-only — stored for governance, no execution effect.

Directive Compatibility

Directive	table	view	append	merge	scd2	partition	events
`@description`	Yes	Yes	Yes	Yes	Yes	Yes	Yes
`@unique_key`	—	—	—	Required	Required	Required	—
`@partitioned_by`	Hint	—	Hint	Hint	Hint	—	—
`@sorted_by`	Yes	—	Yes	Yes	Yes	Yes	—
`@incremental`	—	—	Yes	Yes	Yes	Yes	—
`@column`	Yes	—	Yes	Yes	Yes	Yes	—
`@constraint`	Yes	—	Yes	Yes	Yes	Yes	—
`@audit`	Yes	—	Yes	Yes	Yes	Yes	—
`@warning`	Yes	—	Yes	Yes	Yes	Yes	—
`@extension`	Yes	Yes	Yes	Yes	Yes	Yes	—

Why Directives Exist

Most tools split pipeline configuration across multiple systems — SQL for transformations, config files for behavior, separate tools for validation.

Directives unify everything inside the model. One file defines what the data is and how it behaves.