OndatraSQL's execution model

What happens when you run ondatrasql run. The full sequence from model discovery and dependency resolution to DuckLake atomic commit.

March 31, 2026

philosophy

This post describes what happens when you execute ondatrasql run — the full sequence from model discovery to atomic commit.

The run sequence

A single ondatrasql run performs these steps:

Discovery — find all model files in models/
Parsing — extract directives (@kind, @unique_key, @constraint, etc.) and SQL body
Dependency resolution — build the DAG from SQL table references
Change detection — compare model hashes and source snapshots against DuckLake history
Execution — run each model in DAG order, applying the appropriate strategy per kind
Validation — enforce constraints (pre-write) and audits (in-transaction)
Commit — write results to DuckLake with snapshot metadata

Each step is described below.

1. Discovery

OndatraSQL scans the models/ directory for .sql, .star, and .yaml files. The file path determines the target table name:

models/staging/orders.sql  →  staging.orders
models/raw/events.sql      →  raw.events

2. Parsing

Each file is parsed for directives — SQL comments that control execution behavior:

-- @kind: merge
-- @unique_key: order_id
-- @constraint: not_null(order_id)
-- @audit: row_count(>, 0)

The SQL body (everything after the directives) is the transformation query.

3. Dependency resolution

OndatraSQL parses the SQL AST to extract table references from FROM, JOIN, CTEs, and subqueries. These references become edges in the dependency graph.

For Starlark models, query("SELECT ... FROM table") calls are followed through load() imports to extract dependencies.

The graph is validated for cycles at build time.

4. Change detection

For each model, OndatraSQL determines a run type by comparing the current state against DuckLake snapshot history:

Run type	Condition
`skip`	Model hash unchanged, no upstream changes
`backfill`	First run, or model definition changed
`incremental`	Source data changed (detected via `table_changes()`)
`full`	Upstream model was re-materialized

Change detection uses DuckLake’s table_changes() function to check each source table individually. If all sources are unchanged, the model is skipped without scanning any data.

When sources have changed, OndatraSQL further checks change_type (insert, update, delete) to determine whether the changes are insert-only — enabling more efficient processing paths for append and merge models.

5. Execution per kind

Each model kind has a different materialization strategy:

Kind	Strategy
`table`	`TRUNCATE` + `INSERT` (full rebuild)
`append`	`INSERT` new rows only (via Smart CDC)
`merge`	`MERGE INTO` on unique key (via Smart CDC)
`scd2`	Close old versions + insert new versions
`partition`	Delete affected partitions + insert new data
`tracked`	Content-hash per group, skip unchanged groups

For incremental kinds (append, merge, partition), Smart CDC rewrites the query at the AST level to read only changed rows using DuckLake time-travel.

6. Validation

Validation runs in two phases:

Constraints — executed against the temp table (before the data is written to the target). If any constraint fails, the model is aborted and no data is written.

Audits — executed inside the BEGIN ... COMMIT transaction, after the data write but before commit. A failing audit raises error(), which aborts the entire transaction atomically — schema changes, data writes, and commit metadata are all rolled back.

Warnings — executed after commit. They log anomalies (delta detection, statistical checks) without blocking.

7. Commit

The final step writes results to DuckLake inside a single transaction:

BEGIN;
  ALTER TABLE (schema evolution)
  INSERT INTO / MERGE INTO (data write)
  SELECT error(...) (audit checks)
  CALL set_commit_message('ondatrasql', 'Pipeline run: ...', extra_info => '...')
COMMIT;

On success, a DuckLake snapshot is created with metadata including the model name, SQL hash, column definitions, row count, and duration. On failure, no snapshot is created.

The full picture

$ ondatrasql run

Running 4 models...
[OK] raw.orders      (table, skip — unchanged)
[OK] raw.customers   (table, backfill, 20 rows, 95ms)
[OK] staging.orders  (merge, incremental, 3 rows, 200ms)
[OK] mart.revenue    (table, full, 12 rows, 154ms)

Four models. One command. Each model gets the minimum necessary work based on what actually changed.