OndatraSQL's execution model

What happens when you run ondatrasql run. The full sequence from model discovery and dependency resolution to DuckLake atomic commit.

March 31, 2026 · 3 min read · philosophy

This post describes what happens when you execute ondatrasql run — the full sequence from model discovery to atomic commit.

The run sequence

A single ondatrasql run performs these steps:

Discovery — find all model files in models/
Parsing — extract directives (@kind, @unique_key, @constraint, etc.) and SQL body
Dependency resolution — build the DAG from SQL table references
Change detection — compare model hashes and source snapshots against DuckLake history
Execution — run each model in DAG order, applying the appropriate strategy per kind
Validation — enforce constraints (pre-write) and audits (in-transaction)
Commit — write results to DuckLake with snapshot metadata

Each step is described below.

1. Discovery

OndatraSQL scans the models/ directory for .sql files. The file path determines the target table name:

models/staging/orders.sql  →  staging.orders
models/raw/events.sql      →  raw.events

2. Parsing

Each file is parsed for directives — SQL comments that control execution behavior:

-- @kind: merge
-- @unique_key: order_id
-- @constraint: not_null(order_id)
-- @audit: row_count(>, 0)

The SQL body (everything after the directives) is the transformation query.

3. Dependency resolution

OndatraSQL parses the SQL AST to extract table references from FROM, JOIN, CTEs, and subqueries. These references become edges in the dependency graph. Lib functions in FROM (e.g. FROM gam_report(...)) are detected as external sources and do not create model dependencies.

The graph is validated for cycles at build time.

4. Change detection

For each model, OndatraSQL determines a run type by comparing the model’s SQL hash and upstream snapshot state against DuckLake commit history:

Run type	Condition
`skip`	SQL hash unchanged, upstream snapshots unchanged
`backfill`	First run, or SQL/directive definition changed
`incremental`	SQL hash unchanged, upstream data changed
`full`	Upstream model was re-materialized

Run type is determined via a SQL query that compares the current model hash against the last committed hash in DuckLake. For incremental kinds (append, merge), the runtime then uses table_changes() to read only the delta rows from upstream — this is the CDC optimization, not the skip/run decision itself.

When sources have changed, OndatraSQL further checks change_type (insert, update, delete) to determine whether the changes are insert-only — enabling more efficient processing paths for append and merge models.

5. Execution per kind

Each model kind has a different materialization strategy:

Kind	Strategy
`table`	`TRUNCATE` + `INSERT` (full rebuild)
`append`	`INSERT` new rows only (via CDC)
`merge`	`MERGE INTO` on unique key (via CDC)
`scd2`	Close old versions + insert new versions
`tracked`	Content-hash per group, skip unchanged groups

For incremental kinds (append, merge), change detection rewrites the query at the AST level to read only changed rows using DuckLake time-travel.

6. Validation

Validation runs in two phases:

Constraints — executed against the temp table (before the data is written to the target). If any constraint fails, the model is aborted and no data is written.

Audits — executed inside the BEGIN ... COMMIT transaction, after the data write but before commit. A failing audit raises error(), which aborts the entire transaction atomically — schema changes, data writes, and commit metadata are all rolled back.

Warnings — executed after commit. They log anomalies (delta detection, statistical checks) without blocking.

7. Commit

The final step writes results to DuckLake inside a single transaction:

BEGIN;
  ALTER TABLE (schema evolution)
  INSERT INTO / MERGE INTO (data write)
  SELECT error(...) (audit checks)
  CALL set_commit_message('ondatrasql', 'Pipeline run: ...', extra_info => '...')
COMMIT;

On success, a DuckLake snapshot is created with metadata including the model name, SQL hash, column definitions, row count, and duration. On failure, no snapshot is created.

The full picture

ondatrasql run

Running 4 models...
[OK] raw.orders      (table, skip — unchanged)
[OK] raw.customers   (table, backfill, 20 rows, 95ms)
[OK] staging.orders  (merge, incremental, 3 rows, 200ms)
[OK] mart.revenue    (table, full, 12 rows, 154ms)

Four models. One command. Each model gets the minimum necessary work based on what actually changed.