OndatraSQL's execution model

What happens when you run ondatrasql run: the full sequence from model discovery and dependency resolution to DuckLake's atomic commit.

The run sequence

A single ondatrasql run performs these steps:

  1. Discovery — find all model files in models/
  2. Parsing — extract directives (@kind, @unique_key, @constraint, etc.) and SQL body
  3. Dependency resolution — build the DAG from SQL table references
  4. Change detection — compare model hashes and source snapshots against DuckLake history
  5. Execution — run each model in DAG order, applying the appropriate strategy per kind
  6. Validation — enforce constraints (pre-write) and audits (in-transaction)
  7. Commit — write results to DuckLake with snapshot metadata

Each step is described below.

1. Discovery

OndatraSQL scans the models/ directory for .sql, .star, and .yaml files. The file path determines the target table name:

models/staging/orders.sql  →  staging.orders
models/raw/events.sql      →  raw.events
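The mapping can be sketched as a small helper. The rule assumed here (schema = first directory under models/, table = file stem) is inferred from the two examples above, not from OndatraSQL's source:

```python
from pathlib import PurePosixPath

def target_table(model_path: str) -> str:
    # Hypothetical sketch: schema is the first directory under
    # models/, and the table name is the file stem.
    parts = PurePosixPath(model_path).parts
    schema = parts[1]                      # e.g. "staging"
    table = PurePosixPath(parts[-1]).stem  # e.g. "orders"
    return f"{schema}.{table}"

print(target_table("models/staging/orders.sql"))  # staging.orders
```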

2. Parsing

Each file is parsed for directives — SQL comments that control execution behavior:

-- @kind: merge
-- @unique_key: order_id
-- @constraint: not_null(order_id)
-- @audit: row_count(>, 0)

The SQL body (everything after the directives) is the transformation query.
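A minimal sketch of this split, assuming directives are `-- @name: value` comment lines at the top of the file. The regex and the handling of repeated directives are illustrative, not OndatraSQL's actual parser:

```python
import re

DIRECTIVE_RE = re.compile(r"^--\s*@(\w+):\s*(.+?)\s*$")

def parse_model(text: str):
    directives, body_start = {}, 0
    lines = text.splitlines()
    for i, line in enumerate(lines):
        m = DIRECTIVE_RE.match(line.strip())
        if m:
            # Repeated directives (e.g. several @constraint lines) accumulate.
            directives.setdefault(m.group(1), []).append(m.group(2))
            body_start = i + 1
        elif line.strip():
            break  # first non-directive, non-blank line starts the SQL body
        else:
            body_start = i + 1
    return directives, "\n".join(lines[body_start:]).strip()
```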

3. Dependency resolution

OndatraSQL parses the SQL AST to extract table references from FROM, JOIN, CTEs, and subqueries. These references become edges in the dependency graph.

For Starlark models, query("SELECT ... FROM table") calls are followed through load() imports to extract dependencies.

The graph is validated for cycles at build time.
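The ordering and cycle check can be sketched with Python's standard graphlib; the dependency map below is a made-up example, not OndatraSQL output:

```python
from graphlib import TopologicalSorter, CycleError

# Made-up dependency map: model -> tables it reads (from FROM/JOIN).
deps = {
    "staging.orders": {"raw.orders"},
    "mart.revenue": {"staging.orders", "raw.customers"},
}

try:
    order = list(TopologicalSorter(deps).static_order())
except CycleError as exc:
    raise SystemExit(f"dependency cycle: {exc.args[1]}")

# Every model appears after all of its dependencies.
print(order)
```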

4. Change detection

For each model, OndatraSQL determines a run type by comparing the current state against DuckLake snapshot history:

Run type      Condition
-----------   -------------------------------------------------
skip          Model hash unchanged, no upstream changes
backfill      First run, or model definition changed
incremental   Source data changed (detected via table_changes())
full          Upstream model was re-materialized

Change detection uses DuckLake’s table_changes() function to check each source table individually. If all sources are unchanged, the model is skipped without scanning any data.

When sources have changed, OndatraSQL further checks change_type (insert, update, delete) to determine whether the changes are insert-only — enabling more efficient processing paths for append and merge models.
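The decision table can be sketched as a pure function. The flag names are hypothetical, and the precedence (definition changes win over source changes) is inferred from the table above:

```python
from enum import Enum

class RunType(Enum):
    SKIP = "skip"
    BACKFILL = "backfill"
    INCREMENTAL = "incremental"
    FULL = "full"

def decide_run_type(*, first_run: bool, model_changed: bool,
                    upstream_rematerialized: bool, sources_changed: bool) -> RunType:
    if first_run or model_changed:
        return RunType.BACKFILL     # rebuild from scratch
    if upstream_rematerialized:
        return RunType.FULL         # upstream rebuilt, so must we
    if sources_changed:
        return RunType.INCREMENTAL  # process only changed rows
    return RunType.SKIP             # nothing to do, no data scanned
```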

5. Execution per kind

Each model kind has a different materialization strategy:

Kind        Strategy
---------   ------------------------------------------------
table       TRUNCATE + INSERT (full rebuild)
append      INSERT new rows only (via Smart CDC)
merge       MERGE INTO on unique key (via Smart CDC)
scd2        Close old versions + insert new versions
partition   Delete affected partitions + insert new data
tracked     Content-hash per group, skip unchanged groups

For incremental kinds (append, merge, partition), Smart CDC rewrites the query at the AST level to read only changed rows using DuckLake time-travel.
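To make the effect of that rewrite concrete, here is a deliberately simplified string-level sketch. The real rewrite operates on the AST, and the snapshot-range arguments passed to table_changes() are assumptions for illustration:

```python
def rewrite_incremental(sql: str, source: str, start_snap: int, end_snap: int) -> str:
    # String-level illustration only: swap a plain table scan for a
    # changed-rows scan between two snapshots. An AST rewrite would
    # handle aliases, CTEs, and quoting correctly; this does not.
    alias = source.split(".")[-1]
    changed = (f"(SELECT * FROM table_changes('{source}', {start_snap}, {end_snap}) "
               f"WHERE change_type = 'insert') AS {alias}")
    return sql.replace(f"FROM {source}", f"FROM {changed}")

print(rewrite_incremental("SELECT order_id FROM raw.orders", "raw.orders", 4, 7))
```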

6. Validation

Validation runs in two phases:

Constraints — executed against the temp table (before the data is written to the target). If any constraint fails, the model is aborted and no data is written.

Audits — executed inside the BEGIN ... COMMIT transaction, after the data write but before commit. A failing audit raises error(), which aborts the entire transaction atomically — schema changes, data writes, and commit metadata are all rolled back.

Warnings — executed after commit. They log anomalies (delta detection, statistical checks) without blocking.
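The ordering of the three phases relative to the write can be sketched as follows; the callback shape is invented for illustration and is not OndatraSQL's internal API:

```python
def run_validated(write, rollback, constraints=(), audits=(), warnings=()):
    # Phase 1: constraints run before any write; a failure means the
    # target table is never touched.
    for check in constraints:
        check()
    # Phase 2: the write and the audits share one transaction; an audit
    # failure rolls back schema changes, data, and metadata together.
    try:
        write()
        for check in audits:
            check()
    except Exception:
        rollback()
        raise
    # Phase 3: warnings run after commit and never block the pipeline.
    for check in warnings:
        try:
            check()
        except Exception as exc:
            print(f"warning: {exc}")
```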

7. Commit

The final step writes results to DuckLake inside a single transaction:

BEGIN;
  ALTER TABLE (schema evolution)
  INSERT INTO / MERGE INTO (data write)
  SELECT error(...) (audit checks)
  CALL set_commit_message('ondatrasql', 'Pipeline run: ...', extra_info => '...')
COMMIT;

On success, a DuckLake snapshot is created with metadata including the model name, SQL hash, column definitions, row count, and duration. On failure, no snapshot is created.

The full picture

$ ondatrasql run

Running 4 models...
[OK] raw.orders      (table, skip — unchanged)
[OK] raw.customers   (table, backfill, 20 rows, 95ms)
[OK] staging.orders  (merge, incremental, 3 rows, 200ms)
[OK] mart.revenue    (table, full, 12 rows, 154ms)

Four models. One command. Each model gets the minimum necessary work based on what actually changed.