This post describes what happens when you execute ondatrasql run — the full sequence from model discovery to atomic commit.
The run sequence
A single ondatrasql run performs these steps:
- Discovery — find all model files in
models/ - Parsing — extract directives (
@kind,@unique_key,@constraint, etc.) and SQL body - Dependency resolution — build the DAG from SQL table references
- Change detection — compare model hashes and source snapshots against DuckLake history
- Execution — run each model in DAG order, applying the appropriate strategy per kind
- Validation — enforce constraints (pre-write) and audits (in-transaction)
- Commit — write results to DuckLake with snapshot metadata
Each step is described below.
1. Discovery
OndatraSQL scans the models/ directory for .sql, .star, and .yaml files. The file path determines the target table name:
models/staging/orders.sql → staging.orders
models/raw/events.sql → raw.events
2. Parsing
Each file is parsed for directives — SQL comments that control execution behavior:
-- @kind: merge
-- @unique_key: order_id
-- @constraint: not_null(order_id)
-- @audit: row_count(>, 0)
The SQL body (everything after the directives) is the transformation query.
3. Dependency resolution
OndatraSQL parses the SQL AST to extract table references from FROM, JOIN, CTEs, and subqueries. These references become edges in the dependency graph.
For Starlark models, query("SELECT ... FROM table") calls are followed through load() imports to extract dependencies.
The graph is validated for cycles at build time.
4. Change detection
For each model, OndatraSQL determines a run type by comparing the current state against DuckLake snapshot history:
| Run type | Condition |
|---|---|
skip | Model hash unchanged, no upstream changes |
backfill | First run, or model definition changed |
incremental | Source data changed (detected via table_changes()) |
full | Upstream model was re-materialized |
Change detection uses DuckLake’s table_changes() function to check each source table individually. If all sources are unchanged, the model is skipped without scanning any data.
When sources have changed, OndatraSQL further checks change_type (insert, update, delete) to determine whether the changes are insert-only — enabling more efficient processing paths for append and merge models.
5. Execution per kind
Each model kind has a different materialization strategy:
| Kind | Strategy |
|---|---|
table | TRUNCATE + INSERT (full rebuild) |
append | INSERT new rows only (via Smart CDC) |
merge | MERGE INTO on unique key (via Smart CDC) |
scd2 | Close old versions + insert new versions |
partition | Delete affected partitions + insert new data |
tracked | Content-hash per group, skip unchanged groups |
For incremental kinds (append, merge, partition), Smart CDC rewrites the query at the AST level to read only changed rows using DuckLake time-travel.
6. Validation
Validation runs in two phases:
Constraints — executed against the temp table (before the data is written to the target). If any constraint fails, the model is aborted and no data is written.
Audits — executed inside the BEGIN ... COMMIT transaction, after the data write but before commit. A failing audit raises error(), which aborts the entire transaction atomically — schema changes, data writes, and commit metadata are all rolled back.
Warnings — executed after commit. They log anomalies (delta detection, statistical checks) without blocking.
7. Commit
The final step writes results to DuckLake inside a single transaction:
BEGIN;
ALTER TABLE (schema evolution)
INSERT INTO / MERGE INTO (data write)
SELECT error(...) (audit checks)
CALL set_commit_message('ondatrasql', 'Pipeline run: ...', extra_info => '...')
COMMIT;
On success, a DuckLake snapshot is created with metadata including the model name, SQL hash, column definitions, row count, and duration. On failure, no snapshot is created.
The full picture
$ ondatrasql run
Running 4 models...
[OK] raw.orders (table, skip — unchanged)
[OK] raw.customers (table, backfill, 20 rows, 95ms)
[OK] staging.orders (merge, incremental, 3 rows, 200ms)
[OK] mart.revenue (table, full, 12 rows, 154ms)
Four models. One command. Each model gets the minimum necessary work based on what actually changed.
Ondatra Labs