OndatraSQL's execution model
What happens when you run ondatrasql run. The full sequence from model discovery and dependency resolution to DuckLake atomic commit.
This post describes what happens when you execute ondatrasql run — the full sequence from model discovery to atomic commit.
The run sequence
A single ondatrasql run performs these steps:
- Discovery — find all model files in
models/ - Parsing — extract directives (
@kind,@unique_key,@constraint, etc.) and SQL body - Dependency resolution — build the DAG from SQL table references
- Change detection — compare model hashes and source snapshots against DuckLake history
- Execution — run each model in DAG order, applying the appropriate strategy per kind
- Validation — enforce constraints (pre-write) and audits (in-transaction)
- Commit — write results to DuckLake with snapshot metadata
Each step is described below.
1. Discovery
OndatraSQL scans the models/ directory for .sql files. The file path determines the target table name:
models/staging/orders.sql → staging.orders
models/raw/events.sql → raw.events
2. Parsing
Each file is parsed for directives — SQL comments that control execution behavior:
-- @kind: merge
-- @unique_key: order_id
-- @constraint: not_null(order_id)
-- @audit: row_count(>, 0)
The SQL body (everything after the directives) is the transformation query.
3. Dependency resolution
OndatraSQL parses the SQL AST to extract table references from FROM, JOIN, CTEs, and subqueries. These references become edges in the dependency graph. Lib functions in FROM (e.g. FROM gam_report(...)) are detected as external sources and do not create model dependencies.
The graph is validated for cycles at build time.
4. Change detection
For each model, OndatraSQL determines a run type by comparing the model’s SQL hash and upstream snapshot state against DuckLake commit history:
| Run type | Condition |
|---|---|
skip | SQL hash unchanged, upstream snapshots unchanged |
backfill | First run, or SQL/directive definition changed |
incremental | SQL hash unchanged, upstream data changed |
full | Upstream model was re-materialized |
Run type is determined via a SQL query that compares the current model hash against the last committed hash in DuckLake. For incremental kinds (append, merge), the runtime then uses table_changes() to read only the delta rows from upstream — this is the CDC optimization, not the skip/run decision itself.
When sources have changed, OndatraSQL further checks change_type (insert, update, delete) to determine whether the changes are insert-only — enabling more efficient processing paths for append and merge models.
5. Execution per kind
Each model kind has a different materialization strategy:
| Kind | Strategy |
|---|---|
table | TRUNCATE + INSERT (full rebuild) |
append | INSERT new rows only (via CDC) |
merge | MERGE INTO on unique key (via CDC) |
scd2 | Close old versions + insert new versions |
tracked | Content-hash per group, skip unchanged groups |
For incremental kinds (append, merge), change detection rewrites the query at the AST level to read only changed rows using DuckLake time-travel.
6. Validation
Validation runs in two phases:
Constraints — executed against the temp table (before the data is written to the target). If any constraint fails, the model is aborted and no data is written.
Audits — executed inside the BEGIN ... COMMIT transaction, after the data write but before commit. A failing audit raises error(), which aborts the entire transaction atomically — schema changes, data writes, and commit metadata are all rolled back.
Warnings — executed after commit. They log anomalies (delta detection, statistical checks) without blocking.
7. Commit
The final step writes results to DuckLake inside a single transaction:
BEGIN;
ALTER TABLE (schema evolution)
INSERT INTO / MERGE INTO (data write)
SELECT error(...) (audit checks)
CALL set_commit_message('ondatrasql', 'Pipeline run: ...', extra_info => '...')
COMMIT;
On success, a DuckLake snapshot is created with metadata including the model name, SQL hash, column definitions, row count, and duration. On failure, no snapshot is created.
The full picture
ondatrasql run
Running 4 models...
[OK] raw.orders (table, skip — unchanged)
[OK] raw.customers (table, backfill, 20 rows, 95ms)
[OK] staging.orders (merge, incremental, 3 rows, 200ms)
[OK] mart.revenue (table, full, 12 rows, 154ms)
Four models. One command. Each model gets the minimum necessary work based on what actually changed.
OndatraSQL