Lineage & Metadata

On this page

Every run records column-level lineage, schema changes, dependencies, and git context in DuckLake. Query it all with SQL.

CLI

ondatrasql lineage overview              # All models with dependencies
ondatrasql lineage staging.orders        # Column lineage for one model
ondatrasql lineage staging.orders.total  # Trace one column

Output is rendered as ASCII art with box-drawing characters.

Transformation Types

Each column is classified by how it was derived:

TypeMeaningExample
IDENTITYDirect copySELECT name
AGGREGATIONAggregated valueSUM(amount)
ARITHMETICComputedprice * quantity
CONDITIONALLogic appliedCASE WHEN ...
CASTType conversionCAST(id AS VARCHAR)
FUNCTIONFunction callUPPER(name)

Lineage is extracted from the SQL AST, across CTEs, joins, and subqueries.

Commit Metadata

Every run stores metadata in commit_extra_info on the DuckLake snapshot. All fields are JSON.

FieldDescription
modelTarget table name
kindtable, append, merge, scd2, partition, tracked, events
run_typebackfill, incremental, full, skip, flush
rows_affectedRows written
start_timeRun start (ISO 8601)
end_timeRun end (ISO 8601)
duration_msExecution time in milliseconds
stepsSub-step breakdown (array of {name, duration_ms, status})
column_lineageSource columns with transformation types
dependsUpstream table dependencies
columnsOutput column definitions
schema_hashDetects schema evolution
sql_hashTriggers backfill on change
dag_run_idRun identifier
source_fileModel source path
duckdb_versionDuckDB version used
git_commitCommit SHA
git_branchBranch name
git_repo_urlRepository URL
errorError message (on failure)

Querying Metadata

Metadata is stored in DuckLake snapshots. Query with SQL:

-- Recent runs
SELECT
  commit_extra_info->>'model' AS model,
  commit_extra_info->>'run_type' AS run_type,
  commit_extra_info->>'rows_affected' AS rows,
  commit_extra_info->>'duration_ms' AS ms
FROM lake.snapshots()
ORDER BY snapshot_id DESC
LIMIT 10;
-- Column lineage for a model
SELECT
  commit_extra_info->>'model' AS model,
  commit_extra_info->>'column_lineage' AS lineage
FROM lake.snapshots()
WHERE LOWER(commit_extra_info->>'model') = 'mart.revenue'
ORDER BY snapshot_id DESC
LIMIT 1;
-- Models that changed schema
SELECT
  commit_extra_info->>'model' AS model,
  commit_extra_info->>'schema_hash' AS hash
FROM lake.snapshots()
WHERE commit_extra_info->>'run_type' = 'backfill';