Lineage & Metadata

On this page

Every run records column-level lineage, schema changes, dependencies, and git context in DuckLake. Query it all with SQL.

CLI

ondatrasql lineage overview              # All models with dependencies
ondatrasql lineage staging.orders        # Column lineage for one model
ondatrasql lineage staging.orders.total  # Trace one column

Output is rendered as ASCII art with box-drawing characters.

Transformation Types

Each column is classified by how it was derived:

Type Meaning Example
IDENTITY Direct copy SELECT name
AGGREGATION Aggregated value SUM(amount)
ARITHMETIC Computed price * quantity
CONDITIONAL Logic applied CASE WHEN ...
CAST Type conversion CAST(id AS VARCHAR)
FUNCTION Function call UPPER(name)

Lineage is extracted from the SQL AST, across CTEs, joins, and subqueries.

Commit Metadata

Every run stores metadata in commit_extra_info on the DuckLake snapshot. All fields are JSON.

Field Description
model Target table name
kind table, append, merge, scd2, partition, tracked, events
run_type backfill, incremental, full, skip, flush
rows_affected Rows written
start_time Run start (ISO 8601)
end_time Run end (ISO 8601)
duration_ms Execution time in milliseconds
steps Sub-step breakdown (array of {name, duration_ms, status})
column_lineage Source columns with transformation types
depends Upstream table dependencies
columns Output column definitions
schema_hash Detects schema evolution
sql_hash Triggers backfill on change
dag_run_id Run identifier
source_file Model source path
duckdb_version DuckDB version used
git_commit Commit SHA
git_branch Branch name
git_repo_url Repository URL
error Error message (on failure)

Querying Metadata

Metadata is stored in DuckLake snapshots. Query with SQL:

-- Recent runs
SELECT
  commit_extra_info->>'model' AS model,
  commit_extra_info->>'run_type' AS run_type,
  commit_extra_info->>'rows_affected' AS rows,
  commit_extra_info->>'duration_ms' AS ms
FROM lake.snapshots()
ORDER BY snapshot_id DESC
LIMIT 10;
-- Column lineage for a model
SELECT
  commit_extra_info->>'model' AS model,
  commit_extra_info->>'column_lineage' AS lineage
FROM lake.snapshots()
WHERE LOWER(commit_extra_info->>'model') = 'mart.revenue'
ORDER BY snapshot_id DESC
LIMIT 1;
-- Models that changed schema
SELECT
  commit_extra_info->>'model' AS model,
  commit_extra_info->>'schema_hash' AS hash
FROM lake.snapshots()
WHERE commit_extra_info->>'run_type' = 'backfill';