Blueprints Blog Contact About

Lineage & Metadata

Built-in lineage and metadata — no external systems required

OndatraSQL automatically tracks how your data is built. No external lineage tool. No metadata store. No integration required.

Every run records:

  • Column-level lineage
  • Execution details
  • Schema changes
  • Dependencies
  • Git commit

All stored in DuckLake. All queryable with SQL.

Mental Model

Every pipeline run creates a snapshot. Each snapshot includes the data, how it was produced, and where it came from.

Column Lineage

Every column is traced back to its source — across CTEs, joins, subqueries. No annotations. No manual mapping.

ondatrasql lineage mart.revenue.total_amount

Shows source columns, transformations applied, upstream tables.

Transformation Types

Each column is classified by how it was created:

TypeMeaningExample
IDENTITYDirect copySELECT name
AGGREGATIONAggregated valueSUM(amount)
ARITHMETICComputedprice * quantity
CONDITIONALLogic appliedCASE WHEN ...
CASTType conversionCAST(id AS VARCHAR)
FUNCTIONFunction callUPPER(name)

CLI

ondatrasql lineage overview              # All models with dependencies
ondatrasql lineage staging.orders        # Column lineage for one model
ondatrasql lineage staging.orders.total  # Trace one column

Use these to explore dependencies, debug transformations, and understand impact of changes.

Commit Metadata

Every run stores execution metadata inside DuckLake. No separate system required. This acts as a built-in metadata store.

Execution

FieldDescription
run_typebackfill, incremental, full, skip, flush
rows_affectedRows written
duration_msExecution time
stepsSub-step breakdown with timing

Lineage

FieldDescription
column_lineageSource columns with transformation types
dependsUpstream table dependencies

Schema

FieldDescription
columnsOutput column definitions
schema_hashDetects schema evolution

Versioning

FieldDescription
sql_hashTriggers backfill on change
dag_run_idRun identifier
source_fileModel source path

Git

FieldDescription
git_commitCommit SHA
git_branchBranch name
git_repo_urlRepository URL

Trace data back to the code change that produced it.

Query Metadata

Everything is just SQL:

SELECT
  commit_extra_info->>'model' AS model,
  commit_extra_info->>'run_type' AS run_type,
  commit_extra_info->>'rows_affected' AS rows,
  commit_extra_info->>'duration_ms' AS ms
FROM lake.snapshots()
ORDER BY snapshot_id DESC
LIMIT 10;

Build dashboards, alerts, or debugging tools directly in SQL.

Why This Matters

  • Debug pipelines faster
  • Understand data origins
  • Track changes over time
  • Build observability without extra tools