Lineage & Metadata

Built-in lineage and metadata — no external systems required

OndatraSQL automatically tracks how your data is built. No external lineage tool. No metadata store. No integration required.

Every run records:

Column-level lineage
Execution details
Schema changes
Dependencies
Git commit

All stored in DuckLake. All queryable with SQL.

Mental Model

Every pipeline run creates a snapshot. Each snapshot includes the data, how it was produced, and where it came from.

Column Lineage

Every column is traced back to its source — across CTEs, joins, subqueries. No annotations. No manual mapping.

ondatrasql lineage mart.revenue.total_amount

Shows source columns, transformations applied, upstream tables.

Transformation Types

Each column is classified by how it was created:

Type	Meaning	Example
`IDENTITY`	Direct copy	`SELECT name`
`AGGREGATION`	Aggregated value	`SUM(amount)`
`ARITHMETIC`	Computed	`price * quantity`
`CONDITIONAL`	Logic applied	`CASE WHEN ...`
`CAST`	Type conversion	`CAST(id AS VARCHAR)`
`FUNCTION`	Function call	`UPPER(name)`

CLI

ondatrasql lineage overview              # All models with dependencies
ondatrasql lineage staging.orders        # Column lineage for one model
ondatrasql lineage staging.orders.total  # Trace one column

Use these to explore dependencies, debug transformations, and understand impact of changes.

Commit Metadata

Every run stores execution metadata inside DuckLake. No separate system required. This acts as a built-in metadata store.

Execution

Field	Description
`run_type`	`backfill`, `incremental`, `full`, `skip`, `flush`
`rows_affected`	Rows written
`duration_ms`	Execution time
`steps`	Sub-step breakdown with timing

Lineage

Field	Description
`column_lineage`	Source columns with transformation types
`depends`	Upstream table dependencies

Schema

Field	Description
`columns`	Output column definitions
`schema_hash`	Detects schema evolution

Versioning

Field	Description
`sql_hash`	Triggers backfill on change
`dag_run_id`	Run identifier
`source_file`	Model source path

Git

Field	Description
`git_commit`	Commit SHA
`git_branch`	Branch name
`git_repo_url`	Repository URL

Trace data back to the code change that produced it.

Query Metadata

Everything is just SQL:

SELECT
  commit_extra_info->>'model' AS model,
  commit_extra_info->>'run_type' AS run_type,
  commit_extra_info->>'rows_affected' AS rows,
  commit_extra_info->>'duration_ms' AS ms
FROM lake.snapshots()
ORDER BY snapshot_id DESC
LIMIT 10;

Build dashboards, alerts, or debugging tools directly in SQL.

Why This Matters

Debug pipelines faster
Understand data origins
Track changes over time
Build observability without extra tools