Lineage & Metadata
Built-in lineage and metadata — no external systems required
OndatraSQL automatically tracks how your data is built. No external lineage tool. No metadata store. No integration required.
Every run records:
- Column-level lineage
- Execution details
- Schema changes
- Dependencies
- Git commit
All stored in DuckLake. All queryable with SQL.
Mental Model
Every pipeline run creates a snapshot. Each snapshot includes the data, how it was produced, and where it came from.
Column Lineage
Every column is traced back to its source — across CTEs, joins, subqueries. No annotations. No manual mapping.
ondatrasql lineage mart.revenue.total_amount
Shows source columns, transformations applied, and upstream tables.
Transformation Types
Each column is classified by how it was created:
| Type | Meaning | Example |
|---|---|---|
| IDENTITY | Direct copy | SELECT name |
| AGGREGATION | Aggregated value | SUM(amount) |
| ARITHMETIC | Computed expression | price * quantity |
| CONDITIONAL | Logic applied | CASE WHEN ... |
| CAST | Type conversion | CAST(id AS VARCHAR) |
| FUNCTION | Function call | UPPER(name) |
CLI
ondatrasql lineage overview # All models with dependencies
ondatrasql lineage staging.orders # Column lineage for one model
ondatrasql lineage staging.orders.total # Trace one column
Use these to explore dependencies, debug transformations, and understand the impact of changes.
Commit Metadata
Every run stores execution metadata inside DuckLake. No separate system required. This acts as a built-in metadata store.
Execution
| Field | Description |
|---|---|
| run_type | backfill, incremental, full, skip, or flush |
| rows_affected | Rows written |
| duration_ms | Execution time in milliseconds |
| steps | Sub-step breakdown with timing |
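As a sketch of how these fields can be put to work, the query below surfaces the slowest recent runs. It assumes the execution fields are exposed through commit_extra_info on lake.snapshots(), in the same way the Query Metadata section shows.

```sql
-- Sketch: rank recent runs by execution time.
-- Assumes model, run_type, and duration_ms are stored in commit_extra_info.
SELECT
    commit_extra_info->>'model'    AS model,
    commit_extra_info->>'run_type' AS run_type,
    CAST(commit_extra_info->>'duration_ms' AS BIGINT) AS ms
FROM lake.snapshots()
ORDER BY ms DESC
LIMIT 10;
```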
Lineage
| Field | Description |
|---|---|
| column_lineage | Source columns with transformation types |
| depends | Upstream table dependencies |
Schema
| Field | Description |
|---|---|
| columns | Output column definitions |
| schema_hash | Detects schema evolution |
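Because schema_hash is stored per snapshot, schema evolution can be detected with plain SQL. The sketch below (assuming model and schema_hash live in commit_extra_info, as in the Query Metadata section) flags snapshots whose hash differs from the previous snapshot of the same model:

```sql
-- Sketch: find snapshots where a model's schema changed.
WITH hashes AS (
    SELECT
        snapshot_id,
        commit_extra_info->>'model'       AS model,
        commit_extra_info->>'schema_hash' AS schema_hash
    FROM lake.snapshots()
)
SELECT snapshot_id, model, schema_hash
FROM (
    SELECT
        *,
        LAG(schema_hash) OVER (PARTITION BY model ORDER BY snapshot_id) AS prev_hash
    FROM hashes
)
WHERE prev_hash IS NOT NULL
  AND schema_hash <> prev_hash;
```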
Versioning
| Field | Description |
|---|---|
| sql_hash | Triggers backfill on change |
| dag_run_id | Run identifier |
| source_file | Model source path |
Git
| Field | Description |
|---|---|
| git_commit | Commit SHA |
| git_branch | Branch name |
| git_repo_url | Repository URL |
Trace data back to the code change that produced it.
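For example, a query along these lines (assuming the git fields are stored in commit_extra_info, as in the Query Metadata section) answers "which code change produced the latest snapshot?":

```sql
-- Sketch: map the most recent snapshot to its commit and branch.
SELECT
    snapshot_id,
    commit_extra_info->>'model'      AS model,
    commit_extra_info->>'git_commit' AS commit_sha,
    commit_extra_info->>'git_branch' AS branch
FROM lake.snapshots()
ORDER BY snapshot_id DESC
LIMIT 1;
```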
Query Metadata
Everything is just SQL:
SELECT
commit_extra_info->>'model' AS model,
commit_extra_info->>'run_type' AS run_type,
commit_extra_info->>'rows_affected' AS rows,
commit_extra_info->>'duration_ms' AS ms
FROM lake.snapshots()
ORDER BY snapshot_id DESC
LIMIT 10;
Build dashboards, alerts, or debugging tools directly in SQL.
Why This Matters
- Debug pipelines faster
- Understand data origins
- Track changes over time
- Build observability without extra tools
Ondatra Labs