About

What OndatraSQL is, why it exists, and the design decisions behind it.

OndatraSQL is a data runtime that handles ingestion, transformation, validation, and scheduling in a single binary. It is built on DuckDB for query execution and DuckLake for catalog management, snapshots, and time-travel.

Design Principles

Single binary

One executable with no external dependencies. Install it, create a project, run it.

In-process execution

Pipelines execute inside the binary itself. No separate scheduler service, no separate query engine, no separate metadata database. The same process runs on a laptop, in CI, or on a server.

SQL-first

Transformations are SQL files. The runtime handles materialization, change detection, schema evolution, and dependency ordering.
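
Dependency ordering can be illustrated with a minimal Python sketch. This is not OndatraSQL's implementation: the three model names and their queries are hypothetical, references are found with a deliberately naive regex over FROM and JOIN, and the ordering uses the standard library's graphlib. A real runtime would need a proper SQL parser.

```python
import re
from graphlib import TopologicalSorter

# Hypothetical models (name -> SQL text), invented for illustration only.
MODELS = {
    "daily_revenue": "SELECT order_date, SUM(amount) AS revenue "
                     "FROM clean_orders GROUP BY order_date",
    "clean_orders": "SELECT id, order_date, amount FROM raw_orders WHERE amount > 0",
    "raw_orders": "SELECT * FROM read_csv('orders.csv')",
}


def dependencies(sql, known):
    """Return the known model names referenced after FROM or JOIN (naive regex)."""
    refs = re.findall(r"\b(?:FROM|JOIN)\s+([A-Za-z_][A-Za-z0-9_]*)", sql, flags=re.IGNORECASE)
    return {ref for ref in refs if ref in known}


def run_order(models):
    """Order models so that every model comes after the models it reads from."""
    graph = {
        name: dependencies(sql, models.keys() - {name})
        for name, sql in models.items()
    }
    # TopologicalSorter takes {node: predecessors} and yields predecessors first.
    return list(TopologicalSorter(graph).static_order())


print(run_order(MODELS))
```

For this linear chain the order is raw_orders, then clean_orders, then daily_revenue, regardless of the order in which the files are listed.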

Validation as execution

Constraints, audits, and warnings run as part of the pipeline, not as a separate step.
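
The idea of checks running inside the pipeline step, rather than after it, can be sketched in Python as follows. Everything here is an assumption made for illustration: the Model class, the run_model function, and in particular the split in which constraints are per-row blocking checks, audits are whole-result blocking checks, and warnings are reported without blocking. How OndatraSQL actually defines the three is not stated in this page.

```python
from dataclasses import dataclass, field
from typing import Callable


class ValidationError(Exception):
    """Raised when a blocking check fails; nothing is written in that case."""


@dataclass
class Model:
    # Hypothetical model definition; all field semantics are assumptions.
    name: str
    build: Callable[[], list]
    constraints: dict = field(default_factory=dict)  # label -> per-row predicate (blocking)
    audits: dict = field(default_factory=dict)       # label -> whole-result predicate (blocking)
    warnings: dict = field(default_factory=dict)     # label -> whole-result predicate (reported)


def run_model(model, store):
    """Build the rows, validate them, and write to the store only if blocking checks pass."""
    rows = model.build()
    for label, ok in model.constraints.items():
        if not all(ok(row) for row in rows):
            raise ValidationError(f"{model.name}: constraint failed: {label}")
    for label, ok in model.audits.items():
        if not ok(rows):
            raise ValidationError(f"{model.name}: audit failed: {label}")
    notes = [label for label, ok in model.warnings.items() if not ok(rows)]
    store[model.name] = rows  # reached only after every blocking check passed
    return notes


store = {}
orders = Model(
    name="orders",
    build=lambda: [{"id": 1, "amount": 10}, {"id": 2, "amount": 0}],
    constraints={"id is not null": lambda row: row["id"] is not None},
    audits={"result is not empty": lambda rows: len(rows) > 0},
    warnings={"no zero amounts": lambda rows: all(r["amount"] != 0 for r in rows)},
)
notes = run_model(orders, store)
print(notes)
```

Because the write happens on the last line of run_model, a failed constraint or audit means the model's output never reaches the store, which is the point of treating validation as part of execution.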

Snapshots by default

Every run creates a DuckLake snapshot. Previous states are queryable via time-travel. Failed runs leave no trace.
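
As a rough mental model of these three properties, the following Python sketch keeps an append-only list of snapshots. It is a toy, not DuckLake's design: the Catalog class and its methods are hypothetical, and tables are plain lists of dicts held in memory.

```python
import copy


class Catalog:
    """Toy append-only snapshot store (hypothetical; not DuckLake's design)."""

    def __init__(self):
        self._snapshots = [{}]  # snapshot 0 is the empty catalog

    @property
    def latest(self):
        """Id of the most recent committed snapshot."""
        return len(self._snapshots) - 1

    def at(self, snapshot_id):
        """Time-travel read: the tables as of the given snapshot."""
        return self._snapshots[snapshot_id]

    def run(self, pipeline):
        """Apply the pipeline to a private copy and commit it only on success."""
        draft = copy.deepcopy(self._snapshots[-1])
        pipeline(draft)  # if this raises, the draft is simply discarded
        self._snapshots.append(draft)
        return self.latest


def load(tables):
    tables["orders"] = [{"id": 1, "amount": 10}]


def broken(tables):
    tables["orders"].append({"id": 2, "amount": 5})
    raise RuntimeError("source unavailable")


catalog = Catalog()
first = catalog.run(load)
try:
    catalog.run(broken)
except RuntimeError:
    pass

print(catalog.latest, catalog.at(first)["orders"])
```

The failed run mutated only its own draft, so the latest snapshot is still the one produced by the successful run, and snapshot 0 remains readable as the empty starting state.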

Incremental by default

Only changed data is processed. The runtime uses DuckLake's table_changes() to detect which rows changed between snapshots.
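
The kind of information a change feed provides can be shown with a small Python analogue. This sketch does not call DuckLake or table_changes(); the row_changes function, the "id" key column, and the insert/update/delete labels are all assumptions chosen for illustration. It simply compares two versions of a table held as lists of dicts.

```python
def row_changes(before, after, key="id"):
    """Classify rows as inserted, deleted, or updated between two table versions."""
    old = {row[key]: row for row in before}
    new = {row[key]: row for row in after}
    changes = []
    for k in sorted(new.keys() - old.keys()):
        changes.append(("insert", new[k]))
    for k in sorted(old.keys() - new.keys()):
        changes.append(("delete", old[k]))
    for k in sorted(old.keys() & new.keys()):
        if old[k] != new[k]:
            changes.append(("update", new[k]))
    return changes


# Two hypothetical snapshots of the same table.
snapshot_1 = [{"id": 1, "amount": 10}, {"id": 2, "amount": 20}]
snapshot_2 = [{"id": 1, "amount": 10}, {"id": 2, "amount": 25}, {"id": 3, "amount": 5}]

changes = row_changes(snapshot_1, snapshot_2)
for kind, row in changes:
    print(kind, row)
```

Row 1 is unchanged and does not appear in the result, so a downstream step that consumes only the change list touches two rows instead of three. That is the saving incremental processing is after, and it grows with table size.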

Sandbox preview

Run the full pipeline against a temporary catalog copy. See row diffs, schema changes, and downstream impact before committing.
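
A preview of this kind can be approximated in a few lines of Python. The sketch below is an illustration under stated assumptions, not the product's mechanism: the catalog is a plain dict of tables, preview and columns are hypothetical helpers, and the report covers only row counts and column changes, not downstream impact.

```python
import copy


def columns(rows):
    """Union of the column names that appear in any row."""
    return set().union(*(row.keys() for row in rows))


def preview(catalog, pipeline):
    """Run the pipeline on a deep copy and report what would change."""
    sandbox = copy.deepcopy(catalog)
    pipeline(sandbox)  # mutates only the copy
    report = {}
    for table in sorted(catalog.keys() | sandbox.keys()):
        before = catalog.get(table, [])
        after = sandbox.get(table, [])
        report[table] = {
            "rows_before": len(before),
            "rows_after": len(after),
            "columns_added": sorted(columns(after) - columns(before)),
            "columns_removed": sorted(columns(before) - columns(after)),
        }
    return report


# Hypothetical live catalog and a pipeline that adds a column and a row.
live = {"orders": [{"id": 1, "amount": 10}]}


def add_currency(tables):
    tables["orders"] = [dict(row, currency="EUR") for row in tables["orders"]]
    tables["orders"].append({"id": 2, "amount": 7, "currency": "EUR"})


report = preview(live, add_currency)
print(report)
```

After the call, live is unchanged, while the report shows one extra row and a new currency column for orders. The pipeline author can inspect that before deciding to run against the real catalog.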

Technology

DuckDB: Query execution

DuckLake: Snapshots, time-travel, ACID

Runtime language

Starlark: Sandboxed scripting

Apache Parquet: Columnar storage

SQLite: Default catalog backend

DuckDB: Alternative catalog backend

PostgreSQL: Multi-user catalog backend

Durable event buffer

REST query protocol

Cloud data storage

Cloud data storage

Open Source

OndatraSQL is released under the GNU AGPL v3. Source code is available on GitHub.

The Name

Ondatra is the genus name for the muskrat (Ondatra zibethicus), a semi-aquatic rodent that builds tunnels and channels through lakes and wetlands. It lives in the same lakes as ducks, and it builds pipes through them. DuckDB. DuckLake. Pipelines.

Background

OndatraSQL was created and is maintained by Marcus Hernandez, who spent nine years working on publisher revenue in ad tech (ad servers, SSPs, header bidding, and reporting). The constraint was always the same: data had to be correct before the morning standup, and it had to be produced on whatever infrastructure was available.

That experience shaped OndatraSQL's design: a single system that collects, transforms, and validates data without requiring separate infrastructure for each concern.