About

The story behind OndatraSQL — why I built a data runtime instead of another pipeline tool

The Story

I didn’t come from building data platforms. I came from needing data to work.

For nine years, I worked on publisher revenue — ad servers, SSPs, header bidding, reporting. Not as a platform engineer, but as the person responsible for having the numbers ready every morning.

And the constraint was always the same:

It has to run now, with what I already have.

Most “modern data stack” tools assume something very different. They assume that a cloud warehouse is already running, that spinning up infrastructure is normal, and that the stack comes first.

That’s not how I worked. My world was a laptop, some SQL, and real data that had to be correct before the day started. A report combining Google Ad Manager revenue, Prebid auction data, and site analytics — broken down by domain, bidder, currency, device type, and ad unit — aggregated per publisher, ready before the morning standup. For every client. Every day.
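To make that concrete: stripped down, that kind of report is a join and a group-by. The sketch below uses sqlite3 from Python's standard library as a stand-in engine, with hypothetical table and column names and toy data — the real reports carried many more dimensions (currency, device type, ad unit) and far more rows.

```python
import sqlite3

# In-memory database standing in for the day's raw exports.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE gam_revenue (publisher TEXT, domain TEXT, revenue REAL);
    CREATE TABLE prebid_auctions (domain TEXT, bidder TEXT, bids INTEGER);
    INSERT INTO gam_revenue VALUES
        ('acme', 'acme.com', 120.0),
        ('acme', 'news.acme.com', 30.0);
    INSERT INTO prebid_auctions VALUES
        ('acme.com', 'rubicon', 5000),
        ('news.acme.com', 'appnexus', 1200);
""")

# One row per publisher: ad-server revenue joined with auction volume.
rows = con.execute("""
    SELECT g.publisher,
           SUM(g.revenue) AS total_revenue,
           SUM(p.bids)    AS total_bids
    FROM gam_revenue g
    JOIN prebid_auctions p ON p.domain = g.domain
    GROUP BY g.publisher
""").fetchall()
print(rows)  # → [('acme', 150.0, 6200)]
```

The SQL is the easy part. The daily grind was everything around it: getting the exports, keeping them fresh, and doing it again tomorrow.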

Every time I evaluated tools like dbt, I hit the same wall.

Not because they were bad — but because they assumed the rest of the stack already existed.

To get started, I needed:

  • Airbyte for ingestion
  • Apache Kafka for events
  • Apache Airflow for orchestration
  • a warehouse like Snowflake
  • and something else for dashboards

Each tool solved one layer — but required all the others.

And even after setting it up, I wasn’t solving data problems. I was writing Jinja instead of SQL, working around missing connectors, and choosing infrastructure I didn’t want.

So I kept asking a simple question:

Why does collecting and transforming data require an entire stack?

Nobody built the simple version. So I did.

What OndatraSQL Is

OndatraSQL is not a data tool. It’s a data runtime.

A single binary that:

  • collects events over HTTP
  • stores them durably
  • ingests from REST APIs
  • transforms with SQL
  • tracks changes automatically
  • and makes the result queryable

No Kafka. No Airflow. No dbt. No warehouse setup.

It’s built on top of DuckDB and DuckLake — designed to run locally, or anywhere you deploy it.

Change detection, CDC, lineage, validation — these aren’t plugins or macros. They’re part of the execution model.

You don’t assemble a stack. You run a system.

The Idea

Data pipelines should feel like running a program — not operating infrastructure.

The Vision

A complete data stack:

  • Ingestion
  • Transformation
  • Validation
  • Visualization
  • Machine learning

All open source. All running on hardware you control. No cloud required.

OndatraSQL is the foundation.

The Name

Ondatra is the genus name for the muskrat (Ondatra zibethicus) — a semi-aquatic rodent that builds tunnels and channels through lakes and wetlands. It lives in the same lakes as ducks, and it builds pipes through them. DuckDB. DuckLake. Pipelines. The name wrote itself.

License

OndatraSQL is open source software released under the GNU AGPL v3.