You don't need a data warehouse — you need a catalog

Every data project starts with a warehouse decision. What if that decision didn't exist?

March 31, 2026

architecture, ducklake, catalog

Every data project starts the same way.

You sit down to build a pipeline, and before you write a single line of SQL, you have to make a decision:

Which warehouse are we using?

Snowflake? BigQuery? Redshift? Postgres?

That decision pulls in everything else:

How you ingest data
How you transform it
How you query it
How you pay for it

Before you’ve even started solving the actual problem.

What if that decision didn’t exist?

That’s the idea behind OndatraSQL.

There is no warehouse to provision. No service to spin up. No infrastructure to manage.

Instead, there’s just a catalog.

So what is the catalog?

The simplest way to think about it:

The catalog is where your data lives.

Not behind an API. Not inside a managed service.

Just:

Metadata
Files

That’s it.

Two parts, one system

The catalog is made up of two things:

1. Metadata

This is the “brain”:

Table definitions
Schemas
Snapshots
Lineage
Execution history

2. Data files

This is the “body”:

Stored as Parquet
Versioned over time
Append-only

Together, they behave like a warehouse.

But they’re not one.

No warehouse, no problem

In a traditional setup, you provision a warehouse:

Create a database
Allocate compute
Configure storage
Manage access
Keep it running

In OndatraSQL:

You point the runtime at a location.

That location can be:

A local file
A folder
An S3 bucket
A server-backed catalog

And that’s enough.

What the runtime does

You don’t interact with the catalog directly.

OndatraSQL takes care of everything:

Creating tables
Evolving schemas
Detecting changes
Managing snapshots
Ensuring consistency

Every time you run your pipeline, it produces a new version of your data.

No migrations. No manual state tracking. No orchestration glue.
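What "no migrations" can look like in practice is easier to see in code. The sketch below is an illustration under stated assumptions, not OndatraSQL's actual mechanism: it uses an in-memory SQLite database as a stand-in for the catalog, and `sync_table` is a hypothetical helper. It creates the table on first use, adds any columns that show up later, and appends the rows.

```python
import sqlite3


def sync_table(db: sqlite3.Connection, table: str, rows: list) -> None:
    """Create `table` if missing, add any new columns, then append `rows`."""
    if not rows:
        return
    # Columns the incoming rows need, in first-seen order.
    wanted = list(dict.fromkeys(key for row in rows for key in row))
    # Columns the table already has (empty if the table does not exist yet).
    existing = [col[1] for col in db.execute(f'PRAGMA table_info("{table}")')]
    if not existing:
        columns = ", ".join(f'"{c}"' for c in wanted)
        db.execute(f'CREATE TABLE "{table}" ({columns})')
    else:
        for column in wanted:
            if column not in existing:
                # Schema evolution: older rows read as NULL for the new column.
                db.execute(f'ALTER TABLE "{table}" ADD COLUMN "{column}"')
    for row in rows:
        columns = ", ".join(f'"{c}"' for c in row)
        marks = ", ".join("?" for _ in row)
        db.execute(
            f'INSERT INTO "{table}" ({columns}) VALUES ({marks})', list(row.values())
        )
    db.commit()


# Usage: the second run brings a column the first run never had.
db = sqlite3.connect(":memory:")
sync_table(db, "orders", [{"id": 1, "amount": 10}])
sync_table(db, "orders", [{"id": 2, "amount": 5, "currency": "EUR"}])
print(db.execute("SELECT id, amount, currency FROM orders ORDER BY id").fetchall())
```

Nobody wrote an ALTER TABLE by hand here. The second run carried a new column, and the table grew to fit it.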

Why this matters

This changes how you think about data systems.

You don’t start with infrastructure anymore.

You start with:

SELECT ...
FROM ...

And the rest follows.

Compare that to the modern data stack

A typical setup looks like this:

Airbyte for ingestion
dbt for transformations
Airflow for orchestration
Kafka for events
Snowflake for storage and compute

Each tool solves one piece.

But each one assumes the rest already exists.

The catalog flips that model

With OndatraSQL:

Storage is just files
Metadata is just a catalog
Execution happens locally

There’s no central service holding everything together.

The runtime does that instead.

One sentence

The catalog is your warehouse — but without the warehouse.

And that’s the point

You don’t need to spend a week setting up infrastructure just to answer a question.

You don’t need a stack just to run SQL.

You don’t need a warehouse to have one.