
Why I Stopped Writing Data Pipelines in Python

Python made data pipelines possible. But it also made them more complex than they need to be.

For years, Python has been the default language for data pipelines.

Airflow DAGs. Custom ingestion scripts. Requests, Pandas, OAuth libraries, retry logic, state handling.

It works.

But after building enough pipelines, something becomes obvious:

You’re not solving data problems. You’re managing a runtime.

The Hidden Cost of Python Pipelines

A typical Python pipeline looks simple at first:

import requests

# url, headers, and process() are assumed to be defined elsewhere
resp = requests.get(url, headers=headers)
data = resp.json()

for row in data:
    process(row)

But in reality, that script depends on an entire invisible system:

  • A scheduler (Airflow, cron, Prefect)
  • A state store (database, files, Redis)
  • Retry logic and backoff handling
  • Authentication flows (OAuth, tokens, refresh)
  • Logging, monitoring, error handling
  • Dependency management (pip, venv, Docker)

None of that is in your code.

But all of it is your responsibility.
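To make that concrete, here is a minimal sketch of just one piece of that invisible system: the retry-with-backoff wrapper that almost every Python pipeline grows sooner or later. The names (`with_retries`, `flaky_fetch`) are illustrative, not from any real codebase.

```python
import time

def with_retries(fn, attempts=3, base_delay=0.01):
    """Call fn, retrying with exponential backoff on failure --
    glue that lives in your repo, not in your pipeline logic."""
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the error
            time.sleep(base_delay * 2 ** attempt)  # 10ms, 20ms, 40ms...

# Demo: a call that fails twice with a transient error, then succeeds.
calls = {"n": 0}

def flaky_fetch():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("transient failure")
    return {"users": [{"id": 1}]}

result = with_retries(flaky_fetch)
print(calls["n"], result)  # prints: 3 {'users': [{'id': 1}]}
```

Fifteen lines of infrastructure before a single row of data moves, and that's the smallest of the bullets above.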

Pipelines Are Not Programs

This is the core mismatch.

Python is designed for general-purpose programming. Data pipelines are something else entirely.

Pipelines are:

  • Deterministic
  • State-aware
  • Dependency-driven
  • Data-first

But Python gives you:

  • Mutable state
  • Implicit dependencies
  • Hidden side effects
  • Unlimited flexibility (which becomes complexity)

So what happens?

You end up rebuilding a pipeline system inside Python.
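A contrived but classic example of that hidden, mutable state: a mutable default argument quietly shares data across what look like independent runs.

```python
def collect_rows(row, batch=[]):  # mutable default: one shared list
    """Looks like each call starts a fresh batch. It doesn't."""
    batch.append(row)
    return batch

run1 = collect_rows({"id": 1})
run2 = collect_rows({"id": 2})  # "fresh" run still carries run1's data
print(len(run2))  # prints: 2
```

Nothing in the call site hints that state leaks between runs, which is exactly the kind of side effect a deterministic, state-aware pipeline model rules out by construction.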

What OndatraSQL Does Differently

OndatraSQL flips this model.

Instead of writing programs that happen to move data, you write data pipelines directly.

# @kind: append

resp = http.get("https://api.example.com/users")

for user in resp.json:
    save.row(user)

That’s it.

No imports. No environment. No scheduler code. No state handling.

Because all of that already exists — in the runtime.

The Runtime Is the Difference

In Python:

  • You build the runtime
  • Then you write your pipeline

In OndatraSQL:

  • The runtime already exists
  • You just write the pipeline

What’s built in:

  • HTTP client (with retries, backoff, auth)
  • OAuth handling (with automatic refresh)
  • State tracking (incremental cursors)
  • Database access (query DuckDB directly)
  • Output handling (materialization strategies)
  • Change detection (automatic CDC)

You don’t install these. You don’t configure them. You don’t glue them together.

They are part of the execution model.

No More Glue Code

Python pipelines are mostly glue:

Connect API → transform → write → track state → retry → log

With OndatraSQL, that glue disappears.

You don’t write:

  • Retry loops
  • Token refresh logic
  • Incremental filters
  • Database connection code

Because those are not business logic. They are infrastructure.
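For a sense of what one of those bullets costs in practice, here is a hedged sketch of a hand-rolled incremental filter: remember the highest id already processed and skip everything at or below it. The in-memory `_state` dict stands in for the external store (database, file, Redis) a real pipeline would need.

```python
_state = {"cursor": 0}  # stand-in for an external state store

def load_state():
    return _state["cursor"]

def save_state(cursor):
    _state["cursor"] = cursor

def process_increment(rows):
    """Return only rows newer than the saved cursor, then advance it."""
    cursor = load_state()
    new_rows = [r for r in rows if r["id"] > cursor]
    if new_rows:
        save_state(max(r["id"] for r in new_rows))
    return new_rows

batch = [{"id": 1}, {"id": 2}, {"id": 3}]
first = process_increment(batch)   # first run: all three rows are new
second = process_increment(batch)  # rerun: nothing new to process
print(len(first), len(second))  # prints: 3 0
```

None of this code says anything about users, orders, or revenue. It is pure plumbing.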

Determinism by Default

Python pipelines are hard to reason about:

  • Did this run already?
  • What changed?
  • What failed halfway?
  • Is this idempotent?

You solve this with checkpoints, idempotency keys, external state.
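What "idempotency keys" means in practice is glue like this: record a key per row written and skip rows whose key was seen in an earlier run. A hedged in-memory sketch; a real pipeline would keep `written` in a database or file so it survives restarts.

```python
written = set()  # stand-in for an external, durable state store

def write_once(row):
    """Write a row at most once, keyed by its id."""
    key = row["id"]  # the idempotency key
    if key in written:
        return False  # already processed in an earlier run: skip
    written.add(key)
    return True

rows = [{"id": 1}, {"id": 2}]
first_run = [write_once(r) for r in rows]
second_run = [write_once(r) for r in rows]  # replay after a crash
print(first_run, second_run)  # prints: [True, True] [False, False]
```

The second run is safe only because you wrote and maintained this machinery yourself.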

OndatraSQL makes this the default:

  • Every run is tracked via DuckLake snapshots
  • Change detection is automatic
  • Rollbacks are built-in
  • Incremental processing is implicit

No extra code.

No Environment to Manage

Python pipelines require:

  • pip
  • virtualenv
  • Docker
  • dependency pinning
  • resolving version conflicts

OndatraSQL requires:

One binary.

That’s it. No runtime drift. No “works on my machine”. No broken environments.

You Stay in the Data Model

This is the biggest shift.

In Python:

  • You think in code
  • Then map it to data

In OndatraSQL:

  • You think in data
  • And express it directly

SQL defines transformations. Scripts handle ingestion. The runtime connects everything.

When Python Still Wins

This isn’t about replacing Python everywhere.

Python is still better when:

  • You need complex algorithms
  • You’re doing ML or heavy computation
  • You need full language flexibility

But for pipelines? Python is often solving the wrong problem.

The Simpler System Wins

After years of building pipelines, the pattern is clear:

The best system is not the most flexible one. It’s the one that removes the most decisions.

  • No tool choices
  • No architecture debates
  • No setup overhead

Just: write the model, run the pipeline.

The Shift

The shift is subtle but important:

From: “How do I build this pipeline?”

To: “What data do I want?”

And once you make that shift, Python starts to feel… heavy.

Final Thought

Python made data pipelines possible.

But it also made them more complex than they need to be.

OndatraSQL is an attempt to go back to something simpler:

A system where the pipeline is the code. Not the infrastructure around it.