For years, Python has been the default language for data pipelines.
Airflow DAGs. Custom ingestion scripts. Requests, Pandas, OAuth libraries, retry logic, state handling.
It works.
But after building enough pipelines, something becomes obvious:
You’re not solving data problems. You’re managing a runtime.
The Hidden Cost of Python Pipelines
A typical Python pipeline looks simple at first:
import requests

resp = requests.get(url, headers=headers)
data = resp.json()
for row in data:
    process(row)
But in reality, that script depends on an entire invisible system:
- A scheduler (Airflow, cron, Prefect)
- A state store (database, files, Redis)
- Retry logic and backoff handling
- Authentication flows (OAuth, tokens, refresh)
- Logging, monitoring, error handling
- Dependency management (pip, venv, Docker)
None of that appears in the snippet.
But all of it is your responsibility.
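To make one item on that list concrete, here is a minimal sketch of the retry-and-backoff wrapper a script like this usually grows. The name `with_retries` and all of its parameters are illustrative assumptions, not part of any library.

```python
import random
import time


def with_retries(fetch, max_attempts=4, base_delay=0.5,
                 retry_on=(ConnectionError, TimeoutError), sleep=time.sleep):
    """Call fetch() until it succeeds, backing off exponentially between tries.

    Hypothetical helper for illustration; `sleep` is injectable so the
    wrapper can be exercised without actually waiting.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return fetch()
        except retry_on:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            # Base delays of 0.5s, 1s, 2s, ... plus up to 10% random jitter.
            delay = base_delay * 2 ** (attempt - 1)
            sleep(delay + random.uniform(0, delay / 10))
```

With Requests you would wrap the call in a lambda and pass `retry_on=(requests.ConnectionError, requests.Timeout)`. That is already longer than the original script, and it covers a single bullet.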
Pipelines Are Not Programs
This is the core mismatch.
Python is designed for general-purpose programming. Data pipelines are something else entirely.
A well-designed pipeline is:
- Deterministic
- State-aware
- Dependency-driven
- Data-first
But Python gives you:
- Mutable state
- Implicit dependencies
- Hidden side effects
- Unlimited flexibility (which becomes complexity)
So what happens?
You end up rebuilding a pipeline system inside Python.
What OndatraSQL Does Differently
OndatraSQL flips this model.
Instead of writing programs that happen to move data, you write data pipelines directly.
# @kind: append
resp = http.get("https://api.example.com/users")
for user in resp.json:
    save.row(user)
That’s it.
No imports. No environment. No scheduler code. No state handling.
Because all of that already exists — in the runtime.
The Runtime Is the Difference
In Python:
- You build the runtime
- Then you write your pipeline
In OndatraSQL:
- The runtime already exists
- You just write the pipeline
What’s built in:
- HTTP client (with retries, backoff, auth)
- OAuth handling (with automatic refresh)
- State tracking (incremental cursors)
- Database access (query DuckDB directly)
- Output handling (materialization strategies)
- Change detection (automatic CDC)
You don’t install these. You don’t configure them. You don’t glue them together.
They are part of the execution model.
No More Glue Code
Python pipelines are mostly glue:
Connect API → transform → write → track state → retry → log
With OndatraSQL, that glue disappears.
You don’t write:
- Retry loops
- Token refresh logic
- Incremental filters
- Database connection code
Because those are not business logic. They are infrastructure.
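For a sense of what an "incremental filter" means in practice, here is a rough sketch of the hand-rolled Python version. It assumes each row carries a sortable `updated_at` value (ISO-8601 strings compare correctly as text) and keeps the cursor in a small JSON file. The function names and the file layout are hypothetical.

```python
import json
from pathlib import Path


def load_cursor(path, default=None):
    """Return the last saved cursor, or `default` on the first run."""
    p = Path(path)
    if p.exists():
        return json.loads(p.read_text())["cursor"]
    return default


def save_cursor(path, cursor):
    """Persist the cursor so the next run can pick up where this one stopped."""
    Path(path).write_text(json.dumps({"cursor": cursor}))


def run_incremental(rows, state_path, key="updated_at"):
    """Return only rows newer than the stored cursor, then advance the cursor."""
    cursor = load_cursor(state_path)
    fresh = [r for r in rows if cursor is None or r[key] > cursor]
    if fresh:
        save_cursor(state_path, max(r[key] for r in fresh))
    return fresh
```

Even this small version leaves open questions, such as what happens if the process dies between handling the rows and saving the cursor. That is the kind of infrastructure concern the list above refers to.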
Determinism by Default
Python pipelines are hard to reason about:
- Did this run already?
- What changed?
- What failed halfway?
- Is this idempotent?
You solve this with checkpoints, idempotency keys, and external state.
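As an illustration of that pattern, here is a minimal sketch using SQLite from the standard library as a stand-in for the external state store. The `runs` and `users` tables, the `run_id` idempotency key, and both function names are assumptions made for the example.

```python
import sqlite3


def init_db(conn):
    """Create the checkpoint table and a sample target table."""
    conn.execute("CREATE TABLE IF NOT EXISTS runs (run_id TEXT PRIMARY KEY)")
    conn.execute(
        "CREATE TABLE IF NOT EXISTS users (id INTEGER PRIMARY KEY, name TEXT)"
    )


def write_batch(conn, run_id, rows):
    """Apply a batch at most once per run_id; rows are upserted by id.

    Returns True if the batch was applied, False if run_id was already seen.
    """
    with conn:  # data and checkpoint commit together, or roll back on error
        seen = conn.execute(
            "SELECT 1 FROM runs WHERE run_id = ?", (run_id,)
        ).fetchone()
        if seen:
            return False  # this run already completed: do nothing
        conn.executemany(
            "INSERT INTO users (id, name) VALUES (:id, :name) "
            "ON CONFLICT(id) DO UPDATE SET name = excluded.name",
            rows,
        )
        conn.execute("INSERT INTO runs (run_id) VALUES (?)", (run_id,))
        return True
```

Calling `write_batch` twice with the same `run_id` leaves the table unchanged the second time, which is what answers "did this run already?" and "is this idempotent?". The upsert syntax needs SQLite 3.24 or newer.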
OndatraSQL makes this the default:
- Every run is tracked via DuckLake snapshots
- Change detection is automatic
- Rollbacks are built-in
- Incremental processing is implicit
No extra code.
No Environment to Manage
Python pipelines require:
- pip
- virtualenv
- Docker
- dependency pinning
- resolving version conflicts
OndatraSQL requires:
One binary.
That’s it. No runtime drift. No “works on my machine”. No broken environments.
You Stay in the Data Model
This is the biggest shift.
In Python:
- You think in code
- Then map it to data
In OndatraSQL:
- You think in data
- And express it directly
SQL defines transformations. Scripts handle ingestion. The runtime connects everything.
When Python Still Wins
This isn’t about replacing Python everywhere.
Python is still better when:
- You need complex algorithms
- You’re doing ML or heavy computation
- You need full language flexibility
But for pipelines? Python is often solving the wrong problem.
The Simpler System Wins
After years of building pipelines, the pattern is clear:
The best system is not the most flexible one. It’s the one that removes the most decisions.
- No tool choices
- No architecture debates
- No setup overhead
Just: write the model, run the pipeline.
The Shift
The shift is subtle but important:
From: “How do I build this pipeline?”
To: “What data do I want?”
And once you make that shift, Python starts to feel… heavy.
Final Thought
Python made data pipelines possible.
But it also made them more complex than they need to be.
OndatraSQL is an attempt to go back to something simpler:
A system where the pipeline is the code. Not the infrastructure around it.
Ondatra Labs