OndatraSQL uses two languages: SQL for data transformation and Starlark for data ingestion. Both execute in the same pipeline, share a single dependency graph, and write their output to DuckLake.
This post describes the design decisions behind this approach and how the two languages interact at runtime.
Two concerns, two languages
Data pipelines involve two distinct types of work:
Transformation is declarative. The input is a table. The output is a table. The logic is joins, aggregations, filters, and window functions. SQL expresses this directly.
Ingestion is imperative. It requires HTTP requests, pagination, authentication, error handling, and conditional logic. SQL cannot express this.
OndatraSQL uses the language that fits each concern:
| Concern | Language | Model format |
|---|---|---|
| Transformation | SQL | .sql |
| Ingestion | Starlark | .star |
| Configured sources | YAML + Starlark | .yaml |
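The post does not show a configured source, but the idea is that common ingestion patterns are declared in YAML and backed by a Starlark implementation. A hypothetical sketch — every key name below is illustrative, not OndatraSQL's actual schema:

```yaml
# models/raw/orders.yaml (illustrative only; key names are assumptions)
kind: append
source: rest                # assumed: name of a Starlark-backed source type
endpoint: https://api.example.com/orders
incremental: updated_at
```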
SQL models
SQL models are SELECT statements with directives:
```sql
-- models/mart/revenue.sql
-- @kind: table
SELECT
    order_date,
    COUNT(*) AS orders,
    SUM(amount) AS revenue
FROM staging.orders
GROUP BY order_date
```
OndatraSQL handles table creation, schema evolution, change detection, incremental processing, dependency ordering, and validation. The user writes the query.
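Because directives are plain SQL comments, a model remains a valid query. For example, the `staging.orders` table the query above reads from could itself be a simple SQL model — an illustrative sketch using only the `@kind` directive shown in the post (the `raw.orders` source and column casts are invented):

```sql
-- models/staging/orders.sql (illustrative)
-- @kind: table
SELECT
    id,
    CAST(amount AS DECIMAL(12, 2)) AS amount,
    CAST(created_at AS DATE) AS order_date
FROM raw.orders
WHERE amount IS NOT NULL
```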
Starlark models
Starlark is a deterministic scripting language with Python-like syntax, originally designed for build systems (Bazel, Buck2). OndatraSQL embeds Starlark directly in the binary.
```python
# models/raw/users.star
# @kind: append
# @incremental: updated_at
resp = http.get("https://api.example.com/users",
                params={"since": incremental.last_value})
for user in resp.json:
    save.row(user)
```
Why Starlark
Starlark was chosen for three properties:
- Deterministic — no mutable global state, no threading. Same input always produces the same output.
- Sandboxed — no filesystem access, no arbitrary imports. Scripts can only use the modules OndatraSQL provides.
- Embeddable — runs inside the Go binary. No external runtime, no dependency management.
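Because the sandbox removes ambient capabilities, a script can only do what the host exposes. The sketch below shows both sides: pure Starlark computation (which is also valid Python), and the operations the sandbox rejects (error messages paraphrased, not OndatraSQL's exact output):

```python
# Valid Starlark: pure, deterministic computation with Python-like syntax.
def dedupe(ids):
    seen = {}
    out = []
    for i in ids:
        if i not in seen:
            seen[i] = True
            out.append(i)
    return out

# Not available in Starlark: no filesystem access, no arbitrary imports.
# open("/etc/passwd")   # error: 'open' is not defined
# import requests       # error: modules come from load(), not import
```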
Built-in modules
| Module | Provides |
|---|---|
| http | GET, POST, PUT, DELETE with automatic retries |
| http.oauth | OAuth2 client credentials and authorization code flows |
| save | Row output to the pipeline |
| query | Read from DuckDB (creates DAG dependencies) |
| time | Timestamps and formatting |
| json | Parse and serialize JSON |
| base64 | Encode and decode |
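A sketch of how the modules compose in one model. The endpoint and field names are invented, and `time.now()` is an assumed function name — the table above only confirms that a `time` module exists:

```python
# models/raw/events.star (illustrative; endpoint, fields, and time.now() are assumptions)
# @kind: append
resp = http.get("https://api.example.com/events")
for event in resp.json:
    event["ingested_at"] = time.now()
    save.row(event)
```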
Shared libraries
Reusable logic lives in lib/ and is imported with load():
```python
load("lib/pagination.star", "paginate")

for page in paginate("https://api.example.com/orders"):
    for order in page:
        save.row(order)
```
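The post does not show `lib/pagination.star` itself. One way it could look, using only the `http` module; the cursor parameter and response field names are assumptions about the API being paginated:

```python
# lib/pagination.star (illustrative sketch)
def paginate(url):
    pages = []
    cursor = None
    # Starlark has no while loops, so bound the iteration explicitly.
    for _ in range(1000):
        params = {"cursor": cursor} if cursor else {}
        resp = http.get(url, params=params)
        pages.append(resp.json["items"])
        cursor = resp.json.get("next_cursor")
        if not cursor:
            break
    return pages
```

Bounding the loop is not just a workaround: the absence of unbounded loops is part of what makes Starlark deterministic and safe to embed.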
How the two languages interact
All model types — SQL, Starlark, and YAML — participate in the same dependency graph. Dependencies are extracted automatically:
- SQL models: table references in `FROM`, `JOIN`, and CTEs
- Starlark models: `query("SELECT ... FROM table")` calls
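A Starlark model that reads an upstream table becomes a downstream node automatically. A sketch — the table, columns, and enrichment endpoint are invented for illustration:

```python
# models/raw/enrichments.star (illustrative)
# Reading staging.users via query() makes this model depend on it in the DAG.
for row in query("SELECT id, email FROM staging.users"):
    resp = http.get("https://api.example.com/enrich",
                    params={"email": row["email"]})
    save.row({"user_id": row["id"], "company": resp.json["company"]})
```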
A single `ondatrasql run` executes both languages in DAG order:

```
raw/users.star     (Starlark — ingests from API)
        ↓
staging/users.sql  (SQL — transforms)
        ↓
mart/metrics.sql   (SQL — aggregates)
```
Tradeoffs
Starlark is not a general-purpose language. It does not have access to the Python ecosystem, machine learning libraries, or specialized parsers. For workloads that require those capabilities, a separate ingestion step using Python or another tool may be more appropriate.
The benefit is simplicity: one binary handles both transformation and ingestion, with no external runtime to manage.
Ondatra Labs