OndatraSQL's dual-language execution model

OndatraSQL uses two languages: SQL for data transformation and Starlark for data ingestion. Both run in the same pipeline, share one dependency graph, and write their output to DuckLake.

This post describes the design decisions behind this approach and how the two languages interact at runtime.

Two concerns, two languages

Data pipelines involve two distinct types of work:

Transformation is declarative. The input is a table. The output is a table. The logic is joins, aggregations, filters, and window functions. SQL expresses this directly.

Ingestion is imperative. It requires HTTP requests, pagination, authentication, error handling, and conditional logic, none of which SQL is designed to express.

OndatraSQL uses the language that fits each concern:

Concern             Language          Model format
Transformation      SQL               .sql
Ingestion           Starlark          .star
Configured sources  YAML + Starlark   .yaml

SQL models

SQL models are SELECT statements with directives:

-- models/mart/revenue.sql
-- @kind: table

SELECT
    order_date,
    COUNT(*) AS orders,
    SUM(amount) AS revenue
FROM staging.orders
GROUP BY order_date

OndatraSQL handles table creation, schema evolution, change detection, incremental processing, dependency ordering, and validation. The user only writes the query.

Starlark models

Starlark is a deterministic scripting language with Python-like syntax, originally designed for the Bazel build system and since adopted by others such as Buck2. OndatraSQL embeds a Starlark interpreter directly in its binary.

# models/raw/users.star
# @kind: append
# @incremental: updated_at

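# Request only users updated after the last recorded updated_at value.
resp = http.get("https://api.example.com/users",
    params={"since": incremental.last_value})

# Emit each user record as one output row.
for user in resp.json:
    save.row(user)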
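
To make the incremental pattern above concrete, here is a minimal sketch in plain Python rather than Starlark. It is not OndatraSQL's implementation. It assumes the cursor is simply the largest updated_at value already stored, and run_incremental, fake_fetch_since, and the sample rows are hypothetical names used only for illustration.

```python
def run_incremental(fetch_since, stored_rows, cursor_column="updated_at"):
    """Append rows newer than the stored cursor and return the new cursor."""
    # Assumption: the cursor is the maximum cursor_column value seen so far.
    last_value = max((row[cursor_column] for row in stored_rows), default=None)
    new_rows = fetch_since(last_value)
    stored_rows.extend(new_rows)  # "append" semantics: existing rows are kept
    return max((row[cursor_column] for row in stored_rows), default=None)


# Hypothetical stand-in for the remote API.
SOURCE = [
    {"id": 1, "updated_at": "2024-01-01"},
    {"id": 2, "updated_at": "2024-01-02"},
]


def fake_fetch_since(last_value):
    """Return every source row, or only rows newer than last_value."""
    return [
        row for row in SOURCE
        if last_value is None or row["updated_at"] > last_value
    ]


table = []
first_cursor = run_incremental(fake_fetch_since, table)   # first run loads everything
SOURCE.append({"id": 3, "updated_at": "2024-01-03"})
second_cursor = run_incremental(fake_fetch_since, table)  # second run loads only id 3
```

The second run fetches a single row because the cursor from the first run filters out everything already stored. ISO-formatted date strings are used so that plain string comparison orders them correctly.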

Why Starlark

Starlark was chosen for three properties:

  1. Deterministic — no mutable global state, no threading. Same input always produces the same output.
  2. Sandboxed — no filesystem access, no arbitrary imports. Scripts can only use the modules OndatraSQL provides.
  3. Embeddable — runs inside the Go binary. No external runtime, no dependency management.

Built-in modules

Module      Provides
http        GET, POST, PUT, DELETE with automatic retries
http.oauth  OAuth2 client credentials and authorization code flows
save        Row output to the pipeline
query       Read from DuckDB (creates DAG dependencies)
time        Timestamps and formatting
json        Parse and serialize JSON
base64      Encode and decode
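
The post does not describe how the http module's automatic retries work. As a rough illustration of the general idea only, the Python sketch below retries a failing call with exponential backoff. The attempt count, the delays, and the names with_retries and flaky_request are assumptions, not OndatraSQL behavior.

```python
import time


def with_retries(request, max_attempts=3, base_delay=0.5, sleep=time.sleep):
    """Call request() until it succeeds or max_attempts is reached."""
    for attempt in range(1, max_attempts + 1):
        try:
            return request()
        except ConnectionError:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            # Exponential backoff: base_delay, then 2x, then 4x, ...
            sleep(base_delay * 2 ** (attempt - 1))


# Hypothetical request that fails twice before succeeding.
calls = {"count": 0}


def flaky_request():
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("temporary failure")
    return {"status": 200}


delays = []  # record the delays instead of actually sleeping
result = with_retries(flaky_request, sleep=delays.append)
```

Passing the sleep function as a parameter keeps the sketch testable without real waiting.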

Shared libraries

Reusable logic lives in lib/ and is imported with load():

load("lib/pagination.star", "paginate")

for page in paginate("https://api.example.com/orders"):
    for order in page:
        save.row(order)
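
The contents of lib/pagination.star are not shown in this post. The sketch below, written in Python, shows one way such a helper could be structured. Unlike the paginate(url) call above, it takes a hypothetical fetch_page callable so that it runs without network access. The stop-on-empty-page rule, the max_pages limit, and the fake data are also assumptions. It uses a bounded for loop because standard Starlark has no while statement.

```python
def paginate(fetch_page, max_pages=1000):
    """Collect pages until fetch_page returns an empty page."""
    pages = []
    for page_number in range(1, max_pages + 1):
        page = fetch_page(page_number)
        if not page:
            break  # assumption: an empty page marks the end of the data
        pages.append(page)
    return pages


# Hypothetical stand-in for a paginated API.
FAKE_API = {
    1: [{"id": 1}, {"id": 2}],
    2: [{"id": 3}],
}


def fake_fetch(page_number):
    return FAKE_API.get(page_number, [])


rows = []
for page in paginate(fake_fetch):
    for order in page:
        rows.append(order)  # a Starlark model would call save.row(order) here
```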

How the two languages interact

All model types — SQL, Starlark, and YAML — participate in the same dependency graph. Dependencies are extracted automatically:

  • SQL models: table references from FROM, JOIN, CTEs
  • Starlark models: query("SELECT ... FROM table") calls

A single ondatrasql run executes both languages in DAG order:

raw/users.star     (Starlark — ingests from API)
    ↓
staging/users.sql  (SQL — transforms)
    ↓
mart/metrics.sql   (SQL — aggregates)
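
The ordering above can be reproduced with a small Python sketch: extract table references from each model's source, then sort the models topologically. This is an illustration, not OndatraSQL's implementation. The regular expression is a deliberate simplification (a real tool would more likely use a SQL parser), and the model sources and the name extract_dependencies are made up for the example.

```python
import re
from graphlib import TopologicalSorter

# Simplified: matches schema.table names that follow FROM or JOIN.
TABLE_REF = re.compile(
    r"\b(?:FROM|JOIN)\s+([A-Za-z_]\w*\.[A-Za-z_]\w*)", re.IGNORECASE
)


def extract_dependencies(source):
    """Return the set of schema.table names referenced in source."""
    return set(TABLE_REF.findall(source))


# Hypothetical model sources, keyed by the table each model produces.
models = {
    "mart.metrics": "SELECT COUNT(*) AS users FROM staging.users",
    "staging.users": "SELECT id, email FROM raw.users",
    "raw.users": 'resp = http.get("https://api.example.com/users")',
}

# Keep only references to tables that other models produce.
graph = {
    name: extract_dependencies(source) & set(models)
    for name, source in models.items()
}

# static_order() yields every model after the models it depends on.
order = list(TopologicalSorter(graph).static_order())
```

Because the same pattern also matches the SQL string inside a query("SELECT ... FROM table") call, a Starlark model that reads from DuckDB would pick up its dependencies in the same pass.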

Tradeoffs

Starlark is not a general-purpose language. It does not have access to the Python ecosystem, machine learning libraries, or specialized parsers. For workloads that require those capabilities, a separate ingestion step using Python or another tool may be more appropriate.

The benefit is simplicity: one binary handles both transformation and ingestion, with no external runtime to manage.