Models
Everything is a model — ingestion, transformation, and events in one system
Everything in OndatraSQL is a model.
A model can transform data (SQL), ingest data (API scripts), or collect events (HTTP). All models run in the same pipeline, with the same execution model.
Mental Model
- SQL = transformations
- Scripts = ingestion
- Events = streaming input
Different inputs — same pipeline.
Three Formats — One Runtime
| Type | You write | OndatraSQL does |
|---|---|---|
| `.sql` | A SELECT statement | Materializes, tracks changes, evolves schema |
| `.star` | API logic | Runs it, buffers output, materializes |
| `.yaml` | Config for a source function | Calls it, materializes |
| `.sql` (events) | A column schema | Receives HTTP events, buffers, flushes |
File path = table name: `models/staging/orders.sql` → `staging.orders`.
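The mapping is mechanical: drop the `models/` prefix and the extension, then join the remaining path segments with dots. A sketch of the stated convention (illustrative only, not OndatraSQL's actual code; the helper name is made up):

```python
from pathlib import PurePosixPath

def table_name(model_path: str) -> str:
    """Illustrative: derive a table name from a model file path.

    models/staging/orders.sql -> staging.orders
    """
    # Strip the extension, split into path segments.
    parts = PurePosixPath(model_path).with_suffix("").parts
    # Drop the leading "models/" directory; the rest becomes schema.table.
    return ".".join(parts[1:])

print(table_name("models/staging/orders.sql"))  # staging.orders
```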
SQL Models
Write a SELECT. Everything else is automatic.
- Table creation
- Incremental logic
- Schema evolution
- Change detection
```sql
-- @kind: merge
-- @unique_key: order_id
SELECT order_id, customer_id, total, updated_at
FROM raw.orders
```
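A merge model upserts on the unique key: an incoming row whose `order_id` already exists replaces the stored row, and new keys are inserted. A minimal Python sketch of that semantics (an illustration of upsert behavior, not OndatraSQL internals):

```python
def merge(existing, incoming, unique_key):
    # Index current rows by the unique key, then upsert incoming rows.
    by_key = {row[unique_key]: row for row in existing}
    for row in incoming:
        by_key[row[unique_key]] = row  # overwrite existing key or insert new one
    return list(by_key.values())

table = [{"order_id": 1, "total": 10}]
table = merge(
    table,
    [{"order_id": 1, "total": 12}, {"order_id": 2, "total": 5}],
    "order_id",
)
# order 1 is updated in place, order 2 is appended
```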
Views
No materialization. Resolves live.
```sql
-- @kind: view
SELECT order_id, customer_id, total
FROM raw.orders
WHERE total > 0
```
Starlark Scripts
Fetch data from APIs without leaving the pipeline. HTTP, OAuth, pagination — all built in. No Python, no dependencies.
```starlark
# @kind: append
# @incremental: updated_at
resp = http.get("https://api.example.com/users")
for user in resp.json:
    save.row(user)
```
Read From Your Data
Scripts can query DuckDB directly. This automatically creates dependencies in the DAG.
```starlark
rows = query("SELECT * FROM mart.customers WHERE synced = false")
for row in rows:
    http.post("https://api.hubspot.com/contacts", json=row)
    save.row({"id": row["id"], "synced_at": str(time.now())})
```
Shared Libraries
Put reusable logic in lib/. Import with load().
```starlark
load("lib/pagination.star", "paginate")

for page in paginate("https://api.example.com/users"):
    for user in page:
        save.row(user)
```
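What `lib/pagination.star` contains is up to you. One plausible shape is a cursor-following helper that collects pages until the API stops returning a cursor. A hedged sketch in Python syntax (the response shape with `items` and `next` fields, and the injected `fetch` callable standing in for `http.get`, are assumptions for illustration):

```python
def paginate(url, fetch, limit=100):
    """Sketch of a shared pagination helper: follow a `next` cursor,
    collecting one list of items per page until the cursor runs out."""
    pages = []
    cursor = None
    while True:
        resp = fetch(url, cursor=cursor, limit=limit)
        pages.append(resp["items"])
        cursor = resp.get("next")
        if not cursor:
            return pages

# Usage with a fake fetcher standing in for a real HTTP call:
def fake_fetch(url, cursor=None, limit=100):
    responses = {
        None: {"items": [1, 2], "next": "c1"},
        "c1": {"items": [3], "next": None},
    }
    return responses[cursor]

print(paginate("https://api.example.com/users", fake_fetch))  # [[1, 2], [3]]
```

Returning a list of pages (rather than yielding) matches Starlark, which has loops but no generators.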
See Scripting for all built-in modules.
Events Models
Define a schema. Send events via HTTP. No Kafka. No ingestion system.
```sql
-- @kind: events
event_name VARCHAR NOT NULL,
page_url VARCHAR,
user_id VARCHAR,
received_at TIMESTAMPTZ
```
Then:
```bash
curl -X POST localhost:8080/collect/raw/events \
  -d '{"event_name":"pageview","page_url":"/home"}'
```
See Event Collection.
YAML Models
Use configuration instead of code when the logic already exists in lib/.
```yaml
kind: append
incremental: report_date
source: gam_report
config:
  network_code: ${GAM_NETWORK_CODE}
  dimensions:
    - AD_UNIT_NAME
    - DATE
```
OndatraSQL calls your source function automatically. See Blueprints for examples.
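A source function is the shared code the YAML's `source:` key points at. A hedged sketch of what a `gam_report` function might look like, in Python syntax for illustration (the signature, the injected `save_row` callable, and the hard-coded sample report line are all assumptions, not OndatraSQL's actual contract):

```python
def gam_report(config, save_row):
    """Hypothetical source function: receives the YAML `config` block
    and emits one row per report line."""
    network = config["network_code"]
    # In a real source function these tuples would come from the GAM API;
    # one hard-coded line keeps the sketch self-contained.
    for line in [("homepage_banner", "2024-01-01")]:
        row = dict(zip(config["dimensions"], line))
        row["network_code"] = network
        save_row(row)

# Usage: collect emitted rows instead of materializing them.
rows = []
gam_report(
    {"network_code": "123", "dimensions": ["AD_UNIT_NAME", "DATE"]},
    rows.append,
)
# rows == [{"AD_UNIT_NAME": "homepage_banner", "DATE": "2024-01-01", "network_code": "123"}]
```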
Directives
Control behavior with comments:
```sql
-- @kind: merge
-- @unique_key: order_id
-- @incremental: updated_at
-- @constraint: order_id NOT NULL
-- @audit: row_count > 0
```
See Directives for the full list.
Why Models Matter
Most tools split this into multiple systems:
| Task | Traditional stack | OndatraSQL |
|---|---|---|
| Transform data | dbt | SQL model |
| Ingest APIs | Python + Airflow | Starlark model |
| Collect events | Kafka | Events model |
| Schedule runs | Airflow | `ondatrasql run` |
OndatraSQL unifies everything into models. One abstraction, one pipeline, one binary.
Ondatra Labs