Models
Everything is a model — ingestion, transformation, and events in one system
Everything in OndatraSQL is a model.
A model can transform data (SQL), ingest data (API scripts), or collect events (HTTP). All models run in the same pipeline, with the same execution model.
Mental Model
- SQL = transformations
- Scripts = ingestion
- Events = streaming input
Different inputs — same pipeline.
Three Formats — One Runtime
| Type | You write | OndatraSQL does |
|---|---|---|
| `.sql` | A SELECT statement | Materializes, tracks changes, evolves schema |
| `.star` | API logic | Runs it, buffers output, materializes |
| `.yaml` | Config for a source function | Calls it, materializes |
| `.sql` (events) | A column schema | Receives HTTP events, buffers, flushes |
File path = table name: `models/staging/orders.sql` → `staging.orders`.
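The mapping is mechanical: drop the `models/` prefix and the extension, then join the remaining path segments with dots. A sketch of the stated convention (illustrative only, not OndatraSQL's actual code; the helper name is made up):

```python
from pathlib import PurePosixPath

def table_name(model_path: str) -> str:
    """Illustrative: derive a table name from a model file path.

    models/staging/orders.sql -> staging.orders
    """
    # Strip the extension, split into path segments.
    parts = PurePosixPath(model_path).with_suffix("").parts
    # Drop the leading "models/" directory; the rest becomes schema.table.
    return ".".join(parts[1:])

print(table_name("models/staging/orders.sql"))  # staging.orders
```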
SQL Models
Write a SELECT. Everything else is automatic.
- Table creation
- Incremental logic
- Schema evolution
- Change detection
```sql
-- @kind: merge
-- @unique_key: order_id
SELECT order_id, customer_id, total, updated_at
FROM raw.orders
```
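A merge model upserts on the unique key: an incoming row whose `order_id` already exists replaces the stored row, and new keys are inserted. A minimal Python sketch of that semantics (an illustration of upsert behavior, not OndatraSQL internals):

```python
def merge(existing, incoming, unique_key):
    # Index current rows by the unique key, then upsert incoming rows.
    by_key = {row[unique_key]: row for row in existing}
    for row in incoming:
        by_key[row[unique_key]] = row  # overwrite existing key or insert new one
    return list(by_key.values())

table = [{"order_id": 1, "total": 10}]
table = merge(
    table,
    [{"order_id": 1, "total": 12}, {"order_id": 2, "total": 5}],
    "order_id",
)
# order 1 is updated in place, order 2 is appended
```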
Views
No materialization. Resolves live.
```sql
-- @kind: view
SELECT order_id, customer_id, total
FROM raw.orders
WHERE total > 0
```
Starlark Scripts
Fetch data from APIs without leaving the pipeline. HTTP, OAuth, pagination — all built in. No Python, no dependencies.
```starlark
# @kind: append
# @incremental: updated_at
resp = http.get("https://api.example.com/users")
for user in resp.json:
    save.row(user)
```
Read From Your Data
Scripts can query DuckDB directly. This automatically creates dependencies in the DAG.
```starlark
rows = query("SELECT * FROM mart.customers WHERE synced = false")
for row in rows:
    http.post("https://api.hubspot.com/contacts", json=row)
    save.row({"id": row["id"], "synced_at": str(time.now())})
```
Shared Libraries
Put reusable logic in lib/. Import with load().
```starlark
load("lib/pagination.star", "paginate")

for page in paginate("https://api.example.com/users"):
    for user in page:
        save.row(user)
```
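What `lib/pagination.star` contains is up to you. One plausible shape is a cursor-following helper that collects pages until the API stops returning a cursor. A hedged sketch in Python syntax (the response shape with `items` and `next` fields, and the injected `fetch` callable standing in for `http.get`, are assumptions for illustration):

```python
def paginate(url, fetch, limit=100):
    """Sketch of a shared pagination helper: follow a `next` cursor,
    collecting one list of items per page until the cursor runs out."""
    pages = []
    cursor = None
    while True:
        resp = fetch(url, cursor=cursor, limit=limit)
        pages.append(resp["items"])
        cursor = resp.get("next")
        if not cursor:
            return pages

# Usage with a fake fetcher standing in for a real HTTP call:
def fake_fetch(url, cursor=None, limit=100):
    responses = {
        None: {"items": [1, 2], "next": "c1"},
        "c1": {"items": [3], "next": None},
    }
    return responses[cursor]

print(paginate("https://api.example.com/users", fake_fetch))  # [[1, 2], [3]]
```

Returning a list of pages (rather than yielding) matches Starlark, which has loops but no generators.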
See Scripting for all built-in modules.
Events Models
Define a schema. Send events via HTTP. No Kafka. No ingestion system.
```sql
-- @kind: events
event_name VARCHAR NOT NULL,
page_url VARCHAR,
user_id VARCHAR,
received_at TIMESTAMPTZ
```
Then:
```bash
curl -X POST localhost:8080/collect/raw/events \
  -d '{"event_name":"pageview","page_url":"/home"}'
```
See Event Collection.
YAML Models
Use configuration instead of code when the logic already exists in lib/.
```yaml
kind: append
incremental: report_date
source: gam_report
config:
  network_code: ${GAM_NETWORK_CODE}
  dimensions:
    - AD_UNIT_NAME
    - DATE
```
OndatraSQL calls your source function automatically. See Blueprints for examples.
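A source function is the shared code the YAML's `source:` key points at. A hedged sketch of what a `gam_report` function might look like, in Python syntax for illustration (the signature, the injected `save_row` callable, and the hard-coded sample report line are all assumptions, not OndatraSQL's actual contract):

```python
def gam_report(config, save_row):
    """Hypothetical source function: receives the YAML `config` block
    and emits one row per report line."""
    network = config["network_code"]
    # In a real source function these tuples would come from the GAM API;
    # one hard-coded line keeps the sketch self-contained.
    for line in [("homepage_banner", "2024-01-01")]:
        row = dict(zip(config["dimensions"], line))
        row["network_code"] = network
        save_row(row)

# Usage: collect emitted rows instead of materializing them.
rows = []
gam_report(
    {"network_code": "123", "dimensions": ["AD_UNIT_NAME", "DATE"]},
    rows.append,
)
# rows == [{"AD_UNIT_NAME": "homepage_banner", "DATE": "2024-01-01", "network_code": "123"}]
```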
Directives
Control behavior with comments:
```sql
-- @kind: merge
-- @unique_key: order_id
-- @incremental: updated_at
-- @constraint: order_id NOT NULL
-- @audit: row_count > 0
```
See Directives for the full list.
Why Models Matter
Most tools split this into multiple systems:
| Task | Traditional stack | OndatraSQL |
|---|---|---|
| Transform data | dbt | SQL model |
| Ingest APIs | Python + Airflow | Starlark model |
| Collect events | Kafka | Events model |
| Schedule runs | Airflow | `ondatrasql run` |
OndatraSQL unifies everything into models. One abstraction, one pipeline, one binary.
Ondatra Labs