Reverse ETL

Push pipeline data to external systems — no separate sync tool

Send data from your pipeline to external systems — CRMs, APIs, SaaS tools — without a separate sync service.

OndatraSQL handles reverse ETL with the same tools you already use: SQL models, Starlark scripts, and directives. No Census. No Hightouch. No extra infrastructure.

Mental Model

Forward ETL: external → DuckLake (ingestion) Reverse ETL: DuckLake → external (activation)

Same pipeline. Same runtime. Different direction.

The Pattern

A SQL model selects the data to sync
A Starlark script sends it to the destination
@kind: tracked skips rows that haven’t changed

-- models/mart/customers.sql
-- @kind: table

SELECT id, name, email, plan, mrr
FROM staging.subscriptions
JOIN staging.users USING (user_id)
WHERE status = 'active'

# models/sync/hubspot.star
# @kind: tracked
# @unique_key: customer_id

rows = query("SELECT * FROM mart.customers")
for row in rows:
    http.post("https://api.hubspot.com/contacts/v1/contact",
        headers={"Authorization": "Bearer " + env.get("HUBSPOT_KEY")},
        json={"properties": [
            {"property": "email", "value": row["email"]},
            {"property": "plan", "value": row["plan"]},
            {"property": "mrr", "value": row["mrr"]},
        ]},
    )
    save.row({"customer_id": row["id"], "name": row["name"], "email": row["email"], "plan": row["plan"], "mrr": row["mrr"]})

Sync models live in models/sync/ — a dedicated directory created by ondatrasql init. The target becomes sync.hubspot.

First run: all customers synced. Next run: only changed customers synced. The tracked kind computes a content hash per customer and skips rows where nothing changed.

Sync Behaviors

Behavior	How to do it
Upsert	`@kind: tracked` — update changed, create new
Append	`@kind: append` — send every row, never skip
Mirror	`@kind: table` + script — full replace, delete removed
Create only	`@kind: tracked` — skip existing (unchanged hash = skip)

Upsert (most common)

# @kind: tracked
# @unique_key: id

rows = query("SELECT * FROM mart.contacts")
for row in rows:
    http.put("https://api.example.com/contacts/" + row["id"], json=row)
    save.row(row)

Changed contacts are re-sent. Unchanged contacts are skipped automatically.

Mirror (full sync with deletes)

# @kind: table

source = query("SELECT * FROM mart.customers")
remote = http.get("https://api.example.com/customers").json

source_ids = [r["id"] for r in source]

# Upsert all
for row in source:
    http.put("https://api.example.com/customers/" + row["id"], json=row)
    save.row(row)

# Delete removed
for contact in remote:
    if contact["id"] not in source_ids:
        http.delete("https://api.example.com/customers/" + contact["id"])

@kind: table replaces the DuckLake table each run — it mirrors what was sent.

Append (event-style)

# @kind: append
# @incremental: created_at

rows = query("SELECT * FROM mart.events WHERE created_at > '" + incremental.last_value + "'")
for row in rows:
    http.post("https://api.example.com/events", json=row)
    save.row(row)

Every new row is sent once. Never re-sent.

Rate Limiting

Use retry and backoff for APIs with rate limits:

http.post(url, json=row, retry=3, backoff=2)

OndatraSQL retries on 429 and 5xx with exponential backoff, and respects Retry-After headers.

Authentication

All HTTP auth methods work:

# Bearer token
http.post(url, headers={"Authorization": "Bearer " + env.get("API_KEY")}, json=row)

# Basic auth
http.post(url, auth=("user", "pass"), json=row)

# OAuth2 (managed or local)
token = oauth.token("hubspot")
http.post(url, headers={"Authorization": "Bearer " + token}, json=row)

What You Don’t Need

No Census or Hightouch subscription
No separate sync infrastructure
No connector marketplace
No mapping UI

You write the sync logic in Starlark. The pipeline handles scheduling, change detection, and error handling.

Compared to Sync Tools

	Census/Hightouch	OndatraSQL
Connectors	200+ pre-built	`http.post()` to any API
Change detection	Snapshot diff engine	`@kind: tracked` with content hash
Pricing	Per-row or per-sync	Free (self-hosted)
Custom logic	Limited	Full Starlark scripting
Infrastructure	SaaS	Single binary
Scheduling	Built-in	Cron or `ondatrasql run`