Reverse ETL
Push pipeline data to external systems — no separate sync tool
Send data from your pipeline to external systems — CRMs, APIs, SaaS tools — without a separate sync service.
OndatraSQL handles reverse ETL with the same tools you already use: SQL models, Starlark scripts, and directives. No Census. No Hightouch. No extra infrastructure.
Mental Model
Forward ETL: external → DuckLake (ingestion) Reverse ETL: DuckLake → external (activation)
Same pipeline. Same runtime. Different direction.
The Pattern
- A SQL model selects the data to sync
- A Starlark script sends it to the destination
@kind: trackedskips rows that haven’t changed
-- models/mart/customers.sql
-- @kind: table
SELECT id, name, email, plan, mrr
FROM staging.subscriptions
JOIN staging.users USING (user_id)
WHERE status = 'active'
# models/sync/hubspot.star
# @kind: tracked
# @unique_key: customer_id
rows = query("SELECT * FROM mart.customers")
for row in rows:
http.post("https://api.hubspot.com/contacts/v1/contact",
headers={"Authorization": "Bearer " + env.get("HUBSPOT_KEY")},
json={"properties": [
{"property": "email", "value": row["email"]},
{"property": "plan", "value": row["plan"]},
{"property": "mrr", "value": row["mrr"]},
]},
)
save.row({"customer_id": row["id"], "name": row["name"], "email": row["email"], "plan": row["plan"], "mrr": row["mrr"]})
Sync models live in models/sync/ — a dedicated directory created by ondatrasql init. The target becomes sync.hubspot.
First run: all customers synced. Next run: only changed customers synced. The tracked kind computes a content hash per customer and skips rows where nothing changed.
Sync Behaviors
| Behavior | How to do it |
|---|---|
| Upsert | @kind: tracked — update changed, create new |
| Append | @kind: append — send every row, never skip |
| Mirror | @kind: table + script — full replace, delete removed |
| Create only | @kind: tracked — skip existing (unchanged hash = skip) |
Upsert (most common)
# @kind: tracked
# @unique_key: id
rows = query("SELECT * FROM mart.contacts")
for row in rows:
http.put("https://api.example.com/contacts/" + row["id"], json=row)
save.row(row)
Changed contacts are re-sent. Unchanged contacts are skipped automatically.
Mirror (full sync with deletes)
# @kind: table
source = query("SELECT * FROM mart.customers")
remote = http.get("https://api.example.com/customers").json
source_ids = [r["id"] for r in source]
# Upsert all
for row in source:
http.put("https://api.example.com/customers/" + row["id"], json=row)
save.row(row)
# Delete removed
for contact in remote:
if contact["id"] not in source_ids:
http.delete("https://api.example.com/customers/" + contact["id"])
@kind: table replaces the DuckLake table each run — it mirrors what was sent.
Append (event-style)
# @kind: append
# @incremental: created_at
rows = query("SELECT * FROM mart.events WHERE created_at > '" + incremental.last_value + "'")
for row in rows:
http.post("https://api.example.com/events", json=row)
save.row(row)
Every new row is sent once. Never re-sent.
Rate Limiting
Use retry and backoff for APIs with rate limits:
http.post(url, json=row, retry=3, backoff=2)
OndatraSQL retries on 429 and 5xx with exponential backoff, and respects Retry-After headers.
Authentication
All HTTP auth methods work:
# Bearer token
http.post(url, headers={"Authorization": "Bearer " + env.get("API_KEY")}, json=row)
# Basic auth
http.post(url, auth=("user", "pass"), json=row)
# OAuth2 (managed or local)
token = oauth.token("hubspot")
http.post(url, headers={"Authorization": "Bearer " + token}, json=row)
What You Don’t Need
- No Census or Hightouch subscription
- No separate sync infrastructure
- No connector marketplace
- No mapping UI
You write the sync logic in Starlark. The pipeline handles scheduling, change detection, and error handling.
Compared to Sync Tools
| Census/Hightouch | OndatraSQL | |
|---|---|---|
| Connectors | 200+ pre-built | http.post() to any API |
| Change detection | Snapshot diff engine | @kind: tracked with content hash |
| Pricing | Per-row or per-sync | Free (self-hosted) |
| Custom logic | Limited | Full Starlark scripting |
| Infrastructure | SaaS | Single binary |
| Scheduling | Built-in | Cron or ondatrasql run |
Ondatra Labs