Blueprints Blog Contact About

Reverse ETL

Push pipeline data to external systems — no separate sync tool

Send data from your pipeline to external systems — CRMs, APIs, SaaS tools — without a separate sync service.

OndatraSQL handles reverse ETL with the same tools you already use: SQL models, Starlark scripts, and directives. No Census. No Hightouch. No extra infrastructure.

Mental Model

Forward ETL: external → DuckLake (ingestion) Reverse ETL: DuckLake → external (activation)

Same pipeline. Same runtime. Different direction.

The Pattern

  1. A SQL model selects the data to sync
  2. A Starlark script sends it to the destination
  3. @kind: tracked skips rows that haven’t changed
-- models/mart/customers.sql
-- @kind: table

SELECT id, name, email, plan, mrr
FROM staging.subscriptions
JOIN staging.users USING (user_id)
WHERE status = 'active'
# models/sync/hubspot.star
# @kind: tracked
# @unique_key: customer_id

rows = query("SELECT * FROM mart.customers")
for row in rows:
    http.post("https://api.hubspot.com/contacts/v1/contact",
        headers={"Authorization": "Bearer " + env.get("HUBSPOT_KEY")},
        json={"properties": [
            {"property": "email", "value": row["email"]},
            {"property": "plan", "value": row["plan"]},
            {"property": "mrr", "value": row["mrr"]},
        ]},
    )
    save.row({"customer_id": row["id"], "name": row["name"], "email": row["email"], "plan": row["plan"], "mrr": row["mrr"]})

Sync models live in models/sync/ — a dedicated directory created by ondatrasql init. The target becomes sync.hubspot.

First run: all customers synced. Next run: only changed customers synced. The tracked kind computes a content hash per customer and skips rows where nothing changed.

Sync Behaviors

BehaviorHow to do it
Upsert@kind: tracked — update changed, create new
Append@kind: append — send every row, never skip
Mirror@kind: table + script — full replace, delete removed
Create only@kind: tracked — skip existing (unchanged hash = skip)

Upsert (most common)

# @kind: tracked
# @unique_key: id

rows = query("SELECT * FROM mart.contacts")
for row in rows:
    http.put("https://api.example.com/contacts/" + row["id"], json=row)
    save.row(row)

Changed contacts are re-sent. Unchanged contacts are skipped automatically.

Mirror (full sync with deletes)

# @kind: table

source = query("SELECT * FROM mart.customers")
remote = http.get("https://api.example.com/customers").json

source_ids = [r["id"] for r in source]

# Upsert all
for row in source:
    http.put("https://api.example.com/customers/" + row["id"], json=row)
    save.row(row)

# Delete removed
for contact in remote:
    if contact["id"] not in source_ids:
        http.delete("https://api.example.com/customers/" + contact["id"])

@kind: table replaces the DuckLake table each run — it mirrors what was sent.

Append (event-style)

# @kind: append
# @incremental: created_at

rows = query("SELECT * FROM mart.events WHERE created_at > '" + incremental.last_value + "'")
for row in rows:
    http.post("https://api.example.com/events", json=row)
    save.row(row)

Every new row is sent once. Never re-sent.

Rate Limiting

Use retry and backoff for APIs with rate limits:

http.post(url, json=row, retry=3, backoff=2)

OndatraSQL retries on 429 and 5xx with exponential backoff, and respects Retry-After headers.

Authentication

All HTTP auth methods work:

# Bearer token
http.post(url, headers={"Authorization": "Bearer " + env.get("API_KEY")}, json=row)

# Basic auth
http.post(url, auth=("user", "pass"), json=row)

# OAuth2 (managed or local)
token = oauth.token("hubspot")
http.post(url, headers={"Authorization": "Bearer " + token}, json=row)

What You Don’t Need

  • No Census or Hightouch subscription
  • No separate sync infrastructure
  • No connector marketplace
  • No mapping UI

You write the sync logic in Starlark. The pipeline handles scheduling, change detection, and error handling.

Compared to Sync Tools

Census/HightouchOndatraSQL
Connectors200+ pre-builthttp.post() to any API
Change detectionSnapshot diff engine@kind: tracked with content hash
PricingPer-row or per-syncFree (self-hosted)
Custom logicLimitedFull Starlark scripting
InfrastructureSaaSSingle binary
SchedulingBuilt-inCron or ondatrasql run