API Dict


The API dict declares the complete contract for a lib function. One dict per file. All values must be literals — no variables, no concatenation. The runtime parses it as AST without executing code.

API = {
    # Shared config (injected into all http.* calls)
    "base_url": "https://api.example.com",
    "auth": {"env": "API_KEY"},
    "headers": {"Accept": "application/json"},
    "timeout": 30,
    "retry": 3,
    "backoff": 1,
    "rate_limit": {"requests": 100, "per": "10s"},

    # Inbound
    "fetch": {
        "args": ["resource"],
        "page_size": 100,
        "dynamic_columns": True,
    },

    # Outbound
    "push": {
        "batch_size": 100,
        "batch_mode": "sync",
    },
}

Top-level config (shared)

Injected into all http.* calls by both fetch() and push().

Field       Type    Default   Description
base_url    string  required  Prepended to relative URLs in http.* calls
auth        dict    none      Auth injection (see Auth patterns)
headers     dict    none      Default headers merged into every request
timeout     int     30        Request timeout in seconds
retry       int     0         Number of retries on 5xx/429
backoff     int     1         Initial backoff in seconds (exponential)
rate_limit  dict    none      Proactive rate limiting: {"requests": N, "per": "Ns"}

Per-call kwargs in http.get(url, timeout=60) override these defaults.
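
For instance, a slow export endpoint can raise the timeout for a single call. A sketch (the endpoint path and the response's .json() accessor are illustrative assumptions; see the Fetch Contract for the actual return shape):

```
def fetch(resource, page):
    # timeout=120 overrides the shared timeout (30) for this call only;
    # base_url, auth, and headers are still injected as usual.
    resp = http.get("/" + resource + "/export", timeout=120)
    return resp.json()
```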

Auth patterns

The runtime handles token refresh, header injection, and caching. Auth is only injected when the caller does NOT set auth= in the http.* call.

Google service account

"auth": {
    "google_key_file_env": "GAM_KEY_FILE",
    "scope": "https://www.googleapis.com/auth/admanager",
}

google_key_file_env resolves the key file path from .env. The runtime handles JWT signing and token refresh automatically.

For a literal path (not recommended — hardcodes the filename):

"auth": {
    "google_key_file": "service-account.json",
    "scope": "https://www.googleapis.com/auth/analytics.readonly",
}

OAuth2 provider (browser-based SaaS APIs)

"auth": {"provider": "hubspot"}

Register with ondatrasql auth <provider>. Tokens refresh automatically.

API key from .env

"auth": {"env": "API_KEY"}                              # → Authorization: Bearer <value>
"auth": {"env": "API_KEY", "header": "X-Api-Key"}       # → X-Api-Key: <value>
"auth": {"env": "API_KEY", "param": "api_key"}           # → ?api_key=<value>

Basic auth

"auth": {"env_user": "USER", "env_pass": "PASS"}        # → Authorization: Basic <base64>

Fetch section

Field            Type  Default  Description
args             list  []       Parameter names passed from SQL
columns          dict  none     Fixed column definitions ({name: {type}})
dynamic_columns  bool  False    Accept any column name; types inferred from SQL casts
page_size        int   0        Rows per page (0 = single call)

columns vs dynamic_columns

Use columns when the API has a fixed schema you want to declare:

"columns": {
    "id": {"type": "BIGINT"},
    "amount": {"type": "DOUBLE"},
    "created_at": {"type": "TIMESTAMP"},
}

Use dynamic_columns: True when SQL controls the schema. The runtime extracts column names and types from the SELECT via DuckDB AST and passes them to fetch() as the columns kwarg:

"dynamic_columns": True
-- SQL casts become types in columns kwarg
SELECT name, amount::DECIMAL, items::JSON FROM my_api('orders')
-- columns = [{"name": "name", "type": "string"}, {"name": "amount", "type": "number"}, {"name": "items", "type": "array"}]

dynamic_columns is the recommended approach — it keeps the schema in SQL where transformations belong.
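
With dynamic_columns enabled, the SQL-derived schema arrives as the columns kwarg, which fetch() can forward to the API. A sketch (the fields query parameter and resp.json() are assumptions about the remote API and the http module):

```
def fetch(resource, page, columns):
    # columns = [{"name": ..., "type": ...}, ...] extracted from the SELECT
    fields = ",".join([c["name"] for c in columns])
    resp = http.get("/" + resource + "?fields=" + fields + "&page=" + str(page))
    return resp.json()
```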

Args from SQL

API = {"fetch": {"args": ["resource", "options"]}}
SELECT * FROM my_api('users', '{"filter": "active"}')

Args are positional strings. For structured configuration, pass JSON and decode in Starlark:

def fetch(resource, options, page):
    opts = json.decode(options) if options else {}
    # opts == {"filter": "active"} for the call above

Push section

Field           Type    Default    Description
batch_size      int     1          Rows per push() call
batch_mode      string  "sync"     "sync", "atomic", or "async"
max_concurrent  int     1          Parallel batch workers
rate_limit      dict    inherited  Per-direction override
poll_interval   string  "30s"      Async polling interval
poll_timeout    string  "1h"       Async polling timeout

Batch modes

Mode    Return                         Behavior
sync    {"rowid:change_type": status}  Per-row ack/nack
atomic  None                           All-or-nothing
async   {"job_id": ...}                Job-based polling via poll()
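
A sync-mode push() sketch that builds the per-row map from the table above (the endpoint, the row field names, status_code, and the "ok"/"error" status strings are illustrative assumptions; see the Push Contract for the real shapes):

```
def push(rows):
    result = {}
    for row in rows:
        # One POST per row; batch_size in the API dict controls len(rows)
        resp = http.post("/items", json=row)
        key = str(row["rowid"]) + ":" + row["change_type"]
        result[key] = "ok" if resp.status_code == 200 else "error"
    return result
```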

Finalize (optional)

def finalize(succeeded, failed):
    # succeeded/failed are row counts for the completed push
    if failed == 0:
        http.post(webhook, json={"status": "complete", "rows": succeeded})

Literal values only

The API dict is parsed as Starlark AST — not executed. All values must be literals:

# Works — literal values
API = {"base_url": "https://api.example.com", "timeout": 30}

# Does NOT work — variable reference
BASE = "https://api.example.com"
API = {"base_url": BASE}

# Does NOT work — concatenation
API = {"base_url": "https://" + HOST}

This is by design. The dict is pure configuration — readable, validatable, and inspectable without running code. Dynamic values belong in the fetch() or push() function.

Validation

The runtime validates at startup:

  • fetch() params must match args + optional page/columns/target
  • push() must take exactly one parameter (rows)
  • batch_mode must be "sync", "atomic", or "async"
  • rate_limit.per must be valid duration
  • rate_limit.requests must be > 0
  • max_concurrent > 1 not allowed with atomic or async
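
For example, this dict is rejected at startup by the last rule (sketch):

```
API = {
    "base_url": "https://api.example.com",
    "push": {
        "batch_mode": "atomic",
        "max_concurrent": 4,  # invalid: parallel workers break all-or-nothing
    },
}
```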

See Fetch Contract and Push Contract for the complete function specs.