API Dict
The API dict declares the complete contract for a lib function. One dict per file. All values must be literals — no variables, no concatenation. The runtime parses it as AST without executing code.
```python
API = {
    # Shared config (injected into all http.* calls)
    "base_url": "https://api.example.com",
    "auth": {"env": "API_KEY"},
    "headers": {"Accept": "application/json"},
    "timeout": 30,
    "retry": 3,
    "backoff": 1,
    "rate_limit": {"requests": 100, "per": "10s"},

    # Inbound
    "fetch": {
        "args": ["resource"],
        "page_size": 100,
        "dynamic_columns": True,
    },

    # Outbound
    "push": {
        "batch_size": 100,
        "batch_mode": "sync",
    },
}
```
Top-level config (shared)
Injected into all http.* calls by both fetch() and push().
| Field | Type | Default | Description |
|---|---|---|---|
| `base_url` | string | required | Prepended to relative URLs in `http.*` calls |
| `auth` | dict | none | Auth injection (see Auth patterns) |
| `headers` | dict | none | Default headers merged into every request |
| `timeout` | int | 30 | Request timeout in seconds |
| `retry` | int | 0 | Number of retries on 5xx/429 |
| `backoff` | int | 1 | Initial backoff in seconds (exponential) |
| `rate_limit` | dict | none | Proactive rate limiting: `{"requests": N, "per": "Ns"}` |
Per-call kwargs in `http.get(url, timeout=60)` override these defaults.
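To make the `rate_limit` semantics concrete, here is a small sketch. The helper below is hypothetical (not part of the runtime API); it only shows how the `{"requests": N, "per": "Ns"}` shape maps to a minimum spacing between requests:

```python
def min_interval_seconds(rate_limit):
    # Hypothetical helper: turn {"requests": N, "per": "Ns"} into the
    # minimum delay between requests that stays under the limit.
    window = int(rate_limit["per"].rstrip("s"))  # "10s" -> 10
    return window / rate_limit["requests"]

# 100 requests per 10s allows at most one request every 0.1s
interval = min_interval_seconds({"requests": 100, "per": "10s"})
```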
Auth patterns
The runtime handles token refresh, header injection, and caching. Auth is only injected when the caller does NOT set `auth=` in the `http.*` call.
Google service account
```python
"auth": {
    "google_key_file_env": "GAM_KEY_FILE",
    "scope": "https://www.googleapis.com/auth/admanager",
}
```
`google_key_file_env` resolves the key file path from `.env`. The runtime handles JWT signing and token refresh automatically.

For a literal path (not recommended, since it hardcodes the filename):

```python
"auth": {
    "google_key_file": "service-account.json",
    "scope": "https://www.googleapis.com/auth/analytics.readonly",
}
```
OAuth2 provider (browser-based SaaS APIs)
```python
"auth": {"provider": "hubspot"}
```
Register with `ondatrasql auth <provider>`. Tokens refresh automatically.
API key from .env
```python
"auth": {"env": "API_KEY"}                         # → Authorization: Bearer <value>
"auth": {"env": "API_KEY", "header": "X-Api-Key"}  # → X-Api-Key: <value>
"auth": {"env": "API_KEY", "param": "api_key"}     # → ?api_key=<value>
```
Basic auth
```python
"auth": {"env_user": "USER", "env_pass": "PASS"}  # → Authorization: Basic <base64>
```
Fetch section
| Field | Type | Default | Description |
|---|---|---|---|
| `args` | list | `[]` | Parameter names passed from SQL |
| `columns` | dict | none | Fixed column definitions (`{name: {type}}`) |
| `dynamic_columns` | bool | `False` | Accept any column name; types inferred from SQL casts |
| `page_size` | int | 0 | Rows per page (0 = single call) |
columns vs dynamic_columns
Use `columns` when the API has a fixed schema you want to declare:

```python
"columns": {
    "id": {"type": "BIGINT"},
    "amount": {"type": "DOUBLE"},
    "created_at": {"type": "TIMESTAMP"},
}
```
Use `dynamic_columns: True` when SQL controls the schema. The runtime extracts column names and types from the SELECT via the DuckDB AST and passes them to `fetch()` as the `columns` kwarg:

```python
"dynamic_columns": True
```

```sql
-- SQL casts become types in the columns kwarg
SELECT name, amount::DECIMAL, items::JSON FROM my_api('orders')
-- columns = [{"name": "name", "type": "string"}, {"name": "amount", "type": "number"}, {"name": "items", "type": "array"}]
```
dynamic_columns is the recommended approach — it keeps the schema in SQL where transformations belong.
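As a minimal sketch, a `fetch()` might use the SQL-derived `columns` kwarg like this. The `fields` query parameter is a hypothetical example, and the actual `http.get` call is shown only as a comment:

```python
def fetch(resource, page, columns):
    # columns arrives as [{"name": ..., "type": ...}, ...], derived from
    # the SELECT list. Turn it into a comma-separated field list for a
    # hypothetical "fields" query parameter (illustration only).
    fields = ",".join([c["name"] for c in columns])
    # A real lib function would now call, e.g.:
    # return http.get("/" + resource, params={"fields": fields, "page": page}).json()
    return fields
```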
Args from SQL
```python
API = {"fetch": {"args": ["resource", "options"]}}
```

```sql
SELECT * FROM my_api('users', '{"filter": "active"}')
```
Args are positional strings. For structured configuration, pass JSON and decode in Starlark:
```python
def fetch(resource, options, page):
    opts = json.decode(options) if options else {}
```
Push section
| Field | Type | Default | Description |
|---|---|---|---|
| `batch_size` | int | 1 | Rows per `push()` call |
| `batch_mode` | string | `"sync"` | `"sync"`, `"atomic"`, or `"async"` |
| `max_concurrent` | int | 1 | Parallel batch workers |
| `rate_limit` | dict | inherited | Per-direction override |
| `poll_interval` | string | `"30s"` | Async polling interval |
| `poll_timeout` | string | `"1h"` | Async polling timeout |
Batch modes
| Mode | Return | Behavior |
|---|---|---|
| `sync` | `{"rowid:change_type": status}` | Per-row ack/nack |
| `atomic` | `None` | All-or-nothing |
| `async` | `{"job_id": ...}` | Job-based polling via `poll()` |
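A sync-mode `push()` can be sketched as follows. The row field names (`rowid`, `change_type`) and the `"ok"` status value are illustrative assumptions, and the real HTTP call is left as a comment:

```python
def push(rows):
    # Sync mode: return one status per row, keyed by "rowid:change_type".
    statuses = {}
    for row in rows:
        key = "%s:%s" % (row["rowid"], row["change_type"])
        # A real implementation would POST the row and inspect the response:
        # resp = http.post("/items", json=row)
        statuses[key] = "ok"  # illustrative: ack every row
    return statuses
```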
Finalize (optional)
```python
def finalize(succeeded, failed):
    if failed == 0:
        http.post(webhook, json={"status": "complete", "rows": succeeded})
```
Literal values only
The API dict is parsed as Starlark AST — not executed. All values must be literals:
```python
# Works — literal values
API = {"base_url": "https://api.example.com", "timeout": 30}

# Does NOT work — variable reference
BASE = "https://api.example.com"
API = {"base_url": BASE}

# Does NOT work — concatenation
API = {"base_url": "https://" + HOST}
```
This is by design. The dict is pure configuration — readable, validatable, and inspectable without running code. Dynamic values belong in the fetch() or push() function.
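For example, a value that depends on the call arguments can be computed inside `fetch()` while the dict keeps only the literal `base_url`. The `/v2/` path scheme below is a hypothetical example:

```python
API = {"base_url": "https://api.example.com"}

def fetch(resource, page):
    # Build the relative URL at call time; the runtime prepends base_url.
    path = "/v2/" + resource + "?page=" + str(page)
    # A real fetch() would return rows from http.get(path); the path is
    # returned here only so the sketch is self-contained.
    return path
```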
Validation
The runtime validates at startup:
- `fetch()` params must match `args` + optional `page`/`columns`/`target`
- `push()` must take exactly one parameter (`rows`)
- `batch_mode` must be `"sync"`, `"atomic"`, or `"async"`
- `rate_limit.per` must be a valid duration
- `rate_limit.requests` must be > 0
- `max_concurrent > 1` not allowed with `atomic` or `async`
See Fetch Contract and Push Contract for the complete function specs.