HTTP Module
Built-in HTTP client for API ingestion
Fetch data from APIs directly inside your pipeline.
No Python. No requests. No separate ingestion tool.
The http module is built into OndatraSQL and designed for production data ingestion. It is a predeclared global — no import needed.
Why This Exists
Most data stacks require a separate ingestion layer:
- Python scripts
- Airbyte connectors
- Custom services
With OndatraSQL, ingestion is part of the pipeline. You fetch data, transform it, and materialize it — in one runtime.
Mental Model
- http.get() fetches data
- save.row() emits rows
- OndatraSQL handles storage, validation, and materialization
You don’t write scripts around APIs — you write models.
Production-Ready by Default
- Automatic retries with exponential backoff
- Rate limit handling (Retry-After)
- OAuth, Basic, and Digest authentication
- mTLS support (client certificates)
- Built-in timeout and error handling
HTTP Methods
All methods follow the same pattern:
http.get(url, ...)
http.post(url, ...)
http.put(url, ...)
http.patch(url, ...)
http.delete(url, ...)
GET
resp = http.get(url)
resp = http.get(url, headers={"Authorization": "Bearer " + token})
resp = http.get(url, params={"page": "1", "limit": "50"})
resp = http.get(url, headers=headers, timeout=60, retry=3)
POST
# JSON body (sets Content-Type: application/json automatically)
resp = http.post(url, json={"key": "value"})
# Form data (sets Content-Type: application/x-www-form-urlencoded automatically)
resp = http.post(url, data={"key": "value"})
# With headers and options
resp = http.post(url, json=payload, headers=headers, timeout=60)
PUT / PATCH / DELETE
resp = http.put(url, json=payload, headers=headers)
resp = http.patch(url, json=patch)
resp = http.delete(url, headers=headers)
json= and data= are mutually exclusive — use one or the other.
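The practical difference between the two is the wire encoding. This Python standard-library sketch (an illustration, not OndatraSQL code) shows what each option puts in the request body:

```python
import json
from urllib.parse import urlencode

body = {"key": "value"}

# json= sends a JSON document and sets Content-Type: application/json
json_body = json.dumps(body)

# data= sends form pairs and sets Content-Type: application/x-www-form-urlencoded
form_body = urlencode(body)

print(json_body)  # {"key": "value"}
print(form_body)  # key=value
```

Pick json= for APIs that expect JSON documents and data= for classic form endpoints (for example, OAuth token endpoints).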
Response Object
| Field | Type | Description |
|---|---|---|
| status_code | int | HTTP status code |
| text | string | Raw response body |
| ok | bool | True if status code is 200–299 |
| json | any | Parsed JSON, or None if the response is not JSON |
| headers | dict | Response headers |
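The ok and json semantics can be pinned down with a small Python sketch (make_response is a hypothetical stand-in for illustration, not the actual implementation):

```python
import json

def make_response(status_code, text, headers):
    # json is the parsed body, or None when the body is not valid JSON.
    try:
        parsed = json.loads(text)
    except ValueError:
        parsed = None
    return {
        "status_code": status_code,
        "text": text,
        "ok": 200 <= status_code <= 299,  # ok covers the whole 2xx range
        "json": parsed,
        "headers": headers,
    }
```

So a 404 with an HTML error page yields ok=False and json=None, while text always carries the raw body.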
Options
| Kwarg | Default | Description |
|---|---|---|
| headers | None | Request headers dict |
| params | None | Query parameters dict (appended to URL) |
| json | None | JSON body (dict, list, or string) |
| data | None | Form-encoded body (dict) |
| timeout | 30 | Request timeout in seconds |
| retry | 0 | Number of retry attempts |
| backoff | 1 | Initial backoff between retries in seconds |
| auth | None | Authentication tuple — see Authentication |
| cert | None | Path to client certificate file (PEM) — see mTLS |
| key | None | Path to client private key file (PEM) |
| ca | None | Path to custom CA certificate file (PEM) |
Retry and Backoff
APIs fail. OndatraSQL retries automatically.
- Retries on 429 and 5xx errors
- Exponential backoff with jitter
- Respects Retry-After headers
When the server sends a Retry-After header (on 429 or 503), OndatraSQL respects it — both integer seconds and HTTP-date (RFC 1123) formats are supported. The Retry-After value overrides the calculated backoff for that retry.
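The two accepted Retry-After formats can be parsed as follows (a Python sketch of the described behavior, not OndatraSQL's actual code):

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

def parse_retry_after(value, now):
    # Integer form: "Retry-After: 120" means wait 120 seconds.
    try:
        return int(value)
    except ValueError:
        pass
    # HTTP-date form (RFC 1123): wait until the given instant.
    target = parsedate_to_datetime(value)
    return max(0, int((target - now).total_seconds()))
```

Either way the result is a wait in seconds, which then replaces the computed backoff for that retry.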
# Retry up to 3 times with 2s initial backoff (2s, 4s, 8s + jitter)
resp = http.get(url, retry=3, backoff=2)
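The resulting delay schedule grows roughly like this (the exact jitter formula is an assumption; shown here as full jitter in a Python sketch):

```python
import random

def retry_delay(attempt, backoff=1.0):
    # Exponential growth: backoff, 2*backoff, 4*backoff, ...
    base = backoff * (2 ** attempt)
    # Jitter spreads retries out so many clients don't stampede in sync.
    return base + random.uniform(0, base)

# retry=3, backoff=2 -> bases of 2s, 4s, 8s, each plus jitter
schedule = [retry_delay(a, backoff=2.0) for a in range(3)]
```

Without jitter, every client that hit the same outage would retry at the exact same instants, re-creating the load spike that caused the failure.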
Authentication
Basic Auth
# Basic auth (default scheme)
resp = http.get(url, auth=("username", "password"))
resp = http.get(url, auth=("username", "password", "basic"))
Digest Auth
HTTP Digest Authentication (RFC 7616) is a challenge-response scheme. The client first receives a 401 with a nonce from the server, then re-sends the request with a computed hash. OndatraSQL handles this automatically.
# Digest auth — the challenge-response is handled transparently
resp = http.get(url, auth=("username", "password", "digest"))
Supports MD5 and SHA-256 algorithms with qop=auth.
When using digest auth, at least one retry is required for the challenge-response flow. OndatraSQL automatically sets retry=1 if it’s not already >= 1.
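For reference, the MD5 / qop=auth response hash works like this (a Python sketch of the RFC 7616 computation, demonstrated with the well-known RFC 2617 example values — OndatraSQL does all of this for you):

```python
import hashlib

def md5_hex(s):
    return hashlib.md5(s.encode()).hexdigest()

def digest_response(user, password, realm, method, uri, nonce, nc, cnonce):
    # HA1 covers the credentials, HA2 covers the request line.
    ha1 = md5_hex(user + ":" + realm + ":" + password)
    ha2 = md5_hex(method + ":" + uri)
    # qop=auth mixes in the server nonce, nonce count, and client nonce.
    return md5_hex(":".join([ha1, nonce, nc, cnonce, "auth", ha2]))
```

The server sends realm and nonce in its 401 challenge, which is why the flow needs that first "failed" request.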
mTLS (Mutual TLS)
For APIs that require client certificate authentication (PSD2/Open Banking, government APIs, enterprise service-to-service), use the cert and key kwargs.
# Client certificate + private key
resp = http.get(url, cert="/path/to/client.crt", key="/path/to/client.key")
# With custom CA (for self-signed server certificates)
resp = http.get(url,
    cert="/path/to/client.crt",
    key="/path/to/client.key",
    ca="/path/to/ca.crt",
)
# Combine with other kwargs
resp = http.post(url,
    json=payload,
    cert="client.crt",
    key="client.key",
    timeout=30,
)
- cert and key must be provided together
- ca is optional; use it when the server uses a self-signed or private CA certificate
- All paths are PEM-encoded files
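In TLS terms, the three kwargs map onto a client-side TLS configuration roughly like this (a Python ssl sketch of the concept; make_mtls_context is a hypothetical helper, not OndatraSQL internals):

```python
import ssl

def make_mtls_context(cert=None, key=None, ca=None):
    # cert and key only make sense as a pair.
    if (cert is None) != (key is None):
        raise ValueError("cert and key must be provided together")
    # ca, when given, replaces the system trust store for verifying the server.
    ctx = ssl.create_default_context(cafile=ca)
    if cert:
        # Present the client certificate chain during the TLS handshake.
        ctx.load_cert_chain(certfile=cert, keyfile=key)
    return ctx
```

The server verifies the client certificate against its own trust store, which is why mTLS APIs usually require you to register your certificate first.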
Example: Paginated API
Typical ingestion loop:
# @kind: append
token = oauth.token(
    token_url="https://auth.example.com/token",
    client_id=env.get("CLIENT_ID"),
    client_secret=env.get("CLIENT_SECRET"),
)
page = 1
while True:
    resp = http.get(
        "https://api.example.com/users",
        headers={"Authorization": "Bearer " + token.access_token},
        params={"page": str(page)},
        retry=3,
        backoff=2,
    )
    if not resp.ok:
        fail("API error: " + str(resp.status_code))
    if len(resp.json["data"]) == 0:
        break
    for user in resp.json["data"]:
        save.row({
            "id": user["id"],
            "name": user["name"],
            "email": user["email"],
        })
    page = page + 1
    sleep(0.1)  # Rate limiting
Example: HMAC Request Signing
Some APIs (AWS, Stripe webhooks, payment providers) require HMAC-signed requests. Use the crypto module to compute signatures.
# @kind: append
payload = json.encode({"event": "payment", "amount": 100})
timestamp = str(int(time.now().unix))
# Sign the request
signature = crypto.hmac_sha256(
    env.get("API_SECRET"),
    timestamp + "." + payload,
)
resp = http.post("https://api.example.com/webhook",
    json=json.decode(payload),
    headers={
        "X-Signature": signature,
        "X-Timestamp": timestamp,
    },
)
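The signature is a standard HMAC-SHA256 over "timestamp.payload". The same computation in plain Python (standard library, with a placeholder secret and fixed timestamp for illustration) looks like this:

```python
import hashlib
import hmac
import json

secret = b"API_SECRET"  # placeholder; real code reads the secret from the environment
payload = json.dumps({"event": "payment", "amount": 100}, separators=(",", ":"))
timestamp = "1700000000"  # in practice, the current Unix time as a string

# Sign timestamp.payload so the receiver can verify both integrity and freshness.
message = (timestamp + "." + payload).encode()
signature = hmac.new(secret, message, hashlib.sha256).hexdigest()
```

Including the timestamp in the signed message lets the receiver reject stale or replayed requests, not just tampered ones.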
Integrated with the Pipeline
Data fetched via http is:
- Buffered durably
- Validated via constraints and audits
- Versioned in DuckLake
- Tracked in lineage
No separate ingestion layer required.
Compared to Traditional Ingestion
Traditional approach:
- Python scripts with requests
- Airbyte connectors
- Separate orchestration
OndatraSQL:
- HTTP client built into the runtime
- Runs inside your pipeline
- No extra services or languages
Ondatra Labs