
HTTP Module

Built-in HTTP client for API ingestion

Fetch data from APIs directly inside your pipeline.

No Python. No requests. No separate ingestion tool.

The http module is built into OndatraSQL and designed for production data ingestion. It is a predeclared global — no import needed.

Why This Exists

Most data stacks require a separate ingestion layer:

  • Python scripts
  • Airbyte connectors
  • Custom services

With OndatraSQL, ingestion is part of the pipeline. You fetch data, transform it, and materialize it — in one runtime.

Mental Model

  • http.get() fetches data
  • save.row() emits rows
  • OndatraSQL handles storage, validation, and materialization

You don’t write scripts around APIs — you write models.

Production-Ready by Default

  • Automatic retries with exponential backoff
  • Rate limit handling (Retry-After)
  • OAuth, Basic, and Digest authentication
  • mTLS support (client certificates)
  • Built-in timeout and error handling

HTTP Methods

All methods follow the same pattern:

http.get(url, ...)
http.post(url, ...)
http.put(url, ...)
http.patch(url, ...)
http.delete(url, ...)

GET

resp = http.get(url)
resp = http.get(url, headers={"Authorization": "Bearer " + token})
resp = http.get(url, params={"page": "1", "limit": "50"})
resp = http.get(url, headers=headers, timeout=60, retry=3)

POST

# JSON body (sets Content-Type: application/json automatically)
resp = http.post(url, json={"key": "value"})

# Form data (sets Content-Type: application/x-www-form-urlencoded automatically)
resp = http.post(url, data={"key": "value"})

# With headers and options
resp = http.post(url, json=payload, headers=headers, timeout=60)

PUT / PATCH / DELETE

resp = http.put(url, json=payload, headers=headers)
resp = http.patch(url, json=patch)
resp = http.delete(url, headers=headers)

json= and data= are mutually exclusive — use one or the other.

Response Object

Field        Type    Description
status_code  int     HTTP status code
text         string  Raw response body
ok           bool    True if status code is 200–299
json         any     Parsed JSON, or None if the response is not JSON
headers      dict    Response headers
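As a mental model, the fields above relate like this. The `Response` class below is a plain-Python sketch for illustration, not OndatraSQL's actual implementation: `ok` is derived from the status code, and `json` falls back to `None` when the body doesn't parse.

```python
import json as jsonlib
from dataclasses import dataclass, field

@dataclass
class Response:
    """Illustrative stand-in for the http response object (hypothetical class)."""
    status_code: int
    text: str
    headers: dict = field(default_factory=dict)

    @property
    def ok(self) -> bool:
        # True for any 2xx status, per the table above
        return 200 <= self.status_code <= 299

    @property
    def json(self):
        # Parsed JSON, or None if the body is not valid JSON
        try:
            return jsonlib.loads(self.text)
        except ValueError:
            return None

resp = Response(200, '{"data": [1, 2]}')
assert resp.ok and resp.json == {"data": [1, 2]}

bad = Response(502, "<html>Bad Gateway</html>")
assert not bad.ok and bad.json is None
```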

Options

Kwarg    Default  Description
headers  None     Request headers dict
params   None     Query parameters dict (appended to URL)
json     None     JSON body (dict, list, or string)
data     None     Form-encoded body (dict)
timeout  30       Request timeout in seconds
retry    0        Number of retry attempts
backoff  1        Initial backoff between retries in seconds
auth     None     Authentication tuple — see Authentication
cert     None     Path to client certificate file (PEM) — see mTLS
key      None     Path to client private key file (PEM)
ca       None     Path to custom CA certificate file (PEM)

Retry and Backoff

APIs fail. OndatraSQL retries automatically.

  • Retries on 429 and 5xx errors
  • Exponential backoff with jitter
  • Respects Retry-After headers

When the server sends a Retry-After header (on 429 or 503), OndatraSQL respects it — both integer seconds and HTTP-date (RFC 1123) formats are supported. The Retry-After value overrides the calculated backoff for that retry.
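Both accepted formats can be sketched in plain Python. The `retry_after_seconds` helper below is illustrative, not part of the module: an all-digit value is a delay in seconds, and anything else is parsed as an HTTP-date and turned into a wait relative to now.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

def retry_after_seconds(value: str, now: datetime) -> float:
    """Parse a Retry-After header: integer seconds or an HTTP-date (RFC 1123)."""
    if value.isdigit():
        return float(value)
    # HTTP-date form: wait until the given instant, never a negative delay
    target = parsedate_to_datetime(value)
    return max(0.0, (target - now).total_seconds())

now = datetime(2015, 10, 21, 7, 27, 0, tzinfo=timezone.utc)
assert retry_after_seconds("120", now) == 120.0
assert retry_after_seconds("Wed, 21 Oct 2015 07:28:00 GMT", now) == 60.0
```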

# Retry up to 3 times with 2s initial backoff (2s, 4s, 8s + jitter)
resp = http.get(url, retry=3, backoff=2)
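The schedule for retry=3, backoff=2 works out like this. This is a sketch of the doubling pattern only: the exact jitter range OndatraSQL uses is not documented, so the 10% bound here is an assumption.

```python
import random

def backoff_schedule(retries: int, backoff: float, rng=None) -> list:
    """Exponential backoff: backoff * 2**attempt, plus a small random jitter."""
    rng = rng or random.Random()
    delays = []
    for attempt in range(retries):
        base = backoff * (2 ** attempt)
        jitter = rng.uniform(0, base * 0.1)  # assumed jitter range, for illustration
        delays.append(base + jitter)
    return delays

# retry=3, backoff=2 -> base delays of 2s, 4s, 8s before jitter
sched = backoff_schedule(3, 2.0)
assert [int(d) for d in sched] == [2, 4, 8]
```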

Authentication

Basic Auth

# Basic auth (default scheme)
resp = http.get(url, auth=("username", "password"))
resp = http.get(url, auth=("username", "password", "basic"))
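On the wire, Basic auth is just a base64-encoded username:password pair in the Authorization header. A plain-Python sketch of what `auth=("username", "password")` produces (`basic_auth_header` is an illustrative helper, not the module's API):

```python
import base64

def basic_auth_header(username: str, password: str) -> str:
    """Build the Authorization header value for HTTP Basic auth (RFC 7617)."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return "Basic " + token

assert basic_auth_header("username", "password") == "Basic dXNlcm5hbWU6cGFzc3dvcmQ="
```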

Digest Auth

HTTP Digest Authentication (RFC 7616) is a challenge-response scheme. The client first receives a 401 with a nonce from the server, then re-sends the request with a computed hash. OndatraSQL handles this automatically.

# Digest auth — the challenge-response is handled transparently
resp = http.get(url, auth=("username", "password", "digest"))

Supports MD5 and SHA-256 algorithms with qop=auth.

When using digest auth, at least one retry is required for the challenge-response flow. OndatraSQL automatically sets retry=1 if it’s not already >= 1.
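For reference, this is roughly what the computed hash looks like for the MD5 algorithm with qop=auth, checked against the worked example in RFC 2617 §3.5. The `digest_response` helper is illustrative, not the module's API; the SHA-256 variant swaps `hashlib.md5` for `hashlib.sha256`.

```python
import hashlib

def md5_hex(s: str) -> str:
    return hashlib.md5(s.encode("utf-8")).hexdigest()

def digest_response(user, realm, password, method, uri, nonce, nc, cnonce, qop="auth"):
    """Compute the MD5 digest response for qop=auth (RFC 7616 challenge-response)."""
    ha1 = md5_hex(f"{user}:{realm}:{password}")   # identity hash
    ha2 = md5_hex(f"{method}:{uri}")              # request hash
    return md5_hex(f"{ha1}:{nonce}:{nc}:{cnonce}:{qop}:{ha2}")

# Worked example from RFC 2617 §3.5
resp = digest_response(
    "Mufasa", "testrealm@host.com", "Circle Of Life",
    "GET", "/dir/index.html",
    "dcd98b7102dd2f0e8b11d0f600bfb0c093", "00000001", "0a4f113b",
)
assert resp == "6629fae49393a05397450978507c4ef1"
```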

mTLS (Mutual TLS)

For APIs that require client certificate authentication (PSD2/Open Banking, government APIs, enterprise service-to-service), use the cert and key kwargs.

# Client certificate + private key
resp = http.get(url, cert="/path/to/client.crt", key="/path/to/client.key")

# With custom CA (for self-signed server certificates)
resp = http.get(url,
    cert="/path/to/client.crt",
    key="/path/to/client.key",
    ca="/path/to/ca.crt",
)

# Combine with other kwargs
resp = http.post(url,
    json=payload,
    cert="client.crt",
    key="client.key",
    timeout=30,
)

  • cert and key must be provided together
  • ca is optional — use it when the server uses a self-signed or private CA certificate
  • All paths are PEM-encoded files

Example: Paginated API

Typical ingestion loop:

# @kind: append

token = oauth.token(
    token_url="https://auth.example.com/token",
    client_id=env.get("CLIENT_ID"),
    client_secret=env.get("CLIENT_SECRET"),
)

page = 1
while True:
    resp = http.get(
        "https://api.example.com/users",
        headers={"Authorization": "Bearer " + token.access_token},
        params={"page": str(page)},
        retry=3,
        backoff=2,
    )

    if not resp.ok:
        fail("API error: " + str(resp.status_code))

    if len(resp.json["data"]) == 0:
        break

    for user in resp.json["data"]:
        save.row({
            "id": user["id"],
            "name": user["name"],
            "email": user["email"],
        })

    page = page + 1
    sleep(0.1)  # Rate limiting

Example: HMAC Request Signing

Some APIs (AWS, Stripe webhooks, payment providers) require HMAC-signed requests. Use the crypto module to compute signatures.

# @kind: append

payload = json.encode({"event": "payment", "amount": 100})
timestamp = str(int(time.now().unix))

# Sign the request
signature = crypto.hmac_sha256(
    env.get("API_SECRET"),
    timestamp + "." + payload,
)

resp = http.post("https://api.example.com/webhook",
    json=json.decode(payload),
    headers={
        "X-Signature": signature,
        "X-Timestamp": timestamp,
    },
)
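On the receiving end, the server recomputes the same HMAC over the timestamp and payload and compares it in constant time. A plain-Python sketch of both sides (this assumes crypto.hmac_sha256 returns a hex digest, which the example above doesn't state):

```python
import hashlib
import hmac

def hmac_sha256_hex(secret: str, message: str) -> str:
    """Hex HMAC-SHA256, assumed to match crypto.hmac_sha256's output format."""
    return hmac.new(secret.encode("utf-8"), message.encode("utf-8"), hashlib.sha256).hexdigest()

def verify(secret: str, message: str, signature: str) -> bool:
    # compare_digest avoids timing side channels when checking signatures
    return hmac.compare_digest(hmac_sha256_hex(secret, message), signature)

sig = hmac_sha256_hex("key", "The quick brown fox jumps over the lazy dog")
assert sig == "f7bc83f430538424b13298e6aa6fb143ef4d59a14946175997479dbc2d1a3cd8"
assert verify("key", "The quick brown fox jumps over the lazy dog", sig)
```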

Integrated with the Pipeline

Data fetched via http is:

  • Buffered durably
  • Validated via constraints and audits
  • Versioned in DuckLake
  • Tracked in lineage

No separate ingestion layer required.

Compared to Traditional Ingestion

Traditional approach:

  • Python scripts with requests
  • Airbyte connectors
  • Separate orchestration

OndatraSQL:

  • HTTP client built into the runtime
  • Runs inside your pipeline
  • No extra services or languages