
HTTP Module

Built-in HTTP client for API ingestion

Fetch data from APIs directly inside your pipeline.

No Python. No requests. No separate ingestion tool.

The http module is built into OndatraSQL and designed for production data ingestion. It is a predeclared global — no import needed.

Why This Exists

Most data stacks require a separate ingestion layer:

  • Python scripts
  • Airbyte connectors
  • Custom services

With OndatraSQL, ingestion is part of the pipeline. You fetch data, transform it, and materialize it — in one runtime.

Mental Model

  • http.get() fetches data
  • save.row() emits rows
  • OndatraSQL handles storage, validation, and materialization

You don’t write scripts around APIs — you write models.

Production-Ready by Default

  • Automatic retries with exponential backoff
  • Rate limit handling (Retry-After)
  • OAuth, Basic, and Digest authentication
  • mTLS support (client certificates)
  • Built-in timeout and error handling

HTTP Methods

All methods follow the same pattern:

http.get(url, ...)
http.post(url, ...)
http.put(url, ...)
http.patch(url, ...)
http.delete(url, ...)

GET

resp = http.get(url)
resp = http.get(url, headers={"Authorization": "Bearer " + token})
resp = http.get(url, params={"page": "1", "limit": "50"})
resp = http.get(url, headers=headers, timeout=60, retry=3)

POST

# JSON body (sets Content-Type: application/json automatically)
resp = http.post(url, json={"key": "value"})

# Form data (sets Content-Type: application/x-www-form-urlencoded automatically)
resp = http.post(url, data={"key": "value"})

# With headers and options
resp = http.post(url, json=payload, headers=headers, timeout=60)

PUT / PATCH / DELETE

resp = http.put(url, json=payload, headers=headers)
resp = http.patch(url, json=patch)
resp = http.delete(url, headers=headers)

json= and data= are mutually exclusive — use one or the other.

Response Object

Field        Type    Description
status_code  int     HTTP status code
text         string  Raw response body
ok           bool    True if status code is 200–299
json         any     Parsed JSON, or None if the response is not JSON
headers      dict    Response headers
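As a mental model, the fields above relate like this. The `Response` class below is a plain-Python sketch for illustration, not OndatraSQL's actual implementation: `ok` is derived from the status code, and `json` falls back to `None` when the body doesn't parse.

```python
import json as jsonlib
from dataclasses import dataclass, field

@dataclass
class Response:
    """Illustrative stand-in for the http response object (hypothetical class)."""
    status_code: int
    text: str
    headers: dict = field(default_factory=dict)

    @property
    def ok(self) -> bool:
        # True for any 2xx status, per the table above
        return 200 <= self.status_code <= 299

    @property
    def json(self):
        # Parsed JSON, or None if the body is not valid JSON
        try:
            return jsonlib.loads(self.text)
        except ValueError:
            return None

resp = Response(200, '{"data": [1, 2]}')
assert resp.ok and resp.json == {"data": [1, 2]}

bad = Response(502, "<html>Bad Gateway</html>")
assert not bad.ok and bad.json is None
```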

Options

Kwarg    Default  Description
headers  None     Request headers dict
params   None     Query parameters dict (appended to URL)
json     None     JSON body (dict, list, or string)
data     None     Form-encoded body (dict)
timeout  30       Request timeout in seconds
retry    0        Number of retry attempts
backoff  1        Initial backoff between retries in seconds
auth     None     Authentication tuple — see Authentication
cert     None     Path to client certificate file (PEM) — see mTLS
key      None     Path to client private key file (PEM)
ca       None     Path to custom CA certificate file (PEM)

Retry and Backoff

APIs fail. OndatraSQL retries automatically.

  • Retries on 429 and 5xx errors
  • Exponential backoff with jitter
  • Respects Retry-After headers

When the server sends a Retry-After header (on 429 or 503), OndatraSQL respects it — both integer seconds and HTTP-date (RFC 1123) formats are supported. The Retry-After value overrides the calculated backoff for that retry.
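Both accepted formats can be sketched in plain Python. The `retry_after_seconds` helper below is illustrative, not part of the module: an all-digit value is a delay in seconds, and anything else is parsed as an HTTP-date and turned into a wait relative to now.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

def retry_after_seconds(value: str, now: datetime) -> float:
    """Parse a Retry-After header: integer seconds or an HTTP-date (RFC 1123)."""
    if value.isdigit():
        return float(value)
    # HTTP-date form: wait until the given instant, never a negative delay
    target = parsedate_to_datetime(value)
    return max(0.0, (target - now).total_seconds())

now = datetime(2015, 10, 21, 7, 27, 0, tzinfo=timezone.utc)
assert retry_after_seconds("120", now) == 120.0
assert retry_after_seconds("Wed, 21 Oct 2015 07:28:00 GMT", now) == 60.0
```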

# Retry up to 3 times with 2s initial backoff (2s, 4s, 8s + jitter)
resp = http.get(url, retry=3, backoff=2)
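The schedule for retry=3, backoff=2 works out like this. This is a sketch of the doubling pattern only: the exact jitter range OndatraSQL uses is not documented, so the 10% bound here is an assumption.

```python
import random

def backoff_schedule(retries: int, backoff: float, rng=None) -> list:
    """Exponential backoff: backoff * 2**attempt, plus a small random jitter."""
    rng = rng or random.Random()
    delays = []
    for attempt in range(retries):
        base = backoff * (2 ** attempt)
        jitter = rng.uniform(0, base * 0.1)  # assumed jitter range, for illustration
        delays.append(base + jitter)
    return delays

# retry=3, backoff=2 -> base delays of 2s, 4s, 8s before jitter
sched = backoff_schedule(3, 2.0)
assert [int(d) for d in sched] == [2, 4, 8]
```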

Authentication

Basic Auth

# Basic auth (default scheme)
resp = http.get(url, auth=("username", "password"))
resp = http.get(url, auth=("username", "password", "basic"))
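On the wire, Basic auth is just a base64-encoded username:password pair in the Authorization header. A plain-Python sketch of what `auth=("username", "password")` produces (`basic_auth_header` is an illustrative helper, not the module's API):

```python
import base64

def basic_auth_header(username: str, password: str) -> str:
    """Build the Authorization header value for HTTP Basic auth (RFC 7617)."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return "Basic " + token

assert basic_auth_header("username", "password") == "Basic dXNlcm5hbWU6cGFzc3dvcmQ="
```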

Digest Auth

HTTP Digest Authentication (RFC 7616) is a challenge-response scheme. The client first receives a 401 with a nonce from the server, then re-sends the request with a computed hash. OndatraSQL handles this automatically.

# Digest auth — the challenge-response is handled transparently
resp = http.get(url, auth=("username", "password", "digest"))

Supports MD5 and SHA-256 algorithms with qop=auth.

When using digest auth, at least one retry is required for the challenge-response flow. OndatraSQL automatically sets retry=1 if it’s not already >= 1.
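For reference, this is roughly what the computed hash looks like for the MD5 algorithm with qop=auth, checked against the worked example in RFC 2617 §3.5. The `digest_response` helper is illustrative, not the module's API; the SHA-256 variant swaps `hashlib.md5` for `hashlib.sha256`.

```python
import hashlib

def md5_hex(s: str) -> str:
    return hashlib.md5(s.encode("utf-8")).hexdigest()

def digest_response(user, realm, password, method, uri, nonce, nc, cnonce, qop="auth"):
    """Compute the MD5 digest response for qop=auth (RFC 7616 challenge-response)."""
    ha1 = md5_hex(f"{user}:{realm}:{password}")   # identity hash
    ha2 = md5_hex(f"{method}:{uri}")              # request hash
    return md5_hex(f"{ha1}:{nonce}:{nc}:{cnonce}:{qop}:{ha2}")

# Worked example from RFC 2617 §3.5
resp = digest_response(
    "Mufasa", "testrealm@host.com", "Circle Of Life",
    "GET", "/dir/index.html",
    "dcd98b7102dd2f0e8b11d0f600bfb0c093", "00000001", "0a4f113b",
)
assert resp == "6629fae49393a05397450978507c4ef1"
```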

mTLS (Mutual TLS)

For APIs that require client certificate authentication (PSD2/Open Banking, government APIs, enterprise service-to-service), use the cert and key kwargs.

# Client certificate + private key
resp = http.get(url, cert="/path/to/client.crt", key="/path/to/client.key")

# With custom CA (for self-signed server certificates)
resp = http.get(url,
    cert="/path/to/client.crt",
    key="/path/to/client.key",
    ca="/path/to/ca.crt",
)

# Combine with other kwargs
resp = http.post(url,
    json=payload,
    cert="client.crt",
    key="client.key",
    timeout=30,
)

  • cert and key must be provided together
  • ca is optional — use it when the server uses a self-signed or private CA certificate
  • All paths are PEM-encoded files

Example: Paginated API

Typical ingestion loop:

# @kind: append

token = oauth.token(
    token_url="https://auth.example.com/token",
    client_id=env.get("CLIENT_ID"),
    client_secret=env.get("CLIENT_SECRET"),
)

page = 1
while True:
    resp = http.get(
        "https://api.example.com/users",
        headers={"Authorization": "Bearer " + token.access_token},
        params={"page": str(page)},
        retry=3,
        backoff=2,
    )

    if not resp.ok:
        fail("API error: " + str(resp.status_code))

    if len(resp.json["data"]) == 0:
        break

    for user in resp.json["data"]:
        save.row({
            "id": user["id"],
            "name": user["name"],
            "email": user["email"],
        })

    page = page + 1
    sleep(0.1)  # Rate limiting

Example: HMAC Request Signing

Some APIs (AWS, Stripe webhooks, payment providers) require HMAC-signed requests. Use the crypto module to compute signatures.

# @kind: append

payload = json.encode({"event": "payment", "amount": 100})
timestamp = str(int(time.now().unix))

# Sign the request
signature = crypto.hmac_sha256(
    env.get("API_SECRET"),
    timestamp + "." + payload,
)

resp = http.post("https://api.example.com/webhook",
    json=json.decode(payload),
    headers={
        "X-Signature": signature,
        "X-Timestamp": timestamp,
    },
)
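On the receiving end, the server recomputes the same HMAC over the timestamp and payload and compares it in constant time. A plain-Python sketch of both sides (this assumes crypto.hmac_sha256 returns a hex digest, which the example above doesn't state):

```python
import hashlib
import hmac

def hmac_sha256_hex(secret: str, message: str) -> str:
    """Hex HMAC-SHA256, assumed to match crypto.hmac_sha256's output format."""
    return hmac.new(secret.encode("utf-8"), message.encode("utf-8"), hashlib.sha256).hexdigest()

def verify(secret: str, message: str, signature: str) -> bool:
    # compare_digest avoids timing side channels when checking signatures
    return hmac.compare_digest(hmac_sha256_hex(secret, message), signature)

sig = hmac_sha256_hex("key", "The quick brown fox jumps over the lazy dog")
assert sig == "f7bc83f430538424b13298e6aa6fb143ef4d59a14946175997479dbc2d1a3cd8"
assert verify("key", "The quick brown fox jumps over the lazy dog", sig)
```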

Integrated with the Pipeline

Data fetched via http is:

  • Buffered durably
  • Validated via constraints and audits
  • Versioned in DuckLake
  • Tracked in lineage

No separate ingestion layer required.

Compared to Traditional Ingestion

Traditional approach:

  • Python scripts with requests
  • Airbyte connectors
  • Separate orchestration

OndatraSQL:

  • HTTP client built into the runtime
  • Runs inside your pipeline
  • No extra services or languages