Lib Functions

Your models are SQL. Your API transport is Starlark. They live in different directories because they solve different problems, and keeping them separate is what makes everything else work.

Why the separation matters

SQL is great at transforming data — joins, aggregations, filters, window functions. But it can’t make HTTP requests, handle pagination, refresh OAuth tokens, or retry on failure. Those are imperative concerns that need loops, conditionals, and error handling.

By keeping your models as pure SQL and putting transport in lib/, the runtime can analyze your models statically. It extracts dependencies from FROM and JOIN clauses, detects changes via DuckLake snapshots, rewrites queries for incremental processing, and traces column-level lineage. None of that would work if your models contained arbitrary code.

The tradeoff: you need two languages. Both are simple, though: you already know SQL, and Starlark is a small Python dialect you can pick up in an hour.

How it works

A lib function is a .star file in lib/ that declares an API dict. The file name becomes the function name in SQL. One file can handle both inbound (fetch) and outbound (push), sharing the same auth, headers, retry, and rate limit config.

# lib/hubspot.star
API = {
    "base_url": "https://api.hubapi.com",
    "auth": {"provider": "hubspot"},
    "fetch": {"args": ["object_type"], "page_size": 100},
    "push": {"batch_size": 100, "batch_mode": "sync"},
}

Inbound: your SQL writes FROM hubspot('contacts'), and the runtime calls fetch() with managed pagination.

Outbound: your SQL writes @sink: hubspot, and the runtime detects what changed and calls push() per batch.
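To make "calls push() per batch" concrete, here is a minimal sketch in plain Python (not Starlark, and not the runtime's actual code). It shows only the batching step; change detection is not modelled. The helper names `batches` and `run_push`, and the `rows` keyword passed to `push()`, are assumptions made for illustration.

```python
def batches(rows, batch_size):
    """Split rows into consecutive lists of at most batch_size items."""
    return [rows[i:i + batch_size] for i in range(0, len(rows), batch_size)]


def run_push(push, changed_rows, batch_size):
    """Call push() once per batch and collect whatever each call returns."""
    return [push(rows=batch) for batch in batches(changed_rows, batch_size)]


# Hypothetical push(): a real one would build and send an API request.
def push(rows):
    return len(rows)


changed = [{"id": n} for n in range(250)]
batch_sizes = run_push(push, changed, batch_size=100)
print(batch_sizes)  # [100, 100, 50]
```

With `"batch_size": 100` as in the API dict above, 250 changed rows would reach `push()` as three calls.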

You don’t register anything. Drop a .star file in lib/ and it’s available. The runtime scans lib/ at startup, validates the dict, and registers DuckDB macros so the SQL parser accepts FROM func_name(...).
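The file-name-to-function-name rule can be sketched in a few lines of plain Python. This is an illustration of the idea only: the function `discover_lib_functions` is hypothetical, and the dict validation and DuckDB macro registration that the runtime performs are not shown.

```python
import tempfile
from pathlib import Path


def discover_lib_functions(lib_dir):
    """Map each .star file in lib_dir to the SQL function name it would get."""
    return {path.stem: path for path in sorted(Path(lib_dir).glob("*.star"))}


# Build a throwaway lib/ directory to show which files are picked up.
with tempfile.TemporaryDirectory() as tmp:
    lib = Path(tmp) / "lib"
    lib.mkdir()
    (lib / "hubspot.star").write_text('API = {"base_url": "https://api.hubapi.com"}\n')
    (lib / "notes.txt").write_text("ignored\n")
    found = sorted(discover_lib_functions(lib))

print(found)  # ['hubspot']
```

Only `hubspot.star` is picked up, so only `hubspot(...)` would become callable from SQL in this sketch.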

Three layers

Layer	Responsibility	Location
API dict	Configuration — auth, endpoints, rate limits, pagination	`lib/*.star` (top-level dict)
Starlark	I/O logic — API calls, response parsing, request building	`lib/*.star` (functions)
SQL	Transformation — joins, aggregations, casts, pivoting	`models/*.sql`

No layer does another layer’s job. Lib functions have no SQL access; common operations are exposed as builtins that run DuckDB under the hood.

What the runtime handles for you

When you declare config in the API dict, the runtime injects it into every http.* call your function makes. You don’t set auth headers manually, you don’t implement retry logic, you don’t write a pagination loop. The runtime does that.
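For a sense of what the managed pagination loop saves you from writing, here is a rough simulation in plain Python. It is a sketch under assumptions, not the runtime's implementation: the driver `run_fetch`, the `cursor` and `page_size` keyword arguments, and the `rows` / `next_cursor` return shape are all hypothetical. Auth injection and retries are left out.

```python
def run_fetch(fetch, page_size, **kwargs):
    """Call fetch() page by page until it stops returning a cursor."""
    rows, cursor = [], None
    while True:
        result = fetch(cursor=cursor, page_size=page_size, **kwargs)
        rows.extend(result["rows"])
        cursor = result.get("next_cursor")
        if cursor is None:
            return rows


# Stand-in for a remote API: 250 records held in memory.
RECORDS = [{"id": n} for n in range(250)]


def fetch(object_type, cursor, page_size):
    """Hypothetical fetch(): return one page plus the cursor for the next."""
    start = cursor or 0
    page = RECORDS[start:start + page_size]
    end = start + len(page)
    return {"rows": page, "next_cursor": end if end < len(RECORDS) else None}


contacts = run_fetch(fetch, page_size=100, object_type="contacts")
print(len(contacts))  # 250
```

The point of the split is that `fetch()` only describes a single page; the loop, and the decision about when to stop, live outside it.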

The runtime also:

- Extracts typed columns from your SQL SELECT and passes them as kwargs
- Filters kwargs to match your function signature, so you declare only what you need
- Passes incremental state (is_backfill, last_value, etc.) as kwargs
- Handles async polling for report-style APIs (submit → poll → fetch results)

This means your fetch() function focuses on the API-specific logic — how to parse the response, what the next page cursor looks like, how to map fields. Everything else is handled.
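The kwargs filtering described above can be illustrated with Python's `inspect` module. This is a sketch of the general technique in plain Python, not how the runtime does it for Starlark functions; `filter_kwargs` and the sample `fetch()` are hypothetical.

```python
import inspect


def filter_kwargs(fn, available):
    """Keep only the entries of `available` that `fn` declares as parameters."""
    params = inspect.signature(fn).parameters
    # A **kwargs catch-all means the function accepts everything.
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return dict(available)
    return {name: value for name, value in available.items() if name in params}


# Hypothetical fetch() that declares only two of the available kwargs.
def fetch(object_type, last_value=None):
    return "fetching {} since {}".format(object_type, last_value)


available = {
    "object_type": "contacts",
    "is_backfill": False,
    "last_value": "2024-01-01",
    "page_size": 100,
}
accepted = filter_kwargs(fetch, available)
result = fetch(**accepted)
print(accepted)  # {'object_type': 'contacts', 'last_value': '2024-01-01'}
```

Because `fetch()` never declares `is_backfill` or `page_size`, they are dropped before the call instead of raising an unexpected-argument error.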
