Language Reference

A simpler alternative to Python for pipelines

OndatraSQL scripts use Starlark, a Python-like language originally designed for configuration (it comes from the Bazel build system) and well suited to data pipelines.

No packages. No virtual environments. No runtime setup.

If you know Python, you can write pipelines immediately.

Why Starlark?

Starlark is a restricted, deterministic dialect of Python.

No hidden state
No external dependencies
No runtime environment

This makes pipelines reproducible and easy to reason about.

Differences from Python

Starlark is intentionally simpler:

Python	Starlark
`import module`	`load("lib/module.star", "func")`
`try / except`	Not available — errors stop the script
`class Foo:`	Not available — use dicts and functions
`with open(f):`	Not available — no file I/O
`for c in "hello":`	`for c in "hello".elems():`
`2 ** 10`	`math.pow(2, 10)`
`yield / async`	Not available
`global / nonlocal`	Not available
`x is y`	Use `==`

This keeps pipelines predictable and portable.
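
Most of the missing features have a plain-data replacement. As a minimal sketch (the helper names `make_user`, `add_tag`, and `parse_int` are hypothetical and not part of OndatraSQL), a dict plus functions can stand in for a class, and validating input up front can stand in for `try / except`. The code sticks to syntax that is valid in both Starlark and Python:

```python
# Hypothetical helpers for illustration; none of these are OndatraSQL built-ins.

def make_user(user_id, name):
    # A dict plays the role of a class instance.
    return {"id": user_id, "name": name, "tags": []}

def add_tag(user, tag):
    # A plain function plays the role of a method.
    if tag not in user["tags"]:
        user["tags"].append(tag)
    return user

def parse_int(text, default=0):
    # No try/except: check the input first and fall back to a default.
    # Only handles non-negative integers.
    if text.isdigit():
        return int(text)
    return default

user = add_tag(make_user(1, "Alice"), "admin")
limit = parse_int("50")
fallback = parse_int("n/a", default=10)
```

Here `user` ends up with a single `"admin"` tag, `limit` is `50`, and `fallback` is `10` because the input is not numeric.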

In OndatraSQL

Starlark is used for:

Data ingestion (HTTP APIs, files)
Custom logic between models
Reverse ETL workflows

It runs inside the OndatraSQL runtime, with access to:

http (API calls)
query() (read from DuckDB)
save (write output)
incremental (pipeline state)

Example: A Simple Pipeline Script

# @kind: append

resp = http.get("https://api.example.com/users")

for user in resp.json:
    save.row({
        "id": user["id"],
        "name": user["name"],
    })

This fetches data from an API, keeps the fields it needs, and writes each row into your pipeline, all in one script.

Basics

Literals and core data types work as they do in Python.

# Numbers
count = 42
rate = 0.95

# Strings
name = "Alice"
query = 'SELECT * FROM users'

# Booleans
active = True
deleted = False

# None
result = None

# Lists (mutable)
items = [1, 2, 3]
items.append(4)

# Tuples (immutable)
pair = (1, 2)

# Dicts (mutable, ordered)
user = {"name": "Alice", "age": 30}

# Sets (mutable)
tags = set(["a", "b", "c"])

Control Flow

# If / elif / else
if resp.ok:
    process(resp.json)
elif resp.status_code == 429:
    sleep(1)
else:
    fail("Error: " + str(resp.status_code))

# For loops
for item in items:
    save.row(item)

for i, item in enumerate(items):
    print(i, item)

for i in range(10):
    print(i)

# While loops
page = 1
while True:
    resp = http.get(url + "?page=" + str(page))
    if len(resp.json) == 0:
        break
    for item in resp.json:
        save.row(item)
    page += 1
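
An unbounded `while True` loop keeps running if the API never returns an empty page. One way to guard against that, sketched here under the assumption that pages are fetched through a caller-supplied function (`collect_pages`, `fetch_page`, and `max_pages` are hypothetical names), is to cap the number of iterations:

```python
def collect_pages(fetch_page, max_pages=100):
    # fetch_page(page) is assumed to return a list of rows for that page.
    rows = []
    page = 1
    while page <= max_pages:
        batch = fetch_page(page)
        if len(batch) == 0:
            break  # no more data
        rows.extend(batch)
        page += 1
    return rows

# Stand-in data source so the sketch runs without a network call.
FAKE_PAGES = {1: [{"id": 1}, {"id": 2}], 2: [{"id": 3}]}

def fake_fetch(page):
    return FAKE_PAGES.get(page, [])

all_rows = collect_pages(fake_fetch)
capped_rows = collect_pages(fake_fetch, max_pages=1)
```

In a model, `fetch_page` could be something like `lambda page: http.get(url + "?page=" + str(page)).json`, mirroring the loop above.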

Functions and Lambdas

def fetch_page(url, page=1, limit=50):
    return http.get(
        url + "?page=" + str(page) + "&limit=" + str(limit),
    )

# Lambda expressions
double = lambda x: x * 2
sorted(users, key=lambda u: u["name"])

load() — Shared Libraries

Import functions and values from other Starlark files using load():

load("lib/pagination.star", "paginate", "DEFAULT_PAGE_SIZE")
load("lib/auth.star", "get_token")

token = get_token()
for page in paginate(url, page_size=DEFAULT_PAGE_SIZE):
    process(page)

Syntax

load("path/to/module.star", "name1", "name2")       # import by name
load("path/to/module.star", alias="original_name")   # import with alias

Paths are relative to the project root. Only files inside the project directory can be loaded — path traversal and symlinks pointing outside the project are rejected.

Caching

Each module is executed once per model run. Multiple load() calls — including from different nested modules — share the cached result.

Library Scope

Loaded modules have access to all built-in modules (http, oauth, json, etc.) except save and incremental. Functions that need save should accept it as a parameter:

# lib/fetcher.star
def fetch_users(save, url):
    resp = http.get(url)
    for user in resp.json:
        save.row(user)

This is the pattern used by YAML models, where save is passed automatically.
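
A variation on this pattern, shown as a sketch rather than the documented convention, is to pass only the function the library needs. `emit_active_users` and `save_row` are hypothetical names; in a model the caller would presumably pass `save.row`. Passing a plain callable also makes the library function easy to exercise with a stand-in:

```python
def emit_active_users(save_row, users):
    # save_row is any callable that accepts one dict per output row.
    count = 0
    for user in users:
        if user.get("status") == "active":
            save_row({"id": user["id"], "name": user["name"]})
            count += 1
    return count

# Stand-in for save.row so the sketch runs on its own.
collected = []
users = [
    {"id": 1, "name": "Alice", "status": "active"},
    {"id": 2, "name": "Bob", "status": "inactive"},
]
written = emit_active_users(collected.append, users)
```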

String Formatting

# %-operator
msg = "Fetched %d rows from %s" % (count, source)
price = "Total: $%.2f" % amount

# .format() method
msg = "Page {} of {}".format(page, total)
msg = "Page {page} of {total}".format(page=1, total=10)

Format specifiers: %s (string), %d (integer), %f (float), %x (hex), %o (octal).
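
As a small worked example of the specifiers listed above (the variable names and values are made up for illustration), the following lines behave the same way in Starlark and Python:

```python
count = 3
source = "api"

summary = "Fetched %d rows from %s" % (count, source)  # "Fetched 3 rows from api"
hex_id = "0x%x" % 255                                  # "0xff"
octal_mode = "%o" % 8                                  # "10"
paging = "Page {} of {}".format(2, 5)                  # "Page 2 of 5"
named = "{table}: {n} rows".format(table="users", n=count)  # "users: 3 rows"
```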

Comprehensions

# List comprehension
ids = [item["id"] for item in resp.json]
active = [u for u in users if u["status"] == "active"]

# Dict comprehension
lookup = {row["id"]: row["name"] for row in query("SELECT id, name FROM t")}

# Nested
flat = [cell for row in matrix for cell in row]

Built-in Functions

Function	Description
`abs(x)`	Absolute value
`all(x)`	`True` if all elements are truthy
`any(x)`	`True` if any element is truthy
`bool(x)`	Convert to boolean
`chr(i)`	Unicode code point to string
`dict(pairs)`	Create dict
`dir(x)`	List attributes/methods
`enumerate(x)`	Yields `(index, value)` pairs
`float(x)`	Convert to float
`getattr(x, name)`	Get attribute by name
`hasattr(x, name)`	Check if attribute exists
`hash(x)`	Hash a string
`int(x)`	Convert to int
`len(x)`	Number of elements
`list(x)`	Create list from iterable
`max(x)`	Largest value
`min(x)`	Smallest value
`ord(s)`	String to Unicode code point
`print(*args)`	Print to stderr
`range(n)`	Sequence of integers
`repr(x)`	Debug representation
`reversed(x)`	Reversed sequence
`set(x)`	Create set
`sorted(x, key?, reverse?)`	Sorted list
`str(x)`	Convert to string
`tuple(x)`	Create tuple
`type(x)`	Type name as string
`zip(a, b)`	Zip two sequences
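
Several of these built-ins are typically combined. The sketch below, using made-up sample data, pairs two lists with `zip`, ranks them with `sorted`, and numbers the result with `enumerate`. It wraps `zip` in `list()` so that it behaves identically in Starlark and Python:

```python
names = ["carol", "alice", "bob"]
scores = [72, 95, 88]

# Pair each name with its score, then rank by score, highest first.
pairs = list(zip(names, scores))
ranked = sorted(pairs, key=lambda p: p[1], reverse=True)

# Number the ranking starting from 1.
leaderboard = ["%d. %s" % (i + 1, name) for i, (name, _) in enumerate(ranked)]

all_passed = all([s >= 70 for s in scores])
any_perfect = any([s == 100 for s in scores])
top_score = max(scores)
```

With this data, `leaderboard` is `["1. alice", "2. bob", "3. carol"]`, `all_passed` is `True`, and `any_perfect` is `False`.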

String Methods

Method	Description
`s.capitalize()`	First letter uppercase
`s.count(sub)`	Count occurrences
`s.endswith(suffix)`	Check suffix
`s.find(sub)`	Find index, -1 if not found
`s.format(*args, **kwargs)`	Format string
`s.index(sub)`	Find index, error if not found
`s.isalnum()`	Alphanumeric only?
`s.isalpha()`	Letters only?
`s.isdigit()`	Digits only?
`s.islower()`	Lowercase only?
`s.isspace()`	Whitespace only?
`s.istitle()`	Title case?
`s.isupper()`	Uppercase only?
`s.join(iterable)`	Join strings
`s.lower()`	To lowercase
`s.lstrip(chars?)`	Strip left
`s.upper()`	To uppercase
`s.strip(chars?)`	Strip both sides
`s.rstrip(chars?)`	Strip right
`s.replace(old, new)`	Replace substring
`s.split(sep?)`	Split string
`s.splitlines()`	Split on newlines
`s.startswith(prefix)`	Check prefix
`s.removeprefix(prefix)`	Remove prefix
`s.removesuffix(suffix)`	Remove suffix
`s.partition(sep)`	Split into 3 parts
`s.rpartition(sep)`	Split into 3 parts from right
`s.title()`	Title Case
`s.elems()`	Iterate bytes
`s.codepoints()`	Iterate unicode characters

Note: Strings are not directly iterable. Use s.codepoints() to iterate over Unicode characters, or s.elems() to iterate over bytes.
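
String methods chain naturally when cleaning up field names or parsing simple `key=value` text. The helpers below are hypothetical and only use methods from the table above that also exist in Python (3.9 or later for `removeprefix`):

```python
def normalize_header(raw):
    # "  X-Request-ID " -> "request_id"
    name = raw.strip().lower().removeprefix("x-")
    return "_".join(name.split("-"))

def parse_pair(text):
    # Split on the first "=" only; the value is empty if there is no "=".
    key, _, value = text.partition("=")
    return key.strip(), value.strip()

column = normalize_header("  X-Request-ID ")
pair = parse_pair("limit = 50")
bare = parse_pair("flag")
```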

List Methods

Method	Description
`l.append(x)`	Add element
`l.clear()`	Remove all elements
`l.extend(iterable)`	Add all elements from iterable
`l.index(x)`	Find index
`l.insert(i, x)`	Insert at position
`l.pop(i?)`	Remove and return element
`l.remove(x)`	Remove first occurrence

Supports slicing: l[1:3], l[::-1], l[::2].
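
Slicing is handy for splitting a large list into batches, for example before sending rows to an API with a request size limit. This is a sketch with a hypothetical helper name, `chunks`:

```python
def chunks(items, size):
    # Step through the list in strides of `size`; the last chunk may be shorter.
    return [items[i:i + size] for i in range(0, len(items), size)]

batches = chunks([1, 2, 3, 4, 5], 2)   # [[1, 2], [3, 4], [5]]
newest_first = [1, 2, 3][::-1]         # [3, 2, 1]
```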

Dict Methods

Method	Description
`d.clear()`	Remove all entries
`d.get(key, default?)`	Get with fallback
`d.items()`	List of `(key, value)` tuples
`d.keys()`	List of keys
`d.pop(key, default?)`	Remove and return value
`d.popitem()`	Remove and return first entry
`d.setdefault(key, default?)`	Get or set default
`d.update(pairs)`	Merge entries
`d.values()`	List of values
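
Two common uses of these methods are counting and grouping rows. The helpers `count_by` and `group_ids` below are hypothetical, and the sample rows are made up. `get` supplies a fallback for missing keys, and `setdefault` creates the list for a group the first time it is seen:

```python
def count_by(rows, key):
    counts = {}
    for row in rows:
        value = row.get(key, "unknown")
        counts[value] = counts.get(value, 0) + 1
    return counts

def group_ids(rows, key):
    groups = {}
    for row in rows:
        groups.setdefault(row.get(key, "unknown"), []).append(row["id"])
    return groups

rows = [
    {"id": 1, "country": "SE"},
    {"id": 2, "country": "NO"},
    {"id": 3, "country": "SE"},
    {"id": 4},
]
counts = count_by(rows, "country")   # {"SE": 2, "NO": 1, "unknown": 1}
groups = group_ids(rows, "country")  # {"SE": [1, 3], "NO": [2], "unknown": [4]}
```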

Set Methods

Method	Description
`s.add(x)`	Add element
`s.clear()`	Remove all elements
`s.discard(x)`	Remove if present
`s.remove(x)`	Remove, error if missing
`s.pop()`	Remove and return an element
`s.union(other)`	Union (`s | other`)
`s.intersection(other)`	Intersection (`s & other`)
`s.difference(other)`	Difference
`s.symmetric_difference(other)`	Symmetric difference
`s.issubset(other)`	Subset check
`s.issuperset(other)`	Superset check
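
Sets are useful for deduplicating IDs and working out which records are new. The sketch below uses a hypothetical helper, `new_ids`, and sorts the result so the output order is deterministic:

```python
def new_ids(incoming, seen):
    # IDs present in `incoming` but not in `seen`, without duplicates.
    return sorted(set(incoming).difference(set(seen)))

incoming = [3, 1, 2, 3, 5]
seen = [1, 2]

to_insert = new_ids(incoming, seen)                        # [3, 5]
overlap = sorted(set(incoming).intersection(set(seen)))    # [1, 2]
covers_seen = set(incoming).issuperset(set(seen))          # True
```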

You don’t need more than this. Combined with the built-in runtime modules, this is enough to build complete data pipelines.