
Language Reference

A simpler alternative to Python for pipelines

OndatraSQL scripts are written in Starlark, a small Python-like language that works well for data pipelines.

No packages. No virtual environments. No runtime setup.

If you know Python, you can write pipelines immediately.

Why Starlark?

Starlark is a restricted, deterministic dialect of Python.

  • No hidden state
  • No external dependencies
  • No runtime environment

This makes pipelines reproducible and easy to reason about.

Differences from Python

Starlark is intentionally simpler:

| Python | Starlark |
|---|---|
| import module | load("lib/module.star", "func") |
| try / except | Not available (errors stop the script) |
| class Foo: | Not available (use dicts and functions) |
| with open(f): | Not available (no file I/O) |
| for c in "hello": | for c in "hello".elems(): |
| 2 ** 10 | math.pow(2, 10) |
| yield / async | Not available |
| global / nonlocal | Not available |
| x is y | Use == |

This keeps pipelines predictable and portable.
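As an illustration of the "use dicts and functions" row, here is a minimal sketch of how a small class might be replaced. The names new_user, add_tag, and display_name are hypothetical, not part of OndatraSQL. The code is plain Starlark that is also valid Python.

```python
# A record is a plain dict; "methods" are ordinary functions that take it.
# All names here are hypothetical and only illustrate the pattern.
def new_user(user_id, name):
    return {"id": user_id, "name": name, "tags": []}

def add_tag(user, tag):
    # Only add the tag if it is not already present.
    if tag not in user["tags"]:
        user["tags"].append(tag)
    return user

def display_name(user):
    return "%s (#%d)" % (user["name"], user["id"])

user = add_tag(new_user(1, "Alice"), "admin")
add_tag(user, "admin")  # second call is a no-op
label = display_name(user)
```

Because there is no try / except, checks such as the `tag not in` test above are done up front rather than by catching an error afterwards.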

In OndatraSQL

Starlark is used for:

  • Data ingestion (HTTP APIs, files)
  • Custom logic between models
  • Reverse ETL workflows

It runs inside the OndatraSQL runtime, with access to:

  • http (API calls)
  • query() (read from DuckDB)
  • save (write output)
  • incremental (pipeline state)

Example: A Simple Pipeline Script

# @kind: append

resp = http.get("https://api.example.com/users")

for user in resp.json:
    save.row({
        "id": user["id"],
        "name": user["name"],
    })

This fetches users from an API, keeps the id and name fields, and writes one row per user into your pipeline, all in one script.

Basics

The core types and literals work as they do in Python.

# Numbers
count = 42
rate = 0.95

# Strings
name = "Alice"
query = 'SELECT * FROM users'

# Booleans
active = True
deleted = False

# None
result = None

# Lists (mutable)
items = [1, 2, 3]
items.append(4)

# Tuples (immutable)
pair = (1, 2)

# Dicts (mutable, ordered)
user = {"name": "Alice", "age": 30}

# Sets (mutable)
tags = set(["a", "b", "c"])

Control Flow

# If / elif / else
if resp.ok:
    process(resp.json)
elif resp.status_code == 429:
    sleep(1)
else:
    fail("Error: " + str(resp.status_code))

# For loops
for item in items:
    save.row(item)

for i, item in enumerate(items):
    print(i, item)

for i in range(10):
    print(i)

# While loops
page = 1
while True:
    resp = http.get(url + "?page=" + str(page))
    if len(resp.json) == 0:
        break
    for item in resp.json:
        save.row(item)
    page += 1
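
The pagination loop above runs until the API returns an empty page. A bounded variant, sketched below, caps the number of requests so that an API that never returns an empty page cannot keep the script running forever. fetch_page and FAKE_PAGES are hypothetical stand-ins for an http.get() call, and max_pages is an assumed parameter. The code is plain Starlark that is also valid Python.

```python
# Hypothetical stand-in data: page number -> rows returned by the API.
FAKE_PAGES = {
    1: [{"id": 1}, {"id": 2}],
    2: [{"id": 3}],
}

def fetch_page(page):
    # In a real script this would wrap http.get(); here it reads the stub.
    return FAKE_PAGES.get(page, [])

def fetch_all(max_pages=100):
    rows = []
    # range() bounds the loop, so it always terminates.
    for page in range(1, max_pages + 1):
        batch = fetch_page(page)
        if len(batch) == 0:
            break
        rows.extend(batch)
    return rows

rows = fetch_all()
```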

Functions and Lambdas

def fetch_page(url, page=1, limit=50):
    return http.get(
        url + "?page=" + str(page) + "&limit=" + str(limit),
    )

# Lambda expressions
double = lambda x: x * 2
sorted(users, key=lambda u: u["name"])
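
A small self-contained sketch of default arguments and a lambda sort key, using sample data in place of a live API. build_url and the users list are illustrative assumptions. The code is plain Starlark that is also valid Python.

```python
# Hypothetical helper: builds the request URL without making the request.
def build_url(base, page=1, limit=50):
    return base + "?page=" + str(page) + "&limit=" + str(limit)

users = [{"name": "Carol"}, {"name": "Alice"}, {"name": "Bob"}]

# sorted() returns a new list; the lambda picks the sort key.
by_name = sorted(users, key=lambda u: u["name"])
names = [u["name"] for u in by_name]

first_url = build_url("https://api.example.com/users")
second_url = build_url("https://api.example.com/users", page=2, limit=10)
```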

load() — Shared Libraries

Import functions and values from other Starlark files using load():

load("lib/pagination.star", "paginate", "DEFAULT_PAGE_SIZE")
load("lib/auth.star", "get_token")

token = get_token()
for page in paginate(url, page_size=DEFAULT_PAGE_SIZE):
    process(page)

Syntax

load("path/to/module.star", "name1", "name2")       # import by name
load("path/to/module.star", alias="original_name")   # import with alias

Paths are relative to the project root. Only files inside the project directory can be loaded — path traversal and symlinks pointing outside the project are rejected.

Caching

Each module is executed once per model run. Multiple load() calls — including from different nested modules — share the cached result.

Library Scope

Loaded modules have access to all built-in modules (http, oauth, json, etc.) except save and incremental. Functions that need save should accept it as a parameter:

# lib/fetcher.star
def fetch_users(save, url):
    resp = http.get(url)
    for user in resp.json:
        save.row(user)

This is the pattern used by YAML models, where save is passed automatically.
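
The same dependency-passing idea can be sketched with a plain callable, which makes a library function easy to try out without the runtime. emit_active and emit_row are hypothetical names; in a real script the caller would pass save and the function would call save.row, as in the example above. The code is plain Starlark that is also valid Python.

```python
# Hypothetical library function: the writer is passed in by the caller,
# so the function itself never refers to save directly.
def emit_active(emit_row, users):
    written = 0
    for user in users:
        if user["status"] == "active":
            emit_row(user)
            written += 1
    return written

# For a dry run, collect rows in a list instead of writing them.
collected = []
users = [
    {"name": "Alice", "status": "active"},
    {"name": "Bob", "status": "disabled"},
    {"name": "Carol", "status": "active"},
]
written = emit_active(collected.append, users)
```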

String Formatting

# %-operator
msg = "Fetched %d rows from %s" % (count, source)
price = "Total: $%.2f" % amount

# .format() method
msg = "Page {} of {}".format(page, total)
msg = "Page {page} of {total}".format(page=1, total=10)

Format specifiers: %s (string), %d (integer), %f (float), %x (hex), %o (octal).
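
A short sketch of both formatting styles with concrete values; the variable names are illustrative only. The code is plain Starlark that is also valid Python.

```python
count = 3
source = "users"

# %-operator: a tuple supplies several values, a single value needs none.
msg = "Fetched %d rows from %s" % (count, source)
hex_id = "%x" % 255
octal = "%o" % 8

# .format(): positional and named placeholders.
positional = "Page {} of {}".format(2, 10)
named = "Page {page} of {total}".format(page=1, total=10)
```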

Comprehensions

# List comprehension
ids = [item["id"] for item in resp.json]
active = [u for u in users if u["status"] == "active"]

# Dict comprehension
lookup = {row.id: row.name for row in query("SELECT id, name FROM t")}

# Nested
flat = [cell for row in matrix for cell in row]
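
The same three comprehension forms, sketched against in-memory sample data so the results are easy to follow. The users and matrix values are made up for illustration. The code is plain Starlark that is also valid Python.

```python
users = [
    {"id": 1, "name": "Alice", "status": "active"},
    {"id": 2, "name": "Bob", "status": "disabled"},
    {"id": 3, "name": "Carol", "status": "active"},
]

# Filter and project in one expression.
active_names = [u["name"] for u in users if u["status"] == "active"]

# Build an id -> name lookup table.
lookup = {u["id"]: u["name"] for u in users}

# Flatten a list of lists; the outer loop comes first.
matrix = [[1, 2], [3, 4]]
flat = [cell for row in matrix for cell in row]
```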

Built-in Functions

| Function | Description |
|---|---|
| abs(x) | Absolute value |
| all(x) | True if all elements are truthy |
| any(x) | True if any element is truthy |
| bool(x) | Convert to boolean |
| chr(i) | Unicode code point to string |
| dict(pairs) | Create dict |
| dir(x) | List attributes/methods |
| enumerate(x) | Yields (index, value) pairs |
| float(x) | Convert to float |
| getattr(x, name) | Get attribute by name |
| hasattr(x, name) | Check if attribute exists |
| hash(x) | Hash a string |
| int(x) | Convert to int |
| len(x) | Number of elements |
| list(x) | Create list from iterable |
| max(x) | Largest value |
| min(x) | Smallest value |
| ord(s) | String to Unicode code point |
| print(*args) | Print to stderr |
| range(n) | Sequence of integers |
| repr(x) | Debug representation |
| reversed(x) | Reversed sequence |
| set(x) | Create set |
| sorted(x, key?, reverse?) | Sorted list |
| str(x) | Convert to string |
| tuple(x) | Create tuple |
| type(x) | Type name as string |
| zip(a, b) | Zip two sequences |
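
A few of these built-ins combined in one small sketch; the names and scores are sample data. The code is plain Starlark that is also valid Python. Starlark has no generator expressions, so all() is given a list comprehension.

```python
names = ["a", "b", "c"]
scores = [72, 95, 88]

# Pair each name with its score, then rank by score, highest first.
pairs = list(zip(names, scores))
ranked = sorted(pairs, key=lambda p: p[1], reverse=True)
top = ranked[0][0]

# Aggregate checks over the same data.
all_pass = all([s >= 70 for s in scores])
spread = max(scores) - min(scores)

# enumerate() supplies the index alongside each value.
numbered = ["%d:%s" % (i, n) for i, n in enumerate(names)]
```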

String Methods

| Method | Description |
|---|---|
| s.capitalize() | First letter uppercase |
| s.count(sub) | Count occurrences |
| s.endswith(suffix) | Check suffix |
| s.find(sub) | Find index, -1 if not found |
| s.format(*args, **kwargs) | Format string |
| s.index(sub) | Find index, error if not found |
| s.isalnum() | Alphanumeric only? |
| s.isalpha() | Letters only? |
| s.isdigit() | Digits only? |
| s.islower() | Lowercase only? |
| s.isspace() | Whitespace only? |
| s.istitle() | Title case? |
| s.isupper() | Uppercase only? |
| s.join(iterable) | Join strings |
| s.lower() | To lowercase |
| s.lstrip(chars?) | Strip left |
| s.upper() | To uppercase |
| s.strip(chars?) | Strip both sides |
| s.rstrip(chars?) | Strip right |
| s.replace(old, new) | Replace substring |
| s.split(sep?) | Split string |
| s.splitlines() | Split on newlines |
| s.startswith(prefix) | Check prefix |
| s.removeprefix(prefix) | Remove prefix |
| s.removesuffix(suffix) | Remove suffix |
| s.partition(sep) | Split into 3 parts |
| s.rpartition(sep) | Split into 3 parts from right |
| s.title() | Title Case |
| s.elems() | Iterate bytes |
| s.codepoints() | Iterate unicode characters |

Note: Strings are not directly iterable. Use s.elems() or s.codepoints() to iterate over characters.
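
A short sketch of cleaning up a messy text field with a handful of these methods. The input string and variable names are made up for illustration. The code is plain Starlark that is also valid Python.

```python
raw = "  Alice Smith ; admin,editor  "

# partition() splits once, returning (before, separator, after).
name_part, sep, roles_part = raw.strip().partition(";")
name = name_part.strip()

# split() then strip() each piece to get a clean list of roles.
roles = [r.strip() for r in roles_part.split(",")]

# lower() + split() + join() builds a URL-friendly slug.
slug = "-".join(name.lower().split(" "))
```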

List Methods

| Method | Description |
|---|---|
| l.append(x) | Add element |
| l.clear() | Remove all elements |
| l.extend(iterable) | Add all elements from iterable |
| l.index(x) | Find index |
| l.insert(i, x) | Insert at position |
| l.pop(i?) | Remove and return element |
| l.remove(x) | Remove first occurrence |

Supports slicing: l[1:3], l[::-1], l[::2].
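
A step-by-step sketch of the list methods and slices above, with the list contents noted after each change. The code is plain Starlark that is also valid Python.

```python
queue = [1, 2, 3]
queue.append(4)      # [1, 2, 3, 4]
queue.insert(0, 0)   # [0, 1, 2, 3, 4]
last = queue.pop()   # removes and returns 4
queue.remove(2)      # [0, 1, 3]

# Slices return new lists and leave the original unchanged.
window = queue[1:3]
backwards = queue[::-1]
every_other = queue[::2]
```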

Dict Methods

| Method | Description |
|---|---|
| d.clear() | Remove all entries |
| d.get(key, default?) | Get with fallback |
| d.items() | List of (key, value) tuples |
| d.keys() | List of keys |
| d.pop(key, default?) | Remove and return value |
| d.popitem() | Remove and return first entry |
| d.setdefault(key, default?) | Get or set default |
| d.update(pairs) | Merge entries |
| d.values() | List of values |
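
A common use of get() and setdefault() is counting and grouping rows. The sketch below assumes rows are dicts with an optional "status" key; count_by_status and group_by_status are hypothetical helper names. The code is plain Starlark that is also valid Python.

```python
def count_by_status(users):
    counts = {}
    for user in users:
        # get() supplies a fallback when the key is missing.
        status = user.get("status", "unknown")
        counts[status] = counts.get(status, 0) + 1
    return counts

def group_by_status(users):
    groups = {}
    for user in users:
        status = user.get("status", "unknown")
        # setdefault() creates the list the first time a status is seen.
        groups.setdefault(status, []).append(user["name"])
    return groups

users = [
    {"name": "Alice", "status": "active"},
    {"name": "Bob"},
    {"name": "Carol", "status": "active"},
]
counts = count_by_status(users)
groups = group_by_status(users)
```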

Set Methods

| Method | Description |
|---|---|
| s.add(x) | Add element |
| s.clear() | Remove all elements |
| s.discard(x) | Remove if present |
| s.remove(x) | Remove, error if missing |
| s.pop() | Remove and return an element |
| s.union(other) | Union (s \| other) |
| s.intersection(other) | Intersection (s & other) |
| s.difference(other) | Difference |
| s.symmetric_difference(other) | Symmetric difference |
| s.issubset(other) | Subset check |
| s.issuperset(other) | Superset check |
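
Sets are handy for comparing batches of IDs, for example to find which incoming records are new. The sketch below uses made-up ID values, and wraps results in sorted() so the output order is predictable. The code is plain Starlark that is also valid Python.

```python
existing = set([1, 2, 3])
incoming = set([3, 4, 5])

# IDs present in the new batch but not stored yet.
new_ids = sorted(incoming.difference(existing))

# IDs that appear in both batches.
shared = sorted(existing.intersection(incoming))

# Every ID seen in either batch.
all_ids = sorted(existing.union(incoming))

is_subset = set([1, 2]).issubset(existing)
```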

You don’t need more than this. Combined with the built-in runtime modules, this is enough to build complete data pipelines.