Language Reference
A simpler alternative to Python for pipelines
OndatraSQL scripts use Starlark, a Python-like language originally designed for build configuration and used here for data pipelines.
No packages. No virtual environments. No runtime setup.
If you know Python, you can write pipelines immediately.
Why Starlark?
Starlark is a restricted, deterministic dialect of Python.
- No hidden state
- No external dependencies
- No runtime environment
This makes pipelines reproducible and easy to reason about.
Differences from Python
Starlark is intentionally simpler:
| Python | Starlark |
|---|---|
| import module | load("lib/module.star", "func") |
| try / except | Not available; errors stop the script |
| class Foo: | Not available; use dicts and functions |
| with open(f): | Not available; no file I/O |
| for c in "hello": | for c in "hello".elems(): |
| 2 ** 10 | math.pow(2, 10) |
| yield / async | Not available |
| global / nonlocal | Not available |
| x is y | Use == |
This keeps pipelines predictable and portable.
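As a rough sketch of the "use dicts and functions" row, a record can be a plain dict with ordinary functions operating on it. The names new_account and deposit are hypothetical, and the snippet sticks to core syntax, so it should also run unchanged under Python.

```python
# Hypothetical example: a dict plus functions stands in for a small class.
def new_account(owner, balance=0):
    # The dict plays the role of the instance
    return {"owner": owner, "balance": balance}

def deposit(account, amount):
    # Dicts are mutable, so the balance is updated in place
    account["balance"] += amount
    return account

acct = deposit(new_account("alice"), 25)
```

Keeping state in plain dicts also means the values can be handed to save.row or printed without any conversion step.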
In OndatraSQL
Starlark is used for:
- Data ingestion (HTTP APIs, files)
- Custom logic between models
- Reverse ETL workflows
It runs inside the OndatraSQL runtime, with access to:
- http (API calls)
- query() (read from DuckDB)
- save (write output)
- incremental (pipeline state)
Example: A Simple Pipeline Script
# @kind: append
resp = http.get("https://api.example.com/users")
for user in resp.json:
    save.row({
        "id": user["id"],
        "name": user["name"],
    })
This fetches data from an API, transforms it, and writes it into your pipeline. All in one script.
Basics
The core types and literals work as they do in Python.
# Numbers
count = 42
rate = 0.95
# Strings
name = "Alice"
query = 'SELECT * FROM users'
# Booleans
active = True
deleted = False
# None
result = None
# Lists (mutable)
items = [1, 2, 3]
items.append(4)
# Tuples (immutable)
pair = (1, 2)
# Dicts (mutable, ordered)
user = {"name": "Alice", "age": 30}
# Sets (mutable)
tags = set(["a", "b", "c"])
Control Flow
# If / elif / else
if resp.ok:
    process(resp.json)
elif resp.status_code == 429:
    sleep(1)
else:
    fail("Error: " + str(resp.status_code))
# For loops
for item in items:
    save.row(item)
for i, item in enumerate(items):
    print(i, item)
for i in range(10):
    print(i)
# While loops
page = 1
while True:
    resp = http.get(url + "?page=" + str(page))
    if len(resp.json) == 0:
        break
    for item in resp.json:
        save.row(item)
    page += 1
Functions and Lambdas
def fetch_page(url, page=1, limit=50):
    return http.get(
        url + "?page=" + str(page) + "&limit=" + str(limit),
    )
# Lambda expressions
double = lambda x: x * 2
sorted(users, key=lambda u: u["name"])
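To illustrate default arguments and key functions without touching the network, here is a small sketch. page_url is a hypothetical helper that only builds the URL string, and the sample users are made up.

```python
# Hypothetical helper: builds the URL that a paged request would use.
def page_url(url, page=1, limit=50):
    return url + "?page=" + str(page) + "&limit=" + str(limit)

# Defaults apply when keyword arguments are omitted
first = page_url("https://api.example.com/users")
third = page_url("https://api.example.com/users", page=3, limit=10)

# A lambda as sort key: case-insensitive ordering by name
users = [{"name": "Carol"}, {"name": "alice"}, {"name": "Bob"}]
by_name = sorted(users, key=lambda u: u["name"].lower())
names = [u["name"] for u in by_name]
```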
load() — Shared Libraries
Import functions and values from other Starlark files using load():
load("lib/pagination.star", "paginate", "DEFAULT_PAGE_SIZE")
load("lib/auth.star", "get_token")
token = get_token()
for page in paginate(url, page_size=DEFAULT_PAGE_SIZE):
    process(page)
Syntax
load("path/to/module.star", "name1", "name2") # import by name
load("path/to/module.star", alias="original_name") # import with alias
Paths are relative to the project root. Only files inside the project directory can be loaded — path traversal and symlinks pointing outside the project are rejected.
Caching
Each module is executed once per model run. Multiple load() calls — including from different nested modules — share the cached result.
Library Scope
Loaded modules have access to all built-in modules (http, oauth, json, etc.) except save and incremental. Functions that need save should accept it as a parameter:
# lib/fetcher.star
def fetch_users(save, url):
    resp = http.get(url)
    for user in resp.json:
        save.row(user)
This is the pattern used by YAML models, where save is passed automatically.
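An alternative worth considering, offered as a sketch and not taken from the OndatraSQL documentation: keep library functions independent of save by having them return rows, and let the model script pass each row to save.row. The file name, to_rows, and the field names are all hypothetical.

```python
# lib/transform.star (hypothetical file and function names)
def to_rows(users):
    # Pure transformation: needs neither save nor http
    return [{"id": u["id"], "name": u["name"].strip()} for u in users]

rows = to_rows([{"id": 1, "name": " Alice "}, {"id": 2, "name": "Bob"}])
# In a model script, each row would then go to save.row(row)
```

A function like this can be exercised with literal input, which makes it easier to check than one that writes output directly.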
String Formatting
# %-operator
msg = "Fetched %d rows from %s" % (count, source)
price = "Total: $%.2f" % amount
# .format() method
msg = "Page {} of {}".format(page, total)
msg = "Page {page} of {total}".format(page=1, total=10)
Format specifiers: %s (string), %d (integer), %f (float), %x (hex), %o (octal).
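A short sketch of the specifiers listed above, with the resulting strings noted in comments. The variable names and values are illustrative only.

```python
count = 1250
source = "users"

msg = "Fetched %d rows from %s" % (count, source)  # "Fetched 1250 rows from users"
hex_id = "%x" % 255                                # "ff"
octal = "%o" % 8                                   # "10"

# .format() with positional and named fields
paged = "Page {} of {}".format(2, 10)              # "Page 2 of 10"
named = "Page {page} of {total}".format(page=1, total=10)
```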
Comprehensions
# List comprehension
ids = [item["id"] for item in resp.json]
active = [u for u in users if u["status"] == "active"]
# Dict comprehension
lookup = {row.id: row.name for row in duckdb.sql("SELECT id, name FROM t")}
# Nested
flat = [cell for row in matrix for cell in row]
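The same three forms can be tried on literal data, without an API response or a database. The users and matrix values below are made up for illustration.

```python
users = [
    {"id": 1, "name": "Alice", "status": "active"},
    {"id": 2, "name": "Bob", "status": "inactive"},
    {"id": 3, "name": "Carol", "status": "active"},
]

# Filtered list comprehension
active_ids = [u["id"] for u in users if u["status"] == "active"]  # [1, 3]

# Dict comprehension: id -> name
lookup = {u["id"]: u["name"] for u in users}

# Nested comprehension: the outer loop comes first
matrix = [[1, 2], [3, 4], [5]]
flat = [cell for row in matrix for cell in row]  # [1, 2, 3, 4, 5]
```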
Built-in Functions
| Function | Description |
|---|---|
| abs(x) | Absolute value |
| all(x) | True if all elements are truthy |
| any(x) | True if any element is truthy |
| bool(x) | Convert to boolean |
| chr(i) | Unicode code point to string |
| dict(pairs) | Create dict |
| dir(x) | List attributes/methods |
| enumerate(x) | Yields (index, value) pairs |
| float(x) | Convert to float |
| getattr(x, name) | Get attribute by name |
| hasattr(x, name) | Check if attribute exists |
| hash(x) | Hash a string |
| int(x) | Convert to int |
| len(x) | Number of elements |
| list(x) | Create list from iterable |
| max(x) | Largest value |
| min(x) | Smallest value |
| ord(s) | String to Unicode code point |
| print(*args) | Print to stderr |
| range(n) | Sequence of integers |
| repr(x) | Debug representation |
| reversed(x) | Reversed sequence |
| set(x) | Create set |
| sorted(x, key?, reverse?) | Sorted list |
| str(x) | Convert to string |
| tuple(x) | Create tuple |
| type(x) | Type name as string |
| zip(a, b) | Zip two sequences |
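A few of these functions combined in one sketch, using invented names and scores. Comprehensions are passed to all() and any() as lists so the snippet stays within core syntax.

```python
names = ["alice", "bob", "carol"]
scores = [72, 95, 88]

# zip pairs the two lists; dict turns the pairs into a lookup
by_name = dict(zip(names, scores))

# sorted with key and reverse: highest score first
ranked = sorted(names, key=lambda n: by_name[n], reverse=True)

# enumerate yields (index, value) pairs
numbered = ["%d:%s" % (i, n) for i, n in enumerate(ranked)]

all_passed = all([s >= 70 for s in scores])
any_perfect = any([s == 100 for s in scores])
top = max(scores)
```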
String Methods
| Method | Description |
|---|---|
| s.capitalize() | First letter uppercase |
| s.count(sub) | Count occurrences |
| s.endswith(suffix) | Check suffix |
| s.find(sub) | Find index, -1 if not found |
| s.format(*args, **kwargs) | Format string |
| s.index(sub) | Find index, error if not found |
| s.isalnum() | Alphanumeric only? |
| s.isalpha() | Letters only? |
| s.isdigit() | Digits only? |
| s.islower() | Lowercase only? |
| s.isspace() | Whitespace only? |
| s.istitle() | Title case? |
| s.isupper() | Uppercase only? |
| s.join(iterable) | Join strings |
| s.lower() | To lowercase |
| s.lstrip(chars?) | Strip left |
| s.upper() | To uppercase |
| s.strip(chars?) | Strip both sides |
| s.rstrip(chars?) | Strip right |
| s.replace(old, new) | Replace substring |
| s.split(sep?) | Split string |
| s.splitlines() | Split on newlines |
| s.startswith(prefix) | Check prefix |
| s.removeprefix(prefix) | Remove prefix |
| s.removesuffix(suffix) | Remove suffix |
| s.partition(sep) | Split into 3 parts |
| s.rpartition(sep) | Split into 3 parts from right |
| s.title() | Title Case |
| s.elems() | Iterate bytes |
| s.codepoints() | Iterate unicode characters |
Note: Strings are not directly iterable. Use s.elems() or s.codepoints() to iterate over characters.
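A small cleanup sketch built from the methods in the table. The input strings are invented, and the snippet avoids elems() and codepoints() so that it also runs under Python.

```python
line = "  Name: Alice Smith  "

# strip the padding, then split once on the separator
key, sep, value = line.strip().partition(": ")

# chain lower() and replace() to build a slug
slug = value.lower().replace(" ", "-")

# split keeps empty fields; filter them out before joining
parts = "a,b,,c".split(",")
joined = "|".join([p for p in parts if p])

is_csv = "report.csv".endswith(".csv")
```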
List Methods
| Method | Description |
|---|---|
| l.append(x) | Add element |
| l.clear() | Remove all elements |
| l.extend(iterable) | Add all elements from iterable |
| l.index(x) | Find index |
| l.insert(i, x) | Insert at position |
| l.pop(i?) | Remove and return element |
| l.remove(x) | Remove first occurrence |
Supports slicing: l[1:3], l[::-1], l[::2].
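The following sketch walks one list through several of these methods and the three slice forms, with the list contents after each step shown in comments. The values are arbitrary.

```python
queue = [3, 1, 2]
queue.append(4)            # [3, 1, 2, 4]
queue.extend([5, 6])       # [3, 1, 2, 4, 5, 6]
queue.insert(0, 0)         # [0, 3, 1, 2, 4, 5, 6]
last = queue.pop()         # removes and returns 6
queue.remove(3)            # [0, 1, 2, 4, 5]

middle = queue[1:3]        # [1, 2]
backwards = queue[::-1]    # [5, 4, 2, 1, 0]
every_other = queue[::2]   # [0, 2, 5]
```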
Dict Methods
| Method | Description |
|---|---|
| d.clear() | Remove all entries |
| d.get(key, default?) | Get with fallback |
| d.items() | List of (key, value) tuples |
| d.keys() | List of keys |
| d.pop(key, default?) | Remove and return value |
| d.popitem() | Remove and return first entry |
| d.setdefault(key, default?) | Get or set default |
| d.update(pairs) | Merge entries |
| d.values() | List of values |
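A common use of d.get with a fallback is counting. The sketch below tallies rows by a status field; count_by_status and the sample rows are hypothetical.

```python
# Hypothetical helper: tally rows by their "status" field.
def count_by_status(rows):
    counts = {}
    for row in rows:
        # Rows without a status are grouped under "unknown"
        status = row.get("status", "unknown")
        counts[status] = counts.get(status, 0) + 1
    return counts

counts = count_by_status([
    {"status": "active"},
    {"status": "active"},
    {},
    {"status": "banned"},
])
counts.update({"archived": 0})      # merge in another entry
pairs = sorted(counts.items())      # (key, value) tuples, ordered by key
removed = counts.pop("banned", 0)   # returns the removed value
```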
Set Methods
| Method | Description |
|---|---|
| s.add(x) | Add element |
| s.clear() | Remove all elements |
| s.discard(x) | Remove if present |
| s.remove(x) | Remove, error if missing |
| s.pop() | Remove and return an element |
| s.union(other) | Union (s \| other) |
| s.intersection(other) | Intersection (s & other) |
| s.difference(other) | Difference |
| s.symmetric_difference(other) | Symmetric difference |
| s.issubset(other) | Subset check |
| s.issuperset(other) | Superset check |
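Set operations are handy for comparing two batches of keys, for example to find IDs not seen before. This is a minimal sketch with invented values; results are passed through sorted() so the order is predictable.

```python
seen = set(["a", "b", "c"])
incoming = set(["b", "c", "d"])

new_ids = sorted(incoming.difference(seen))            # only in incoming
common = sorted(seen.intersection(incoming))           # in both
everything = sorted(seen.union(incoming))              # in either
changed = sorted(seen.symmetric_difference(incoming))  # in exactly one
covered = set(["a"]).issubset(seen)
```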
You don’t need more than this. Combined with the built-in runtime modules, this is enough to build complete data pipelines.
Ondatra Labs