DuckDB Configuration
How the runtime boots — DuckDB configuration via SQL files
OndatraSQL prepares DuckDB before running your pipeline. Everything in config/ is executed as SQL — no YAML, no config DSL.
This is where you configure DuckDB, define credentials, attach your catalog, and connect external sources.
Mental Model
The runtime loads configuration in two phases:
Before the catalog is attached → configure DuckDB itself
After the catalog is attached → define logic that depends on your data
Loading Order
Pre-catalog:
| File | What it does |
|---|---|
settings.sql | Configure DuckDB (memory, threads, etc.) |
secrets.sql | Credentials — not stored in data, only used at runtime |
extensions.sql | Load DuckDB extensions |
Catalog:
| File | What it does |
|---|---|
catalog.sql | Attach DuckLake — without this, there’s no database to write to |
Post-catalog:
| File | What it does |
|---|---|
macros.sql | Reusable SQL logic |
variables.sql | Runtime variables |
sources.sql | External database connections |
All files except catalog.sql are optional. Missing files are silently skipped.
models/staging/orders.sql creates staging schema on first run.Common Patterns
Connect to PostgreSQL:
-- config/sources.sql
ATTACH 'postgresql://${PG_USER}:${PG_PASSWORD}@${PG_HOST}/db' AS source (READ_ONLY);
Add a reusable macro:
-- config/macros.sql
CREATE MACRO is_active(status) AS status = 'active';
Use cloud storage:
-- config/secrets.sql
CREATE SECRET aws_chain (TYPE S3, PROVIDER credential_chain);
Set DuckDB memory limit:
-- config/settings.sql
SET memory_limit = '4GB';
SET threads = 4;
Built-in Runtime
OndatraSQL automatically provides variables and macros after config loads. No setup required. These power change detection, lineage, and incremental execution.
Variables
| Variable | Description |
|---|---|
ondatra_run_time | When the current run started (UTC) |
ondatra_load_id | Unique identifier for this run |
curr_snapshot | Current DuckLake snapshot ID |
prev_snapshot | Previous snapshot ID (for CDC) |
dag_start_snapshot | Snapshot at DAG start (consistent CDC across models) |
Macros
Runtime
ondatra_now() · ondatra_load_id() · last_snapshot() · prev_snapshot() · dag_start_snapshot()
Change Data Capture
ondatra_cdc_changes() · ondatra_cdc_deletes() · ondatra_cdc_updates_before() · ondatra_cdc_summary() · ondatra_cdc_empty()
Schema & Metadata
ondatra_table_exists() · ondatra_column_exists() · ondatra_column_type() · ondatra_get_columns() · ondatra_schema_hash()
Data Quality
ondatra_row_count() · ondatra_has_rows() · ondatra_min_value() · ondatra_max_value() · ondatra_distinct_count() · ondatra_null_count() · ondatra_duplicate_count() · ondatra_table_diff() · ondatra_compare_tables()
Metadata
ondatra_get_sql_hash() · ondatra_get_commit_info() · ondatra_get_downstream()
Editing
ondatrasql edit settings
ondatrasql edit secrets
ondatrasql edit catalog
ondatrasql edit macros
ondatrasql edit variables
ondatrasql edit sources
Sandbox Mode
In sandbox mode, catalog.sql is skipped. Instead:
- Production catalog → attached read-only
- Sandbox catalog → attached writable
Models read from production, write to sandbox. All other config files load normally.
settings.sql
Execution settings — how the runtime uses DuckDB
secrets.sql
Access external data without adding infrastructure
extensions.sql
Optional capabilities for the runtime
catalog.sql
Where your data lives — the only required configuration
macros.sql
Reuse SQL without templating
variables.sql
Configure your pipeline with SQL
sources.sql
No connectors. Just queries.
Ondatra Labs