Blueprints Blog Contact About

DuckDB Configuration

How the runtime boots — DuckDB configuration via SQL files

OndatraSQL prepares DuckDB before running your pipeline. Everything in config/ is executed as SQL — no YAML, no config DSL.

This is where you configure DuckDB, define credentials, attach your catalog, and connect external sources.

Mental Model

The runtime loads configuration in two phases:

Before the catalog is attached → configure DuckDB itself

After the catalog is attached → define logic that depends on your data

Loading Order

Pre-catalog:

FileWhat it does
settings.sqlConfigure DuckDB (memory, threads, etc.)
secrets.sqlCredentials — not stored in data, only used at runtime
extensions.sqlLoad DuckDB extensions

Catalog:

FileWhat it does
catalog.sqlAttach DuckLake — without this, there’s no database to write to

Post-catalog:

FileWhat it does
macros.sqlReusable SQL logic
variables.sqlRuntime variables
sources.sqlExternal database connections

All files except catalog.sql are optional. Missing files are silently skipped.

Schemas are created automatically. Organize models into directories — models/staging/orders.sql creates staging schema on first run.

Common Patterns

Connect to PostgreSQL:

-- config/sources.sql
ATTACH 'postgresql://${PG_USER}:${PG_PASSWORD}@${PG_HOST}/db' AS source (READ_ONLY);

Add a reusable macro:

-- config/macros.sql
CREATE MACRO is_active(status) AS status = 'active';

Use cloud storage:

-- config/secrets.sql
CREATE SECRET aws_chain (TYPE S3, PROVIDER credential_chain);

Set DuckDB memory limit:

-- config/settings.sql
SET memory_limit = '4GB';
SET threads = 4;

Built-in Runtime

OndatraSQL automatically provides variables and macros after config loads. No setup required. These power change detection, lineage, and incremental execution.

Variables

VariableDescription
ondatra_run_timeWhen the current run started (UTC)
ondatra_load_idUnique identifier for this run
curr_snapshotCurrent DuckLake snapshot ID
prev_snapshotPrevious snapshot ID (for CDC)
dag_start_snapshotSnapshot at DAG start (consistent CDC across models)

Macros

Runtime

ondatra_now() · ondatra_load_id() · last_snapshot() · prev_snapshot() · dag_start_snapshot()

Change Data Capture

ondatra_cdc_changes() · ondatra_cdc_deletes() · ondatra_cdc_updates_before() · ondatra_cdc_summary() · ondatra_cdc_empty()

Schema & Metadata

ondatra_table_exists() · ondatra_column_exists() · ondatra_column_type() · ondatra_get_columns() · ondatra_schema_hash()

Data Quality

ondatra_row_count() · ondatra_has_rows() · ondatra_min_value() · ondatra_max_value() · ondatra_distinct_count() · ondatra_null_count() · ondatra_duplicate_count() · ondatra_table_diff() · ondatra_compare_tables()

Metadata

ondatra_get_sql_hash() · ondatra_get_commit_info() · ondatra_get_downstream()

Editing

ondatrasql edit settings
ondatrasql edit secrets
ondatrasql edit catalog
ondatrasql edit macros
ondatrasql edit variables
ondatrasql edit sources

Sandbox Mode

In sandbox mode, catalog.sql is skipped. Instead:

  • Production catalog → attached read-only
  • Sandbox catalog → attached writable

Models read from production, write to sandbox. All other config files load normally.