catalog.sql

Where your data lives — the only required configuration

Phase: Catalog | Order: 4 | Required: Yes

The catalog is where your data lives.

OndatraSQL writes tables to DuckLake — a versioned storage layer built on Parquet. This file tells the runtime where to store metadata and data files.

This is the only required configuration.

You don’t provision a warehouse. You point the runtime at a location.

OndatraSQL handles:

  • Table creation
  • Schema evolution
  • Snapshot management
  • Change detection

You only define where the data should live.

Mental Model

  • The catalog stores metadata (tables, schemas, snapshots)
  • The data path stores Parquet files
  • Together, they form your warehouse

If you’re unsure, use SQLite.
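
Once attached, you can inspect the catalog side of this mental model directly. A minimal sketch, assuming the DuckLake extension's ducklake_snapshots table function (verify the exact name and columns against your DuckLake version):

-- List the snapshots recorded in the attached catalog
-- (ducklake_snapshots ships with the DuckLake extension)
SELECT snapshot_id, snapshot_time
FROM ducklake_snapshots('lake')
ORDER BY snapshot_id DESC;

Each row is a committed version of the warehouse; the Parquet files under DATA_PATH hold the data those snapshots reference.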

SQLite (default)

Zero configuration. Embedded. Works out of the box.

This is the default and the right choice for most use cases — including production on a single machine.

-- Local storage (default — created by ondatrasql init)
ATTACH 'ducklake:sqlite:ducklake.sqlite' AS lake
    (DATA_PATH 'ducklake.sqlite.files');
-- Cloud storage (SQLite catalog + S3 data)
-- Requires: httpfs in extensions.sql
ATTACH 'ducklake:sqlite:ducklake.sqlite' AS lake
    (DATA_PATH 's3://my-bucket/data/');
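
For the cloud variant, the runtime needs S3 credentials before the ATTACH runs. A hedged sketch using DuckDB's CREATE SECRET syntax (the secret name and all values are placeholders); in OndatraSQL this would presumably live in secrets.sql:

-- S3 credentials for the httpfs extension (placeholder values)
CREATE SECRET s3_data (
    TYPE S3,
    KEY_ID 'your-access-key-id',
    SECRET 'your-secret-access-key',
    REGION 'us-east-1'
);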

PostgreSQL

Use this when multiple users or systems need to access the same catalog.

The data remains in DuckLake — PostgreSQL only stores metadata. This allows external tools (Metabase, Grafana, psql, pg_duckdb) to query your data without copying it.

-- Multi-user (PostgreSQL catalog + local data)
-- Requires: postgres in extensions.sql, credentials in secrets.sql
ATTACH 'ducklake:postgres:dbname=ducklake_catalog host=localhost' AS lake
    (DATA_PATH '/data/warehouse/');
-- Multi-user (PostgreSQL catalog + S3 data)
ATTACH 'ducklake:postgres:dbname=ducklake_catalog host=db.prod.internal' AS lake
    (DATA_PATH 's3://my-bucket/data/');
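
Because the metadata lives in ordinary PostgreSQL tables, any client can inspect it without going through OndatraSQL. A sketch assuming the table and column names from the DuckLake specification (e.g. ducklake_snapshot; verify against your DuckLake version), run from psql against ducklake_catalog:

-- From psql: list recent snapshots straight from the catalog tables
SELECT snapshot_id, snapshot_time
FROM ducklake_snapshot
ORDER BY snapshot_id DESC
LIMIT 5;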

Which to choose?

                 SQLite                             PostgreSQL
Setup            Zero config                        Requires a PostgreSQL server
Read access      OndatraSQL only                    Any PostgreSQL client (pg_duckdb, Metabase, psql)
Multi-machine    No (local file)                    Yes
Performance      Fast (embedded)                    Fast (network round-trips for metadata)
Cloud data       Yes (S3/GCS via DATA_PATH)         Yes
Best for         Development, single-machine prod   Shared warehouse, BI tools, team access

DuckLake also supports DuckDB and MySQL as catalog backends, but OndatraSQL only supports SQLite and PostgreSQL. Features like sandbox mode, CDC, and run-type detection are tested and maintained for these two backends only.

Advanced Options

Data Inlining

DuckLake automatically inlines small writes — inserts or deletes affecting fewer than 10 rows are stored directly in the catalog metadata instead of creating new Parquet files. This is enabled by default and requires no configuration.

This is particularly useful for incremental append models that add a few rows per run. Instead of creating a new Parquet file for each run, the data is stored in the catalog until it accumulates enough to flush.

Override the threshold via the DATA_INLINING_ROW_LIMIT option on ATTACH:

-- Increase threshold to 50 rows
ATTACH 'ducklake:sqlite:ducklake.sqlite' AS lake
    (DATA_PATH 'ducklake.sqlite.files', DATA_INLINING_ROW_LIMIT 50);
-- Disable inlining entirely
ATTACH 'ducklake:sqlite:ducklake.sqlite' AS lake
    (DATA_PATH 'ducklake.sqlite.files', DATA_INLINING_ROW_LIMIT 0);

You can also set it per-table after attachment:

CALL lake.set_option('data_inlining_row_limit', 50, table_name => 'staging.events');

Inlined data behaves identically to Parquet data — time travel, queries, and schema evolution all work the same. To manually flush inlined data to Parquet files, use:

CALL ducklake_flush_inlined_data('lake');

The default threshold of 10 rows is a good balance for most workloads. Only change it if you have many models that consistently produce small batches (increase it) or if you want all data in Parquet immediately (set it to 0).

Encryption

DuckLake can automatically encrypt all Parquet data files using the ENCRYPTED option. Each file gets a unique encryption key, stored in the catalog. Decryption is transparent — queries work identically.

This is useful when storing data on untrusted storage (S3, shared filesystems) where the catalog database is the trusted component.

-- Encrypted local storage
ATTACH 'ducklake:sqlite:ducklake.sqlite' AS lake
    (DATA_PATH 'ducklake.sqlite.files', ENCRYPTED);
-- Encrypted cloud storage
ATTACH 'ducklake:sqlite:ducklake.sqlite' AS lake
    (DATA_PATH 's3://my-bucket/data/', ENCRYPTED);
-- Encrypted multi-user (PostgreSQL catalog controls key access)
ATTACH 'ducklake:postgres:dbname=ducklake_catalog host=localhost' AS lake
    (DATA_PATH 's3://my-bucket/data/', ENCRYPTED);

Encryption keys are stored in the catalog database. Anyone with catalog access can decrypt the data. Use this to protect data at rest on untrusted storage — not as a substitute for catalog access control.