Getting Started

On this page

This guide walks through creating and running a data pipeline with OndatraSQL.

1. Install

curl -fsSL https://ondatra.sh/install.sh | sh

OndatraSQL supports Linux, macOS, and Windows via WSL2. The installation produces a single binary with no external dependencies.

2. Create a project

mkdir my-pipeline && cd my-pipeline
ondatrasql init

Initialized project "my-pipeline"

3. Ingest data

Create a blueprint in lib/ and a SQL model:

mkdir -p lib

# lib/rest_countries.star

API = {
    "base_url": "https://restcountries.com",
    "fetch": {
        "args": ["region"],
    },
}

def fetch(region, page):
    resp = http.get("/v3.1/region/" + region)
    rows = [{"name": c["name"]["common"], "capital": c["capital"][0], "population": c["population"]} for c in resp.json]
    return {"rows": rows, "next": None}

ondatrasql new raw.countries
ondatrasql edit raw.countries

-- @kind: table

SELECT * FROM rest_countries('europe')

The file path determines the target table: raw.countries.

4. Transform with SQL

ondatrasql new staging.countries
ondatrasql edit staging.countries

-- @kind: table
-- @constraint: not_null(name)
-- @constraint: compare(population, >, 0)

SELECT
    name, capital, population,
    CASE
        WHEN population > 50000000 THEN 'large'
        WHEN population > 10000000 THEN 'medium'
        ELSE 'small'
    END AS size
FROM raw.countries

5. Aggregate into a mart

ondatrasql new mart.population
ondatrasql edit mart.population

-- @kind: table
-- @audit: row_count(>, 0)

SELECT size, COUNT(*) AS countries, SUM(population)::BIGINT AS total_pop
FROM staging.countries
GROUP BY size

6. Run the pipeline

ondatrasql run

Running 3 models...
[OK] raw.countries      (table, backfill, 53 rows, 1.1s)
[OK] staging.countries  (table, backfill, 53 rows, 250ms, first run)
[OK] mart.population    (table, backfill, 3 rows, 200ms, first run)

Done: 3 ran, 0 skipped, 0 failed (109 rows, 1.6s)

7. Query the results

ondatrasql sql "SELECT * FROM mart.population"

| size   | countries | total_pop |
| ------ | --------- | --------- |
| large  | 5         | 424080603 |
| medium | 10        | 211169761 |
| small  | 38        | 106407558 |

Additional capabilities

Serve to BI tools: add @expose and connect Power BI, Excel, or Grafana via OData v4
Outbound sync: push data to APIs with @sink and automatic change detection
Ingest from APIs: blueprints with HTTP, OAuth, and pagination
Collect events: POST to an embedded endpoint with in-memory and batch buffering
Validate data: constraints, audits, and warnings with atomic rollback
Track lineage: column-level lineage extracted from SQL AST