Getting Started
On this page
This guide walks through creating and running a data pipeline with OndatraSQL.
1. Install
curl -fsSL https://ondatra.sh/install.sh | sh
OndatraSQL supports Linux, macOS, and Windows via WSL2. The installation produces a single binary with no external dependencies.
2. Create a project
mkdir my-pipeline && cd my-pipeline
ondatrasql init
Initialized project "my-pipeline"
3. Ingest data
Create a blueprint in lib/ and a SQL model:
mkdir -p lib
# lib/rest_countries.star
API = {
"base_url": "https://restcountries.com",
"fetch": {
"args": ["region"],
},
}
def fetch(region, page):
resp = http.get("/v3.1/region/" + region)
rows = [{"name": c["name"]["common"], "capital": c["capital"][0], "population": c["population"]} for c in resp.json]
return {"rows": rows, "next": None}
ondatrasql new raw.countries
ondatrasql edit raw.countries
-- @kind: table
SELECT * FROM rest_countries('europe')
The file path determines the target table: raw.countries.
4. Transform with SQL
ondatrasql new staging.countries
ondatrasql edit staging.countries
-- @kind: table
-- @constraint: not_null(name)
-- @constraint: compare(population, >, 0)
SELECT
name, capital, population,
CASE
WHEN population > 50000000 THEN 'large'
WHEN population > 10000000 THEN 'medium'
ELSE 'small'
END AS size
FROM raw.countries
5. Aggregate into a mart
ondatrasql new mart.population
ondatrasql edit mart.population
-- @kind: table
-- @audit: row_count(>, 0)
SELECT size, COUNT(*) AS countries, SUM(population)::BIGINT AS total_pop
FROM staging.countries
GROUP BY size
6. Run the pipeline
ondatrasql run
Running 3 models...
[OK] raw.countries (table, backfill, 53 rows, 1.1s)
[OK] staging.countries (table, backfill, 53 rows, 250ms, first run)
[OK] mart.population (table, backfill, 3 rows, 200ms, first run)
Done: 3 ran, 0 skipped, 0 failed (109 rows, 1.6s)
7. Query the results
ondatrasql sql "SELECT * FROM mart.population"
| size | countries | total_pop |
| ------ | --------- | --------- |
| large | 5 | 424080603 |
| medium | 10 | 211169761 |
| small | 38 | 106407558 |
Additional capabilities
- Serve to BI tools: add
@exposeand connect Power BI, Excel, or Grafana via OData v4 - Outbound sync: push data to APIs with
@sinkand automatic change detection - Ingest from APIs: blueprints with HTTP, OAuth, and pagination
- Collect events: POST to an embedded endpoint with in-memory and batch buffering
- Validate data: constraints, audits, and warnings with atomic rollback
- Track lineage: column-level lineage extracted from SQL AST
OndatraSQL