Audits
Detect regressions — rolled back automatically if they fail
Audits detect regressions after data is written.
If an audit fails, the transaction is rolled back and the model fails.
Constraints stop bad rows. Audits catch bad results.
Why This Matters
Some data issues can’t be detected row by row.
- Row counts suddenly drop
- Distributions drift
- Freshness breaks
- Aggregates don’t match source systems
These are not row-level problems — they are dataset-level problems. That’s what audits are for.
When Audits Run
Audits run inside the same transaction as the data write.
This means:
- Either everything succeeds
- Or nothing is committed
If any audit fails:
- The transaction is rolled back
- No data is committed
- Downstream models do not run
Usage
Add audits directly in your model:
-- @audit: PATTERN
You can define multiple audits per model.
-- @audit: row_count > 0
-- @audit: row_count_change < 20%
-- @audit: freshness(updated_at, 24h)
-- @audit: reconcile_sum(total, raw.orders.total)
-- @audit: distribution(status) STABLE(0.1)
If any of these fail, the entire model is rolled back.
Constraints vs Audits
Constraints:
- Run before insert
- Validate individual rows
- Prevent bad data from entering
Audits:
- Run after insert
- Validate the dataset as a whole
- Catch regressions and anomalies
Common Patterns
| Pattern | Example | What it catches |
|---|---|---|
row_count > 0 | row_count > 0 | Empty results |
freshness(col, duration) | freshness(updated_at, 24h) | Stale data |
row_count_change < N% | row_count_change < 20% | Sudden drops or spikes |
min(col) OP N | min(price) >= 0 | Out-of-range values |
max(col) OP N | max(price) <= 10000 | Out-of-range values |
reconcile_sum(col, other.col) | reconcile_sum(total, raw.orders.total) | Source/target mismatch |
distribution(col) STABLE(N) | distribution(status) STABLE(0.1) | Category drift |
All 17 Patterns
Row Count
| Pattern | Example | Description |
|---|---|---|
row_count OP N | row_count >= 100 | Check total row count (supports >=, <=, >, <, =) |
row_count_change < N% | row_count_change < 10% | Row count change from previous run |
Freshness
| Pattern | Example | Description |
|---|---|---|
freshness(col, duration) | freshness(updated_at, 24h) | Maximum age of most recent record (supports h for hours, d for days) |
Statistical
| Pattern | Example | Description |
|---|---|---|
mean(col) ... | mean(price) BETWEEN 10 AND 100 | Average value check (supports BETWEEN and comparisons) |
stddev(col) < N | stddev(amount) < 25 | Standard deviation threshold |
min(col) OP N | min(price) >= 0 | Minimum value check |
max(col) OP N | max(price) <= 10000 | Maximum value check |
sum(col) OP N | sum(amount) >= 0 | Aggregate sum check |
zscore(col) < N | zscore(amount) < 3.0 | Statistical outlier detection |
percentile(col, p) OP N | percentile(latency, 0.95) <= 500 | Quantile validation |
Reconciliation
| Pattern | Example | Description |
|---|---|---|
reconcile_count(table) | reconcile_count(raw.orders) | Row counts must match |
reconcile_sum(col, other.col) | reconcile_sum(total, raw.orders.total) | Aggregate sums must match |
Schema
| Pattern | Example | Description |
|---|---|---|
column_exists(col) | column_exists(email) | Verify column exists in output |
column_type(col, type) | column_type(price, DECIMAL(18,2)) | Verify column type (supports precision) |
Data Comparison
| Pattern | Example | Description |
|---|---|---|
golden('path') | golden('tests/expected.csv') | Compare output with golden dataset |
distribution(col) STABLE(threshold) | distribution(status) STABLE(0.1) | Categorical distribution stability |
Full Example
-- @kind: append
-- @incremental: event_time
-- @audit: row_count > 0
-- @audit: freshness(event_time, 24h)
-- @audit: row_count_change < 50%
-- @audit: min(amount) >= 0
-- @audit: zscore(amount) < 3.0
-- @audit: reconcile_sum(amount, raw.events.amount)
SELECT event_id, event_time, amount, category
FROM raw.events
Compared to Traditional Validation
Traditional approach:
- Run tests after data is written
- Failures require manual cleanup
- Data may already be used downstream
OndatraSQL:
- Audits run inside the transaction
- Failures automatically roll back
- Bad data never persists
Ondatra Labs