Blueprints Blog Contact About

Audits

Detect regressions — rolled back automatically if they fail

Audits detect regressions after data is written.

If an audit fails, the transaction is rolled back and the model fails.

Constraints stop bad rows. Audits catch bad results.

Why This Matters

Some data issues can’t be detected row by row.

  • Row counts suddenly drop
  • Distributions drift
  • Freshness breaks
  • Aggregates don’t match source systems

These are not row-level problems — they are dataset-level problems. That’s what audits are for.

When Audits Run

Audits run inside the same transaction as the data write.

This means:

  • Either everything succeeds
  • Or nothing is committed

If any audit fails:

  • The transaction is rolled back
  • No data is committed
  • Downstream models do not run

Usage

Add audits directly in your model:

-- @audit: PATTERN

You can define multiple audits per model.

-- @audit: row_count > 0
-- @audit: row_count_change < 20%
-- @audit: freshness(updated_at, 24h)
-- @audit: reconcile_sum(total, raw.orders.total)
-- @audit: distribution(status) STABLE(0.1)

If any of these fail, the entire model is rolled back.

Constraints vs Audits

Constraints:

  • Run before insert
  • Validate individual rows
  • Prevent bad data from entering

Audits:

  • Run after insert
  • Validate the dataset as a whole
  • Catch regressions and anomalies

Common Patterns

PatternExampleWhat it catches
row_count > 0row_count > 0Empty results
freshness(col, duration)freshness(updated_at, 24h)Stale data
row_count_change < N%row_count_change < 20%Sudden drops or spikes
min(col) OP Nmin(price) >= 0Out-of-range values
max(col) OP Nmax(price) <= 10000Out-of-range values
reconcile_sum(col, other.col)reconcile_sum(total, raw.orders.total)Source/target mismatch
distribution(col) STABLE(N)distribution(status) STABLE(0.1)Category drift

All 17 Patterns

Row Count

PatternExampleDescription
row_count OP Nrow_count >= 100Check total row count (supports >=, <=, >, <, =)
row_count_change < N%row_count_change < 10%Row count change from previous run

Freshness

PatternExampleDescription
freshness(col, duration)freshness(updated_at, 24h)Maximum age of most recent record (supports h for hours, d for days)

Statistical

PatternExampleDescription
mean(col) ...mean(price) BETWEEN 10 AND 100Average value check (supports BETWEEN and comparisons)
stddev(col) < Nstddev(amount) < 25Standard deviation threshold
min(col) OP Nmin(price) >= 0Minimum value check
max(col) OP Nmax(price) <= 10000Maximum value check
sum(col) OP Nsum(amount) >= 0Aggregate sum check
zscore(col) < Nzscore(amount) < 3.0Statistical outlier detection
percentile(col, p) OP Npercentile(latency, 0.95) <= 500Quantile validation

Reconciliation

PatternExampleDescription
reconcile_count(table)reconcile_count(raw.orders)Row counts must match
reconcile_sum(col, other.col)reconcile_sum(total, raw.orders.total)Aggregate sums must match

Schema

PatternExampleDescription
column_exists(col)column_exists(email)Verify column exists in output
column_type(col, type)column_type(price, DECIMAL(18,2))Verify column type (supports precision)

Data Comparison

PatternExampleDescription
golden('path')golden('tests/expected.csv')Compare output with golden dataset
distribution(col) STABLE(threshold)distribution(status) STABLE(0.1)Categorical distribution stability

Full Example

-- @kind: append
-- @incremental: event_time
-- @audit: row_count > 0
-- @audit: freshness(event_time, 24h)
-- @audit: row_count_change < 50%
-- @audit: min(amount) >= 0
-- @audit: zscore(amount) < 3.0
-- @audit: reconcile_sum(amount, raw.events.amount)

SELECT event_id, event_time, amount, category
FROM raw.events

Compared to Traditional Validation

Traditional approach:

  • Run tests after data is written
  • Failures require manual cleanup
  • Data may already be used downstream

OndatraSQL:

  • Audits run inside the transaction
  • Failures automatically roll back
  • Bad data never persists