Google Sheets
Read from and write to Google Sheets with OndatraSQL. Fetch spreadsheet data into your DuckLake pipeline, transform it with SQL, push results back to another sheet. OAuth with browser-based login — no service account needed.
What This Blueprint Gives You
- Fetch — read any Google Sheet as a SQL table: `FROM gsheets('spreadsheet_id', 'Sheet1')`
- Push — write pipeline results back to a sheet via `@sink: gsheets('spreadsheet_id', 'Sheet1')`. The `table` kind does a full replace (clear + write); the `append` kind adds rows below existing data
- First row treated as headers — column names come from the sheet
- OAuth via `ondatrasql auth google-sheets` — browser-based, tokens refresh automatically
Quick Start
1. Authenticate
ondatrasql auth google-sheets
Your browser opens. Approve access. Done — one time per project.
2. Add the blueprint
Copy lib/gsheets.star into your project’s lib/ directory.
3. Fetch from a sheet
-- models/raw/sheet_data.sql
-- @kind: table
SELECT *
FROM gsheets('1SbbFq-5H349xL7sSDGlv_DBAQTAG4grX_H2hhm_cHGM', 'Sheet1')
The spreadsheet ID is the long string in the Google Sheets URL between /d/ and /edit.
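If you want to pull that ID out of a full URL programmatically, here is a minimal sketch; the helper name is hypothetical and not part of OndatraSQL:

```python
import re

def spreadsheet_id_from_url(url):
    # The ID is the path segment between /spreadsheets/d/ and the next slash.
    # Sheets IDs consist of letters, digits, underscores, and hyphens.
    m = re.search(r"/spreadsheets/d/([a-zA-Z0-9_-]+)", url)
    if not m:
        raise ValueError("not a Google Sheets URL: " + url)
    return m.group(1)
```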
4. Run
ondatrasql run raw.sheet_data
[OK] raw.sheet_data (table, backfill, 20 rows, 1.1s — first run)
| Datum | Impressions |
|---|---|
| 2021-01-01 | 65443 |
| 2021-01-02 | 65432 |
| 2021-01-03 | 632 |
5. Push to another sheet
Create a sync model with the target spreadsheet ID and sheet tab as sink args:
-- models/sync/sheet_output.sql
-- @kind: table
-- @sink: gsheets('1DYJCOdMwmbru2YB2t9HTj9BNlXQPtVrcc5rJGL-9_SM', 'Sheet1')
SELECT *
FROM raw.sheet_data
ondatrasql run
[OK] raw.sheet_data (table, backfill, 20 rows, 1.1s — first run)
[OK] sync.sheet_output (table, backfill, 20 rows, 2.3s — first run)
The target sheet is cleared and populated with the pipeline output — headers included.
Fetch options
Different sheet tab
FROM gsheets('spreadsheet_id', 'Revenue')
Specific range
FROM gsheets('spreadsheet_id', 'Sheet1!A1:D100')
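If you build range strings programmatically, A1 notation can be generated from a sheet name and dimensions. A sketch under that assumption; the `a1_range` helper is hypothetical, not part of the blueprint:

```python
def a1_range(sheet, n_cols, n_rows):
    # Convert a 1-based column count to its A1 column letters
    # (4 -> "D", 27 -> "AA"), then format "Sheet!A1:D100"-style ranges.
    col, letters = n_cols, ""
    while col > 0:
        col, rem = divmod(col - 1, 26)
        letters = chr(ord("A") + rem) + letters
    return "%s!A1:%s%d" % (sheet, letters, n_rows)
```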
Transform in staging
-- models/staging/daily_metrics.sql
-- @kind: table
SELECT
Datum::DATE AS date,
Impressions::BIGINT AS impressions
FROM raw.sheet_data
Push modes
Full replace (table)
-- @kind: table
-- @sink: gsheets('spreadsheet_id', 'Sheet1')
SELECT * FROM mart.summary
Clears the sheet and writes all rows on every run. Headers are written automatically on the first batch.
Append only
-- @kind: append
-- @sink: gsheets('spreadsheet_id', 'Log')
SELECT * FROM mart.notifications
Adds new rows below existing data. Never clears. Headers are written if the sheet is empty.
Other kinds
merge, tracked, and scd2 are not supported — the blueprint validates this at startup. For merge semantics, use @kind: merge in a mart model and @kind: table + @sink: gsheets in a sync model:
-- models/mart/customers.sql
-- @kind: merge
-- @unique_key: customer_id
SELECT customer_id, name, email FROM staging.customers
-- models/sync/customers_sheet.sql
-- @kind: table
-- @sink: gsheets('spreadsheet_id', 'Customers')
SELECT * FROM mart.customers
Merge logic happens in DuckLake. The sheet always gets the full current state via full replace.
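The upsert behaviour that `@kind: merge` provides can be pictured as a keyed dictionary update. This is an illustrative sketch of the semantics, not DuckLake's actual implementation:

```python
def merge_rows(existing, incoming, unique_key):
    # Upsert: incoming rows replace existing rows with the same key,
    # rows with new keys are appended. The sync model then pushes the
    # full merged state to the sheet via full replace.
    by_key = {row[unique_key]: row for row in existing}
    for row in incoming:
        by_key[row[unique_key]] = row
    return list(by_key.values())
```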
How It Works
Fetch
- Reads values from `GET /v4/spreadsheets/{id}/values/{range}` with the `UNFORMATTED_VALUE` render option
- First row becomes column headers
- Returns all rows in a single page (no pagination — the Sheets API returns all values at once)
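The header-to-row mapping above can be sketched in plain Python, mirroring the blueprint's `fetch()` (shorter data rows are padded with `None`):

```python
def values_to_rows(values):
    # values is the raw 2D array from the Sheets API: first row is
    # headers, remaining rows are data. Ragged rows get None padding.
    if len(values) < 2:
        return []
    headers = values[0]
    rows = []
    for row_values in values[1:]:
        rows.append({h: (row_values[i] if i < len(row_values) else None)
                     for i, h in enumerate(headers)})
    return rows
```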
Push
- table kind: clears target range on first batch, writes headers + data
- append kind: checks if sheet is empty and writes headers if needed, then appends rows
- Only `insert` change types are written — `delete` events (from the table kind's truncate) are ignored
- Uses the `RAW` value input option — values are stored as-is
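The insert-only filter can be sketched as follows, mirroring the projection the blueprint's `push()` applies before writing (the helper name is mine):

```python
def rows_to_sheet_values(rows, columns):
    # Keep only insert change events and project each row onto the
    # declared column order; missing values become empty strings,
    # matching how the blueprint writes empty cells.
    values = []
    for r in rows:
        if r.get("__ondatra_change_type", "insert") == "insert":
            values.append([r.get(c, "") for c in columns])
    return values
```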
Blueprint
lib/gsheets.star
RESERVED_COLUMNS = ()

API = {
    "base_url": "https://sheets.googleapis.com/v4",
    "auth": {"provider": "google-sheets"},
    "timeout": 30,
    "retry": 3,
    "backoff": 2,
    "fetch": {
        "args": ["spreadsheet_id", "range"],
        "supported_kinds": ["table"],
    },
    "push": {
        "args": ["spreadsheet_id", "range"],
        "supported_kinds": ["table", "append"],
        "batch_size": 1000,
        "batch_mode": "atomic",
    },
}
def fetch(spreadsheet_id, range, page=None):
    resp = http.get("/spreadsheets/" + spreadsheet_id + "/values/" + range, params={
        "valueRenderOption": "UNFORMATTED_VALUE",
        "dateTimeRenderOption": "FORMATTED_STRING",
    })
    if not resp.ok:
        fail("sheets read: " + resp.text)
    values = resp.json.get("values", [])
    if len(values) < 2:
        return {"rows": [], "next": None}
    headers = values[0]
    rows = []
    for row_values in values[1:]:
        row = {}
        for i, header in enumerate(headers):
            row[header] = row_values[i] if i < len(row_values) else None
        rows.append(row)
    return {"rows": rows, "next": None}
def push(rows=[], batch_number=1, kind="table", columns=[], spreadsheet_id="", range=""):
    if len(rows) == 0:
        return
    base = "/spreadsheets/" + spreadsheet_id + "/values/" + range
    # Filter to inserts only (table kind sends delete + insert, we only need inserts)
    inserts = []
    for r in rows:
        if r.get("__ondatra_change_type", "insert") == "insert":
            inserts.append([r.get(h, "") for h in columns])
    if len(inserts) == 0:
        return
    if kind == "table":
        # Full replace: clear + write
        if batch_number == 1:
            http.post(base + ":clear", json={})
            data = [list(columns)] + inserts
        else:
            data = inserts
        resp = http.put(base, params={"valueInputOption": "RAW"}, json={"values": data})
        if not resp.ok:
            fail("sheets write: " + resp.text)
    elif kind == "append":
        # Append: write headers if empty, then append
        if batch_number == 1:
            resp = http.get(base, params={"valueRenderOption": "UNFORMATTED_VALUE"})
            if not resp.ok or not resp.json.get("values"):
                http.put(base, params={"valueInputOption": "RAW"}, json={"values": [list(columns)]})
        resp = http.post(base + ":append", params={
            "valueInputOption": "RAW",
            "insertDataOption": "INSERT_ROWS",
        }, json={"values": inserts})
        if not resp.ok:
            fail("sheets append: " + resp.text)
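The runtime calls `push()` once per batch, at most `batch_size` rows at a time with a 1-based `batch_number`. A sketch of that chunking, inferred from the API dict's `batch_size` field (the exact runtime behaviour is an assumption):

```python
def batches(rows, batch_size=1000):
    # Split rows into (batch_number, chunk) pairs: 1-based batch
    # numbers, each chunk holding at most batch_size rows.
    return [(i // batch_size + 1, rows[i:i + batch_size])
            for i in range(0, len(rows), batch_size)]
```

With 2,500 rows and the default `batch_size` of 1000, `push()` would run three times: the first call clears the sheet and writes headers, the rest only append data.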
Notes
- Authentication — OAuth via `ondatrasql auth google-sheets`. Requires the `spreadsheets` scope for read+write. Tokens refresh automatically.
- Spreadsheet ID — the long string in the URL: `https://docs.google.com/spreadsheets/d/{spreadsheet_id}/edit`
- Sink args — target spreadsheet ID and sheet tab are passed directly in the `@sink` directive: `@sink: gsheets('spreadsheet_id', 'Sheet1')`
- Headers — the first row of the sheet is always treated as column headers
- Value rendering — fetch uses `UNFORMATTED_VALUE` to get raw numbers/dates instead of formatted strings
- Supported kinds — fetch: `table` only (the Sheets API always returns all rows). Push: `table` (full replace) and `append`. Validated at startup.
- Empty cells — returned as `None` in fetch, written as `""` in push
- Rate limits — the Google Sheets API allows 300 requests per minute per project. The API dict configures retry with backoff for 429 responses.
- Runtime kwargs filtering — the blueprint only declares the parameters it uses; the runtime automatically filters kwargs to match.
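The `retry`/`backoff` knobs in the API dict describe retry-on-429 with exponential backoff. A minimal sketch of that pattern, not the runtime's actual implementation (`with_retry` and the injectable `sleep` parameter are mine):

```python
import time

def with_retry(call, retries=3, backoff=2, sleep=time.sleep):
    # Call a zero-arg function returning (status, body). On HTTP 429,
    # wait backoff**attempt seconds (1s, 2s, 4s for backoff=2) and
    # retry, up to `retries` extra attempts.
    for attempt in range(retries + 1):
        status, body = call()
        if status != 429 or attempt == retries:
            return status, body
        sleep(backoff ** attempt)
```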
See API Dict Reference for all configuration options and Set Up OAuth for authentication setup.