Google Sheets
Read from and write to Google Sheets with OndatraSQL. Fetch spreadsheet data into your DuckLake pipeline, transform it with SQL, push results back to another sheet. OAuth with browser-based login — no service account needed.
What This Blueprint Gives You
- Fetch — read any Google Sheet as a SQL table: `FROM gsheets('spreadsheet_id', 'Sheet1')`
- Push — write pipeline results back to a sheet via `@sink: gsheets('spreadsheet_id', 'Sheet1')`. The `table` kind does a full replace (clear + write); the `append` kind adds rows below existing data
- First row treated as headers — column names come from the sheet
- OAuth via `ondatrasql auth google-sheets` — browser-based, tokens refresh automatically
Quick Start
1. Authenticate
ondatrasql auth google-sheets
Your browser opens. Approve access. Done — one time per project.
2. Add the blueprint
Copy lib/gsheets.star into your project’s lib/ directory.
3. Fetch from a sheet
-- models/raw/sheet_data.sql
-- @kind: table
SELECT *
FROM gsheets('1SbbFq-5H349xL7sSDGlv_DBAQTAG4grX_H2hhm_cHGM', 'Sheet1')
The spreadsheet ID is the long string in the Google Sheets URL between /d/ and /edit.
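If you want to pull that ID out of a full URL programmatically, here is a minimal sketch; the helper name is hypothetical and not part of OndatraSQL:

```python
import re

def spreadsheet_id_from_url(url):
    # The ID is the path segment between /spreadsheets/d/ and the next slash.
    # Sheets IDs consist of letters, digits, underscores, and hyphens.
    m = re.search(r"/spreadsheets/d/([a-zA-Z0-9_-]+)", url)
    if not m:
        raise ValueError("not a Google Sheets URL: " + url)
    return m.group(1)
```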
4. Run
ondatrasql run raw.sheet_data
[OK] raw.sheet_data (table, backfill, 20 rows, 1.1s — first run)
| Datum | Impressions |
|---|---|
| 2021-01-01 | 65443 |
| 2021-01-02 | 65432 |
| 2021-01-03 | 632 |
5. Push to another sheet
Create a sync model with the target spreadsheet ID and sheet tab as sink args:
-- models/sync/sheet_output.sql
-- @kind: table
-- @sink: gsheets('1DYJCOdMwmbru2YB2t9HTj9BNlXQPtVrcc5rJGL-9_SM', 'Sheet1')
SELECT *
FROM raw.sheet_data
ondatrasql run
[OK] raw.sheet_data (table, backfill, 20 rows, 1.1s — first run)
[OK] sync.sheet_output (table, backfill, 20 rows, 2.3s — first run)
The target sheet is cleared and populated with the pipeline output — headers included.
Fetch options
Different sheet tab
FROM gsheets('spreadsheet_id', 'Revenue')
Specific range
FROM gsheets('spreadsheet_id', 'Sheet1!A1:D100')
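If you build range strings programmatically, A1 notation can be generated from a sheet name and dimensions. A sketch under that assumption; the `a1_range` helper is hypothetical, not part of the blueprint:

```python
def a1_range(sheet, n_cols, n_rows):
    # Convert a 1-based column count to its A1 column letters
    # (4 -> "D", 27 -> "AA"), then format "Sheet!A1:D100"-style ranges.
    col, letters = n_cols, ""
    while col > 0:
        col, rem = divmod(col - 1, 26)
        letters = chr(ord("A") + rem) + letters
    return "%s!A1:%s%d" % (sheet, letters, n_rows)
```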
Transform in staging
-- models/staging/daily_metrics.sql
-- @kind: table
SELECT
Datum::DATE AS date,
Impressions::BIGINT AS impressions
FROM raw.sheet_data
Push modes
Full replace (table)
-- @kind: table
-- @sink: gsheets('spreadsheet_id', 'Sheet1')
SELECT * FROM mart.summary
Clears the sheet and writes all rows on every run. Headers are written automatically on the first batch.
Append only
-- @kind: append
-- @sink: gsheets('spreadsheet_id', 'Log')
SELECT * FROM mart.notifications
Adds new rows below existing data. Never clears. Headers are written if the sheet is empty.
Other kinds
merge, tracked, and scd2 are not supported — the blueprint validates this at startup. For merge semantics, use @kind: merge in a mart model and @kind: table + @sink: gsheets in a sync model:
-- models/mart/customers.sql
-- @kind: merge
-- @unique_key: customer_id
SELECT customer_id, name, email FROM staging.customers
-- models/sync/customers_sheet.sql
-- @kind: table
-- @sink: gsheets('spreadsheet_id', 'Customers')
SELECT * FROM mart.customers
Merge logic happens in DuckLake. The sheet always gets the full current state via full replace.
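The upsert behaviour that `@kind: merge` provides can be pictured as a keyed dictionary update. This is an illustrative sketch of the semantics, not DuckLake's actual implementation:

```python
def merge_rows(existing, incoming, unique_key):
    # Upsert: incoming rows replace existing rows with the same key,
    # rows with new keys are appended. The sync model then pushes the
    # full merged state to the sheet via full replace.
    by_key = {row[unique_key]: row for row in existing}
    for row in incoming:
        by_key[row[unique_key]] = row
    return list(by_key.values())
```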
How It Works
Fetch
- Reads values from `GET /v4/spreadsheets/{id}/values/{range}` with the `UNFORMATTED_VALUE` render option
- First row becomes column headers
- Returns all rows in a single page (no pagination — the Sheets API returns all values at once)
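The header-to-row mapping above can be sketched in plain Python, mirroring the blueprint's `fetch()` (shorter data rows are padded with `None`):

```python
def values_to_rows(values):
    # values is the raw 2D array from the Sheets API: first row is
    # headers, remaining rows are data. Ragged rows get None padding.
    if len(values) < 2:
        return []
    headers = values[0]
    rows = []
    for row_values in values[1:]:
        rows.append({h: (row_values[i] if i < len(row_values) else None)
                     for i, h in enumerate(headers)})
    return rows
```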
Push
- table kind: clears target range on first batch, writes headers + data
- append kind: checks if sheet is empty and writes headers if needed, then appends rows
- Only `insert` change types are written — `delete` events (from the table kind's truncate) are ignored
- Uses the `RAW` value input option — values are stored as-is
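The insert-only filter can be sketched as follows, mirroring the projection the blueprint's `push()` applies before writing (the helper name is mine):

```python
def rows_to_sheet_values(rows, columns):
    # Keep only insert change events and project each row onto the
    # declared column order; missing values become empty strings,
    # matching how the blueprint writes empty cells.
    values = []
    for r in rows:
        if r.get("__ondatra_change_type", "insert") == "insert":
            values.append([r.get(c, "") for c in columns])
    return values
```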
Blueprint
lib/gsheets.star
RESERVED_COLUMNS = ()

API = {
    "base_url": "https://sheets.googleapis.com/v4",
    "auth": {"provider": "google-sheets"},
    "timeout": 30,
    "retry": 3,
    "backoff": 2,
    "fetch": {
        "args": ["spreadsheet_id", "range"],
        "supported_kinds": ["table"],
    },
    "push": {
        "args": ["spreadsheet_id", "range"],
        "supported_kinds": ["table", "append"],
        "batch_size": 1000,
        "batch_mode": "atomic",
    },
}
def fetch(spreadsheet_id, range, page=None):
    resp = http.get("/spreadsheets/" + spreadsheet_id + "/values/" + range, params={
        "valueRenderOption": "UNFORMATTED_VALUE",
        "dateTimeRenderOption": "FORMATTED_STRING",
    })
    if not resp.ok:
        fail("sheets read: " + resp.text)
    values = resp.json.get("values", [])
    if len(values) < 2:
        return {"rows": [], "next": None}
    headers = values[0]
    rows = []
    for row_values in values[1:]:
        row = {}
        for i, header in enumerate(headers):
            row[header] = row_values[i] if i < len(row_values) else None
        rows.append(row)
    return {"rows": rows, "next": None}
def push(rows=[], batch_number=1, kind="table", columns=[], spreadsheet_id="", range=""):
    if len(rows) == 0:
        return
    base = "/spreadsheets/" + spreadsheet_id + "/values/" + range
    # Filter to inserts only (table kind sends delete + insert, we only need inserts)
    inserts = []
    for r in rows:
        if r.get("__ondatra_change_type", "insert") == "insert":
            inserts.append([r.get(h, "") for h in columns])
    if len(inserts) == 0:
        return
    if kind == "table":
        # Full replace: clear + write
        if batch_number == 1:
            http.post(base + ":clear", json={})
            data = [list(columns)] + inserts
        else:
            data = inserts
        resp = http.put(base, params={"valueInputOption": "RAW"}, json={"values": data})
        if not resp.ok:
            fail("sheets write: " + resp.text)
    elif kind == "append":
        # Append: write headers if empty, then append
        if batch_number == 1:
            resp = http.get(base, params={"valueRenderOption": "UNFORMATTED_VALUE"})
            if not resp.ok or not resp.json.get("values"):
                http.put(base, params={"valueInputOption": "RAW"}, json={"values": [list(columns)]})
        resp = http.post(base + ":append", params={
            "valueInputOption": "RAW",
            "insertDataOption": "INSERT_ROWS",
        }, json={"values": inserts})
        if not resp.ok:
            fail("sheets append: " + resp.text)
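The runtime calls `push()` once per batch, at most `batch_size` rows at a time with a 1-based `batch_number`. A sketch of that chunking, inferred from the API dict's `batch_size` field (the exact runtime behaviour is an assumption):

```python
def batches(rows, batch_size=1000):
    # Split rows into (batch_number, chunk) pairs: 1-based batch
    # numbers, each chunk holding at most batch_size rows.
    return [(i // batch_size + 1, rows[i:i + batch_size])
            for i in range(0, len(rows), batch_size)]
```

With 2,500 rows and the default `batch_size` of 1000, `push()` would run three times: the first call clears the sheet and writes headers, the rest only append data.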
Notes
- Authentication — OAuth via `ondatrasql auth google-sheets`. Requires the `spreadsheets` scope for read+write. Tokens refresh automatically.
- Spreadsheet ID — the long string in the URL: `https://docs.google.com/spreadsheets/d/{spreadsheet_id}/edit`
- Sink args — target spreadsheet ID and sheet tab are passed directly in the `@sink` directive: `@sink: gsheets('spreadsheet_id', 'Sheet1')`
- Headers — the first row of the sheet is always treated as column headers
- Value rendering — fetch uses `UNFORMATTED_VALUE` to get raw numbers/dates instead of formatted strings
- Supported kinds — fetch: `table` only (the Sheets API always returns all rows). Push: `table` (full replace) and `append`. Validated at startup.
- Empty cells — returned as `None` in fetch, written as `""` in push
- Rate limits — the Google Sheets API allows 300 requests per minute per project. The API dict configures retry with backoff for 429 responses.
- Runtime kwargs filtering — the blueprint only declares the parameters it uses; the runtime automatically filters kwargs to match.
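The `retry`/`backoff` knobs in the API dict describe retry-on-429 with exponential backoff. A minimal sketch of that pattern, not the runtime's actual implementation (`with_retry` and the injectable `sleep` parameter are mine):

```python
import time

def with_retry(call, retries=3, backoff=2, sleep=time.sleep):
    # Call a zero-arg function returning (status, body). On HTTP 429,
    # wait backoff**attempt seconds (1s, 2s, 4s for backoff=2) and
    # retry, up to `retries` extra attempts.
    for attempt in range(retries + 1):
        status, body = call()
        if status != 429 or attempt == retries:
            return status, body
        sleep(backoff ** attempt)
```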
See API Dict Reference for all configuration options and Set Up OAuth for authentication setup.