What building bilig taught me about spreadsheet agents

I thought the hard part would be formulas.

That was only half right. Formula behavior matters, and Excel compatibility is full of traps, but the problem that kept biting me was more basic: after an agent edits a workbook, how do you know the edit really happened?

That question shaped bilig. The public package, @bilig/headless, is a WorkPaper API for building sheets, writing formulas, recalculating values, saving a workbook, restoring it, and reading the result back without driving a browser.

That sounds like infrastructure, but the product question is trust.

Screenshots are weak proof

A screenshot is useful for a person. It is bad evidence for an agent.

It cannot tell you whether a value came from a literal or a formula. It cannot prove the workbook recalculated after a write. It misses hidden sheets, named expressions, metadata, tables, persisted state, and stale viewport bugs.

I have hit the stale viewport case enough times that I do not treat it as a corner case anymore. The engine says the write applied. A direct readback matches. The visible grid is still one revision behind, clipped, or missing the cell that matters.

So the loop has to be stricter:

1. Read the workbook state.
2. Apply the smallest coherent change.
3. Verify through the workbook API.
4. Verify the rendered range is fresh.
5. Keep an undo or restore path.
6. Refuse to call it done if one of those checks is missing.

That is slower than a demo, but it is closer to something I would let near a model with money, dates, and formulas in it.
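
A minimal sketch of that loop in TypeScript, under stated assumptions. WorkbookLike, RenderedViewLike, verifiedEdit, and the in-memory classes are hypothetical names made up for illustration; they are not the @bilig/headless API. The sketch only handles literal values, where the readback should equal what was written.

```typescript
// Hypothetical shapes for illustration only; NOT the @bilig/headless API.
type CellValue = number | string | boolean | null;

interface WorkbookLike {
  revision(): number;
  read(address: string): CellValue;
  write(address: string, value: CellValue): void;
  snapshot(): Map<string, CellValue>;
  restore(snapshot: Map<string, CellValue>): void;
}

interface RenderedViewLike {
  revision(): number; // workbook revision the visible grid was painted from
  read(address: string): CellValue;
}

type EditOutcome =
  | { done: true; revision: number }
  | { done: false; failed: "readback" | "render"; rolledBack: boolean };

function verifiedEdit(
  wb: WorkbookLike,
  view: RenderedViewLike,
  address: string,
  value: CellValue,
): EditOutcome {
  const before = wb.snapshot(); // read the state and keep a restore path
  wb.write(address, value); // apply the smallest coherent change
  if (wb.read(address) !== value) {
    // The authoritative readback disagrees: roll back and report it.
    wb.restore(before);
    return { done: false, failed: "readback", rolledBack: true };
  }
  const fresh = view.revision() === wb.revision() && view.read(address) === value;
  if (!fresh) {
    // The write is real, so keep it, but refuse to call it done.
    return { done: false, failed: "render", rolledBack: false };
  }
  return { done: true, revision: wb.revision() };
}

// In-memory stand-ins so the sketch can run on its own.
class MemoryWorkbook implements WorkbookLike {
  private cells = new Map<string, CellValue>();
  private rev = 0;
  revision() { return this.rev; }
  read(address: string) { return this.cells.get(address) ?? null; }
  write(address: string, value: CellValue) { this.cells.set(address, value); this.rev += 1; }
  snapshot() { return new Map(this.cells); }
  restore(snapshot: Map<string, CellValue>) { this.cells = new Map(snapshot); this.rev += 1; }
}

// A view that repaints from the workbook only when `live` is true;
// with `live` false it never repaints, which simulates a stale grid.
class MemoryView implements RenderedViewLike {
  private painted = new Map<string, CellValue>();
  private paintedRev = 0;
  private readonly wb: MemoryWorkbook;
  private readonly live: boolean;
  constructor(wb: MemoryWorkbook, live: boolean) { this.wb = wb; this.live = live; }
  private sync() {
    if (this.live) { this.painted = this.wb.snapshot(); this.paintedRev = this.wb.revision(); }
  }
  revision() { this.sync(); return this.paintedRev; }
  read(address: string) { this.sync(); return this.painted.get(address) ?? null; }
}

const wb = new MemoryWorkbook();
const ok = verifiedEdit(wb, new MemoryView(wb, true), "B2", 1200);
const stale = verifiedEdit(wb, new MemoryView(wb, false), "B3", 75);
```

In the second call the write is kept because the engine state is correct, but the result is still not reported as done. Whether a stale render should block, retry, or only warn is a product decision the sketch does not settle.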

The grid should not be the database

The browser grid still matters. People need selection, formatting, freeze panes, formula bars, copy/paste, and the spatial feel of a spreadsheet.

But for an agent, pixels are the wrong contract.

A workbook API gives the model operations with names: write this range, set this formula, recalculate, read the value, serialize the workbook, restore it, verify the downstream cell. Those operations can be logged, tested, replayed, and undone. A screenshot supports none of that.
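
To make "logged, replayed, and undone" concrete, here is a small sketch. The Op type and the applyOp and replay functions are hypothetical and deliberately tiny (two operations on numeric cells); they are an illustration of the idea, not bilig's operation schema.

```typescript
// Hypothetical operation types; an illustration, not bilig's actual schema.
type Op =
  | { kind: "setValue"; address: string; value: number }
  | { kind: "clear"; address: string };

type Sheet = Map<string, number>;

// Apply one op and return its inverse, so every logged op can be undone.
function applyOp(sheet: Sheet, op: Op): Op {
  const previous = sheet.get(op.address);
  if (op.kind === "setValue") sheet.set(op.address, op.value);
  else sheet.delete(op.address);
  return previous === undefined
    ? { kind: "clear", address: op.address }
    : { kind: "setValue", address: op.address, value: previous };
}

// Replay a log onto a fresh sheet: same ops in, same state out.
function replay(log: readonly Op[]): Sheet {
  const fresh: Sheet = new Map();
  for (const op of log) applyOp(fresh, op);
  return fresh;
}

const sheet: Sheet = new Map();
const log: Op[] = [
  { kind: "setValue", address: "A1", value: 10 },
  { kind: "setValue", address: "A1", value: 25 },
  { kind: "setValue", address: "B1", value: 3 },
];
const inverses = log.map((op) => applyOp(sheet, op));

// Replaying the log reproduces the state the live sheet reached.
const replayed = replay(log);

// Undo the last two ops by applying their inverses in reverse order.
for (const inverse of inverses.slice(1).reverse()) applyOp(sheet, inverse);
```

After the undo, the live sheet is back to A1 = 10 with B1 empty, while the replayed copy still holds the full log's result.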

@bilig/headless exists for that reason. It gives services and coding agents a workbook surface without pretending that browser automation is the source of truth.

"Applied" is not the same as "trusted"

One design change I care about is treating write receipts as product data, not debug noise.

A useful receipt should say what range changed, what revision it moved from and to, what the authoritative readback returned, whether the rendered readback caught up, what warnings were produced, and whether an undo token exists.

A receipt sounds bureaucratic until something fails. Then it is the only way to tell the difference between:

the edit never applied,
the edit applied but the formula is wrong,
the edit applied but the browser is stale,
the edit applied and rendered correctly, but undo was not proven.

Those are different failures. A workbook tool has to preserve that difference instead of reporting all of them as success.
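
A receipt along those lines could be sketched like this. The WriteReceipt shape and the classify function are hypothetical. The fields follow the list above, plus three assumptions added so the failures can be told apart in code: an expected value supplied by the caller, a renderedRevision, and an undoVerified flag.

```typescript
// Hypothetical receipt shape; not bilig's actual type.
interface WriteReceipt {
  range: string;
  fromRevision: number;
  toRevision: number;
  expected: number | string | null; // assumption: what the caller intended the cell to evaluate to
  authoritativeReadback: number | string | null;
  renderedReadback: number | string | null;
  renderedRevision: number; // assumption: revision the visible grid was painted from
  warnings: string[];
  undoToken: string | null;
  undoVerified: boolean; // assumption: a restore was actually exercised
}

type Verdict = "not-applied" | "wrong-result" | "stale-render" | "undo-unproven" | "trusted";

// Check the cheapest, most fundamental failure first and stop at the first one found.
function classify(r: WriteReceipt): Verdict {
  if (r.toRevision <= r.fromRevision) return "not-applied";
  if (r.authoritativeReadback !== r.expected) return "wrong-result";
  if (r.renderedRevision < r.toRevision || r.renderedReadback !== r.authoritativeReadback) {
    return "stale-render";
  }
  if (r.undoToken === null || !r.undoVerified) return "undo-unproven";
  return "trusted";
}

const base: WriteReceipt = {
  range: "C5",
  fromRevision: 41,
  toRevision: 42,
  expected: 300,
  authoritativeReadback: 300,
  renderedReadback: 300,
  renderedRevision: 42,
  warnings: [],
  undoToken: "undo-42",
  undoVerified: true,
};

const verdicts: Verdict[] = [
  classify(base),
  classify({ ...base, toRevision: 41 }),
  classify({ ...base, authoritativeReadback: 0 }),
  classify({ ...base, renderedRevision: 41, renderedReadback: null }),
  classify({ ...base, undoVerified: false }),
];
```

Only the first receipt comes back as trusted. The other four each map to one of the failures listed above, so none of them can be reported as a plain success.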

Real workbooks make claims smaller

Toy workbooks are useful for development. They are not enough for confidence.

bilig has been growing corpus tooling around public workbook files because real files make the claims sharper. A mismatch is a repro seed. A skipped volatile formula is not a pass. A verifier startup failure is not the same thing as an unsupported workbook. A cached checkpoint is only valid if it still matches the same artifact.

That work is slow, but it keeps the project honest. Compatibility stops being a claim and becomes a list of cases someone can inspect.
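
One way to keep those distinctions from collapsing is to give each outcome its own variant. The sketch below is hypothetical: CaseOutcome, Checkpoint, and the helper functions are illustrative names, not bilig's corpus tooling, and the artifact hash is assumed to be computed elsewhere.

```typescript
// Hypothetical outcome model for a corpus run; names are illustrative.
type CaseOutcome =
  | { kind: "pass" }
  | { kind: "mismatch"; cell: string; expected: string; actual: string }
  | { kind: "skipped"; reason: "volatile" }
  | { kind: "unsupported"; feature: string }
  | { kind: "verifier-error"; message: string };

interface Checkpoint {
  artifactHash: string; // assumption: content hash of the workbook file, computed elsewhere
  outcomes: CaseOutcome[];
}

// Count each kind separately, so a skip or a verifier error never lands in "pass".
function summarize(outcomes: readonly CaseOutcome[]) {
  const counts = { pass: 0, mismatch: 0, skipped: 0, unsupported: 0, "verifier-error": 0 };
  for (const o of outcomes) counts[o.kind] += 1;
  return counts;
}

// A cached checkpoint is reusable only if it came from the same artifact.
function reusableCheckpoint(cp: Checkpoint | undefined, currentHash: string): Checkpoint | undefined {
  return cp !== undefined && cp.artifactHash === currentHash ? cp : undefined;
}

// Mismatches become repro seeds rather than a single failure count.
function reproSeeds(outcomes: readonly CaseOutcome[]): string[] {
  const seeds: string[] = [];
  for (const o of outcomes) {
    if (o.kind === "mismatch") seeds.push(`${o.cell}: expected ${o.expected}, got ${o.actual}`);
  }
  return seeds;
}

const outcomes: CaseOutcome[] = [
  { kind: "pass" },
  { kind: "mismatch", cell: "Sheet1!D7", expected: "12.5", actual: "12" },
  { kind: "skipped", reason: "volatile" },
  { kind: "verifier-error", message: "verifier failed to start" },
];
const counts = summarize(outcomes);
const cached: Checkpoint = { artifactHash: "hash-a", outcomes };
const sameArtifact = reusableCheckpoint(cached, "hash-a");
const changedArtifact = reusableCheckpoint(cached, "hash-b");
const seeds = reproSeeds(outcomes);
```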

Rendering still counts

Headless APIs do not remove the need for a good spreadsheet UI.

The browser side of bilig has its own hard problems: projected viewport patches, worker-backed state, local persistence, selection actions, change panels, and a renderer path built around tiled rendering and TypeGPU. That work feels different from formula correctness, but it hits the same trust boundary.

Can the user scroll without lag? Does an edit become visible quickly? Does overflowing text render correctly across tile boundaries? Do frozen panes behave? Are retained tiles fresh, or do they only look plausible?

For spreadsheets, the visual layer is part of correctness. The agent can use the API as truth, but the human still has to believe the grid.
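
The "fresh or only plausible" question can be stated as a check. This is a sketch with hypothetical types (Rect, Tile, Change), not bilig's renderer: it assumes each retained tile records the revision it was painted at and that the engine can list the ranges changed at each revision.

```typescript
// Hypothetical tile bookkeeping; not bilig's renderer types.
interface Rect { row0: number; col0: number; row1: number; col1: number } // inclusive bounds
interface Tile { bounds: Rect; paintedRevision: number }
interface Change { revision: number; range: Rect }

function intersects(a: Rect, b: Rect): boolean {
  return a.row0 <= b.row1 && b.row0 <= a.row1 && a.col0 <= b.col1 && b.col0 <= a.col1;
}

// A retained tile is fresh only if no change newer than its paint touches its bounds.
function isFresh(tile: Tile, changes: readonly Change[]): boolean {
  return !changes.some((c) => c.revision > tile.paintedRevision && intersects(c.range, tile.bounds));
}

const upperTile: Tile = { bounds: { row0: 0, col0: 0, row1: 31, col1: 7 }, paintedRevision: 5 };
const lowerTile: Tile = { bounds: { row0: 32, col0: 0, row1: 63, col1: 7 }, paintedRevision: 5 };

const changes: Change[] = [
  { revision: 4, range: { row0: 3, col0: 1, row1: 3, col1: 1 } }, // older than the paint
  { revision: 6, range: { row0: 40, col0: 2, row1: 40, col1: 2 } }, // newer, inside lowerTile
];

const upperFresh = isFresh(upperTile, changes);
const lowerFresh = isFresh(lowerTile, changes);
```

The check is deliberately conservative and still incomplete: text that overflows from a changed cell into a neighboring tile would need the changed range widened before the intersection test.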

Where I landed

bilig is not a finished Excel clone. It does not claim full formula parity or perfect XLSX fidelity. The current claim is narrower:

@bilig/headless gives services and agents a typed workbook surface for formula-backed models, structural edits, persistence, readback, and verification. The larger repo is building the browser runtime, renderer, sync, agent tools, and corpus checks around the same idea.

The conclusion I keep coming back to is narrow: spreadsheet automation is not about making an agent type into cells. It is about giving the agent a contract that survives being wrong.

When the edit is real, prove it. When the rendered view is stale, say so. When a formula is unsupported, keep the mismatch. When undo exists, carry the token and verify the restore.

That version of spreadsheet automation I would trust.