2026-04-17

AI Builders Digest — 2026-04-17

PODCASTS

Latent Space — "Notion's Token Town: 5 Rebuilds, 100+ Tools, MCP vs CLIs and the Software Factory Future — Simon Last & Sarah Sachs of Notion"
https://www.youtube.com/@LatentSpacePod

The Takeaway: Notion has been rebuilding its AI agent framework since 2022 and has learned the hard way that working well with AI isn't about building elaborate abstractions. It's about stripping everything down to what the model actually wants, whether that's markdown or SQL, and building a culture where deleting your own code is a feature, not a failure.

Simon Last and Sarah Sachs from Notion sat down with Latent Space to walk through several years of internal AI development, and the result is one of the most honest, dense conversations about what it actually takes to ship AI products inside a mature company. Simon, who has been working on this since GPT-4 access arrived in late 2022, describes the early attempts as "too early": the team tried fine-tuning models for Notion-specific function calling before function calling even existed as a concept, and the models were simply too weak and their context windows too short. The real unlock came with Claude Sonnet 3.5/3.6, and they've been shipping ever since.

One of the most counterintuitive ideas Sarah pushed: give the models what they want, not what makes sense for your internal system. Early versions of Notion's agent used an elaborate XML format that mapped losslessly to Notion blocks. The model hated it. When they switched to SQLite queries, quality jumped. The same principle led them to choose markdown over custom formats. The lesson: fight the instinct to expose the full complexity of your system, and hide everything the model doesn't need.

On the culture side, Notion runs what Sarah calls the "Simon Vortex": a small team of senior engineers who cycle through frontier projects at high velocity, prototyping and rewriting constantly. "We rebuild our harness three or four times," she says. "The second rule of engineering leadership is build a team that's comfortable deleting their own code." New features ship first as prototypes behind feature flags, get used internally by everyone at the company, and are promoted to full products only after they prove themselves. Security review happens at the very start rather than at the end: "they build better product if they're involved early." The core AI infrastructure team is about 50 people, but every product engineering team at Notion is now also responsible for making its features work for agents, not just humans. "Over time, a majority of our traffic will be coming from agents using our interface, not humans. Our objective is to make it so the whole product org is building for agents."

On MCP vs CLIs, Simon is bullish on both but draws a sharp distinction: CLIs are inherently self-bootstrapping. If something breaks, the agent can debug and fix the problem itself in the same environment. MCP can't do that. "If you use Chrome DevTools MCP and the transport gets messed up, the agent has no way to fix itself," he says. Still, "MCP is just the dumb simple thing that works." Sarah adds that Notion uses both deliberately: for some integrations, like GitHub and Linear, MCP is fine; for search and Slack, they built their own in-house because MCP couldn't deliver the quality they needed.

They also coined the term "Notion's Last Exam," the inverse of a unit test. Instead of testing things they expect the model to pass, they maintain a set of evaluations that models currently pass only 30% of the time, specifically so they can give honest feedback to frontier labs about where models still fail. "We hit a point where our evals were saturated; we couldn't give insightful feedback anymore," Sarah explains. They now have a data scientist, a model behavior engineer (MBE), and a dedicated eval engineer working on this full-time. The MBE role is notable: it started with linguists and literature PhDs who could judge whether model outputs looked good, and has evolved into a hybrid role mixing data science, testing, product management, and prompt engineering. "You don't need an engineering background to be the best at this job."

Sarah also flagged a gap in the current model market, where small, cheap models still lag well behind the frontier: "There's a no-man's land right now where reasoning models were six months ago. Haiku and nano haven't caught up. Labs aren't incentivized to fill the whole triangle of intelligence, price, and latency. They're just the cheapest." Notion is actively investing in open-source models to fill that gap.

Finally, Simon described their vision for a "software factory": multiple agents collaborating to develop, debug, review, and deploy code with minimal human intervention. Specs live in markdown, verification happens through layers of testing, and bugs automatically get filed into a Notion database that a manager agent monitors. When one team member had 30 custom agents generating 70 notifications a day, the fix was a single manager agent sitting above them all, which cut the load to five notifications a day.

Reply to adjust your delivery settings or summary style.