2026-04-08
AI Builders Digest — 2026-04-08
X / Twitter
Alex Albert, Research at Anthropic
Anthropic dropped Claude Mythos Preview — a new model released just two months after Claude Opus 4.6. The model is available to launch partners in Project Glasswing. Albert called Glasswing "possibly the most consequential event in the AI industry" he's witnessed in nearly three years at Anthropic — phrasing that doesn't come lightly from someone inside the lab.
- https://x.com/alexalbert__/status/2041579938537775160
- https://x.com/alexalbert__/status/2041579950332113155
Kevin Weil, VP Science at OpenAI
OpenAI's Prism project is moving fast. Prism's Paper Review feature is built to act like a careful technical reviewer. It catches errors in math, derivations, notation, and units, and it checks whether a paper's claims are actually supported by its results. It also catches cross-section inconsistencies and flags citation issues. The review is delivered as an editable LaTeX file directly in the project workspace. Prism is powered by Codex, and the team is hiring a design lead to push the science-acceleration product further.
- https://x.com/kevinweil/status/2041573802212303053
- https://x.com/kevinweil/status/2041592093718749659
Thariq, Claude Code at Anthropic
Thariq ran around 10 customer discovery calls and read through the transcripts. His biggest takeaway: teams waste a lot of tokens on open-ended verification that doesn't meaningfully improve output quality. He plans to write more on how to verify efficiently, a topic that becomes more critical as teams scale up agent usage.
Also promoting a technical writing workshop in SF in two weeks, co-hosted with swyx and MilksandMatcha.
Cat Wu, Claude Code at Anthropic
Anthropic's own team published their internal Claude Code power-user tips as a /powerup command — sharing the team's favorite CLI features publicly. Good signal that the best tooling insights often come from the people building inside the company.
Guillermo Rauch, CEO at Vercel
Spoke at Y Combinator and came out more bullish than ever. Called it "exceptional founders, best city, best time, best opportunity to build in generations."
Nan Yu, Head of Product at Linear
An observation that's landing well: designers and engineers often reason well about abstract product questions away from their tools, but put them in the IDE or Figma and they immediately dive into building exactly what was asked for. His conclusion: more designers should become PMs, because they'd be good at it.
Peter Yang, Product at Roblox
Noticed Amazon shipping a batch of AI infrastructure plays and seems cautiously intrigued: linked two AWS-focused announcements with a "wew i'm actually going to try this." Also quote-tweeted a question about whether Anthropic has been using Mythos internally to sustain its "recent insane velocity."
- https://x.com/petergyang/status/2041678988318543908
- https://x.com/petergyang/status/2041675995665612954
swyx
Three quick hits: giving due credit to the Latent Space podcast team for the Ryan Lopopolo / OpenAI harness engineering episode, commenting on Simon Willison's naming distinction between "prompt injection" and "lethal trifecta," and general commentary on Amazon's AI infrastructure moves.
- https://x.com/swyx/status/2041568051041063118
- https://x.com/swyx/status/2041739250421436591
- https://x.com/swyx/status/2041675995665612954
Podcasts
Latent Space — "Extreme Harness Engineering for Token Billionaires"
Ryan Lopopolo, OpenAI Frontier team, on what it actually looks like to run a software team where the agents do everything — including managing themselves.
The Takeaway
OpenAI's Frontier team built a 1M+ line codebase with zero human-written code over five months — and discovered that humans, not agents, became the bottleneck.
Ryan Lopopolo works on frontier product exploration at OpenAI, building enterprise agent deployment infrastructure. His background spans Snowflake, Brex, Stripe, and Citadel — companies that live and die by customer infrastructure at scale. So when he took on a greenfield internal tool project, he applied the same rigor, with one twist: he wouldn't write any code himself.
The experiment lasted five months. The result: a million lines of code, 1,500+ PRs, and a team operating at a pace he estimates was 10x faster than if he'd done it manually. The no-human-code constraint is what forced the method: whenever the model couldn't build what he asked for, he'd step in, decompose the task into smaller building blocks, and feed those back. For the first month and a half the team moved 10x slower than normal; after that, the tools and scaffolding had matured to the point where the agent could assemble everything.
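The decompose-and-retry loop can be sketched as a small recursive function. This is a minimal illustration under stated assumptions, not the Frontier team's tooling: `attempt` stands in for asking the agent to build a task and reporting success, `decompose` stands in for the human splitting a task into building blocks, and both names, plus the `max_depth` cutoff, are hypothetical.

```python
from typing import Callable, List, TypeVar

T = TypeVar("T")


def build(task: T,
          attempt: Callable[[T], bool],
          decompose: Callable[[T], List[T]],
          depth: int = 0,
          max_depth: int = 3) -> bool:
    """Ask the agent for `task`; on failure, build smaller blocks first, then retry.

    `attempt` and `decompose` are hypothetical callbacks supplied by the caller.
    """
    if attempt(task):
        return True
    if depth >= max_depth:
        return False  # give up rather than splitting forever
    parts = decompose(task)  # the human step: split into building blocks
    if not parts:
        return False
    for part in parts:
        if not build(part, attempt, decompose, depth + 1, max_depth):
            return False
    # With the building blocks in place, let the agent assemble the original task.
    return attempt(task)
```

The recursion mirrors the story: each successful building block becomes part of the scaffold, which is why later attempts at the larger task can succeed where the first one failed.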
"The models are trivially parallelizable. As many GPUs and tokens as I am willing to spend, I can have capacity to work on my codebase. The only fundamentally scarce thing is the synchronous human attention of my team. There are only so many hours in the day. I have to eat lunch. I would like to sleep."
The team moved to post-merge code review: PRs merge autonomously, and humans read them afterward for oversight rather than as gatekeepers. Build times were forced under one minute (down from 12+ minutes) because Codex 5.3 introduced background shells, which made the model less patient with blocking scripts. The team iterated through make → Bazel → Turbo → NX in a week and settled on NX not out of preference but because it was the option that met the one-minute target.
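A one-minute build target is easy to enforce mechanically. The sketch below is an assumption about how a team might guard it, not the Frontier team's actual setup: it runs any build command, kills it if it exceeds the budget, and reports success only if the command exits cleanly in time. `check_build_budget` and the 60-second default are illustrative names and values.

```python
import subprocess
import time
from typing import List, Tuple

BUILD_BUDGET_SECONDS = 60.0  # illustrative: the "under one minute" target


def check_build_budget(cmd: List[str],
                       budget_s: float = BUILD_BUDGET_SECONDS) -> Tuple[bool, float]:
    """Run a build command; return (ok, elapsed_seconds).

    ok is True only if the command exits with status 0 within the budget.
    An over-budget build is killed instead of being allowed to block.
    """
    start = time.monotonic()
    try:
        result = subprocess.run(cmd, capture_output=True, text=True,
                                timeout=budget_s)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising TimeoutExpired.
        return False, time.monotonic() - start
    elapsed = time.monotonic() - start
    return result.returncode == 0 and elapsed <= budget_s, elapsed
```

For example, `check_build_budget(["make", "build"])` (a hypothetical target) returns `(False, elapsed)` if the build runs past 60 seconds. Failing fast rather than blocking fits the observation that agents with background shells are less patient with long-running scripts.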
The architecture itself inverted the traditional scaffolding pattern: instead of setting up an environment for the coding agent to enter, the agent is the entry point, and it boots its own stack. This differs fundamentally from harnesses built for pre-reasoning models, which needed predefined state transitions. The new pattern gives the model the full box and lets it make intelligent choices.
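To make the inversion concrete, here is a toy contrast between the two control flows. It is a hypothetical illustration, not OpenAI's harness: tools are plain Python callables, `choose_next` stands in for the model picking its next action, and every tool name (`provision_env`, `boot_stack`, and so on) is invented.

```python
from typing import Callable, Dict, List, Optional

Tool = Callable[[], str]

# Harness-first (pre-reasoning style): the harness owns a fixed sequence of
# state transitions and hands the model a ready-made environment.
FIXED_TRANSITIONS = ["provision_env", "start_services", "hand_off_to_model"]


def harness_first(tools: Dict[str, Tool]) -> List[str]:
    """Run the predefined transitions in order and collect their outputs."""
    return [tools[step]() for step in FIXED_TRANSITIONS]


# Agent-first (inverted): the agent is the entry point. It sees the whole
# toolbox and decides step by step what to run, including booting its own stack.
def agent_first(tools: Dict[str, Tool],
                choose_next: Callable[[List[str], List[str]], Optional[str]],
                max_steps: int = 20) -> List[str]:
    """Let `choose_next` (a stand-in for the model) drive until it returns None."""
    transcript: List[str] = []
    for _ in range(max_steps):
        name = choose_next(transcript, sorted(tools))
        if name is None:  # the agent decides it is done
            break
        transcript.append(tools[name]())
    return transcript
```

The difference is who holds the control loop: in `harness_first` the order is fixed in code, while in `agent_first` the only fixed parts are the available tools and a step cap.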
Symphony — the Elixir-based orchestration layer that manages multiple agents — came out of a specific pain point. With 5.2, the team hit 5-10 PRs per engineer per day, but the constant context-switching between terminal windows to drive each agent was exhausting. Symphony spawns a daemon per task, handles rework autonomously (if a PR isn't mergeable after review, it trash-cans the worktree and starts fresh), and keeps a human loop only for the actual decision: merge or rework.
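Symphony itself is written in Elixir and its spec is not reproduced here. The Python sketch below only illustrates the per-task loop described above, and every name in it (`run_task`, `orchestrate`, `run_agent`, `is_mergeable`, `human_decides_merge`) is hypothetical. It assumes a caller-supplied function that runs an agent in a workspace and returns a PR identifier; a temporary directory stands in for a git worktree, and one thread per task stands in for one daemon per task.

```python
import shutil
import tempfile
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Optional


def run_task(task: str,
             run_agent: Callable[[str, str], str],
             is_mergeable: Callable[[str], bool],
             human_decides_merge: Callable[[str], bool],
             max_attempts: int = 3) -> Optional[str]:
    """Drive one task to a merged PR, starting from a fresh workspace each attempt."""
    for _ in range(max_attempts):
        workspace = tempfile.mkdtemp(prefix="symphony-task-")  # worktree stand-in
        try:
            pr = run_agent(task, workspace)
            # Unmergeable PRs are reworked autonomously; the human is consulted
            # only for the final merge-or-rework decision.
            if is_mergeable(pr) and human_decides_merge(pr):
                return pr
        finally:
            # Trash the workspace either way; a rework starts from scratch.
            shutil.rmtree(workspace, ignore_errors=True)
    return None  # attempts exhausted; escalate to a human


def orchestrate(tasks: List[str],
                run_agent: Callable[[str, str], str],
                is_mergeable: Callable[[str], bool],
                human_decides_merge: Callable[[str], bool]) -> Dict[str, Optional[str]]:
    """Run one worker per task concurrently and collect each task's outcome."""
    with ThreadPoolExecutor(max_workers=max(1, len(tasks))) as pool:
        futures = {t: pool.submit(run_task, t, run_agent, is_mergeable,
                                  human_decides_merge)
                   for t in tasks}
        return {t: f.result() for t, f in futures.items()}
```

Discarding the workspace instead of patching it mirrors the "trash-can the worktree and start fresh" behavior; the engineer no longer switches between terminals and only answers the merge question for each PR.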
On what models still can't do well: going from a blank product idea to a playable prototype in one shot, and the gnarliest refactorings. These still require synchronous human steering. But the trajectory is clear — every model release pushes further into what was previously considered human-only complexity.
"We went from low complexity tasks to low complexity and big tasks in both these directions. This is what it means to not bet against the model."
The full spec for Symphony and the harness engineering framework is published as a "ghost library": a spec so detailed that handing it to a coding agent such as Codex lets the agent reproduce the entire system locally.