2026-04-10

AI Builders Digest — 2026-04-10

X / TWITTER

Andrej Karpathy — AI Researcher, Founding Team at OpenAI, Ex-Director of AI at Tesla

Someone recently suggested that the OpenAI moment was so big because it's the first time a large group of non-technical people experienced the latest agentic models. But Karpathy sees a growing gap in how people understand AI capability depending on their tier of use. Many tried the free tier of ChatGPT last year and let that inform their views — but those older models don't reflect what's possible with state-of-the-art agentic models today, especially Codex and Claude Code. Even the $200/month tier has capabilities that are "peaky" in technical domains like programming and math, which is why the biggest improvements are most visible to professional users in those areas. His takeaway: the two groups of people are speaking past each other. The free Advanced Voice Mode fumbles simple queries, while Codex will autonomously restructure an entire codebase or find and exploit vulnerabilities — both simultaneously true, both reflecting very different models.

"These free and old/deprecated models don't reflect the capability in the latest round of state of the art agentic models of this year, especially OpenAI Codex and Claude Code."

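Someone recently said that the OpenAI moment had such an impact because it was the first time non-technical people experienced the latest agentic models firsthand. Karpathy, however, sees a huge gap in how people at different tiers of use perceive AI capability. Many people tried the free version of ChatGPT last year and judge AI by that, but those older models do not represent today's most advanced agentic models, especially Codex and Claude Code. Even with the top-tier $200/month models, the capability shows up mainly in highly technical domains such as programming and math, which is why ordinary users and professionals perceive such different things. His conclusion: the two groups are talking past each other. The free Advanced Voice Mode stumbles on simple questions, while Codex can run autonomously for an hour, restructure an entire codebase, and even find and exploit system vulnerabilities. Both facts hold at once, and each reflects a completely different model.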

X (formerly Twitter)

Andrej Karpathy (@karpathy) on X

Judging by my tl there is a growing gap in understanding of AI capability. The first issue I think is around recency and tier of use. I think a lot of people tried the free tier of ChatGPT somewhere last year and allowed it to inform their views on AI a little too much. This is

Thariq — Claude Code at Anthropic

Thariq thinks "prompting" will remain an incredibly high-leverage skill, similar to writing or public speaking. It's the skill of talking to agents, mediated by the harness. His main goal is to grow the bandwidth between humans and agents, to help us understand each other better.

"It is the skill of talking to agents, mediated by the harness. My main goal is to grow the bandwidth between humans and agents, to help us understand each other better."

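Thariq believes "prompt engineering" will remain an extremely high-leverage skill, much like writing or public speaking. At its core it is the skill of communicating between humans and agents, with the harness as the medium. His main personal goal is to widen the bandwidth between humans and agents and help each side understand the other better.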

X (formerly Twitter)

Thariq (@trq212) on X

I think "prompting" will keep being an incredibly high-leverage skill, like writing or public speaking. It is the skill of talking to agents, mediated by the harness. My main goal is to grow the bandwidth between humans and agents, to help us understand each other better.

Aaron Levie — CEO at Box

Levie thinks everyone substantially underestimates the total demand for software and automation in areas that don't feel like "software." Most companies haven't been able to bring automation to most areas of work because it was too complex or costly. Outside of tech and large banks, companies have to ration engineering resources very selectively. Agents bring down the cost of this work, and because many parts of it are now possible, companies will light these projects up. His examples: CPG and retail companies connecting their marketing stacks, pharma automating more tests and simulations, bankers running 10x as many analyses on every scenario, and healthcare bringing automation to every step of the process. This is why he expects demand to continue for anyone technical enough to execute the work, and why he thinks the jobs arguments will be wrong.

"This is why there demand will continue for anyone technical enough to execute this work, and why the jobs arguments will be wrong."

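Levie believes everyone greatly underestimates the total demand for software and automation in areas that don't look like "software." Most companies could not bring automation into most of their work because it used to be too complex or too expensive. Outside of tech companies and large banks, most businesses have to be extremely sparing with engineering resources, which means most things simply never get funded. Agents sharply reduce the cost of this work, and many steps that could not be automated before are now feasible, so companies will start launching these projects. Consumer goods and retail connecting their marketing systems, pharma automating more tests and simulations, bankers running ten times as many analyses for every scenario, healthcare introducing automation at every step of the process: this is why demand for technical talent will stay strong, and why the "AI causes unemployment" argument does not hold up.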

X (formerly Twitter)

Aaron Levie (@levie) on X

I think everyone is substantially underestimating the total demand for software and automation in areas that don’t feel like “software”. Not talking about software that’s another app on your phone. Software that just automates things for companies all day long. Most companies

Levie also points out that prompting should just encapsulate giving the agent everything it needs to perform the task — like giving clear instructions to a brand new colleague who just joined your team. This is high leverage.

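He also points out that prompting is essentially giving the agent everything it needs to complete the task, just like clearly briefing a new colleague who has just joined the team. That is a high-leverage skill in its own right.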

X (formerly Twitter)

Aaron Levie (@levie) on X

The idea that prompting would be useless is like if giving clear instructions to a brand new colleague who just joined your team is useless. “Prompting” should just encapsulate the entirely of giving the agent everything it needs to perform the task. This is high leverage.

Guillermo Rauch — CEO at Vercel

Rauch declares that "Agentic Infrastructure is the future of the cloud." Three layers: First, coding agents like Claude Code and Codex need infra that "clicks" for agents, not just developers. Second, deploying agents is like deploying pages — long-running compute, sandboxes, and token delivery networks are the building blocks. Third, Vercel itself becomes an agent: self-configuring (serverless), plus self-healing, self-optimizing, self-securing, with the agent holding the pager.

"Agentic Infrastructure will make existing companies more efficient and support the next generation of AI-native startups."

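Rauch declares that "agentic infrastructure is the future of the cloud." There are three layers. First, coding agents such as Claude Code and Codex need infrastructure built to fit agents, not only human developers. Second, deploying agents is like deploying web pages: long-running compute, sandboxes, and token delivery networks are the new building blocks. Third, Vercel itself will become an agent: on top of being self-configuring (serverless), it adds self-healing, self-optimizing, and self-securing, with the agent taking the on-call role.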

X (formerly Twitter)

Guillermo Rauch (@rauchg) on X

Agentic Infrastructure is the future of the cloud ① For coding agents If you use Claude Code, Codex, Cursor, you need infra that 'clicks' for your agents, not just devs. ② To deploy agents Pages → Agents. Long-running compute, sandboxes, and our token delivery network are

Alex Albert — Research at Anthropic

Allowing Sonnet to "phone a friend" (call Opus) increases performance while also reducing total cost, since it reduces tokens spent trying to solve more complex tasks.

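Letting Sonnet "phone a friend" (that is, call Opus) improves performance and lowers cost at the same time, because it reduces the number of tokens the model spends trying to solve more complex tasks.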

X (formerly Twitter)

Alex Albert (@alexalbert__) on X

Allowing Sonnet to "phone a friend" (i.e. call Opus) increases performance while also reducing total cost since it reduces tokens spent trying to solve more complex tasks
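The escalation idea in this post can be sketched as a small routing function. This is a minimal illustration only, not how Sonnet or Opus are actually wired together: `Reply`, `solve`, `small_model`, and `large_model` are hypothetical names, the models are stubs, and the token counts are made up. The assumption is that the smaller model can signal early that a task is beyond it instead of spending many tokens on a failed attempt.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class Reply:
    text: str
    tokens: int               # tokens spent producing this reply (made-up numbers)
    needs_help: bool = False  # hypothetical flag: the model asks to escalate


Model = Callable[[str], Reply]


def solve(task: str, small: Model, large: Model) -> Tuple[str, int]:
    """Try the small model first; hand off to the large one only on request."""
    first = small(task)
    if not first.needs_help:
        return first.text, first.tokens
    second = large(task)
    # Total cost includes the small model's short, abandoned attempt.
    return second.text, first.tokens + second.tokens


# Hypothetical stand-ins for real model calls.
def small_model(task: str) -> Reply:
    if "hard" in task:
        return Reply(text="", tokens=20, needs_help=True)
    return Reply(text=f"small:{task}", tokens=50)


def large_model(task: str) -> Reply:
    return Reply(text=f"large:{task}", tokens=200)


easy_answer, easy_cost = solve("easy sum", small_model, large_model)
hard_answer, hard_cost = solve("hard proof", small_model, large_model)
```

In this toy setup the easy task never touches the large model, and the hard task pays only a small overhead before the handoff. Whether that nets out cheaper in practice depends on how early and how reliably the small model asks for help.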

Peter Yang — Product at Roblox

Training my kids for an AGI-proof career.

Training the kids from an early age in career skills for the AGI era.

X (formerly Twitter)

Peter Yang (@petergyang) on X

Training my kids for an AGI proof career

Josh Woodward — VP at Google (Gemini / Google Labs)

In under 50 days, over 100 million songs have been generated on Gemini. They're unlocking the full power of their music model Lyria 3: up to 5 full-length tracks per day (~3 mins each), plus images and video. "You have the ideas. Gemini has the tools."

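In under 50 days, more than 100 million songs have been generated on Gemini. The team is unlocking the full capability of its music model Lyria 3: up to 5 full-length tracks per day (about 3 minutes each), with support for images and video as well. "You have the ideas. Gemini has the tools."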

X (formerly Twitter)

Josh Woodward (@joshwoodward) on X

Create your first full song today. For free. In under 50 days, over 100 million songs have been generated on @GeminiApp. To celebrate, we’re unlocking the full power of our music model, Lyria 3. Here’s what you get starting today: 🎵 Generate up to 5 full-length tracks every

Garry Tan — President & CEO at Y Combinator

Tan released GBrain — an MIT-licensed open source tool that gives OpenClaw or Hermes Agent perfect total recall across 10,000+ markdown files. It's his actual OpenClaw/Hermes Agent setup. Works on Hermes Agent with the same install script.

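Tan released GBrain, an MIT-licensed open source tool that gives OpenClaw or Hermes Agent perfect total recall across 10,000+ markdown files. It is the OpenClaw/Hermes Agent setup he actually uses himself. The same install script also works for Hermes Agent.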

X (formerly Twitter)

Garry Tan (@garrytan) on X

If you want your OpenClaw or Hermes Agent to be able to have perfect total recall of all 10,000+ markdown files, GBrain is here to help. It's exactly my OpenClaw/Hermes Agent setup. MIT-licensed open source. Hope it helps you build your mini-AGI. https://t.co/yFpFU4pn5b

No notable posts

Swyx (builder at aidotengineer, latentspacepod) — shared AGI Pill bottle lore from an event, no substantive posts
Cat Wu (claude code + cowork at Anthropic) — announced faster Claude Code setup with Bedrock and Vertex
Ryo Lu (Design at Cursor) — shared "less is more" design quote
Matt Turck (VC at FirstMarkCap) — shared database comparison (SurrealDB > EdgeDB > MongoDB), made a crypto winter joke
Zara Zhang — noted AI is most useful outside her comfort zone (coding) rather than inside it (writing), which she enjoys doing manually
Nikunj Kothari — shared a side project, joked about AI detection getting it wrong on his "mostly hand written" post

PODCASTS

Unsupervised Learning — Ep 84: OpenAI's Chief Scientist on Continual Learning Hype, RL Beyond Code, & Future Alignment Directions

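Noam Brown is OpenAI's Chief Scientist and one of the most important researchers in RL. He led development of OpenAI's o1/o3 reasoning models, as well as earlier RL research related to poker and Dota. This episode is the most in-depth conversation with him to date and touches on his view of the overall trajectory of AI development.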

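Continual learning is not a new problem

Many people assume RL is hard to extend beyond code and math because models lack "continual learning." Brown's view is surprising: continual learning has been a core capability of GPT models all along. "Learning to learn in context" has existed since the GPT-2 era, and OpenAI has kept pushing along that path. In his view, the startups hyping continual-learning solutions do not really understand what problem they are solving.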

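Code is the perfect training ground for RL

Brown explains why RL has progressed so quickly on code: the reward function is verifiable. A unit test either passes or fails. Tasks in medicine, law, and finance can also have reward functions, but they are much harder to verify. This is both why o1/o3 advanced so rapidly on code and math and why extending those capabilities to other domains will take longer.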
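A verifiable reward of the kind described here can be sketched in a few lines. This is an illustrative toy under stated assumptions, not OpenAI's training setup: `unit_test_reward`, `CASES`, and the two candidate functions are hypothetical, and a real pipeline would run generated code in a sandbox rather than call it directly.

```python
from typing import Callable, Iterable, Tuple


def unit_test_reward(
    candidate: Callable[[int], int],
    cases: Iterable[Tuple[int, int]],
) -> float:
    """Binary reward: 1.0 only if every test case passes, otherwise 0.0."""
    for arg, expected in cases:
        try:
            if candidate(arg) != expected:
                return 0.0
        except Exception:
            # A crash counts as a failed test, not as a partial success.
            return 0.0
    return 1.0


# Hypothetical test cases for a "square the input" task.
CASES = [(0, 0), (3, 9), (-2, 4)]


def good_square(x: int) -> int:
    return x * x


def buggy_square(x: int) -> int:
    return x * 2  # passes the first case, fails the second


good_reward = unit_test_reward(good_square, CASES)
buggy_reward = unit_test_reward(buggy_square, CASES)
```

The point of the sketch is that the check needs no human judgment: the same cases give the same verdict every time. That is much harder to arrange for a medical or legal answer.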

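Defining research-grade AI

Brown gives a precise definition: the difference between a research intern and a fully autonomous researcher lies in the time horizon of the task, that is, how much specific guidance you need to give the model. By that standard, he says, it is "not quite this year," but a fully autonomous AI researcher within the next few years is "not impossible." He has already seen GPT-5.2 Pro produce some genuinely impactful small ideas. They are still tiny, but it is a clear signal.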

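A distinctive view on AI safety

One of Brown's most interesting technical points is the value of chain-of-thought monitoring. Because a reasoning model's chain of thought is not directly supervised by the training signal, it is essentially a window into the model's real motivations: it shows what the model is actually thinking and can reveal what it might be trying to hide. This is in the same spirit as mechanistic interpretability, with the advantage that the thoughts are in natural language and easier to understand. Brown believes this capability will scale as models work over longer and longer periods.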

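Advice for builders

Brown was asked whether companies should do RL themselves (domain fine-tuning an open-source model on their own data). His answer was direct: contextual (in-context) learning keeps getting better and may eventually be more efficient than running a dedicated RL pipeline. You still need to collect data and define tasks, but feeding these into the model's context may be more valuable than doing RL on your own model.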
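The "feed it into the context" alternative might look like the sketch below: collected task examples are packed into a few-shot prompt instead of being used for fine-tuning. `build_prompt`, the example data, and the `Input:`/`Output:` layout are all assumptions made for illustration. No model is called here, and the resulting string would be passed to whichever model API you use.

```python
from typing import List, Tuple


def build_prompt(
    task_description: str,
    examples: List[Tuple[str, str]],
    query: str,
) -> str:
    """Pack a task definition and collected examples into one prompt string."""
    lines = [task_description, ""]
    for example_input, example_output in examples:
        lines.append(f"Input: {example_input}")
        lines.append(f"Output: {example_output}")
        lines.append("")
    # The final block is left open for the model to complete.
    lines.append(f"Input: {query}")
    lines.append("Output:")
    return "\n".join(lines)


# Hypothetical domain data a company might have collected.
EXAMPLES = [
    ("Invoice 1042 is 30 days overdue", "escalate"),
    ("Invoice 2210 was paid yesterday", "close"),
]

prompt = build_prompt(
    "Classify each invoice note as 'escalate' or 'close'.",
    EXAMPLES,
    "Invoice 3307 is 45 days overdue",
)
```

The data collection and task definition are the same work you would do for an RL pipeline. The difference is that updating the behavior means editing `EXAMPLES`, not retraining a model.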

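A prediction to think about

Brown mentioned a problem from his PhD research and seeing a model come up, within an hour, with ideas that had taken him a week or two to reach. He described the feeling as "very strange," like watching AlphaGo play creative moves that should not have been there. "Interesting things shouldn't keep showing up indefinitely, but it really is happening."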

OpenAI's Chief Scientist Noam Brown joins Unsupervised Learning for a wide-ranging conversation covering RL scaling, alignment, AI for science, and what AGI timelines actually look like from inside the lab.

YouTube

Unsupervised Learning: With Jacob Effron

On Unsupervised Learning we probe the sharpest minds in AI in search for the truth about what’s real today, what will be real in the future and what it all means for businesses and the world. If you’re a builder, researcher or investor navigating the AI world, this podcast will help you deconstruct and understand the most important breakthroughs and see a clearer picture of reality. Subscribe to this show to stay up to date on our latest episodes. Unsupervised Learning is a podcast by Redpoint Ventures, an early-stage venture capital fund that has invested in companies like Snowflake, Stripe, and Mistral. Hosted by Redpoint investor Jacob Effron alongside Patrick Chase, Jordan Segall and Erica Brescia.

Generated through the Follow Builders skill: https://github.com/zarazhangrui/follow-builders