The Deep Feed

01 — Lenny's Newsletter

The Agentic Shift: Engineering Beyond the Human Limit

How Braintrust is turning AI from a coding assistant into a rigorous infrastructure architect

By Claire Vo · 12 min read

Editor's note: As AI moves from writing snippets to managing infrastructure, the definition of a senior engineer is changing.

The traditional image of software engineering involves a human sitting before a terminal, carefully weighing the trade-offs of a database index or a new microservice architecture. This process is slow, prone to fatigue, and limited by the cognitive load of a single mind. But a shift is occurring. We are moving away from using AI as a mere autocomplete for code and toward a model where autonomous agents handle the heavy lifting of technical experimentation. Ankur Goyal, CEO of Braintrust, describes a reality where agents don't just suggest lines of code; they run week-long benchmark experiments across database formats and execution engines. They work while the engineer sleeps, testing every possible permutation of a system to find the one that actually performs. This isn't just about speed; it's about a level of exhaustive rigor that no human could ever maintain without burning out.

The Agent Line

To manage this transition, engineers must learn to draw what Goyal calls the 'agent line.' This is the boundary between what requires human judgment and what can be delegated to a tireless agent. Decisions involving high-level product direction or complex stakeholder management remain human domains. However, the technical execution—the 'how' of a specific implementation—is increasingly falling below that line. If you can encode 'what good looks like' into a scoring function, an agent can iterate on the implementation until it meets that standard. This turns the role of the engineer from a builder into a designer of evaluation systems. The goal is no longer to write the code, but to build the feedback loops that allow the code to write itself correctly.
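As a rough illustration of work that falls below the agent line, the loop below scores each attempt and stops once one meets the standard. It is a minimal sketch, not Braintrust's implementation: `pick_best`, `keyword_score`, the keyword list, and the sample attempts are all hypothetical stand-ins for a real agent and a real scoring function.

```python
from typing import Callable, Iterable, Optional, Tuple


def pick_best(
    candidates: Iterable[str],
    score: Callable[[str], float],
    threshold: float,
) -> Tuple[Optional[str], float]:
    """Return the first candidate that meets the threshold,
    or the best-scoring one seen if none does."""
    best, best_score = None, float("-inf")
    for candidate in candidates:
        s = score(candidate)
        if s > best_score:
            best, best_score = candidate, s
        if s >= threshold:
            break  # good enough: stop iterating
    return best, best_score


# Hypothetical scoring function: the fraction of required topics covered.
# A real one would encode an expert's view of "what good looks like".
def keyword_score(text: str) -> float:
    required = ("index", "latency", "rollback")
    return sum(word in text for word in required) / len(required)


# Stand-in for successive agent attempts at the same task.
attempts = [
    "add an index",
    "add an index and measure latency",
    "add an index, measure latency, document rollback",
    "never reached",
]
best, best_score = pick_best(attempts, keyword_score, threshold=1.0)
print(best, best_score)
```

The human contribution here is `keyword_score` and the threshold; the iteration itself is mechanical, which is the point of the agent line.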

The best teams won't just use AI to write more code; they'll build the systems that let AI improve the quality of the product itself.

This shift requires a massive investment in Continuous Integration (CI) and evaluations (evals). In the old world, an 'eval' might have been a simple unit test. In the age of AI, an eval is a sophisticated benchmark that measures the quality, accuracy, and performance of an agent's output against a set of high-fidelity standards. Without these, teams fall into the trap of 'vibe checks': subjectively deciding whether an AI's response feels right. Vibe checks do not scale. They lead to regressions, where fixing one bug introduces three more. To move fast, you must replace intuition with measurable, repeatable scoring functions.
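One way to picture the difference between a vibe check and a repeatable eval is a regression gate that compares per-case scores against a stored baseline. The sketch below is an assumption about how such a gate could look, not a description of any particular CI product; the case names, scores, and the `tolerance` default are invented for illustration.

```python
from typing import Dict, List


def find_regressions(
    baseline: Dict[str, float],
    current: Dict[str, float],
    tolerance: float = 0.01,
) -> List[str]:
    """Return the eval cases whose score dropped by more than
    `tolerance`, or that are missing from the current run."""
    regressions = []
    for name, old_score in baseline.items():
        new_score = current.get(name)
        if new_score is None or new_score < old_score - tolerance:
            regressions.append(name)
    return sorted(regressions)


# Hypothetical eval results from the last good build and the current one.
baseline = {"summarize": 0.90, "extract_dates": 0.80, "sql_gen": 0.75}
current = {"summarize": 0.93, "extract_dates": 0.70, "sql_gen": 0.75}

regressed = find_regressions(baseline, current)
print(regressed)
```

In a pipeline, a non-empty result would fail the build, so an improvement in one case cannot silently hide a drop in another.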

Framework for AI Delegation

Define the 'what' (the outcome) clearly before delegating the 'how'.
Build a scoring function that captures expert taste.
Use agents to run exhaustive, multi-day benchmarks.
Treat your CI/CD pipeline as the primary driver of engineering velocity.

Ultimately, the competitive advantage in the next decade won't go to the company with the most engineers, but to the company with the best evaluation infrastructure. The engineers who thrive will be those who can translate human expertise—the subtle 'taste' of a designer or the deep logic of a staff engineer—into the mathematical constraints that guide autonomous agents. We are moving from a world of manual construction to a world of automated orchestration.

Key Takeaway

Engineering velocity in the AI era is determined by the quality of your evaluation systems, not the speed of your typing.

02 — Lenny's Newsletter

The Mythos Trap: High Intelligence, Low Utility

A review of Anthropic's Claude Fable 5 and the cost of extreme thoroughness

By Lenny Rachitsky · 10 min read

Editor's note: Not all intelligence is useful for all tasks. Knowing when to use a 'smart' model versus a 'fast' one is a critical business decision.

Anthropic's release of Claude Fable 5 marks a new tier in the model hierarchy: the 'Mythos-class.' This is a model designed for extreme reasoning and technical depth, capable of outperforming almost everything else on the market in complex benchmarks. But as Claire Vo's recent testing reveals, raw intelligence is not a universal solvent. Fable 5 behaves like a hyper-thorough, slightly pedantic senior engineer. It will investigate every possible edge case and verify every detail before it dares to suggest a solution. While this makes it a powerhouse for hard technical problems and complex vision tasks, it also makes it a liability for the rapid, iterative work that defines most product development.

The Cost of Perfection

The friction of Fable 5 is both economic and functional. At $50 per million output tokens, it is an expensive tool. Using it for simple tasks is a waste of capital. More importantly, its thoroughness often becomes a hindrance. When asked to produce product specifications or PRDs, Fable 5 produces documents so dense and laden with internal references that they become nearly unreadable. It gets lost in the details, losing the ability to communicate the 'big picture' to human stakeholders. In the rush to be 120% sure, it fails to be 100% useful. It lacks the ability to be 'just enough'—the quality that allows a team to ship an MVP and iterate based on real user data.
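To make the economic side concrete, the quoted price can be turned into a back-of-the-envelope daily bill. Only the $50 per million output tokens figure comes from the review; the workload below (200 responses a day at 4,000 output tokens each) is a made-up example, and input-token costs are ignored.

```python
# Price quoted in the review, in US dollars per million output tokens.
PRICE_PER_MILLION_OUTPUT_TOKENS_USD = 50.0


def output_cost_usd(
    output_tokens: int,
    price_per_million: float = PRICE_PER_MILLION_OUTPUT_TOKENS_USD,
) -> float:
    """Cost of generating `output_tokens` tokens at the given price."""
    return output_tokens * price_per_million / 1_000_000


# Hypothetical workload: 200 responses a day, 4,000 output tokens each.
daily_tokens = 200 * 4_000
daily_cost = output_cost_usd(daily_tokens)
print(f"{daily_tokens} tokens/day -> ${daily_cost:.2f}/day")
```

Under these assumed volumes the output tokens alone come to $40 a day, which is why routing routine tasks to the most expensive model adds up quickly.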

Sometimes you need a model that is a little less thorough to actually ship something useful quickly.

The model also shows surprising weaknesses in areas where one might expect strength. Its design capabilities are rudimentary, producing uninspired and aesthetically poor layouts. Its execution is overly conservative; when asked to deliver an MVP, it often produces something so narrow that it provides almost no value to a customer. This suggests that the safety guardrails integrated into the Mythos-class models might be dampening the creative and entrepreneurial spirit required for rapid prototyping. It is a model built for stability and correctness, not for the messy, creative leaps required in early-stage product work.

When to Deploy Fable 5

Hard technical problems requiring extreme detail.
Long-horizon reasoning tasks.
Complex vision tasks like PDF parsing and document formatting.
Deep research where accuracy outweighs speed.

The takeaway for product leaders is a lesson in matching intelligence to complexity. Using Fable 5 for everything is a recipe for high costs and slow shipping. The goal is to build a tiered intelligence architecture: use cheap, fast models for the bulk of the work, and reserve the Mythos-class models for the specific, high-stakes problems where their depth justifies their cost and their slow, methodical nature.
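A tiered intelligence architecture can be as simple as a routing rule in front of the models. The sketch below is one possible shape, under loud assumptions: the tier names are placeholders rather than real model identifiers, and the two boolean flags are a simplification of whatever signals a team would actually use to judge stakes and difficulty.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    description: str
    high_stakes: bool
    needs_deep_reasoning: bool


# Placeholder tier names, not real model identifiers.
FAST_TIER = "fast-cheap-model"
DEEP_TIER = "mythos-class-model"


def choose_tier(task: Task) -> str:
    """Reserve the deep tier for work that is both high-stakes and hard;
    everything else goes to the fast, cheap tier."""
    if task.high_stakes and task.needs_deep_reasoning:
        return DEEP_TIER
    return FAST_TIER


tasks = [
    Task("draft an MVP spec", high_stakes=False, needs_deep_reasoning=False),
    Task("parse a complex PDF contract", high_stakes=True, needs_deep_reasoning=True),
    Task("rename variables", high_stakes=False, needs_deep_reasoning=False),
]
routing = {task.description: choose_tier(task) for task in tasks}
print(routing)
```

Defaulting to the fast tier keeps the expensive model as the exception, which matches the article's advice to use cheap models for the bulk of the work.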

Key Takeaway

Intelligence is a resource to be managed, not a feature to be maximized at all costs.

03 — Cal Newport

The Theater of Productivity

Why AI isn't fixing work: work was already broken

By Study Hacks · 7 min read

Editor's note: We often blame new technology for old problems. AI is merely a mirror reflecting our existing inefficiencies.

There is a growing narrative that AI is disrupting the workplace and causing chaos. But a closer look suggests a different reality. A recent survey of 6,000 digital workers found a striking paradox: while employees claimed AI saved them an average of 11 hours a week, only 13% reported any actual improvement in company performance. This gap suggests that the time 'saved' by AI is not being reinvested into productive work. Instead, it is being swallowed by new forms of inefficiency that are simply modern versions of old problems.

Botsitting and Tool Toggling

One reason for this inefficiency is a new phenomenon: 'botsitting.' This is the time workers spend waiting for AI agents to complete tasks, essentially staring at a progress bar. Furthermore, the sheer number of tools required to get a usable AI output creates a massive cognitive tax. Sixty percent of workers report running queries across multiple AI tools just to find a decent response. This constant toggling between applications mimics the fragmentation caused by email and Slack years ago. We haven't gained time; we've just changed the nature of the distraction.

AI isn't creating new problems; it is magnifying the types of problems that have long existed.

Then there is the issue of 'workplace theater'—or what Cal Newport calls 'pseudo-productivity.' This is the act of performing work to satisfy bosses and colleagues rather than focusing on the actual output. AI provides new props for this performance. It is easy to look busy while managing a swarm of AI agents, but if those agents aren't driving meaningful results, the work remains hollow. The technology allows us to move faster through the motions, but it doesn't necessarily move us closer to the goal.

The Three Drivers of AI Inefficiency

Botsitting: The idle time spent waiting for agentic completion.
Tool Fragmentation: The cognitive load of managing multiple AI interfaces.
Workplace Theater: Using AI to perform the appearance of work without substance.

The silver lining is that the arrival of AI has forced business leaders to finally pay attention to these systemic failures. For years, the inefficiencies of digital work were accepted as the cost of doing business. Now, as companies look to justify their AI investments, they are being forced to confront the fact that their workflows are fundamentally broken. The solution isn't more AI; it's better design of how humans and machines actually interact.

Key Takeaway

Efficiency is not the same as productivity; don't mistake moving faster for doing more.

04 — Lenny's Newsletter

The Proven, Better, New Framework

Lessons in product success from Zynga founder Mark Pincus

By Lenny Rachitsky · 9 min read

Editor's note: True innovation often starts with imitation. The most successful products aren't always the most original.

Mark Pincus, the founder of Zynga, has built a career on creating massive consumer hits like Words With Friends and FarmVille. His success isn't a result of constant, radical reinvention, but rather a disciplined adherence to a specific pattern. He calls it the 'Proven, Better, New' framework. The logic is deceptively simple: first, find something that is already proven to work in the market. Second, make it better—so much better that users feel an immediate, visceral 'yes' to the improvement. Only then do you add something truly new. This approach mitigates the massive risk inherent in trying to invent a new category from scratch.

The Fallacy of the Great Idea

Pincus offers a sobering perspective on intuition: your instincts are likely right 95% of the time, but your specific ideas are wrong 75% of the time. This distinction is vital for founders. It means you should trust your sense of what users want, but be extremely skeptical of the specific features or products you think will get them there. Successful product development is an exercise in constant course correction. It is about being willing to 'kill hope before hope kills you'—abandoning a cherished idea the moment the data suggests it won't work.

Your instincts are right 95% of the time, but your ideas are wrong 75% of the time.

This philosophy leads to a counterintuitive strategy: being less ambitious in the initial stages can actually lead to more ambitious outcomes. By focusing on perfecting a known mechanic or a proven social interaction, you build a foundation of user trust and engagement. Once you have captured the market with a 'better' version of an existing concept, you have the resources and the audience to experiment with the 'new.' This is how Zynga dominated the social gaming era—not by inventing new genres, but by taking existing social behaviors and making them more engaging and accessible.

The Pincus Playbook

Identify a proven market behavior.
Execute on the 'Better'—improve the core loop until it is undeniable.
Introduce the 'New' only after the foundation is solid.
Kill failing ideas quickly to preserve capital and morale.

For modern entrepreneurs, especially those working in the AI space, this is a crucial reminder. The temptation to build something entirely unprecedented is high, but the path to scale often lies in taking a known human need and applying a superior technological layer to it. Don't try to invent a new way to communicate; just make the current way of communicating significantly more effective.

Key Takeaway

Don't hunt for original ideas; hunt for proven patterns that you can execute better than anyone else.

05 — simonwillison.net

The Human Bottleneck

Why AI won't replace engineers, but will redefine their value

By Simon Willison · 8 min read

Editor's note: The fear of automation is as old as the industrial revolution. In software, the bottleneck is shifting from syntax to systems.

The anxiety surrounding AI and job displacement in software engineering is intense. If a model can write code faster than a human, the logical conclusion seems to be mass layoffs. However, the data does not support this. In jurisdictions where companies are required to disclose AI-related layoffs, the numbers remain remarkably low. This is because the core of software engineering has never actually been about the act of typing code into a computer. Coding is merely the final, most visible stage of a much more complex cognitive process.

Beyond the Syntax

When we break down what an engineer actually does, the bottlenecks that resist automation become clear. AI is excellent at the 'typing' phase, but it struggles with the three pillars of professional engineering: deciding what to build, verifying that it is correct, and maintaining a deep, contextual understanding of the business and the environment. An AI can generate a function, but it cannot sit in a meeting with a product manager to understand why a specific feature is necessary for a specific customer segment. It cannot be held accountable for the security or reliability of a system in a way that a human professional can.

The value I produce will still be reliant on how deeply I understand both the problems and the solutions.

The role of the engineer is shifting from a 'writer' to a 'reviewer and architect.' As AI handles the grunt work of boilerplate and syntax, the engineer's value moves up the stack. The focus shifts to specification, verification, and system design. The bottleneck is no longer how fast you can write code, but how accurately you can define the problem and how effectively you can verify the solution. This requires a deeper, more holistic understanding of the entire stack, from the business logic down to the infrastructure.

The Three Pillars of Human Engineering Value

Specification: Deciding and defining exactly what needs to be built.
Verification: Ensuring the output is correct, secure, and maintainable.
Contextual Understanding: Navigating the business, technical, and human environment.

The engineers who will be replaced are those who view themselves as mere translators of requirements into syntax. The engineers who will thrive are those who embrace their role as the ultimate authority on the 'why' and the 'how' of a system. AI is a tool that increases your leverage, but your leverage is still limited by the depth of your understanding.

Key Takeaway

Software engineering is a problem-solving profession that happens to use code; the code is the easy part.

06 — The Marginalian

The Seamstress of the Sea

Jeanne Villepreux-Power and the invention of the scientific gaze

By Maria Popova · 15 min read

Editor's note: True discovery requires more than just intelligence; it requires the invention of new ways to observe the world.

In the early 19th century, the scientific world was obsessed with a mystery: the shell of the Argonauta argo. Naturalists debated whether the shell was a permanent part of the creature or a temporary home, like that of a hermit crab. But observing this creature was nearly impossible. The argonaut is a skittish, shy animal that retreats into the depths at the slightest hint of approach. For decades, scientists were limited to studying dead, preserved specimens, which offered only a static and incomplete picture of the animal's life. They were looking at the world through a keyhole, missing the very essence of the subject they sought to understand.

The Invention of the Aquarium

Jeanne Villepreux-Power, a self-taught naturalist and former seamstress, realized that to solve the mystery, she had to change the method of observation. She didn't just need better microscopes; she needed a way to bring the living subject into a controlled environment without destroying its nature. To do this, she pioneered the concept of the aquarium. She constructed elaborate, anchored cages off the coast of Sicily, allowing her to observe living argonauts in their natural habitat through observation windows. Later, she moved this capability ashore, creating tanks that allowed for long-term, undisturbed study. She didn't just observe the animal; she engineered a way to see it.

She didn't just observe the animal; she engineered a way to see it.

Her work was a triumph of both patience and technical ingenuity. For ten years, she rowed her boat to these cages, often in long skirts and cold water, to feed and monitor the creatures. By observing the living animal, she was able to prove that the shell was not a hand-me-down, but a biological marvel produced by the female itself. Her observations of the shell's growth and the animal's behavior provided the first true understanding of this cephalopod's life cycle. She moved biology from the realm of the specimen to the realm of the living system.

The Methodology of Discovery

Identify the limitation of current observation methods.
Engineer a new environment that preserves the subject's natural state.
Prioritize long-term, longitudinal observation over single snapshots.
Combine technical invention with disciplined, repetitive study.

Villepreux-Power's legacy is a reminder that breakthroughs often come not from better answers, but from better questions and better ways of looking. In our current era of rapid technological change, we often focus on the speed of our processing. But Villepreux-Power teaches us that the most profound insights come when we take the time to build the frameworks that allow us to see the world as it actually is, rather than how we expect it to be.

Key Takeaway

The most significant scientific leaps are often made by those who invent new ways to observe reality.