The Deep Feed

01 — Lenny's Newsletter

The Agent Line: Engineering Beyond the Typing Phase

How Braintrust is turning AI from a chatbot into a tireless infrastructure engineer

By Claire Vo · 12 min read

Editor's note: A deep look at how top-tier engineering teams are moving past simple code generation to actual autonomous system optimization.

The current discourse around AI coding often focuses on the novelty of generating a single function or a small script. This is a shallow way to view the technology. For companies like Notion, Stripe, and Vercel, the real value does not lie in writing boilerplate, but in the ability to manage complex, high-stakes infrastructure. Ankur Goyal, CEO of Braintrust, argues that the next frontier is not just 'writing code' but using agents to perform the kind of exhaustive, repetitive benchmarking that no human engineer has the patience or time to execute. Imagine a week-long experiment where an agent tests every possible database index, column store format, and execution engine to find the one configuration that makes a query run 10% faster. This is not just assistance; it is a fundamental shift in how technical problems are solved.
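At its core, the week-long experiment described above is an exhaustive sweep. You enumerate every combination of options, time the query under each one, and keep the fastest. What follows is a minimal Python sketch. It assumes the agent can apply a configuration and rerun the query through some callable. The option names in `SEARCH_SPACE` and the `fake_cost` model are hypothetical placeholders. They do not come from Braintrust's tooling or from any particular database's settings.

```python
import itertools
import statistics
import time
from typing import Callable, Dict, List, Tuple

Config = Dict[str, str]

# Hypothetical search space; real option names depend on the database under test.
SEARCH_SPACE: Dict[str, List[str]] = {
    "index": ["none", "btree", "hash"],
    "storage": ["row", "columnar"],
    "engine": ["interpreted", "vectorized"],
}


def all_configs(space: Dict[str, List[str]]) -> List[Config]:
    """Enumerate every combination of options (the Cartesian product)."""
    keys = list(space)
    return [dict(zip(keys, combo)) for combo in itertools.product(*(space[k] for k in keys))]


def time_query(run: Callable[[Config], None], config: Config, trials: int = 5) -> float:
    """Median wall-clock seconds for run(config); the median damps noisy outliers."""
    samples = []
    for _ in range(trials):
        start = time.perf_counter()
        run(config)
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def sweep(space: Dict[str, List[str]], measure: Callable[[Config], float]) -> Tuple[Config, float]:
    """Measure every configuration and return the cheapest one with its cost."""
    results = [(cfg, measure(cfg)) for cfg in all_configs(space)]
    return min(results, key=lambda r: r[1])


# Hypothetical cost multipliers standing in for real timings so the sketch runs offline.
FAKE_FACTORS = {"none": 1.0, "btree": 0.7, "hash": 0.8, "row": 1.0,
                "columnar": 0.9, "interpreted": 1.0, "vectorized": 0.85}


def fake_cost(cfg: Config) -> float:
    """Modeled cost: the product of each chosen option's multiplier."""
    cost = 1.0
    for option in cfg.values():
        cost *= FAKE_FACTORS[option]
    return cost


baseline = {"index": "none", "storage": "row", "engine": "interpreted"}
best, cost = sweep(SEARCH_SPACE, fake_cost)
print(best, f"{1 - cost / fake_cost(baseline):.1%} lower modeled cost than the baseline")
```

To point the sweep at a real system, you would swap `fake_cost` for something like `lambda cfg: time_query(run_real_query, cfg)`. Here `run_real_query` is a hypothetical function that applies the configuration and executes the query. The number of configurations grows multiplicatively with every new option (here 3 × 2 × 2 = 12). That is why this kind of search suits a background agent better than a person.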

The Agent Line Framework

To manage this shift, Goyal introduces the 'agent line.' This is a mental model for deciding which parts of a technical workflow can be handed off to an autonomous agent and which require human oversight. Below the line are tasks that are repetitive, require massive data processing, or involve testing thousands of permutations. Above the line are the decisions involving architecture, business intent, and the final accountability for what is shipped. The goal is to move as much as possible below the line, allowing engineers to focus on the high-level direction rather than the tedious grind of manual testing and verification.

There is no excuse to skip rigorous benchmarking now that agents can do it tirelessly.

One of the most significant hurdles in AI adoption is the 'vibe check'—the tendency for engineers to look at an AI output and decide if it 'looks right.' This approach is dangerous for production systems. Instead, Goyal advocates for 'evals,' or evaluations. Evals are the modern equivalent of a Product Requirements Document (PRD). They encode exactly what 'good' looks like into a scoring function. By building these functions, teams can turn subjective taste—such as a designer's eye for layout—into a repeatable, automated metric. This allows quality to scale beyond the limited attention span of a single human expert.
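To make "encode what good looks like into a scoring function" concrete, here is a minimal Python sketch of an eval. It assumes a hypothetical rubric for AI-written incident summaries. Each test case lists the facts a good answer must mention, plus a word budget, and the scorer returns a number between 0 and 1. The 70/30 weighting and the 0.8 pass threshold are illustrative choices. This is plain Python, not Braintrust's own SDK.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List, Tuple


@dataclass
class EvalCase:
    """One input plus the facts a good answer must contain (hypothetical rubric)."""
    prompt: str
    required_facts: List[str]
    max_words: int = 80


def score(case: EvalCase, output: str) -> float:
    """Score in [0, 1]: 70% weight on fact coverage, 30% on staying within the word budget."""
    text = output.lower()
    if case.required_facts:
        covered = sum(fact.lower() in text for fact in case.required_facts)
        coverage = covered / len(case.required_facts)
    else:
        coverage = 1.0
    concise = 1.0 if len(output.split()) <= case.max_words else 0.0
    return 0.7 * coverage + 0.3 * concise


def run_eval(cases: List[EvalCase], outputs: List[str], threshold: float = 0.8) -> Tuple[float, bool]:
    """Average score over all cases and whether it clears the quality bar."""
    if len(cases) != len(outputs):
        raise ValueError("need exactly one output per case")
    avg = mean(score(case, out) for case, out in zip(cases, outputs))
    return avg, avg >= threshold


cases = [
    EvalCase("Summarize the outage", ["database", "rollback"], max_words=20),
    EvalCase("Summarize the fix", ["index", "latency"], max_words=20),
]
outputs = [
    "The database failover forced a rollback of the deploy.",
    "Adding an index cut the query time.",  # never mentions latency
]
print(run_eval(cases, outputs))
```

A "vibe check" would likely pass both outputs. The scorer instead records that the second one dropped a required fact, and the suite can be rerun unchanged on every new model or prompt.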

Key components of an AI-accelerated engineering workflow:

Continuous Integration (CI) for AI agents to ensure code quality remains high
Automated benchmarking to replace manual performance testing
Rigorous evaluation functions to replace subjective 'vibe checks'
The use of background agents to run long-horizon experiments

Ultimately, the speed of an engineering team in the AI era will be determined by its CI/CD pipeline. If you can't verify what an agent produces, the agent becomes a liability rather than an asset. The highest-leverage move for a CTO today is not buying more tokens but fixing the feedback loops that measure and validate what those tokens produce. Speed without verification is just a faster way to break your production environment.
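A merge gate makes the point about verification concrete. CI collects measurements for each agent-produced change and rejects the change unless it holds the line on tests, eval quality and performance. The sketch below is a hypothetical Python illustration. The report fields, the baseline values and the tolerances (a 0.02 eval drop, a 5% latency regression) are assumptions, not settings from any real CI product.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class ChangeReport:
    """Measurements CI collects for one agent-produced change (hypothetical fields)."""
    tests_passed: bool
    eval_score: float      # mean score from an eval suite, in [0, 1]
    p50_latency_ms: float  # median latency from a benchmark run


@dataclass
class Baseline:
    """The same measurements taken on the main branch."""
    eval_score: float
    p50_latency_ms: float


def gate(report: ChangeReport, baseline: Baseline,
         max_eval_drop: float = 0.02,
         max_latency_regression: float = 0.05) -> Tuple[bool, List[str]]:
    """Return (accepted, reasons); any reason blocks the merge."""
    reasons: List[str] = []
    if not report.tests_passed:
        reasons.append("test suite failed")
    if report.eval_score < baseline.eval_score - max_eval_drop:
        reasons.append(f"eval score fell from {baseline.eval_score:.2f} to {report.eval_score:.2f}")
    latency_limit = baseline.p50_latency_ms * (1 + max_latency_regression)
    if report.p50_latency_ms > latency_limit:
        reasons.append(f"p50 latency {report.p50_latency_ms:.0f} ms exceeds {latency_limit:.0f} ms")
    return not reasons, reasons


main = Baseline(eval_score=0.85, p50_latency_ms=100.0)
print(gate(ChangeReport(True, 0.86, 104.0), main))
print(gate(ChangeReport(True, 0.80, 120.0), main))
```

The tolerances are the design choice that matters. A gate with zero slack will flap on benchmark noise. A loose one lets slow regressions accumulate one "acceptable" change at a time.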

Key Takeaway

The value of AI in engineering is found in its ability to perform exhaustive, boring, and high-scale verification that humans cannot match.

02 — Lenny's Newsletter

The Mythos Paradox

Testing the limits of Anthropic's Fable 5

By Lenny Rachitsky · 10 min read

Editor's note: A critical review of the new high-intelligence model tier and why more 'intelligence' isn't always better for product work.

Anthropic has released Fable 5, a model that sits in a new, expensive tier of intelligence. It is designed to act like a 'seasoned engineer'—thorough, autonomous, and obsessively detailed. While it crushes benchmarks, particularly in coding and vision, the real-world application of such a model reveals a strange tension. When you ask Fable 5 to perform a task, it doesn't just give you an answer; it investigates every corner of the problem to ensure it is 120% certain. This level of rigor is a superpower for deep technical work, but it can be a massive bottleneck for the fast-paced, iterative nature of product development.

The Cost of Over-Thoroughness

At $10 per million input tokens and $50 per million output tokens, Fable 5 is a luxury tool. Using it for every task is a recipe for financial disaster. The model's tendency to be 'too thorough' means it can become wrapped around the axle of detail. In testing, Claire Vo found that while the model is brilliant at parsing complex PDFs or creating precise handwriting worksheets, its writing for product specs is nearly unreadable. It produces dense, hyper-detailed blocks of text that make it impossible to see the forest for the trees. For a product manager who needs a clear, concise PRD, Fable 5 is often the wrong tool.
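The pricing turns easily into per-request arithmetic. Output tokens cost five times as much as input tokens, so a model that writes dense, exhaustive answers pays directly for its thoroughness. The Python sketch below uses the two rates quoted above. The token counts and request volume are hypothetical, chosen only to show the math.

```python
# Fable 5 rates quoted in the article, in USD per million tokens.
INPUT_RATE_PER_M = 10.00
OUTPUT_RATE_PER_M = 50.00


def request_cost(input_tokens: int, output_tokens: int,
                 input_rate: float = INPUT_RATE_PER_M,
                 output_rate: float = OUTPUT_RATE_PER_M) -> float:
    """USD cost of one request given its token counts and per-million-token rates."""
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000


# Hypothetical workload: a large document in, a long, thorough answer out.
per_request = request_cost(input_tokens=200_000, output_tokens=30_000)
daily = per_request * 1_000  # hypothetical 1,000 such requests per day

print(f"${per_request:.2f} per request, ${daily:,.2f} per day")
# prints "$3.50 per request, $3,500.00 per day"
```

In this example the 30,000 output tokens account for $1.50 of the $3.50, even though they make up less than a sixth of the tokens processed. Verbosity is where the bill grows.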

Sometimes you need a model that is a little less thorough to actually ship something useful quickly.

There is also a surprising failure in the realm of aesthetics. Despite its high benchmark scores, Fable 5's design output is remarkably poor. When tasked with designing a UI, it defaulted to basic, uninspired layouts using a limited color palette. This suggests a gap between logical reasoning and the ability to grasp human notions of style and visual hierarchy. It can solve a complex mathematical problem, but it cannot design a beautiful button. This reinforces the need for a tiered approach to AI: use the 'Mythos-class' models for the heavy lifting of logic and vision, but stick to lighter, more agile models for front-end and strategic tasks.

When to use Fable 5 vs. lighter models:

Use Fable 5 for: Hard technical problems, long-horizon reasoning, and complex document parsing
Use lighter models for: UI/UX design, rapid prototyping, and concise writing
Avoid Fable 5 for: High-volume, low-complexity tasks where cost-efficiency is key
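The guidance above amounts to a routing table. The following is a minimal Python sketch, assuming tasks arrive already labeled with a kind. The kind labels and model identifiers are hypothetical placeholders, not real API model names. Any kind that is not listed falls through to the lighter tier, so high-volume, low-complexity work never lands on the expensive model by accident.

```python
from enum import Enum


class Model(Enum):
    """Hypothetical identifiers; substitute the model names your provider actually exposes."""
    FABLE_5 = "fable-5"
    LIGHT = "light-model"


# Task kinds the article recommends for Fable 5 (labels are illustrative).
HEAVY_KINDS = {"hard_technical", "long_horizon_reasoning", "document_parsing"}


def route(task_kind: str) -> Model:
    """Send only the listed heavy kinds to Fable 5."""
    if task_kind in HEAVY_KINDS:
        return Model.FABLE_5
    # UI/UX design, prototyping, concise writing, and anything unrecognized
    # (such as high-volume, low-complexity work) go to the cheaper tier.
    return Model.LIGHT


for kind in ["document_parsing", "ui_design", "ticket_tagging"]:
    print(kind, "->", route(kind).value)
```

Defaulting to the cheaper tier is deliberate. Escalating to Fable 5 then requires an explicit decision, which matches the advice to match capability to task complexity.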

The takeaway for product leaders is one of strategic deployment. Intelligence is a resource that must be managed. The goal is not to use the smartest model available, but to match the model's capability to the complexity of the task. Over-investing in intelligence for a simple task is as wasteful as under-investing in intelligence for a critical infrastructure problem.

Key Takeaway

High-intelligence models are specialized tools, not universal solutions; use them for deep reasoning, not for speed or style.

03 — Cal Newport

The Illusion of AI Productivity

Why saving 11 hours a week hasn't improved company performance

By Cal Newport · 7 min read

Editor's note: An analysis of why AI might be increasing the appearance of work without actually increasing the output.

A recent survey of 6,000 digital workers revealed a startling paradox: while employees claim AI saves them an average of 11 hours a week, only 13% report any actual improvement in company performance. This suggests that the time 'saved' by AI is not being converted into productive output. Instead, it is being swallowed by new forms of inefficiency. We are seeing the emergence of 'botsitting' (the time spent waiting for AI agents to complete tasks) and the cognitive tax of toggling between multiple AI tools to find a usable response. Sixty percent of workers report running queries across several different models just to get a single decent result.

Workplace Theater and Pseudo-Productivity

The problem isn't necessarily the technology, but the culture of work it inhabits. Cal Newport argues that AI is simply magnifying existing issues like 'workplace theater.' This is the act of performing work—sending emails, appearing active on Slack, attending meetings—to signal productivity to managers, rather than focusing on the actual, difficult work of getting things done. AI provides new ways to perform this theater. It allows workers to generate more 'artifacts' of work (more emails, more documents, more summaries) without necessarily moving the needle on the company's core objectives.

AI isn't creating new problems; it is magnifying the types of problems that have long existed.

The tools of the last decade—email, Slack, and video conferencing—promised to make us more efficient, but they largely succeeded in making us more distracted. They increased the volume of communication while decreasing the depth of focus. AI is following the same trajectory. It offers a way to handle the 'grind' of digital tasks, but it also adds to the noise. If a worker saves 11 hours but spends those hours managing AI outputs or performing more visible versions of their old tasks, the net gain for the organization is zero.

The three drains on AI productivity:

Botsitting: The idle time spent waiting for agentic workflows to finish
Tool Toggling: The cognitive cost of jumping between multiple LLMs to find quality
Workplace Theater: Using AI to generate more visible, but less meaningful, work artifacts

For business leaders, the lesson is clear: don't measure AI success by the number of hours saved or the volume of content produced. Measure it by the quality of the outcomes. If the implementation of AI doesn't lead to better products, faster shipping, or more solved problems, then it is just another layer of digital friction.

Key Takeaway

Efficiency in tool usage does not equal effectiveness in business outcomes; avoid the trap of using AI to perform more 'work theater'.

04 — Simon Willison

The Human Bottleneck

Why software engineering remains safe from mass automation

By Simon Willison · 8 min read

Editor's note: A look at why the most automatable profession is proving surprisingly resilient to job displacement.

The narrative that AI will cause mass layoffs in software engineering is failing to meet reality. Software engineering is a profession uniquely suited to disruption, one where the primary output is digital and highly structured, yet the data does not support the alarmist predictions. In New York, for example, companies filing layoff notices under the WARN Act are required to disclose whether AI is a reason for the cuts, and in the first year of this requirement, not a single company cited AI. AI is certainly speeding up the 'typing' phase of coding, but it is not replacing the engineer.

Beyond the Code

If writing code is no longer the primary bottleneck, what is? The real work of software engineering happens in the spaces that AI cannot yet navigate: deciding what to build, specifying requirements, and verifying that the solution actually works in a complex, real-world environment. These are tasks of judgment and accountability. An AI can generate a function, but it cannot take responsibility for a system failure that costs a company millions of dollars. It cannot sit in a meeting with stakeholders to understand the unstated business needs that drive a feature request.

The value I produce will still be reliant on how deeply I understand both the problems and the solutions.

The true value of an engineer lies in 'deep human understanding.' This is the ability to grasp the intricate relationship between the codebase, the business logic, and the physical or digital environment in which the software operates. An AI can suggest an optimization, but it doesn't understand why a specific architectural choice was made three years ago to accommodate a legacy client. It lacks the context that turns a coder into an engineer. As AI handles more of the syntax, the engineer's role shifts toward being a high-level architect and a rigorous verifier.

The three pillars of resilient engineering:

Specification: Deciding and defining exactly what needs to be built
Verification: Ensuring the output is correct, secure, and maintainable
Contextual Understanding: Navigating the business and technical history of a system

The engineers who will thrive are those who lean into these non-automatable skills. Instead of fighting the AI, they should use it to automate the low-value syntax so they can spend more time on the high-value architecture and problem-solving. The bottleneck hasn't disappeared; it has simply moved up the stack.

Key Takeaway

Software engineering is about decision-making and accountability, not just writing syntax; AI automates the latter, making the former more important.

05 — Stratechery

The National Security of Intelligence

Anthropic, the US Government, and the Mythos conflict

By Ben Thompson · 12 min read

Editor's note: An analysis of the escalating tension between AI labs and national security regulators.

The conflict between Anthropic and the U.S. government is not just a legal dispute; it is a fundamental clash over the nature of AI capability and control. When Anthropic announced that the government had issued an export control directive to suspend access to its Mythos and Fable models, it highlighted a growing reality: the most powerful AI models are increasingly viewed as dual-use technologies with significant national security implications. The government's concern is that these models, particularly Mythos, possess advanced cybersecurity capabilities that could be exploited by foreign actors to identify and attack critical infrastructure.

The Jailbreak Dilemma

Anthropic's defense rests on the idea that they can build sufficient guardrails to make powerful models safe for general use. They argue that 'Fable,' the safer version of Mythos, is a viable way to provide intelligence without providing a weapon. However, the government's intervention suggests a lack of confidence in this approach. The recent discovery of a 'jailbreak'—a method to bypass these safety constraints—has only intensified the debate. If a model can be tricked into performing the very tasks it was designed to prevent, the argument for 'safe' release becomes much harder to sustain.

If it's not powerful enough now, the next one will be.

There is a cynical view that Anthropic's safety concerns are a marketing tactic—a way to create a sense of scarcity and importance around their models. By claiming a model is 'too dangerous' to release, they build its reputation. However, this cynicism ignores the economic reality of the AI industry. The leading labs are spending tens of billions of dollars on models that are quickly commoditized by open-source alternatives. In this environment, differentiation is everything. Being the 'responsible' leader in a field of perceived reckless actors is a powerful way to build institutional and political capital.

Key drivers of the AI-Government conflict:

Dual-use capability: The overlap between high-level reasoning and cybersecurity expertise
The inevitability of jailbreaks: The technical difficulty of perfectly constraining a model
Economic competition: The race to dominate the AI market versus the need for national security controls

As models become increasingly capable of assisting in their own creation, the cycle of capability and regulation will only accelerate. The question is no longer whether AI will impact national security, but how much control the state can—or should—exert over the companies building the most powerful engines of intelligence.

Key Takeaway

The tension between AI innovation and national security is inevitable as models transition from simple tools to sophisticated agents capable of cyber warfare.

06 — Stratechery

The Renter's Strategy

Fox, Roku, and the battle for streaming leverage

By Ben Thompson · 9 min read

Editor's note: A look at how traditional media is attempting to reclaim power in the streaming era through platform acquisition.

The market's reaction to Fox's acquisition of Roku was predictably negative, but the move reveals a calculated shift in strategy. For years, traditional media companies have been at the mercy of rights holders and platform owners. They have been the content providers, forced to play by the rules of tech giants like Netflix or Amazon. By acquiring Roku, Fox is attempting to flip the script. They are moving from being a mere provider of content to owning the platform itself, gaining direct leverage over the distribution layer.

Extraction vs. Leverage

The traditional model for media companies has been one of extraction: trying to get as much value as possible from their existing content libraries. But in a world dominated by streaming, extraction is not enough. You need distribution. Roku provides Fox with a direct interface to the consumer, a way to bypass the gatekeepers and control the advertising and subscription data. This isn't just about adding more channels to a box; it's about owning the ecosystem where the viewer makes their decisions.

Fox is trading extraction from rights holders for leverage as a renter.

However, this strategy is not without significant risk. Owning a platform like Roku requires a different set of competencies than producing television content. It requires expertise in software engineering, user experience, and data-driven advertising. Fox is essentially betting that it can successfully manage a tech company to protect its media business. If they fail to innovate on the platform side, they risk becoming a content provider on a platform that is losing relevance.

The strategic shift in media:

From content-only to content + distribution
From relying on third-party platforms to owning the platform layer
From maximizing library value to controlling the user interface

The success of this acquisition will be measured by whether Fox can turn Roku into a more powerful tool for their own media ecosystem. If they can, they will have found a way to survive the streaming wars. If they cannot, they will have simply bought a very expensive way to watch their own declining content.

Key Takeaway

In the streaming era, content is no longer king; distribution and the control of the user interface are the true sources of power.