
TL;DR

A recent Google whitepaper argues that the model itself accounts for only about 10% of an AI system’s behavior. The remaining 90% comes from the harness, configuration, and context engineering, so that is where the focus should be. This shift has significant implications for AI strategy and costs.

A new whitepaper by Google researchers, Addy Osmani, Shubham Saboo, and Sokratis Kartakis, asserts that the model constitutes only about 10% of an AI system’s behavior. The paper emphasizes that harness and context engineering are far more critical in shaping AI performance, marking a paradigm shift in AI development strategies.

The whitepaper, titled The New SDLC With Vibe Coding, argues that the traditional focus on acquiring larger or more advanced models is misguided. Instead, the authors argue that most system behavior depends on how the AI is configured: its prompts, tools, rules, and context management.

Evidence from experiments shows that changing only the harness or context can significantly improve AI performance, even with the same underlying model. For example, a coding agent moved from outside the Top 30 to the Top 5 on the Terminal Bench 2.0 benchmark after changes to its harness alone, with the model unchanged.

The paper introduces the concept of agentic engineering, where AI is integrated with structured verification, testing, and oversight, moving away from casual vibe coding toward disciplined, reliable systems.
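As a rough illustration of what "structured verification" could look like in practice, the sketch below combines deterministic tests with scored evals in a single release gate. It is not taken from the whitepaper: `run_gate`, `GateResult`, the 0.8 threshold, and the stand-in `fake_generate` function are all hypothetical names and values chosen for the example.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class GateResult:
    tests_passed: bool
    eval_score: float
    approved: bool


def run_gate(
    tests: list[Callable[[], bool]],
    eval_cases: list[tuple[str, Callable[[str], float]]],
    generate: Callable[[str], str],
    eval_threshold: float = 0.8,  # hypothetical cutoff, tune per project
) -> GateResult:
    """Approve a change only if deterministic tests pass AND evals score well."""
    # Deterministic checks: every test must return True.
    tests_passed = all(test() for test in tests)
    # Evals: score each generated output between 0.0 and 1.0, then average.
    scores = [scorer(generate(prompt)) for prompt, scorer in eval_cases]
    # With no eval cases the score stays 0.0, so the gate refuses to approve.
    eval_score = sum(scores) / len(scores) if scores else 0.0
    approved = tests_passed and eval_score >= eval_threshold
    return GateResult(tests_passed, eval_score, approved)


def fake_generate(prompt: str) -> str:
    """Stand-in for a real model call (assumption for this sketch)."""
    return f"Summary: {prompt}"


result = run_gate(
    tests=[lambda: 1 + 1 == 2],
    eval_cases=[("refund policy", lambda out: 1.0 if "refund" in out else 0.0)],
    generate=fake_generate,
)
print(result.approved)
```

The design choice worth noting is that missing evals count as a failure rather than a pass, which mirrors the idea that tests alone are not enough for non-deterministic output.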

Furthermore, the whitepaper discusses the economic implications, noting that costs are driven more by configuration and token consumption than by model size. High upfront investment in design and context management can lead to lower ongoing costs and higher reliability.
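The trade-off between upfront investment and ongoing cost can be made concrete with simple break-even arithmetic. The sketch below is an assumption-laden illustration, not the paper's model: the function names and every dollar figure are hypothetical, and the only input borrowed from the article is the idea that a low-investment approach can cost several times more per feature.

```python
import math
from typing import Optional


def cumulative_cost(capex: float, opex_per_feature: float, features: int) -> float:
    """Total spend after shipping a given number of features."""
    return capex + opex_per_feature * features


def break_even_features(
    capex_low: float, opex_high: float, capex_high: float, opex_low: float
) -> Optional[int]:
    """Smallest feature count at which the high-upfront option costs no more
    than the low-upfront option. Returns None if it never catches up."""
    if opex_low >= opex_high:
        return None  # no per-feature saving, so the upfront cost is never repaid
    extra_capex = capex_high - capex_low
    if extra_capex <= 0:
        return 0
    return math.ceil(extra_capex / (opex_high - opex_low))


# Hypothetical numbers: the disciplined approach costs 4000 upfront and 100 per
# feature; the casual approach costs nothing upfront but 5x more per feature.
n = break_even_features(capex_low=0, opex_high=500, capex_high=4000, opex_low=100)
print(n, cumulative_cost(0, 500, n), cumulative_cost(4000, 100, n))
```

Under these made-up inputs the two approaches cost the same at the break-even point, and every feature after that is cheaper with the higher upfront investment.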

At a glance
Report · When: published early 2026
The development: Google’s new whitepaper highlights that the core of effective AI systems lies in harness and context engineering, not model size, challenging common assumptions.
The Model Is Only 10% — The New SDLC With Vibe Coding
AI Dispatch · Field Notes
Google · Osmani, Saboo & Kartakis · May 2026

The model is only 10%

A Google whitepaper argues software’s biggest shift is from writing code to expressing intent. Its sharpest claim: the model you obsess over is the smallest part of the system — the scaffolding around it does the real work.

A spectrum, not a binary — the differentiator is how outputs get verified
Vibe Coding: Casual prompts · “does it seem to work?” · disposable code · high risk
Structured AI-Assisted: Detailed prompts + constraints · manual testing · features in real codebases
Agentic Engineering: Formal specs · automated tests + evals + CI gates · production scale · low risk
Tests verify the deterministic; evals verify the rest. Without both, it’s vibe coding — however clever the prompt.
The idea worth building your strategy around
Agent = Model + Harness
MODEL: ~10%
HARNESS (~90%): prompts · tools · context · hooks · sandboxes · observability
~90% IS YOUR SURFACE AREA, NOT THE PROVIDER’S
Outside Top 30 → Top 5 on Terminal Bench 2.0 by changing only the harness — same model.
“Most agent failures, examined honestly, are configuration failures” — a missing tool, a vague rule, a noisy context.
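To make "Agent = Model + Harness" tangible, here is a minimal sketch that treats the harness as plain configuration and audits it for the three failure causes quoted above. Everything in it is an assumption for illustration: the `Harness` and `Agent` classes, the `audit_harness` function, and the crude heuristics (a rule under four words counts as vague, more than five context documents counts as noisy) are hypothetical and do not come from the whitepaper.

```python
from dataclasses import dataclass, field


@dataclass
class Harness:
    system_prompt: str
    tools: set[str] = field(default_factory=set)
    rules: list[str] = field(default_factory=list)
    context_docs: list[str] = field(default_factory=list)


@dataclass
class Agent:
    model: str        # the part you rent from a provider
    harness: Harness  # the part you own and can change


def audit_harness(
    harness: Harness,
    required_tools: set[str],
    max_context_docs: int = 5,  # hypothetical noise limit
    min_rule_words: int = 4,    # hypothetical vagueness heuristic
) -> list[str]:
    """Return human-readable findings for likely configuration failures."""
    findings = []
    missing = sorted(required_tools - harness.tools)
    if missing:
        findings.append(f"missing tool: {', '.join(missing)}")
    for rule in harness.rules:
        if len(rule.split()) < min_rule_words:
            findings.append(f"vague rule: {rule!r}")
    if len(harness.context_docs) > max_context_docs:
        findings.append(f"noisy context: {len(harness.context_docs)} docs")
    return findings


# Same model, weak harness: the audit flags all three problem types.
weak = Agent(
    model="some-model",
    harness=Harness(
        system_prompt="Fix bugs.",
        tools={"read_file"},
        rules=["Be careful"],
        context_docs=["doc"] * 8,
    ),
)
findings = audit_harness(weak.harness, required_tools={"read_file", "run_tests"})
print(findings)
```

Swapping in a harness with the missing tool, a specific rule, and a trimmed context would clear the findings without touching the `model` field, which is the point of the harness framing.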
The economics: it’s a token-cost problem (CapEx vs OpEx)
Vibe Coding: Low CapEx · High OpEx
Looks free, hides debt: token burn (fix-it loops), maintenance tax (AI spaghetti), security remediation. Crosses over to 3–10× more per feature.
Agentic Engineering: High CapEx · Low OpEx
Pay upfront (specs, evals, context), then ship cheaply. Levers: context engineering for first-pass success + intelligent model routing (cheap models for the easy work).
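Model routing, the second lever, can be sketched as a small dispatch function that sends easy, small-context tasks to a cheap tier and everything else to a stronger one. This is only an illustration under stated assumptions: the tier names, the per-token prices, the list of "easy" task kinds, and the 4000-token cutoff are all hypothetical and not drawn from the paper or from any provider's price list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelTier:
    name: str
    usd_per_1k_tokens: float  # hypothetical price, for illustration only


CHEAP = ModelTier("small-model", 0.0005)
STRONG = ModelTier("large-model", 0.01)

# Hypothetical set of task kinds considered easy enough for the cheap tier.
EASY_KINDS = {"rename", "format", "docstring"}


def route(task_kind: str, context_tokens: int, max_cheap_tokens: int = 4000) -> ModelTier:
    """Pick the cheap tier only for easy tasks with a small context."""
    if task_kind in EASY_KINDS and context_tokens <= max_cheap_tokens:
        return CHEAP
    return STRONG


def estimated_cost(tasks: list[tuple[str, int]]) -> float:
    """Sum the token cost of each task at the price of its routed tier."""
    return sum(
        route(kind, tokens).usd_per_1k_tokens * tokens / 1000
        for kind, tokens in tasks
    )


tasks = [("rename", 2000), ("refactor", 2000)]
print(route("rename", 2000).name, route("refactor", 2000).name)
print(estimated_cost(tasks))
```

A real router would likely use richer signals than a task label and a token count, but even this crude rule shows how routing decisions translate directly into per-task cost.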
85% of devs use AI coding agents (51% daily)
41% of all new code is AI-generated
~90% of agent behavior is the harness, not the model
+19% longer on some tasks (METR); verification is the cost
The read

The clearest map yet of how serious AI development works — and mostly tool-agnostic. But it’s a Google funnel: the concepts are neutral, the on-ramps point to Gemini, Jules & the ADK. If the harness is 90% and it’s yours, your moat and your costs both live there — so own your scaffolding, route across models, and remember: AI amplifies whatever engineering culture it lands in.

Source: Osmani, Saboo & Kartakis, “The New SDLC With Vibe Coding,” Google (May 2026). Figures are the paper’s own, incl. METR & LangChain. Analysis is the author’s.
thorstenmeyerai.com

Impact of Harness and Context Engineering on AI Strategy

This development shifts the focus from model size and raw AI power to system configuration, harness design, and context management. For organizations, it means that building effective AI solutions depends more on how they structure and control AI interactions than on acquiring the latest, largest models. It also suggests that costs and reliability can be optimized through disciplined engineering practices, reducing long-term expenses and vulnerabilities.

Background of the Shift in AI Development Paradigms

Until early 2026, the industry largely equated AI performance with model size and complexity. Recent experiments and research, including this Google whitepaper, challenge this assumption, showing that configuration and engineering play a more decisive role. The paper builds on prior work emphasizing the spectrum of AI workflows, from vibe coding to agentic engineering, and highlights that cost and effectiveness are increasingly tied to system design.

Previous benchmarks and case studies demonstrate that even with the same models, performance and reliability can be vastly improved through better harness and context strategies, marking a significant evolution in AI development philosophy.

“The model constitutes only about 10% of what determines AI behavior; the harness and context engineering account for the rest.”

— Addy Osmani

Unanswered Questions About Practical Implementation

While the whitepaper provides strong evidence that harness and context are dominant factors, it remains unclear how these principles will be adopted across diverse industries and whether smaller organizations can implement such disciplined engineering at scale. The precise methods for optimizing harness and context in complex, real-world applications are still being developed.

Future Steps for AI Development and Industry Adoption

Expect ongoing research and case studies to refine best practices for harness and context engineering. Organizations are likely to invest more in system design, testing, and verification frameworks. Industry standards may emerge to guide disciplined AI development, emphasizing configuration over model size, with further benchmarks and tools to measure effectiveness.

Key Questions

Why is the model size less important than previously thought?

The whitepaper shows that system configuration, harness, and context management have a far greater impact on AI behavior than the underlying model size. Experiments demonstrate that performance improvements often come from better setup rather than larger models.

How does this shift affect AI development costs?

While initial investments in design, testing, and context engineering may be higher, the long-term costs decrease because ongoing token usage and maintenance are reduced. Proper configuration leads to more reliable and cost-efficient AI systems.

What does this mean for organizations using AI today?

Organizations should focus on building robust harnesses and managing context rather than solely acquiring larger or newer models. This approach enables better control, reliability, and cost savings in AI deployment.

Will this change how AI benchmarks are measured?

Yes, future benchmarks are likely to emphasize system configuration and engineering practices rather than just raw model performance, reflecting the shift in what determines AI effectiveness.

Source: ThorstenMeyerAI.com
