TL;DR

Google’s May 2026 whitepaper says AI coding is shifting software work from writing code to directing and verifying machine-generated output. The paper claims the agent harness, not the model alone, drives most behavior, making tests, evals, tools and context policies central to production use.

Google has published a May 2026 whitepaper, The New SDLC With Vibe Coding, that argues software engineering is moving from direct code writing toward intent-driven development, where AI agents generate code and humans focus on direction, verification and judgment.

The paper, by Addy Osmani, Shubham Saboo and Sokratis Kartakis, reports that as of early 2026, 85% of professional developers regularly use AI coding agents, 51% use them daily and about 41% of new code is AI-generated. Those adoption figures are attributed to the paper and its cited sources; the provided material does not include the underlying survey methods.

The authors frame vibe coding as one end of a spectrum rather than a label for all AI-assisted programming. At the casual end, developers use short prompts, light review and trial-and-error fixes. At the disciplined end, the paper describes agentic engineering: formal specs, automated tests, evals, CI gates, sandboxes and human review of architecture.

The paper’s central claim is that an agent is the model plus the harness around it. That harness includes prompts, tools, context rules, hooks, sub-agents, sandboxes and observability. The paper cites examples in which teams improved benchmark performance by changing the harness while keeping the same model, including a reported move from outside the top 30 to the top five on Terminal Bench 2.0 and a LangChain experiment that raised an agent score by 13.7 points.
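To make the "model plus harness" idea concrete, here is a minimal sketch in Python. It is an illustration only: the names `Harness`, `Agent` and `echo_model` are hypothetical and do not come from the paper, and the "model" is a stub function rather than a real LLM call. The point it shows is that the same model, wrapped in two different harnesses, receives different prompts and so behaves differently.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical types for illustration; none of these names come from the paper.
Model = Callable[[str], str]  # takes a prompt, returns text
Tool = Callable[[str], str]   # takes an argument string, returns a result


@dataclass
class Harness:
    """Everything around the model: prompt, tools, context rules, logging."""
    system_prompt: str
    tools: Dict[str, Tool] = field(default_factory=dict)
    context_rules: List[str] = field(default_factory=list)
    log: List[str] = field(default_factory=list)  # stand-in for observability

    def build_prompt(self, task: str) -> str:
        rules = "\n".join(f"- {r}" for r in self.context_rules)
        tool_names = ", ".join(sorted(self.tools)) or "none"
        return (
            f"{self.system_prompt}\nRules:\n{rules}\n"
            f"Tools: {tool_names}\nTask: {task}"
        )


@dataclass
class Agent:
    """An agent is the model plus the harness that wraps it."""
    model: Model
    harness: Harness

    def run(self, task: str) -> str:
        prompt = self.harness.build_prompt(task)
        self.harness.log.append(prompt)  # record what the model actually saw
        return self.model(prompt)


def echo_model(prompt: str) -> str:
    """Stub model: reports which tools the prompt told it about."""
    line = next(l for l in prompt.splitlines() if l.startswith("Tools: "))
    return line[len("Tools: "):]


# Same model, two harnesses.
bare = Agent(echo_model, Harness("You are a coding agent."))
equipped = Agent(
    echo_model,
    Harness(
        "You are a coding agent.",
        tools={"run_tests": lambda arg: "ok"},
        context_rules=["Only edit files under src/"],
    ),
)

bare_answer = bare.run("fix the bug")
equipped_answer = equipped.run("fix the bug")
```

In this toy setup the only thing that changes between the two agents is the harness, which mirrors the shape of the benchmark examples the paper cites without reproducing them.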

AI Dispatch · Field Notes

Google · Osmani, Saboo & Kartakis · May 2026

The model is only 10%

A Google whitepaper argues software’s biggest shift is from writing code to expressing intent. Its sharpest claim: the model you obsess over is the smallest part of the system — the scaffolding around it does the real work.

A spectrum, not a binary — the differentiator is how outputs get verified

Vibe Coding

Casual prompts · “does it seem to work?” · disposable code · high risk

Structured AI-Assisted

Detailed prompts + constraints · manual testing · features in real codebases

Agentic Engineering

Formal specs · automated tests + evals + CI gates · production scale · low risk

Tests verify the deterministic; evals verify the rest. Without both, it’s vibe coding — however clever the prompt.

The idea worth building your strategy around

Agent = Model + Harness

MODEL ~10%

HARNESS ~90%: prompts · tools · context · hooks · sandboxes · observability

~90% IS YOUR SURFACE AREA, NOT THE PROVIDER’S

Outside Top 30 → Top 5 on Terminal Bench 2.0 by changing only the harness — same model.

“Most agent failures, examined honestly, are configuration failures” — a missing tool, a vague rule, a noisy context.

The economics: it’s a token-cost problem (CapEx vs OpEx)

Vibe Coding

Low CapEx · High OpEx

Looks free, hides debt: token burn (fix-it loops), maintenance tax (AI spaghetti), security remediation. Crosses over to 3–10× more per feature.

Agentic Engineering

High CapEx · Low OpEx

Pay upfront (specs, evals, context), then ship cheaply. Levers: context engineering for first-pass success + intelligent model routing — cheap models for the easy work.

85% of devs use AI coding agents (51% daily)

41% of all new code is AI-generated

~90% of agent behavior is the harness, not the model

+19% longer on some tasks (METR): verification is the cost

The read

The clearest map yet of how serious AI development works — and mostly tool-agnostic. But it’s a Google funnel: the concepts are neutral, the on-ramps point to Gemini, Jules & the ADK. If the harness is 90% and it’s yours, your moat and your costs both live there — so own your scaffolding, route across models, and remember: AI amplifies whatever engineering culture it lands in.

Source: Osmani, Saboo & Kartakis, “The New SDLC With Vibe Coding,” Google (May 2026). Figures are the paper’s own, incl. METR & LangChain. Analysis is the author’s.

thorstenmeyerai.com

Verification Moves To Center Stage

The argument matters because many teams still treat AI coding performance as a model-selection problem. The Google paper says that approach misses the larger engineering system that shapes agent behavior.

If the paper’s framing holds, software teams may need to spend less time waiting for the next model release and more time building repeatable controls around generated code. That means higher upfront work on specs, tests, evals and context management, but lower rework if agents produce usable output earlier.

The paper also frames the shift as an economics problem. Casual vibe coding may look cheap at first, but the source material says it can create later costs through token-heavy fix loops, fragile code and security remediation. Agentic engineering costs more to set up, according to the paper, but aims to make repeated feature delivery cheaper.
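The setup-versus-running-cost trade-off can be sketched as simple break-even arithmetic. The figures below are invented for illustration and are not taken from the paper: casual vibe coding is assumed to cost nothing up front and 5 units per feature, while agentic engineering is assumed to cost 40 units up front and 1 unit per feature. The linear cost model is also an assumption.

```python
def total_cost(setup: float, per_feature: float, features: int) -> float:
    """Linear cost model (an assumption): one-time setup plus a fixed cost per feature."""
    return setup + per_feature * features


def break_even_features(
    setup_a: float, per_a: float, setup_b: float, per_b: float
) -> float:
    """Feature count at which approach B (higher setup, lower running cost)
    costs the same as approach A."""
    if per_a <= per_b:
        raise ValueError("B never catches up: its per-feature cost is not lower")
    return (setup_b - setup_a) / (per_a - per_b)


# Hypothetical numbers, chosen only to make the arithmetic easy to follow.
VIBE_SETUP, VIBE_PER_FEATURE = 0.0, 5.0
AGENTIC_SETUP, AGENTIC_PER_FEATURE = 40.0, 1.0

crossover = break_even_features(
    VIBE_SETUP, VIBE_PER_FEATURE, AGENTIC_SETUP, AGENTIC_PER_FEATURE
)

# Costs after 20 features under each approach.
vibe_at_20 = total_cost(VIBE_SETUP, VIBE_PER_FEATURE, 20)
agentic_at_20 = total_cost(AGENTIC_SETUP, AGENTIC_PER_FEATURE, 20)
```

With these made-up inputs the two approaches cost the same at 10 features (40 / (5 - 1)), and by 20 features the totals are 100 versus 60. Real crossover points would depend on a team's actual token spend, rework rate and setup effort.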

Karpathy Term Gets Narrowed

The term vibe coding was popularized by Andrej Karpathy in February 2025 to describe prompting an AI system, accepting much of what it returns and pasting errors back until the code works. The Google paper treats that workflow as useful for prototypes, disposable scripts and low-risk experiments, not as a sound default for production systems.

The source analysis from Thorsten Meyer AI says the paper is most useful when it separates casual prompting from structured AI-assisted development and agentic engineering. It also says the framework is mostly tool-agnostic, while Google’s product path points readers toward Gemini, Jules and the Agent Development Kit.

“generation is solved; verification, judgment, and direction are the new craft.”
— Osmani, Saboo and Kartakis, in Google’s May 2026 whitepaper

Adoption Claims Need More Evidence

Several claims rest on the paper alone, at least in the material provided. It is not yet clear how the developer adoption figures were gathered, how AI-generated code was defined, or how representative the samples were.

The benchmark examples also need careful reading. The source says harness-only changes improved results on Terminal Bench 2.0 and in a LangChain experiment, but it does not provide full methods, task mix or reproducibility details. The cited METR finding that some AI-assisted tasks took 19% longer also leaves open how broadly that delay applies.

There is also a vendor-interest question. Thorsten Meyer AI describes the concepts as broadly useful while warning that Google’s on-ramps lead to its own AI tooling. Readers should separate the framework from any single provider’s product pitch.

Harness Choices Move Into Practice

The next test is whether engineering teams turn the paper’s framework into production habits: written specs, stronger test suites, eval coverage, CI gates, context engineering and model routing. More independent benchmarks and case studies will be needed to show whether the claimed 10% model and 90% harness split holds across teams, languages and software domains.
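One way such a habit might look in practice is a merge gate that requires both deterministic tests and evals to pass. The sketch below is an assumption about how a team could wire this up, not a mechanism described in the paper: the function name `ci_gate`, the 0.8 threshold and the use of a simple mean over eval scores are all hypothetical choices.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class GateResult:
    passed: bool
    reasons: List[str]  # empty when the gate passes


def ci_gate(
    test_results: List[bool],
    eval_scores: List[float],
    min_eval_mean: float = 0.8,  # hypothetical threshold
) -> GateResult:
    """Block a change unless every test passes and the mean eval score
    meets the threshold. Missing tests or evals also block the change."""
    reasons: List[str] = []

    # Deterministic checks: every test must pass.
    if not test_results:
        reasons.append("no tests ran")
    elif not all(test_results):
        reasons.append(f"{test_results.count(False)} test(s) failed")

    # Non-deterministic checks: eval scores in [0, 1], judged on their mean.
    if not eval_scores:
        reasons.append("no evals ran")
    else:
        mean = sum(eval_scores) / len(eval_scores)
        if mean < min_eval_mean:
            reasons.append(f"eval mean {mean:.2f} below {min_eval_mean:.2f}")

    return GateResult(passed=not reasons, reasons=reasons)


ok = ci_gate([True, True], [0.9, 0.85])
failed_test = ci_gate([True, False], [0.9])
weak_evals = ci_gate([True], [0.5, 0.7])
```

Treating an empty test or eval list as a failure is a deliberate choice here: a gate that passes when nothing ran would let unverified output through.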

Key Questions

What is the actual news development?

Google published a May 2026 whitepaper arguing that AI-assisted software development now depends on how teams direct and verify AI agents, not just which model they use.

What does “the model is only 10%” mean?

The phrase refers to the paper’s claim that most agent behavior comes from the harness around a model: prompts, tools, context, rules, tests, evals, sandboxes and monitoring.

Is vibe coding the same as agentic engineering?

No. The paper treats vibe coding as the casual end of the AI coding spectrum. Agentic engineering is the more controlled form, with formal requirements, automated checks and human review.

Are the adoption numbers confirmed independently?

The figures in the source are attributed to the Google paper and its cited material. The provided source does not include enough detail to verify the sampling, definitions or methods behind those numbers.

Does the paper mean teams should use Google tools?

The framework can be read as tool-agnostic, according to Thorsten Meyer AI’s analysis. The same analysis says Google’s recommended path points toward Gemini, Jules and the Agent Development Kit, so readers should separate the engineering ideas from the vendor path.

Source: Thorsten Meyer AI

The Model Is Only 10%: The Real Lesson of the New SDLC

Author

The Genius Factory Team
