Shipping at Inference Speed: My Takeaways on Coding with AI Agents
by Bryan Whiting


Tags: ai agents, developer tools, productivity

I just finished reading Peter Steinberger’s Shipping at Inference Speed, and it’s fundamentally reshaping how I think about building software with AI agents. Here are my key takeaways.

The Bottleneck Has Shifted

The most striking realization from Steinberger’s post is that the bottleneck in software development has fundamentally changed. With tools like GPT-5.2 Codex, throughput is now constrained more by model latency and higher-level design choices than by manual coding. “Vibe-coding” has matured to the point where many prompts produce working code on the first try.

This resonates with my experience. I find myself spending less time writing code and more time thinking about architecture, component layout, and system design. The implementation details? Those increasingly get delegated to the model.

Codex vs. Opus: The Thorough Reader Wins

Steinberger’s comparison between Codex and Opus is illuminating. Codex tends to spend 10–15 minutes silently reading large amounts of project code before writing anything. This makes it slower in wall-clock time, but more reliable for big refactors and features.

Opus, by contrast, is more eager but often misses context. The result? Codex’s outputs need fewer follow-up fixes, so total human time is often lower even when Codex takes up to four times longer per run.

The lesson here is patience. That “slow” model that reads everything first might actually be the faster path to shipping.

The “Oracle” Pattern and GPT-5.2

I was intrigued by Steinberger’s “oracle” CLI wrapper around GPT-5 Pro. It uploads files and prompts, then does long-running, high-quality research across many websites. This became his secret weapon for solving stuck agent tasks.

With GPT-5.2, he notes that oracle is needed much less often because the base model one-shots most real-world coding tasks. The more recent knowledge cutoff (end of August vs. Opus’s mid-March) makes a meaningful difference.

The example that sold me: Codex converted VibeTunnel’s core forwarding system from TypeScript to Zig from a two-sentence prompt in a single 5-hour run. Earlier models repeatedly failed at this.

Parallel Projects as a Workflow Strategy

Steinberger runs 3–8 projects in parallel, heavily using Codex’s queueing to pipeline ideas. He treats himself—not the models—as the main bottleneck.

This flips conventional thinking. Instead of focusing on one project until completion, you queue up prompts across multiple projects and let inference time work for you. It’s treating AI agents like an async job queue.
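The async-job-queue analogy can be made concrete. The sketch below is purely illustrative (it is not Steinberger's tooling, and the project names and prompts are hypothetical): each project holds a queue of prompts that runs sequentially, while the projects themselves run concurrently, so inference time on one project overlaps with work on the others.

```python
import asyncio

async def run_prompt(project: str, prompt: str) -> str:
    # Stand-in for a long-running agent call; a real run takes minutes to hours.
    await asyncio.sleep(0.01)
    return f"{project}: finished '{prompt}'"

async def drain(project: str, prompts: list[str], results: list[str]) -> None:
    # Within one project, queued prompts run one after another.
    for prompt in prompts:
        results.append(await run_prompt(project, prompt))

async def main() -> list[str]:
    # Across projects, the queues drain concurrently; the human's job
    # is just to keep them full. Names and tasks here are hypothetical.
    projects = {
        "project-a": ["port forwarder to Zig", "fix reconnect bug"],
        "project-b": ["add CLI flags", "write tests"],
    }
    results: list[str] = []
    await asyncio.gather(*(drain(p, ps, results) for p, ps in projects.items()))
    return results

print(len(asyncio.run(main())))  # 4 prompts completed across 2 projects
```

The design point is the shape, not the code: sequential within a project (each prompt builds on the last), parallel across projects (idle inference time on one becomes working time on another).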

What’s Still Hard

The hardest remaining problems are precisely the ones AI can’t easily solve:

  • Dependency selection: Picking durable, well-maintained libraries still requires human judgment
  • System design decisions: Protocols, client/server boundaries, and data flow are harder to offload to models
  • Architecture: These higher-level decisions remain firmly in the human domain

Practical Advice That Stuck With Me

A few tactical insights I’m taking away:

  1. Start every product as a CLI: Build it so agents can call it directly and verify output, then layer UI on top. Steinberger built a YouTube-summarizing Chrome extension in a day after first nailing a summarization CLI.

  2. Use short prompts, augmented with images: Simple directives like “fix padding” on a clipped component often work better than elaborate instructions.

  3. Standardize your model config: Instead of juggling many model modes, pick one good setting and stick with it. Steinberger uses gpt-5.2-codex with model_reasoning_effort = "high" for almost everything.

  4. Skip “plan mode”: He views older plan-mode styles as legacy hacks. Conversational planning followed by simple commands like “build” or “write plan to docs/*.md and build this” works better.
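For point 3, the one setting the post names is `model_reasoning_effort = "high"` on `gpt-5.2-codex`. As a config fragment that might look like the following; the file path and TOML layout are my assumption based on Codex CLI conventions, so check your own install's documentation:

```toml
# ~/.codex/config.toml (path assumed; verify against your Codex CLI setup)
model = "gpt-5.2-codex"
model_reasoning_effort = "high"
```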

What This Means for Developers

We’re entering an era where implementation speed is no longer the limiting factor. Design thinking, system architecture, and high-level decision-making are becoming the primary skills that matter.

Language choice still matters—but differently. Steinberger uses TypeScript for web, Go for CLIs, and Swift for macOS/UI work, partly because agents handle these ecosystems and their tooling efficiently.

The developers who thrive will be those who learn to think at a higher level of abstraction while trusting the AI to handle the implementation details. It’s a significant mental shift, but the productivity gains are substantial.


Source: Shipping at Inference Speed by Peter Steinberger
