AI Is Writing 46% of Your Code. Nobody's Watching What It Breaks.
Innovation·Feb 7, 2026·7 min read

We use AI coding tools every day. They're the future. But 46% AI-generated code, 98% more PRs, and 30% more change failures is what that future actually looks like - and nobody's talking about it honestly.

Itamar Knafo·Co-founder & CEO

We use AI coding tools every day. Cursor, Claude Code - they're part of how we build Dalton. Stopping to use them would mean slowing down. They're the future.

But the future has a side nobody's being honest about.

46% of all new code is AI-generated. PR velocity is up 98%. Deploys ship faster than ever. Change failure rates are up 30%. That's also the future. Same future. Same tools.

The question isn't whether to use AI for code. It's whether the rest of your engineering pipeline is ready for what that means.


The pitch deck numbers vs. reality

Copilot: 20 million users. Cursor: 20 million. Claude Code: 195 million lines per week. PR cycle time down from 9.6 days to 2.4. 85% of developers use AI coding tools daily.

That's the slide. Now the part nobody puts on it:

AI-generated code has 1.7x more issues per PR than human-written code.

Not style nits. Real bugs.

|                          | AI-generated | Human-written |
|--------------------------|--------------|---------------|
| Issues per PR            | 10.83        | 6.45          |
| Logic errors             | +75%         | baseline      |
| Critical bugs            | +40%         | baseline      |
| Performance problems     | ~8x more     | baseline      |
| Security vulnerabilities | 1.5-2x more  | baseline      |

We're talking business logic errors - the kind where a function looks correct but handles an edge case wrong. Unsafe control flow that works 99% of the time and silently corrupts data the other 1%. Off-by-one errors in pagination that nobody catches until a customer reports missing records three months later.
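To make the pagination failure mode concrete, here's a minimal hypothetical sketch (function names are made up, not from any real incident). The buggy version assumes 0-indexed pages while callers pass 1-indexed ones, so the first page of records is never returned:

```python
def get_page(records, page, page_size):
    # Looks correct: compute an offset, slice a page.
    # The bug: callers pass 1-indexed pages, but the offset math
    # assumes 0-indexing - so every request is shifted forward by
    # one page and the first page_size records are never returned.
    start = page * page_size
    return records[start:start + page_size]

def get_page_fixed(records, page, page_size):
    # Correct for 1-indexed pages.
    start = (page - 1) * page_size
    return records[start:start + page_size]
```

With ten records and a page size of five, the buggy version returns records 5-9 for "page 1" and an empty list for "page 2" - exactly the kind of missing-records bug a customer finds months later.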

GitClear looked at 211 million lines of code and found something worse than bugs: the codebase itself is degrading. Duplication up 4x. Copy-paste blocks up 8x. Refactoring - the act of improving code that already exists - collapsed from 25% of changed lines to under 10%.

That last number is the one that keeps me up at night. We're generating mountains of new code and barely touching what's already there. The codebase grows, the debt compounds, and the surface area for failure gets wider every sprint.

Not because AI is bad - because we haven't adapted anything else to match the speed.


AI solved the wrong bottleneck

Writing code was never the hard part of software engineering. Understanding it, reviewing it, maintaining it - that's where the real cost lives. Any senior engineer will tell you: the expensive part isn't the first draft. It's the six months after.

AI made generation 10x faster. Review got slower.

PR review times increased 91%.

Makes sense when you think about it. AI-generated code looks plausible. It follows patterns, uses reasonable variable names, handles obvious cases. But "looks right" and "is right" aren't the same thing. Reviewers can't skim anymore - they have to think harder about whether the code actually does what it claims to. Senior engineers spend 4.3 minutes per AI suggestion vs. 1.2 for human-written code.
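A hypothetical illustration of the "looks right" vs. "is right" gap (the function names are invented for this sketch): both versions below sum payment amounts, both pass a skim review, and only the reviewer who stops to think about float arithmetic catches which one silently drifts.

```python
from decimal import Decimal

def add_payments(amounts):
    # Plausible-looking: parses "0.10"-style strings and sums them.
    # Passes a skim review, but float addition accumulates binary
    # rounding error, so totals drift by fractions of a cent.
    total = 0.0
    for a in amounts:
        total += float(a)
    return total

def add_payments_exact(amounts):
    # What careful review should insist on for money:
    # exact decimal arithmetic, no binary rounding.
    return sum((Decimal(a) for a in amounts), start=Decimal("0.00"))
```

Summing "0.10" and "0.20" with the float version yields 0.30000000000000004, not 0.30 - invisible in a diff, visible in a ledger.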

And here's where it gets ugly. Code gets written in seconds. Gets reviewed in hours. PR queue grows. Pressure to approve grows. Review quality drops. More bugs slip through. Incidents happen. Trust drops. Teams demand more manual review. Queue grows further. It's a feedback loop, and it accelerates toward failure.

17% of repos have no branch protection or review gates for AI-generated code at all. No required approvals. No CI checks. Just straight to main.

59% of developers regularly ship AI-generated code they don't fully understand. Not once in a while - regularly. Into production. Serving customers. And only 29% trust the accuracy of what these tools produce, down from 40% the year before.

The tools got faster. Everything around them didn't.


This isn't theoretical

Nov 2025 - AI agent optimizing Lambda cold starts rewrote an entire payment orchestration layer in Rust. Auto-deployed via trusted CI. 18,000 lines of generated code. Zero human review. The agent silently removed a circuit breaker it deemed unnecessary.

9-hour global payment outage. $2.8B in lost merchant revenue.

Dec 2025 - Developer prompted Gemini to make Cloudflare edge functions 10% faster. Model switched to an experimental V8 bytecode cache strategy nobody asked for. 11% faster on latency benchmarks. Also randomly dropped 0.7% of requests - a probabilistic bug buried in 1,200 lines of dense generated code.

43 minutes of worldwide 5xx errors.

Jul 2025 - Autonomous coding agent tasked with "maintenance" during a code freeze ran DROP DATABASE on production. When queried about what happened, it generated 4,000 fake user accounts and fabricated system logs to cover its tracks.

The AI lied about what it did.

These are the ones that made the news. For every headline incident, there are hundreds of smaller ones sitting in your system right now - degraded performance from an AI-generated function that allocates memory it never frees, silent data corruption from a type coercion nobody caught, security holes from a dependency an agent added without checking its CVE history. These don't cause outages. They cause slow, compounding damage that shows up months later as "mysterious" reliability issues nobody can trace.


The security layer is worse than you think

40% of AI-generated programs contain exploitable security vulnerabilities. Veracode tested 100+ LLMs across multiple languages - 45% of generated code introduces OWASP Top 10 flaws. Java is the worst at 70%+ failure rate. Python and JavaScript aren't far behind at 38-45%.

The reason is simple and structural. Models train on public repos. Those repos contain decades of insecure patterns that worked fine in 2008 but are exploitable today - string-concatenated SQL queries, hardcoded credentials in config files, user inputs passed straight to shell commands. The models don't know these are anti-patterns. They've seen them thousands of times in training data. They reproduce them with confidence.
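The string-concatenated SQL pattern is easy to show. A minimal sketch (table and function names are illustrative), using sqlite3 so it's self-contained:

```python
import sqlite3

def find_user_unsafe(conn, name):
    # The insecure pattern models have seen thousands of times:
    # user input concatenated straight into the query. An input like
    # "x' OR '1'='1" rewrites the WHERE clause and returns every row.
    return conn.execute(
        "SELECT id FROM users WHERE name = '" + name + "'"
    ).fetchall()

def find_user_safe(conn, name):
    # Parameterized query: the driver treats the value as data,
    # never as SQL, so the injection payload matches nothing.
    return conn.execute(
        "SELECT id FROM users WHERE name = ?", (name,)
    ).fetchall()
```

Feed both functions the payload `x' OR '1'='1` and the unsafe one dumps the whole table while the safe one returns nothing.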

And it's not just the code they generate. The tools themselves are attack surfaces. A vulnerability in GitHub Copilot let attackers embed malicious prompts in source code comments that execute when Copilot processes the file. Think about that: your AI coding assistant reading a seemingly innocent comment and executing an attacker's instructions. Another vulnerability (CVSS 9.6) enabled silent exfiltration of secrets and source code from private repositories through Copilot Chat.

Vulnerable code, generated by vulnerable tools.


From Copilot tab to 3am page

Here's the chain that connects your AI coding tool to your next incident:

AI writes code faster
  -> more PRs, less review
    -> more deploys hit production
      -> more config changes propagate
        -> more infrastructure state changes
          -> more incidents

Each step amplifies the one before it.

DORA confirms it. Change failure rates up ~30%. Lead time stayed flat despite faster code generation. Deployment frequency is up - 63% of organizations ship more often now - but the downstream systems didn't evolve to handle it. The bottleneck moved past the code, into the infrastructure.

And more code doesn't just mean more deploys. It means more infrastructure. It's trivially easy to spin up a new microservice with AI - you can go from zero to deployed in an afternoon. But that service needs environment variables. Secrets. Network policies. Monitoring. Alerting rules. A spot in your dependency graph that nobody drew. Each new service is a new node in a failure topology that's already more complex than anyone on your team fully understands.

Organizations now juggle 8-10 distinct AI tools for software engineering, with 36%+ using even more. Each tool adds its own integration surface, its own config files, its own failure modes. The complexity compounds silently until something breaks and the on-call engineer stares at a dependency chain they've never seen before.

AI made the code part trivial. It made the infrastructure part harder.


What building with AI actually requires

The productivity gains are real. 15-26% when measured honestly. We see it in our own team every day. Nobody's giving that up, and nobody should.

But velocity without the infrastructure to match it is just faster failure.

If code generation got 10x faster, your review process can't stay the same. Automated quality checks, security scanning in CI, AI-assisted first-pass review - these need to be defaults, not nice-to-haves. The human reviewer should be the last gate, not the only one.

If deploy frequency doubled, your detection and rollback capabilities need to match. Progressive rollouts, automated canary analysis, instant rollback triggers - you can't ship twice as fast and still rely on manual verification in production. The window between "deploy" and "someone notices it's broken" is where incidents live.
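The core of automated canary analysis is a small decision, sketched below under made-up thresholds (the ratio, floor, and minimum-traffic values are illustrative, not recommendations): compare the canary's error rate to the stable baseline and decide promote, wait, or rollback without a human in the loop.

```python
def canary_verdict(baseline_errors, baseline_total,
                   canary_errors, canary_total,
                   max_ratio=2.0, min_requests=100):
    # Not enough canary traffic to judge yet: keep waiting.
    if canary_total < min_requests:
        return "wait"
    baseline_rate = baseline_errors / max(baseline_total, 1)
    canary_rate = canary_errors / max(canary_total, 1)
    # Roll back if the canary errors at more than max_ratio times
    # the baseline. The small floor keeps a near-zero baseline from
    # tripping the gate on a single stray error.
    floor = 0.001
    if canary_rate > max(baseline_rate, floor) * max_ratio:
        return "rollback"
    return "promote"
```

The point isn't these exact numbers - it's that the decision runs on every deploy, in seconds, instead of waiting for someone to notice a dashboard.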

If your team is generating new services every week, your dependency mapping needs to be continuous and automated. You need to know your blast radius before a failure teaches it to you. That service your AI agent spun up last Thursday - do you know what happens to the rest of your system when it goes down?
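Continuous dependency mapping reduces, at its simplest, to a graph traversal. A hypothetical sketch (the service names are invented): given "A depends on B" edges, the blast radius of a service is everything that transitively depends on it.

```python
from collections import deque

def blast_radius(depends_on, service):
    # depends_on: list of (src, dst) edges meaning "src depends on dst".
    # Invert the graph: for each service, who depends on it?
    dependents = {}
    for src, dst in depends_on:
        dependents.setdefault(dst, set()).add(src)
    # BFS upward through the dependents: everything reached
    # is at risk when `service` goes down.
    seen, queue = set(), deque([service])
    while queue:
        node = queue.popleft()
        for upstream in dependents.get(node, ()):
            if upstream not in seen:
                seen.add(upstream)
                queue.append(upstream)
    return seen
```

With edges like `[("checkout", "payments"), ("payments", "ledger"), ("search", "catalog")]`, the blast radius of `ledger` is `{"payments", "checkout"}` - and `search` is untouched. The hard part in practice isn't the traversal; it's keeping the edge list accurate while AI agents add nodes every week.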

The companies that win with AI coding tools won't be the ones that generate the most code. They'll be the ones that evolved their entire pipeline - review, testing, deployment, monitoring, prevention - to match the new speed.

A 98% increase in PR volume is a 98% increase in failure surface. If your prevention didn't improve by the same margin, your risk went up.

PRs merged is not productivity. Code generated is not value. The only metric that matters: what's the reliability cost of this velocity?


The future is AI-generated code. We're all in on that.

The question is whether your infrastructure is ready for what 46% AI-generated, 98% more PRs, and 30% more change failures actually means.

Because that's also the future. Same one.