The 10 Failure Modes of AI-Assisted Coding
Every AI coding failure falls into one of 10 patterns. Learn them, and you can prevent them before they happen.
Why Failure Modes Matter
Most developers learn AI’s limitations through painful experience. A production bug here, a security vulnerability there, hours lost debugging hallucinated code. But these failures aren’t random — they follow predictable patterns.
We’ve catalogued 10 distinct failure modes from research, incident reports, and developer experience. Each one is preventable if you know what to look for.
The 10 Failure Modes
1. The Hallucination Spiral
Severity: CRITICAL
AI generates plausible but wrong code. You ask it to fix the error. It compounds the mistake. By turn 39, you have 693 lines of fabricated code (Surge AI documented this exact scenario).
Prevention: 2 corrections max. If the AI can’t get it right in 2 attempts, stop, rethink, and re-prompt from scratch.
2. The Comprehension Debt
Severity: CRITICAL
You ship code you don’t fully understand. It works — until it doesn’t. Now you’re debugging a system where the original “author” (the AI) can’t explain its own decisions.
Prevention: Document every AI-generated function. If you can’t explain a line, you can’t ship it.
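What does "document every AI-generated function" look like in practice? One lightweight convention is to record the decisions behind each non-obvious choice in the docstring, in your own words. A minimal sketch (the function, its constants, and the rationale are invented for illustration, not a prescribed format):

```python
def retry_delay(attempt: int) -> float:
    """Delay in seconds before retry number `attempt`.

    Decisions I can defend (mine, not the AI's):
    - Exponential base 2: doubles back-off pressure on a struggling upstream.
    - Cap at 60s: hypothetical constraint that our workers time out at 90s.
    - No jitter here on purpose: callers are assumed to add their own.
    """
    return min(2.0 ** attempt, 60.0)

# The rule: if a line above can't be explained in the docstring, it doesn't ship.
```

The docstring is the comprehension test: if you cannot fill it in, you do not understand the function well enough to own it in production.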
3. Context Window Amnesia
Severity: HIGH
Long sessions cause the AI to “forget” earlier context. It contradicts its own earlier decisions, introduces inconsistencies, or loses track of your architecture.
Prevention: Use CLAUDE.md files, maintain handover documents, and watch for the signs: repeated questions, contradictory suggestions, loss of naming conventions.
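What goes in such a context file? A sketch of a minimal CLAUDE.md — the project details below are invented for illustration, not a mandated schema:

```markdown
# Project context for AI sessions

## Architecture decisions
- Services communicate over gRPC only; no cross-service database access.
- Errors are returned as typed results; never raised across module boundaries.

## Naming conventions
- Handlers: `handle_<verb>_<noun>` (e.g. `handle_create_order`)
- Test files mirror source paths: `src/orders.py` -> `tests/test_orders.py`

## Session hygiene
- Restate the current task at the top of every long session.
- If the assistant repeats a question or contradicts an earlier decision,
  end the session and start fresh from this file.
```

Because the file is re-read at the start of each session, the AI's "memory" of your architecture no longer depends on the context window surviving a long conversation.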
4. The Automation Bias Trap
Severity: HIGH
You accept AI output because it “looks right” — the classic commission error. Or you miss a vulnerability because the AI didn’t flag it — the omission error. Parasuraman & Manzey (2010) documented this extensively.
Prevention: Systematic verification at every step. Not glancing — actually checking against your 5-layer verification stack.
5. The Confidence Delusion
Severity: HIGH
Stanford found developers WITH AI wrote less secure code while feeling MORE confident. The METR study found a 43-point gap between perceived and actual speed improvement. You literally cannot trust your own perception.
Prevention: Measure, don’t feel. Track actual metrics: bugs shipped, time to resolution, security issues found in review.
6. Security Blindness
Severity: CRITICAL
AI generates functional code, not secure code. 60-70% of AI-introduced vulnerabilities are BLOCKER severity (Sonar, 2026). The AI doesn’t think adversarially — it completes patterns, not threat models.
Prevention: Security review as a mandatory verification layer. Every AI-generated code path needs adversarial analysis.
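The gap between "functional" and "secure" is easy to demonstrate. Both functions below pass a happy-path test; only one survives a hostile input. A sketch using Python's built-in sqlite3 module (the table and data are invented):

```python
import sqlite3

def find_user_unsafe(conn, username):
    # Typical pattern-completion output: works for normal names,
    # but interpolates untrusted input straight into SQL.
    query = f"SELECT id FROM users WHERE name = '{username}'"
    return conn.execute(query).fetchall()

def find_user_safe(conn, username):
    # Parameterized query: the input is bound as data, never parsed as SQL.
    return conn.execute(
        "SELECT id FROM users WHERE name = ?", (username,)
    ).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)", [(1, "alice"), (2, "bob")])

payload = "' OR '1'='1"                    # classic injection input
leaked = find_user_unsafe(conn, payload)   # matches every row in the table
safe = find_user_safe(conn, payload)       # matches nothing
```

An adversarial review asks the question the AI never did: "what happens when the input is chosen by an attacker?" — and that question is what the verification layer exists to force.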
7. The Sunk Cost Spiral
Severity: MEDIUM
You’ve invested 45 minutes in an AI conversation. It’s not working, but you keep going because of the time already invested. This is textbook sunk cost fallacy, amplified by the AI’s confident tone.
Prevention: The 2-correction rule. Time invested is irrelevant — only current trajectory matters.
8. Architecture Drift
Severity: HIGH
AI uses patterns from its training data, not patterns from YOUR codebase. Over time, each AI session introduces slightly different conventions, creating an inconsistent, unmaintainable codebase.
Prevention: CLAUDE.md with architecture decisions. Explicit style guides. Context files that encode YOUR patterns.
9. The Testing Illusion
Severity: MEDIUM
AI writes tests that pass but don’t actually verify behavior. Tests that check the implementation rather than the requirement. Green CI with zero real coverage.
Prevention: Review tests for meaningful assertions. Ask: “Would this test catch a real bug?”
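The difference is easiest to see side by side. In this sketch (the discount function is invented), the first test merely restates the implementation, so it stays green no matter what the code does; the second pins independently computed expected values:

```python
def apply_discount(price, pct):
    return price * (1 - pct / 100)

def test_mirrors_implementation():
    # Testing illusion: the assertion repeats the formula under test,
    # so it passes even if the formula itself is wrong.
    assert apply_discount(200, 10) == 200 * (1 - 10 / 100)

def test_checks_requirement():
    # Meaningful test: expected values computed by hand from the
    # requirement, plus an edge case a real bug would break.
    assert apply_discount(200, 10) == 180
    assert apply_discount(50, 100) == 0
```

"Would this test catch a real bug?" — for the first test, the answer is no by construction; any change to the formula changes the expectation in lockstep.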
10. The Productivity Theater
Severity: HIGH
DORA data shows: +98% individual output, +91% review time, +154% PR size, net delivery flat. You’re generating more code, but the team is spending all its time reviewing and fixing it.
Prevention: Measure team throughput, not individual output. The metric that matters is working software delivered, not lines generated.
The Pattern
Notice something? Every failure mode stems from the same root cause: trusting AI output without adequate human judgment.
The solution isn’t better AI. It’s better humans — specifically, humans trained in systematic verification.
- Take the Diagnostic to assess your vulnerability to these failure modes
- Read the Methodology for the complete prevention framework
- Explore the Evidence for the data behind each failure mode