
Vibe Coding Has a Massive Security Problem: One in Three AI-Generated Apps Is Already Compromised

AI-assisted “vibe coding” is accelerating development velocity, but at a steep security cost. Real-world data shows that up to one in three AI-generated applications contains critical vulnerabilities, including exposed secrets, authentication bypasses, and insecure dependencies. Humans have finally built interns that work at machine speed and make security mistakes with equal enthusiasm. This article explains why AI-generated code is structurally prone to insecurity and provides a production-grade framework for shipping faster without compromising security.

April 16, 2026 20 min read Likhon


By MD Bazlur Rahman Likhon | Senior Cloud & AI Engineer | brlikhon.engineer


The App That "Worked" Just Breached 50,000 Users

Picture this: a startup founder — smart, hustle-first, non-technical — discovers Lovable in early 2025. In three weekends, they vibe-code an MVP: user accounts, a dashboard, a basic API, email notifications. It works. They launch. Users sign up. Growth starts. Investors are interested.

Three months later, a security researcher emails them at 11 PM. Subject line: Your app is wide open.

What follows is a forensics nightmare. 847 hardcoded API keys sitting exposed in client-side JavaScript bundles. A full authentication bypass — changing a URL parameter from ?user_id=101 to ?user_id=102 grants access to anyone's account without a password. Sensitive PII: full names, email addresses, IBAN numbers, phone numbers — all accessible through unauthenticated public endpoints. Fifty thousand users. Every single one compromised.

This specific scenario is fictional. But it is not speculative. Escape.tech scanned 5,600 publicly available vibe-coded applications and found exactly this pattern — over 2,000 significant vulnerabilities, more than 400 exposed secrets, and 175 instances of exposed personally identifiable information including medical records, bank account numbers, and contact data. One in three of those apps contained serious security flaws that could be easily exploited by anyone with moderate technical skill.[^1][^2]

My name is Md Bazlur Rahman Likhon. I'm a Senior Cloud & AI Engineer with five Google Cloud certifications, four Azure certifications, and AWS AI/ML Scholar credentials. I build secure, production-grade AI systems — including CropMind, a multi-agent AI pipeline where security architecture was an engineering priority from day one, not an afterthought bolted on at launch.

This article is not an attempt to kill vibe coding. The tools are genuinely powerful, and the democratization of software development they enable is real. This is an attempt to save vibe coding from its own most dangerous blind spot — and to give every developer, founder, and engineering leader the framework to ship fast without burning everything down.


The Numbers Behind the Crisis

The data on AI-generated code security has been accumulating for years, but 2025 and early 2026 turned it from an academic concern into an operational emergency.

When Collins Dictionary named "vibe coding" its Word of the Year 2025, it was recognizing a genuine cultural inflection point: the moment AI-assisted development stopped being a niche developer tool and became a mainstream way to build software. Coined by Andrej Karpathy, the term captured a fundamental shift from deliberate, line-by-line programming to describing what you want and letting AI figure out the rest. By the end of 2025, approximately 41% of all code globally was AI-generated or AI-assisted, and that number is projected to climb to 65% by 2027.

The velocity is extraordinary. The security posture is not keeping pace.

Veracode's 2025 GenAI Code Security Report tested code output from more than 100 large language models across 80+ coding tasks and found that 45% of AI-generated code samples contained security vulnerabilities — including OWASP Top 10 flaws. Cross-Site Scripting had an 86% failure rate. Log Injection hit 88%. The most uncomfortable finding in the Veracode report: newer and larger models did not produce more secure code than smaller ones, suggesting this is a structural problem baked into how AI generates code — not a temporary limitation that will scale away with the next model release.

Then came the CodeRabbit study. Released in December 2025, CodeRabbit's State of AI vs. Human Code Generation report analyzed 470 real-world pull requests from open-source GitHub projects, comparing AI-assisted code against human-authored code. The findings were not subtle. AI-generated code introduced 1.7 times more total issues than human-written code. Security flaws were up to 2.74 times more prevalent: XSS vulnerabilities ran at 2.74×, insecure direct object references at 1.91×, and improper password handling at 1.88×. When normalized to 100 pull requests, critical security issues rose from 240 in human-authored code to 341 in AI co-authored code, a roughly 40% increase in the most serious category.

The SUSVIBES benchmark, published by researchers from Carnegie Mellon University and LeiLi Lab, provided perhaps the most clarifying data point of all. The benchmark evaluated frontier AI agents on 200 realistic coding tasks drawn from real GitHub repositories — tasks that covered 77 different security weakness types from the CWE database. The results exposed a critical distinction that most vibe coding discourse misses entirely.

The problem isn't that AI code doesn't work. The SUSVIBES benchmark showed that frontier agents can solve over 50% of real-world coding tasks functionally correctly. The problem is that over 80% of that functionally correct code still fails security tests. The code works. It just opens your users to attack.

The trajectory of real-world exploitation makes this concrete. Georgia Tech's Vibe Security Radar, run by the Systems Software & Security Lab, has tracked CVEs formally attributed to AI-generated code since May 2025. The monthly count went from 6 CVEs in January 2026 to 15 in February and 35 in March. The researchers believe the real number is 5 to 10 times higher. Most AI tools leave no commit metadata, so AI-introduced vulnerabilities in open-source projects are never attributed to their actual origin. By their estimate, 400 to 700 AI-introduced vulnerabilities are already sitting in open-source projects, unattributed and unpatched.

Stanford University researchers, including cryptographer Dan Boneh, added an especially unsettling dimension to this picture. Developers who used AI coding assistants produced code with security vulnerabilities 40% of the time on security-sensitive tasks. They were also more likely to believe they had written secure code than those who worked without AI assistance. The tool doesn't just introduce vulnerabilities. It actively undermines the developer's ability to perceive them.


Why AI-Generated Code Is Structurally Vulnerable

Understanding why this happens requires stepping back from the "AI is broken" narrative, because that framing misses the actual root cause.

AI coding assistants are trained on code scraped from the public internet — GitHub repositories, Stack Overflow threads, tutorial blogs, open-source projects, documentation examples. This training corpus contains extraordinary amounts of good code. It also contains years of legacy patterns, deprecated cryptographic implementations, unsafe query constructions, and the kind of "it works for this demo" shortcuts that developers write every day but would never commit to a production authentication system. The model learns from all of it.

More critically, an AI has no context about your architecture. It doesn't know your threat model. It doesn't know that this endpoint handles healthcare data, or that this function is the only guard between anonymous traffic and your entire user database. It doesn't know which fields in your schema are PII, or that your company is subject to PCI DSS compliance. When it generates code, it generates code that is plausible given the prompt — not code that is appropriate given your security requirements. Those two things can look identical at a glance and be catastrophically different in production.

The AI also optimizes for momentum. Its job is to produce a working solution to the stated problem, and it does that job well. The unstated requirements — sanitize this input, enforce authorization before reaching this logic, log this failure for your SIEM, rotate this credential — are precisely the requirements that don't make it into prompts because non-security-expert developers don't know to ask for them. OWASP's own research found that 63% of AI-coded applications fail to properly validate user inputs — the most fundamental defensive layer in web security.

The five most common vulnerability patterns in AI-generated code follow directly from these structural gaps:

| Vulnerability Pattern | How AI Introduces It | Real-World Consequence |
|---|---|---|
| Unsanitized user inputs | AI generates direct string interpolation into queries without parameterization | SQL injection and XSS attacks; full database exfiltration |
| Hardcoded credentials and API keys | AI scaffolds secrets inline for "working" examples; no rotation awareness | Secrets exposed in client-side bundles; 400+ found in 5,600 apps scanned |
| Missing authorization checks | AI implements authentication (who are you?) but skips authorization (what can you do?) | Auth bypass; any authenticated user accesses any other user's data |
| Weak cryptography | AI reuses MD5, SHA1, or plaintext for passwords because these patterns are abundant in training data | Password database crackable in hours; token forgery |
| Silent auth failures | AI returns error messages without logging, alerting, or audit trails | Brute-force attacks go undetected; no forensics capability after a breach |

The Contrast: Bad Code vs. Secure Code

Nothing makes this more concrete than side-by-side code. Below is a representative example of what AI tools typically generate for a basic login endpoint — and what that same endpoint should actually look like.

BAD — Vibe-coded authentication (what AI typically generates)

def login(username, password):
    user = db.query(f"SELECT * FROM users WHERE name='{username}'")
    if user and user.password == password:  # plaintext comparison
        session['user'] = username
        return redirect('/dashboard')
    return "Invalid credentials"

GOOD — Secure implementation (what it should look like)

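import logging

import bcrypt  # third-party package: pip install bcrypt
from flask import redirect, session  # assumes a Flask app; `db` is your own data-access helper

logger = logging.getLogger("auth")

def login(username, password):
    user = db.query("SELECT * FROM users WHERE name = ?", (username,))  # parameterized
    if user and bcrypt.checkpw(password.encode("utf-8"), user.hashed_password):  # hash stored as bytes
        session['user_id'] = user.id  # store ID, not username
        logger.info("Successful login: %r", username)  # %r escapes CR/LF, so no forged log lines
        return redirect('/dashboard')
    logger.warning("Failed login attempt: %r", username)  # audit trail
    return "Invalid credentials", 401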

The differences are not cosmetic. In the bad version, the f-string query construction enables SQL injection. An input such as ' OR '1'='1 rewrites the WHERE clause. More elaborate payloads, UNION-based ones for example, can read other tables or return an attacker-controlled row that passes the password check. The plaintext password comparison means the database stores raw passwords, so one breach exposes every user's credentials on every site where they reuse them. The session holds the username rather than a stable user ID, which ties identity to a value that can change. And there is no logging anywhere: a brute-force attack hitting this endpoint thousands of times per hour would generate no observable signal.

The secure version binds user input as a query parameter, so the database never parses it as SQL. bcrypt.checkpw() compares against a salted hash produced by a deliberately slow algorithm. That makes offline cracking expensive even if the database leaks. The salt is stored inside the hash itself, so no separate salt handling is needed. The session stores user.id, a stable primary key rather than a user-editable name. And every login attempt, successful or failed, is logged, giving a SIEM the signal it needs to detect brute-force patterns.

This is what MD Bazlur Rahman Likhon means when he says that AI-generated code often works while remaining fundamentally insecure. Both functions above return the right user to the right dashboard on valid credentials. Only one of them keeps 50,000 users safe.


The 6 Things You Should Never Vibe Code

Not all code carries equal security stakes. The following six categories are where the structural limitations of AI-generated code intersect most dangerously with real-world consequences:

1. Authentication systems. AI reliably generates code that authenticates users — verifying identity. What it consistently misses is authorization — verifying permissions. It produces predictable session tokens, skips expiration logic, and rarely implements account lockout or anomaly detection. The consequence: any user who authenticates can access any other user's resources. This is the most common critical vulnerability pattern in vibe-coded applications.

2. Payment processing. AI doesn't understand what it doesn't know about financial systems. It misses race conditions, double-spend edge cases, refund logic bypass paths, and PCI DSS requirements around data handling. The consequence: a missed edge case in payment logic becomes a financial fraud vector that operates silently until someone notices the discrepancy in monthly reconciliation — or until a researcher does.

3. Sensitive data handling. AI has no context-aware sense of which fields in your schema constitute PII, PHI, or PCI data. It doesn't know that date_of_birth combined with zip_code is a HIPAA-relevant combination. It logs everything, caches everything, and passes everything through error messages. The consequence: 175 instances of PII were exposed in production endpoints across Escape.tech's scan — medical records, IBANs, phone numbers — all accessible without authentication.

4. Access control logic. Confused deputy attacks, privilege escalation paths, and indirect object reference vulnerabilities require a developer to think adversarially about their own system. AI generates the happy path. It is rarely trained to ask "what happens if an authenticated user modifies this parameter?" The consequence: horizontal privilege escalation where a user can access any other user's data by changing a single URL parameter.

5. SQL/NoSQL query builders. Even when AI uses parameterized queries for simple SELECT statements, it often falls back to string interpolation for complex dynamic queries with variable WHERE clauses, ORDER BY fields, or search functionality. The consequence: injection at the data layer. Your search bar becomes a SQL injection point that AI code review misses because the logic looks intentional.[^8][^20]

6. Cryptographic implementations. AI training data is full of examples using MD5 for checksums, SHA1 for file verification, and base64 for "encoding" (not encryption) sensitive values. It reuses these patterns confidently in security-sensitive contexts because they worked in the non-security-sensitive contexts where it learned them. The consequence: password databases crackable in hours with commodity hardware; "encrypted" data that is actually just encoded and trivially reversible.
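For item 1 (authentication systems), here is a minimal sketch of two basics that AI scaffolding often skips: unguessable session tokens and server-side expiry. It uses only Python's standard library. The `SessionStore` class, its in-memory dict, and the 30-minute lifetime are hypothetical choices for illustration, not a production session backend. Lockout and anomaly detection are out of scope here.

```python
import secrets
import time
from typing import Dict, Optional, Tuple


class SessionStore:
    """Hypothetical in-memory store: random session tokens with a fixed lifetime."""

    def __init__(self, ttl_seconds: int = 30 * 60):  # assumed 30-minute lifetime
        self.ttl_seconds = ttl_seconds
        self._sessions: Dict[str, Tuple[int, float]] = {}  # token -> (user_id, expires_at)

    def create(self, user_id: int, now: Optional[float] = None) -> str:
        # 32 random bytes from the OS CSPRNG: not sequential, not guessable
        token = secrets.token_urlsafe(32)
        now = time.time() if now is None else now
        self._sessions[token] = (user_id, now + self.ttl_seconds)
        return token

    def lookup(self, token: str, now: Optional[float] = None) -> Optional[int]:
        entry = self._sessions.get(token)
        if entry is None:
            return None
        user_id, expires_at = entry
        now = time.time() if now is None else now
        if now >= expires_at:
            del self._sessions[token]  # expired sessions are removed, not extended
            return None
        return user_id

    def revoke(self, token: str) -> None:
        self._sessions.pop(token, None)  # logout must invalidate the session server-side


store = SessionStore()
token = store.create(user_id=101, now=1_000.0)
print(store.lookup(token, now=1_060.0))  # 101
print(store.lookup(token, now=4_600.0))  # None (expired after 30 minutes)
```

The `now` parameter exists only so the expiry logic can be exercised deterministically. In normal use, callers omit it and the wall clock is used.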
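For item 2 (payment processing), one well-known double-spend bug is the read-then-write balance check. Two concurrent requests both read the old balance, both pass the check, and both debit. Below is a minimal sketch of the usual countermeasure. It assumes a SQL database with hypothetical `accounts` and `charges` tables. The balance check is folded into a single conditional `UPDATE`, and an idempotency key is recorded in the same transaction so a retried request cannot charge twice. SQLite is used only so the example runs standalone, and none of this is PCI DSS guidance.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance_cents INTEGER NOT NULL);
        CREATE TABLE charges (idempotency_key TEXT PRIMARY KEY,
                              account_id INTEGER NOT NULL, amount_cents INTEGER NOT NULL);
        INSERT INTO accounts VALUES (1, 1000);
    """)
    return conn


class _InsufficientFunds(Exception):
    pass


def charge(conn: sqlite3.Connection, account_id: int, amount_cents: int, idempotency_key: str) -> str:
    """Debit at most once per idempotency key, and never below zero."""
    if amount_cents <= 0:
        raise ValueError("amount must be positive")
    try:
        with conn:  # one transaction: commits on success, rolls back on any exception
            # A reused key violates the PRIMARY KEY, so a retried request stops here
            conn.execute("INSERT INTO charges VALUES (?, ?, ?)",
                         (idempotency_key, account_id, amount_cents))
            # Check and debit in ONE statement: no window between reading and writing
            cur = conn.execute(
                "UPDATE accounts SET balance_cents = balance_cents - ? "
                "WHERE id = ? AND balance_cents >= ?",
                (amount_cents, account_id, amount_cents),
            )
            if cur.rowcount != 1:
                # Insufficient funds (or unknown account); also undoes the charges row
                raise _InsufficientFunds
    except sqlite3.IntegrityError:
        return "duplicate"
    except _InsufficientFunds:
        return "insufficient_funds"
    return "charged"


db = make_db()
print(charge(db, 1, 600, "order-1"))  # charged
print(charge(db, 1, 600, "order-1"))  # duplicate (client retried the same order)
print(charge(db, 1, 600, "order-2"))  # insufficient_funds (balance is now 400)
```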
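For item 3 (sensitive data handling), the fix starts with the team deciding which fields are sensitive, not the model. Below is a minimal sketch of field-level redaction applied before a record is logged. The `SENSITIVE_FIELDS` set and the sample record are hypothetical.

```python
import json
import logging

# Hypothetical classification; in practice it comes from your own data inventory
SENSITIVE_FIELDS = {"name", "email", "phone", "iban", "date_of_birth", "zip_code", "password"}


def redact(record: dict) -> dict:
    """Return a copy with sensitive values masked.

    Recurses into nested dicts; lists are not traversed in this sketch.
    """
    clean = {}
    for key, value in record.items():
        if key.lower() in SENSITIVE_FIELDS:
            clean[key] = "[REDACTED]"
        elif isinstance(value, dict):
            clean[key] = redact(value)
        else:
            clean[key] = value
    return clean


logging.basicConfig(level=logging.INFO)
log = logging.getLogger("app")

user = {"id": 102, "plan": "pro", "email": "a@example.com",
        "billing": {"iban": "DE00 0000 0000 0000 0000 00", "country": "DE"}}
# Only the redacted copy is ever passed to the logger
log.info("profile update failed: %s", json.dumps(redact(user)))
```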
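For item 4 (access control logic), the adversarial question "what if an authenticated user changes this ID?" can be answered in code by scoping every lookup to the caller. Below is a minimal sketch with a hypothetical in-memory `INVOICES` store. In a real app, `current_user_id` would come from the server-side session, never from a request parameter.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Invoice:
    id: int
    owner_id: int
    amount_cents: int


# Hypothetical data store keyed by invoice ID
INVOICES = {
    101: Invoice(id=101, owner_id=1, amount_cents=4_999),
    102: Invoice(id=102, owner_id=2, amount_cents=12_000),
}


class NotFound(Exception):
    """Raised for both 'missing' and 'not yours', so IDs cannot be probed."""


def get_invoice(current_user_id: int, invoice_id: int) -> Invoice:
    invoice = INVOICES.get(invoice_id)
    # Authorization, not just authentication: the caller must own the record
    if invoice is None or invoice.owner_id != current_user_id:
        raise NotFound(invoice_id)
    return invoice


print(get_invoice(current_user_id=1, invoice_id=101).amount_cents)  # 4999
try:
    get_invoice(current_user_id=1, invoice_id=102)  # user 1 edits the URL to 102
except NotFound:
    print("404")  # same response as a nonexistent invoice
```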
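For item 5 (query builders), the root cause is that placeholders can bind values but not identifiers such as column names. Dynamic ORDER BY clauses therefore tempt code generators back into f-strings. Below is a minimal sketch using the standard library's `sqlite3`. Filter values go through `?` placeholders, and sort columns must pass an allow-list before they reach the SQL text. The `products` table and the `search_products` helper are hypothetical.

```python
import sqlite3

# The only identifiers that may ever be formatted into the SQL text
SORTABLE_COLUMNS = {"name", "price_cents", "created_at"}


def search_products(conn, term=None, max_price=None, sort_by="name", descending=False):
    clauses, params = [], []
    if term:
        clauses.append("name LIKE ?")  # the value is bound, never concatenated
        params.append(f"%{term}%")
    if max_price is not None:
        clauses.append("price_cents <= ?")
        params.append(max_price)
    if sort_by not in SORTABLE_COLUMNS:  # identifiers can't be bound, so allow-list them
        raise ValueError(f"unsupported sort column: {sort_by!r}")
    where = f"WHERE {' AND '.join(clauses)}" if clauses else ""
    direction = "DESC" if descending else "ASC"
    # Every fragment formatted here is a fixed string or an allow-listed identifier
    sql = f"SELECT name, price_cents FROM products {where} ORDER BY {sort_by} {direction}"
    return conn.execute(sql, params).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (name TEXT, price_cents INTEGER, created_at TEXT)")
conn.executemany("INSERT INTO products VALUES (?, ?, ?)",
                 [("lamp", 2500, "2025-01-01"), ("desk", 15000, "2025-02-01")])
print(search_products(conn, max_price=10000))                         # [('lamp', 2500)]
print(search_products(conn, sort_by="price_cents", descending=True))  # [('desk', 15000), ('lamp', 2500)]
print(search_products(conn, term="' OR '1'='1"))                      # [] (matched literally)
```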
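For item 6 (cryptographic implementations), the difference between encoding and hashing is easy to demonstrate with the standard library. In the minimal sketch below, base64 round-trips with no key at all. A password, by contrast, goes through a salted, deliberately slow key-derivation function and is checked with a constant-time comparison. PBKDF2 is used here only because it ships with Python; bcrypt, scrypt, and Argon2 are common alternatives. The iteration count is an illustrative assumption to tune against current guidance, and the storage format string is hypothetical.

```python
import base64
import hashlib
import hmac
import secrets

# 1) base64 is an encoding: anyone can reverse it, no key involved
encoded = base64.b64encode(b"hunter2")
print(base64.b64decode(encoded))  # b'hunter2'

# 2) Password storage: random per-user salt + slow KDF + constant-time compare
ITERATIONS = 600_000  # assumed work factor; tune to your hardware and current guidance


def hash_password(password: str) -> str:
    salt = secrets.token_bytes(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    # Store the parameters alongside the hash so the work factor can be raised later
    return f"pbkdf2_sha256${ITERATIONS}${salt.hex()}${digest.hex()}"


def verify_password(password: str, stored: str) -> bool:
    algo, iterations, salt_hex, digest_hex = stored.split("$")
    if algo != "pbkdf2_sha256":
        return False
    candidate = hashlib.pbkdf2_hmac(
        "sha256", password.encode(), bytes.fromhex(salt_hex), int(iterations)
    )
    return hmac.compare_digest(candidate, bytes.fromhex(digest_hex))


stored = hash_password("correct horse battery staple")
print(verify_password("correct horse battery staple", stored))  # True
print(verify_password("hunter2", stored))  # False
```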


The Secure Vibe Coding Framework: 6 Rules That Let You Ship Fast Without Burning Down

The goal is not to abandon AI-assisted development. It is to integrate it with the security discipline that production systems require. Here is the framework MD Bazlur Rahman Likhon applies when building AI systems — including the architecture decisions behind CropMind:

Rule 1: Treat AI as a Junior Developer — Review Everything

A junior developer who codes enthusiastically and reviews nothing is a liability. AI code has the same profile: energetic, productive, and in need of senior engineering oversight before it ships. Every pull request containing AI-generated code deserves a human review with specific attention to the five vulnerability patterns in the table above. This is not about distrust — it is about understanding what AI does and does not know about your specific system.

Rule 2: Never Let AI Write the Security-Critical Components

The six categories above — authentication, payments, sensitive data, access control, query builders, and cryptography — should be written by engineers who understand the threat model, not prompted into existence. AI can assist with everything surrounding these components: the UI, the error handling UX, the test scaffolding, the documentation. The security-critical logic itself requires deliberate, human-authored implementation.

Rule 3: Run Automated Security Scans Before Every Merge

SAST tools are not optional overhead. They are the automated safety net that catches what human review misses, especially at scale and velocity. Semgrep runs as a GitHub Actions workflow and blocks merges on high-severity findings using rule-based pattern matching. Snyk Code uses a semantic code graph to detect cross-file vulnerability patterns such as injection, while Snyk's separate Open Source product flags insecure dependencies. CodeRabbit integrates into the PR review itself, providing inline security annotations before a human even opens the diff. Layering at least two tools, one for custom code and one for dependency CVEs, covers the primary attack surface of AI-generated code at the commit stage.
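Semgrep's CLI can fail a build on its own: its `--error` flag exits non-zero when there are findings. Teams often want a finer policy, though, such as "block on ERROR, report WARNING". Below is a minimal sketch of such a gate over Semgrep's JSON report. It assumes the report shape emitted by `semgrep scan --json`: a `results` list whose entries carry `check_id`, `path`, `start.line`, and `extra.severity`. Verify those field names against your Semgrep version. The `gate` function, the blocking policy, and the sample rule ID are hypothetical.

```python
import json

BLOCKING_SEVERITIES = {"ERROR"}  # hypothetical policy: block ERROR, report everything else


def gate(report: dict) -> int:
    """Print each Semgrep finding and return a CI exit code (1 = block the merge)."""
    blocking = 0
    for result in report.get("results", []):
        severity = result.get("extra", {}).get("severity", "INFO")
        line = result.get("start", {}).get("line", "?")
        print(f"{severity:7} {result.get('path')}:{line} {result.get('check_id')}")
        if severity in BLOCKING_SEVERITIES:
            blocking += 1
    if blocking:
        print(f"{blocking} blocking finding(s); failing the check")
    return 1 if blocking else 0


sample = json.loads("""{"results": [
  {"check_id": "example.sql-fstring", "path": "app/db.py",
   "start": {"line": 42}, "extra": {"severity": "ERROR", "message": "SQL built with f-string"}}
]}""")
print(gate(sample))  # exit code 1
```

In CI, one way to wire this up is to run `semgrep scan --config auto --json --output semgrep.json`. A small script then calls `sys.exit(gate(json.load(open("semgrep.json"))))`. A blocking finding fails the job and, with branch protection enabled, the merge.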

Rule 4: Secret Detection as a Mandatory Gate

Given that Escape.tech found 400+ exposed secrets across 5,600 vibe-coded applications, secret detection is not a best practice — it is a baseline requirement. Gitleaks runs as a pre-commit hook, scanning for 150+ known credential patterns before code enters git history. TruffleHog goes deeper in the CI pipeline, using entropy analysis and active credential verification to test whether detected secrets are still live. The recommended architecture: Gitleaks at commit time for speed, TruffleHog in CI for depth and verification, with periodic TruffleHog full-history sweeps on all repositories where AI tooling has been active.
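To illustrate what "entropy analysis" means, here is a toy detector. Random-looking tokens such as API keys have higher Shannon entropy per character than ordinary identifiers. This is a simplified sketch of the general heuristic, not how Gitleaks or TruffleHog are implemented. The regex, the 20-character minimum, and the 4.0-bit threshold are assumptions, and real tools pair this kind of check with provider-specific patterns to cut false positives. The key in the sample is made up.

```python
import math
import re
from collections import Counter

# Candidate tokens: long runs of base64/hex-like characters (assumed pattern)
TOKEN_RE = re.compile(r"[A-Za-z0-9+/=_\-]{20,}")
ENTROPY_THRESHOLD = 4.0  # bits per character; assumed, tune on your own codebase


def shannon_entropy(s: str) -> float:
    """Average bits of information per character, from character frequencies."""
    counts = Counter(s)
    n = len(s)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def find_suspect_secrets(text: str):
    for lineno, line in enumerate(text.splitlines(), start=1):
        for token in TOKEN_RE.findall(line):
            if shannon_entropy(token) >= ENTROPY_THRESHOLD:
                yield lineno, token


source = '''
API_BASE = "https://api.example.com"
PAYMENTS_API_KEY = "zq8Xv2LpT9wRk4NbHs7YcJ1mFd6G"
max_retry_count_for_uploads = 3
'''
for lineno, token in find_suspect_secrets(source):
    print(lineno, token)  # 3 zq8Xv2LpT9wRk4NbHs7YcJ1mFd6G
```

The long variable name on the last line is also 20+ characters. It falls below the threshold (about 3.9 bits per character) because its letters repeat. This is exactly the distinction the heuristic relies on, and also why thresholds need tuning.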

Rule 5: Build a Secure Prompt Library

The gap between a naive prompt and a security-aware prompt produces dramatically different code. Teams who invest in a shared library of prompts that include explicit OWASP requirements are not just writing better prompts — they are encoding institutional security knowledge into the development workflow.

A naive prompt: "Write a login endpoint."

A secure prompt: "Write a login endpoint that follows OWASP Top 10 guidelines. Use parameterized queries to prevent SQL injection. Hash passwords with bcrypt. Store only user ID in the session, never username or email. Log all authentication attempts — both successful and failed — with timestamp, username, and IP address. Return a generic error message on failure. Implement rate limiting to block brute-force attempts."
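A prompt library can be as simple as versioned requirement snippets composed per task type. Below is a minimal sketch. The task profiles and requirement wording are hypothetical examples drawn from the secure prompt above, not a canonical list, and the generated code still goes through review and scanning.

```python
# Hypothetical shared library: reusable security requirements, keyed by concern
SECURITY_REQUIREMENTS = {
    "injection": "Use parameterized queries; never build SQL with string formatting.",
    "passwords": "Hash passwords with bcrypt; never store or log plaintext.",
    "session": "Store only the user ID in the session, never username or email.",
    "audit": "Log all authentication attempts (success and failure) with timestamp, username, and IP address.",
    "errors": "Return a generic error message on failure.",
    "rate_limit": "Implement rate limiting to block brute-force attempts.",
    "authz": "Check that the caller owns every resource it reads or modifies.",
}

# Which requirements apply to which kind of task
TASK_PROFILES = {
    "login_endpoint": ["injection", "passwords", "session", "audit", "errors", "rate_limit"],
    "crud_endpoint": ["injection", "authz", "errors"],
}


def build_prompt(task_profile: str, description: str) -> str:
    reqs = [SECURITY_REQUIREMENTS[key] for key in TASK_PROFILES[task_profile]]
    bullets = "\n".join(f"- {r}" for r in reqs)
    return f"{description}\nFollow OWASP Top 10 guidelines. Requirements:\n{bullets}"


print(build_prompt("login_endpoint", "Write a login endpoint."))
```

Keeping the snippets in one reviewed file means a fix to a requirement reaches every prompt that uses it.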

The second prompt still needs human review. But it starts from a dramatically safer baseline. MD Bazlur Rahman Likhon treats this prompt library as a living engineering asset — one that evolves as the threat landscape and the tools evolve.

Rule 6: Test Against OWASP Top 10 Before Launch

Before any vibe-coded application goes to production, it should be tested against the five highest-impact OWASP Top 10 categories (IDs follow the 2021 edition):

  • Broken Access Control (A01): Test whether any authenticated user can access another user's resources by modifying IDs, parameters, or paths[^26]
  • Cryptographic Failures (A02): Verify passwords are hashed, sensitive data is encrypted at rest and in transit, and no plaintext credentials appear in logs or error messages
  • Injection (A03): Validate that all query construction uses parameterized statements; test search fields and filter parameters specifically
  • Insecure Design (A04): Evaluate whether the application's architecture has rate limiting, account lockout, and session management
  • Security Misconfiguration (A05): Check default credentials, verbose error messages, exposed debug endpoints, and misconfigured cloud storage permissions
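For the Insecure Design item, here is a minimal sketch of per-account lockout after repeated failed logins, written so a test can drive the clock. The limits (5 failures, 15-minute window and lockout) and the `LoginThrottle` name are assumptions. A multi-instance deployment would keep this state in a shared store rather than in process memory.

```python
import time
from collections import defaultdict, deque


class LoginThrottle:
    """Lock an account after too many failed logins inside a sliding window."""

    def __init__(self, max_failures=5, window_s=900.0, lockout_s=900.0, clock=time.monotonic):
        self.max_failures = max_failures  # assumed limits: 5 failures per 15 minutes
        self.window_s = window_s
        self.lockout_s = lockout_s
        self.clock = clock  # injectable so tests can control time
        self._failures = defaultdict(deque)  # username -> timestamps of recent failures
        self._locked_until = {}  # username -> time the lockout ends

    def is_locked(self, username: str) -> bool:
        until = self._locked_until.get(username)
        return until is not None and self.clock() < until

    def record_failure(self, username: str) -> None:
        now = self.clock()
        recent = self._failures[username]
        recent.append(now)
        while recent and now - recent[0] > self.window_s:  # forget aged-out failures
            recent.popleft()
        if len(recent) >= self.max_failures:
            self._locked_until[username] = now + self.lockout_s
            recent.clear()

    def record_success(self, username: str) -> None:
        self._failures.pop(username, None)


now = [0.0]
throttle = LoginThrottle(clock=lambda: now[0])
for _ in range(5):
    throttle.record_failure("alice")
print(throttle.is_locked("alice"))  # True
now[0] += 901
print(throttle.is_locked("alice"))  # False
```

A login handler would typically refuse early if `is_locked()` is true, then verify the password, then call `record_failure()` or `record_success()`. Per-account lockout can itself be abused to lock legitimate users out. That is why it is commonly paired with per-IP rate limits.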

This testing step is where most of the critical vulnerabilities Escape.tech found could have been caught before they reached production.


Bazlur's Take: How I Approach AI-Assisted Code in Production

When I build production systems like CropMind — a multi-agent AI pipeline that processes real agricultural data with real stakes — AI accelerates my velocity. It does not own my security posture.

The distinction matters. In CropMind, AI-generated code handles data transformation logic, API scaffolding, utility functions, and test generation. The authentication layer, the data access policies, the credential management, and the cloud IAM configurations are written by hand, reviewed against our threat model, and never committed directly from an AI output without deliberate modification. CropMind runs on Google Cloud infrastructure with security controls that were designed as engineering priorities, not patched in after complaints.

My code review discipline looks like this: every AI-generated function that handles user data gets a checklist review before it reaches the main branch — input validation, authorization check, logging, and error handling. Not because I distrust the AI, but because I know what it was trained on and I know what it doesn't know about our system. Code review is not a quality gate in my workflow. It is a security gate.

The Stanford finding that AI users are more likely to believe they wrote secure code is the most dangerous dynamic in this entire conversation. Competence and confidence need to track together. When AI produces clean, readable, functional code, it creates a psychological signal that the code is good — including secure. That signal is unreliable. The best engineers I know who use AI tools heavily are the ones who have internalized this gap and review AI code with more scrutiny than human code, not less.

Vibe coding will mature. The security tooling will improve. Better training data, better context injection, better security-aware prompting strategies will all raise the baseline. But the teams that survive this current era — the one where 35 CVEs are being attributed to AI-generated code per month and the real number is 5 to 10 times higher — are those who treat AI code as a starting point, not an endpoint. MD Bazlur Rahman Likhon's engineering philosophy, whether in CropMind or any other production system, begins and ends with that principle.


❓ Frequently Asked Questions

Q1: Is vibe coding actually dangerous, or is this overhyped?

The data makes the answer clear: it is genuinely dangerous in its current form, particularly for security-sensitive application components. Escape.tech's scan of 5,600 apps finding 2,000+ vulnerabilities and 400+ exposed secrets is not a theoretical projection — it is a measurement of real, deployed, publicly accessible applications. The SUSVIBES benchmark's finding that over 80% of functionally correct AI-generated code fails security tests means the "it works" heuristic developers typically rely on to gauge code quality is structurally unreliable for vibe-coded output. The risk is not hypothetical. It is already in production. The question is whether a specific engineering team has built the processes to manage it.

Q2: Can I use vibe coding in an enterprise environment?

Yes — with governance structures that the tools themselves do not provide out of the box. The six categories outlined above (authentication, payments, sensitive data, access control, query builders, cryptography) should have a clear policy: either AI-assisted with mandatory senior security review, or explicitly off-limits for AI authorship entirely. Enterprise environments also need to address what security researchers call "shadow AI" — developers using personal AI coding tools on company repositories without organizational visibility, potentially exposing proprietary code and architecture details to third-party model providers. A formal AI-in-development policy that covers tooling approval, secret handling, and mandatory SAST gate requirements turns a liability into a managed risk.

Q3: What SAST tool should I use to scan AI-generated code?

There is no single perfect tool; the practical answer is a layered approach. Start with Semgrep as your base layer: it is open-source, runs fast in CI, and its rule-based approach is highly configurable to your stack and security policies. Add Snyk Code as a second layer for its semantic code graph. It catches cross-file vulnerability patterns that pattern-matching tools miss, particularly inter-function data flow issues. Pair it with Snyk Open Source or another software composition analysis tool to cover insecure dependency chains. If you are using CodeRabbit for AI-assisted code review already, its security annotations in the PR diff provide inline feedback before human review begins. For infrastructure-as-code produced by AI agents, Checkov or tfsec add a layer specifically targeting misconfigured cloud resources, a category Escape.tech found to be pervasive in vibe-coded applications.

Q4: How do I find exposed secrets in my vibe-coded codebase right now?

Run TruffleHog against your full repository history immediately, not just the current HEAD. The command trufflehog git file://. --json scans every commit in your local repository and emits one JSON record per finding, including the commit it came from. TruffleHog's credential verification tests whether each detected secret is still active, which tells you which remediations are most urgent. Once you have a clean history, install Gitleaks as a pre-commit hook to prevent new secrets from entering the codebase. Rotate any secret that TruffleHog confirms as active immediately, before removing it from the codebase. Deleting a secret from the latest commit leaves it in git history, so rotation is what actually stops a breach already in progress. GitHub's native secret scanning, if your repository is hosted there, provides a third independent signal with zero configuration overhead.[^34][^35][^36][^33]
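To put "rotate verified secrets first" into practice, here is a minimal sketch that reads TruffleHog's JSON output (one JSON object per line) and lists live credentials before unverified ones. The field names (`Verified`, `DetectorName`, and the git `file`/`commit` under `SourceMetadata.Data.Git`) are assumed to match TruffleHog v3 output. Check them against a sample of your own output before relying on this. The `load_findings` and `triage` helpers are hypothetical.

```python
import json


def load_findings(lines):
    """Parse TruffleHog JSON-lines output, skipping blank or non-JSON lines."""
    findings = []
    for line in lines:
        line = line.strip()
        if line.startswith("{"):
            findings.append(json.loads(line))
    return findings


def triage(findings):
    """Return (verified, detector, location) tuples, verified (live) secrets first."""
    def location(f):
        git = f.get("SourceMetadata", {}).get("Data", {}).get("Git", {})
        return f"{git.get('file', '?')}@{str(git.get('commit', '?'))[:8]}"

    # False sorts before True, so verified findings come first; the sort is stable
    ordered = sorted(findings, key=lambda f: not f.get("Verified", False))
    return [(f.get("Verified", False), f.get("DetectorName", "unknown"), location(f))
            for f in ordered]


sample = [
    '{"DetectorName": "Github", "Verified": false, "SourceMetadata": '
    '{"Data": {"Git": {"file": "a.py", "commit": "abcdef1234567"}}}}',
    '{"DetectorName": "AWS", "Verified": true, "SourceMetadata": '
    '{"Data": {"Git": {"file": "config.js", "commit": "0123456789ab"}}}}',
]
for live, detector, where in triage(load_findings(sample)):
    print("ROTATE NOW" if live else "review    ", detector, where)
```

In practice, you would redirect `trufflehog git file://. --json` into a file and pass that file's lines to `load_findings()`. Everything marked verified goes to the top of the rotation list.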

Q5: Is AI-generated code less secure than human code?

Measurably, yes, at least with current models and current usage patterns. The CodeRabbit study found up to 2.74 times more security issues in AI-assisted pull requests than in human-authored ones. The Stanford research found that AI assistant users produced code with security vulnerabilities 40% of the time on security-sensitive tasks, compared to lower rates in the control group. The Veracode study found a 45% failure rate across 100+ models. This is not a condemnation of AI as a tool. It is a statement about the current gap between what AI optimizes for (functional correctness) and what production security requires (adversarial correctness). That gap is where human engineering judgment is irreplaceable.

Q6: What's the single most dangerous thing about vibe coding?

The most dangerous thing is not any specific vulnerability class. It is the false confidence problem identified in the Stanford research: developers who use AI coding assistants produce less secure code and believe they have produced more secure code. Combined with the SUSVIBES finding that code can be functionally correct and critically insecure simultaneously, this creates a scenario where an application passes all its tests, works perfectly in staging, launches successfully — and contains authentication bypasses and exposed secrets that an attacker will find in hours. The app worked. The team felt confident. Fifty thousand users were compromised. That is the actual danger of the current vibe coding era: not incompetence, but the invisible gap between "it works" and "it's safe."


📞 Work with MD Bazlur Rahman Likhon

If you are building AI-powered applications and need a security architecture review before launch — or want to build on a foundation that is production-grade from day one — MD Bazlur Rahman Likhon is available for consultation.

Whether you need to audit an existing vibe-coded codebase for the six vulnerability classes outlined above, establish a secure AI development framework for your engineering team, or design a multi-agent AI pipeline with security controls that match the stakes of production deployment, the approach is the same one used in systems like CropMind: AI-assisted velocity, human-owned security posture.

Book a free discovery call

Likhon - Gen AI Specialist

Senior Cloud and AI Engineer

Generative AI expert with 6+ years of experience and 300+ certifications. Building LLM and RAG systems and multi-cloud AI solutions.