

LLM10 · CWE-400 · CWE-770 · CWE-730

The 'I woke up to a $40,000 OpenAI bill' bug
Unbounded Consumption

The bug where your chatbot turns into somebody else's free inference endpoint.

Affects almost every public LLM feature that ships without request limits.

Reference: OWASP Top 10 for LLM Applications (2025) — LLM10 · Last updated April 7, 2026 · By Flowpatrol Team

Every LLM call costs real money. Most LLM features accept anonymous input, forward it to a paid API, and return the result. That is not a chatbot. That is a wallet with a text box on top — and the internet can type.

Unbounded Consumption is the LLM version of a denial-of-service bug, except the resource being exhausted is your credit card. Without caps on input size, output size, and request rate, any public LLM feature is a pay-per-token pipe that anyone on the internet can open.

What your AI actually built

You wanted a public demo of your AI feature, so you skipped the signup wall. Visitors type a message, your server forwards it to Claude or GPT, and the reply comes back. It was supposed to be a quick taste of the product, and it works exactly as advertised.

Nothing on the path limits how long the prompt can be, how many requests a single IP can send, or how many tokens any one response can burn. The upstream model has a 200k context window, and your bill scales with it.

This is a classic denial-of-wallet bug. Attackers do not need a vulnerability — they just need your endpoint and a for-loop. The model is happy to process 200k token prompts forever. You are the one paying for it.

How it gets exploited

A public 'try our AI' page with no account required. Each request is forwarded straight to a paid LLM API.

  1. Find the endpoint
     An attacker opens the network tab and sees POST /api/chat returning model output. No auth header, no CAPTCHA, no rate limit in sight.
  2. Measure the cost
     They send one big prompt — 100k tokens of lorem ipsum — and the server happily forwards it. The response takes 40 seconds and the bill meter ticks.
  3. Parallelize
     A ten-line script opens 200 concurrent connections, each sending a new 100k-token prompt. The server fans them all out to the upstream API.
  4. Let it run overnight
     Eight hours later, your OpenAI dashboard shows $42,300 in usage. The attacker paid nothing. Your autopay succeeded.

A demo feature burned through a month of runway in a single night. No data was stolen — the damage was the invoice.
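Those numbers are easy to sanity-check. Below is a back-of-envelope sketch of the cost model; the $3-per-million-input-tokens price (roughly Claude 3.5 Sonnet's list input price) and the `overnightBill` helper are assumptions for illustration, not part of any real billing API:

```typescript
// Back-of-envelope cost model for the overnight run described above.
// Assumed price: ~$3 per million input tokens.
const PRICE_PER_MILLION_INPUT_TOKENS = 3; // USD, assumption

function overnightBill(
  concurrentConnections: number,
  secondsPerRequest: number,
  hours: number,
  tokensPerRequest: number,
): number {
  // Each connection completes one request every `secondsPerRequest` seconds.
  const requestsPerConnection = (hours * 3600) / secondsPerRequest;
  const totalTokens =
    concurrentConnections * requestsPerConnection * tokensPerRequest;
  return (totalTokens / 1_000_000) * PRICE_PER_MILLION_INPUT_TOKENS;
}

// 200 connections, 40 s per 100k-token request, running for 8 hours:
console.log(overnightBill(200, 40, 8, 100_000)); // 43200
```

At these assumed prices the model lands on $43,200, within rounding of the $42,300 dashboard figure above, and that is counting input tokens only.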

Vulnerable vs Fixed

Vulnerable — forward anything, pay for everything

// app/api/chat/route.ts
export async function POST(req: Request) {
  const { message } = await req.json();

  const reply = await anthropic.messages.create({
    model: 'claude-3-5-sonnet-latest',
    max_tokens: 4096,
    messages: [{ role: 'user', content: message }],
  });

  return Response.json({ reply });
}

Fixed — bounded input, bounded output, bounded rate

// app/api/chat/route.ts
import { rateLimit } from '~/lib/rate-limit';

const MAX_INPUT_CHARS = 4000;

export async function POST(req: Request) {
  const ip = req.headers.get('x-forwarded-for') ?? 'unknown';
  const ok = await rateLimit.check(ip, { max: 20, window: '1h' });
  if (!ok) return new Response('Too many requests', { status: 429 });

  const { message } = await req.json();
  if (typeof message !== 'string' || message.length > MAX_INPUT_CHARS) {
    return new Response('Message too long', { status: 413 });
  }

  const reply = await anthropic.messages.create({
    model: 'claude-3-5-sonnet-latest',
    max_tokens: 512,
    messages: [{ role: 'user', content: message }],
  });

  return Response.json({ reply });
}

Three caps, all boring, all essential. A ceiling on input length, a ceiling on output tokens, and a per-IP rate limit. None of these make the feature worse for real users — they just stop the feature from being a free backend for everyone else.
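The `rateLimit` helper imported in the fixed handler is easy to gloss over, so here is a minimal in-memory sketch matching the `check(ip, { max, window })` call shape used above. The module, its window-string format, and the fixed-window strategy are all assumptions for illustration; a production limiter would typically keep its counters in Redis so limits survive restarts and apply across server instances.

```typescript
// ~/lib/rate-limit (hypothetical) — fixed-window, in-memory rate limiter.
type Window = `${number}h` | `${number}m` | `${number}s`;

const UNIT_MS: Record<string, number> = { s: 1_000, m: 60_000, h: 3_600_000 };

function windowMs(window: Window): number {
  // '1h' -> 3_600_000, '30m' -> 1_800_000, '10s' -> 10_000
  return Number(window.slice(0, -1)) * UNIT_MS[window.slice(-1)];
}

const buckets = new Map<string, { count: number; resetAt: number }>();

const rateLimit = {
  async check(key: string, opts: { max: number; window: Window }): Promise<boolean> {
    const now = Date.now();
    const bucket = buckets.get(key);
    if (!bucket || now >= bucket.resetAt) {
      // First request in a fresh window: start a new bucket.
      buckets.set(key, { count: 1, resetAt: now + windowMs(opts.window) });
      return true;
    }
    bucket.count += 1;
    return bucket.count <= opts.max; // false once the cap is exceeded
  },
};
```

A fixed window is the bluntest possible strategy (a burst can straddle two windows), but even this version turns an unbounded endpoint into a bounded one.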

A real case

A public AI demo racked up a five-figure bill overnight

A small team shipped an unauthenticated chat endpoint for their launch. By morning, a single scripted attacker had burned through tens of thousands of dollars in inference costs — no data was stolen, just the invoice.

Related reading

  • Glossary: Brute Force Protection (Rate Limiting)

References

  • LLM10: Unbounded Consumption — official OWASP entry
  • OWASP Top 10 for LLM Applications (2025) — full list
  • CWE-400 on cwe.mitre.org
  • CWE-770 on cwe.mitre.org
  • CWE-730 on cwe.mitre.org

Find out how much your chatbot costs per attacker.

Flowpatrol probes every LLM endpoint for missing rate limits and unbounded inputs. Five minutes. One URL.

Try it free