How does an AI code generator ship Unbounded Consumption?

Tutorials and code generators produce the happy-path version: accept input, call the model, return the response. Limits, quotas, and rate checks get treated as 'production concerns' to be added later. Later often means 'after the bill arrives.'

How do attackers find Unbounded Consumption bugs?

They look for unauthenticated or weakly-authenticated endpoints that hit a model, measure the per-request cost with a single probe, then parallelize. No bypass required — the endpoint is doing exactly what it was built to do, just faster and more often than you expected.

How does Flowpatrol detect Unbounded Consumption?

Flowpatrol sends progressively larger inputs and higher request rates against every LLM-backed endpoint we find, measuring when (or whether) the server starts saying no. If we can keep sending requests or keep sending large prompts without pushback, we report the exact threshold.
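To make the idea of "measuring when the server starts saying no" concrete, here is a minimal sketch of an escalating input-size probe. It is an illustration only, not Flowpatrol's implementation: the function name `probeInputCeiling`, the doubling schedule, and the injected `send` callback (which is assumed to post a prompt of the given length and return the HTTP status) are all hypothetical.

```typescript
// Hypothetical callback: sends a prompt of `chars` characters to the endpoint
// under test and resolves with the HTTP status code it received.
type Send = (chars: number) => Promise<number>;

interface ProbeResult {
  largestAccepted: number | null; // biggest size that got a 2xx response
  firstRejected: number | null;   // first size that got a non-2xx, or null if none did
}

// Doubles the prompt size until the server refuses or maxChars is passed.
async function probeInputCeiling(
  send: Send,
  startChars: number = 1000,
  maxChars: number = 1000000,
): Promise<ProbeResult> {
  let largestAccepted: number | null = null;
  for (let chars = startChars; chars <= maxChars; chars *= 2) {
    const status = await send(chars);
    if (status < 200 || status >= 300) {
      // The server pushed back (for example 413 or 429): this is the threshold.
      return { largestAccepted, firstRejected: chars };
    }
    largestAccepted = chars;
  }
  // No pushback at any tested size: the input is effectively unbounded.
  return { largestAccepted, firstRejected: null };
}
```

A result with `firstRejected === null` is the finding worth reporting: every size up to the probe's ceiling was accepted. The same loop shape works for request rate by escalating requests per second instead of prompt length.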

Unbounded Consumption — LLM10 in OWASP Top 10 for LLM Applications

Every LLM call costs real money. Most LLM features accept anonymous input, forward it to a paid API, and return the result. That is not a chatbot. That is a wallet with a text box on top — and the internet can type.

Unbounded Consumption is the LLM version of a denial-of-service bug, except the resource being exhausted is your credit card. Without caps on input size, output size, and request rate, any public LLM feature is a pay-per-token pipe that anyone on the internet can open.

What your AI actually built

You wanted a public demo of your AI feature, so you skipped the signup wall. Visitors type a message, your server forwards it to Claude or GPT, the reply comes back. It was supposed to be a taste. It works as advertised.

Nothing on the path limits how long the prompt can be, how many requests a single IP can send, or how many tokens any one response can burn. The upstream model has a 200k-token context window, and your bill scales with how much of it each request fills.

This is a classic denial-of-wallet bug. Attackers do not need a vulnerability; they just need your endpoint and a for-loop. The model is happy to process 200k-token prompts forever. You are the one paying for it.

How it gets exploited

A public 'try our AI' page with no account required. Each request is forwarded straight to a paid LLM API.

1. Find the endpoint
An attacker opens the network tab and sees POST /api/chat returning model output. No auth header, no CAPTCHA, no rate limit in sight.

2. Measure the cost
They send one big prompt, 100k tokens of lorem ipsum, and the server happily forwards it. The response takes 40 seconds and the bill meter ticks.

3. Parallelize
A ten-line script opens 200 concurrent connections, each sending a new 100k-token prompt. The server fans them all out to the upstream API.

4. Let it run overnight
Eight hours later, your OpenAI dashboard shows $42,300 in usage. The attacker paid nothing. Your autopay succeeded.

A demo feature burned through a month of runway in a single night. No data was stolen — the damage was the invoice.
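The arithmetic behind a bill like that can be sketched in a few lines. This is a rough back-of-the-envelope estimate, not the calculation behind the figure above: the function and field names are hypothetical, the price of $3 per million input tokens is an assumed illustrative rate, and the estimate ignores output tokens and any rate limit the upstream provider applies to your account.

```typescript
interface AttackShape {
  concurrency: number;       // parallel connections held open by the script
  secondsPerRequest: number; // observed latency of one large request
  durationHours: number;     // how long the script runs
  tokensPerRequest: number;  // input tokens sent per request
}

// Estimates input-token spend only, assuming every connection sends
// requests back to back for the whole duration.
function estimateInputSpendUsd(a: AttackShape, usdPerMillionInputTokens: number): number {
  const requestsPerConnection = Math.floor((a.durationHours * 3600) / a.secondsPerRequest);
  const totalTokens = a.concurrency * requestsPerConnection * a.tokensPerRequest;
  return (totalTokens / 1000000) * usdPerMillionInputTokens;
}

// The walkthrough's shape: 200 connections, 40 s per request, 8 hours, 100k tokens each.
const overnight = estimateInputSpendUsd(
  { concurrency: 200, secondsPerRequest: 40, durationHours: 8, tokensPerRequest: 100000 },
  3, // assumed price in USD per million input tokens
);
console.log(overnight); // 43200
```

Under those assumptions the script issues 144,000 requests and 14.4 billion input tokens, which lands in the same five-figure range as the scenario. Every factor in the product is something the attacker controls unless the server caps it.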

Vulnerable vs Fixed

Vulnerable — forward anything, pay for everything

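// app/api/chat/route.ts
import Anthropic from '@anthropic-ai/sdk';

// Reads ANTHROPIC_API_KEY from the environment.
const anthropic = new Anthropic();

export async function POST(req: Request) {
  // No auth, no rate limit, no length check: the caller controls the prompt size.
  const { message } = await req.json();

  const reply = await anthropic.messages.create({
    model: 'claude-3-5-sonnet',
    max_tokens: 4096, // generous output ceiling, paid for on every call
    messages: [{ role: 'user', content: message }],
  });

  return Response.json({ reply });
}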

Fixed — bounded input, bounded output, bounded rate

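// app/api/chat/route.ts
import Anthropic from '@anthropic-ai/sdk';
// Project-local helper; its implementation is not shown here.
import { rateLimit } from '~/lib/rate-limit';

const anthropic = new Anthropic(); // reads ANTHROPIC_API_KEY from the environment
const MAX_INPUT_CHARS = 4000;

export async function POST(req: Request) {
  // x-forwarded-for can hold a comma-separated chain; use the first entry.
  // Only trust this header when a proxy you control sets it.
  const ip = req.headers.get('x-forwarded-for')?.split(',')[0].trim() || 'unknown';
  const ok = await rateLimit.check(ip, { max: 20, window: '1h' });
  if (!ok) return new Response('Too many requests', { status: 429 });

  // Treat malformed JSON as a client error instead of letting it throw.
  const body = await req.json().catch(() => null);
  const message = body?.message;
  if (typeof message !== 'string') {
    return new Response('Invalid message', { status: 400 });
  }
  if (message.length > MAX_INPUT_CHARS) {
    return new Response('Message too long', { status: 413 });
  }

  const reply = await anthropic.messages.create({
    model: 'claude-3-5-sonnet',
    max_tokens: 512, // hard ceiling on output tokens per request
    messages: [{ role: 'user', content: message }],
  });

  return Response.json({ reply });
}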

Three caps, all boring, all essential. A ceiling on input length, a ceiling on output tokens, and a per-IP rate limit. None of these make the feature worse for real users — they just stop the feature from being a free backend for everyone else.
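The fixed route imports a `rateLimit` helper without showing it. As one possible shape for such a helper, here is a minimal fixed-window limiter kept in process memory. It is a sketch under stated assumptions, not the helper from the example: the class name `FixedWindowLimiter`, the `windowMs` option (milliseconds instead of the `'1h'` string used above), and the injectable clock are all hypothetical choices made to keep it small and testable.

```typescript
interface WindowEntry {
  count: number;       // requests seen in the current window
  windowStart: number; // timestamp (ms) when the current window opened
}

class FixedWindowLimiter {
  private hits = new Map<string, WindowEntry>();
  private now: () => number;

  // The clock is injectable so the limiter can be tested without waiting.
  constructor(now: () => number = () => Date.now()) {
    this.now = now;
  }

  // Returns true if the request is allowed, false if the key is over its limit.
  check(key: string, opts: { max: number; windowMs: number }): boolean {
    const t = this.now();
    const entry = this.hits.get(key);
    if (!entry || t - entry.windowStart >= opts.windowMs) {
      // First request for this key, or the previous window has expired.
      this.hits.set(key, { count: 1, windowStart: t });
      return true;
    }
    if (entry.count >= opts.max) return false;
    entry.count += 1;
    return true;
  }
}

// Usage: 20 requests per hour per client key.
const limiter = new FixedWindowLimiter();
const allowed = limiter.check('203.0.113.7', { max: 20, windowMs: 60 * 60 * 1000 });
```

An in-memory map only counts requests seen by one process, so on serverless or multi-instance deployments each instance would enforce its own separate limit. A shared store is the usual answer there; the check logic stays the same.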

A real case

A public AI demo racked up a five-figure bill overnight

A small team shipped an unauthenticated chat endpoint for their launch. By morning, a single scripted attacker had burned through tens of thousands in inference costs — no data stolen, just the invoice.

Find out how much your chatbot costs per attacker.

Flowpatrol probes every LLM endpoint for missing rate limits and unbounded inputs. Five minutes. One URL.
