Every LLM call costs real money. Most LLM features accept anonymous input, forward it to a paid API, and return the result. That is not a chatbot. That is a wallet with a text box on top — and the internet can type.
Unbounded Consumption is the LLM version of a denial-of-service bug, except the resource being exhausted is your credit card. Without caps on input size, output size, and request rate, any public LLM feature is a pay-per-token pipe that anyone on the internet can open.
What your AI actually built
You wanted a public demo of your AI feature, so you skipped the signup wall. Visitors type a message, your server forwards it to Claude or GPT, and the reply comes back. It was meant to be a quick taste, and it works as advertised.
Nothing on the path limits how long the prompt can be, how many requests a single IP can send, or how many tokens any one response can burn. The upstream model has a 200k context window, and your bill scales with it.
This is a classic denial-of-wallet bug. Attackers do not need a vulnerability — they just need your endpoint and a for-loop. The model is happy to process 200k token prompts forever. You are the one paying for it.
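Back-of-the-envelope arithmetic makes the economics concrete. The per-token prices below are purely illustrative placeholders, not any provider's real rates:

```python
# Hypothetical pricing -- illustrative numbers only, not a vendor's actual rates.
INPUT_PRICE_PER_MTOK = 3.00    # dollars per million input tokens (assumption)
OUTPUT_PRICE_PER_MTOK = 15.00  # dollars per million output tokens (assumption)

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one LLM call under the illustrative pricing above."""
    return (input_tokens * INPUT_PRICE_PER_MTOK
            + output_tokens * OUTPUT_PRICE_PER_MTOK) / 1_000_000

# One maxed-out request: a 200k-token prompt and an 8k-token reply.
worst_case = request_cost(200_000, 8_000)

# The attacker's for-loop: 10,000 such requests against your endpoint.
print(f"one request: ${worst_case:.2f}, 10k requests: ${worst_case * 10_000:,.2f}")
# -> one request: $0.72, 10k requests: $7,200.00
```

Even at modest per-token rates, an unauthenticated loop turns fractions of a cent per token into a four-figure bill in an afternoon.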
How it gets exploited
The setup: a public 'try our AI' page with no account required, where each request is forwarded straight to a paid LLM API.