RAG feels safe because it is 'just search.' But a vector store doesn't know who's asking. It returns the most similar chunks — and if chunks from every customer live in the same index, the most similar chunk might belong to someone else.
Vector and Embedding Weaknesses are the RAG-specific flavour of Broken Access Control. A shared vector store returns the nearest chunks regardless of who uploaded them. Without a hard tenant filter, similarity becomes authorization — and similarity does not care about ownership.
What your AI actually built
You built a RAG bot that ingests PDFs, emails, and notes and answers questions about them. The first version used one Pinecone index, one namespace, one embedding model. You uploaded a few test docs and it worked beautifully.
Then a second customer onboarded. Their docs went into the same index. The bot still works — until someone asks a question that happens to match the other customer's content more closely than their own. The retriever returns it. The model cites it. The user reads it.
Embeddings are a similarity function, not an authorization system. Without a hard filter by tenant on every query, 'cosine distance' becomes 'leak the nearest document regardless of who owns it.'
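To make that concrete, here is a minimal in-memory sketch of what a nearest-neighbour lookup does. Everything in it is illustrative: the `Chunk` type, the `nearest` helper, the tenant names, and the three-number vectors are hand-made stand-ins for a real vector store and real embeddings, not anyone's production code.

```typescript
// Hypothetical in-memory chunk store. Vectors are tiny hand-made stand-ins for embeddings.
type Chunk = { id: string; tenantId: string; text: string; vector: number[] };

// Plain cosine similarity between two equal-length vectors.
function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Return the topK most similar chunks. Ownership is only considered if tenantId is passed.
function nearest(chunks: Chunk[], query: number[], topK: number, tenantId?: string): Chunk[] {
  return chunks
    .filter((c) => tenantId === undefined || c.tenantId === tenantId)
    .map((c) => ({ chunk: c, score: cosine(c.vector, query) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, topK)
    .map((x) => x.chunk);
}

const chunks: Chunk[] = [
  { id: "a-1", tenantId: "tenant-a", text: "Board deck: Q4 revenue targets", vector: [0.9, 0.1, 0.0] },
  { id: "b-1", tenantId: "tenant-b", text: "Office wifi instructions", vector: [0.0, 0.2, 0.9] },
];

// Tenant B asks about revenue. The question vector sits close to tenant A's chunk.
const questionVector = [1.0, 0.0, 0.1];

const unscoped = nearest(chunks, questionVector, 1); // picks tenant A's chunk
const scoped = nearest(chunks, questionVector, 1, "tenant-b"); // can only pick tenant B's chunk
```

The only thing that changes between the two calls is the tenant argument. The similarity math is identical, which is the point: ranking never asks who owns a chunk.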
How it gets exploited
The target: a multi-tenant RAG assistant where every customer uploads their own documents into a shared vector index.
- Step 1: Sign up as tenant B. The attacker registers a new workspace on the free tier. They upload one innocuous doc so the app considers them a real tenant.
- Step 2: Ask a generic question. They ask the bot something broad: 'What are our Q4 revenue targets?' Their own workspace has nothing on that, but a board deck uploaded by another customer does.
- Step 3: Watch the citations. The retriever pulls chunks from the other tenant's documents. The model summarizes them and cites document IDs from a workspace the attacker does not belong to.
- Step 4: Probe for more. They iterate on questions. Salary bands, customer lists, internal roadmaps: anything their free-tier neighbour uploaded is now one embedding away.
A single shared index turned into a free-tier espionage tool. Every tenant was reading every other tenant's documents through polite, well-formatted answers.
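The same steps can be turned into a regression check you run against your own retriever. The sketch below is one possible shape, under stated assumptions: `Retriever`, `findCrossTenantLeaks`, the canary strings, and the keyword matcher standing in for vector search are all hypothetical names invented for this example.

```typescript
// Hypothetical shapes for this sketch. A real retriever would be async and call a vector store.
type Retrieved = { id: string; tenantId: string; text: string };
type Retriever = (question: string, tenantId: string) => Retrieved[];

// Ask every probe question as one tenant and collect any chunk owned by someone else.
function findCrossTenantLeaks(retrieve: Retriever, asTenant: string, probes: string[]): Retrieved[] {
  const leaks: Retrieved[] = [];
  for (const question of probes) {
    for (const hit of retrieve(question, asTenant)) {
      if (hit.tenantId !== asTenant) leaks.push(hit);
    }
  }
  return leaks;
}

// Two seeded workspaces, each holding one document with a recognisable canary string.
const corpus: Retrieved[] = [
  { id: "a-1", tenantId: "tenant-a", text: "CANARY-A board deck revenue targets" },
  { id: "b-1", tenantId: "tenant-b", text: "CANARY-B office wifi instructions" },
];

// Crude keyword overlap stands in for similarity search in this sketch.
const overlaps = (question: string, text: string): boolean =>
  question
    .toLowerCase()
    .split(/\s+/)
    .some((word) => text.toLowerCase().includes(word));

// Retriever without a tenant filter (the vulnerable shape).
const leakyRetriever: Retriever = (question) => corpus.filter((c) => overlaps(question, c.text));

// Retriever that scopes to the caller's tenant before matching.
const scopedRetriever: Retriever = (question, tenantId) =>
  corpus.filter((c) => c.tenantId === tenantId && overlaps(question, c.text));

const probes = ["revenue targets", "board deck"];
const leaksFromLeaky = findCrossTenantLeaks(leakyRetriever, "tenant-b", probes);
const leaksFromScoped = findCrossTenantLeaks(scopedRetriever, "tenant-b", probes);
```

A non-empty leak list for any probe means one tenant can read another's chunks. In a real setup the probes would be phrased to match the other workspace's seeded canary document.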
Vulnerable vs Fixed
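Vulnerable:

// rag/query.ts
// embed, index, and llm are the app's own helpers (not shown).
export async function answer(question: string) {
  const embedding = await embed(question);
  const results = await index.query({
    vector: embedding,
    topK: 5,
    includeMetadata: true,
    // No filter: every tenant's chunks are in the pool.
  });
  // Assumes each chunk's text was stored under metadata.text at ingestion.
  const context = results.matches.map((m) => m.metadata?.text ?? '').join('\n');
  return llm.chat({
    messages: [
      { role: 'system', content: 'Answer using the provided context.' },
      { role: 'user', content: question + '\n\n' + context },
    ],
  });
}

Fixed:

// rag/query.ts
// embed, index, and llm are the app's own helpers (not shown).
export async function answer(question: string, ctx: { tenantId: string }) {
  const embedding = await embed(question);
  const results = await index.query({
    vector: embedding,
    topK: 5,
    includeMetadata: true,
    filter: { tenantId: ctx.tenantId }, // hard scope on every query
  });
  // Assumes each chunk was stored with metadata.tenantId and metadata.text at ingestion.
  const context = results.matches.map((m) => m.metadata?.text ?? '').join('\n');
  return llm.chat({
    messages: [
      { role: 'system', content: 'Answer using the provided context.' },
      { role: 'user', content: question + '\n\n' + context },
    ],
  });
}

One filter, enforced on every retrieval. The tenantId must come from the authenticated session, never from the request body or the prompt. Namespaces per tenant are stronger still: as long as nothing is stored in the default namespace, a query that forgets its tenant lands in an empty slot instead of everyone's data, so the mistake fails closed instead of leaking. Either way, the rule is: no similarity query without an identity attached.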
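One way to make that rule hard to forget is to route every retrieval through two small guards: one that refuses to build a query without a tenant, and one that re-checks ownership on whatever comes back. This is a sketch, not a definitive implementation. `tenantFilter`, `verifyOwnership`, and the `Match` shape are hypothetical names for this example, and it assumes every chunk was stored with a `tenantId` metadata field at ingestion.

```typescript
// Assumed result shape: each match carries the metadata written at ingestion time.
type Match = { id: string; metadata?: { tenantId?: string; text?: string } };

// Guard 1: build the filter, or throw. A missing tenant never becomes an unfiltered query.
function tenantFilter(ctx: { tenantId?: string }): { tenantId: string } {
  if (!ctx.tenantId || ctx.tenantId.trim() === "") {
    throw new Error("refusing to query: no tenantId on the request context");
  }
  return { tenantId: ctx.tenantId };
}

// Guard 2: defence in depth. Even with a filter in place, reject any match
// whose stored owner differs from the caller, or that has no owner at all.
function verifyOwnership(matches: Match[], tenantId: string): Match[] {
  for (const match of matches) {
    if (match.metadata?.tenantId !== tenantId) {
      throw new Error(`cross-tenant chunk ${match.id} returned for ${tenantId}`);
    }
  }
  return matches;
}

// Intended use inside answer(), shown as comments because embed/index/llm are app-specific:
//   const filter = tenantFilter(ctx);
//   const results = await index.query({ vector, topK: 5, filter, includeMetadata: true });
//   const safeMatches = verifyOwnership(results.matches, ctx.tenantId);

const exampleFilter = tenantFilter({ tenantId: "tenant-b" });
const exampleMatches = verifyOwnership(
  [{ id: "b-1", metadata: { tenantId: "tenant-b", text: "Office wifi instructions" } }],
  "tenant-b",
);
```

Throwing on a mismatch is a deliberate choice here: a foreign chunk in the results means the scoping is broken somewhere upstream, and that is worth an alert rather than a silent drop.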
A real case
A shared RAG index leaked one tenant's docs to another
A multi-tenant assistant with a single vector namespace returned a competitor's board deck to a free-tier user who simply asked about revenue targets.