User-Level Access Control Architectures for Secure Multi-Tenant RAG Systems

December 12, 2025

Executive Summary

RAG systems moving into production face serious data security issues because vector search ignores document ownership and access control.
The core challenge: ensure User A cannot retrieve User B’s content in semantic search, while still scaling to thousands–millions of users.
The document compares physical isolation (database-per-user) and logical isolation (RLS, payload filters, partitions, tenants) across PostgreSQL/pgvector, Qdrant, Milvus, and Weaviate, and covers orchestration-layer and authorization patterns.

Decisions

Decision: User-level security in RAG must be enforced before retrieval (pre-filtered search); post-filtering is considered unsafe for multi-user/multi-tenant scenarios.
Decision: Architectural choices depend on scale and compliance:
- PostgreSQL + pgvector + Row-Level Security for strict compliance and moderate scale.
- Qdrant with payload filtering and tiered multi-tenancy for high-performance, large B2C workloads.
- Milvus partition keys and Weaviate tenants for massive-scale, multi-tenant environments.
Decision: User identity and authorization must be propagated end-to-end (from API to retriever to vector DB); “global retriever” patterns are rejected as insecure.

Action Items

Owner? — Choose an access-control architecture (physical vs. logical isolation) appropriate to expected user count, compliance needs, and performance targets.
Owner? — If using PostgreSQL/pgvector:
- Design tables with explicit owner/user_id columns.
- Enable Row-Level Security and define policies tied to session context (e.g., app.current_user_id or auth.uid()).
Owner? — If using Qdrant:
- Store user/tenant metadata in payloads.
- Create payload indices (e.g., on user_id) and enforce filters in all search queries.
Owner? — If using Milvus:
- Define user_id (or similar) as a partition key in the schema.
- Ensure queries always include expressions like user_id == '<user>'.
Owner? — If using Weaviate:
- Enable multi-tenancy and create tenants per user/tenant.
- Ensure all queries specify .with_tenant("<tenant_id>").
Owner? — Update LangChain/LlamaIndex-based services to:
- Avoid global, context-free retrievers.
- Use runtime-configurable retrievers (e.g., configurable_fields) to inject per-user filters at request time.
Owner? — For complex enterprise permissions, integrate an external authorization system (e.g., OpenFGA, Permit.io) and adopt a scalable pattern (e.g., group-based metadata denormalization).

Open Questions

How many users and documents are expected, and what are the concrete latency/SLA targets? (Determines whether PostgreSQL is sufficient or if Qdrant/Milvus/Weaviate are required.)
What are the regulatory/compliance requirements (e.g., SOC2, HIPAA, GDPR) that might favor engine-enforced RLS vs. app-enforced filters?
Will the system primarily need simple ownership checks (user_id == owner) or complex, relationship-based permissions (teams, roles, folders, document labels)?
How will “global” or shared documents (e.g., public knowledge base) be modeled across isolation schemes (especially in database-per-user architectures)?

Main Ideas

1. Security Paradox in Vector Search

Traditional databases: access control is a deterministic pre-filter (ACLs, RBAC, RLS).
Vector search (approximate nearest neighbor (ANN) indexes such as HNSW) is agnostic to ownership; it optimizes only for semantic similarity.
Naive RAG:
- Single shared index.
- Queries like “financial projections” or “salary bands” can return other users’ confidential docs.
- Leads to data leakage and “context poisoning” in responses.
Multi-tenancy vs. user-level access:
- Multi-tenant (B2B): org-level isolation (e.g., Coca-Cola vs. Pepsi); usually fewer tenants, stronger isolation (separate DBs/containers).
- User-level (B2C/intra-org): identity-level isolation; many overlapping permissions, high user counts; heavy physical isolation per user does not scale easily.

2. Risk Types

Unauthorized retrieval (data leakage): user gets private documents they shouldn’t see.
Noisy neighbor: heavy use from one user degrades performance for others; considered a security/availability issue.

3. Access Control Models for RAG

RBAC: permissions by role, stored as metadata (e.g., visible_to_roles).
ABAC: permissions based on attributes (department, location, etc.), evaluated at query time.
ReBAC: permissions derived from relationships (owner, manager, group membership); powerful but harder to implement efficiently in vector search.
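A minimal sketch of how RBAC and ABAC reduce to per-document checks that a retriever can apply to the candidate set before similarity search. The field names (`visible_to_roles`, `department`, `location`) and the ABAC rule are hypothetical. ReBAC is omitted because evaluating relationships generally needs an external authorization service rather than a local predicate.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    doc_id: str
    visible_to_roles: set = field(default_factory=set)  # RBAC metadata
    department: str = ""                                 # ABAC attribute
    location: str = ""                                   # ABAC attribute


@dataclass
class User:
    user_id: str
    roles: set
    department: str
    location: str


def rbac_allows(user: User, doc: Doc) -> bool:
    # RBAC: access if any of the user's roles is listed on the document.
    return bool(user.roles & doc.visible_to_roles)


def abac_allows(user: User, doc: Doc) -> bool:
    # ABAC (hypothetical rule): same department AND same location,
    # evaluated from attributes at query time.
    return user.department == doc.department and user.location == doc.location


def visible(user: User, docs: list, rule) -> list:
    # Apply the rule to build the candidate set before any similarity search.
    return [d.doc_id for d in docs if rule(user, d)]


alice = User("alice", {"analyst"}, "finance", "EU")
docs = [
    Doc("forecast", {"analyst", "cfo"}, "finance", "EU"),
    Doc("salary-bands", {"hr"}, "hr", "EU"),
    Doc("us-budget", {"analyst"}, "finance", "US"),
]
```

With these three documents, `visible(alice, docs, rbac_allows)` yields `forecast` and `us-budget`, while the stricter ABAC rule yields only `forecast`.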

4. Pre- vs. Post-Retrieval Filtering

Pre-filtering / native filtered search:
- Restricts candidate set before similarity search.
- Required for safe user-level isolation.
Post-filtering:
- First finds nearest neighbors, then removes disallowed results.
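- Deemed unsafe for multi-user scenarios: unauthorized documents pass through the pipeline before removal, so any missed or buggy filter leaks them; and when a user owns only a small fraction of the index, the global top-k may contain few or none of their documents, so results come back incomplete or empty.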
Consensus: Pre-filtered search is mandatory for secure user-level access control.
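The difference is easy to reproduce on a toy corpus. This is a minimal brute-force sketch (no ANN index), with hypothetical users and two-dimensional embeddings chosen so that one user's documents dominate the global ranking.

```python
import math

# Toy corpus: (doc_id, owner, embedding). A real system would use an ANN index.
CORPUS = [
    ("b1", "bob", [1.0, 0.0]),
    ("b2", "bob", [0.9, 0.1]),
    ("b3", "bob", [0.8, 0.2]),
    ("a1", "alice", [0.1, 0.9]),
    ("a2", "alice", [0.0, 1.0]),
]


def cosine(u, v):
    dot = sum(x * y for x, y in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))


def top_k(candidates, query, k):
    return sorted(candidates, key=lambda d: cosine(d[2], query), reverse=True)[:k]


def pre_filtered_search(user, query, k):
    # Restrict the candidate set to the caller's documents BEFORE ranking.
    allowed = [d for d in CORPUS if d[1] == user]
    return [d[0] for d in top_k(allowed, query, k)]


def post_filtered_search(user, query, k):
    # Rank the whole corpus first, then drop disallowed hits. Other users'
    # documents are materialized before the filter runs.
    hits = top_k(CORPUS, query, k)
    return [d[0] for d in hits if d[1] == user]
```

For Alice with query `[1.0, 0.0]` and `k=2`, pre-filtering returns `["a1", "a2"]`, while post-filtering returns an empty list: Bob's documents filled the top-k and were discarded, even though Alice owns two documents.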

Architectural Patterns

5. Physical Isolation: Database-per-User

Each user gets a separate DB/schema (e.g., Neon serverless Postgres).
Workflow:
- On signup: provision new DB/branch.
- Store embeddings per user in that DB.
- Middleware routes user requests to the correct DB connection.
Pros:
- Strong isolation, minimal risk of cross-user leakage.
- Noisy neighbor problem is confined to each user’s DB.
- Easy deletion (drop user’s DB).
Cons:
- Operational complexity at large scale (migrations, backups, connection overhead).
- Handling shared/global documents is awkward (duplication or secondary DB).
Best suited for: High-value B2B, strong data sovereignty requirements; less ideal for millions of B2C users.
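A minimal sketch of the routing middleware, assuming in-memory SQLite databases as stand-ins for per-user Postgres databases or Neon branches. The class and method names are hypothetical; the point is that identity selects the connection, so there is no shared table to mis-filter.

```python
import sqlite3


class PerUserRouter:
    """Hypothetical middleware: one isolated database per user."""

    def __init__(self):
        self._dbs: dict = {}

    def provision(self, user_id: str) -> None:
        # On signup: create the user's database and schema.
        conn = sqlite3.connect(":memory:")
        conn.execute("CREATE TABLE docs (id TEXT PRIMARY KEY, body TEXT)")
        self._dbs[user_id] = conn

    def connection_for(self, user_id: str) -> sqlite3.Connection:
        # Route by authenticated identity; unknown users get no connection.
        try:
            return self._dbs[user_id]
        except KeyError:
            raise PermissionError(f"no database provisioned for {user_id!r}") from None

    def deprovision(self, user_id: str) -> None:
        # Deletion is a single operation: drop the user's whole database.
        self._dbs.pop(user_id).close()
```

A row Alice inserts through `connection_for("alice")` is simply absent from Bob's database, and `deprovision("alice")` removes all of her data at once. The operational cost noted above (migrations, backups, connections) grows with the number of entries in `_dbs`.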

6. Logical Isolation with Shared Infrastructure

6.1 PostgreSQL + pgvector + Row-Level Security

Combines Postgres's mature security model with vector search.
Key elements:
- Table includes user_id (owner).
- Enable RLS and define policies restricting access by session context (e.g., current_setting('app.current_user_id')).
- Application sets the session variable based on authenticated user before running vector query.
Benefits:
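- Engine-enforced; protects even queries that omit an explicit WHERE user_id = ... clause, provided the application connects as a role that does not bypass RLS (superusers, roles with BYPASSRLS, and, unless FORCE ROW LEVEL SECURITY is set, the table owner).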
- Aligns with compliance-heavy environments.
Integrations:
- Supabase exposes the user from the request's auth token (JWT) to RLS policies via auth.uid().
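A minimal sketch of the RLS setup, written as Python that holds the SQL and builds the per-request statements. It assumes a psycopg-style driver (`%s` placeholders), pgvector's `<=>` cosine-distance operator, and hypothetical names (`documents`, `app.current_user_id`, 1536 dimensions). `FORCE ROW LEVEL SECURITY` is included because a table's owner otherwise bypasses its own policies.

```python
# DDL run once by a migration role. Table, column, and setting names are hypothetical.
RLS_DDL = """
CREATE EXTENSION IF NOT EXISTS vector;
CREATE TABLE documents (
    id        bigserial PRIMARY KEY,
    user_id   text NOT NULL,
    content   text NOT NULL,
    embedding vector(1536) NOT NULL
);
ALTER TABLE documents ENABLE ROW LEVEL SECURITY;
ALTER TABLE documents FORCE ROW LEVEL SECURITY;
-- missing_ok = true: an unset variable returns NULL instead of raising,
-- and user_id = NULL never matches.
CREATE POLICY documents_owner ON documents
    USING (user_id = current_setting('app.current_user_id', true));
"""


def scoped_search(user_id: str, query_embedding: list, k: int = 5) -> list:
    """Return (sql, params) pairs to run inside ONE transaction for this request."""
    if not user_id:
        raise PermissionError("unauthenticated request")
    vec = "[" + ",".join(str(x) for x in query_embedding) + "]"
    return [
        # is_local = true scopes the setting to the current transaction, so a
        # pooled connection does not carry one user's identity into the next request.
        ("SELECT set_config('app.current_user_id', %s, true)", (user_id,)),
        # No WHERE user_id clause: the RLS policy restricts which rows come back.
        ("SELECT id, content FROM documents ORDER BY embedding <=> %s::vector LIMIT %s",
         (vec, k)),
    ]
```

Both statements should run inside one transaction (for example, psycopg 3's `with conn.transaction():`), so the transaction-local identity is in force for the search and discarded afterwards.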

6.2 Qdrant: Payload Filtering + Tiered Multi-Tenancy

Payloads: JSON metadata stored with vectors; filters use these fields.
Implementation:
- Store user_id and other attributes in payload.
- Create payload index on user_id for performance.
- Always include payload filters in search.
Tiered multi-tenancy:
- Small tenants share shards; user-level isolation via filters.
- Large tenants can be moved to dedicated shards without changing query logic.
- Addresses noisy-neighbor issues.
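A minimal sketch of the two Qdrant requests, built as REST-style request bodies. The shapes follow Qdrant's documented filter format (`must` / `key` / `match`), but should be checked against your Qdrant version; the collection name is hypothetical. The official Python client offers equivalent models (`models.Filter`, `models.FieldCondition`, `models.MatchValue`).

```python
def payload_index_request(collection: str) -> tuple:
    # Keyword index on user_id so filtered searches avoid scanning every point.
    return ("PUT", f"/collections/{collection}/index",
            {"field_name": "user_id", "field_schema": "keyword"})


def search_request(collection: str, user_id: str, vector: list, limit: int = 5) -> tuple:
    # Refuse to build an unfiltered search: the filter is the security boundary.
    if not user_id:
        raise PermissionError("refusing to search without a user filter")
    body = {
        "vector": vector,
        "filter": {"must": [{"key": "user_id", "match": {"value": user_id}}]},
        "limit": limit,
        "with_payload": True,
    }
    return ("POST", f"/collections/{collection}/points/search", body)
```

Routing every search through one builder like `search_request` is a simple way to make the "filters must always be applied" requirement structural rather than a convention each caller must remember.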

6.3 Milvus: Partition Keys

Older approach (per-user partitions) doesn’t scale due to partition count limits.
Partition keys:
- Mark user_id as partition key in the schema.
- Milvus hashes the key to a fixed number of partitions.
- Queries that filter on user_id target specific partitions, improving performance.
Some partition-key optimizations (e.g., partition-key isolation) are limited to certain index types such as HNSW.
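A minimal sketch of the query side, assuming the schema already marks `user_id` as the partition key (in pymilvus, `FieldSchema(..., is_partition_key=True)`). The expression builder validates IDs against a hypothetical allowlist instead of relying on escaping rules, and the partition function is a toy: it is NOT Milvus's internal hash, only an illustration that many users share each partition and a given user always maps to the same one.

```python
import re
import zlib

# Hypothetical ID format: letters, digits, underscore, hyphen; up to 64 chars.
_SAFE_ID = re.compile(r"[A-Za-z0-9_-]{1,64}")


def user_filter_expr(user_id: str) -> str:
    # Validate rather than escape: only allow IDs that need no quoting rules,
    # so a crafted user_id cannot alter the boolean expression.
    if not _SAFE_ID.fullmatch(user_id):
        raise ValueError(f"unsafe user_id: {user_id!r}")
    return f'user_id == "{user_id}"'


def toy_partition_for(user_id: str, num_partitions: int = 16) -> int:
    # Illustration only (not Milvus's hash): deterministic key -> partition mapping.
    return zlib.crc32(user_id.encode()) % num_partitions
```

The string from `user_filter_expr` is what would be passed as the search filter expression; because the filter names the partition key, Milvus can restrict the search to the partition that key hashes to.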

6.4 Weaviate: Native Multi-Tenancy (Tenants)

Multi-tenancy per class/collection:
- multiTenancyConfig.enabled = true creates tenant-level shards.
- Each tenant corresponds to a physical shard on disk.
Usage:
- Add tenants (users/orgs) via API.
- All queries must specify tenant context (with_tenant("...")).
Tenant states:
- Can mark tenants inactive/offloaded to control RAM usage for large but sparsely active user bases.
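A minimal in-memory model of the tenant rules above: one shard per tenant, a mandatory tenant on every query, and no reads from non-active tenants. The state names are modeled on the Weaviate v4 Python client's tenant activity statuses (`ACTIVE`, `INACTIVE`, `OFFLOADED`; older releases used `HOT`/`COLD`); everything else here is a hypothetical simulation, not the Weaviate API.

```python
from enum import Enum
from typing import Optional


class TenantState(Enum):
    ACTIVE = "ACTIVE"
    INACTIVE = "INACTIVE"
    OFFLOADED = "OFFLOADED"


class TenantRegistry:
    """Simulated multi-tenant collection: one 'shard' (list) per tenant."""

    def __init__(self):
        self._shards: dict = {}
        self._state: dict = {}

    def add_tenant(self, tenant: str) -> None:
        self._shards[tenant] = []
        self._state[tenant] = TenantState.ACTIVE

    def set_state(self, tenant: str, state: TenantState) -> None:
        # Inactive/offloaded tenants free memory but cannot serve reads.
        self._state[tenant] = state

    def insert(self, tenant: str, obj: str) -> None:
        self._require_active(tenant)
        self._shards[tenant].append(obj)

    def query(self, tenant: Optional[str]) -> list:
        # Every query on a multi-tenant collection must name a tenant.
        if tenant is None:
            raise ValueError("tenant is required on a multi-tenant collection")
        self._require_active(tenant)
        return list(self._shards[tenant])

    def _require_active(self, tenant: str) -> None:
        if tenant not in self._state:
            raise KeyError(f"unknown tenant {tenant!r}")
        if self._state[tenant] is not TenantState.ACTIVE:
            raise RuntimeError(f"tenant {tenant!r} is {self._state[tenant].value}")
```

Because each tenant's objects live in a separate shard, a query scoped to one tenant has no code path to another tenant's data; the state check is what lets a large, mostly idle user base stay out of RAM.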

Orchestration & Middleware

7. Insecure Pattern: Global Retriever

A single, global retriever without user context searches across all data.
Common in tutorials but inherently unsafe for multi-user systems.

8. Secure Pattern: Per-Request Configuration

8.1 LangChain

Use configurable_fields to inject filters at runtime:
- Base retriever is created once.
- search_kwargs (including filters) are marked configurable.
- API endpoint:
  - Authenticates user (JWT, etc.).
  - Constructs DB-specific filter (e.g., Qdrant payload filter with user_id).
  - Passes filter via config on chain invocation.
Ensures each request is bound to the caller’s identity and allowed scope.
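A minimal sketch of the per-request config, assuming the retriever was built once as `vectorstore.as_retriever().configurable_fields(search_kwargs=ConfigurableField(id="search_kwargs"))`, so each request can call `chain.invoke(question, config=request_config(token))`. The token table is a hypothetical stand-in for real JWT verification, and the dict filter syntax (`{"user_id": ...}`) fits stores such as PGVector or Chroma; other integrations (e.g., Qdrant) expect their own filter objects.

```python
from typing import Any


def authenticate(token: str) -> dict:
    # Hypothetical stand-in for JWT verification (signature, expiry, audience).
    fake_tokens = {"token-alice": {"sub": "alice", "tenant": "acme"}}
    try:
        return fake_tokens[token]
    except KeyError:
        raise PermissionError("invalid token") from None


def request_config(token: str, k: int = 4) -> dict:
    claims = authenticate(token)
    # Build the filter ONLY from verified claims, never from request-body fields.
    # The configurable field replaces search_kwargs wholesale, so k is repeated here.
    search_kwargs: dict[str, Any] = {"k": k, "filter": {"user_id": claims["sub"]}}
    return {"configurable": {"search_kwargs": search_kwargs}}
```

Deriving the filter from the verified token, rather than accepting it from the client, is what prevents a caller from simply requesting another user's ID.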

8.2 LlamaIndex

Per-request retriever instances or dynamic filters:
- Use MetadataFilters and metadata-supporting vector stores.
- Inject filters that encode user/tenant constraints on each query.
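In LlamaIndex itself this is typically `index.as_retriever(filters=MetadataFilters(filters=[...]))` with a user-ID condition. The sketch below is a stdlib simulation of the same per-request pattern, with a hypothetical `Node` type and keyword matching standing in for vector similarity: the user constraint is fixed when the retriever is built, so nothing the caller passes at query time can widen it.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Node:
    text: str
    metadata: dict


def make_user_retriever(nodes: list, user_id: str) -> Callable[[str], list]:
    """Build a retriever bound to one user for the lifetime of one request."""
    # The filter is applied at construction; queries only see the allowed subset.
    allowed = [n for n in nodes if n.metadata.get("user_id") == user_id]

    def retrieve(query: str) -> list:
        # Toy relevance: keyword containment stands in for vector similarity.
        return [n for n in allowed if query.lower() in n.text.lower()]

    return retrieve
```

A retriever built for Alice returns her "Q3 budget" node for the query "budget" and never Bob's similar "Q3 budget draft".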

Advanced Authorization (ReBAC / FGA)

9. Need for Complex Permissions

Many enterprise scenarios exceed simple ownership:
- Group-based access (departments, teams).
- Folder hierarchies and labels (confidential, internal, public).
- Manager/relationship rules.

10. Integration Patterns with FGA (OpenFGA, Permit.io)

Pattern A: Check-then-Query (read-time listing of resource IDs)
- Ask FGA for allowed document IDs, then filter by IDs in vector search.
- Works only when the allowed set is small; not scalable for millions of docs.
Pattern B: Synced Filter / Write-Time Denormalization
- When permissions change, compute and store access attributes (e.g., allowed groups) in document metadata.
- At query time:
  - Ask FGA which groups the user belongs to.
  - Filter on small group set (e.g., allowed_groups IN [...]) in vector DB.
- Scales better, because filters depend on groups (small), not document counts (large).
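A minimal sketch of Pattern B. `GROUP_MEMBERSHIP` is a hypothetical stand-in for an FGA query ("which groups is this user a member of?"), and the filter uses Qdrant's `match: {any: [...]}` shape as one example of an "IN" condition; other engines have equivalents.

```python
# Hypothetical stand-in for an FGA lookup of a user's groups.
GROUP_MEMBERSHIP = {
    "alice": {"finance", "all-staff"},
    "bob": {"engineering", "all-staff"},
}


def denormalize(doc: dict, allowed_groups: set) -> dict:
    # Write time: when permissions change, copy the allowed groups onto the document.
    return {**doc, "allowed_groups": sorted(allowed_groups)}


def groups_filter(user_id: str) -> dict:
    # Read time: one small lookup; the filter size depends on the user's group
    # count, not on how many documents they can read.
    groups = sorted(GROUP_MEMBERSHIP.get(user_id, set()))
    if not groups:
        raise PermissionError(f"{user_id!r} belongs to no groups")
    return {"must": [{"key": "allowed_groups", "match": {"any": groups}}]}


def matches(doc: dict, flt: dict) -> bool:
    # Local evaluation of the "match any" condition a vector DB would apply natively.
    wanted = set(flt["must"][0]["match"]["any"])
    return bool(wanted & set(doc["allowed_groups"]))
```

The trade-off is at write time: a change in who may see a document requires rewriting that document's `allowed_groups`, whereas a change in group membership only affects the FGA lookup.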

Performance Considerations

11. Impact of Filtering on Vector Search

Pre-filtering may cause non-linear performance behavior:
- High selectivity (tiny subset): often fast, because the engine can brute-force the small candidate set instead of traversing the graph.
- Medium selectivity: can be slowest; HNSW traversal is disrupted by many filtered-out nodes.
- Low selectivity (most of data visible): graph behaves normally; fast.
DB-specific optimizations:
- Qdrant: cardinality estimation and switching strategies (payload index scans vs. HNSW).
- Elasticsearch: bitset caching for fast repeated filtered queries.
- Milvus: partition keys to limit search space.
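A minimal sketch of the three selectivity regimes as a planner decision, loosely modeled on the cardinality-based switching described for Qdrant; it is not any engine's actual planner, and both thresholds are hypothetical.

```python
from collections import Counter

# Hypothetical thresholds; real engines derive these from index and hardware settings.
SMALL_SET = 10_000   # at or below this many matches, score every candidate exactly
DENSE_RATIO = 0.3    # at or above this fraction, the filtered graph stays navigable


def plan_filtered_search(payload_counts: Counter, user_id: str) -> str:
    """Choose a strategy from the estimated number of points matching user_id."""
    total = sum(payload_counts.values())
    matching = payload_counts.get(user_id, 0)
    if matching <= SMALL_SET:
        # High selectivity: a brute-force pass over a small candidate set is cheap.
        return "exact_scan"
    if matching / total >= DENSE_RATIO:
        # Low selectivity: most nodes pass the filter; normal HNSW traversal works.
        return "hnsw"
    # Medium selectivity: many graph neighbors are filtered out, so traversal
    # may have to explore far more nodes; typically the slowest regime.
    return "hnsw_sparse"
```

With one million points split as 500 (alice), 50,000 (midco), 600,000 (bigco), and 349,500 (others), alice gets `exact_scan`, bigco gets `hnsw`, and midco lands in the slow middle regime, which is where per-tenant shards or partition keys help most.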

Comparative Overview

12. Tool Comparison (User-Level Access Control)

PostgreSQL + pgvector
- Isolation: Row-Level Security (logical).
- Security: Engine-enforced, high assurance.
- Complexity: Low (SQL policies).
- Scalability: High with proper indexing; less effective for extreme-scale high-traffic workloads.
- Best for: Compliance-heavy systems, B2B/B2C hybrids where SQL familiarity is strong.
Qdrant
- Isolation: Payload filtering; tiered multi-tenancy via shards.
- Security: High, but app-enforced (filters must always be applied).
- Scalability: Very high; good for high-throughput B2C.
- Best for: Real-time, large-scale systems needing dynamic filtering and noisy-neighbor mitigation.
Milvus
- Isolation: Partition keys; logical isolation over a fixed set of hashed physical partitions, each shared by many users.
- Security: High when filters are used consistently.
- Scalability: Very high for large user counts.
- Best for: Massive-scale multi-tenant knowledge bases.
Weaviate
- Isolation: Tenants mapped to physical shards.
- Security: High; shard-based separation.
- Scalability: High, aided by tenant cold/offload states.
- Best for: Enterprise SaaS platforms needing fine-grained multi-tenancy with resource management.

Conclusion

User-level access control in RAG is an architectural concern spanning ingestion, storage, retrieval, and orchestration layers.
Post-filtering is inadequate for secure multi-user setups; pre-filtered, identity-aware retrieval is mandatory.
PostgreSQL RLS is a strong starting point for secure RAG; specialized vector databases become important at larger scales or stricter latency requirements.
Future architectures will likely combine:
- An authorization “agent” to determine allowed data scope.
- A retrieval “agent” operating over a pre-filtered, identity-constrained vector space.