USER-LEVEL ACCESS CONTROL ARCHITECTURES FOR SECURE MULTI-TENANT RAG SYSTEMS

Executive Summary

  • RAG systems moving into production face serious data security issues because vector search ignores document ownership and access control.
  • The core challenge: ensure User A cannot retrieve User B’s content through semantic search, while still scaling from thousands to millions of users.
  • The document compares physical isolation (database-per-user) and logical isolation (RLS, payload filters, partitions, tenants) across PostgreSQL/pgvector, Qdrant, Milvus, and Weaviate, and covers orchestration-layer and authorization patterns.

Decisions

  • Decision: User-level security in RAG must be enforced before retrieval (pre-filtered search); post-filtering is considered unsafe for multi-user/multi-tenant scenarios.
  • Decision: Architectural choices depend on scale and compliance:
    • PostgreSQL + pgvector + Row-Level Security for strict compliance and moderate scale.
    • Qdrant with payload filtering and tiered multi-tenancy for high-performance, large B2C workloads.
    • Milvus partition keys and Weaviate tenants for massive-scale, multi-tenant environments.
  • Decision: User identity and authorization must be propagated end-to-end (from API to retriever to vector DB); “global retriever” patterns are rejected as insecure.

Action Items

  • Owner? — Choose an access-control architecture (physical vs. logical isolation) appropriate to expected user count, compliance needs, and performance targets.
  • Owner? — If using PostgreSQL/pgvector:
    • Design tables with explicit owner/user_id columns.
    • Enable Row-Level Security and define policies tied to session context (e.g., app.current_user_id or auth.uid()).
  • Owner? — If using Qdrant:
    • Store user/tenant metadata in payloads.
    • Create payload indices (e.g., on user_id) and enforce filters in all search queries.
  • Owner? — If using Milvus:
    • Define user_id (or similar) as a partition key in the schema.
    • Ensure queries always include expressions like user_id == '<user>'.
  • Owner? — If using Weaviate:
    • Enable multi-tenancy and create tenants per user/tenant.
    • Ensure all queries specify .with_tenant("<tenant_id>").
  • Owner? — Update LangChain/LlamaIndex-based services to:
    • Avoid global, context-free retrievers.
    • Use runtime-configurable retrievers (e.g., configurable_fields) to inject per-user filters at request time.
  • Owner? — For complex enterprise permissions, integrate an external authorization system (e.g., OpenFGA, Permit.io) and adopt a scalable pattern (e.g., group-based metadata denormalization).

Open Questions

  • How many users and documents are expected, and what are the concrete latency/SLA targets? (Determines whether PostgreSQL is sufficient or if Qdrant/Milvus/Weaviate are required.)
  • What are the regulatory/compliance requirements (e.g., SOC2, HIPAA, GDPR) that might favor engine-enforced RLS vs. app-enforced filters?
  • Will the system primarily need simple ownership checks (user_id == owner) or complex, relationship-based permissions (teams, roles, folders, document labels)?
  • How will “global” or shared documents (e.g., public knowledge base) be modeled across isolation schemes (especially in database-per-user architectures)?

Main Ideas

1. Security Paradox in Vector Search

  • Traditional databases: access control is a deterministic pre-filter (ACLs, RBAC, RLS).
  • Vector search (approximate nearest neighbor indexes such as HNSW) is agnostic to ownership; it only optimizes for semantic similarity.
  • Naive RAG:
    • Single shared index.
    • Queries like “financial projections” or “salary bands” can return other users’ confidential docs.
    • Leads to data leakage and “context poisoning” in responses.
  • Multi-tenancy vs. user-level access:
    • Multi-tenant (B2B): org-level isolation (e.g., Coca-Cola vs. Pepsi); usually fewer tenants, stronger isolation (separate DBs/containers).
    • User-level (B2C/intra-org): identity-level isolation; many overlapping permissions, high user counts; heavy physical isolation per user does not scale easily.

2. Risk Types

  • Unauthorized retrieval (data leakage): user gets private documents they shouldn’t see.
  • Noisy neighbor: heavy use from one user degrades performance for others; considered a security/availability issue.

3. Access Control Models for RAG

  • RBAC: permissions by role, stored as metadata (e.g., visible_to_roles).
  • ABAC: permissions based on attributes (department, location, etc.), evaluated at query time.
  • ReBAC: permissions derived from relationships (owner, manager, group membership); powerful but harder to implement efficiently in vector search.

4. Pre- vs. Post-Retrieval Filtering

  • Pre-filtering / native filtered search:
    • Restricts candidate set before similarity search.
    • Required for safe user-level isolation.
  • Post-filtering:
    • First finds nearest neighbors, then removes disallowed results.
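    • Deemed unsafe for multi-user scenarios: other users’ documents are fetched into the application before being discarded (leakage risk), and when the top-k is dominated by documents the caller cannot see, few or no results survive the filter.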
  • Consensus: Pre-filtered search is mandatory for secure user-level access control.
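
The difference between the two strategies can be seen in a toy example. The sketch below uses a made-up five-document corpus and plain cosine similarity instead of a real ANN index; the document IDs, owners, and vectors are all illustrative assumptions. With post-filtering, the top-k is filled with another user's documents and nothing is left for the caller; with pre-filtering, the ranking only ever touches the caller's own rows.

```python
import math

# Toy corpus: (doc_id, owner, embedding). All values are invented for illustration.
CORPUS = [
    ("b-salaries", "bob",   [0.99, 0.10]),
    ("b-forecast", "bob",   [0.97, 0.20]),
    ("b-budget",   "bob",   [0.95, 0.30]),
    ("a-notes",    "alice", [0.60, 0.80]),
    ("a-recipes",  "alice", [0.10, 0.99]),
]


def cosine(u, v):
    """Cosine similarity between two 2-D vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))


def top_k(candidates, query, k):
    """Return the k candidates most similar to the query."""
    ranked = sorted(candidates, key=lambda d: cosine(d[2], query), reverse=True)
    return ranked[:k]


def post_filter_search(query, user, k):
    # Rank the whole corpus first, then drop rows the caller may not see.
    # Other users' documents have already been loaded at this point.
    hits = top_k(CORPUS, query, k)
    return [doc_id for doc_id, owner, _ in hits if owner == user]


def pre_filter_search(query, user, k):
    # Restrict the candidate set to the caller's rows, then rank.
    allowed = [d for d in CORPUS if d[1] == user]
    return [doc_id for doc_id, _, _ in top_k(allowed, query, k)]


query = [1.0, 0.0]
print(post_filter_search(query, "alice", k=3))  # []
print(pre_filter_search(query, "alice", k=3))   # ['a-notes', 'a-recipes']
```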

Architectural Patterns

5. Physical Isolation: Database-per-User

  • Each user gets a separate DB/schema (e.g., Neon serverless Postgres).
  • Workflow:
    • On signup: provision new DB/branch.
    • Store embeddings per user in that DB.
    • Middleware routes user requests to the correct DB connection.
  • Pros:
    • Strong isolation, minimal risk of cross-user leakage.
    • Noisy neighbor problem is confined to each user’s DB.
    • Easy deletion (drop user’s DB).
  • Cons:
    • Operational complexity at large scale (migrations, backups, connection overhead).
    • Handling shared/global documents is awkward (duplication or secondary DB).
  • Best suited for: High-value B2B, strong data sovereignty requirements; less ideal for millions of B2C users.
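
The routing step in this workflow can be sketched as a small lookup that fails closed. This is a minimal illustration only: `TenantRouter`, the DSN template, and the host name are hypothetical, and the `provision` method stands in for whatever provider API actually creates the database or branch.

```python
from dataclasses import dataclass, field


@dataclass
class TenantRouter:
    """Maps an authenticated user ID to that user's own database DSN."""

    dsn_template: str
    _dsns: dict = field(default_factory=dict)

    def provision(self, user_id: str) -> str:
        # Placeholder for the signup step; a real system would call the
        # database provider here to create the database or branch.
        if not user_id.isalnum():
            raise ValueError("user_id must be alphanumeric")
        dsn = self.dsn_template.format(user_id=user_id)
        self._dsns[user_id] = dsn
        return dsn

    def dsn_for(self, user_id: str) -> str:
        # Fail closed: an unknown user never falls back to a shared database.
        try:
            return self._dsns[user_id]
        except KeyError:
            raise PermissionError(f"no database provisioned for {user_id!r}") from None


router = TenantRouter("postgresql://app@db.example.internal/rag_{user_id}")
router.provision("alice")
print(router.dsn_for("alice"))
```

The point of the design is that the user ID comes from the authenticated session, never from request parameters, and that a missing mapping is an error rather than a default.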

6. Logical Isolation with Shared Infrastructure

6.1 PostgreSQL + pgvector + Row-Level Security

  • Use mature Postgres security model with vector search.
  • Key elements:
    • Table includes user_id (owner).
    • Enable RLS and define policies restricting access by session context (e.g., current_setting('app.current_user_id')).
    • Application sets the session variable based on authenticated user before running vector query.
  • Benefits:
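    • Engine-enforced; protects even if a query omits an explicit WHERE user_id = ... clause, provided the application connects as a role that is subject to RLS (table owners bypass it unless FORCE ROW LEVEL SECURITY is set; superusers and BYPASSRLS roles always bypass it).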
    • Aligns with compliance-heavy environments.
  • Integrations:
    • Supabase maps HTTP auth tokens to RLS (auth.uid()).
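
A minimal sketch of these elements follows, assuming a `documents` table, a three-dimensional embedding column, a policy named `owner_only`, and a psycopg-style DB-API connection; all of those names and sizes are illustrative, not taken from a specific deployment. The session variable is set with `set_config(..., true)` so that it lasts only for the current transaction, and the search query itself contains no `WHERE user_id` clause because the policy supplies it.

```python
# One-time setup, run by a privileged role (requires the pgvector extension).
# The application should connect as a different, non-owner role.
SETUP_SQL = """
CREATE TABLE documents (
    id        bigserial PRIMARY KEY,
    user_id   text NOT NULL,
    content   text NOT NULL,
    embedding vector(3)
);
ALTER TABLE documents ENABLE ROW LEVEL SECURITY;
CREATE POLICY owner_only ON documents
    USING (user_id = current_setting('app.current_user_id', true));
"""

# Transaction-scoped session variable (third argument true = local).
SET_USER_SQL = "SELECT set_config('app.current_user_id', %s, true)"

# No WHERE clause on user_id: the RLS policy restricts the visible rows.
SEARCH_SQL = """
SELECT id, content
FROM documents
ORDER BY embedding <=> %s::vector
LIMIT %s
"""


def to_vector_literal(embedding):
    """Format a sequence of numbers as a pgvector text literal, e.g. [1.0,0.5]."""
    return "[" + ",".join(str(float(x)) for x in embedding) + "]"


def search_as_user(conn, user_id, embedding, k=5):
    """Run a similarity search scoped to user_id by the RLS policy.

    conn is assumed to be a DB-API connection using %s placeholders
    (for example psycopg) with autocommit turned off.
    """
    cur = conn.cursor()
    cur.execute(SET_USER_SQL, (user_id,))
    cur.execute(SEARCH_SQL, (to_vector_literal(embedding), k))
    rows = cur.fetchall()
    conn.commit()  # ends the transaction, which also discards the local setting
    return rows
```

If the variable is never set, `current_setting(..., true)` yields no matching owner and the policy returns no rows, so a forgotten call degrades to an empty result rather than a leak.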

6.2 Qdrant: Payload Filtering + Tiered Multi-Tenancy

  • Payloads: JSON metadata stored with vectors; filters use these fields.
  • Implementation:
    • Store user_id and other attributes in payload.
    • Create payload index on user_id for performance.
    • Always include payload filters in search.
  • Tiered multi-tenancy:
    • Small tenants share shards; user-level isolation via filters.
    • Large tenants can be moved to dedicated shards without changing query logic.
    • Addresses noisy-neighbor issues.
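
The implementation steps above can be sketched as request bodies in the shape used by Qdrant's REST API (a payload index on `user_id`, and a search whose `filter` always carries a `must` condition on that field). The helper names are hypothetical, and the bodies are built as plain dictionaries so the sketch does not depend on a particular client version; the same structures can be expressed with the official client's filter models.

```python
# Body for creating a keyword payload index on user_id
# (sent to the collection's index endpoint).
INDEX_BODY = {"field_name": "user_id", "field_schema": "keyword"}


def user_filter(user_id):
    """Build the mandatory ownership filter; refuse to build an unscoped one."""
    if not user_id:
        raise ValueError("refusing to search without a user_id")
    return {"must": [{"key": "user_id", "match": {"value": user_id}}]}


def search_body(user_id, query_vector, limit=5):
    """Build a search request that is always scoped to a single user."""
    return {
        "vector": list(query_vector),
        "filter": user_filter(user_id),
        "limit": limit,
        "with_payload": True,
    }


print(search_body("alice", [0.1, 0.2, 0.3], limit=3))
```

Routing every search through one helper like `search_body` is one way to make "always include payload filters" a property of the code path rather than a convention each caller has to remember.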

6.3 Milvus: Partition Keys

  • Older approach (per-user partitions) doesn’t scale due to partition count limits.
  • Partition keys:
    • Mark user_id as partition key in the schema.
    • Milvus hashes the key to a fixed number of partitions.
    • Queries that filter on user_id target specific partitions, improving performance.
  • Optimized primarily for certain index types (e.g., HNSW).
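
Because Milvus filters are written as string expressions, the user ID ends up interpolated into query text. The sketch below shows one cautious way to build the `user_id == "..."` expression: the ID is validated against an allow-list pattern before interpolation. The function name, the pattern, and the 64-character limit are assumptions for illustration; the schema is assumed to declare `user_id` as a VARCHAR field marked as the partition key.

```python
import re

# Allow-list for user IDs; anything else is rejected before it reaches the query.
_SAFE_ID = re.compile(r"[A-Za-z0-9_-]{1,64}")


def partition_key_expr(user_id, extra=None):
    """Build a Milvus boolean expression that always pins the partition key."""
    if not _SAFE_ID.fullmatch(user_id):
        raise ValueError("unexpected characters in user_id")
    expr = f'user_id == "{user_id}"'
    if extra:
        # Additional conditions are ANDed on; they can never widen the scope.
        expr = f"{expr} and ({extra})"
    return expr


print(partition_key_expr("alice"))
print(partition_key_expr("alice", 'doc_type == "report"'))
```

The resulting string would be passed as the filter expression of the search call, so that every query targets only the partitions the key hashes to.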

6.4 Weaviate: Native Multi-Tenancy (Tenants)

  • Multi-tenancy per class/collection:
    • multiTenancyConfig.enabled = true creates tenant-level shards.
    • Each tenant corresponds to a physical shard on disk.
  • Usage:
    • Add tenants (users/orgs) via API.
    • All queries must specify tenant context (with_tenant("...")).
  • Tenant states:
    • Can mark tenants inactive/offloaded to control RAM usage for large but sparsely active user bases.
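
A thin wrapper can make the tenant context mandatory at the call site. The sketch below assumes the v4 Python client's collection interface (`client.collections.get(...)`, `.with_tenant(...)`, and `query.near_vector(...)`); the wrapper name and the `Document` collection name are hypothetical, and the client object is passed in so the function carries no connection details of its own.

```python
def search_for_tenant(client, tenant_id, query_vector, k=5, collection_name="Document"):
    """Run a vector search inside exactly one tenant's shard.

    client is assumed to be a connected Weaviate v4 client; the collection
    is assumed to have multi-tenancy enabled and the tenant already created.
    """
    if not tenant_id:
        # Fail closed instead of issuing a query with no tenant context.
        raise ValueError("tenant_id is required for every query")
    collection = client.collections.get(collection_name).with_tenant(tenant_id)
    return collection.query.near_vector(near_vector=query_vector, limit=k)
```

Calling it looks like `search_for_tenant(client, "alice", embedding, k=3)`, with the tenant ID taken from the authenticated session.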

Orchestration & Middleware

7. Insecure Pattern: Global Retriever

  • A single, global retriever without user context searches across all data.
  • Common in tutorials but inherently unsafe for multi-user systems.

8. Secure Pattern: Per-Request Configuration

8.1 LangChain

  • Use configurable_fields to inject filters at runtime:
    • Base retriever is created once.
    • search_kwargs (including filters) are marked configurable.
    • API endpoint:
      • Authenticates user (JWT, etc.).
      • Constructs DB-specific filter (e.g., Qdrant payload filter with user_id).
      • Passes filter via config on chain invocation.
  • Ensures each request is bound to the caller’s identity and allowed scope.
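
The request flow above can be sketched without any framework. In the example below, `ScopedRetriever` is a hypothetical stand-in for a retriever whose `search_kwargs` have been marked configurable, and the `config` dictionary mirrors the `{"configurable": {"search_kwargs": {...}}}` shape that LangChain accepts on invocation; the JWT claims are assumed to have been verified already, and the filter uses a Qdrant-style payload condition purely as an example.

```python
from typing import Callable, Optional

# Hypothetical stand-in for a vector store search: (query, filter) -> results.
SearchFn = Callable[[str, Optional[dict]], list]


class ScopedRetriever:
    """Created once at startup; the per-user filter arrives with each call."""

    def __init__(self, search: SearchFn):
        self._search = search

    def invoke(self, query: str, config: dict) -> list:
        kwargs = config.get("configurable", {}).get("search_kwargs", {})
        flt = kwargs.get("filter")
        if not flt:
            # Fail closed: a call without a filter must not search everything.
            raise PermissionError("no per-user filter supplied")
        return self._search(query, flt)


def handle_request(retriever: ScopedRetriever, claims: dict, question: str) -> list:
    """API handler body: identity comes from verified token claims only."""
    user_id = claims["sub"]
    flt = {"must": [{"key": "user_id", "match": {"value": user_id}}]}
    config = {"configurable": {"search_kwargs": {"filter": flt}}}
    return retriever.invoke(question, config)


# Usage with a fake search function that just echoes the filter it received.
retriever = ScopedRetriever(lambda query, flt: [flt["must"][0]["match"]["value"]])
print(handle_request(retriever, {"sub": "alice"}, "Q3 projections"))  # ['alice']
```

One design point worth checking in a real deployment: a configurable field typically falls back to its default when no config is passed, so the default should not be an unfiltered search.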

8.2 LlamaIndex

  • Per-request retriever instances or dynamic filters:
    • Use MetadataFilters and metadata-supporting vector stores.
    • Inject filters that encode user/tenant constraints on each query.

Advanced Authorization (ReBAC / FGA)

9. Need for Complex Permissions

  • Many enterprise scenarios exceed simple ownership:
    • Group-based access (departments, teams).
    • Folder hierarchies and labels (confidential, internal, public).
    • Manager/relationship rules.

10. Integration Patterns with FGA (OpenFGA, Permit.io)

  • Pattern A: Check-then-Query (read-time listing of resource IDs)
    • Ask FGA for allowed document IDs, then filter by IDs in vector search.
    • Works only when the allowed set is small; not scalable for millions of docs.
  • Pattern B: Synced Filter / Write-Time Denormalization
    • When permissions change, compute and store access attributes (e.g., allowed groups) in document metadata.
    • At query time:
      • Ask FGA which groups the user belongs to.
      • Filter on small group set (e.g., allowed_groups IN [...]) in vector DB.
    • Scales better, because filters depend on groups (small), not document counts (large).
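
Pattern B can be sketched with in-memory stand-ins. Everything below is hypothetical scaffolding: `GROUP_MEMBERS` plays the role of the authorization service, `DOC_METADATA` plays the role of vector-store payloads, and the filter uses a Qdrant-style "match any" condition as one example of an `allowed_groups IN [...]` clause.

```python
# Stand-in for the authorization service's group memberships.
GROUP_MEMBERS = {"finance": {"alice"}, "eng": {"alice", "bob"}}

# Stand-in for document metadata stored next to each vector.
DOC_METADATA = {
    "q3-forecast": {"allowed_groups": ["finance"]},
    "runbook": {"allowed_groups": ["eng"]},
}


def on_permission_change(doc_id, allowed_groups):
    """Write-time step: denormalize the current group grants into metadata."""
    DOC_METADATA[doc_id] = {"allowed_groups": sorted(allowed_groups)}


def groups_for(user_id):
    """Read-time step: ask the authorization layer for the user's groups."""
    return sorted(g for g, members in GROUP_MEMBERS.items() if user_id in members)


def group_filter(user_id):
    """Build a filter whose size depends on group count, not document count."""
    groups = groups_for(user_id)
    if not groups:
        raise PermissionError(f"{user_id!r} belongs to no groups")
    return {"must": [{"key": "allowed_groups", "match": {"any": groups}}]}


def visible_docs(user_id):
    """What the filter would admit, evaluated here in plain Python."""
    groups = set(groups_for(user_id))
    return sorted(d for d, m in DOC_METADATA.items() if groups & set(m["allowed_groups"]))


on_permission_change("handbook", ["eng", "finance"])
print(group_filter("bob"))
print(visible_docs("bob"))    # ['handbook', 'runbook']
print(visible_docs("alice"))  # ['handbook', 'q3-forecast', 'runbook']
```

The trade-off is that permission changes now require a metadata update on the affected documents, so the sync path needs to be reliable for revocations to take effect.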

Performance Considerations

11. Impact of Filtering on Vector Search

  • Filtered search latency does not change linearly with how much data the filter admits:
    • High selectivity (tiny subset): often fast; may brute-force small candidate sets.
    • Medium selectivity: can be slowest; HNSW traversal is disrupted by many filtered-out nodes.
    • Low selectivity (most of data visible): graph behaves normally; fast.
  • DB-specific optimizations:
    • Qdrant: cardinality estimation and switching strategies (payload index scans vs. HNSW).
    • Elasticsearch: bitset caching for fast repeated filtered queries.
    • Milvus: partition keys to limit search space.
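
The idea of switching strategies based on estimated cardinality can be illustrated with a deliberately simplified planner. This is not any engine's actual algorithm: the function name, the strategy labels, and the `exact_scan_limit` threshold are assumptions chosen only to show the shape of the decision.

```python
def choose_strategy(matching, total, exact_scan_limit=10_000):
    """Pick a search plan from an estimate of how many points pass the filter."""
    if total <= 0 or matching <= 0:
        return "empty"
    if matching <= exact_scan_limit:
        # Small candidate set: score every allowed point directly.
        return "exact_scan"
    # Larger candidate set: traverse the graph, skipping disallowed nodes.
    return "filtered_hnsw"


total = 1_000_000
for matching in (200, 500_000, 950_000):
    print(f"{matching / total:.4f}", choose_strategy(matching, total))
```

In this simplified picture the medium-selectivity case lands in the graph-traversal branch, which is where the slowdown described above tends to appear, since many visited nodes are rejected by the filter.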

Comparative Overview

12. Tool Comparison (User-Level Access Control)

  • PostgreSQL + pgvector
    • Isolation: Row-Level Security (logical).
    • Security: Engine-enforced, high assurance.
    • Complexity: Low (SQL policies).
    • Scalability: Moderate to high with proper indexing; less suited to extreme-scale, high-traffic workloads.
    • Best for: Compliance-heavy systems, B2B/B2C hybrids where SQL familiarity is strong.
  • Qdrant
    • Isolation: Payload filtering; tiered multi-tenancy via shards.
    • Security: High, but app-enforced (filters must always be applied).
    • Scalability: Very high; good for high-throughput B2C.
    • Best for: Real-time, large-scale systems needing dynamic filtering and noisy-neighbor mitigation.
  • Milvus
    • Isolation: Partition keys; logical isolation by filter, with data physically grouped into hashed partitions.
    • Security: High when filters are used consistently.
    • Scalability: Very high for large user counts.
    • Best for: Massive-scale multi-tenant knowledge bases.
  • Weaviate
    • Isolation: Tenants mapped to physical shards.
    • Security: High; shard-based separation.
    • Scalability: High, aided by tenant cold/offload states.
    • Best for: Enterprise SaaS platforms needing fine-grained multi-tenancy with resource management.

Conclusion

  • User-level access control in RAG is an architectural concern spanning ingestion, storage, retrieval, and orchestration layers.
  • Post-filtering is inadequate for secure multi-user setups; pre-filtered, identity-aware retrieval is mandatory.
  • PostgreSQL RLS is a strong starting point for secure RAG; specialized vector databases become important at larger scales or stricter latency requirements.
  • Future architectures will likely combine:
    • An authorization “agent” to determine allowed data scope.
    • A retrieval “agent” operating over a pre-filtered, identity-constrained vector space.