nDSG-compliant RAG for Swiss SMBs: what it actually means
Retrieval-Augmented Generation (RAG) is one of the most common AI architectures in production SMB projects. Instead of relying only on what a language model learned during training, you give it your own knowledge base: contracts, product manuals, helpdesk tickets, wiki entries. The model answers questions with reference to those documents, and that is exactly the step where data protection questions arise that many Swiss SMBs underestimate.
What RAG actually does — in two paragraphs for non-engineers
Greatly simplified: your documents are split into small pieces (chunks), each chunk is translated into a numeric representation (vector) by a second AI model (the embedding model) and stored in a vector database. When someone asks a question, the question is also converted into a vector, the mathematically nearest document chunks are fetched, and they're sent — together with the original question — to a language model (e.g. Claude or GPT-4). The model answers based on those chunks.
Practical consequence: with every query, excerpts of your original documents leave your infrastructure whenever the language model is hosted by a cloud provider. That is the point at which Swiss data protection law (revFADP / nDSG), processor agreements, and data residency become concrete.
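The pipeline described above can be sketched in a few lines. This is a toy illustration, not a production setup: the `embed` function is a bag-of-words stand-in for a real embedding model, the in-memory list stands in for a vector database, and the sample documents are invented. The point is the last variable, `prompt`: that string is what would be sent to the language model, and it contains original document text.

```python
import math
import re
from collections import Counter


def chunk(text: str, max_words: int = 40) -> list[str]:
    """Split a document into chunks of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question: str, index: list[tuple[str, Counter]], k: int = 2) -> list[str]:
    """Return the k chunks whose vectors are nearest to the question vector."""
    q = embed(question)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


def build_prompt(question: str, chunks: list[str]) -> str:
    """Assemble what would be sent to the language model."""
    context = "\n".join(f"- {c}" for c in chunks)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


# Invented sample documents, for illustration only.
documents = [
    "The warranty period for the X200 pump is 24 months from delivery.",
    "Support tickets are answered within one business day.",
    "Invoices are payable within 30 days of receipt.",
]

# Indexing: chunk every document and store (chunk text, vector) pairs.
index = [(c, embed(c)) for doc in documents for c in chunk(doc)]

# Query at runtime: retrieve the nearest chunk and build the outgoing prompt.
question = "How long is the warranty for the X200 pump?"
top_chunks = retrieve(question, index, k=1)
prompt = build_prompt(question, top_chunks)
print(prompt)
```

With a real embedding model and a cloud LLM, both `embed` and the final call would cross your network boundary, which is why the following sections look at each flow separately.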
The four data flows that matter for nDSG
In a typical RAG setup for a Swiss SMB, four data flows arise. Each must be assessed differently from a privacy perspective:
- Indexing: when the vector database is populated, initially or periodically, document chunks are sent to an embedding model. If that model runs in the cloud (e.g. OpenAI's text-embedding-3 models), the chunks leave your infrastructure: even if the provider does not store them, it has to process them.
- Persistence: the vector database itself (e.g. Postgres with pgvector, Qdrant, Pinecone) stores the chunks, often in plaintext alongside the vector. Where does this database run? Switzerland, EU, US?
- Query at runtime: the user's question and the retrieved chunks go to the language model. Again: where does that model run, under what processing agreement?
- Logging and audit: every query produces log data: who asked, what was retrieved, what was the answer. These logs are essential for compliance — and must themselves be protected.
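One way to make the four flows tangible is a small inventory that records where each one is processed. The sketch below is an assumption-laden illustration: the `DataFlow` class, the component names, and the country assignments are invented, and "outside Switzerland" is used only as a trigger for a closer look, not as the legal test (which depends on whether the destination offers an adequate level of protection).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataFlow:
    name: str
    component: str               # hypothetical component description
    country: str                 # ISO 3166-1 alpha-2 code of the processing location
    contains_personal_data: bool


def needs_transfer_review(flow: DataFlow, home_country: str = "CH") -> bool:
    """Flag flows that carry personal data out of the home country."""
    return flow.contains_personal_data and flow.country != home_country


# Example setup (invented): cloud embedding and cloud LLM, local storage and logs.
flows = [
    DataFlow("indexing", "cloud embedding API", "US", True),
    DataFlow("persistence", "pgvector on own server", "CH", True),
    DataFlow("runtime query", "cloud LLM", "US", True),
    DataFlow("logging and audit", "local log store", "CH", True),
]

to_review = [f.name for f in flows if needs_transfer_review(f)]
print(to_review)
```

Keeping such an inventory next to the architecture diagram makes it easier to see which flows change when a provider or region is swapped.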
What "nDSG-compliant" actually means
The revised Swiss Federal Act on Data Protection has been in force since 1 September 2023. For a RAG project, five points matter most:
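- Processing register (Art. 12 nDSG): you must document which personal data you process for which purposes. With RAG that means knowing which document types are indexed and whether they contain personal data. Companies with fewer than 250 employees are exempt from the register duty when their processing poses only a low risk, but keeping the register is advisable in any case.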
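- Data processing (Art. 9 nDSG): any third party that processes personal data on your behalf (embedding model provider, LLM provider, vector DB host) is a processor and must be bound by contract. In practice that means a written data processing agreement, analogous to DPAs under the EU GDPR.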
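- Cross-border transfers (Art. 16 nDSG): transfers to third countries without an adequate level of protection require additional safeguards, such as standard contractual clauses recognised by the FDPIC (the Federal Data Protection and Information Commissioner). For US providers certified under it, the Swiss-US Data Privacy Framework is an alternative basis.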
- Privacy by design (Art. 7 nDSG): mandatory. Concretely: mask sensitive fields, log accesses, pseudonymise where possible.
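- Data protection impact assessment (Art. 22 nDSG): required for high-risk processing. RAG that sends personal data to a cloud LLM can qualify, particularly when sensitive personal data is processed at scale, so the question should be assessed explicitly.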
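To make "pseudonymise where possible" concrete, here is a minimal sketch using a keyed hash from the Python standard library. The function name `pseudonymise`, the `person_` prefix, the 12-character truncation, and the demo key are all assumptions for illustration; in a real system the key would live in a secrets manager, and whether this approach is sufficient depends on your own risk assessment.

```python
import hashlib
import hmac


def pseudonymise(identifier: str, key: bytes, prefix: str = "person") -> str:
    """Replace an identifier with a stable keyed pseudonym.

    The same input and key always give the same pseudonym, so records can
    still be linked, but the original value cannot be read from the output.
    """
    normalised = identifier.strip().lower().encode("utf-8")
    digest = hmac.new(key, normalised, hashlib.sha256).hexdigest()
    return f"{prefix}_{digest[:12]}"


# Demo key only. A real key must be generated randomly and stored securely.
key = b"demo-key-replace-me"

a = pseudonymise("anna.muster@example.ch", key)
b = pseudonymise(" Anna.Muster@example.ch ", key)   # same person, different formatting
c = pseudonymise("peter.beispiel@example.ch", key)

print(a == b)  # same identifier after normalisation gives the same pseudonym
print(a == c)  # different identifiers give different pseudonyms
```

Using a keyed hash rather than a plain hash matters here: without the key, someone who guesses an e-mail address cannot simply hash it and look for a match in the index.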
Architecture decisions with privacy impact
You'll make three decisions early in any RAG project — and each has data-protection consequences:
1. Which embedding model?
Cloud embedding models (OpenAI, Voyage, Cohere) are excellent in quality and fast, but they process every chunk externally. Local embedding models (e.g. nomic-embed-text, bge-m3 via Ollama or Hugging Face TEI) are, as a rough rule of thumb, 5–15% less precise, with the gap depending on language and domain, but they run on your own hardware. For sensitive domains (HR records, patient data, mandate documents) the local variant is often the only defensible choice.
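The cloud-versus-local choice does not have to be all or nothing. A possible pattern is to route each document type to an embedding backend based on its sensitivity. The sketch below assumes a hypothetical three-level classification and an invented mapping of document types; unknown types are treated as sensitive so that nothing is sent to the cloud by accident.

```python
from enum import IntEnum


class Sensitivity(IntEnum):
    PUBLIC = 1
    INTERNAL = 2
    SENSITIVE = 3


# Hypothetical classification; each organisation has to define its own.
DOC_TYPE_SENSITIVITY = {
    "product_manual": Sensitivity.PUBLIC,
    "wiki_entry": Sensitivity.INTERNAL,
    "hr_record": Sensitivity.SENSITIVE,
    "patient_data": Sensitivity.SENSITIVE,
    "mandate_document": Sensitivity.SENSITIVE,
}


def choose_embedding_backend(
    doc_type: str,
    max_cloud: Sensitivity = Sensitivity.PUBLIC,
) -> str:
    """Return "cloud" or "local" for a document type.

    Anything above the max_cloud threshold, and anything unclassified,
    stays on local hardware.
    """
    level = DOC_TYPE_SENSITIVITY.get(doc_type, Sensitivity.SENSITIVE)
    return "cloud" if level <= max_cloud else "local"


for doc_type in ["product_manual", "wiki_entry", "hr_record", "unclassified_scan"]:
    print(doc_type, "->", choose_embedding_backend(doc_type))
```

One caveat with mixed setups: vectors from different embedding models are not comparable, so cloud-embedded and locally embedded chunks need separate indexes.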
2. Where does the vector database run?
Postgres with pgvector on a server in a Swiss data centre (e.g. at Infomaniak) means Swiss data residency and keeps the nDSG assessment simple, although the hosting provider is still a processor. A managed service such as Pinecone or Weaviate Cloud means hosting outside Switzerland, often with a US provider, and requires a DPA plus a justification for the cross-border transfer. Both work, but the compliance story differs significantly, and so do monthly costs.
3. Which LLM at runtime?
The question that matters day to day: does every answer go to Anthropic (US, with EU region available), to OpenAI (US, with EU region available), or to a locally hosted model (Llama 3, Mistral via Ollama)? Local models are now good enough for many SMB use cases, especially when the RAG context is well curated. For complex reasoning tasks, cloud LLMs are still superior, but then you need data masking upstream.
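Upstream masking can start very simply. The following sketch replaces e-mail addresses, Swiss-format phone numbers, and AHV numbers with placeholders before text is handed to a cloud LLM. The patterns and placeholder names are illustrative assumptions and far from complete: regular expressions do not catch names or free-text descriptions, so a real project would combine this with entity recognition or field-level rules.

```python
import re

# Order matters: AHV numbers are masked first so the phone pattern
# cannot match digits inside them.
MASKING_RULES = [
    ("[AHV_NUMBER]", re.compile(r"\b756\.\d{4}\.\d{4}\.\d{2}\b")),
    ("[EMAIL]", re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")),
    ("[PHONE]", re.compile(r"(?<!\d)(?:\+41|0041|0)\s?\d{2}\s?\d{3}\s?\d{2}\s?\d{2}(?!\d)")),
]


def mask(text: str) -> str:
    """Replace every match of each rule with its placeholder."""
    for placeholder, pattern in MASKING_RULES:
        text = pattern.sub(placeholder, text)
    return text


# Invented sample chunk.
chunk = (
    "Contact Anna Muster at anna.muster@example.ch or +41 44 123 45 67, "
    "AHV 756.1234.5678.97."
)
masked = mask(chunk)
print(masked)
# Contact Anna Muster at [EMAIL] or [PHONE], AHV [AHV_NUMBER].
```

Note that the name "Anna Muster" survives in the output, which shows the limit of pattern-based masking.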
Checklist for executives before a RAG project
If your CIO or an external vendor proposes a RAG project, clarify these seven points before signing:
- Which document types will be indexed, and do they contain personal data?
- Which embedding model will be used — cloud or local?
- Where does the vector database physically run, and who has access?
- Which LLM provider at runtime, in which region, under which data processing agreement?
- How are sensitive fields masked before reaching the LLM?
- What does the audit log look like, and how long are queries retained?
- How does exit work: do you get the index, database dump, and configuration back when changing vendors?
If the vendor answers any of these questions with "we'll deal with that later", take it as a warning sign. Privacy in a RAG context isn't a layer you slap on at the end. It's an early architectural decision.
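For the audit-log question in particular, it helps to see what a concrete answer could look like. The sketch below builds one log record per query with an explicit deletion date. Everything in it is an assumption for illustration: the field names, the 180-day retention period (not a legal requirement), and the choice to store a hash of the question rather than its text so that the log itself holds less personal data.

```python
import hashlib
import json
from datetime import datetime, timedelta, timezone

# Assumed retention period; the right value depends on your own requirements.
RETENTION = timedelta(days=180)


def audit_record(user_id: str, question: str, chunk_ids: list[str],
                 answer: str, now: datetime) -> dict:
    """Build one audit entry: who asked, what was retrieved, when to delete."""
    return {
        "timestamp": now.isoformat(),
        "user_id": user_id,
        "question_sha256": hashlib.sha256(question.encode("utf-8")).hexdigest(),
        "retrieved_chunk_ids": chunk_ids,
        "answer_chars": len(answer),
        "delete_after": (now + RETENTION).isoformat(),
    }


def is_expired(record: dict, now: datetime) -> bool:
    """True once the record has reached its deletion date."""
    return now >= datetime.fromisoformat(record["delete_after"])


now = datetime(2025, 1, 1, 12, 0, tzinfo=timezone.utc)
record = audit_record(
    user_id="u-1042",                        # hypothetical internal user ID
    question="What is the notice period in the Muster contract?",
    chunk_ids=["contracts/0017#3"],          # hypothetical chunk identifier
    answer="The notice period is three months.",
    now=now,
)
log_line = json.dumps(record)
print(log_line)
```

If your compliance requirements call for the full question and answer text in the log, the same structure works, but the log store then needs the same protection as the source documents.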
Where Turivus comes in
For every RAG project we start with a Swiss SMB, the first week is pure data-flow analysis: which documents, which personal data, which masking rules, which provider mix, which audit requirements. Only after that do we design the architecture, and we document it so your data protection officer can use it without translation work. That's slower than "let's just install LangChain and see", but it's the only approach that gets an SMB into productive, compliant operation.