A whitepaper on L1 Wiki + L2 RAG hybrid retrieval for multi-tenant AI SaaS
This whitepaper documents Baiyuan Technology’s engineering practice (2024–2026) of building the Baiyuan RAG Knowledge Platform, a multi-tenant knowledge retrieval infrastructure shared by three product lines: AI Customer Service (chat.baiyuan.io), the GEO Platform (geo.baiyuan.io), and PIF AI (pif.baiyuan.io).
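The core architectural contribution is a two-layer retrieval system:

- L1 Wiki: LLM-compiled structured summaries stored in PostgreSQL, keyed by slug, and queried in about 50 ms.
- L2 RAG: pgvector cosine similarity plus tsvector BM25, fused by Reciprocal Rank Fusion (k=60). It handles the queries L1 does not cover, with hybrid precision.

Combined, the platform reports a 40–68% reduction in LLM token cost versus naive single-layer RAG across pilot tenants (2026 Q1), while cutting hallucination rate by 57% and P95 latency by 51%.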
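One way to read the two-layer design is as an L1-first lookup with an L2 fallback. The sketch below illustrates that reading only; the source does not specify the routing rule, and `retrieve`, `Answer`, `wiki_pages`, `l2_search`, and the way a query is mapped to a slug are all hypothetical names and assumptions, not the platform's API.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Answer:
    layer: str            # "L1" or "L2": which layer served the request
    passages: List[str]   # text handed to the LLM as context


def retrieve(
    slug: Optional[str],
    query: str,
    wiki_pages: Dict[str, str],
    l2_search: Callable[[str], List[str]],
) -> Answer:
    """Serve from the L1 Wiki when a page exists for the slug; otherwise use L2.

    Assumption: some upstream step has already mapped the query to a slug
    (or to None when no page applies). That step is not shown here.
    """
    if slug is not None and slug in wiki_pages:
        return Answer(layer="L1", passages=[wiki_pages[slug]])
    # L1 miss: fall back to hybrid retrieval (vector + BM25, fused by RRF).
    return Answer(layer="L2", passages=l2_search(query))


# Hypothetical in-memory stand-ins for the PostgreSQL-backed layers.
wiki = {"refund-policy": "Refunds are accepted within 30 days."}


def fake_l2(query: str) -> List[str]:
    return [f"chunk matching: {query}"]


hit = retrieve("refund-policy", "How do refunds work?", wiki, fake_l2)
miss = retrieve(None, "Does the API support webhooks?", wiki, fake_l2)
```

Under this reading, a request answered by L1 sends a single compiled summary to the LLM instead of several retrieved chunks, which is one plausible source of the token savings reported above.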
When enterprises adopt generative AI for customer service, knowledge retrieval, or regulatory compliance, they face five engineering problems:
The Baiyuan RAG Knowledge Platform is a unified engineering response. The whitepaper explains why the architecture is the way it is, not just what it does.
flowchart TB
  subgraph Products[Three Product Lines]
    CS[AI Customer Service<br/>chat.baiyuan.io]
    GEO[GEO Platform<br/>geo.baiyuan.io]
    PIF[PIF AI<br/>pif.baiyuan.io]
  end
  subgraph Shared[Shared Infrastructure]
    RAG[(Baiyuan RAG Platform<br/>rag.baiyuan.io)]
    ENT[Brand Entity Graph<br/>Schema.org @id]
  end
  CS -->|Q&A traffic| RAG
  GEO -->|hallucination repair| RAG
  PIF -->|regulatory KB| RAG
  RAG --> ENT
  GEO --> ENT
  PIF --> ENT
Fig 0: The three pillars sharing the RAG platform
| Term | Definition |
|---|---|
| L1 Wiki | LLM-compiled structured summaries in PostgreSQL; keyed by slug; queried in ~50 ms |
| L2 RAG | pgvector cosine + BM25 tsvector + RRF fusion |
| RRF | Reciprocal Rank Fusion: score(d) = Σ 1/(k + rank_i(d)), k=60 |
| Wiki Compile | Offline batch job that builds wiki_pages from chunks |
| Wiki Lint | Daily cron job that checks the Wiki for fact conflicts and missing citations |
| Three-Layer Isolation | App header + PostgreSQL RLS + SQL WHERE, defense-in-depth |
| Handoff | AI→human handover five-state machine (ai_active / pending / agent_active / ended) |
| NLI | Natural Language Inference three-way classification for hallucination check |
| GEO | Generative Engine Optimization (sister product) |
| PIF | Product Information File (cosmetics regulatory; sister product) |
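The RRF formula in the glossary, score(d) = Σ 1/(k + rank_i(d)) with k=60, can be sketched in a few lines. This is a minimal illustration, assuming 1-based ranks and that each input list contains a document id at most once; `rrf_fuse` and the chunk ids are hypothetical and not taken from the platform's code.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Sequence, Tuple


def rrf_fuse(
    rankings: Iterable[Sequence[str]], k: int = 60
) -> List[Tuple[str, float]]:
    """Fuse ranked lists of document ids with Reciprocal Rank Fusion.

    Each document scores 1 / (k + rank) per list it appears in (rank is
    1-based); a document missing from a list contributes nothing for it.
    """
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by id for determinism.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical top-3 results from the two L2 retrievers.
vector_hits = ["c3", "c1", "c7"]  # ordered by pgvector cosine similarity
bm25_hits = ["c1", "c9", "c3"]    # ordered by tsvector text rank

fused = rrf_fuse([vector_hits, bm25_hits])
fused_ids = [doc_id for doc_id, _ in fused]
```

In this example `c1` (ranks 2 and 1) scores 1/62 + 1/61 and edges out `c3` (ranks 1 and 3) at 1/61 + 1/63, while chunks found by only one retriever fall to the bottom. Because only ranks are used, the cosine and BM25 scores never need to be normalized against each other.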
| Reader | Suggested Path |
|---|---|
| B2B Decision-Makers (CIO/CTO) | Ch 1, 2, 9, 10, 11 |
| Engineering Leads & Architects | Ch 2, 5, 6, 9, 10 |
| Backend Engineers | Ch 3, 4, 5, 7, 8 |
| AI/Academic Researchers | Ch 3, 4, 12 |
| Operations/CS Adopters | Ch 2, 8, 11 |
APA 7
Lin, V. (2026). Baiyuan RAG Knowledge Platform: A whitepaper on L1 Wiki + L2 RAG hybrid retrieval for multi-tenant AI SaaS. Baiyuan Technology. https://github.com/baiyuan-tech/rag-whitepaper
BibTeX
@techreport{lin2026baiyuanrag,
  author      = {Lin, Vincent},
  title       = {Baiyuan RAG Knowledge Platform: A Whitepaper on L1 Wiki + L2 RAG Hybrid Retrieval for Multi-Tenant AI SaaS},
  institution = {Baiyuan Technology},
  year        = {2026},
  url         = {https://github.com/baiyuan-tech/rag-whitepaper},
  note        = {v1.0}
}
CC BY-NC 4.0. Free to share, translate, and quote with attribution. Commercial use (e.g., republishing the full book, embedding in paid courses) requires permission from services@baiyuan.io.
Baiyuan Technology Co., Ltd. · https://baiyuan.io · services@baiyuan.io