Baiyuan RAG Knowledge Platform Whitepaper

Baiyuan RAG Knowledge Platform — Technical Whitepaper

A whitepaper on L1 Wiki + L2 RAG hybrid retrieval for multi-tenant AI SaaS

中文版 · 日本語版

Executive Summary

This whitepaper documents Baiyuan Technology’s engineering practice (2024–2026) of building the Baiyuan RAG Knowledge Platform — a multi-tenant knowledge retrieval infrastructure shared by three product lines:

Baiyuan AI Customer Service SaaS (chat.baiyuan.io)
Baiyuan GEO Platform (geo.baiyuan.io) — see sister whitepaper at https://github.com/baiyuan-tech/geo-whitepaper
Baiyuan PIF AI (pif.baiyuan.io) — cosmetics Product Information File automation for Taiwan TFDA compliance

The core architectural contribution is a two-layer retrieval system:

L1 Wiki — DB-cached, LLM-compiled, structured summaries keyed by slug. Answers ~40% of typical queries in under 500 ms at zero LLM cost.
L2 RAG — pgvector cosine search + PostgreSQL tsvector BM25, fused by Reciprocal Rank Fusion (k=60). Handles the rest with hybrid precision.

Combined, the platform reports 40–68% LLM token-cost reduction versus naive single-layer RAG across Pilot tenants (2026 Q1), while cutting hallucination rate by 57% and P95 latency by 51%.

Part I — Problem & Architecture

Part II — Core Algorithms

Part III — Engineering Architecture

Part IV — Ecosystem Integration

Part V — Reality Check

Appendices

Problem Statement

When enterprises adopt generative AI for customer service, knowledge retrieval, or regulatory compliance, they face five engineering problems:

Hallucination and factual inaccuracy — unacceptable in regulated domains (legal, medical, financial)
Token cost explosion — at high QPS, LLM API bills become uncontrollable
Multi-tenant data isolation — A’s knowledge must never reach B’s retrieval
Heterogeneous knowledge sources — PDFs, web pages, databases, APIs, Excels all must be ingested
Per-product-line waste — three separate RAGs = 3× engineering cost

The Baiyuan RAG Knowledge Platform is a unified engineering response. The whitepaper explains why the architecture is the way it is, not just what it does.

The Three Pillars

flowchart TB
    subgraph Products[Three Product Lines]
        CS[AI Customer Service<br/>chat.baiyuan.io]
        GEO[GEO Platform<br/>geo.baiyuan.io]
        PIF[PIF AI<br/>pif.baiyuan.io]
    end
    subgraph Shared[Shared Infrastructure]
        RAG[(Baiyuan RAG Platform<br/>rag.baiyuan.io)]
        ENT[Brand Entity Graph<br/>Schema.org @id]
    end
    CS -->|Q&A traffic| RAG
    GEO -->|hallucination repair| RAG
    PIF -->|regulatory KB| RAG
    RAG --> ENT
    GEO --> ENT
    PIF --> ENT

Fig 0: The three pillars sharing the RAG platform

Key Terminology

Term	Definition
L1 Wiki	LLM-compiled structured summaries in PostgreSQL; keyed by slug; queried in ~50 ms
L2 RAG	pgvector cosine + BM25 tsvector + RRF fusion
RRF	Reciprocal Rank Fusion: `score(d) = Σ 1/(k + rank_i(d))`, k=60
Wiki Compile	Offline batch job that builds wiki_pages from chunks
Wiki Lint	Daily cron that validates Wiki for fact conflict, missing citations
Three-Layer Isolation	App header + PostgreSQL RLS + SQL WHERE, defense-in-depth
Handoff	AI→human handover five-state machine (ai_active / pending / agent_active / ended)
NLI	Natural Language Inference three-way classification for hallucination check
GEO	Generative Engine Optimization (sister product)
PIF	Product Information File (cosmetics regulatory; sister product)

Who Should Read This

Reader	Suggested Path
B2B Decision-Makers (CIO/CTO)	Ch 1, 2, 9, 10, 11
Engineering Leads & Architects	Ch 2, 5, 6, 9, 10
Backend Engineers	Ch 3, 4, 5, 7, 8
AI/Academic Researchers	Ch 3, 4, 12
Operations/CS Adopters	Ch 2, 8, 11

Citation

APA 7

Lin, V. (2026). Baiyuan RAG Knowledge Platform: A whitepaper on L1 Wiki + L2 RAG hybrid retrieval for multi-tenant AI SaaS. Baiyuan Technology. https://github.com/baiyuan-tech/rag-whitepaper

BibTeX

@techreport{lin2026baiyuanrag,
  author      = {Lin, Vincent},
  title       = {Baiyuan RAG Knowledge Platform: A Whitepaper on L1 Wiki + L2 RAG Hybrid Retrieval for Multi-Tenant AI SaaS},
  institution = {Baiyuan Technology},
  year        = {2026},
  url         = {https://github.com/baiyuan-tech/rag-whitepaper},
  note        = {v1.0}
}

License

CC BY-NC 4.0. Free to share, translate, and quote with attribution. Commercial use (e.g., republishing the full book, embedding in paid courses) requires permission from services@baiyuan.io.

Baiyuan Technology Co., Ltd. · https://baiyuan.io · services@baiyuan.io

This site is open source. Improve this page.