Map first, details later. This chapter is the skeleton for the next eleven.
Baiyuan RAG Knowledge Platform is a shared AI knowledge infrastructure built on PostgreSQL + pgvector (storage), Redis (cache), Node.js (API), multi-tenant isolation (security), and L1 Wiki + L2 RAG (retrieval). Three product lines (CS / GEO / PIF) access it via X-RAG-API-Key + X-Tenant-ID.
```mermaid
sequenceDiagram
    autonumber
    participant Client
    participant GW as Gateway
    participant Auth
    participant Cache as Redis
    participant L1 as L1 Wiki
    participant L2 as L2 pgvector+BM25
    participant LLM
    participant Audit
    Client->>GW: POST /api/v1/ask
    GW->>Auth: verify key + tenant
    GW->>Cache: lookup
    alt Cache hit
        Cache-->>Client: return (0.1s)
    else Cache miss
        GW->>L1: slug query
        alt L1 hit
            L1-->>GW: wiki body
        else L1 miss
            GW->>L2: vector+BM25+RRF
            L2-->>GW: top-K chunks
            GW->>LLM: chunks + question
            LLM-->>GW: answer
        end
        GW->>Cache: store (TTL=600s)
        GW->>Audit: log
        GW-->>Client: answer + sources
    end
```
Fig 2-1: /api/v1/ask sequence
About two-thirds of queries finish before reaching LLM generation (cache or L1 Wiki hits); this is the core of the platform's token economics.
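The fall-through in Fig 2-1 can be sketched as a single orchestrator. This is a minimal sketch under assumed names (`cacheGet`, `wikiLookup`, `hybridSearch`, `generate`, `audit` are illustrative, not the platform's real API); only the last tier spends generation tokens:

```typescript
// Hypothetical shape of the tiers Fig 2-1 orchestrates; names are
// illustrative, not the platform's real API.
interface AskDeps {
  cacheGet(key: string): Promise<string | null>;                          // Redis
  cacheSet(key: string, val: string, ttlSec: number): Promise<void>;
  wikiLookup(tenantId: string, question: string): Promise<string | null>; // L1 slug match
  hybridSearch(tenantId: string, question: string): Promise<string[]>;    // L2 vector+BM25+RRF
  generate(question: string, chunks: string[]): Promise<string>;          // LLM
  audit(tenantId: string, question: string, fromWiki: boolean): Promise<void>;
}

// Fall through the tiers: Redis -> L1 Wiki -> L2 retrieval + LLM.
export async function ask(deps: AskDeps, tenantId: string, question: string): Promise<string> {
  const key = `ask:${tenantId}:${question}`;
  const cached = await deps.cacheGet(key);
  if (cached !== null) return cached;                     // tier 0: ~0.1s

  let answer = await deps.wikiLookup(tenantId, question); // tier 1: L1 Wiki
  const fromWiki = answer !== null;
  if (answer === null) {                                  // tier 2: L2 + LLM
    const chunks = await deps.hybridSearch(tenantId, question);
    answer = await deps.generate(question, chunks);
  }
  await deps.cacheSet(key, answer, 600);                  // TTL=600s (Fig 2-1)
  await deps.audit(tenantId, question, fromWiki);
  return answer;
}
```

The dependency interface also makes the tier boundaries testable in isolation: each stage returns `null` on a miss, so the orchestrator simply tries the next tier.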
| Table | Purpose | Key Fields |
|---|---|---|
| `tenants` | Tenant master | id, api_key, plan, quota |
| `knowledge_bases` | KB per tenant | id, tenant_id, is_default |
| `documents` | Source docs | id, kb_id, doc_type, status |
| `chunks` | Splits | id, document_id, content, fts (generated tsvector) |
| `embeddings` | Vectors | chunk_id, embedding vector(1536) |
| `wiki_pages` | L1 pages | id, kb_id, slug, body |
| `queries` | Audit log | id, tenant_id, question, from_wiki, latency_ms |
All tenant-scoped tables enable PostgreSQL Row-Level Security (Ch 6).
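Row-Level Security only protects you if every query runs with the tenant pinned on the session. A minimal sketch of that pattern, assuming a hypothetical query-runner interface and the GUC name `app.tenant_id` (the real policy and setting names are Ch 6's subject):

```typescript
// Minimal query-runner shape (a node-postgres client would satisfy this).
interface QueryRunner {
  query(sql: string, params?: unknown[]): Promise<unknown>;
}

// Run `fn` with the tenant set as a session variable, so an RLS policy like
//   USING (tenant_id = current_setting('app.tenant_id')::uuid)
// filters every tenant-scoped table automatically. The GUC name is an
// assumption for illustration.
export async function withTenant<T>(
  db: QueryRunner,
  tenantId: string,
  fn: (db: QueryRunner) => Promise<T>,
): Promise<T> {
  await db.query("SELECT set_config('app.tenant_id', $1, false)", [tenantId]);
  try {
    return await fn(db);
  } finally {
    // Clear on exit so a pooled connection never leaks tenant context.
    await db.query("SELECT set_config('app.tenant_id', '', false)");
  }
}
```

The `finally` clear matters with connection pooling: without it, the next request to reuse the connection would inherit the previous tenant's visibility.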
```mermaid
flowchart TB
    GW[Gateway Node.js] --> MW[Middleware]
    MW --> ASK[Ask Service<br/>L1→L2 orchestrator]
    GW --> INGEST[Ingestion Worker]
    ASK --> PG[(PostgreSQL + pgvector)]
    ASK --> RD[(Redis)]
    ASK --> LLM[OpenAI/Claude/Gemini]
    INGEST --> PG
    WIKIC[Wiki Compiler<br/>nightly] --> PG
    WIKIC --> LLM
    WIKIL[Wiki Linter<br/>daily] --> PG
```
Fig 2-2: Component layout
| Product | Uses RAG For | Feeds RAG With | Special Need |
|---|---|---|---|
| AI CS | Q&A, handoff summary | FAQ, product manual | SSE, <3s latency |
| GEO | Hallucination repair GT | Brand bio, team, services | NLI, strict citation |
| PIF AI | Ingredient/toxicology lookup | PubChem/ECHA/TFDA | Traceable citation, version lock |
Shared points:

- One `tenant_id` maps to one brand across all three products
- `@id` cross-reference (Ch 9)
- Single shared endpoint: https://rag.baiyuan.io

| Decision | Choice | Alternatives | Why |
|---|---|---|---|
| Vector store | pgvector | Pinecone, Qdrant, Milvus | Same Postgres — txn, ops simplicity |
| Main DB | PostgreSQL 16 | MySQL, CockroachDB | Mature pgvector, RLS, JSONB |
| FTS | PG tsvector | Elasticsearch | One fewer service |
| Fusion | RRF (k=60) | Weighted avg, ColBERT | Robust, no tuning |
| Cache | Redis 7 | Memcached | Shared, precise TTL |
| Language | Node.js (TS) | Python, Go | Same stack as chat-gateway |
| Wiki LLM | Claude Sonnet 4.6 | Smaller model | Offline, quality matters |
| Answer LLM | Router (multi) | Single vendor | Cost/availability spread |
| Deploy | Docker Compose / Lightsail | Kubernetes | Tenant scale, lower overhead |
| Auth | Header-based API key | OAuth | Product-to-product call |
Every choice is a trade-off. Ch 12 revisits which may need revision.
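The RRF (k=60) row above is small enough to show in full. A sketch of reciprocal rank fusion over the vector and BM25 result lists; the function name and string IDs are illustrative:

```typescript
// Reciprocal Rank Fusion: score(d) = sum over lists of 1 / (k + rank_d),
// where rank is the 1-based position of d in each ranked list. k=60 is the
// conventional constant from the original RRF paper; it needs no tuning.
export function rrfFuse(rankedLists: string[][], k = 60): string[] {
  const scores = new Map<string, number>();
  for (const list of rankedLists) {
    list.forEach((id, i) => {
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + i + 1));
    });
  }
  // Highest fused score first.
  return [...scores.entries()]
    .sort((a, b) => b[1] - a[1])
    .map(([id]) => id);
}
```

Because only ranks matter, the incomparable raw scores of the two retrievers (cosine distance vs BM25) never need normalizing, which is why the table calls RRF "robust, no tuning".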
The Chat Widget for the AI Customer Service line is a ~35KB JavaScript bundle embedded on every page of customer websites. At scale (100 tenants × 10K daily page views), this yields ~1M widget loads per day. Serving each hit from the Lightsail origin nginx would make it the platform’s first bottleneck.
The platform uses a two-tier cache architecture — origin + CDN edge — so that end-user requests almost never reach origin.
```mermaid
flowchart LR
    Browser[Customer Browser<br/>1-day cache]
    Edge[Cloudflare Edge<br/>300+ PoP / 1-year cache]
    Nginx[Origin nginx<br/>Lightsail]
    FS[/usr/share/nginx/<br/>html/widget/]
    Browser -- MISS --> Edge
    Edge -- MISS --> Nginx
    Nginx -- alias --> FS
```
Best case: the browser cache serves instantly (< 10ms). Browser miss: the CF edge returns in < 60ms TTFB from the Taipei PoP. Worst case: the first request in a region pays one origin round-trip, after which that regional PoP serves subsequent hits.
Origin nginx returns for `/widget/*`:

```
Cache-Control: public, max-age=86400, s-maxage=31536000, immutable
Access-Control-Allow-Origin: *
```
| Directive | Audience | Meaning | Rationale |
|---|---|---|---|
| `max-age=86400` | Browser | Revalidate after 1 day | Support rapid bug-fix rollout |
| `s-maxage=31536000` | Shared CDN | 1 year | Edge HIT rate → 100%, origin rarely fetched |
| `immutable` | Browser | No revalidation during TTL | Skip conditional GET, cut RTT |
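The split TTLs compose into the exact header value origin nginx sends. A tiny sketch with the two TTLs as parameters (function and constant names are illustrative):

```typescript
// Compose the /widget/* Cache-Control value: browser TTL (max-age),
// shared-cache TTL (s-maxage), and immutable to skip conditional GETs.
export function widgetCacheControl(browserTtlSec: number, cdnTtlSec: number): string {
  return `public, max-age=${browserTtlSec}, s-maxage=${cdnTtlSec}, immutable`;
}

// 1 day for browsers, 1 year for the CDN.
export const WIDGET_CACHE_CONTROL = widgetCacheControl(86_400, 31_536_000);
```

Keeping the two TTLs as separate parameters makes the later migration path (raising the browser TTL once URLs are versioned) a one-argument change.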
A Cloudflare Cache Rule overrides edge TTL to 1 year (Override origin → 1 year), guaranteeing long edge retention even on the Free plan.
The widget loads cross-origin, so Access-Control-Allow-Origin: * is required. This is a public resource — no secrets — and tenant identity is passed at runtime via window.BAIYUAN_WIDGET.tenantKey.
Current strategy: versionless URL + short browser TTL.
- Future: versioned `chat-widget.v{SEMVER}.js`, then extend browser TTL to 1 year via `max-age`
- Files live in `/home/ubuntu/cs-widget/dist/`, mounted read-only into nginx; `docker compose up -d --no-deps nginx` hot-reloads

| Metric | Value (Taipei PoP, CF HIT) |
|---|---|
| TTFB | < 60ms |
| Total | < 70ms |
| Origin fetch rate | < 0.1% |
| Edge HIT rate | > 99.9% |
Navigation: ← Ch 1 · 📖 Contents · Ch 3 →