Baiyuan RAG Knowledge Platform Whitepaper

Chapter 11 — Anonymized Tenant Observations

Numbers don’t lie. But they can stay selectively silent. This chapter puts both the good and the ugly in print.

11.1 Collection & Anonymization

All figures come from the Baiyuan Pilot: 12 tenants and roughly 1.2M queries in Q1 2026. Anonymization principles: report aggregates only; de-identify tenants down to industry level; do not stack absolute numbers; keep the granularity useful to engineers but too coarse for copying an architecture.

11.2 Case A — E-commerce CS (AI CS SaaS)

Context: Consumer electronics brand, annual revenue > USD 5M, Taiwan market. Widget + LINE deployed.

Metric               | Before  | After (3 mo)
---------------------|---------|-------------
Daily tickets        | 120     | 38 (−68%)
First response time  | 18 min  | 0.8 s
L1 hit rate          | n/a     | 52%
Cache hit rate       | n/a     | 31%
Monthly LLM spend    | n/a     | USD 680
CSAT                 | 4.1/5   | 4.3/5
Handoff rate         | 100%    | 11%

Observations:

Lesson learned: a hallucination event occurred in Week 1. The AI stated the free-shipping threshold as NT$500 when it was actually NT$800. Root cause: L2 retrieved an outdated FAQ chunk. Fix: “shipping policy” was elevated to the L1 Wiki with monthly revalidation.
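The monthly revalidation in that fix can be pictured as a simple staleness check. The following is a minimal sketch, not the platform's implementation: `WikiEntry`, `overdue_entries`, and the 30-day window are hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class WikiEntry:
    """Hypothetical L1 Wiki record; field names are assumptions."""
    topic: str
    last_validated: date


def overdue_entries(entries, today, max_age_days=30):
    """Return topics whose last validation is older than max_age_days."""
    cutoff = today - timedelta(days=max_age_days)
    return [e.topic for e in entries if e.last_validated < cutoff]


entries = [
    WikiEntry("shipping policy", date(2026, 1, 5)),
    WikiEntry("return policy", date(2026, 2, 20)),
]

# Only entries validated before the cutoff (2026-01-30 here) are flagged.
print(overdue_entries(entries, today=date(2026, 3, 1)))
```

Running a check like this on a schedule would surface policy pages that need a human to reconfirm them before an outdated value reaches a customer.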

11.3 Case B — SaaS Tech Docs (AI CS)

Context: B2B SaaS with 300+ articles across API docs, integration guides, SDK samples. Developer self-serve.

Metric             | Value
-------------------|--------
Monthly queries    | 120,000
L1 hit rate        | 38%
L2 with Rerank     | 18%
Avg answer length  | 340 chars
With code block    | 61%
Follow-up rate     | 22%

Observations:

11.4 Case C — Cosmetics Brand (PIF AI)

Context: Mid-sized skincare brand, 14 SKUs needing 2026 Q1 PIF filing.

Metric                      | Consultant      | PIF AI
----------------------------|-----------------|------------
Per-SKU time                | 30 workdays     | 4 workdays
Per-SKU cost                | USD 3,500       | USD 600
Regulation update tracking  | Monthly manual  | Weekly auto
Citation traceability       | 60–70%          | 100%
TFDA first-pass approval    | 70%             | 88%
Monthly LLM spend           | n/a             | USD 320

Observations:

Lesson: ECHA published a major update in 2026/02, and the old Wiki content expired overnight. A “source-change alert” was added in response: the tenant Dashboard now shows messages such as “7 PIF filings cite expired data, review recommended.”
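One way to think about the source-change alert is as a set intersection between the sources each filing cites and the sources that have just expired. The sketch below is an illustration under that assumption; `Filing`, `source_change_alert`, and the source identifiers are hypothetical and do not come from the platform.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Filing:
    """Hypothetical PIF filing record; field names are assumptions."""
    filing_id: str
    cited_sources: frozenset


def source_change_alert(filings, expired_sources):
    """Return (dashboard message or None, ids of filings citing expired sources)."""
    expired = frozenset(expired_sources)
    affected = [f.filing_id for f in filings if f.cited_sources & expired]
    if not affected:
        return None, []
    noun = "filing cites" if len(affected) == 1 else "filings cite"
    message = f"{len(affected)} PIF {noun} expired data, review recommended."
    return message, affected


filings = [
    Filing("SKU-01", frozenset({"echa-2025-11", "tfda-2025-03"})),
    Filing("SKU-02", frozenset({"tfda-2025-03"})),
    Filing("SKU-03", frozenset({"echa-2025-11"})),
]

message, affected = source_change_alert(filings, {"echa-2025-11"})
print(message)
print(affected)
```

Keeping the list of affected filing ids alongside the message lets the Dashboard link straight to the filings that need review.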

11.5 Case D — B2B Consulting (GEO + RAG Coupled)

Context: B2B strategy consultancy; 10 partner bios, 30 research reports, 12 industry analyses. GEO for AI visibility + RAG for internal search. Both share the same brand facts.

Metric                         | W0   | W6
-------------------------------|------|---------
AI citation rate (ChatGPT)     | 18%  | 41%
AI citation rate (Perplexity)  | 22%  | 58%
Fact accuracy (NLI)            | 67%  | 94%
Hallucination events / week    | 12   | 2
Avg repair latency             | n/a  | 6.2 days
Internal CS hit rate           | 72%  | 89%

Most striking: in Week 3 the system caught Perplexity stating that Partner Alice “graduated from Harvard” when the correct school is Stanford. GEO then triggered the following sequence:

  1. Generate a ClaimReview
  2. Inject it into the RAG Wiki (partner bio page)
  3. Update the AXP shadow doc
  4. Six days later, Perplexity changed its answer to “Stanford”
  5. No human action was needed

This is the concrete value of deep integration.
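The first step of that sequence can be sketched as building a schema.org `ClaimReview` object in JSON-LD. The chapter does not show the format the platform emits, so this is an assumption: the function name, the reviewer name, the 1 to 5 rating scale, and the date are all hypothetical, and only standard schema.org property names are used.

```python
import json


def build_claim_review(claim_text, corrected_fact, appeared_in, reviewer_name, review_date):
    """Build a schema.org ClaimReview dict (JSON-LD) for a false claim."""
    return {
        "@context": "https://schema.org",
        "@type": "ClaimReview",
        "datePublished": review_date,
        "claimReviewed": claim_text,
        "reviewBody": corrected_fact,
        "author": {"@type": "Organization", "name": reviewer_name},
        "itemReviewed": {
            "@type": "Claim",
            # Where the claim was observed (here, an AI answer engine).
            "author": {"@type": "Organization", "name": appeared_in},
        },
        "reviewRating": {
            "@type": "Rating",
            # Assumed scale: 1 = false, 5 = true.
            "ratingValue": 1,
            "bestRating": 5,
            "worstRating": 1,
            "alternateName": "False",
        },
    }


review = build_claim_review(
    claim_text="Partner Alice graduated from Harvard.",
    corrected_fact="Partner Alice graduated from Stanford.",
    appeared_in="Perplexity",
    reviewer_name="Example Consulting (hypothetical)",
    review_date="2026-01-20",  # illustrative date, not from the case data
)
print(json.dumps(review, indent=2))
```

The same corrected fact would then feed the Wiki and shadow-doc updates in steps 2 and 3, which is why a single structured record is convenient here.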

11.6 Cross-Case Patterns

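Metric              | A        | B                     | C                   | D
--------------------|----------|-----------------------|---------------------|-------------
L1 hit              | 52%      | 38%                   | 62%                 | 41%
Cache hit           | 31%      | 22%                   | 14%                 | 26%
Monthly cost        | $680     | $450                  | $320                | $520
Main hallucination  | Numbers  | Nonexistent endpoint  | None (NLI catches)  | Person facts
Handoff rate        | 11%      | N/A                   | 24%                 | N/A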

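Conclusion 1: Structure drives L1 hit rate. Structured content such as FAQs and regulations reaches 50%+ (Cases A and C: 52% and 62%). Dev docs and free-form Q&A land around 40% (Cases B and D: 38% and 41%).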

Conclusion 2: NLI pays off in regulated and academic domains. It adds about 18% to cost and drives hallucinations to 0.

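Conclusion 3: GEO + RAG coupling moves “brand AI health” as a whole. In Case D, citation rate, fact accuracy, hallucination count, and internal hit rate all improved together; any single metric read in isolation misleads.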

Conclusion 4: Absolute token cost is not the same as cost ratio. E-commerce at $680 per month is reported as 0.016% of revenue, while PIF at $20 per $600 filing is 3.3%. PIF is the case that demands aggressive optimization.
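The ratio argument is plain arithmetic, sketched below with a hypothetical helper `cost_ratio_pct`. The PIF call uses the $20 and $600 figures from the text. The e-commerce call is an assumption-laden bound: it annualizes the $680 monthly spend and divides by the USD 5M revenue floor from Case A. Because the tenant's actual revenue is only stated as greater than that floor, the result is an upper bound, and the 0.016% figure quoted above cannot be recomputed from the numbers printed in this chapter.

```python
def cost_ratio_pct(cost_usd, base_usd):
    """Return cost as a percentage of the base amount."""
    return 100.0 * cost_usd / base_usd


# PIF: LLM cost per filing against the per-filing price (figures from the text).
pif_ratio = cost_ratio_pct(20, 600)

# E-commerce: annualized spend against the revenue floor (an assumed base,
# so this is an upper bound, not the tenant's real ratio).
ecommerce_upper_bound = cost_ratio_pct(680 * 12, 5_000_000)

print(f"PIF: {pif_ratio:.1f}% of filing price")
print(f"E-commerce: at most {ecommerce_upper_bound:.2f}% of revenue")
```

Either way the e-commerce ratio sits well below the PIF ratio, which is the point of the conclusion: the smaller absolute bill is the one under more margin pressure.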


Key Takeaways


