Baiyuan RAG Knowledge Platform Whitepaper

Chapter 12 — Limitations, Open Problems, Future Work

The honest chapter. What we haven’t solved, what we might reverse, what’s coming.

12.1 Engineering Limitations

12.1.1 pgvector Scale Ceiling

At roughly 100k embeddings per tenant, HNSW search holds P95 latency under 120 ms. Past 5M embeddings per tenant, performance degrades. If any tenant hits 5M, we will evaluate a migration to Qdrant or Milvus, or sharding.
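A minimal sketch of how the 5M ceiling could be turned into an early-warning check. The function name, the status labels, and the 80% watch ratio are illustrative assumptions, not part of our current tooling:

```python
# Hypothetical early-warning check for the per-tenant pgvector ceiling.
# The 5M ceiling comes from the text above; watch_ratio is an assumed value.

def scale_status(embedding_count, ceiling=5_000_000, watch_ratio=0.8):
    """Classify a tenant by how close its embedding count is to the ceiling."""
    if embedding_count >= ceiling:
        return "evaluate-migration"   # Qdrant/Milvus migration or sharding
    if embedding_count >= ceiling * watch_ratio:
        return "watch"                # approaching the ceiling
    return "ok"

# Example tenant counts (made up for illustration).
tenant_counts = {"tenant-a": 120_000, "tenant-b": 4_200_000, "tenant-c": 5_300_000}
statuses = {tenant: scale_status(n) for tenant, n in tenant_counts.items()}
```

In practice the counts would come from a periodic per-tenant row count on the embeddings table.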

12.1.2 Chinese Tokenization Long Tail

zh_parser (SCWS-based) misses newly coined words, brand names, and product names. We patch with synonym dictionaries, which is an ongoing maintenance burden. Experimental alternative: LLM-time tokenization, which is more accurate but adds roughly 100 ms of latency and costs more.
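One way a synonym-dictionary patch can work is to expand the query before it reaches full-text search, so a term the tokenizer splits badly still matches through its aliases. The sketch below assumes that approach. The dictionary entry and the function name are hypothetical, and the real dictionaries may be wired in differently:

```python
# Hypothetical synonym dictionary: surface form -> aliases.
# "示例品牌" ("example brand") stands in for a real brand name.
SYNONYMS = {
    "示例品牌": ["Example Brand", "EB"],
}

def expand_query(query, synonyms):
    """Return the original query plus one variant per matching alias."""
    variants = [query]
    for term, aliases in synonyms.items():
        if term in query:  # plain substring match, no tokenizer involved
            for alias in aliases:
                variant = query.replace(term, alias)
                if variant not in variants:
                    variants.append(variant)
    return variants

expanded = expand_query("示例品牌 退货政策", SYNONYMS)
```

The maintenance burden mentioned above shows up here directly: every new brand or product name needs a new entry.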

12.1.3 No Multimodal Yet

Text only. Real knowledge is mixed media: images, video, and audio alongside text.

CLIP-style multimodal embedding is experimental; targeted for 2026 Q3.

12.1.4 Single Region

We are deployed only in AWS Tokyo. EU compliance requires an EU region. The Docker Compose architecture doesn’t support multi-region deployment, so this needs a K8s refactor.

12.2 Algorithmic Limitations

12.2.1 Wiki Compile LLM Bias

LLM-authored Wiki pages have systematic biases: Western-centric examples, inconsistent transliteration, and knowledge that has gone stale since the training cutoff. Our mitigations (a strict “chunks-only” instruction, a cross-chunk consistency lint) help partially but don’t fix the root cause.

12.2.2 RRF k=60 Is Empirical

The original RRF paper suggests k = 60 but gives no theoretical justification. We haven’t run enough A/B tests to validate it for Chinese. Worth revisiting.
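For reference, Reciprocal Rank Fusion scores each document as the sum, over all rankers, of 1 / (k + rank). The sketch below is a minimal illustration of that formula. The function name and the sample document IDs are assumptions, and it is not our production fusion code:

```python
from collections import defaultdict

def rrf_fuse(rankings, k=60):
    """Fuse ranked lists of document IDs with Reciprocal Rank Fusion.

    rankings: list of lists, each ordered best-first.
    Returns (doc_id, score) pairs sorted by descending score.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):  # ranks are 1-based
            scores[doc_id] += 1.0 / (k + rank)
    # Ties are broken by doc_id so the output is deterministic.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))

# Illustrative result lists from a vector ranker and a keyword ranker.
vector_hits = ["d1", "d2", "d3"]
keyword_hits = ["d3", "d1", "d4"]
fused_ids = [doc_id for doc_id, _ in rrf_fuse([vector_hits, keyword_hits], k=60)]
```

A smaller k gives more weight to top-ranked hits, while a larger k flattens the difference between ranks. An A/B test for Chinese would vary k over the same query set and compare answer quality.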

12.2.3 Intent Classifier Drift

GPT-4o-mini misclassifies vague openings (“I was wondering…”). When a knowledge question is routed to smalltalk, the customer gets a polite non-answer. Fix direction: an expanded training set, a confidence threshold, and dual-path handling on low confidence.
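The confidence-threshold and dual-path idea can be sketched as follows. This assumes the classifier returns a label with a confidence score. The `IntentResult` type, the `route` function, and the 0.7 threshold are all hypothetical:

```python
from dataclasses import dataclass

@dataclass
class IntentResult:
    label: str         # e.g. "knowledge" or "smalltalk"
    confidence: float  # 0.0 to 1.0, as reported by the classifier

def route(intent, threshold=0.7):
    """Return the pipelines to run for a message.

    High confidence: trust the classifier.
    Low confidence: also run the knowledge path, so a misclassified
    knowledge question still gets a retrieval-backed answer.
    """
    if intent.confidence >= threshold:
        return [intent.label]
    return sorted({intent.label, "knowledge"})

vague = route(IntentResult("smalltalk", 0.55))
clear = route(IntentResult("smalltalk", 0.95))
```

The trade-off is cost: every low-confidence message pays for retrieval even when it really was smalltalk.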

12.2.4 Chinese NLI Availability

English NLI (DeBERTa-v3-NLI) is excellent. Chinese NLI quality varies; we use mDeBERTa-multi + human audit at ~85% accuracy. Production-grade Chinese NLI is an open problem.

12.3 Commercial Limitations

12.3.1 Pricing-Cost Misalignment

Current pricing is by message count, but the actual cost per message varies with the precision level a tenant runs at.

High-precision tenants underpay; low-precision tenants overpay. Planned for 2026 Q3: precision-tier pricing.

12.3.2 Cross-Product Cost Attribution

Shared infrastructure is convenient, but usage such as “GEO-triggered RAG repair token usage” is hard to attribute. Currently GEO API calls count against the RAG tenant quota, which is financially imprecise.
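One possible direction is to tag every usage event with the product that triggered it and aggregate per (tenant, origin) pair instead of per tenant. The sketch below assumes such tagging exists. The event fields and the `attribute_tokens` function are hypothetical:

```python
from collections import defaultdict

def attribute_tokens(events):
    """Sum token usage per (tenant, origin) pair."""
    totals = defaultdict(int)
    for event in events:
        totals[(event["tenant"], event["origin"])] += event["tokens"]
    return dict(totals)

# Made-up events: two ordinary RAG calls and one GEO-triggered repair.
events = [
    {"tenant": "t1", "origin": "rag", "tokens": 1200},
    {"tenant": "t1", "origin": "geo", "tokens": 800},
    {"tenant": "t1", "origin": "rag", "tokens": 300},
]
totals = attribute_tokens(events)
```

With totals split this way, GEO-triggered usage could be billed or budgeted separately from the RAG tenant quota.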

12.3.3 Breaking Changes Cost

Upgrading the embedding model (text-embedding-3-small → text-embedding-3-large) requires a full re-embed. For a large tenant that costs USD 2,000+. We’ve deferred such upgrades, so tech debt accumulates.

12.4 Open Problems

12.4.1 Wiki Fresh vs Stale Balance

How often to compile?

Current approach: fingerprint + weekly lint + manual trigger. There is no clean theory behind it.
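The fingerprint part of this can be illustrated with a content hash over a page’s source chunks: recompile only when the hash changes. This is a minimal sketch that assumes an order-independent SHA-256 scheme. The function names are hypothetical and the actual fingerprint may be computed differently:

```python
import hashlib

def wiki_fingerprint(chunks):
    """Order-independent SHA-256 fingerprint over a page's source chunks."""
    digests = sorted(
        hashlib.sha256(chunk.encode("utf-8")).hexdigest() for chunk in chunks
    )
    return hashlib.sha256("\n".join(digests).encode("utf-8")).hexdigest()

def needs_recompile(chunks, last_fingerprint):
    """True when the chunk set differs from the one last compiled."""
    return wiki_fingerprint(chunks) != last_fingerprint

compiled = wiki_fingerprint(["chunk a", "chunk b"])
unchanged = needs_recompile(["chunk b", "chunk a"], compiled)          # reordered only
changed = needs_recompile(["chunk a", "chunk b", "chunk c"], compiled)  # chunk added
```

A fingerprint only detects that sources changed. It says nothing about whether the change matters, which is why the weekly lint and the manual trigger remain as backstops.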

12.4.2 User Authority vs RAG Authority

Customer says “your website says CEO is Bob.” RAG Wiki says “Alice.” Who wins?

This is a trust chain problem with no engineering answer yet.

12.4.3 Does Long-Context LLM End RAG?

Claude 200k, Gemini 2M — tempting to “stuff everything in prompt.” Our position: RAG doesn’t die, it mutates.

L1 Wiki becomes the tool to align LLM attention precisely, rather than a substitute for vector retrieval.

12.4.4 Multimodal L1 Wiki

Text Wiki is natural. What is a Wiki for images / video / audio?

No unified answer.

12.5 Next 12-Month Roadmap

Tentative (subject to market feedback):

Quarter   Item                                   Priority
2026 Q2   Multimodal embedding (CLIP-style)      High
2026 Q2   Rerank default-on evaluation           Medium
2026 Q2   GEO ↔ RAG Wiki patch API launch        High
2026 Q3   Precision-tier pricing                 High
2026 Q3   EU region deployment (K8s)             Medium
2026 Q3   Japanese NLI self-training             Medium
2026 Q4   Long-context + Wiki hybrid strategy    Medium
2026 Q4   Self-hosted edition                    Low

12.5.1 Living Document

This book is a living document: minor versions each quarter, a major version annually. GitHub Issues capture reader feedback. Updates are recorded in CHANGELOG.md.



