Baiyuan GEO Platform Whitepaper — Executive Summary

This ~1,400-word executive summary distills the whitepaper’s five core contributions for readers short on time. The full English edition — 13 chapters + 5 appendices (~28,000 words) — is now available at en/ch01-geo-era.md onward. A Traditional Chinese edition (~30,000 characters) is the original source.

License: CC BY-NC 4.0 · Full zh-TW edition PDF

1. What This Book Is

A practitioner’s report on building Baiyuan GEO Platform, a SaaS that measures and improves how brands are mentioned in generative-AI responses (ChatGPT, Claude, Gemini, Perplexity, and ~10 others). The book covers the algorithms, architectures, and engineering trade-offs we made between 2024 and 2026 while operating the platform for paying customers in Taiwan’s B2B and local-business markets.

It is not a product brochure. It is not a user manual. It is an engineering memoir — sometimes unflattering — written with the assumption that other teams will attempt similar systems and benefit from our misfires.

2. Why It Matters

Generative AI has quietly rewritten the rules of brand visibility:

When a prospect asks “what are the best B2B CRM tools?”, the AI generates a paragraph containing a handful of brand names. Brands not in that paragraph simply do not exist in that customer’s decision path. Traditional SEO metrics — keyword rankings, backlinks, Core Web Vitals — measure a world that is collapsing.

We call the new discipline GEO (Generative Engine Optimization). It is not SEO’s next version; it is a parallel engineering problem with different inputs (AI response text), different levers (structured entities, fact-check records, multi-source trust signals), and different failure modes (hallucinations, model-version drift, training-data blind spots).

3. Core Technical Contributions

The whitepaper’s 13 chapters document five distinct contributions that, to our knowledge, have not been published together elsewhere (Ch 13 multimodal GEO and Appendix E platform branching are v1.1 additions):

3.1 Seven-Dimension Citation Scoring

A single “citation rate” metric conflates brands of very different health — equal mentions can mean “mentioned last, briefly, on two platforms” or “mentioned first, in depth, on seven platforms.” We decompose GEO health into seven orthogonal dimensions:

| Dimension | Measures |
| --- | --- |
| Citation Rate | Share of intent queries that mention the brand |
| Position Quality | Whether the brand appears at sentence start, middle, or tail |
| Query Coverage | Diversity of intent types (best-of, comparison, how-to, recommendation) |
| Platform Breadth | Fraction of the 15 monitored AI platforms that mention the brand |
| Sentiment | Directional tone of each mention |
| Content Depth | Length and factual density of the brand description |
| Consistency | Cross-platform standard deviation — proxy for whether AI consensus has converged |

Weights are deliberately undocumented in the book to discourage metric gaming — a principle borrowed from PageRank’s historical opacity.
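The scoring shape can still be sketched as a weighted sum over the seven dimensions. The weights below are placeholders invented for this sketch, not the production values, which remain unpublished:

```typescript
// Seven orthogonal GEO health dimensions, each normalized to 0..1.
type GeoDimensions = {
  citationRate: number;     // share of intent queries mentioning the brand
  positionQuality: number;  // higher = earlier placement in the sentence
  queryCoverage: number;    // diversity across intent types
  platformBreadth: number;  // fraction of the 15 monitored platforms
  sentiment: number;        // normalized tone of mentions
  contentDepth: number;     // length / factual density of descriptions
  consistency: number;      // 1 = low cross-platform standard deviation
};

// Placeholder weights for illustration only; the real table is undocumented
// by design to discourage metric gaming.
const PLACEHOLDER_WEIGHTS: Record<keyof GeoDimensions, number> = {
  citationRate: 0.25,
  positionQuality: 0.15,
  queryCoverage: 0.15,
  platformBreadth: 0.15,
  sentiment: 0.1,
  contentDepth: 0.1,
  consistency: 0.1,
};

function geoHealthScore(d: GeoDimensions): number {
  // Weighted sum of the seven dimensions, scaled to a 0..100 score.
  let score = 0;
  for (const key of Object.keys(PLACEHOLDER_WEIGHTS) as (keyof GeoDimensions)[]) {
    score += PLACEHOLDER_WEIGHTS[key] * d[key];
  }
  return 100 * score;
}
```

Decomposing the score this way lets a dashboard show *why* a composite number moved, not just that it moved.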

3.2 Stale Carry-Forward for Signal Continuity

Any system that calls external AI providers will experience partial outages — rate limits, regional failures, quota exhaustion. The naive response (counting failures as zero, or dropping failed providers from the denominator) conflates pipeline health with brand state and produces wildly noisy dashboards.

Our solution: detect when a platform scan fails 100%, look back up to 200 rows for the last successful score, carry that value forward with an explicit isStale flag, and surface “⚠ stale 14h” tooltips in the UI. Downstream analytics continue using the carried value. The pattern generalizes to any “high-frequency sampling, unreliable source” signal system (IoT, market data, social monitoring).
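A minimal sketch of that carry-forward rule, assuming a newest-first score history; the row and result field names are illustrative, while the 200-row lookback window and the `isStale` flag come from the text:

```typescript
interface ScoreRow {
  platform: string;
  score: number | null; // null = the scan failed for this platform
  scannedAt: number;    // epoch milliseconds
}

interface CarriedScore {
  score: number;
  isStale: boolean;
  staleSinceMs?: number; // drives the "⚠ stale 14h" tooltip in the UI
}

const LOOKBACK_ROWS = 200;

function resolveScore(history: ScoreRow[], now: number): CarriedScore | null {
  // history[0] is the current scan; a non-null score is used directly.
  const latest = history[0];
  if (latest && latest.score !== null) {
    return { score: latest.score, isStale: false };
  }
  // Current scan failed 100%: look back up to 200 rows for the last
  // successful score and carry it forward with an explicit stale flag.
  for (const row of history.slice(1, 1 + LOOKBACK_ROWS)) {
    if (row.score !== null) {
      return { score: row.score, isStale: true, staleSinceMs: now - row.scannedAt };
    }
  }
  return null; // nothing to carry forward; the dashboard shows a true gap
}
```

Downstream consumers read `score` unchanged, so analytics stay continuous, while the `isStale` flag keeps pipeline health visible instead of silently blended into brand state.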

3.3 AXP — A Shadow Document Protocol for AI Bots

Modern customer websites are rendered for humans: client-side JavaScript, animated UI, cookie banners, tracker SDKs. AI bots (GPTBot, ClaudeBot, PerplexityBot, 22 others we track) cannot parse this cleanly. Our answer is AXP (AI-ready eXchange Page) — a three-layer clean document (pure HTML + Schema.org JSON-LD + RAG-ready Markdown) delivered at the CDN edge via a Cloudflare Worker that detects AI bot User-Agents and swaps in the shadow document.

The same URL serves two entirely different payloads depending on who is reading. Result from our 5-brand pilot: AI-bot traffic increased 3–5× within two weeks of deploying AXP; citation-rate improvement followed about 2–3 weeks later.
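The edge-swap mechanism can be sketched as a small Worker-style handler. This is illustrative only: the regex covers a handful of the 25 tracked bot User-Agents, and the `/axp` shadow-document path is an assumed mount point, not the platform's actual routing:

```typescript
// Subset of the ~25 tracked AI-bot User-Agent signatures (illustrative).
const AI_BOT_UA = /GPTBot|ClaudeBot|PerplexityBot|Google-Extended|CCBot/i;

// Assumed origin prefix where the clean three-layer shadow document
// (pure HTML + Schema.org JSON-LD + RAG-ready Markdown) is hosted.
const AXP_PREFIX = "/axp";

function isAiBot(userAgent: string): boolean {
  return AI_BOT_UA.test(userAgent);
}

function routeForRequest(userAgent: string, pathname: string): string {
  // Same URL, two payloads: bots get the shadow-document path,
  // humans get the original client-side app untouched.
  return isAiBot(userAgent) ? `${AXP_PREFIX}${pathname}` : pathname;
}

// Cloudflare Worker entry point: rewrite the origin fetch for AI bots only.
const worker = {
  async fetch(request: Request): Promise<Response> {
    const url = new URL(request.url);
    url.pathname = routeForRequest(
      request.headers.get("User-Agent") ?? "",
      url.pathname,
    );
    return fetch(new Request(url.toString(), request));
  },
};
```

Keeping the UA test and path rewrite as pure functions makes the swap logic testable without spinning up the Worker runtime.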

3.4 Twenty-Five Industry × Three-Layer @id Schema.org

Schema.org is often treated as a few rich-result tags. We treat it as the entity-identity layer of an AI knowledge graph: every brand entity is described by one of 25 industry-specific templates and linked through a three-layer @id structure.

A completeness algorithm with separate weight tables for physical vs online businesses drives a progressive-disclosure Wizard for new brands and a Dashboard banner for existing brands below 80% completion.
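A sketch of that completeness check, with separate weight tables per business kind and the 80% banner threshold from the text; the specific fields and weight values are assumptions for illustration:

```typescript
type BusinessKind = "physical" | "online";

// Illustrative weight tables: physical businesses are penalized more for
// missing location data, online businesses for missing offer/service detail.
const FIELD_WEIGHTS: Record<BusinessKind, Record<string, number>> = {
  physical: { name: 0.2, description: 0.2, address: 0.25, openingHours: 0.15, sameAs: 0.2 },
  online: { name: 0.2, description: 0.3, url: 0.2, offers: 0.15, sameAs: 0.15 },
};

function completeness(kind: BusinessKind, entity: Record<string, unknown>): number {
  // Sum the weights of every field the entity actually provides.
  let score = 0;
  for (const [field, weight] of Object.entries(FIELD_WEIGHTS[kind])) {
    if (entity[field] !== undefined && entity[field] !== "") {
      score += weight;
    }
  }
  return score; // 0..1
}

function needsBanner(kind: BusinessKind, entity: Record<string, unknown>): boolean {
  // Below 80% completion: show the Dashboard banner (and, for new
  // brands, drive the progressive-disclosure Wizard step order).
  return completeness(kind, entity) < 0.8;
}
```

The same scoring function serves both surfaces: the Wizard orders its steps by descending missing weight, and the Dashboard banner fires on the threshold.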

3.5 Closed-Loop Hallucination Remediation

The industry’s typical approach — detect brand hallucinations, notify the customer — is an abdication. Customers don’t know how to fix AI hallucinations.

Our closed loop is fully automated:

  1. Detect — Extract atomic claims from AI responses, run Natural Language Inference classification (entailment / contradiction / neutral, plus an opinion label) against combined knowledge sources (website scrape + RAG knowledge base + manual ground truth)
  2. Validate — ChainPoll vote (3× LLM re-runs) for uncertain classifications (confidence 0.5–0.8)
  3. Remediate — For confirmed contradictions, generate a Schema.org ClaimReview node, inject into AXP + RAG knowledge base + (future) Google Business Profile LocalPosts
  4. Verify — Two-tier rescan: 4-hour sentinel against search-type AI (Perplexity, ChatGPT Search, AI Overview) + 24-hour full scan against knowledge-type AI (Claude, Gemini, DeepSeek, Kimi, etc.)
  5. Converge — A hallucination is declared resolved only after N consecutive scans confirm absence, including equivalent paraphrases
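The convergence rule in step 5 can be sketched as a streak check over recent scans. The value of N and the scan-record shape are assumptions here, and paraphrase detection is abstracted behind a single boolean:

```typescript
interface ScanResult {
  claimId: string;
  // true if the claim, or an equivalent paraphrase, reappeared in this scan
  hallucinationPresent: boolean;
}

const REQUIRED_CLEAN_SCANS = 3; // assumed value of N for the sketch

function isResolved(history: ScanResult[]): boolean {
  // history is ordered newest-first. Resolution requires N consecutive
  // clean scans; any reappearance resets the streak.
  if (history.length < REQUIRED_CLEAN_SCANS) return false;
  return history
    .slice(0, REQUIRED_CLEAN_SCANS)
    .every((scan) => !scan.hallucinationPresent);
}
```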

Central to this is a design principle we flag repeatedly: neutral is not a hallucination. Knowledge-source silence about a claim is not proof the claim is false; treating it as such generates cascading false positives and poisons the remediation cycle.
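That routing principle, together with the ChainPoll escalation band from step 2, can be sketched as a single dispatch function. The labels and the 0.5–0.8 confidence band come from the text; the threshold comparisons and action names are assumptions:

```typescript
type NliLabel = "entailment" | "contradiction" | "neutral" | "opinion";
type Action = "remediate" | "chainpoll" | "ignore";

function routeClaim(label: NliLabel, confidence: number): Action {
  if (confidence >= 0.5 && confidence <= 0.8) {
    // Uncertain classification: escalate to the 3x LLM ChainPoll vote
    // rather than acting on a shaky label.
    return "chainpoll";
  }
  if (label === "contradiction" && confidence > 0.8) {
    // Confirmed contradiction: enter remediation (ClaimReview injection).
    return "remediate";
  }
  // Entailment, opinion, and crucially neutral: knowledge-source silence
  // about a claim is not proof the claim is false, so nothing is generated.
  return "ignore";
}
```

Treating `neutral` as a no-op rather than a failure is what keeps the loop from amplifying its own false positives.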

4. Architecture Snapshot

The platform runs on:

5. What This Is Not

The whitepaper is deliberate about its limits:

6. Who Wrote This

Baiyuan Technology (百原科技) — a Taiwan-based B2B SaaS company. Lead engineer and author: Vincent Lin, CTO. The whitepaper is released under CC BY-NC 4.0: you may cite, translate, and build on the material for any non-commercial purpose. Commercial reuse — paid courses, commercial training datasets, bundled products — requires a license (contact services@baiyuan.io).

7. Status and Roadmap

| Deliverable | Status |
| --- | --- |
| Traditional Chinese edition (zh-TW v1.0 draft) | ✅ Published |
| PDF editions (zh-TW / en / ja, auto-built on each main push) | Available on Releases |
| GitHub Pages web edition | ✅ Live at baiyuan-tech.github.io/geo-whitepaper |
| CITATION.cff + BibTeX | ✅ Ready (generic type per CFF 1.2.0 schema) |
| Full English edition (en/) | ✅ Complete — Executive Summary + 13 chapters + 5 appendices (~28,000 words) |
| Zenodo DOI registration | ⚪ Planned for v1.0 final |
| Inclusion in Google Scholar / Semantic Scholar | ⚪ After DOI |

The full English edition is not a direct translation; chapter-level adaptation was applied where the source material assumes Taiwan-market context.

8. How to Engage

9. Citation

@techreport{lin2026baiyuangeo,
  author      = {Lin, Vincent},
  title       = {Baiyuan GEO Platform: A Whitepaper on Building a SaaS for Generative Engine Optimization},
  institution = {Baiyuan Technology},
  year        = {2026},
  url         = {https://github.com/baiyuan-tech/geo-whitepaper},
  note        = {v1.0-draft}
}

APA 7:

Lin, V. (2026). Baiyuan GEO Platform: A whitepaper on building a SaaS for generative engine optimization (v1.0-draft) [Technical report]. Baiyuan Technology. https://github.com/baiyuan-tech/geo-whitepaper


Navigation: 📖 Full repo index · Traditional Chinese edition (Ch 1) →