This ~1,400-word executive summary distills the whitepaper’s five core contributions for readers short on time. The full English edition (13 chapters + 5 appendices, ~28,000 words) is now available, starting at `en/ch01-geo-era.md`. The Traditional Chinese edition (~30,000 characters) is the original source.
A practitioner’s report on building Baiyuan GEO Platform, a SaaS that measures and improves how brands are mentioned in generative-AI responses (ChatGPT, Claude, Gemini, Perplexity, and ~10 others). The book covers the algorithms, architectures, and engineering trade-offs we made between 2024 and 2026 while operating the platform for paying customers in Taiwan’s B2B and local-business markets.
It is not a product brochure. It is not a user manual. It is an engineering memoir — sometimes unflattering — written with the assumption that other teams will attempt similar systems and benefit from our misfires.
Generative AI has quietly rewritten the rules of brand visibility:
When a prospect asks “what are the best B2B CRM tools?”, the AI generates a paragraph containing a handful of brand names. Brands not in that paragraph simply do not exist in that customer’s decision path. Traditional SEO metrics — keyword rankings, backlinks, Core Web Vitals — measure a world that is collapsing.
We call the new discipline GEO (Generative Engine Optimization). It is not SEO’s next version; it is a parallel engineering problem with different inputs (AI response text), different levers (structured entities, fact-check records, multi-source trust signals), and different failure modes (hallucinations, model-version drift, training-data blind spots).
The whitepaper’s 13 chapters document six distinct contributions that, to our knowledge, have not been published together elsewhere (Ch 13 multimodal GEO and Appendix E platform branching are v1.1 additions):
A single “citation rate” metric conflates brands of very different health — equal mentions can mean “mentioned last, briefly, on two platforms” or “mentioned first, in depth, on seven platforms.” We decompose GEO health into seven orthogonal dimensions:
| Dimension | Measures |
|---|---|
| Citation Rate | Share of intent queries that mention the brand |
| Position Quality | Whether the brand appears at sentence start, middle, or tail |
| Query Coverage | Diversity of intent types (best-of, comparison, how-to, recommendation) |
| Platform Breadth | Fraction of the 15 monitored AI platforms that mention the brand |
| Sentiment | Directional tone of each mention |
| Content Depth | Length and factual density of the brand description |
| Consistency | Cross-platform standard deviation — proxy for whether AI consensus has converged |
Weights are deliberately undocumented in the book to discourage metric gaming — a principle borrowed from PageRank’s historical opacity.
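
To make the decomposition concrete, here is a minimal Python sketch of how seven dimension scores might roll up into one health number. Because the whitepaper keeps its weights undocumented, `PLACEHOLDER_WEIGHTS` is a hypothetical equal weighting, and `consistency_score` is one assumed way to turn the cross-platform standard deviation into a [0, 1] score. Neither is the platform's actual formula.

```python
from statistics import pstdev

# HYPOTHETICAL placeholder weights: the whitepaper deliberately does not publish
# its real weights, so every dimension counts equally here.
PLACEHOLDER_WEIGHTS: dict[str, float] = {
    "citation_rate": 1.0,     # share of intent queries that mention the brand
    "position_quality": 1.0,  # sentence start scores higher than middle or tail
    "query_coverage": 1.0,    # diversity of intent types answered
    "platform_breadth": 1.0,  # fraction of monitored platforms that mention the brand
    "sentiment": 1.0,         # tone, rescaled to [0, 1]
    "content_depth": 1.0,     # length and factual density of the description
    "consistency": 1.0,       # see consistency_score() below
}


def consistency_score(per_platform_rates: list[float]) -> float:
    """Turn cross-platform spread into a [0, 1] score (1.0 = full consensus).

    For values in [0, 1] the population standard deviation is at most 0.5,
    so doubling it maps the spread onto [0, 1] before inverting.
    """
    if len(per_platform_rates) < 2:
        return 1.0
    return 1.0 - min(1.0, 2.0 * pstdev(per_platform_rates))


def geo_health(dims: dict[str, float],
               weights: dict[str, float] = PLACEHOLDER_WEIGHTS) -> float:
    """Weighted mean of the seven dimension scores, each assumed to be in [0, 1]."""
    missing = weights.keys() - dims.keys()
    if missing:
        raise ValueError(f"missing dimensions: {sorted(missing)}")
    return sum(weights[k] * dims[k] for k in weights) / sum(weights.values())


# Same citation rate, very different health once the other dimensions count.
shallow = {**dict.fromkeys(PLACEHOLDER_WEIGHTS, 0.2), "citation_rate": 0.4}
deep = {**dict.fromkeys(PLACEHOLDER_WEIGHTS, 0.8), "citation_rate": 0.4}
print(geo_health(shallow) < geo_health(deep))  # True
```

Even with equal weights, the sketch reproduces the point of the table: two brands with an identical citation rate land far apart once position, breadth, depth, and consistency are counted.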
Any system that calls external AI providers will experience partial outages — rate limits, regional failures, quota exhaustion. The naive response (counting failures as zero, or dropping failed providers from the denominator) conflates pipeline health with brand state and produces wildly noisy dashboards.
Our solution: detect when a platform scan fails 100%, look back up to 200 rows for the last successful score, carry that value forward with an explicit isStale flag, and surface “⚠ stale 14h” tooltips in the UI. Downstream analytics continue using the carried value. The pattern generalizes to any “high-frequency sampling, unreliable source” signal system (IoT, market data, social monitoring).
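
The carry-forward rule can be sketched as follows, assuming a hypothetical `ScanRow` record per platform scan stored newest-first. Only the 100%-failure trigger, the 200-row look-back, the stale flag, and the "⚠ stale 14h" tooltip come from the text; the field names and the `resolve_score` / `stale_label` helpers are illustrative.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional, Sequence

LOOKBACK_ROWS = 200  # look-back window described in the whitepaper


@dataclass(frozen=True)
class ScanRow:  # HYPOTHETICAL row shape for one platform scan
    scanned_at: datetime
    queries: int
    failures: int
    score: Optional[float]  # None when the scan produced no usable score


@dataclass(frozen=True)
class PlatformScore:
    score: Optional[float]
    is_stale: bool
    stale_for: Optional[timedelta] = None


def succeeded(row: ScanRow) -> bool:
    """Anything short of a 100% failure, with a score, counts as a real observation."""
    return row.failures < row.queries and row.score is not None


def resolve_score(history: Sequence[ScanRow], now: datetime) -> PlatformScore:
    """history holds one platform's scans, newest first."""
    if not history:
        raise ValueError("no scans recorded")
    latest = history[0]
    if succeeded(latest):
        return PlatformScore(latest.score, is_stale=False)
    # Total failure: carry the last good score forward instead of reporting zero.
    for row in history[1:LOOKBACK_ROWS + 1]:
        if succeeded(row):
            return PlatformScore(row.score, is_stale=True, stale_for=now - row.scanned_at)
    return PlatformScore(None, is_stale=True)  # nothing to carry within the window


def stale_label(result: PlatformScore) -> str:
    """UI tooltip text such as '⚠ stale 14h'; empty when the score is fresh."""
    if not result.is_stale:
        return ""
    if result.stale_for is None:
        return "⚠ no successful scan in window"
    hours = int(result.stale_for.total_seconds() // 3600)
    return f"⚠ stale {hours}h"
```

Because the carried value keeps its last real score rather than dropping to zero, downstream averages stay smooth, while `is_stale` lets the UI distinguish pipeline trouble from a genuine drop in brand visibility.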
Modern customer websites are rendered for humans: client-side JavaScript, animated UI, cookie banners, tracker SDKs. AI bots (GPTBot, ClaudeBot, PerplexityBot, and 22 others we track) cannot parse this cleanly. Our answer is AXP (AI-ready eXchange Page): a three-layer clean document (pure HTML + Schema.org JSON-LD + RAG-ready Markdown) delivered at the CDN edge by a Cloudflare Worker that detects AI-bot User-Agents and swaps in the shadow document.
The same URL serves two entirely different payloads depending on who is reading. Result from our 5-brand pilot: AI-bot traffic increased 3–5× within two weeks of deploying AXP; citation-rate improvement followed about 2–3 weeks later.
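
The routing decision at the edge reduces to a User-Agent check. The sketch below uses Python for consistency with the other examples (a real Cloudflare Worker would be written in JavaScript or TypeScript). It matches only the three crawlers named above, and the `/axp` path prefix and `EdgeDecision` type are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional

# The three crawlers named in the text; the platform tracks 25 in total.
AI_BOT_TOKENS = ("GPTBot", "ClaudeBot", "PerplexityBot")


@dataclass
class EdgeDecision:
    variant: str        # "axp" (shadow document) or "origin" (human-facing page)
    upstream_path: str
    headers: dict[str, str] = field(default_factory=dict)


def is_ai_bot(user_agent: Optional[str]) -> bool:
    """Case-insensitive substring match on the User-Agent header."""
    ua = (user_agent or "").lower()
    return any(token.lower() in ua for token in AI_BOT_TOKENS)


def route(path: str, user_agent: Optional[str]) -> EdgeDecision:
    # Both payloads share one URL, so Vary tells downstream HTTP caches
    # that the response depends on the User-Agent header.
    headers = {"Vary": "User-Agent"}
    if is_ai_bot(user_agent):
        # HYPOTHETICAL storage layout: shadow documents live under /axp/.
        return EdgeDecision("axp", "/axp" + path, headers)
    return EdgeDecision("origin", path, headers)
```

User-Agent strings can be spoofed, and this sketch does not verify them; it shows only the one-URL, two-payload split.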
Schema.org is often treated as a few rich-result tags. We treat it as the entity-identity layer of an AI knowledge graph. Every brand entity carries:
- `@type`
- `@id` references: Organization → Service → Person (Physician / Attorney / etc. for specialized roles)
- `sameAs`

A completeness algorithm with separate weight tables for physical vs. online businesses drives a progressive-disclosure Wizard for new brands and a Dashboard banner for existing brands below 80% completion.
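
A minimal sketch of a completeness score with separate weight tables for physical and online businesses, the 80% banner threshold, and a Wizard ordering. The property names are real Schema.org properties, but the choice of properties, their weights, and the heaviest-gap-first ordering in `wizard_steps` are assumptions for illustration.

```python
# HYPOTHETICAL weight tables: property names are real Schema.org properties,
# but which ones the platform scores, and their weights, are illustrative only.
PHYSICAL_WEIGHTS = {"name": 15, "address": 20, "telephone": 15, "openingHours": 15,
                    "sameAs": 15, "geo": 10, "logo": 10}
ONLINE_WEIGHTS = {"name": 15, "url": 20, "sameAs": 25, "description": 15,
                  "contactPoint": 15, "logo": 10}
BANNER_THRESHOLD = 0.80  # Dashboard banner below 80% completion (from the text)

EMPTY = (None, "", [], {})  # values that count as "not filled in"


def _weights(is_physical: bool) -> dict[str, int]:
    return PHYSICAL_WEIGHTS if is_physical else ONLINE_WEIGHTS


def completeness(entity: dict, is_physical: bool) -> float:
    """Weighted share of scored properties that are present and non-empty."""
    weights = _weights(is_physical)
    filled = sum(w for prop, w in weights.items() if entity.get(prop) not in EMPTY)
    return filled / sum(weights.values())


def needs_banner(entity: dict, is_physical: bool) -> bool:
    return completeness(entity, is_physical) < BANNER_THRESHOLD


def wizard_steps(entity: dict, is_physical: bool) -> list[str]:
    """Progressive disclosure: ask for the heaviest missing properties first."""
    weights = _weights(is_physical)
    missing = [p for p in weights if entity.get(p) in EMPTY]
    return sorted(missing, key=lambda p: -weights[p])  # sort is stable for ties
```

Keeping two tables lets a clinic be penalized for a missing `address` while an online-only service is not.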
The industry’s typical approach — detect brand hallucinations, notify the customer — is an abdication. Customers don’t know how to fix AI hallucinations.
Our closed loop is fully automated:
- `ClaimReview` node, inject into AXP + RAG knowledge base + (future) Google Business Profile LocalPosts

Central to this is a design principle we flag repeatedly: neutral is not a hallucination. Knowledge-source silence about a claim is not proof the claim is false; treating it as such generates cascading false positives and poisons the remediation cycle.
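
The "neutral is not a hallucination" rule is easiest to see in code. This sketch assumes a hypothetical three-way `Verdict` produced by comparing an AI claim against the knowledge source, and turns only contradictions into Schema.org `ClaimReview` nodes using the standard `claimReviewed`, `reviewRating`, and `author` properties. The function names and the 1-of-5 rating convention are illustrative, not the platform's schema.

```python
from enum import Enum


class Verdict(Enum):
    SUPPORTED = "supported"        # knowledge source confirms the claim
    CONTRADICTED = "contradicted"  # knowledge source states something incompatible
    NEUTRAL = "neutral"            # knowledge source is silent on the claim


def is_hallucination(verdict: Verdict) -> bool:
    # Neutral is not a hallucination: silence is not evidence of falsehood.
    return verdict is Verdict.CONTRADICTED


def claim_review_node(claim: str, correction: str, brand: str, url: str) -> dict:
    """A Schema.org ClaimReview rating the AI-generated claim as false."""
    return {
        "@context": "https://schema.org",
        "@type": "ClaimReview",
        "claimReviewed": claim,
        "reviewBody": correction,
        "author": {"@type": "Organization", "name": brand, "url": url},
        "reviewRating": {"@type": "Rating", "ratingValue": 1,
                         "worstRating": 1, "bestRating": 5, "alternateName": "False"},
    }


def remediate(findings: list[tuple[str, Verdict, str]], brand: str, url: str) -> list[dict]:
    """findings are (claim, verdict, correction); only contradictions become nodes."""
    return [claim_review_node(claim, fix, brand, url)
            for claim, verdict, fix in findings if is_hallucination(verdict)]
```

Treating `NEUTRAL` as a pass rather than a failure is what keeps the loop from flooding AXP and the RAG knowledge base with corrections to claims nobody has disproven.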
The platform runs on:
The whitepaper is deliberate about its limits:
Baiyuan Technology (百原科技) — a Taiwan-based B2B SaaS company. Lead engineer and author: Vincent Lin, CTO. The whitepaper is released under CC BY-NC 4.0: you may cite, translate, and build on the material for any non-commercial purpose. Commercial reuse — paid courses, commercial training datasets, bundled products — requires a license (contact services@baiyuan.io).
| Deliverable | Status |
|---|---|
| Traditional Chinese edition (zh-TW v1.0 draft) | ✅ Published |
| PDF editions (zh-TW / en / ja, auto-built on each main push) | ✅ Available on Releases |
| GitHub Pages web edition | ✅ Live at baiyuan-tech.github.io/geo-whitepaper |
| CITATION.cff + BibTeX | ✅ Ready (generic type per CFF 1.2.0 schema) |
| Full English edition (en/) | ✅ Complete — Executive Summary + 13 chapters + 5 appendices (~28,000 words) |
| Zenodo DOI registration | ⚪ Planned for v1.0 final |
| Inclusion in Google Scholar / Semantic Scholar | ⚪ After DOI |
The English edition is not a direct translation: chapters are adapted where the source material assumes Taiwan-market context.
- GitHub's Cite this repository button (driven by `CITATION.cff`) defaults to APA 7 and BibTeX
- Translations (`en/`, `ja/`, `ko/`, etc.) are welcome

BibTeX:

```bibtex
@techreport{lin2026baiyuangeo,
  author      = {Lin, Vincent},
  title       = {Baiyuan GEO Platform: A Whitepaper on Building a SaaS for Generative Engine Optimization},
  institution = {Baiyuan Technology},
  year        = {2026},
  url         = {https://github.com/baiyuan-tech/geo-whitepaper},
  note        = {v1.0-draft}
}
```
APA 7:
Lin, V. (2026). Baiyuan GEO Platform: A whitepaper on building a SaaS for generative engine optimization (v1.0-draft) [Technical report]. Baiyuan Technology. https://github.com/baiyuan-tech/geo-whitepaper