Theory is not enough; it has to be checked against data. What follows are aggregated observations from operating five live pilot brands for about six weeks, with customer names and identifying figures removed.
The five pilot brands span B2B, B2C, physical, and pure online configurations:
| Code | Industry | Type | Market language | Entry GEO score |
|---|---|---|---|---|
| Brand A | B2B SaaS (marketing tech) | Online | bilingual zh / en | mid-tier |
| Brand B | Professional financial services | Online | primarily English | high-tier |
| Brand C | B2B SaaS (knowledge management) | Online | bilingual zh / en | mid-tier |
| Brand D | Restaurant chain, physical | Physical | Chinese | low-tier |
| Brand E | Baiyuan Technology itself (dogfooding) | Online | bilingual zh / en | low-tier (cold start) |
Entry GEO scores are shown as low / mid / high tiers, which preserves their relative order while redacting absolute values.
With only five samples we cannot make statistical claims. This chapter presents observations, not conclusions: the aim is to show what operating GEO actually looks like.
Across six weeks all brands saw movement on all seven dimensions.
%%{init: {'theme':'base'}}%%
graph TD
W1["Week 1 aggregate average<br/>Citation: mid-low<br/>Position: mid<br/>Coverage: low<br/>Breadth: low<br/>Sentiment: neutral<br/>Depth: low<br/>Consistency: low"]
W6["Week 6 aggregate average<br/>Citation: mid<br/>Position: mid-high<br/>Coverage: mid<br/>Breadth: mid<br/>Sentiment: neutral-positive<br/>Depth: mid<br/>Consistency: mid"]
W1 -.->|6 weeks operation| W6
Fig 11-1: Every dimension improved. Coverage and Breadth improved most, although at tier granularity Depth and Consistency show the same low-to-mid jump. Values are shown as “low/mid/high” tiers; concrete numbers are omitted.
A brand’s citation rate can vary dramatically across AI platforms. Relative strength by platform, aggregated across the five pilots:
flowchart LR
subgraph HighLang["English-strong brands (Brand A/B/C/E)"]
H1[ChatGPT ✓✓]
H2[Claude ✓✓]
H3[Perplexity ✓]
H4[DeepSeek ✗]
H5[Kimi ✗]
end
subgraph LocalLang["Chinese-local brand (Brand D)"]
L1[ChatGPT ✓]
L2[Perplexity ✗]
L3[AI Overview ✓]
L4[DeepSeek ✓]
L5[Kimi ✓]
end
Fig 11-2: ✓✓ = materially cited; ✓ = mentioned; ✗ = near-zero. English-strong brands are cited most on US-origin AI; the Chinese-local brand performs on Chinese models (DeepSeek, Kimi) and Google AI Overview.
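A per-platform tier like those in Fig 11-2 can be derived from a batch of test queries. The sketch below assumes each AI answer has already been collected as text, counts the share of answers per platform that name the brand, and maps that share to the figure’s symbols. The thresholds (CITED, MENTIONED) and the plain substring match are hypothetical stand-ins, not the cut-offs used for the pilots.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical cut-offs; the pilots' actual thresholds are not published.
CITED = 0.30      # share at or above this: "materially cited" (✓✓)
MENTIONED = 0.05  # share at or above this: "mentioned" (✓); below: near-zero (✗)


def mention_rate_by_platform(
    answers: Iterable[Tuple[str, str]], brand_aliases: List[str]
) -> Dict[str, float]:
    """answers: (platform, answer_text) pairs. Returns the share of answers naming the brand."""
    totals: Dict[str, int] = defaultdict(int)
    hits: Dict[str, int] = defaultdict(int)
    aliases = [a.lower() for a in brand_aliases]
    for platform, text in answers:
        totals[platform] += 1
        lowered = text.lower()
        # Naive case-insensitive substring match; real pipelines need entity resolution.
        if any(alias in lowered for alias in aliases):
            hits[platform] += 1
    return {p: hits[p] / totals[p] for p in totals}


def tier(rate: float) -> str:
    """Map a mention share onto the Fig 11-2 symbols."""
    if rate >= CITED:
        return "✓✓"
    if rate >= MENTIONED:
        return "✓"
    return "✗"
```

Passing both the Chinese and English brand names in brand_aliases matters for bilingual brands, since a model answering in Chinese may never use the English name.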
%%{init: {'theme':'base'}}%%
xychart-beta
title "Completeness (%) vs citation-rate delta after 6 weeks (illustrative)"
x-axis ["< 40%", "40-60%", "60-80%", "80-100%"]
y-axis "Citation rate delta (relative)" 0 --> 100
bar [10, 25, 55, 80]
Fig 11-3: Citation-rate improvement over 6 weeks by completeness bucket, shown in relative terms. Brands above 80% completeness saw the largest citation gains.
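The aggregation behind a chart like Fig 11-3 is a group-by on completeness followed by a mean. A minimal sketch, assuming completeness is already available as a percentage per brand and that bucket edges are half-open (a brand at exactly 60% lands in “60-80%”); the edge convention is an assumption, since the figure’s labels do not specify it.

```python
from statistics import mean
from typing import Dict, List, Tuple


def bucket(completeness_pct: float) -> str:
    """Map a completeness percentage to a Fig 11-3 label (lower bound inclusive, assumed)."""
    if not 0 <= completeness_pct <= 100:
        raise ValueError("completeness must be between 0 and 100")
    if completeness_pct < 40:
        return "< 40%"
    if completeness_pct < 60:
        return "40-60%"
    if completeness_pct < 80:
        return "60-80%"
    return "80-100%"


def mean_delta_by_bucket(brands: Dict[str, Tuple[float, float]]) -> Dict[str, float]:
    """brands: name -> (completeness_pct, citation_rate_delta). Returns mean delta per bucket."""
    groups: Dict[str, List[float]] = {}
    for pct, delta in brands.values():
        groups.setdefault(bucket(pct), []).append(delta)
    return {label: mean(deltas) for label, deltas in groups.items()}
```

With five brands spread over four buckets, several buckets hold a single brand, so each bar is closer to an anecdote than an average.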
Among the five pilots, three (A / B / C) deployed AXP only in Weeks 2–3, while two (D / E) had it active from Week 1. This stagger lets us observe AXP’s effect roughly in isolation, within the limits of five samples.
%%{init: {'theme':'base'}}%%
xychart-beta
title "AI bot daily visits (before vs after AXP, relative index)"
x-axis ["Wk 1", "Wk 2", "Wk 3 (AXP deploy)", "Wk 4", "Wk 5", "Wk 6"]
y-axis "AI bot visits (indexed)" 0 --> 200
bar [25, 30, 35, 95, 140, 165]
Fig 11-4: AI bot traffic climbs sharply in the weeks after AXP is deployed in Week 3. Values are a relative index; absolute visit counts are omitted.
However, rising bot traffic does not guarantee a rising citation rate. If the AXP content itself is semantically thin, the AI has nothing to work with: AXP is infrastructure for being seen, not a silver bullet.
Customer-side operations surfaced five recurring pitfalls.
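The staggered rollout can be read as a before/after comparison. The sketch computes the ratio of the mean weekly index after deployment to the mean before it, excluding the deployment week as only partially treated. It can optionally divide by the same-window ratio of a group that already had AXP (D / E here) to net out platform-wide drift; that ratio-of-ratios is a multiplicative difference-in-differences, a technique chosen for illustration rather than the analysis the pilots are stated to have used.

```python
from statistics import mean
from typing import Optional, Sequence


def pre_post_lift(series: Sequence[float], deploy_idx: int) -> float:
    """Mean of weeks after deployment divided by mean of weeks before it.

    The deployment week (series[deploy_idx]) is excluded from both windows
    because it is only partially treated.
    """
    pre = series[:deploy_idx]
    post = series[deploy_idx + 1:]
    if not pre or not post:
        raise ValueError("need at least one week on each side of deployment")
    return mean(post) / mean(pre)


def adjusted_lift(
    treated: Sequence[float], deploy_idx: int, control: Optional[Sequence[float]] = None
) -> float:
    """Treated lift, optionally divided by an always-on control group's lift over the same weeks."""
    lift = pre_post_lift(treated, deploy_idx)
    if control is None:
        return lift
    return lift / pre_post_lift(control, deploy_idx)


# Relative index values read off Fig 11-4; "Wk 3 (AXP deploy)" is index 2.
fig_11_4 = [25, 30, 35, 95, 140, 165]
print(round(pre_post_lift(fig_11_4, deploy_idx=2), 2))  # 4.85
```

Without a control series the 4.85 figure cannot separate AXP from anything else that changed in those weeks; the always-on group is what makes the stagger informative.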
pie title "Customer-side pitfalls (5 brands, 6 weeks)"
"industry_code not filled" : 28
"logo_url missing or broken" : 22
"GBP not connected (physical)" : 18
"description too short (<20 chars)" : 17
"sameAs external links unfilled" : 15
Fig 11-5: Missing industry classification (industry_code) is the most common pitfall; it affects which Schema.org @type is selected.
Based on the pitfall data, we made the following product changes:
- logo_url field: added live validation (a HEAD request verifies a 200 response when the URL is pasted)

Three observations we did not anticipate are also worth recording.
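The five pitfalls are mechanical enough to check automatically at onboarding. The sketch below audits a brand profile held as a plain dict; the field names follow the pitfall labels (industry_code, logo_url, description, sameAs), while is_physical and gbp_connected are hypothetical flags. The logo check sends an HTTP HEAD request and accepts only a 200, the same rule as the logo_url live validation; the fetcher is injectable so the logic can be exercised offline.

```python
import http.client
import urllib.error
import urllib.parse
import urllib.request
from typing import Any, Callable, Dict, List, Optional

MIN_DESCRIPTION_CHARS = 20  # from the pitfall label "description too short (<20 chars)"
Fetcher = Callable[[str], Optional[int]]


def head_status(url: str, timeout: float = 5.0) -> Optional[int]:
    """Return the HTTP status of a HEAD request, or None if the request fails outright."""
    req = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:  # 4xx/5xx are raised, not returned
        return err.code
    except (OSError, ValueError, http.client.HTTPException):  # DNS, refused, timeout, bad URL
        return None


def logo_url_ok(url: str, fetch: Fetcher = head_status) -> bool:
    """Accept only absolute http(s) URLs that answer HEAD with 200."""
    parsed = urllib.parse.urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        return False
    return fetch(url) == 200


def audit_profile(profile: Dict[str, Any], fetch: Fetcher = head_status) -> List[str]:
    """Return the Fig 11-5 pitfalls found in one brand profile."""
    issues: List[str] = []
    if not profile.get("industry_code"):
        issues.append("industry_code not filled")
    logo = (profile.get("logo_url") or "").strip()
    if not logo or not logo_url_ok(logo, fetch):
        issues.append("logo_url missing or broken")
    if profile.get("is_physical") and not profile.get("gbp_connected"):
        issues.append("GBP not connected (physical)")
    if len((profile.get("description") or "").strip()) < MIN_DESCRIPTION_CHARS:
        issues.append("description too short (<20 chars)")
    if not profile.get("sameAs"):
        issues.append("sameAs external links unfilled")
    return issues
```

Some servers answer HEAD with 405 even when the image is fine, so a production validator may want to fall back to a GET before flagging the URL as broken.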
Brands A / C / E are all bilingual (zh / en). The AI’s descriptions of the same brand in Chinese and English queries often differed materially.
This is not hallucination: both descriptions are partially true. Rather, the AI’s ability to aggregate the same entity across languages is still immature. The takeaway for brand owners: Chinese and English Schema.org records should explicitly link each other via sameAs, and their descriptions should be semantically equivalent rather than written independently.
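One way to put that advice into JSON-LD, as a sketch: generate both language records from one function so that each lists the other’s URL in sameAs, then check the link runs both ways. The brand names, URLs and external profile below are hypothetical; semantic equivalence of the two descriptions cannot be verified by code like this and still depends on translating from a single source text.

```python
import json
from typing import Dict, List


def bilingual_records(name_zh: str, name_en: str, url_zh: str, url_en: str,
                      desc_zh: str, desc_en: str, external: List[str]) -> Dict[str, dict]:
    """Build zh/en schema.org Organization records that point at each other through sameAs."""
    def record(name: str, url: str, desc: str, other_url: str) -> dict:
        return {
            "@context": "https://schema.org",
            "@type": "Organization",
            "name": name,
            "url": url,
            "description": desc,
            # Other-language page first, then shared external profiles.
            "sameAs": [other_url, *external],
        }

    return {
        "zh": record(name_zh, url_zh, desc_zh, url_en),
        "en": record(name_en, url_en, desc_en, url_zh),
    }


def links_are_reciprocal(records: Dict[str, dict]) -> bool:
    """True if each language record lists the other's url in its sameAs."""
    zh, en = records["zh"], records["en"]
    return en["url"] in zh["sameAs"] and zh["url"] in en["sameAs"]


# Hypothetical brand used only for illustration.
pair = bilingual_records(
    name_zh="示例品牌", name_en="Example Brand",
    url_zh="https://example.com/zh/", url_en="https://example.com/en/",
    desc_zh="一家示例公司。", desc_en="An example company.",
    external=["https://example.org/profile"],
)
assert links_are_reciprocal(pair)
print(json.dumps(pair["en"], ensure_ascii=False, indent=2))
```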
We expected negative AI statements to be the biggest threat. The operational reality is the opposite: “AI does not mention the brand at all” is a worse problem. A negative mention at least proves the AI knows the entity exists, and it can be corrected via ClaimReview. Complete absence means the brand is not in the candidate pool, so there is no handle to grab.
This is why Citation Rate carries a substantial 25% weight (see Ch 3) but no more: the metric matters, yet if it dominated the total, the other dimensions would be marginalized.
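A toy calculation shows the marginalization effect. It assumes a weighted-sum composite in which Citation Rate has weight c and the remaining 1 - c is split evenly across the other six dimensions; the even split is an assumption, since the actual weights are defined in Ch 3. The two brands and their 0-100 scores are hypothetical.

```python
from typing import Dict

# Dimension names from Fig 11-1.
DIMENSIONS = ["citation", "position", "coverage", "breadth", "sentiment", "depth", "consistency"]


def weights(citation_weight: float) -> Dict[str, float]:
    """Citation gets citation_weight; the other six share the rest equally (assumption)."""
    rest = (1.0 - citation_weight) / (len(DIMENSIONS) - 1)
    return {d: (citation_weight if d == "citation" else rest) for d in DIMENSIONS}


def composite(scores: Dict[str, float], w: Dict[str, float]) -> float:
    """Weighted sum of per-dimension scores."""
    return sum(w[d] * scores[d] for d in DIMENSIONS)


# Hypothetical brands: one strong only on citation, one balanced.
citation_only = {**{d: 20.0 for d in DIMENSIONS}, "citation": 90.0}
balanced = {**{d: 70.0 for d in DIMENSIONS}, "citation": 40.0}

for c in (0.25, 0.60):
    w = weights(c)
    print(c, round(composite(citation_only, w), 1), round(composite(balanced, w), 1))
```

At c = 0.25 the balanced brand ranks higher (62.5 vs 37.5); at a hypothetical c = 0.60 the ranking flips (62.0 vs 52.0), so a brand could lead the total on citation alone.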
Conventional wisdom: “competitors showing up in the same AI answer dilutes your visibility.” We observed the opposite. Being listed next to the right competitors reinforces the brand’s category identity. Early on, Brand A co-occurred with two well-known large competitors. Its Citation Rate was modest, but end-users appeared to place it mentally in the same tier as those competitors, and this coincided with unusually strong downstream conversion.
More data is needed to confirm this. But it suggests that GEO may invert traditional SEO’s view of competitors as pure rivals: in the AI era, being placed alongside the right brands may matter more than being named alone.
The five brands above were a mix of internal dogfooding and partner pilots. In parallel, Baiyuan GEO as a paid SaaS closed three commercial customers within its first paid month, covering industries unlike the pilots:
| Industry | Shape | Primary driver |
|---|---|---|
| Chain medical aesthetics | Multi-location physical, highly competitive category | GBP integration + physical LocalBusiness Schema.org + medical-grade hallucination detection |
| Emerging chain restaurant | Multi-location physical, rapid expansion | Location-level AXP + Phase baseline to capture expansion-period change |
| Premium aromatherapy / yoga | High ACV, word-of-mouth driven | Content Depth + Sentiment as the lead dimensions (narrative quality > citation frequency) |
These customers were newly onboarded at the time of writing; detailed operational data will appear in a future revision. The point here is that Baiyuan GEO’s engineering design is being tested not only internally but against paid external demand.