Baiyuan GEO Platform Whitepaper

Chapter 11 — Five-Brand Field Observations: Six Weeks of Anonymized Data

Theory is not enough; data has to validate it. What follows are aggregated observations from operating five live pilot brands for roughly six weeks, with customer names and identifying numbers removed.


11.1 Brand portraits (anonymized)

The five pilot brands span B2B, B2C, physical, and pure online configurations:

| Code | Industry | Type | Market language | Entry GEO score |
|---|---|---|---|---|
| Brand A | B2B SaaS (marketing tech) | Online | bilingual zh / en | mid-tier |
| Brand B | Professional financial services | Online | primarily English | high-tier |
| Brand C | B2B SaaS (knowledge management) | Online | bilingual zh / en | mid-tier |
| Brand D | Restaurant chain, physical | Physical | Chinese | low-tier |
| Brand E | Baiyuan Technology itself (dogfooding) | Online | bilingual zh / en | low-tier (cold start) |

Entry GEO scores are shown as low/mid/high tiers to preserve relative structure while redacting absolute values.

Why these five as the observation sample

With only 5 samples we cannot make statistical claims. This chapter presents observations, not conclusions — the aim is to convey the real shape of operation.


11.2 GEO score distribution

Across six weeks all brands saw movement on all seven dimensions.

Fig 11-1: Seven-dimension radar (Week 1 vs Week 6, anonymized aggregate)

%%{init: {'theme':'base'}}%%
graph TD
    W1["Week 1 aggregate average<br/>Citation: mid-low<br/>Position: mid<br/>Coverage: low<br/>Breadth: low<br/>Sentiment: neutral<br/>Depth: low<br/>Consistency: low"]
    W6["Week 6 aggregate average<br/>Citation: mid<br/>Position: mid-high<br/>Coverage: mid<br/>Breadth: mid<br/>Sentiment: neutral-positive<br/>Depth: mid<br/>Consistency: mid"]
    W1 -.->|6 weeks operation| W6

Fig 11-1: Every dimension improved. Coverage and Breadth improved most. Data shown as “low/mid/high” tiers, concrete numbers omitted.

Three observed patterns

  1. Position moves first, Citation follows — after Schema.org optimization, the position at which the AI mentions the brand shifts earlier in the response (from last-paragraph to top-third); only weeks later does the mention-count itself rise
  2. Coverage expands faster than Breadth — adding coverage of intent-query types (comparison, recommendation) is easier than expanding to additional AI platforms
  3. Consistency converges last — for a brand to look the same on ChatGPT and DeepSeek typically requires 4–8 weeks
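
The first pattern depends on where in a response the brand is first mentioned. One way to quantify that is sketched below, assuming the response is available as plain text and that a case-insensitive substring match is good enough to detect a mention. The function names and the thirds-based buckets are illustrative assumptions, not the platform's actual Position metric.

```python
from typing import Optional


def first_mention_position(response: str, brand: str) -> Optional[float]:
    """Return where the brand is first mentioned, as a fraction of the response length.

    0.0 means the very start; values near 1.0 mean the end.
    Returns None when the brand is not mentioned at all.
    """
    if not response or not brand:
        return None
    index = response.lower().find(brand.lower())
    if index < 0:
        return None
    return index / len(response)


def position_bucket(fraction: Optional[float]) -> str:
    """Map a fractional position onto coarse thirds (hypothetical buckets)."""
    if fraction is None:
        return "absent"
    if fraction < 1 / 3:
        return "top-third"
    if fraction < 2 / 3:
        return "middle-third"
    return "last-third"


if __name__ == "__main__":
    # Made-up responses, for illustration only.
    early = "Brand A is a common choice. " + "Filler sentence. " * 10
    late = "Filler sentence. " * 10 + "Some teams also use Brand A."
    print(position_bucket(first_mention_position(early, "Brand A")))
    print(position_bucket(first_mention_position(late, "Brand A")))
```

Tracking this bucket week over week, separately from the raw mention count, would make a "position first, citation later" shift visible as two distinct series.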

11.3 Platform coverage asymmetry

The same brand’s citation rate varies dramatically across AI platforms. Aggregated across our five pilots, the relative strengths look like this:

Fig 11-2: Platform coverage asymmetry (anonymized)

flowchart LR
    subgraph HighLang["English-strong brands (Brand A/B/C/E)"]
      H1[ChatGPT ✓✓]
      H2[Claude ✓✓]
      H3[Perplexity ✓]
      H4[DeepSeek ✗]
      H5[Kimi ✗]
    end
    subgraph LocalLang["Chinese-local brand (Brand D)"]
      L1[ChatGPT ✓]
      L2[Perplexity ✗]
      L3[AI Overview ✓]
      L4[DeepSeek ✓]
      L5[Kimi ✓]
    end

Fig 11-2: ✓✓ = materially cited; ✓ = mentioned; ✗ = near-zero. English-language B2B brands perform well on US-origin AI; Chinese-local brands perform well on Chinese models and Google AI Overview.
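
To make the asymmetry concrete, a per-platform citation rate can be computed from sampled responses. The sketch below assumes each sample has already been reduced to a platform name and a flag saying whether the brand was mentioned; the function name and the sample data are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def citation_rate_by_platform(samples: Iterable[Tuple[str, bool]]) -> Dict[str, float]:
    """Compute the share of sampled responses per platform that mention the brand.

    Each sample is (platform_name, brand_was_mentioned).
    """
    totals: Dict[str, int] = defaultdict(int)
    hits: Dict[str, int] = defaultdict(int)
    for platform, mentioned in samples:
        totals[platform] += 1
        if mentioned:
            hits[platform] += 1
    return {platform: hits[platform] / totals[platform] for platform in totals}


if __name__ == "__main__":
    # Made-up samples, for illustration only.
    samples = [
        ("ChatGPT", True),
        ("ChatGPT", False),
        ("DeepSeek", False),
        ("DeepSeek", False),
    ]
    for platform, rate in citation_rate_by_platform(samples).items():
        print(f"{platform}: {rate:.0%}")
```

Comparing these rates side by side for one brand is what reveals a pattern like the one in the figure.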

Takeaways


11.4 Schema.org completeness and citation rate

Fig 11-3: Completeness × citation-rate delta (aggregated)

%%{init: {'theme':'base'}}%%
xychart-beta
    title "Completion (%) vs citation-rate delta after 6 weeks (illustrative)"
    x-axis ["< 40%", "40-60%", "60-80%", "80-100%"]
    y-axis "Citation rate delta (relative)" 0 --> 100
    bar [10, 25, 55, 80]

Fig 11-3: Citation-rate improvement over 6 weeks by completeness bucket, shown in relative terms. Brands above 80% completeness saw the largest citation gains.
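
A completeness percentage and its bucket can be derived along the following lines. This is a minimal sketch: the tracked field list is a hypothetical stand-in (the platform's real checklist is not given in this chapter), and the handling of exact boundary values such as 40% is an assumption.

```python
from typing import Any, Mapping, Sequence

# Hypothetical field list; the platform's real completeness checklist is not given here.
TRACKED_FIELDS = ("name", "url", "industry_code", "logo_url", "description", "sameAs")


def completeness_percent(record: Mapping[str, Any], fields: Sequence[str] = TRACKED_FIELDS) -> float:
    """Percentage of tracked fields that hold a non-empty value."""
    filled = sum(1 for field in fields if record.get(field))
    return 100.0 * filled / len(fields)


def completeness_bucket(percent: float) -> str:
    """Map a percentage onto the four buckets used on the chart's x-axis.

    Boundary values go to the higher bucket (an assumption).
    """
    if percent < 40:
        return "< 40%"
    if percent < 60:
        return "40-60%"
    if percent < 80:
        return "60-80%"
    return "80-100%"


if __name__ == "__main__":
    record = {"name": "Example Co", "url": "https://example.com", "description": "A short description."}
    percent = completeness_percent(record)
    print(percent, completeness_bucket(percent))
```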

Observations

Operational takeaways


11.5 AXP deployment before/after

Among the five pilots, three (A / B / C) only deployed AXP in Weeks 2–3, while two (D / E) had it active from Week 1. The time lag lets us observe the isolated effect of AXP.

Fig 11-4: AI bot traffic before/after AXP (anonymized aggregate)

%%{init: {'theme':'base'}}%%
xychart-beta
    title "AI bot daily visits (before vs after AXP, relative index)"
    x-axis ["Wk 1", "Wk 2", "Wk 3 (AXP deploy)", "Wk 4", "Wk 5", "Wk 6"]
    y-axis "AI bot visits (indexed)" 0 --> 200
    bar [25, 30, 35, 95, 140, 165]

Fig 11-4: AI bot traffic climbs rapidly in the weeks after AXP is deployed, from an index of 35 in the deployment week to 165 by Week 6. Values are a relative index, not absolute visit counts.
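
The size of the change can be read off the indexed values directly. The sketch below copies the six values from the chart and compares the mean of the weeks up to and including deployment with the mean of the weeks after it. Counting the deployment week on the "before" side is an assumption made here, and the function name is hypothetical.

```python
from statistics import mean
from typing import Sequence

# Indexed weekly values copied from the chart above (Wk 1 to Wk 6).
WEEKLY_INDEX = [25, 30, 35, 95, 140, 165]
DEPLOY_WEEK = 3  # 1-based week labelled "AXP deploy" on the x-axis


def pre_post_ratio(values: Sequence[float], deploy_week: int) -> float:
    """Ratio of the mean after deployment to the mean up to and including the deploy week."""
    pre = values[:deploy_week]
    post = values[deploy_week:]
    if not pre or not post:
        raise ValueError("need at least one week on each side of the deployment")
    return mean(post) / mean(pre)


if __name__ == "__main__":
    print(f"post/pre ratio: {pre_post_ratio(WEEKLY_INDEX, DEPLOY_WEEK):.2f}")
```

Because the chart is an anonymized aggregate on a relative index, the ratio describes the shape of the curve rather than any single brand's traffic.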

Observations

However, rising bot traffic does not guarantee a rising citation rate. If the AXP content itself is semantically thin, the AI has nothing to work with. AXP is infrastructure for being seen, not a silver bullet.


11.6 Customer-side pitfalls

The customer-side operations produced five common pitfalls.

Fig 11-5: Pitfall distribution (aggregated across 5 brands, 6 weeks)

pie title "Customer-side pitfalls (5 brands, 6 weeks)"
    "industry_code not filled" : 28
    "logo_url missing or broken" : 22
    "GBP not connected (physical)" : 18
    "description too short (<20 chars)" : 17
    "sameAs external links unfilled" : 15

Fig 11-5: Missing industry classification is the most common pitfall, and it affects Schema.org @type selection.
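
The five pitfalls lend themselves to automated checks at profile-entry time. The following is a minimal sketch, not the platform's validator: the profile keys `is_physical` and `gbp_connected` are hypothetical names, and the logo check only looks at the URL's form rather than fetching it, so a syntactically valid but dead link would pass.

```python
from typing import Any, List, Mapping

MIN_DESCRIPTION_CHARS = 20  # threshold taken from the pitfall label above


def find_pitfalls(profile: Mapping[str, Any]) -> List[str]:
    """Return the pitfall labels that apply to a brand profile (hypothetical field names)."""
    pitfalls: List[str] = []
    if not profile.get("industry_code"):
        pitfalls.append("industry_code not filled")
    logo = profile.get("logo_url") or ""
    if not logo.startswith(("http://", "https://")):
        pitfalls.append("logo_url missing or broken")
    # GBP only matters for brands with physical locations.
    if profile.get("is_physical") and not profile.get("gbp_connected"):
        pitfalls.append("GBP not connected (physical)")
    if len(profile.get("description") or "") < MIN_DESCRIPTION_CHARS:
        pitfalls.append("description too short (<20 chars)")
    if not profile.get("sameAs"):
        pitfalls.append("sameAs external links unfilled")
    return pitfalls


if __name__ == "__main__":
    profile = {
        "industry_code": "",
        "logo_url": "https://example.com/logo.png",
        "is_physical": True,
        "gbp_connected": False,
        "description": "Short",
        "sameAs": [],
    }
    for label in find_pitfalls(profile):
        print(label)
```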

Shared root causes

UI feedback loop

Based on these observations, we made product changes:


11.7 Three unexpected findings

Three observations we did not anticipate but are worth recording.

1. Bilingual brands are surprisingly fragile on cross-language consistency

Brands A / C / E are all bilingual (zh / en). The AI’s description of the same brand in Chinese vs English queries was often materially different.

This is not hallucination: both descriptions are partially true. Rather, the AI’s ability to aggregate the same entity across languages is still immature. The takeaway for brand owners: Chinese and English Schema.org records should explicitly link to each other via sameAs, and the descriptions should be semantically equivalent rather than independently written.
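
The cross-linking recommendation can be sketched as a pair of JSON-LD records that reference each other. Everything specific here is an assumption for illustration: the brand name, the `example.com` URLs, the `/en/` and `/zh/` path layout, and the helper function names. Only `@context`, `@type`, `name`, `url`, `description`, and `sameAs` are standard Schema.org / JSON-LD terms.

```python
import json
from typing import Any, Dict, Iterable, Tuple


def organization_jsonld(name: str, url: str, description: str, same_as: Iterable[str]) -> Dict[str, Any]:
    """Build a Schema.org Organization record as a JSON-LD-ready dict."""
    return {
        "@context": "https://schema.org",
        "@type": "Organization",
        "name": name,
        "url": url,
        "description": description,
        "sameAs": list(same_as),
    }


def linked_records(
    name: str, en_url: str, zh_url: str, en_description: str, zh_description: str
) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Return (English, Chinese) records whose sameAs lists point at each other."""
    en = organization_jsonld(name, en_url, en_description, same_as=[zh_url])
    zh = organization_jsonld(name, zh_url, zh_description, same_as=[en_url])
    return en, zh


if __name__ == "__main__":
    # Hypothetical brand and URLs; the two descriptions are translations of each other.
    en, zh = linked_records(
        name="Example Co",
        en_url="https://example.com/en/",
        zh_url="https://example.com/zh/",
        en_description="Example Co builds marketing analytics software for B2B teams.",
        zh_description="Example Co 为 B2B 团队提供营销分析软件。",
    )
    print(json.dumps(en, ensure_ascii=False, indent=2))
    print(json.dumps(zh, ensure_ascii=False, indent=2))
```

Writing one description and translating it, rather than drafting each language separately, is what keeps the two records semantically equivalent.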

2. “No citation” is a harder problem than “negative citation”

We expected negative AI statements to be the biggest threat. The operational reality is the opposite: “AI does not mention the brand at all” is a worse problem. A negative mention at least proves the AI knows the entity exists and can be corrected via ClaimReview. Complete absence means the brand is not in the candidate pool — there is no handle to grab.

This explains why we set Citation Rate weight at 25% (see Ch 3) rather than higher — the metric matters, but if it dominates the total, other dimensions get marginalized.
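
The effect of capping Citation Rate at 25% can be illustrated with a weighted sum over the seven dimensions. Only the 25% figure comes from this chapter; the even split of the remaining 75% across the other six dimensions and the 0 to 100 score scale are assumptions made purely for illustration, and the actual weights are those defined in Ch 3.

```python
from typing import Dict, Mapping

CITATION_WEIGHT = 0.25  # stated in the text
OTHER_DIMENSIONS = ("position", "coverage", "breadth", "sentiment", "depth", "consistency")

# Assumption for illustration only: the remaining 75% is split evenly.
WEIGHTS: Dict[str, float] = {
    "citation": CITATION_WEIGHT,
    **{dim: (1 - CITATION_WEIGHT) / len(OTHER_DIMENSIONS) for dim in OTHER_DIMENSIONS},
}


def geo_score(scores: Mapping[str, float]) -> float:
    """Weighted sum of per-dimension scores (each assumed to be on a 0-100 scale)."""
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing dimensions: {sorted(missing)}")
    return sum(WEIGHTS[dim] * scores[dim] for dim in WEIGHTS)


if __name__ == "__main__":
    # A brand that is never cited but scores 80 everywhere else (made-up numbers).
    scores = {dim: 80.0 for dim in OTHER_DIMENSIONS}
    scores["citation"] = 0.0
    print(geo_score(scores))
```

Under these assumed weights, a brand with zero citations can still score well on the total, which shows how the cap keeps the other six dimensions from being marginalized.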

3. Competitor co-occurrence can be a positive signal

Conventional wisdom holds that “competitors showing up in the same AI answer dilutes your visibility.” We observed the opposite. Being listed next to the right competitors reinforces the brand’s category identity. Early on, Brand A co-occurred with two well-known large competitors. Although its Citation Rate was modest, end users mentally placed it in the same tier as those competitors, which translated into unusually strong downstream conversion.

More data is needed to confirm this. But it suggests that GEO’s notion of friend versus foe may run opposite to traditional SEO’s view of competitors. In the AI era, being placed alongside the right brands may matter more than being named alone.


11.8 First-month commercial validation

The five brands above were a mix of internal dogfooding and partner pilots. In parallel, Baiyuan GEO as a paid SaaS closed three commercial customers within its first paid month, covering industries unlike the pilots:

| Industry | Shape | Primary driver |
|---|---|---|
| Chain medical aesthetics | Multi-location physical, highly competitive category | GBP integration + physical LocalBusiness Schema.org + medical-grade hallucination detection |
| Emerging chain restaurant | Multi-location physical, rapid expansion | Location-level AXP + Phase baseline to capture expansion-period change |
| Premium aromatherapy / yoga | High ACV, word-of-mouth driven | Content Depth + Sentiment as the lead dimensions (narrative quality > citation frequency) |

Observations

These customers are fresh at time of writing; detailed operational data will appear in a future revision. The point here: the engineering design of Baiyuan GEO is validated not only internally but by paid external demand.


Key takeaways

