Baiyuan GEO Platform Whitepaper

Chapter 12 — Limitations, Open Problems, and Future Work

A tool that explicitly lists what it cannot do is more trustworthy than one that claims omniscience.

12.1 What the platform cannot do

Fig 12-1: Current coverage matrix

| Capability | Coverage | Gap |
|---|---|---|
| Monitoring | Complete | Only the 15 supported AI platforms; custom-deployed or private LLMs cannot be reached |
| Scoring | Complete | Cross-industry comparisons are not meaningful; the choice of query space remains subjective |
| Structured data | Complete | Multilingual Schema.org in zh-TW and en only; Japanese, Korean, and Southeast Asian languages pending |
| Hallucination detection | Partial | Depends on knowledge-source quality; coverage drops when sources are sparse |
| Hallucination remediation | Partial | Stubborn hallucinations still need human intervention |
| Automated closed loop | Partial | Search-type cases converge quickly, knowledge-type cases slowly; intermediate states are hard to feed back fully |
| External platform verification | Restricted | LinkedIn, Crunchbase, G2, and Capterra have no public API; manual verification only |
| GBP integration | Restricted | Phase 2 API approval pending; only URL-to-Place-ID extraction is available today |

Fig 12-1: “Complete” = feature is comprehensive; “Partial” = core is there with known gaps; “Restricted” = blocked by external constraints.

Specific limits


12.2 Unpredictability of AI model version shifts

This is a problem we cannot fully solve from the engineering side. When OpenAI releases GPT-5, Anthropic releases Claude 4, or DeepSeek ships a new flagship, every brand’s score may shift 3–10 points simultaneously.

Three classes of version shift

| Type | Example | Direction |
|---|---|---|
| Major model upgrade | GPT-4o → GPT-5 | Most brands rise (newer training data) |
| Safety / alignment tightening | One vendor increases its refusal rate | Most brands fall (refusals mask citations) |
| Retrieval augmentation on/off | Claude adds or removes web search | Direction differs by brand, depending on web presence |

Mitigation

Baiyuan cannot prevent these shifts, but three mechanisms reduce customer impact:

  1. Version-sensitivity banner — when a major version change is detected on a tracked AI platform, the UI displays “data is adapting to the new model; short-term volatility is expected.”
  2. Phase baseline cross-version tagging — baseline data gathered under different model versions is explicitly marked as not comparable by raw numbers, and the UI keeps the versions distinct.
  3. Weight-preserved historical comparison — internally we retain the score under each specific model version for trend analysis, so version jumps are not misattributed to brand change (a minimal sketch follows this list).
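
A minimal sketch of how mechanism 3 could be represented: score history is segmented by the model version in force at measurement time, so trend lines are only drawn within a version and a jump at a version boundary is shown as a boundary rather than read as brand change. All names and data shapes here are illustrative, not the platform's actual schema.

```python
from dataclasses import dataclass
from itertools import groupby

@dataclass
class ScorePoint:
    date: str           # ISO date of the measurement
    platform: str       # tracked AI platform, e.g. "chatgpt"
    model_version: str  # model version detected at measurement time
    score: float        # brand score on that platform

def version_segments(history: list[ScorePoint]) -> list[list[ScorePoint]]:
    """Split a score history into runs that share one (platform, version).

    Trend analysis then runs inside each segment only, so a score jump
    caused by a model upgrade is never attributed to the brand itself."""
    ordered = sorted(history, key=lambda p: (p.platform, p.date))
    return [list(run) for _, run in
            groupby(ordered, key=lambda p: (p.platform, p.model_version))]
```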

12.3 Open research problems

1. Real negative feedback vs hallucination error

When AI says “this brand has poor customer service,” it could be a fabrication with no basis in any source, or an accurate reflection of genuine negative feedback that the model picked up.

The handling differs drastically: a hallucination should be corrected; real feedback should drive service improvement, not concealment. Baiyuan’s automation today cannot reliably tell the two apart; human judgment of the sources is still needed. This is a real hole in the closed loop.
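
A rough first-pass triage can at least narrow the human queue by checking whether any already-collected source carries similar criticism; if none does, the claim is queued as a candidate hallucination. This is only a sketch under assumed inputs (plain-text sources, crude lexical matching), not how Baiyuan's pipeline actually decides.

```python
def overlaps(claim: str, source: str, threshold: float = 0.5) -> bool:
    """Crude lexical overlap; a real system would use semantic matching."""
    tokens = set(claim.lower().split())
    if not tokens:
        return False
    hits = sum(1 for t in tokens if t in source.lower())
    return hits / len(tokens) >= threshold

def triage_negative_claim(claim: str, known_sources: list[str]) -> str:
    """Route a negative AI statement about the brand for human review.

    The function only narrows the queue; a person still makes the call."""
    if any(overlaps(claim, src) for src in known_sources):
        # Some collected source carries similar criticism: likely real
        # feedback, so route to the service team rather than to correction.
        return "likely_real_feedback"
    # No collected source backs the claim: candidate hallucination,
    # queue for manual source-checking and possible correction.
    return "possible_hallucination"
```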

2. Causation vs correlation

A customer revises content; three weeks later, the citation rate rises. Did the revision cause the rise, or did the two merely coincide with an unrelated change, such as a model version update?

Rigorous causal proof would require A/B-testing infrastructure (revising half of the same brand’s content while leaving the other half untouched), which is commercially infeasible. This is a shared research gap for the GEO field.
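
Short of A/B infrastructure, one cheap sanity check is to rule out the confounders that are observable before crediting the revision, for example a model version change inside the same window (Section 12.2). A sketch with illustrative names:

```python
from datetime import date, timedelta

def version_change_in_window(revision_date: date,
                             rise_date: date,
                             version_changes: list[date],
                             buffer_days: int = 14) -> bool:
    """True if any tracked platform changed model versions between the
    content revision and the observed citation-rate rise (plus a buffer).

    If so, the rise should not be attributed to the revision without
    further evidence."""
    window_end = rise_date + timedelta(days=buffer_days)
    return any(revision_date <= d <= window_end for d in version_changes)
```

Passing this check does not establish causation; it only removes one obvious alternative explanation.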

3. Long-tail query coverage strategy

Dynamic intent-query generation covers the main intent types with 20–60 queries, but long-tail queries (very specific, uncommon user questions) cannot be enumerated. When a customer says “my user asked XX and the AI didn’t mention me,” is that a coverage gap the generated query set should have caught, or a genuinely unenumerable long-tail question?

Today this is handled case by case. A future “customer-supplied intent queries” feature could help, but it would introduce a selection bias: customers tend to submit only the questions that flatter the brand.
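
If such a feature is built, one way to blunt that bias is to cap the share of customer-supplied queries in the scored set so self-selected questions cannot dominate the result. A sketch, with the 20% cap chosen purely for illustration:

```python
def merged_query_set(generated: list[str],
                     customer_supplied: list[str],
                     max_customer_share: float = 0.2) -> list[str]:
    """Combine platform-generated intent queries with customer-supplied
    ones, capping the customer share of the final set so self-selected
    (typically flattering) questions cannot dominate the score."""
    # k / (len(generated) + k) <= max_customer_share, solved for k:
    cap = int(len(generated) * max_customer_share / (1 - max_customer_share))
    return generated + customer_supplied[:cap]
```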


12.4 Roadmap

Fig 12-2: Future work dependency graph

```mermaid
flowchart LR
    subgraph Short["Short-term (within 6 months)"]
      A1[GBP API Phase 2-3<br/>read and write]
      A2[multi-language Schema.org<br/>extend to ja / ko]
      A3[visualization upgrade<br/>Phase baseline views]
    end
    subgraph Mid["Mid-term (6-12 months)"]
      B1[more AI platforms<br/>Mistral / Cohere deepening<br/>+ Claude Projects]
      B2[cross-language sameAs<br/>automation]
      B3[competitor co-occurrence advisor]
    end
    subgraph Long["Long-term (12+ months)"]
      C1[causal inference research<br/>A/B methodology]
      C2[private-LLM entity monitoring]
      C3[multi-tenant custom intent queries]
    end
    A1 --> B1
    A2 --> B2
    A3 --> C3
```

Fig 12-2: Three-phase roadmap. Each phase gates on the previous. Concrete timing depends on external factors (Google, specific AI vendors).

Short-term focus

Long-term targets


12.5 An invitation to practitioners and researchers

This book attempts to make GEO a discipline that can be discussed and advanced collectively, rather than the closed experience of a single vendor. To that end:

GEO is very early. This book aspires to be one of the first openly published technical documents in this field, so that later teams can start from the holes we already crawled out of rather than rediscovering each one independently.


Key takeaways


