PIF AI Whitepaper

Chapter 10: Central RAG Integration Architecture (Scheme C+)

PIF AI does not reinvent knowledge retrieval; it integrates the sibling Baiyuan central RAG v2 (rag.baiyuan.io), a knowledge service built on a dual-layer retrieval architecture: an L1 LLM Wiki backed by L2 vector RAG. This is one of the most important chapters in the whitepaper. It covers why Scheme C+ was chosen for isolation, how "1 product = 1 KB" implements dual isolation, what the dual-header authentication means, and how fail-soft is implemented concretely.

📌 Key Takeaways

10.1 Central RAG's Dual-Layer Retrieval

10.1.1 L1 LLM Wiki: Compiled Knowledge

Wiki L1 is the first retrieval tier. Conceptually it is a compiled wiki: an LLM pre-compiles KB content into structured entries, and each query is matched quickly against entry titles and summaries. Hits return fast because they need no vector computation and no LLM synthesis.

Characteristics:

10.1.2 L2 Vector RAG: Semantic Retrieval

When L1 misses (query is novel or requires depth), the system falls back to L2 traditional vector RAG:

L2 is strong on depth and novelty: it finds details not yet compiled into the Wiki. Its weaknesses are latency and token cost.

10.1.3 L1 + L2 Hit Indicator

The central RAG returns a from_wiki field in /ask responses:

{
  "status": "success",
  "data": {
    "answer": "...",
    "from_wiki": true,     ← L1 hit
    "sources": [...],
    "response_time": 0.48  ← L1 is noticeably fast
  }
}

from_wiki: false means L2 fallback. PIF AI displays a small indicator icon so users (or SA) see which tier answered.
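The mapping from the from_wiki field to a tier indicator can be sketched as below. This is a minimal illustration, not PIF's actual UI code: the TierInfo type, the describe_tier function, and the label wording are all hypothetical names introduced here. A missing from_wiki is treated as an L2 answer, mirroring the .get("from_wiki", False) default used by the client excerpt in 10.5.1.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class TierInfo:
    """Hypothetical container for what a tier badge would need."""
    tier: str                      # "L1" or "L2"
    label: str                     # badge text (illustrative wording)
    response_time: float | None    # seconds, if the response reported it


def describe_tier(response: dict[str, Any]) -> TierInfo:
    """Map an /ask response body to the tier that answered it."""
    data = response.get("data", {})
    # Absent field is read as an L2 fallback, same default as the client.
    from_wiki = bool(data.get("from_wiki", False))
    return TierInfo(
        tier="L1" if from_wiki else "L2",
        label="Wiki (compiled)" if from_wiki else "Vector RAG (fallback)",
        response_time=data.get("response_time"),
    )


# Sample shaped like the response shown above (sources left empty here).
sample = {
    "status": "success",
    "data": {"answer": "...", "from_wiki": True, "sources": [], "response_time": 0.48},
}
info = describe_tier(sample)
print(info.tier, info.label, info.response_time)
```

Keeping the mapping in one small function means the frontend only needs the tier string, and the default for a missing field is decided in a single place.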

10.1.4 Architecture Diagram

flowchart TB
    Q["Query / ask"]
    L1{L1 Wiki<br/>hit?}
    L1A["L1 LLM Wiki<br/>compiled knowledge<br/>~0.5s"]
    L2["L2 Vector RAG<br/>chunk + embed + top-k<br/>~2-5s"]
    LLM["LLM synthesis"]
    R["Response<br/>{answer, from_wiki, sources}"]
    Q --> L1A
    L1A --> L1
    L1 -- yes --> R
    L1 -- no --> L2 --> LLM --> R

Figure 10.1: L1 Wiki retrieves first. On miss, L2 vector retrieval runs. Both layers share the same knowledge_base_id; L1 is a compiled cache of L2. PIF merely queries once; RAG automatically selects the tier.
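The control flow in Figure 10.1 can be illustrated with a toy sketch. This logic lives inside the central RAG service, not in PIF, and its real implementation is not described in this chapter; tiered_ask and the three stand-in callables (l1, l2, synth) are hypothetical names used only to show the order of operations.

```python
from __future__ import annotations

from typing import Callable, Optional


def tiered_ask(
    question: str,
    l1_lookup: Callable[[str], Optional[str]],
    l2_retrieve: Callable[[str], list[str]],
    synthesize: Callable[[str, list[str]], str],
) -> dict:
    """Try the compiled L1 tier first; fall back to L2 retrieval plus synthesis."""
    hit = l1_lookup(question)
    if hit is not None:
        # L1 hit: return the compiled entry, skipping retrieval and synthesis.
        return {"answer": hit, "from_wiki": True, "sources": []}
    chunks = l2_retrieve(question)
    return {
        "answer": synthesize(question, chunks),
        "from_wiki": False,
        "sources": chunks,
    }


# Toy stand-ins: a dict plays the compiled wiki, a fixed list plays top-k chunks.
wiki = {"what is a pif": "compiled wiki entry"}
l1 = lambda q: wiki.get(q.strip().lower())
l2 = lambda q: ["chunk-1", "chunk-2"]
synth = lambda q, chunks: f"synthesized from {len(chunks)} chunks"

print(tiered_ask("What is a PIF", l1, l2, synth)["from_wiki"])
print(tiered_ask("Something novel", l1, l2, synth)["answer"])
```

The caller issues one call and reads from_wiki afterwards, which is the same contract PIF relies on when it queries /ask once.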

10.2 Why Central RAG Over Self-Hosted

10.2.1 Avoiding Self-Host

| Self-host need | Central RAG provides |
| --- | --- |
| Vector DB (Weaviate / Qdrant / pgvector) | ✅ Managed |
| Chunking strategy tuning | ✅ Already optimized |
| Embedding model management + versioning | ✅ Centralized |
| L1 Wiki compilation pipeline | ✅ Auto-compiled |
| RAG quality evaluation | ✅ Built up via sibling projects |
| Regulatory document intake + updates | ✅ Handled by compliance team centrally |

PIF AI focuses on the cosmetic domain; knowledge retrieval is delegated to specialists.

10.2.2 Synergy

10.3 Multi-Tenant Isolation: Scheme C+

PIF's isolation requirements:

"Different organizations (tenants) must be isolated from each other; different products within the same organization must also be isolated."

10.3.1 Three Candidate Schemes

| Scheme | Description | Pros | Cons |
| --- | --- | --- | --- |
| A | Single tenant, single KB for all data | Simplest | Zero tenant / zero product isolation |
| B | Per PIF org → 1 RAG tenant; per product → 1 KB | DB-level tenant isolation | RAG v2 has no tenant CRUD API, so not feasible |
| C+ (chosen) | Single tenant; per product → 1 KB + backend ACL gate | Feasible + application-level strict isolation | Depends on rigorous PIF backend filtering |

10.3.2 Scheme C+ Architecture

flowchart TB
    subgraph PIF["PIF Platform"]
        ORG1["Org A<br/>(brand)"]
        ORG2["Org B<br/>(brand)"]
        P1A["Product A1"]
        P2A["Product A2"]
        P3B["Product B1"]
        ORG1 --> P1A
        ORG1 --> P2A
        ORG2 --> P3B
    end
    subgraph RAG["RAG v2 (tenant = pif-prod)"]
        KB1["KB: pif_A_P1A"]
        KB2["KB: pif_A_P2A"]
        KB3["KB: pif_B_P3B"]
    end
    P1A -.rag_kb_id.-> KB1
    P2A -.rag_kb_id.-> KB2
    P3B -.rag_kb_id.-> KB3
    AC{PIF Backend ACL Gate}
    R["Any RAG call"]
    R --> AC
    AC -- "user.org_id ≠ product.org_id" --> DENY["403 Forbidden"]
    AC -- "pass" --> KB["Use products.rag_kb_id"]

Figure 10.2: The entire PIF platform is a single tenant (pif-prod) in RAG. Each PIF product gets a dedicated KB named pif_<org_id>_<product_id> with metadata {pif_org_id, pif_product_id}. Isolation is enforced not in RAG but at the PIF backend's ACL gate: every RAG call must first run a SQL lookup filtered by WHERE org_id = user.org_id AND id = product_id to retrieve products.rag_kb_id. The frontend never passes a raw kb_id.
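The ACL gate described in Figure 10.2 can be sketched as a single filtered lookup. This is a minimal illustration under stated assumptions: sqlite3 stands in for the production PostgreSQL database, the table is reduced to three columns, and resolve_kb_id and Forbidden are hypothetical names. Because the WHERE clause combines both conditions, a product owned by another organization and a product that does not exist are indistinguishable to the caller; this sketch denies both.

```python
from __future__ import annotations

import sqlite3


class Forbidden(Exception):
    """Stand-in for an HTTP 403 response."""


def resolve_kb_id(conn: sqlite3.Connection, *, user_org_id: str, product_id: str) -> str | None:
    """Return products.rag_kb_id only if the product belongs to the caller's org."""
    row = conn.execute(
        "SELECT rag_kb_id FROM products WHERE org_id = ? AND id = ?",
        (user_org_id, product_id),
    ).fetchone()
    if row is None:
        # Either the product is in another org or it does not exist.
        raise Forbidden("product not visible to this organization")
    return row[0]  # may be None if KB creation was skipped (fail-soft)


# Toy data mirroring Figure 10.2: Org A owns P1A, Org B owns P3B.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE products (id TEXT PRIMARY KEY, org_id TEXT NOT NULL, rag_kb_id TEXT)"
)
conn.executemany(
    "INSERT INTO products VALUES (?, ?, ?)",
    [("P1A", "A", "kb_a1"), ("P3B", "B", "kb_b1")],
)

print(resolve_kb_id(conn, user_org_id="A", product_id="P1A"))
try:
    resolve_kb_id(conn, user_org_id="A", product_id="P3B")
except Forbidden as exc:
    print("denied:", exc)
```

The kb_id handed to the RAG client always comes out of this lookup, so a request can never name a KB directly.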

10.3.3 Four Layers of Defense

Combined with §8's three-layer DB defense, PIF's isolation is now four-layered:

Request β†’ L1 FastAPI ACL  β†’  L2 PostgreSQL RLS  β†’  L3 DB CHECK  β†’  L4 RAG KB per-product
         (explicit WHERE)    (current_setting)      (enum CHECK)    (pif_<org>_<prod>)

Any one layer failing still leaves three intact.

10.4 Authentication: X-RAG-API-Key + X-Tenant-ID

10.4.1 Why Two Headers

Unlike the older A1 version (single X-API-Key), RAG v2 mandates two headers:

POST /api/v1/ask HTTP/1.1
Host: rag.baiyuan.io
Content-Type: application/json
X-RAG-API-Key: <secret>
X-Tenant-ID: <uuid>

{"question": "...", "knowledge_base_id": "kb_..."}

Rationale:

A request missing X-Tenant-ID returns HTTP 400 (not 401), which is easy to misdiagnose.
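One way to avoid that misdiagnosis is to validate both header values before sending and to attach a hint when a 400 comes back. The sketch below is illustrative only: build_rag_headers and hint_for_status are hypothetical helpers, and the hint text is an assumption about what would be useful in a log line, not documented RAG v2 behavior beyond the 400 noted above.

```python
from __future__ import annotations


def build_rag_headers(api_key: str, tenant_id: str) -> dict[str, str]:
    """Build the two-header auth set, failing early if either value is blank."""
    api_key, tenant_id = api_key.strip(), tenant_id.strip()
    missing = [
        name
        for name, value in (("X-RAG-API-Key", api_key), ("X-Tenant-ID", tenant_id))
        if not value
    ]
    if missing:
        raise ValueError(f"missing RAG header value(s): {', '.join(missing)}")
    return {
        "Content-Type": "application/json",
        "X-RAG-API-Key": api_key,
        "X-Tenant-ID": tenant_id,
    }


def hint_for_status(status_code: int, sent_headers: dict[str, str]) -> str:
    """Return a log-friendly hint for a failed call (wording is illustrative)."""
    if status_code == 400 and not sent_headers.get("X-Tenant-ID", "").strip():
        return "HTTP 400 and no X-Tenant-ID was sent: check the tenant header first"
    return f"HTTP {status_code}"


headers = build_rag_headers(" secret ", " tenant-uuid ")
print(sorted(headers))
print(hint_for_status(400, {"X-RAG-API-Key": "secret"}))
```

Failing locally on a blank tenant value turns an ambiguous remote 400 into an explicit configuration error.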

10.4.2 PIF Credential Management

# /home/baiyuan/pif/.env (from env var, not committed)
RAG_API_BASE=https://rag.baiyuan.io
RAG_API_KEY=<secret>
RAG_TENANT_ID=<uuid-for-pif>
RAG_TIMEOUT_SECONDS=20
RAG_KB_NAME_PREFIX=pif

Loaded at FastAPI startup via pydantic_settings. Accessed as settings.RAG_API_KEY. Keys never enter git: .gitignore excludes .env; production uses Secret Manager.
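The client excerpt in 10.5.1 calls _is_configured() and _kb_name(org_id, product_id) without showing them. The sketch below suggests what such helpers might look like. It is an assumption, not the project's code: a stdlib dataclass stands in for the pydantic_settings class, the helpers take the settings object explicitly, and which variables count as "configured" is a guess.

```python
from __future__ import annotations

import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class RagSettings:
    """Stdlib stand-in for the pydantic_settings class (hypothetical shape)."""
    RAG_API_BASE: str = ""
    RAG_API_KEY: str = ""
    RAG_TENANT_ID: str = ""
    RAG_KB_NAME_PREFIX: str = "pif"

    @classmethod
    def from_env(cls, env: Mapping[str, str] | None = None) -> "RagSettings":
        env = os.environ if env is None else env
        return cls(
            RAG_API_BASE=env.get("RAG_API_BASE", ""),
            RAG_API_KEY=env.get("RAG_API_KEY", ""),
            RAG_TENANT_ID=env.get("RAG_TENANT_ID", ""),
            RAG_KB_NAME_PREFIX=env.get("RAG_KB_NAME_PREFIX", "pif"),
        )


def is_configured(s: RagSettings) -> bool:
    """Assumed rule: base URL, API key, and tenant ID must all be non-blank."""
    return all(v.strip() for v in (s.RAG_API_BASE, s.RAG_API_KEY, s.RAG_TENANT_ID))


def kb_name(s: RagSettings, org_id: object, product_id: object) -> str:
    """Compose the KB name in the pif_<org_id>_<product_id> pattern."""
    return f"{s.RAG_KB_NAME_PREFIX}_{org_id}_{product_id}"


demo = RagSettings.from_env(
    {"RAG_API_BASE": "https://rag.baiyuan.io", "RAG_API_KEY": "k", "RAG_TENANT_ID": "t"}
)
print(is_configured(demo), kb_name(demo, "A", "P1A"))
```

Passing a plain mapping to from_env keeps the helpers testable without touching the real environment.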

10.5 Client Implementation

10.5.1 RagClient Architecture

# app/services/rag_client.py (excerpt)
class RagClient:
    _shared_client: httpx.AsyncClient | None = None

    @classmethod
    def _get_client(cls) -> httpx.AsyncClient:
        if cls._shared_client is None:
            cls._shared_client = httpx.AsyncClient(
                base_url=settings.RAG_API_BASE.rstrip("/"),
                timeout=httpx.Timeout(...),
                limits=httpx.Limits(max_keepalive_connections=10, max_connections=20),
            )
        return cls._shared_client

    @staticmethod
    def _headers() -> dict[str, str]:
        if not _is_configured():
            raise RagNotConfiguredError(...)
        return {
            "Content-Type": "application/json",
            "X-RAG-API-Key": settings.RAG_API_KEY.strip(),
            "X-Tenant-ID": settings.RAG_TENANT_ID.strip(),
        }

    @classmethod
    async def create_knowledge_base(
        cls, *, org_id, product_id, product_name=None
    ) -> KnowledgeBase:
        payload = {
            "name": _kb_name(org_id, product_id),  # pif_<org>_<prod>
            "metadata": {
                "pif_org_id": str(org_id),
                "pif_product_id": str(product_id),
                "pif_product_name": product_name or "",
                "source": "pif-ai",
            },
        }
        body = await cls._request("POST", "/api/v1/knowledge-bases", json=payload)
        return KnowledgeBase(id=body["data"]["id"], ...)

    @classmethod
    async def ask(cls, *, question: str, kb_id: str, ...) -> AskResult:
        if not (kb_id or "").strip():
            raise RagServiceError("kb_id required: PIF ACL must resolve it")
        payload = {
            "question": question.strip(),
            "knowledge_base_id": kb_id,
        }
        body = await cls._request("POST", "/api/v1/ask", json=payload)
        return AskResult(
            answer=body["data"]["answer"],
            sources=body["data"]["sources"],
            from_wiki=body["data"].get("from_wiki", False),  # L1 hit?
            raw=body["data"],
        )

10.5.2 Fail-Soft Wrapper

RagClient.create_knowledge_base(...) raises RagServiceError on failure. Product creation should not fail due to RAG, so safe_create_kb wraps:

async def safe_create_kb(*, org_id, product_id, product_name=None) -> str | None:
    """Attempt to create KB; return kb_id or None on failure (fail-soft)."""
    if not _is_configured():
        logger.info("RAG not configured, skipping KB creation for %s", product_id)
        return None
    try:
        kb = await RagClient.create_knowledge_base(
            org_id=org_id, product_id=product_id, product_name=product_name
        )
        return kb.id
    except RagServiceError as e:
        logger.warning("RAG create_kb failed for %s: %s", product_id, e)
        return None  # Product is still created; rag_kb_id=NULL

10.5.3 Products API Integration

# app/api/v1/products.py (excerpt)
@router.post("", response_model=ProductResponse, status_code=201)
async def create_product(...):
    # ... Create product local record ...
    product = Product(org_id=current_user.org_id, **payload.model_dump())
    db.add(product)
    await db.commit()
    await db.refresh(product)
    await initialize_pif_documents(product.id, db)

    # RAG KB creation (fail-soft)
    kb_id = await safe_create_kb(
        org_id=product.org_id,
        product_id=product.id,
        product_name=product.name,
    )
    if kb_id:
        product.rag_kb_id = kb_id
        await db.commit()
    return product

Deletion mirrors creation: capture rag_kb_id → delete locally → asynchronously delete the KB.

10.6 Testing: 16 Unit Tests All Passing

tests/test_rag_client.py uses httpx.MockTransport to verify:

  1. Both headers are correctly emitted
  2. KB naming matches pif_<org>_<prod>
  3. Metadata includes pif_org_id, pif_product_id, source=pif-ai
  4. Non-2xx raises RagServiceError with status_code
  5. 404 on delete is treated as already-gone (idempotent)
  6. ask rejects empty kb_id (ACL upstream responsibility)
  7. from_wiki field is correctly parsed
  8. When secrets not configured, safe_* are no-ops
  9. External errors are swallowed by safe_*, returning None

Measured: docker exec pif-backend-1 python -m pytest tests/test_rag_client.py -q completes 16 tests in 1.09s on 2026-04-19.

10.7 Future Extensions

In Phase 2 / Phase 3:

📝 Revision History

| Version | Date | Summary |
| --- | --- | --- |
| v0.1 | 2026-04-19 | First draft. L1 Wiki + L2 vector RAG, Scheme C+, dual-header auth, fail-soft, 16 unit tests |

© 2026 Baiyuan Tech. Licensed under CC BY-NC 4.0.
