PIF AI does not reinvent knowledge retrieval. Instead, it integrates the sibling Baiyuan central RAG v2 (rag.baiyuan.io), a knowledge service built on a dual-layer retrieval architecture: an L1 LLM Wiki plus L2 vector RAG. This is one of the most important chapters. It covers:

- why Scheme C+ was chosen for isolation;
- how "1 product = 1 KB" implements dual isolation;
- what the dual-header auth means;
- how fail-soft behavior is implemented in practice.
- tenant_id; one KB per product + backend ACL gate
- X-RAG-API-Key + X-Tenant-ID (unlike the older A1)
- rag_kb_id stays NULL for back-fill
- app/services/rag_client.py + 16 unit tests, all passing

Wiki L1 is the first retrieval tier. Conceptually it is a compiled wiki: an LLM pre-compiles KB content into structured entries, and each query is matched quickly against entry titles and summaries. Hits return fast, with no vector computation and no LLM synthesis.
Characteristics:

- /knowledge-bases/{id}/wiki/compile

When L1 misses (the query is novel or requires depth), the system falls back to L2 traditional vector RAG:
L2 is strong on depth and novelty, finding details that have not yet been compiled into the Wiki. Its weaknesses are latency and token cost.
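The central RAG makes the tier decision internally, so PIF never implements it. The toy sketch below only illustrates the two-tier idea. Every part of it is an assumption:

- WikiEntry and two_tier_ask are hypothetical names.
- The L1 match is a naive rule: all query words must appear in the entry's title plus summary.
- The L2 "embedding" is a bag-of-words vector ranked by cosine similarity.
- There is no LLM synthesis step.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass


@dataclass
class WikiEntry:
    # A hypothetical compiled L1 entry: title + summary are what queries match.
    title: str
    summary: str
    answer: str


def tokens(text: str) -> list:
    # Lowercase word tokens; a crude stand-in for real tokenization.
    return re.findall(r"[a-z0-9]+", text.lower())


def embed(text: str) -> Counter:
    # Toy "embedding": a bag-of-words term-frequency vector.
    return Counter(tokens(text))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def two_tier_ask(question: str, wiki: list, chunks: list, k: int = 2) -> dict:
    q = set(tokens(question))
    # L1: cheap lexical match against compiled titles and summaries (no vectors).
    for entry in wiki:
        if q and q <= set(tokens(entry.title + " " + entry.summary)):
            return {"answer": entry.answer, "from_wiki": True, "sources": [entry.title]}
    # L2: rank raw chunks by cosine similarity and keep the top k.
    qv = embed(question)
    top = sorted(chunks, key=lambda c: cosine(qv, embed(c)), reverse=True)[:k]
    # A real system would pass `top` to an LLM for synthesis; here they are just joined.
    return {"answer": " ".join(top), "from_wiki": False, "sources": top}
```

A query whose words are all covered by a compiled entry returns from_wiki=True. Any other query falls through to L2 ranking, which mirrors the flag the real service reports.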
The central RAG returns a from_wiki field in /ask responses:
```jsonc
{
  "status": "success",
  "data": {
    "answer": "...",
    "from_wiki": true,       // ← L1 hit
    "sources": [...],
    "response_time": 0.48    // ← L1 is noticeably fast
  }
}
```
from_wiki: false means the answer came from the L2 fallback. PIF AI displays a small indicator icon so users (or the SA) can see which tier answered.
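On the PIF side, the indicator icon only needs the from_wiki flag. Below is a minimal parsing sketch. It assumes the envelope shape shown above and treats any status other than "success" as a failure. AskView, parse_ask_response and tier_label are hypothetical names; the real client returns AskResult.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AskView:
    answer: str
    from_wiki: bool
    sources: list
    response_time: Optional[float]


def parse_ask_response(raw: str) -> AskView:
    body = json.loads(raw)
    # Assumption: anything other than "success" means the call failed.
    if body.get("status") != "success":
        raise ValueError(f"RAG /ask failed: {body.get('status')!r}")
    data = body["data"]
    return AskView(
        answer=data["answer"],
        # A missing flag is treated as an L2 answer, like RagClient.ask's default.
        from_wiki=bool(data.get("from_wiki", False)),
        sources=data.get("sources", []),
        response_time=data.get("response_time"),
    )


def tier_label(view: AskView) -> str:
    # Hypothetical text for the indicator icon.
    return "L1 Wiki" if view.from_wiki else "L2 Vector RAG"
```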
```mermaid
flowchart TB
    Q["Query / ask"]
    L1{L1 Wiki<br/>hit?}
    L1A["L1 LLM Wiki<br/>compiled knowledge<br/>~0.5s"]
    L2["L2 Vector RAG<br/>chunk + embed + top-k<br/>~2-5s"]
    LLM["LLM synthesis"]
    R["Response<br/>{answer, from_wiki, sources}"]
    Q --> L1A
    L1A --> L1
    L1 -- yes --> R
    L1 -- no --> L2 --> LLM --> R
```
Figure 10.1: L1 Wiki retrieval runs first; on a miss, L2 vector retrieval runs. Both layers share the same knowledge_base_id, and L1 acts as a compiled cache over L2. PIF issues a single query, and the RAG service selects the tier automatically.
| Self-host need | Central RAG provides |
|---|---|
| Vector DB (Weaviate / Qdrant / pgvector) | ✓ Managed |
| Chunking strategy tuning | ✓ Already optimized |
| Embedding model management + versioning | ✓ Centralized |
| L1 Wiki compilation pipeline | ✓ Auto-compiled |
| RAG quality evaluation | ✓ Built up via sibling projects |
| Regulatory document intake + updates | ✓ Handled by compliance team centrally |
PIF AI focuses on the cosmetic domain; knowledge retrieval is delegated to specialists.
PIF's isolation requirements:
“Different organizations (tenants) must be isolated from each other; different products within the same organization must also be isolated.”
| Scheme | Description | Pros | Cons |
|---|---|---|---|
| A | Single tenant, single KB for all data | Simplest | Zero tenant / zero product isolation |
| B | Per PIF org → 1 RAG tenant; per product → 1 KB | DB-level tenant isolation | RAG v2 has no tenant CRUD API, so not feasible |
| C+ (chosen) | Single tenant; per product → 1 KB + backend ACL gate | Feasible + strict application-level isolation | Depends on rigorous PIF backend filtering |
```mermaid
flowchart TB
    subgraph PIF["PIF Platform"]
        ORG1["Org A<br/>(brand)"]
        ORG2["Org B<br/>(brand)"]
        P1A["Product A1"]
        P2A["Product A2"]
        P3B["Product B1"]
        ORG1 --> P1A
        ORG1 --> P2A
        ORG2 --> P3B
    end
    subgraph RAG["RAG v2 (tenant = pif-prod)"]
        KB1["KB: pif_A_P1A"]
        KB2["KB: pif_A_P2A"]
        KB3["KB: pif_B_P3B"]
    end
    P1A -.rag_kb_id.-> KB1
    P2A -.rag_kb_id.-> KB2
    P3B -.rag_kb_id.-> KB3
    AC{PIF Backend ACL Gate}
    R["Any RAG call"]
    R --> AC
    AC -- "user.org_id ≠ product.org_id" --> DENY["403 Forbidden"]
    AC -- "pass" --> KB["Use products.rag_kb_id"]
```
Figure 10.2: The entire PIF platform is a single tenant (pif-prod) in RAG. Each PIF product gets a dedicated KB named pif_<org_id>_<product_id> with metadata {pif_org_id, pif_product_id}. Isolation is enforced at the PIF backend's ACL gate, not in RAG. Every RAG call must first resolve products.rag_kb_id through the SQL filter WHERE org_id = user.org_id AND id = product_id. The frontend never passes a raw kb_id.
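The ACL gate can be sketched as a single query whose WHERE clause carries the org filter. This is a minimal illustration, not PIF's code. sqlite3 stands in for PostgreSQL, and the products table layout, Forbidden and resolve_kb_id are all hypothetical.

```python
import sqlite3
from typing import Optional


class Forbidden(Exception):
    """Hypothetical stand-in for an HTTP 403 response."""

    status_code = 403


def resolve_kb_id(conn: sqlite3.Connection, user_org_id: str, product_id: str) -> Optional[str]:
    # The org filter is part of the WHERE clause, so another org's product
    # is indistinguishable from a missing one.
    row = conn.execute(
        "SELECT rag_kb_id FROM products WHERE org_id = ? AND id = ?",
        (user_org_id, product_id),
    ).fetchone()
    if row is None:
        raise Forbidden(f"product {product_id} not visible to org {user_org_id}")
    # May be None when fail-soft KB creation left rag_kb_id NULL.
    return row[0]
```

Because the org filter lives in the WHERE clause, the gate never reveals which product IDs exist in other orgs. A NULL rag_kb_id is returned as None and left to the caller. Whether PIF answers 403 or 404 for a product that truly does not exist is not specified here.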
Combined with §8's three-layer DB defense, PIF's isolation is now four-layered:

```text
Request → L1 FastAPI ACL → L2 PostgreSQL RLS → L3 DB CHECK → L4 RAG KB per-product
          (explicit WHERE) (current_setting)   (enum CHECK)  (pif_<org>_<prod>)
```

If any single layer fails, the other three still hold.
Unlike the older A1 version (single X-API-Key), RAG v2 mandates two headers:
```http
POST /api/v1/ask HTTP/1.1
Host: rag.baiyuan.io
Content-Type: application/json
X-RAG-API-Key: <secret>
X-Tenant-ID: <uuid>

{"question": "...", "knowledge_base_id": "kb_..."}
```
Rationale:

- X-RAG-API-Key handles authentication (is this a valid client?).
- X-Tenant-ID handles tenant routing (quota, KB visibility scope).
- A missing X-Tenant-ID returns HTTP 400, not 401, which is easy to misdiagnose.
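Below is a small sketch that builds the two headers and turns the two status codes into actionable hints. RagHeadersMissing, build_rag_headers and diagnose are hypothetical names. The hint text assumes a 400 from /ask usually means a missing tenant header, although a malformed body could also produce a 400.

```python
class RagHeadersMissing(Exception):
    """Hypothetical error raised before sending a request that would be rejected."""


def build_rag_headers(api_key: str, tenant_id: str) -> dict:
    key, tenant = api_key.strip(), tenant_id.strip()
    # Fail locally instead of letting the server answer 401 or 400.
    if not key or not tenant:
        raise RagHeadersMissing("both X-RAG-API-Key and X-Tenant-ID are required")
    return {
        "Content-Type": "application/json",
        "X-RAG-API-Key": key,    # authentication
        "X-Tenant-ID": tenant,   # tenant routing
    }


def diagnose(status_code: int) -> str:
    # Map the two easily confused codes to the header most likely at fault.
    hints = {
        400: "Bad request: check that X-Tenant-ID is present (then check the body)",
        401: "Unauthorized: check X-RAG-API-Key",
    }
    return hints.get(status_code, f"Unexpected status {status_code}")
```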
```bash
# /home/baiyuan/pif/.env (from env var, not committed)
RAG_API_BASE=https://rag.baiyuan.io
RAG_API_KEY=<secret>
RAG_TENANT_ID=<uuid-for-pif>
RAG_TIMEOUT_SECONDS=20
RAG_KB_NAME_PREFIX=pif
```
These values are loaded at FastAPI startup via pydantic_settings and accessed as settings.RAG_API_KEY. Keys never enter git: .gitignore excludes .env, and production uses Secret Manager.
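pydantic_settings reads .env, coerces types and exposes the result as settings. The dependency-free stand-in below mimics that behavior for illustration only. RagSettings and load_rag_settings are assumptions, and so is the is_configured rule (API base, key and tenant all non-blank), since the excerpted client calls _is_configured() without showing its body.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class RagSettings:
    RAG_API_BASE: str = ""
    RAG_API_KEY: str = ""
    RAG_TENANT_ID: str = ""
    RAG_TIMEOUT_SECONDS: float = 20.0
    RAG_KB_NAME_PREFIX: str = "pif"

    def is_configured(self) -> bool:
        # Assumed rule: all three connection values must be non-blank.
        return all(v.strip() for v in (self.RAG_API_BASE, self.RAG_API_KEY, self.RAG_TENANT_ID))


def load_rag_settings(env: Optional[Mapping[str, str]] = None) -> RagSettings:
    # Read from the process environment unless a mapping is supplied (e.g. in tests).
    env = os.environ if env is None else env
    return RagSettings(
        RAG_API_BASE=env.get("RAG_API_BASE", ""),
        RAG_API_KEY=env.get("RAG_API_KEY", ""),
        RAG_TENANT_ID=env.get("RAG_TENANT_ID", ""),
        # Environment values are strings; coerce like pydantic_settings would.
        RAG_TIMEOUT_SECONDS=float(env.get("RAG_TIMEOUT_SECONDS", "20")),
        RAG_KB_NAME_PREFIX=env.get("RAG_KB_NAME_PREFIX", "pif"),
    )
```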
```python
# app/services/rag_client.py (excerpt)
import httpx

# settings, _is_configured, _kb_name, KnowledgeBase, AskResult,
# RagNotConfiguredError and RagServiceError are defined elsewhere in this
# module; RagClient._request is omitted from the excerpt.


class RagClient:
    # Lazily created AsyncClient shared by every call, so connections are pooled.
    _shared_client: httpx.AsyncClient | None = None

    @classmethod
    def _get_client(cls) -> httpx.AsyncClient:
        if cls._shared_client is None:
            cls._shared_client = httpx.AsyncClient(
                base_url=settings.RAG_API_BASE.rstrip("/"),
                timeout=httpx.Timeout(...),
                limits=httpx.Limits(max_keepalive_connections=10, max_connections=20),
            )
        return cls._shared_client

    @staticmethod
    def _headers() -> dict[str, str]:
        # Refuse to build headers if the RAG settings are incomplete.
        if not _is_configured():
            raise RagNotConfiguredError(...)
        # RAG v2 requires both headers: API key (auth) + tenant ID (routing).
        return {
            "Content-Type": "application/json",
            "X-RAG-API-Key": settings.RAG_API_KEY.strip(),
            "X-Tenant-ID": settings.RAG_TENANT_ID.strip(),
        }

    @classmethod
    async def create_knowledge_base(
        cls, *, org_id, product_id, product_name=None
    ) -> KnowledgeBase:
        payload = {
            "name": _kb_name(org_id, product_id),  # pif_<org>_<prod>
            "metadata": {
                "pif_org_id": str(org_id),
                "pif_product_id": str(product_id),
                "pif_product_name": product_name or "",
                "source": "pif-ai",
            },
        }
        body = await cls._request("POST", "/api/v1/knowledge-bases", json=payload)
        return KnowledgeBase(id=body["data"]["id"], ...)

    @classmethod
    async def ask(cls, *, question: str, kb_id: str, ...) -> AskResult:
        # kb_id must come from the PIF ACL gate (products.rag_kb_id), never the client.
        if not (kb_id or "").strip():
            raise RagServiceError("kb_id required: PIF ACL must resolve it")
        payload = {
            "question": question.strip(),
            "knowledge_base_id": kb_id,
        }
        body = await cls._request("POST", "/api/v1/ask", json=payload)
        return AskResult(
            answer=body["data"]["answer"],
            sources=body["data"]["sources"],
            from_wiki=body["data"].get("from_wiki", False),  # L1 hit?
            raw=body["data"],
        )
```
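The excerpt calls _kb_name(org_id, product_id) without showing it. Below is a plausible sketch consistent with the documented pif_<org_id>_<product_id> convention, with the prefix mirroring RAG_KB_NAME_PREFIX. The function is a hypothetical reconstruction, not the module's actual code.

```python
import uuid
from typing import Union

Id = Union[str, uuid.UUID]


def kb_name(org_id: Id, product_id: Id, prefix: str = "pif") -> str:
    """Build the per-product KB name <prefix>_<org_id>_<product_id>.

    Hypothetical stand-in for _kb_name(); UUIDs are rendered in their
    standard hyphenated string form by the f-string.
    """
    return f"{prefix}_{org_id}_{product_id}"
```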
RagClient.create_knowledge_base(...) raises RagServiceError on failure. Product creation should not fail because of RAG, so safe_create_kb wraps the call:
```python
async def safe_create_kb(*, org_id, product_id, product_name=None) -> str | None:
    """Attempt to create a KB; return its kb_id, or None on failure (fail-soft)."""
    if not _is_configured():
        logger.info("RAG not configured; skipping KB creation for %s", product_id)
        return None
    try:
        kb = await RagClient.create_knowledge_base(
            org_id=org_id, product_id=product_id, product_name=product_name
        )
        return kb.id
    except RagServiceError as e:
        logger.warning("RAG create_kb failed for %s: %s", product_id, e)
        return None  # Product is still created; rag_kb_id=NULL
```
```python
# app/api/v1/products.py (excerpt)
@router.post("", response_model=ProductResponse, status_code=201)
async def create_product(...):
    # Create the local product record first; it must not depend on RAG.
    product = Product(org_id=current_user.org_id, **payload.model_dump())
    db.add(product)
    await db.commit()
    await db.refresh(product)
    await initialize_pif_documents(product.id, db)

    # RAG KB creation (fail-soft): on failure, rag_kb_id stays NULL for back-fill.
    kb_id = await safe_create_kb(
        org_id=product.org_id,
        product_id=product.id,
        product_name=product.name,
    )
    if kb_id:
        product.rag_kb_id = kb_id
        await db.commit()
        # Refresh so the returned object reflects the committed row
        # (needed if the session expires attributes on commit).
        await db.refresh(product)
    return product
```
Deletion mirrors this flow: capture rag_kb_id, delete the local record, then delete the KB asynchronously.
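The delete flow can be sketched in the same fail-soft style. The three callables are hypothetical hooks standing in for PIF's own DB and RagClient code; the real delete endpoint is not shown here. The KB deletion runs as a background asyncio task whose failures are logged rather than raised.

```python
import asyncio
import logging
from typing import Awaitable, Callable, Optional

logger = logging.getLogger("pif.rag")

# Strong references keep fire-and-forget tasks from being garbage-collected early.
_background_tasks: set = set()


async def delete_product(
    product_id: str,
    load_kb_id: Callable[[str], Awaitable[Optional[str]]],
    delete_local: Callable[[str], Awaitable[None]],
    delete_kb: Callable[[str], Awaitable[None]],
) -> Optional[asyncio.Task]:
    # 1. Capture rag_kb_id before the row (and with it the reference) disappears.
    kb_id = await load_kb_id(product_id)
    # 2. Delete the local record; this is the part that must succeed.
    await delete_local(product_id)
    if not kb_id:
        return None  # No KB was ever created (fail-soft), so nothing to clean up.

    async def _cleanup() -> None:
        try:
            await delete_kb(kb_id)
        except Exception as exc:  # fail-soft: log an orphaned KB, never raise
            logger.warning("RAG delete_kb failed for %s: %s", kb_id, exc)

    # 3. Delete the remote KB asynchronously.
    task = asyncio.create_task(_cleanup())
    _background_tasks.add(task)
    task.add_done_callback(_background_tasks.discard)
    return task
```

FastAPI's BackgroundTasks is an alternative to asyncio.create_task here. Either way, a failed remote delete leaves an orphaned KB to clean up later rather than failing the user's request.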
tests/test_rag_client.py uses httpx.MockTransport to verify:

- KB naming follows pif_<org>_<prod>
- KB metadata includes pif_org_id, pif_product_id, source=pif-ai
- RagServiceError with status_code
- ask rejects an empty kb_id (resolving it is the upstream ACL's responsibility)
- the from_wiki field is correctly parsed
- when RAG is not configured, safe_* are no-ops
- RagServiceError is caught by safe_*, returning None

Measured on 2026-04-19: docker exec pif-backend-1 python -m pytest tests/test_rag_client.py -q completes all 16 tests in 1.09 s.
In Phase 2 / Phase 3:

kb_id); AI generation prefers these

| Version | Date | Summary |
|---|---|---|
| v0.1 | 2026-04-19 | First draft. L1 Wiki + L2 vector RAG, Scheme C+, dual-header auth, fail-soft, 16 unit tests |
© 2026 Baiyuan Tech. Licensed under CC BY-NC 4.0.
Nav: ← Chapter 9: Toxicology Pipeline · Chapter 11: Security Model →