This chapter opens with a one-sentence abstract, then unfolds PIF AI’s four design propositions, gives a five-layer system overview, and enumerates the four academic and engineering contributions of this whitepaper. After reading this chapter, you should be able to explain — in under three minutes — “what PIF AI is, why it exists, and how it works.”
PIF AI is a multi-tenant SaaS that compresses Taiwan-regulated cosmetic PIF documentation from 4–8 weeks to 3–5 business days via AI document extraction, cross-query against international toxicology databases, real-time regulatory matching against Taiwan’s TFDA requirements, and an online Safety Assessor (SA) review workflow with electronic signature.
This sentence compresses three layers of information: user pain (4–8 weeks is too slow), technical solution (AI extraction + toxicology lookup + regulatory matching), and organizational compliance (SA sign-off is legally required). Chapter 2 goes deep on the regulatory background; Chapter 3 dissects the 16 items; subsequent chapters explore the technical implementation.
Taiwan’s Cosmetic Hygiene and Safety Act (the “Act”) took phased effect from 2019; the final transition window ends on July 1, 2026. After that date:
Taiwan’s cosmetic sector (brand owners, contract manufacturers, importers, consultants/testing labs) is estimated to exceed 8,000 entities. Per TFDA registration data and public statistics from the Taiwan Cosmetic Industry Association (TCIIA), more than 100,000 SKUs are on the market [1].
Four major cost drivers (order-of-magnitude estimates):
| Cost item | Share | Why |
|---|---|---|
| Toxicology lookup & translation | 35% | Each ingredient requires querying PubChem, translating SDS, cross-checking TFDA lists |
| SA professional fees | 35% | Item 16 safety assessment must be signed by a qualified SA |
| Document assembly & regulatory cross-reference | 20% | 16 items scattered across multiple source documents |
| Coordination & admin | 10% | Gathering formulations, test reports across departments |
[!IMPORTANT] These percentages are industry observations, not figures established by formal peer-reviewed study. See Appendix C for the methodological note.
Large brands have in-house regulatory teams. Small businesses face a triple bind: time they cannot afford, costs they cannot bear, expertise they cannot find. This is precisely the SaaS-AI market opportunity — make professional knowledge affordable and accessible so SMEs can meet compliance within reasonable cost.
The 16 PIF items are largely a cross-document structured-information assembly problem: formulation (Excel), GMP certificate (PDF), test reports (various), regulatory lists (HTML/PDF), toxicology databases (JSON API). The bottleneck is data alignment and verification, not “writing ability.”
This is exactly what LLM Tool Use excels at: the LLM acts as a coordinator calling structured tools rather than stuffing all computation into its token context.
Evidence:
`app/ai/toxicology_engine.py` uses Claude’s Tool Use pattern. The LLM invokes function signatures like `pubchem.query`, `tfda.check_restricted`, and `db.lookup_inci` to obtain structured results rather than free-form generation. See §7 for details.
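The coordinator idea can be sketched without any SDK: the model emits a tool name plus arguments, and the application routes that request to a plain function. The sketch below is a minimal illustration under stated assumptions. The three tool names come from the text; the handler bodies, their parameters, and the `dispatch` helper are hypothetical stand-ins, not the contents of `app/ai/toxicology_engine.py`.

```python
import json
from typing import Any, Callable


# Hypothetical stub handlers. Real ones would query PubChem, the TFDA
# list mirror, and the INCI dictionary table respectively.
def pubchem_query(name: str) -> dict[str, Any]:
    return {"name": name, "source": "pubchem", "found": True}


def tfda_check_restricted(name: str) -> dict[str, Any]:
    return {"name": name, "restricted": False}


def db_lookup_inci(name: str) -> dict[str, Any]:
    return {"input": name, "inci": name.upper()}


# Registry keyed by the tool names mentioned in the text.
TOOLS: dict[str, Callable[..., dict[str, Any]]] = {
    "pubchem.query": pubchem_query,
    "tfda.check_restricted": tfda_check_restricted,
    "db.lookup_inci": db_lookup_inci,
}


def dispatch(tool_name: str, arguments: dict[str, Any]) -> str:
    """Run one tool call requested by the model; return a JSON string."""
    handler = TOOLS.get(tool_name)
    if handler is None:
        return json.dumps({"error": f"unknown tool: {tool_name}"})
    try:
        return json.dumps(handler(**arguments))
    except TypeError as exc:  # wrong or missing arguments
        return json.dumps({"error": str(exc)})
```

Returning errors as structured JSON instead of raising lets the model see the failure and correct its next call, which fits the coordinator role described above.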
Every AI output is labeled as a reference draft; the final professional judgment rests with a qualified SA signature. This is not a disclaimer — it is a design principle:
At the database layer this materializes as the pif_documents.status state machine: missing → uploaded → ai_processing → ai_draft_ready → human_reviewed → approved. AI never marks a document approved; that privilege belongs exclusively to the SA.
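A minimal sketch of how such a state machine might be enforced in application code follows. The six status values are taken from the text; the `advance` helper and the role strings (`"sa"`, `"ai"`) are assumptions made for illustration, and the actual schema may enforce this differently (for example with database constraints).

```python
from enum import Enum


class Status(str, Enum):
    MISSING = "missing"
    UPLOADED = "uploaded"
    AI_PROCESSING = "ai_processing"
    AI_DRAFT_READY = "ai_draft_ready"
    HUMAN_REVIEWED = "human_reviewed"
    APPROVED = "approved"


# Forward transitions in the order listed in the text.
NEXT = {
    Status.MISSING: Status.UPLOADED,
    Status.UPLOADED: Status.AI_PROCESSING,
    Status.AI_PROCESSING: Status.AI_DRAFT_READY,
    Status.AI_DRAFT_READY: Status.HUMAN_REVIEWED,
    Status.HUMAN_REVIEWED: Status.APPROVED,
}


def advance(current: Status, target: Status, actor_role: str) -> Status:
    """Validate one transition; only the (hypothetical) 'sa' role may approve."""
    if NEXT.get(current) is not target:
        raise ValueError(f"illegal transition: {current.value} -> {target.value}")
    if target is Status.APPROVED and actor_role != "sa":
        raise PermissionError("only a Safety Assessor may approve")
    return target
```

For example, `advance(Status.HUMAN_REVIEWED, Status.APPROVED, "ai")` raises `PermissionError`, while the same call with `"sa"` returns `Status.APPROVED`.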
Cosmetic formulations are trade secrets. Brand A’s formulation must never leak to Brand B. PIF AI provides three layers of isolation:
```mermaid
flowchart LR
    R["HTTP Request"]
    L1["Layer 1<br/>FastAPI ACL<br/>org_id filter"]
    L2["Layer 2<br/>PostgreSQL<br/>Row-Level Security"]
    L3["Layer 3<br/>RAG KB<br/>per-product"]
    D[("Data")]
    R --> L1 --> L2 --> L3 --> D
    L1 -. verify JWT<br/>inject org_id .-> S1["✓ user.org_id"]
    L2 -. current_setting<br/>RLS policy .-> S2["✓ WHERE org_id=?"]
    L3 -. ACL gate<br/>resolve kb_id .-> S3["✓ pif_org_X_prod_Y"]
```
Figure 1.1: Requests pass sequentially through three isolation layers. Layer 1 (FastAPI ACL) derives accessible org_id from the JWT. Layer 2 (PostgreSQL Row-Level Security) enforces filtering at the database — even an application-layer bug cannot return cross-tenant data. Layer 3 (central RAG with KB per product) scopes AI analysis queries to the current product’s dedicated KB. Defense in depth: the failure of any one layer does not compromise the whole. See §11 for the full threat model and §10 for the RAG isolation (Scheme C+).
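The Layer 1 to Layer 3 handoff in Figure 1.1 can be sketched as a single gate: a KB identifier is resolved only after the caller's organization has been checked against the product's owner. This is an illustrative sketch, not the project's code. The `Product` type and `resolve_kb_id` function are hypothetical, and the identifier format is inferred from the `pif_org_X_prod_Y` label in the figure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Product:
    id: str
    org_id: str


def resolve_kb_id(user_org_id: str, product: Product) -> str:
    """ACL gate: resolve a KB id only for products owned by the caller's org.

    user_org_id is assumed to come from the verified JWT (Layer 1).
    """
    if product.org_id != user_org_id:
        raise PermissionError("cross-tenant access denied")
    # Assumed naming scheme, based on the label shown in Figure 1.1.
    return f"pif_org_{product.org_id}_prod_{product.id}"
```

Because the KB name embeds both identifiers, a query that somehow skipped the check would still be scoped to a single product's knowledge base.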
PIF compilation is a multi-day continuous flow. Users upload different documents at different times; AI processes asynchronously in the background. If any step were hard-stopped by transient external-dependency failures (Claude API overload, PubChem rate limit, central RAG restart), user experience would collapse.
Therefore all external calls are fail-soft: on failure, the system degrades to “pending retry” rather than returning HTTP 500. Concrete example:
```python
# app/services/rag_client.py:207
async def safe_create_kb(*, org_id, product_id, product_name=None) -> str | None:
    """Attempt to create KB; return kb_id or None on failure (fail-soft)."""
    if not _is_configured():
        logger.info("RAG not configured — skipping KB creation for product %s", product_id)
        return None
    try:
        kb = await RagClient.create_knowledge_base(...)
        return kb.id
    except RagServiceError as e:
        logger.warning("RAG create_kb failed for product %s: %s", product_id, e)
        return None  # product is still created; rag_kb_id left NULL for back-fill
```
Product creation never depends on RAG availability. Full failure-handling strategy is covered in §10.4.
The five-layer architecture of PIF AI:
```mermaid
flowchart TB
    subgraph L1["① Frontend"]
        U["Operator (Browser)"]
        W["Next.js 15<br/>App Router"]
    end
    subgraph L2["② BFF"]
        R["tRPC / API Routes"]
        A["NextAuth<br/>Session + JWT"]
    end
    subgraph L3["③ Business"]
        F["FastAPI<br/>Python 3.12"]
        WK["Worker<br/>async tasks"]
    end
    subgraph L4["④ AI Engine"]
        C["Claude<br/>Sonnet 4"]
        H["Claude<br/>Haiku 4.5"]
        T["Tool Use<br/>+ Vision"]
    end
    subgraph L5["⑤ Data & Integration"]
        P[("PostgreSQL 16<br/>+ pgvector")]
        RD[("Redis")]
        S3[("S3 / R2")]
        PC["PubChem"]
        EC["ECHA"]
        TF["TFDA lists"]
        RAG["Central RAG<br/>rag.baiyuan.io"]
    end
    U --> W --> R --> A
    A -.JWT.-> F
    F --> WK
    F --> C
    F --> H
    C --> T
    H --> T
    F --> P
    F --> RD
    F --> S3
    F --> PC
    F --> EC
    F --> TF
    F --> RAG
    WK --> C
    WK --> P
```
Figure 1.2: Unidirectional dependency — upper layers call lower layers, never in reverse.
| Metric | Manual | PIF AI target | Source |
|---|---|---|---|
| Compilation time | 4–8 weeks | 3–5 business days | Business goal (CLAUDE.md) |
| Toxicology query per ingredient | 2–4 hours | < 10 seconds (concurrent + cache) | Phase 1 design |
| INCI normalization confidence | Manual dictionary lookup | ≥ 0.8 (Claude + dictionary) | ai/ingredient_validator.py |
| Regulatory matching latency | Manual TFDA PDF search | Real-time (local-mirror) | mcp_servers/tfda_server/ |
| SA review time | 1–2 weeks | 2–4 hours | Online SA flow |
[!IMPORTANT] The “PIF AI target” column lists design targets, not measured values. Formal benchmarks will appear in whitepaper v0.2 after Phase 1 GA. This follows the Development Constitution rule: “no mock data, measure before reporting.”
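The "concurrent + cache" note in the toxicology row can be illustrated with a small asyncio sketch: ingredients already in the cache are skipped, and the rest are fetched in parallel under a concurrency cap. Everything here is an assumption for illustration, including the `lookup_all` name, the in-memory dict cache, and the `max_concurrency` default. It shows the technique only and makes no claim about achieved latency.

```python
import asyncio
from typing import Awaitable, Callable


async def lookup_all(
    names: list[str],
    fetch: Callable[[str], Awaitable[dict]],
    cache: dict[str, dict],
    max_concurrency: int = 5,
) -> dict[str, dict]:
    """Fetch uncached ingredients concurrently; serve the rest from cache."""
    sem = asyncio.Semaphore(max_concurrency)  # caps parallel external calls

    async def one(name: str) -> None:
        async with sem:
            cache[name] = await fetch(name)

    # dict.fromkeys de-duplicates while preserving order.
    missing = [n for n in dict.fromkeys(names) if n not in cache]
    await asyncio.gather(*(one(n) for n in missing))
    return {n: cache[n] for n in names}
```

The semaphore keeps the number of in-flight requests bounded, which matters when the upstream service applies rate limits, as the text notes for PubChem.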
The entire PIF AI project (frontend, backend, AI engine, RAG integration, deployment config, 5-locale i18n, and this whitepaper itself) was developed in collaboration with Claude Code, Anthropic’s official CLI.
Two reasons:

1. Academic transparency: the impact of LLM-assisted engineering on the software lifecycle is an active research topic. This project serves as a fully auditable open-source case for researchers to observe “how LLM-assisted engineering interacts with a commercial-scale, multi-dependency SaaS project.”
2. Community educational value: for readers considering Claude Code for complex systems, we offer a specific, reproducible process, including successes, trade-offs, and failure cases. See §7.4 and §15.
| Version | Date | Summary |
|---|---|---|
| v0.1 | 2026-04-19 | First draft. Four design propositions, five-layer architecture, Claude Code statement |
© 2026 Baiyuan Tech. Licensed under CC BY-NC 4.0.
[1] Taiwan Cosmetic Industry Association (TCIIA). “2024 Annual Member Report” (non-public statistics).