This is one of the most technical chapters. We detail how PIF AI uses Anthropic Claude’s Tool Use and Vision capabilities for structured extraction, how we design prompts, score confidence, and route between Sonnet and Haiku. We then disclose the concrete practice of co-developing this project with Claude Code — a fully auditable case study of LLM-assisted engineering.
confidence ∈ [0, 1]; UI colors fields and prioritizes low-score ones for SA review| Candidate | Strengths | PIF Fit |
|---|---|---|
| Anthropic Claude | Stable Tool Use, strong Vision, long context (1M tokens) | ✅ Chosen |
| OpenAI GPT-4o | Mature ecosystem, precise function calling | Possible; Tool Use style slightly older |
| Google Gemini 2.5 | Long context, free tier | Weaker at table parsing in Vision |
| Open source (Llama 3, Qwen) | Can be self-hosted | Requires own inference infra |
Key factors:
| Task | Model | Avg tokens | Cost tier |
|---|---|---|---|
| Formulation extraction (Vision + Tool Use) | Sonnet 4 | 8K–20K | High |
| Toxicology synthesis | Sonnet 4 | 5K–15K | Medium |
| SA assessment draft | Sonnet 4 | 10K–30K | High |
| INCI normalization | Haiku 4.5 | 500–1K | Low |
| Ingredient function classification | Haiku 4.5 | 200–500 | Low |
| Document type identification (after OCR) | Haiku 4.5 | 300–800 | Low |
Routing logic lives in app/ai/model_router.py (planned) and selects based on prompt complexity, context size, and return-schema intricacy.
Traditional LLM use:
User: "What is glycerin's CAS number?"
LLM: "56-81-5" ← hallucination risk; may be wrong
Tool Use pattern:
User: "What is glycerin's CAS number?"
LLM: [calls tool pubchem.query(name="glycerin")]
Tool: {cas: "56-81-5", mw: 92.09, ...}
LLM: "Per PubChem, glycerin (CAS 56-81-5) has MW 92.09"
The LLM becomes a “coordinator”; data comes from structured tool returns. Benefits:
# app/ai/tools.py (conceptual)
TOOLS = [
{
"name": "pubchem_query",
"description": "Query PubChem for a compound by CAS or name.",
"input_schema": {
"type": "object",
"properties": {
"cas": {"type": "string", "pattern": r"^\d{2,7}-\d{2}-\d$"},
"name": {"type": "string"},
},
"anyOf": [{"required": ["cas"]}, {"required": ["name"]}],
},
},
{
"name": "tfda_check_restricted",
"description": "Check a substance against Taiwan TFDA restricted/prohibited lists.",
"input_schema": {...},
},
{
"name": "inci_normalize",
"description": "Normalize an ingredient name to canonical INCI form.",
"input_schema": {...},
},
{
"name": "db_lookup_ingredient",
"description": "Search internal ingredients table for prior records.",
"input_schema": {...},
},
]
The LLM is told about these tools in the system prompt and invokes them as needed.
sequenceDiagram
participant User
participant API as FastAPI
participant Claude as Claude Sonnet 4
participant PC as PubChem
participant DB as PostgreSQL
User->>API: Upload formula.pdf
API->>Claude: parse_formula(pdf, tools=[pubchem_query, inci_normalize, ...])
Claude->>Claude: Vision parses PDF → initial ingredient list
loop For each ingredient
Claude->>Claude: inci_normalize(raw_name)
Claude->>DB: db_lookup_ingredient(normalized)
alt Exists
DB-->>Claude: existing record
else New
Claude->>PC: pubchem_query(cas)
PC-->>Claude: CID, MW, SMILES, ...
end
end
Claude-->>API: {ingredients: [...], confidence: 0.87, ...}
API->>DB: UPSERT ingredients + product_ingredients
API-->>User: Done (16 ingredients, 87% confidence)
Figure 7.1: Claude completes multiple tool calls within a single task. This is agentic in style but constrained by explicit tool schemas — not an unbounded loop.
┌─────────────────────────────────────┐
│ ① System Prompt │
│ Role, constraints, output format │
├─────────────────────────────────────┤
│ ② Tool Schema (structured) │
│ Tool list + JSON schema │
├─────────────────────────────────────┤
│ ③ User Prompt │
│ Specific task input │
└─────────────────────────────────────┘
For toxicology analysis:
You are a senior cosmetic toxicologist with expertise in SCCS Notes of
Guidance and CIR safety assessment standards. You will receive a
formulation list and must produce a structured toxicology summary per
ingredient.
Principles:
1. Answer ONLY based on values returned by the provided database tools.
Do NOT fabricate toxicology numbers.
2. If a tool returns no data for a given endpoint, return null with a
note "no data in this source."
3. Every conclusion must cite a source (PubChem CID / TFDA Annex
item / SCCS opinion number).
4. Maintain professional, conservative tone. Prohibited phrases:
"absolutely safe", "no risk at all".
5. Output structured JSON matching the tool schema.
{
"ingredients": [
{
"inci_name": "Glycerin",
"cas": "56-81-5",
"concentration_pct": 5.0,
"confidence": 0.95,
"extraction_notes": "Clearly indicated on formula row 3"
},
{
"inci_name": "Phenoxyethanol",
"cas": "122-99-6",
"concentration_pct": null,
"confidence": 0.30,
"extraction_notes": "Name clear but concentration column empty"
}
]
}
flowchart LR
AI[AI extraction confidence]
UI{confidence value}
G[Green ✓<br/>high ≥ 0.8]
Y[Yellow ⚠<br/>medium 0.5-0.8]
R[Red ✗<br/>low < 0.5]
AI --> UI
UI -->|≥ 0.8| G
UI -->|0.5-0.8| Y
UI -->|< 0.5| R
R -.prioritized in SA queue.-> SA[SA Review]
Figure 7.2: Frontend displays three colors based on confidence; fields below 0.5 bubble to the top of the SA review queue. This makes AI uncertainty transparent — preventing low-confidence outputs from being treated as final.
[!NOTE] This section publicly documents the concrete process, deliverables, and failure cases of co-developing this project with Anthropic Claude Code (the CLI agent). This transparency aligns with the project’s Development Constitution.
Claude Code is Anthropic’s official CLI agent, designed as a “pair-programming partner.” It can:
The author used a “human decides, AI executes” split:
| Work | Human | Claude Code |
|---|---|---|
| Requirements definition | ✅ Lead | Asks clarifying questions |
| Architectural decisions | ✅ Lead | Proposes options + trade-offs |
| Code authoring | Reviews | ✅ Main producer |
| Test authoring | Reviews | ✅ Main producer |
| Documentation (incl. this whitepaper) | Reviews | ✅ Main producer |
| Deployment & ops | ✅ Lead | Suggests commands |
| Security review | ✅ Lead | Assists threat modeling |
The following commits are verifiable in baiyuan-tech/pif:
| Date | Commit | Description |
|---|---|---|
| 2026-04-19 | f33392e |
feat(i18n): extend locales to Japanese, Korean, French with language dropdown |
| 2026-04-19 | (pending) | feat(rag): central RAG integration (Scheme C+) backend |
Each commit carries a Co-Authored-By: Claude Opus 4.7 trailer explicitly marking AI co-authorship.
Case 1: 5-locale i18n expansion (2026-04-19)
Task: extend frontend i18n from zh-TW/en to ja/ko/fr.
Flow:
zh-TW.json / en.json (423 keys × 17 sections)index.tsx: binary toggle → 5-option dropdown (ARIA, click-outside, ESC)pif.baiyuan.ioFrom requirement to live: ~45 minutes.
Case 2: Central RAG integration (§10)
Documented in full in §10. Full backend code + 16 unit tests produced without live credentials — designed to be one-flag-enable once secrets are provided.
Treating Claude Code as a subject:
Full engineering practice and roadmap in §12.
| Version | Date | Summary |
|---|---|---|
| v0.1 | 2026-04-19 | First draft. Tool Use, dual-model routing, prompt three-layer, Claude Code practice |
© 2026 Baiyuan Tech. Licensed under CC BY-NC 4.0.