PIF AI Whitepaper (English edition)

A Multi-Tenant AI-Assisted Platform for Cosmetic Product Information File Documentation

Version: v0.2 · Date: 2026-04-30 · Author: Vincent Lin (Baiyuan Tech) · License: Whitepaper licensed under CC BY-NC 4.0; the underlying PIF AI software is AGPL-3.0.

[!NOTE] This document is an academic-technical whitepaper. Any numbers related to performance, user counts, or revenue are labeled as target or expected values unless supported by measurement or live query — consistent with the project’s Development Constitution: no mock data, no hard-coded numbers, full testing before reporting.

The entire project (code and whitepaper) was developed with the assistance of Anthropic Claude Code, serving as an open-source case study of LLM-assisted engineering applied to regulatory-compliance domains.

🧭 Table of Contents

Part I — Introduction

§	Chapter	Topic
01	Abstract	TL;DR, four design propositions, system overview diagram
02	Regulatory Background	Taiwan Cosmetic Hygiene & Safety Act Article 8, July 2026 deadline, penalties
03	The 16 PIF Items	Per-item data source, AI handling, database mapping

Part II — System Architecture

§	Chapter	Topic
04	System Architecture	Five-layer architecture, module boundaries, data flow
05	Frontend Stack	Next.js 15 App Router, RSC, shadcn/ui
06	Backend Stack	FastAPI, SQLAlchemy async, Alembic vs inline migration

Part III — AI & Data

§	Chapter	Topic
07	AI Engine	Claude Tool Use, Claude Code engineering practice, confidence scoring
08	Database & Multi-Tenancy	Schema, Row-Level Security, `current_setting` pattern
09	Toxicology Pipeline	PubChem / TFDA / ECHA / OECD cross-query
10	Central RAG Integration	Scheme C+ isolation, dual-header auth, fail-soft

Part IV — Security & Compliance Process

§	Chapter	Topic
11	Security Model	AES-256, JWT, TOTP, audit, threat model, 5-locale i18n
12	Roadmap, Deployment & Open-Source Strategy	Docker → K8s, AGPL rationale, Phase 1–3, contribution model
13	Compliance Engine Deep Dive (Phase 22-23)	Lifecycle 5 stages, business-type responsibility matrix, 14 cross-item lint rules, V0-V3 snapshots, penalty mapping, 14-page regulatory PDF
14	Toxicology Safety Engine (new in v0.3)	NOAEL six-tier fallback, read-across, TTC, Margin of Safety, DAp correction, fail-safe asymmetry
15	Regulatory Correctness (new in v0.3)	Disclosure threshold vs concentration limit, TFDA/EU/CIR authority hierarchy, EPA ToxValDB backfill, ECHA C&L harvesting blocking CMR
16	Self-Driving Evolution & Computation-Basis Provenance (new in v0.3)	agreement_rate, asymmetric learning, active re-grounding, tox_reference SSOT, adversarial red-team

Appendices

§	Chapter	Topic
A	Glossary	PIF, SA, TFDA, INCI, 50+ entries
B	API Endpoint Reference	All frontend BFF + backend FastAPI endpoints
C	References	Statutes, standards, RFCs, academic papers
D	Changelog	Whitepaper revision history

📖 How to Read

Linear reading: Academic or regulatory readers should start at §1 and proceed through §16, then the appendices.

Quick start (open-source contributors):

1. Read §1 Abstract for the big picture.
2. Jump to §4 System Architecture for module boundaries.
3. Enter your area of interest (frontend → §5, backend → §6, AI → §7, RAG → §10).
4. Read §12 Roadmap.
5. Head to the code repo’s CONTRIBUTING.md to start coding.

Regulatory compliance: §2 → §3 → §9 → §11 (SA workflow) → Appendix C.

Security review: §10 → §11 + SECURITY.md.

📊 Whitepaper Scale

Metric	Target	Current
Chapters	16 chapters + 4 appendices	v0.3 complete
English word count	28,000+ words	v0.2 ≈ 32,000 words
Figures	15+ Mermaid diagrams	v0.2 ≈ 16 diagrams
Code citations	40+ (format `file:line`)	v0.2 complete
References	30+ entries	v0.2 complete

[!NOTE] This README is a ToC. The complete PDF is available on GitHub Releases. PDF convention: releases/download/<version>/whitepaper-en.pdf.
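The PDF path convention above can be turned into a full download link with a small helper. This is a minimal sketch, not part of the project: the function name `whitepaper_pdf_url` and the `owner` and `repo` parameters are hypothetical placeholders, and it assumes the release tag matches the whitepaper version string (for example `v0.2`). The only fixed parts are GitHub's standard `releases/download/<tag>/<asset>` URL layout and the `whitepaper-en.pdf` file name stated in the note.

```python
from urllib.parse import quote


def whitepaper_pdf_url(owner: str, repo: str, version: str) -> str:
    """Build the GitHub Releases download URL for the English whitepaper PDF.

    `owner` and `repo` are placeholders for the actual repository, which this
    README does not name. `version` is assumed to be the release tag.
    """
    if not version:
        raise ValueError("version must be a non-empty release tag")
    # Percent-encode the tag so unusual characters cannot break the path.
    tag = quote(version, safe="")
    return (
        f"https://github.com/{owner}/{repo}"
        f"/releases/download/{tag}/whitepaper-en.pdf"
    )


if __name__ == "__main__":
    # Placeholder owner/repo values, for illustration only.
    print(whitepaper_pdf_url("example-owner", "example-repo", "v0.2"))
```

Substitute the real repository owner and name when using it; if the project tags releases differently from the whitepaper version, pass the actual tag instead.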

🔗 Language versions

🇹🇼 繁體中文版 (Traditional Chinese)
🇺🇸 English edition (you are here)
