PIF AI Whitepaper

百原 PIF AI Whitepaper / 技術白皮書

多租戶 AI 輔助化粧品 PIF 建檔平台

Multi-Tenant AI-Assisted Cosmetic Product Information File Documentation Platform

License: CC BY-NC 4.0 · DOI · Status: v0.2.3 · zh-TW · en · Built with Claude Code · Live Platform · Download PDF (zh-TW) · Download PDF (en)

📥 下載 PDF / Download PDF

| 語言 Language | 下載 Download | 說明 Notes |
| --- | --- | --- |
| 繁體中文 | whitepaper-zh-TW.pdf | 永遠指向最新 release / always the latest release |
| English | whitepaper-en.pdf | 同上 / same |

上方連結使用 releases/latest/download/…,恆指向最新版本。亦可至 GitHub Releases 下載特定版本(路徑慣例:releases/download/<version>/whitepaper-<lang>.pdf)。

The links above use releases/latest/download/… and always resolve to the newest release. For a specific version, see GitHub Releases (path convention: releases/download/<version>/whitepaper-<lang>.pdf).
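
The two path conventions can be sketched as a small URL builder. This is a minimal illustration, not part of the repository: the helper name `pdf_url` is hypothetical, the GitHub base URL is inferred from the repository name baiyuan-tech/pif-whitepaper, and the assumption that release tags look like `v0.2.3` is taken from the revision history rather than confirmed.

```python
from typing import Optional

# Assumption: the repo lives at this GitHub URL (inferred from the repo name).
BASE = "https://github.com/baiyuan-tech/pif-whitepaper"


def pdf_url(lang: str, version: Optional[str] = None) -> str:
    """Return the download URL for whitepaper-<lang>.pdf.

    With no version, use the releases/latest/download/ form, which always
    resolves to the newest release. With a version, use the pinned
    releases/download/<version>/ form.
    """
    asset = f"whitepaper-{lang}.pdf"
    if version is None:
        return f"{BASE}/releases/latest/download/{asset}"
    return f"{BASE}/releases/download/{version}/{asset}"


print(pdf_url("zh-TW"))         # latest release
print(pdf_url("en", "v0.2.3"))  # pinned release (tag name is an assumption)
```

Pinning a version is the safer choice when citing a specific revision, since the latest link changes target with every release.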

📄 中文白皮書 · 📄 English Whitepaper · 📋 Format Spec · 💻 Code Repo


摘要 Abstract

繁體中文

PIF AI 是一套開源多租戶 SaaS 平台,針對台灣《化粧品衛生安全管理法》第 8 條於 2026 年 7 月 1 日起全面強制實施的 PIF(Product Information File,產品資訊檔案)建檔義務,提供 AI 輔助的自動化解決方案。

本白皮書完整描述系統架構、AI 引擎設計、毒理資料 Pipeline、中心 RAG 整合的多層隔離模型(方案 C+)、安全威脅模型、SA(Safety Assessor,安全評估者)審閱工作流程、部署策略,以及開源社群貢獻方式。

English

PIF AI is an open-source multi-tenant SaaS platform that provides AI-assisted automation for compiling the cosmetic Product Information File (PIF). The PIF obligation is set out in Article 8 of Taiwan’s Cosmetic Hygiene and Safety Act and becomes fully mandatory on July 1, 2026.

This whitepaper documents the system architecture, AI engine design, toxicology data pipeline, the multi-layer isolation model for central RAG integration (Scheme C+), the security threat model, the Safety Assessor (SA) review workflow, deployment strategy, and the open-source contribution model.


為什麼這份白皮書 / Why This Whitepaper

繁體中文

PIF 建檔在產業實務上的三大痛點:時間(每項產品 4–8 週)、成本(SA 專業費用高)、不確定性(16 項法規對照散落於多份文件)。

PIF AI 以下列四項設計命題處理這些痛點:

  1. 結構化壓縮 ≠ 生成:PIF 16 項多為結構化資訊的跨文件拼裝,正是 LLM Tool Use(工具使用)能力的強項。
  2. AI 草稿 + SA 定稿:所有 AI 輸出一律標示為「參考草稿」,最終簽署由 SA 負責,符合法規要求與工程原則。
  3. 三層資料隔離:PostgreSQL Row-Level Security + SQLAlchemy ACL 閘門 + RAG KB per-product,任何單層破口皆不致於整體洩漏。
  4. Fail-soft 為預設:任何外部相依(Claude API / PubChem / 中心 RAG)的短暫故障不得阻斷建檔流程。

English

In industry practice, PIF compilation has three operational pain points: time (4–8 weeks per product), cost (high fees for qualified SAs), and uncertainty (the 16 regulatory items are scattered across multiple documents).

PIF AI addresses these with four design propositions:

  1. Structured composition, not generation: The 16 items are largely a cross-document assembly problem — precisely what LLM Tool Use excels at.
  2. AI draft + SA final: Every AI output is marked as a “reference draft”; the SA is always the final signatory — aligning with both regulation and engineering principles.
  3. Three-layer data isolation: PostgreSQL Row-Level Security + SQLAlchemy ACL gate + one KB per product in the central RAG — a breach of any single layer does not compromise the whole.
  4. Fail-soft by default: Transient outages of external dependencies (Claude API, PubChem, central RAG) must never block the documentation flow.
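
The fail-soft proposition can be illustrated with a small wrapper. This is a minimal sketch under stated assumptions, not the platform's actual implementation: `fail_soft`, `flaky_lookup`, and the shape of the fallback value are all hypothetical names chosen for illustration.

```python
import logging
from typing import Callable, TypeVar

T = TypeVar("T")
log = logging.getLogger("failsoft")


def fail_soft(call: Callable[[], T], fallback: T, retries: int = 2) -> T:
    """Run `call` up to `retries + 1` times; return `fallback` instead of raising."""
    for attempt in range(retries + 1):
        try:
            return call()
        except Exception as exc:  # a real system would catch only transient errors
            log.warning("attempt %d failed: %s", attempt + 1, exc)
    return fallback


def flaky_lookup() -> dict:
    """Hypothetical external call that simulates an outage."""
    raise TimeoutError("external service unreachable")


# The documentation flow continues with a placeholder instead of stopping.
result = fail_soft(flaky_lookup, fallback={"status": "pending", "source": None})
print(result)
```

The point of the pattern is that the caller always receives a value it can store and revisit later, so an outage degrades one field rather than blocking the whole file.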

為誰而寫 / Audience

| 讀者 Audience | 建議路徑 Reading Path |
| --- | --- |
| 🎓 學術研究者 · Academic researchers | Read §1 through §12 in order; code citations use a verifiable file:line format |
| 🛠 開源貢獻者 · Open-source contributors | §4 Architecture → §14 Deployment → CONTRIBUTING.md of the code repo |
| ⚖️ 法規專業人士 · Regulatory professionals | §2 Regulation → §3 PIF 16 items → §9 Toxicology → §13 SA workflow |
| 🔒 資安審閱者 · Security reviewers | §10 RAG isolation + §11 Threat model + SECURITY.md |
| 💼 商務決策者 · Business stakeholders | §1 Abstract → §15 Roadmap |

開發聲明:本專案以 Claude Code 開發 / Development Note: Built with Claude Code

繁體中文

PIF AI 整個專案(前端、後端、AI 引擎、RAG 整合、部署設定、i18n 5 語系、本白皮書)皆由作者搭配 Anthropic Claude Code(Anthropic 官方 CLI)完成開發與撰寫。本專案同時是:

關於 Claude Code 如何協助建構本專案的具體工程細節,詳見白皮書 §7(AI 引擎)與 §15(路線圖與開源策略)。

English

The entire PIF AI project — frontend, backend, AI engine, RAG integration, deployment configuration, 5-locale i18n, and this whitepaper — was built and written by the author in collaboration with Anthropic Claude Code (Anthropic’s official CLI). This project is simultaneously:

Engineering details on how Claude Code contributed appear in §7 (AI Engine) and §15 (Roadmap & Open-Source Strategy).


如何引用 / How to Cite

APA 7

Lin, V. (2026). PIF AI: A multi-tenant AI-assisted platform for accelerating cosmetic
  product information file documentation under Taiwan Cosmetic Hygiene and Safety Act
  (Whitepaper v0.2.2). Baiyuan Tech.
  https://doi.org/10.5281/zenodo.19994787

BibTeX

@techreport{lin2026pifai,
  author      = {Lin, Vincent},
  title       = {PIF AI: A Multi-Tenant AI-Assisted Platform for Accelerating
                 Cosmetic Product Information File Documentation Under Taiwan
                 Cosmetic Hygiene and Safety Act},
  institution = {Baiyuan Tech},
  type        = {Whitepaper},
  number      = {v0.2.2},
  year        = {2026},
  month       = {may},
  doi         = {10.5281/zenodo.19994787},
  url         = {https://doi.org/10.5281/zenodo.19994787}
}

See also CITATION.cff — GitHub’s “Cite this repository” button reads this file directly.


This whitepaper is part of an ongoing series documenting Baiyuan Technology’s engineering practice in AI-native platforms. The three pillars share common design patterns — multi-tenant isolation, fail-soft external dependencies, Claude-assisted engineering — applied to different verticals:

本白皮書是百原科技 AI 原生平台工程實踐系列的一部分。三項支柱專案共用多租戶隔離、fail-soft 外部依賴、Claude 輔助工程等設計模式,只是應用於不同垂直領域:

| Whitepaper | Focus | Repo |
| --- | --- | --- |
| 📄 This: PIF AI Whitepaper | Cosmetic regulatory compliance automation (Taiwan) | baiyuan-tech/pif-whitepaper |
| 📄 GEO Platform Whitepaper | Generative-engine brand visibility (7-dim citation scoring, AXP, L1 Wiki + L2 RAG origin) | baiyuan-tech/geo-whitepaper |
| 🛠 PIF AI Platform | Underlying AGPL-3.0 code referenced in this document | baiyuan-tech/pif |

Cite both whitepapers together for a fuller picture of Baiyuan’s AI infrastructure approach — the GEO paper establishes the L1 LLM Wiki + L2 vector RAG dual-layer retrieval architecture, and this PIF paper shows how it is applied under strict multi-tenant regulatory-compliance constraints.

引用這兩份白皮書可獲得更完整的 Baiyuan AI 基礎設施視角:GEO 白皮書建立了 L1 LLM Wiki + L2 向量 RAG 雙層檢索架構,本 PIF 白皮書展示該架構如何應用於嚴格的多租戶法規合規場景。

Awesome Lists · AI-Citable Resource Index / 相關 awesome 清單

This whitepaper sits at the intersection of several open-source ecosystems. If you maintain one of the awesome-lists below, a PR adding this whitepaper is welcome:

本白皮書位於多個開源生態的交集。若您維護以下 awesome-list 之一,歡迎將本白皮書納入:

Design-pattern references in this whitepaper — for readers interested in the primary sources behind the design decisions:


授權 / Licensing

「AGPL-3.0 的選擇理由」詳見白皮書 §14.2。 / For the rationale behind choosing AGPL-3.0, see whitepaper §14.2.


發行策略 / Publication Strategy

  1. 主要發佈 Primary publishing: 此 GitHub repository (baiyuan-tech/pif-whitepaper)
  2. PDF 附件 PDF attachment: 每次 release 自動附加 whitepaper-zh-TW.pdf 與 whitepaper-en.pdf(由 .github/workflows/build-pdf.yml 編譯)
  3. 學術平台 Academic platforms: arXiv、SSRN(若受控)
  4. 產業場域 Industry venues: 台灣化粧品工業同業公會、衛福部食藥署開放論壇
  5. 社群 Community: Hacker News、r/MachineLearning、Twitter/X、LinkedIn
  6. 多語系擴展 Multi-lingual expansion: 日、韓、法譯本為 v1.0 之後的路線圖項目
  7. 引用追蹤 Citation tracking: Google Scholar、Semantic Scholar

Repo 結構 / Repository Structure

pif-whitepaper/
├── README.md                    # ← 您目前閱讀的檔案
├── FORMAT.md                    # 白皮書格式規範
├── LICENSE                      # CC BY-NC 4.0
├── CITATION.cff                 # 引用資訊
├── .markdownlint.jsonc          # Markdown lint 規則
├── whitepaper-zh-TW.pdf         # (generated) 中文 PDF
├── whitepaper-en.pdf            # (generated) 英文 PDF
├── assets/
│   ├── figures/                 # Mermaid 原始檔
│   └── pdf/                     # Pandoc 建置腳本與 metadata
│       ├── concat.sh
│       ├── metadata-zh-TW.yaml
│       └── metadata-en.yaml
├── zh-TW/                       # 繁體中文 12 章 + 4 附錄
│   ├── README.md
│   ├── ch01-abstract.md
│   ├── ... ch02..ch12
│   └── appendix-a..d.md
├── en/                          # English 12 chapters + 4 appendices
│   └── (mirror of zh-TW)
└── .github/
    └── workflows/
        ├── build-pdf.yml         # Pandoc + XeLaTeX → PDF
        └── lint.yml              # markdownlint + link check
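
The tree above suggests a build step that feeds the chapter files and a per-language metadata file to Pandoc with XeLaTeX. The sketch below only assembles such a command line. It is an assumption about how the pieces fit together, not the contents of build-pdf.yml or concat.sh: `pandoc_command` is a hypothetical helper, and the chapter filename `ch02-regulation.md` is invented for the example.

```python
from pathlib import PurePosixPath
from typing import List


def pandoc_command(lang: str, chapters: List[str], root: str = ".") -> List[str]:
    """Build a Pandoc argument list that renders chapters, in the given order, to one PDF."""
    base = PurePosixPath(root)
    inputs = [str(base / lang / name) for name in chapters]
    return [
        "pandoc",
        *inputs,
        # Per-language metadata file, as laid out under assets/pdf/.
        "--metadata-file", str(base / "assets" / "pdf" / f"metadata-{lang}.yaml"),
        # XeLaTeX is the usual engine choice when the text contains CJK characters.
        "--pdf-engine=xelatex",
        "-o", str(base / f"whitepaper-{lang}.pdf"),
    ]


# ch02-regulation.md is a hypothetical filename used only for illustration.
cmd = pandoc_command("en", ["ch01-abstract.md", "ch02-regulation.md"])
print(" ".join(cmd))
```

The caller supplies the chapter order explicitly because a plain alphabetical sort would place the appendix files ahead of the chapters.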

修訂記錄 / Revision History

| 版本 Version | 日期 Date | 摘要 Summary |
| --- | --- | --- |
| v0.1 | 2026-04-19 | First public draft: covers MVP (Phase 1) and Phase 2 designs for Central RAG and SA e-signature. |
| v0.2 | 2026-04-30 | Added Chapter 13, Compliance Engine Deep Dive (Phase 22-23): lifecycle 5 stages, business-type responsibility matrix (4×16=64 cells), 14 cross-item lint rules R1-R14, V0-V3 version snapshots with SHA-256 fingerprints, penalty mapping (§22-25), 14-page regulatory PDF generation. Appendix B adds 14 new endpoints. |
| v0.2.1 | 2026-05-03 | Zenodo registration trigger (content identical to v0.2). |
| v0.2.2 | 2026-05-03 | First attempt to bake in the Zenodo DOI; mistakenly used a per-version DOI instead of the concept DOI. Corrected without a version bump on 2026-05-03 to point to the true concept DOI 10.5281/zenodo.19994787. |
| v0.2.3 | 2026-05-03 | Initially added a software cross-citation; reverted because the companion code repository is private and has no Zenodo deposit. The baked-in whitepaper concept DOI is now stable. |

AI-friendly 結構 / AI-friendly Structure

本 repo 同時為人類讀者與 AI crawler 最佳化:

This repo is optimized for both human readers and AI crawlers: Schema.org TechArticle JSON-LD embedded in HTML comments, YAML frontmatter on every chapter, ISO 8601 dates, consistent terminology, and a single generated whitepaper.md for holistic semantic analysis / LLM training input.
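
As an illustration of the JSON-LD-in-a-comment technique, the sketch below renders a Schema.org TechArticle object inside an HTML comment and reads it back. It is a hedged example only: `techarticle_comment` and `read_comment` are hypothetical helpers, and the exact set of properties the repo embeds is not stated here, so the fields shown are assumptions.

```python
import json


def techarticle_comment(headline: str, version: str, date_published: str, lang: str) -> str:
    """Serialize TechArticle metadata as JSON-LD wrapped in an HTML comment."""
    data = {
        "@context": "https://schema.org",
        "@type": "TechArticle",
        "headline": headline,
        "version": version,
        "datePublished": date_published,  # ISO 8601 date
        "inLanguage": lang,
        "license": "https://creativecommons.org/licenses/by-nc/4.0/",
    }
    body = json.dumps(data, ensure_ascii=False, indent=2)
    # Note: HTML comments must not contain "--", so field values should avoid it.
    return f"<!--\n{body}\n-->"


def read_comment(comment: str) -> dict:
    """Strip the comment markers and parse the JSON-LD payload."""
    inner = comment.strip()[len("<!--"):-len("-->")]
    return json.loads(inner)


block = techarticle_comment("PIF AI Whitepaper", "v0.2.3", "2026-05-03", "en")
print(block)
```

Keeping the metadata in a comment leaves the rendered README unchanged for human readers while a crawler that parses the raw file can still recover the structured fields.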


貢獻 / Contributing

本白皮書歡迎勘誤、翻譯、章節補充。流程與規範請見 FORMAT.md 與母專案 CONTRIBUTING.md。

Errata, translations, and chapter contributions are welcome. See FORMAT.md and the code repo’s CONTRIBUTING.md for process and conventions.



© 2026 Baiyuan Tech · Released under CC BY-NC 4.0

📄 Read in Chinese › · 📄 Read in English ›