Semiconductors Data Platform
End-to-end fab operations pipeline: 4-layer medallion architecture, YAML-based rule engine with 15+ validation checks, watermark-based incremental ingestion, and yield/equipment health analytics.
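As an illustration of watermark-based incremental ingestion, here is a minimal sketch rather than the project's actual code. The `updated_at` field, the `lot_id` key, and the function name `ingest_incremental` are assumptions made for the example.

```python
from datetime import datetime


def ingest_incremental(rows, watermark):
    """Return rows newer than the watermark, plus the advanced watermark."""
    new_rows = [r for r in rows if r["updated_at"] > watermark]
    # If nothing is new, the watermark stays where it was.
    new_watermark = max((r["updated_at"] for r in new_rows), default=watermark)
    return new_rows, new_watermark


# Hypothetical source records; real fab data would come from a database or files.
source = [
    {"lot_id": "A1", "updated_at": datetime(2024, 1, 1)},
    {"lot_id": "A2", "updated_at": datetime(2024, 1, 2)},
    {"lot_id": "A3", "updated_at": datetime(2024, 1, 3)},
]

batch, wm = ingest_incremental(source, datetime(2024, 1, 1))
rerun, wm_after_rerun = ingest_incremental(source, wm)  # picks up nothing new
```

Running the same ingestion twice with the advanced watermark returns an empty batch, which is what makes the load safe to repeat.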
Case studies with a consistent structure: problem, constraints, approach, architecture, results, and next steps.
Matches research questions to datasets across 25+ sources; reduces discovery time from ~2 hours to <5 minutes.
Medallion-layered pipeline (raw/cur/meta) for Azure SQL with rerun-safe watermarking, schema-drift detection, and operational health monitoring.
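Schema-drift detection can be sketched as a comparison between an expected column-to-type mapping and the one observed at load time. The column names and the `detect_schema_drift` helper below are hypothetical, not taken from the project.

```python
def detect_schema_drift(expected, observed):
    """Compare two {column: type} mappings and report the differences."""
    shared = set(expected) & set(observed)
    return {
        "added": sorted(set(observed) - set(expected)),
        "removed": sorted(set(expected) - set(observed)),
        "retyped": sorted(c for c in shared if expected[c] != observed[c]),
    }


# Hypothetical schemas for illustration only.
expected = {"id": "int", "name": "nvarchar", "created_at": "datetime2"}
observed = {"id": "bigint", "name": "nvarchar", "region": "nvarchar"}

drift = detect_schema_drift(expected, observed)
```

A pipeline could log this report to a metadata layer and halt or warn when any of the three lists is non-empty.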
Pairs EN↔ZH Wikipedia articles and flags low-similarity mismatch sentences using multilingual embeddings for cross-lingual content bias review.
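Flagging low-similarity sentence pairs usually comes down to a cosine-similarity threshold on the embedding vectors. The sketch below uses tiny hand-written vectors in place of real multilingual embeddings, and the 0.5 threshold is an assumed value.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def flag_mismatches(pairs, threshold=0.5):
    """Return indices of (en, zh) vector pairs whose similarity is below threshold."""
    return [
        i for i, (en_vec, zh_vec) in enumerate(pairs)
        if cosine_similarity(en_vec, zh_vec) < threshold
    ]


# Toy 2-d vectors standing in for sentence embeddings.
pairs = [
    ([1.0, 0.0], [0.9, 0.1]),  # near-parallel: treated as a match
    ([1.0, 0.0], [0.0, 1.0]),  # orthogonal: flagged for review
]
flagged = flag_mismatches(pairs)
```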
Automated monitoring pipeline that detects retraction-status drift across 208 DOIs, with precision/recall analysis of the flag classifier's reliability. A data integrity and anomaly detection system for scholarly metadata.
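For reference, precision and recall of a retraction flag can be computed from paired predicted and actual labels, as in this minimal sketch. The label lists are made-up examples, not results from the 208-DOI study.

```python
def precision_recall(predicted, actual):
    """Compute precision and recall for boolean flags."""
    tp = sum(1 for p, a in zip(predicted, actual) if p and a)
    fp = sum(1 for p, a in zip(predicted, actual) if p and not a)
    fn = sum(1 for p, a in zip(predicted, actual) if not p and a)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical labels: True means "flagged as retracted".
predicted = [True, True, False, False, True]
actual = [True, False, False, True, True]

precision, recall = precision_recall(predicted, actual)
```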
Quantitative simulators and statistical analyses of confidence calibration error, the gambler's fallacy under streak conditions, and lifecycle investment modeling, applying behavioral economics and probabilistic reasoning to real financial decision problems. Includes reproducible Python notebooks with visualization outputs.
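A gambler's-fallacy check under streak conditions can be simulated in a few lines. This is a sketch under assumed parameters (fair coin, streak length 3, fixed seed) and not the notebooks' actual code.

```python
import random


def heads_rate_after_streak(n_flips, streak_len, seed=0):
    """Share of heads on flips that immediately follow a run of `streak_len` heads."""
    rng = random.Random(seed)
    flips = [rng.random() < 0.5 for _ in range(n_flips)]
    follow_ups = [
        flips[i]
        for i in range(streak_len, n_flips)
        if all(flips[i - streak_len:i])  # the previous streak_len flips were all heads
    ]
    return sum(follow_ups) / len(follow_ups)


rate = heads_rate_after_streak(200_000, 3)
```

For independent fair flips the rate stays close to 0.5 regardless of the preceding streak, which is the point the fallacy misses.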
Designs and analyzes 9 user interview sessions and a 4-round divergence study to surface calibration and confidence signals for an alpha-stage fintech product.
Extracts a Wikipedia table, converts currency via a local CSV lookup, and loads the result to CSV and SQLite. Containerized with Docker and automated with a CI/CD pipeline (GitHub Actions). Demonstrates production-ready ETL practices at small scale.
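The transform and load steps can be sketched with the standard library alone. The extract step is replaced here by hard-coded rows, and the rates, table name, and column names are hypothetical.

```python
import csv
import io
import sqlite3

# Hypothetical exchange rates; the real project reads these from a local CSV file.
RATES_CSV = "currency,rate_to_usd\nEUR,1.10\nGBP,1.25\n"


def load_rates(text):
    """Parse the lookup CSV into a {currency: rate} mapping."""
    return {row["currency"]: float(row["rate_to_usd"])
            for row in csv.DictReader(io.StringIO(text))}


def transform(rows, rates):
    """Convert (name, amount, currency) rows to (name, amount_usd)."""
    return [(name, round(amount * rates[cur], 2)) for name, amount, cur in rows]


def load(conn, records):
    """Write records to SQLite; INSERT OR REPLACE keeps reruns idempotent."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS amounts (name TEXT PRIMARY KEY, amount_usd REAL)"
    )
    conn.executemany("INSERT OR REPLACE INTO amounts VALUES (?, ?)", records)
    conn.commit()


extracted = [("Alpha", 100.0, "EUR"), ("Beta", 200.0, "GBP")]  # stand-in for the scraped table
conn = sqlite3.connect(":memory:")
load(conn, transform(extracted, load_rates(RATES_CSV)))
result = conn.execute("SELECT name, amount_usd FROM amounts ORDER BY name").fetchall()
```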
Applied research analysis using simulated data to preserve privacy while keeping the schema consistent.