Projects

Case studies with a consistent structure: problem, constraints, approach, architecture, results, and next steps.

Semiconductors Data Platform

End-to-end fab operations pipeline: 4-layer medallion architecture, YAML-based rule engine with 15+ validation checks, watermark-based incremental ingestion, and yield/equipment health analytics.

Data Engineering Python Anomaly Detection Data Quality
Case study Repo
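
To illustrate the rule-engine idea in the summary above, here is a minimal sketch of config-driven validation. It assumes the rules have already been loaded from YAML into plain dictionaries; the rule names, column names (`yield_pct`, `lot_id`), and check types are hypothetical and not taken from the project.

```python
# Hypothetical rules, shaped as they might look after loading a YAML file.
RULES = [
    {"name": "yield_in_range", "column": "yield_pct", "check": "between", "min": 0, "max": 100},
    {"name": "lot_id_present", "column": "lot_id", "check": "not_null"},
]


def check_between(value, rule):
    # Passes only when the value exists and lies inside [min, max].
    return value is not None and rule["min"] <= value <= rule["max"]


def check_not_null(value, rule):
    # Passes when the value is present.
    return value is not None


# Registry mapping a rule's "check" key to the function that implements it.
CHECKS = {"between": check_between, "not_null": check_not_null}


def validate(rows, rules):
    """Return a list of (row_index, rule_name) for each failed check."""
    failures = []
    for i, row in enumerate(rows):
        for rule in rules:
            if not CHECKS[rule["check"]](row.get(rule["column"]), rule):
                failures.append((i, rule["name"]))
    return failures


# Illustrative rows only; not real fab data.
rows = [
    {"lot_id": "L001", "yield_pct": 97.5},
    {"lot_id": None, "yield_pct": 101.0},
]
failures = validate(rows, RULES)
```

Keeping the checks in a registry means a new validation can be added by writing one function and one config entry, without touching the loop.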

Research Dataset Recommendation System

Matches research questions to datasets across 25+ sources; reduces discovery time from ~2 hours to <5 minutes.

Data Discovery Analytics Python
Case study Repo

Canvas Platform Data Ingestion

Medallion-layered pipeline (raw/cur/meta) for Azure SQL with rerun-safe watermarking, schema-drift detection, and operational health monitoring.

Data Engineering Azure SQL Watermarking
Case study Repo
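
As a rough illustration of rerun-safe watermarking, the sketch below loads only rows newer than the last watermark and upserts them by key, so running the same load twice leaves the target unchanged. The in-memory `source` and `target`, the `updated_at` field, and the function name are assumptions for illustration; the actual pipeline targets Azure SQL.

```python
from datetime import datetime


def incremental_load(source_rows, target, watermark):
    """Upsert rows newer than the watermark; return the advanced watermark."""
    new_rows = [r for r in source_rows if r["updated_at"] > watermark]
    for row in new_rows:
        # Upsert keyed by id, so a rerun overwrites instead of duplicating.
        target[row["id"]] = row
    if new_rows:
        watermark = max(r["updated_at"] for r in new_rows)
    return watermark


# Illustrative rows only.
source = [
    {"id": 1, "updated_at": datetime(2024, 1, 1), "value": "a"},
    {"id": 2, "updated_at": datetime(2024, 1, 2), "value": "b"},
    {"id": 3, "updated_at": datetime(2024, 1, 3), "value": "c"},
]
target = {}
wm = incremental_load(source, target, datetime(2024, 1, 1))
wm_again = incremental_load(source, target, wm)  # rerun picks up nothing new
```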

Wikipedia Cross-Lingual Semantic Analysis (EN ↔ ZH)

Pairs EN↔ZH Wikipedia articles and uses multilingual embeddings to flag low-similarity (mismatched) sentences for review of cross-lingual content bias.

NLP Multilingual Embeddings
Case study Repo
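
The flagging step can be sketched as cosine similarity over sentence-pair embeddings with a cutoff. The toy vectors below stand in for real multilingual embeddings, and the 0.5 threshold is a hypothetical value, not the one used in the project.

```python
import math


def cosine_similarity(u, v):
    # Dot product divided by the product of the vector norms.
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def flag_low_similarity(pairs, threshold=0.5):
    """Return indices of (en_vec, zh_vec) pairs whose similarity is below threshold."""
    return [i for i, (u, v) in enumerate(pairs) if cosine_similarity(u, v) < threshold]


# Toy vectors standing in for EN and ZH sentence embeddings.
pairs = [
    ([1.0, 0.0, 1.0], [0.9, 0.1, 1.1]),  # nearly parallel: similar
    ([1.0, 0.0, 0.0], [0.0, 1.0, 0.0]),  # orthogonal: dissimilar
]
flagged = flag_low_similarity(pairs)
```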

Crossref Retraction Metadata Analysis

Automated monitoring pipeline that detects retraction-status drift across 208 DOIs and measures the reliability of the flag classifier with precision/recall analysis. In effect, a data integrity and anomaly detection system for scholarly metadata.

Data Quality Metadata Research Anomaly Detection
Case study Repo
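
For readers unfamiliar with the precision/recall analysis mentioned above, a minimal sketch follows. The `flagged` and `retracted` lists are made-up toy labels, not results from the project.

```python
def precision_recall(predicted, actual):
    """Compute precision and recall for boolean flags against ground truth."""
    tp = sum(1 for p, a in zip(predicted, actual) if p and a)
    fp = sum(1 for p, a in zip(predicted, actual) if p and not a)
    fn = sum(1 for p, a in zip(predicted, actual) if not p and a)
    # Guard against division by zero when nothing is flagged or nothing is positive.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Toy labels: whether each DOI was flagged vs. whether it is actually retracted.
flagged = [True, True, False, True, False, False]
retracted = [True, False, False, True, True, False]
precision, recall = precision_recall(flagged, retracted)
```

Precision answers "of the DOIs flagged, how many were truly retracted?", while recall answers "of the truly retracted DOIs, how many were flagged?".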

Finance Concept Analysis

Quantitative simulators and statistical analyses of confidence calibration error, the gambler's fallacy under streak conditions, and lifecycle investment modeling, applying behavioral economics and probabilistic reasoning to real financial decision problems. Includes reproducible Python notebooks with visualization outputs.

Finance Simulations Statistics Calibration
Case study Repo
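
The gambler's fallacy under streak conditions can be checked with a short simulation: for a fair coin, the chance of heads right after a run of heads should stay near 0.5. This is a sketch under assumed parameters (fair coin, streak length 3, fixed seed), not the project's simulator.

```python
import random


def prob_heads_after_streak(n_flips, streak_len, seed=0):
    """Estimate P(heads) on flips that immediately follow `streak_len` heads."""
    rng = random.Random(seed)  # seeded for reproducibility
    flips = [rng.random() < 0.5 for _ in range(n_flips)]
    after_streak = [
        flips[i]
        for i in range(streak_len, n_flips)
        if all(flips[i - streak_len:i])  # previous `streak_len` flips were all heads
    ]
    return sum(after_streak) / len(after_streak) if after_streak else float("nan")


p = prob_heads_after_streak(200_000, 3)
```

If the fallacy were correct, `p` would fall well below 0.5 after a streak of heads; with independent flips it should not.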

Behavioral Decision Research at RtB

Design and analysis of 9 user interview sessions and a 4-round divergence study, surfacing calibration and confidence signals for an alpha-stage fintech product.

Behavioral Research Product Analytics User Research
Case study

ETL Pipeline: Bank Info

Extracts a Wikipedia table, converts currency via a local CSV lookup, and loads the result to CSV and SQLite. Containerized with Docker and automated with a CI/CD pipeline (GitHub Actions). Demonstrates production-ready ETL practices at small scale.

ETL SQLite Docker GitHub Actions CI/CD
Case study Repo

Sports Motivation Analysis

Applied research analysis run on simulated data, which preserves privacy while keeping the schema consistent with the original dataset.

Research Analytics Privacy
Case study Repo