
About Me Data Science

Introducing Myself


Hi, I'm Luyang Si, a 2025 graduate of the University of Illinois Urbana–Champaign, where I majored in Information Sciences + Data Science.

I work at the intersection of data engineering, analytics, and decision science. My focus is building systems that transform messy data into reliable insights and help people make better decisions under uncertainty.

In practice, that means combining data infrastructure, statistical analysis, and behavioral experimentation to build tools that turn complex datasets into actionable knowledge.

What I Work On

Data Infrastructure

I design and build data pipelines and analytical systems that enable reliable decision-making.

My work includes:

  • Incremental ETL pipelines
  • Data warehouse modeling
  • SQL-based analytical workflows
  • Data quality validation systems
  • Metadata and lineage tracking

Technologies I frequently use:

  • Python
  • SQL / T-SQL
  • Azure SQL
  • Pandas
  • Data visualization and statistical analysis tools

I focus on building data systems that are:

  • Reproducible
  • Scalable
  • Observable
  • Easy to maintain

Analytics & Decision Systems

Beyond infrastructure, I'm interested in how analytics can improve real-world decision-making.

I work on systems that combine:

  • Behavioral data
  • Statistical evaluation
  • Structured experiments

to better understand how people reason about risk, uncertainty, and long-term planning.

Current Work

Decision Science Experiments – RtB

I currently work with RtB, an early-stage research-driven company building tools that help individuals improve their decision-making.

My work includes designing and analyzing experiments such as:

Calibration Experiments

Participants assign probabilities to uncertain events and receive feedback on prediction accuracy. We evaluate performance using metrics such as:

  • Brier Score
  • Calibration curves
  • Confidence vs accuracy patterns

These experiments help reveal systematic biases in human judgment.
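For readers unfamiliar with these metrics, here is a minimal sketch of how a Brier score and a binned calibration curve can be computed. The function names and the equal-width binning are illustrative assumptions, not the code used at RtB.

```python
def brier_score(forecasts, outcomes):
    """Mean squared difference between forecast probabilities and 0/1 outcomes.

    0.0 is a perfect score; always answering 0.5 scores 0.25.
    """
    if not forecasts or len(forecasts) != len(outcomes):
        raise ValueError("forecasts and outcomes must be non-empty and equal length")
    return sum((p - o) ** 2 for p, o in zip(forecasts, outcomes)) / len(forecasts)


def calibration_curve(forecasts, outcomes, n_bins=5):
    """Group forecasts into equal-width probability bins.

    Returns one (mean forecast, observed frequency, count) tuple per
    non-empty bin. For a well-calibrated forecaster the first two
    values are close to each other in every bin.
    """
    bins = [[] for _ in range(n_bins)]
    for p, o in zip(forecasts, outcomes):
        # A forecast of exactly 1.0 goes into the last bin.
        idx = min(int(p * n_bins), n_bins - 1)
        bins[idx].append((p, o))

    curve = []
    for items in bins:
        if items:
            mean_p = sum(p for p, _ in items) / len(items)
            freq = sum(o for _, o in items) / len(items)
            curve.append((mean_p, freq, len(items)))
    return curve


if __name__ == "__main__":
    # Hypothetical participant: four probability judgments and what happened.
    forecasts = [0.9, 0.8, 0.3, 0.6]
    outcomes = [1, 1, 0, 1]
    print("Brier score:", brier_score(forecasts, outcomes))
    for mean_p, freq, n in calibration_curve(forecasts, outcomes, n_bins=2):
        print(f"mean forecast {mean_p:.2f} -> observed {freq:.2f} (n={n})")
```

A bin where the mean forecast sits well above the observed frequency points to overconfidence; the reverse points to underconfidence.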

Financial Decision Simulators

Another project involves interactive simulations where participants make decisions about:

  • Savings and consumption
  • Risk allocation
  • Long-term financial planning

These experiments generate behavioral datasets that can be analyzed to understand decision patterns.

Selected Projects

Data Pipeline Architecture

I have built several production-style data pipelines designed for analytics and research workflows.

Key features include:

  • Incremental ingestion with watermark tracking
  • Idempotent upserts
  • Late-arriving data handling
  • Automated data quality checks
  • Metadata logging for pipeline runs

These pipelines ensure that analytical datasets remain consistent and reproducible.

Decision Framework Analysis

I also work on extracting structured decision frameworks from interview transcripts.

This involves:

  • Parsing decision statements
  • Identifying decision objects
  • Clustering decisions by structure and trade-offs

The goal is to identify reusable decision frameworks that can serve as building blocks for decision-support tools.

Dataset Discovery Tools

Another project explores how to improve dataset discovery for researchers and analysts.

I built a prototype dataset recommender system that uses:

  • Semantic similarity
  • Dataset metadata
  • Structured dataset catalogs

to help users discover relevant datasets more efficiently.
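The general shape of such a recommender can be sketched in a few lines. This toy version ranks catalog entries by cosine similarity over word counts taken from each dataset's title and description. That is a lexical stand-in chosen to keep the example dependency-free: a semantic approach would swap the word counts for text embeddings. The catalog fields and function names are assumptions for illustration, not the prototype's actual design.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two token-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def recommend(query, catalog, top_k=3):
    """Return titles of the top_k catalog entries most similar to the query.

    Entries with zero similarity are dropped rather than padded in.
    """
    q = Counter(tokenize(query))
    scored = []
    for entry in catalog:
        text = f'{entry["title"]} {entry["description"]}'
        scored.append((cosine(q, Counter(tokenize(text))), entry["title"]))
    scored.sort(key=lambda s: s[0], reverse=True)
    return [title for score, title in scored[:top_k] if score > 0]


if __name__ == "__main__":
    # Hypothetical catalog entries, invented for this example.
    catalog = [
        {"title": "Household Savings Survey",
         "description": "Annual survey of household savings and consumption"},
        {"title": "City Bike Trips",
         "description": "Trip records from a public bike share program"},
        {"title": "Retail Prices",
         "description": "Monthly consumer prices for retail goods"},
    ]
    print(recommend("household savings survey", catalog, top_k=2))
```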

Research Interests

The problems I care most about sit between several fields:

  • Data infrastructure
  • Decision science
  • Dataset bias and research data quality

I'm particularly interested in how better data systems can support more reliable knowledge and better decision-making.

Looking Ahead

I'm interested in roles involving:

  • Data engineering
  • Analytics and experimentation
  • Decision-support systems

I enjoy working on problems where data systems, human behavior, and analytical reasoning intersect.