Research

Academic research projects, publications, datasets, and conference presentations.

Research Projects

Alpha Research Study: Behavioral Decision Analysis via NLP Pipeline

Jan 2026 – Present · RtB · Financial Research Analyst Alpha · Ongoing

Behavioral interview data processed through a multi-layer NLP pipeline to extract decision frameworks, calibration failure modes, and intervention hypotheses.

Overview

An Alpha-stage behavioral finance study combining structured interviews with an automated NLP analysis pipeline. Four participants completed interviews covering investment strategy, risk tolerance, calibration tasks, and life planning. The pipeline converts raw recordings into structured decision signals, enabling systematic cross-participant comparison of where financial reasoning breaks down and why.

Research Question

What decision frameworks do individuals apply under financial uncertainty, and at what point do those frameworks fail?

Methods

Transcription & diarization — Raw recordings (.mp4) processed via AssemblyAI API with speaker diarization; output converted to VTT format
Three-pass filtering — Pass 1 removes calibration task noise; Pass 2 applies a decision signal gate; Pass 3 assigns sentences to 8 decision framework clusters via keyword matching (321 raw sentences → 38 high-signal sentences, 11.8% retention)
NLP layer — LDA topic modeling (gensim, 8 topics) runs in parallel with keyword clustering for cross-validation; BART zero-shot classification (facebook/bart-large-mnli) scores each sentence against 5 behavioral signals; precision and recall are evaluated against the keyword-derived pseudo-labels
Structured LLM analysis — Claude API runs three sequential phases: Error Mapping (initial condition → expected outcome → observed outcome → divergence type), Behavioral Signal extraction (3–5 measurable real-time indicators per sentence), and Intervention Hypothesis generation
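
The three-pass filtering step can be sketched roughly as below. This is a minimal illustration under stated assumptions, not the study's actual code: the keyword lists, the two cluster names shown (out of eight), and the function names `three_pass_filter` and `retention_rate` are hypothetical placeholders.

```python
import re

# Hypothetical vocabularies; the study's real keyword lists are not given.
CALIBRATION_NOISE = {"coin", "flip", "heads", "tails"}
DECISION_MARKERS = {"decide", "decided", "chose", "because", "would", "should"}
CLUSTER_KEYWORDS = {
    "risk_tolerance": {"risk", "volatile", "loss"},
    "passive_income": {"hysa", "interest", "passive"},
}


def tokenize(sentence):
    """Lowercase a sentence and return its set of word tokens."""
    return set(re.findall(r"[a-z0-9']+", sentence.lower()))


def three_pass_filter(sentences):
    """Return (sentence, cluster) pairs that survive all three passes."""
    kept = []
    for sentence in sentences:
        tokens = tokenize(sentence)
        # Pass 1: drop sentences that belong to the calibration task.
        if tokens & CALIBRATION_NOISE:
            continue
        # Pass 2: decision signal gate; require at least one decision marker.
        if not tokens & DECISION_MARKERS:
            continue
        # Pass 3: assign to the first cluster with a keyword match.
        # Sentences matching no cluster are dropped.
        for cluster, keywords in CLUSTER_KEYWORDS.items():
            if tokens & keywords:
                kept.append((sentence, cluster))
                break
    return kept


def retention_rate(n_raw, n_kept):
    """Percentage of raw sentences retained, rounded to one decimal place."""
    return round(100 * n_kept / n_raw, 1)
```

With the counts reported above, `retention_rate(321, 38)` gives 11.8. A sentence here is assigned to at most one cluster; whether the actual pipeline allows multiple assignments is not stated.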

Key Findings

Calibration is the universal failure mode — All 4 participants showed confidence miscalibration, but through distinct mechanisms: streak-following, pattern attribution, and boredom-driven escalation. It is the only cluster with 4/4 participant representation.
Correct framework, absent trigger — One participant independently derived a valid $2M/HYSA passive income threshold but had no activation condition attached to it. The framework was structurally sound; operationalization was missing entirely.
Education ROI requires a decision-mode gate — Some participants applied full ROI analysis unprompted; others defaulted to market sentiment and fear. These are categorically different decision modes, not calibration differences. Applying the same intervention to both would be a category error; a classifier gate is required upstream.

Technical Implementation

Python pipeline using AssemblyAI, gensim LDA, HuggingFace Transformers (facebook/bart-large-mnli), Anthropic Claude API, Google Drive API, and GitHub Actions. Two deployment modes: incremental (triggers on each new upload, processes new transcripts only) and cumulative (biweekly, full dataset reprocessing). Output is a versioned Excel report delivered via automated email to the uploader and pipeline monitors.
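
The difference between the two deployment modes comes down to which transcripts a run selects. A minimal sketch, assuming transcripts are tracked by string IDs; the function name `select_transcripts` and the ID format are hypothetical and not taken from the pipeline:

```python
def select_transcripts(all_ids, processed_ids, mode):
    """Choose which transcript IDs a pipeline run should process.

    incremental: only transcripts not processed by an earlier run.
    cumulative:  every transcript, for full-dataset reprocessing.
    """
    if mode == "incremental":
        return sorted(set(all_ids) - set(processed_ids))
    if mode == "cumulative":
        return sorted(set(all_ids))
    raise ValueError(f"unknown mode: {mode!r}")
```

For example, with three uploaded transcripts of which one has already been processed, an incremental run picks up the remaining two while a cumulative run picks up all three.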

Status

Alpha deployment. Pipeline is live and processing data from 4 participants; participant pool is expanding. Analysis framework, automation infrastructure, and reporting pipeline are production-ready.

Tracking the Data Quality Landscape of Retracted Scholarly Materials

Aug 2024 – Sep 2025 · Supervised by Prof. Jodi Schneider · iSchool, UIUC

Investigated the scope of incorrect and inconsistent retraction indexing in Crossref data, examining how reliably title keywords can determine whether a DOI is truly retracted.
Identified 208 DOIs marked retracted in April 2023 but no longer flagged as retracted in July 2024, exposing systemic indexing drift with downstream implications for research integrity.
Developed a real-time automated pipeline to collect DOI metadata statistics, extract title keywords, and compare them against flags defined in the retraction protocol.
Annotated and cleaned the dataset; analyzed consistency of retraction flag usage across different journals using statistical and text-based methods.
Drafted full manuscript, dataset documentation, pipeline documentation, and materials for three conference presentations and posters.
Co-authored two published datasets deposited at the University of Illinois Urbana-Champaign repository.
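
The snapshot comparison and title keyword check described in this project can be illustrated with a short sketch. It is not the project's pipeline: it assumes each Crossref snapshot has already been reduced to a mapping from DOI to a retracted/not-retracted boolean, and the flag list and function names are hypothetical stand-ins for those defined in the retraction protocol.

```python
import re

# Hypothetical title flags; the protocol's actual list is not given here.
RETRACTION_FLAGS = ("retracted", "retraction", "withdrawn")


def status_drift(earlier, later):
    """Return DOIs marked retracted in `earlier` but not in `later`.

    Both arguments map DOI -> bool. A DOI missing from `later` is
    treated as no longer flagged, which is an assumption of this sketch.
    """
    return sorted(
        doi
        for doi, retracted in earlier.items()
        if retracted and not later.get(doi, False)
    )


def title_has_flag(title):
    """True if the title contains any retraction flag as a whole word."""
    lowered = title.lower()
    return any(
        re.search(rf"\b{re.escape(flag)}\b", lowered)
        for flag in RETRACTION_FLAGS
    )
```

Running `status_drift` on an April 2023 snapshot and a July 2024 snapshot would yield the kind of list from which the 208 drifted DOIs were identified; `title_has_flag` then shows whether the title text still signals a retraction.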

Psychological Motivation Factors Enhancing Confidence and Participation in Sports Among Students with Poor Physical Fitness

Jun 2024 – Aug 2025 · Supervised by Prof. Peter Darch · Lead Researcher · iSchool, UIUC

Led a three-person undergraduate research team investigating psychological factors (motivation, self-efficacy, social environment) that encourage students with low physical fitness to participate in physical activity.
Designed full research plan including study scope, hypotheses, participant recruitment strategy, and timeline.
Developed survey questionnaires, data deposit workflows, analysis pipeline code, and the final technical report.
Presented findings at the 2025 Undergraduate Research Symposium (URS), University of Illinois Urbana-Champaign.

Co-designing mHealth Applications to Empower Cancer Survivors

Sep 2024 – Dec 2024 · Supervised by Prof. Rachel Adler · iSchool, UIUC

Investigated the integration of generative AI into mHealth applications for the aging population, with a focus on promoting healthier lifestyles among cancer survivors.
Developed a Flutter mobile application providing AI-powered health suggestions; designed the full user interface based on iterative user research with older adult participants.
Fine-tuned open-source LLaMA language models to improve the quality, safety, and relevance of AI-generated healthcare guidance for elderly users.

Demand Forecasting for Inventory Management System Optimization

Aug 2024 – Dec 2024 · Supervised by Prof. Yoo-Seong Song · Client: National Food Distributor · iSchool, UIUC

Developed a Random Forest demand forecasting model to optimize stock management for a national food distributor, trained on historical sales data combined with external macroeconomic factors (stock market performance, GDP, import/export data).
Translated model outputs into actionable operational recommendations for supply chain stakeholders; communicated findings in accessible executive briefings to client leadership.

Publications & Datasets

Manuscript Under Review

Si, L.; Salami, M. O.; Schneider, J. (2025). Tracking the Data Quality Landscape of Retracted Papers: Flag Usage in Titles and Changes in DOI Retraction Status. University of Illinois Urbana-Champaign.

Datasets

Si, L.; Salami, M. O.; Schneider, J. (2025). Dataset tracking DOIs marked as retracted in Crossref as of April 2023 but no longer marked as retracted as of July 2024. University of Illinois Urbana-Champaign. doi:10.13012/B2IDB-5333456_V1
Si, L.; Salami, M. O.; Schneider, J. (2025). Dataset on tracking the data quality landscape of retracted papers: Flag usage in titles and changes in DOI retraction status. University of Illinois Urbana-Champaign. doi:10.13012/B2IDB-2907908_V1

Conference Presentations & Posters

2025

2025 STEM Career Exploration and Symposium, UIUC · Jul 2025
Data Quality Assessment of Retracted Papers: Patterns and Retraction Status Shifts in Titles
2025 Undergraduate Research Symposium (URS), UIUC · Apr 2025
Analyzing the Consistency of Retraction Phrases Among Different Journals in Crossref Data · hdl:2142/129058
2025 Undergraduate Research Symposium (URS), UIUC · Apr 2025
Psychological Motivation Factors that Enhance Confidence and Participation in Sports Among Students with Poor Physical Fitness · hdl:2142/128150

2024

2024 iSchool Research Showcase, UIUC · Nov 2024
Distinguishing Retracted Publications from Retraction Notices in Crossref Data · hdl:2142/125134