Research
Academic research projects, publications, datasets, and conference presentations.
Research Projects
Alpha Research Study: Behavioral Decision Analysis via NLP Pipeline
Behavioral interview data processed through a multi-layer NLP pipeline to extract decision frameworks, calibration failure modes, and intervention hypotheses.
Overview
An Alpha-stage behavioral finance study combining structured interviews with an automated NLP analysis pipeline. Four participants completed interviews covering investment strategy, risk tolerance, calibration tasks, and life planning. The pipeline converts raw recordings into structured decision signals, enabling systematic cross-participant comparison of where financial reasoning breaks down and why.
Research Question
What decision frameworks do individuals apply under financial uncertainty, and at what point do those frameworks fail?
Methods
- Transcription & diarization — Raw recordings (.mp4) processed via AssemblyAI API with speaker diarization; output converted to VTT format
- Three-pass filtering — Pass 1 removes calibration task noise; Pass 2 applies a decision signal gate; Pass 3 assigns sentences to 8 decision framework clusters via keyword matching (321 raw sentences → 38 high-signal sentences, 11.8% retention)
- NLP layer — LDA topic modeling (gensim, 8 topics) runs in parallel with keyword clustering for cross-validation; BART zero-shot classification (facebook/bart-large-mnli) scores each sentence against 5 behavioral signals; precision/recall is evaluated against the keyword pseudo-labels
- Structured LLM analysis — Claude API runs three sequential phases: Error Mapping (initial condition → expected outcome → observed outcome → divergence type), Behavioral Signal extraction (3–5 measurable real-time indicators per sentence), and Intervention Hypothesis generation
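The three-pass filter can be sketched as a chain of keyword gates. This is a minimal illustration under stated assumptions, not the study's code: the noise patterns, decision-signal words, cluster names, and keyword sets are hypothetical placeholders (the pipeline described above uses 8 clusters whose keywords are not listed here), and `pass1`, `pass2`, `pass3`, and `run` are names chosen for this example.

```python
import re

# Hypothetical patterns; the study's actual gates and keyword lists are not published.
CALIBRATION_NOISE = re.compile(r"\b(next card|higher or lower|guess)\b", re.IGNORECASE)
DECISION_SIGNAL = re.compile(r"\b(decide|decided|because|risk|invest|save|would|should)\b", re.IGNORECASE)

# Hypothetical subset of clusters (the real pipeline uses 8).
CLUSTERS = {
    "calibration": {"confident", "sure", "streak", "lucky"},
    "risk_tolerance": {"risk", "safe", "lose"},
    "passive_income": {"passive", "interest", "savings"},
}


def pass1(sentences):
    """Pass 1: drop sentences that look like calibration-task chatter."""
    return [s for s in sentences if not CALIBRATION_NOISE.search(s)]


def pass2(sentences):
    """Pass 2: keep only sentences containing a decision-signal word."""
    return [s for s in sentences if DECISION_SIGNAL.search(s)]


def pass3(sentences):
    """Pass 3: tag each sentence with the first cluster sharing a keyword with it."""
    tagged = []
    for s in sentences:
        tokens = set(re.findall(r"[a-z']+", s.lower()))
        for cluster, keywords in CLUSTERS.items():
            if tokens & keywords:
                tagged.append((cluster, s))
                break  # first match wins; unmatched sentences are dropped
    return tagged


def run(sentences):
    """Chain the three passes and report the share of sentences retained."""
    kept = pass3(pass2(pass1(sentences)))
    retention = len(kept) / len(sentences) if sentences else 0.0
    return kept, retention
```

The same retention ratio applied to the reported counts is 38 / 321 ≈ 0.118, or 11.8%. The first-match rule in `pass3` places each sentence in exactly one cluster; a multi-label variant would record every matching cluster instead.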
Key Findings
- Calibration is the universal failure mode — All 4 participants showed confidence miscalibration, but through distinct mechanisms: streak-following, pattern attribution, and boredom-driven escalation. It is the only cluster with 4/4 participant representation.
- Correct framework, absent trigger — One participant independently derived a valid passive income threshold ($2M held in a high-yield savings account, or HYSA) but attached no activation condition to it. The framework was structurally sound; operationalization was missing entirely.
- Education ROI requires a decision-mode gate — Some participants applied full ROI analysis unprompted; others defaulted to market sentiment and fear. These are categorically different decision modes, not calibration differences. Applying the same intervention to both would be a category error; a classifier gate is required upstream.
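The "4/4 participant representation" criterion in the first finding amounts to counting distinct participants per cluster. A minimal sketch, assuming Pass 3 output has been tagged with a participant ID; the function names, cluster names, and toy assignments are illustrative only and are not study data.

```python
from collections import defaultdict


def participant_coverage(assignments):
    """Count distinct participants with at least one sentence in each cluster.

    assignments: iterable of (participant_id, cluster) pairs.
    """
    members = defaultdict(set)
    for participant, cluster in assignments:
        members[cluster].add(participant)
    return {cluster: len(people) for cluster, people in members.items()}


def universal_clusters(assignments, n_participants):
    """Return clusters represented by every participant, sorted by name."""
    coverage = participant_coverage(assignments)
    return sorted(c for c, n in coverage.items() if n == n_participants)


# Toy assignments for illustration only (not study data).
toy = [("P1", "calibration"), ("P2", "calibration"), ("P3", "calibration"),
       ("P4", "calibration"), ("P2", "calibration"), ("P1", "education_roi"),
       ("P3", "education_roi")]
print(participant_coverage(toy))   # {'calibration': 4, 'education_roi': 2}
print(universal_clusters(toy, 4))  # ['calibration']
```

Using sets rather than raw sentence counts keeps one talkative participant from inflating a cluster's coverage.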
Technical Implementation
Python pipeline using AssemblyAI, gensim LDA, HuggingFace Transformers (facebook/bart-large-mnli), Anthropic Claude API, Google Drive API, and GitHub Actions. Two deployment modes: incremental (triggers on each new upload, processes new transcripts only) and cumulative (biweekly, full dataset reprocessing). Output is a versioned Excel report delivered via automated email to the uploader and pipeline monitors.
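The difference between the two deployment modes comes down to which transcripts a run selects. The sketch below assumes each transcript has a stable ID and that the pipeline keeps a record of IDs it has already processed; `select_transcripts` and the mode strings are hypothetical names, and the GitHub Actions triggers themselves are not shown.

```python
def select_transcripts(all_ids, processed_ids, mode):
    """Return the transcript IDs a pipeline run should process, sorted.

    incremental: only IDs not handled by an earlier run.
    cumulative:  every ID, so the full dataset is reprocessed.
    """
    if mode == "incremental":
        return sorted(set(all_ids) - set(processed_ids))
    if mode == "cumulative":
        return sorted(set(all_ids))
    raise ValueError(f"unknown mode: {mode!r}")


uploaded = ["p1_interview", "p2_interview", "p3_interview"]
done = ["p1_interview", "p2_interview"]
print(select_transcripts(uploaded, done, "incremental"))  # ['p3_interview']
print(select_transcripts(uploaded, done, "cumulative"))   # all three IDs
```

Keeping the selection logic separate from the trigger means an upload-triggered workflow and a scheduled one could share the same processing code and differ only in the mode they pass.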
Status
Alpha deployment. Pipeline is live and processing data from 4 participants; participant pool is expanding. Analysis framework, automation infrastructure, and reporting pipeline are production-ready.
Tracking the Data Quality Landscape of Retracted Scholarly Materials
- Investigated the extent of incorrect and inconsistent retraction indexing in Crossref data, examining how reliably title keywords indicate whether a DOI is truly retracted.
- Identified 208 DOIs marked retracted in April 2023 but no longer flagged as retracted in July 2024, exposing systemic indexing drift with downstream implications for research integrity.
- Developed a real-time automated pipeline to collect DOI metadata statistics, extract title keywords, and compare them against flags defined in the retraction protocol.
- Annotated and cleaned the dataset; analyzed consistency of retraction flag usage across different journals using statistical and text-based methods.
- Drafted full manuscript, dataset documentation, pipeline documentation, and materials for three conference presentations and posters.
- Co-authored two published datasets deposited at the University of Illinois Urbana-Champaign repository.
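Both checks described above reduce to simple comparisons once the metadata is collected. This sketch assumes the April 2023 and July 2024 retraction lists are already available as DOI strings and that each record carries a title plus a boolean metadata flag; the patterns in `TITLE_FLAGS` are hypothetical stand-ins for the flags defined in the retraction protocol, the function names are illustrative, and no Crossref API calls are shown. DOIs are lowercased before comparison because DOI names are case-insensitive.

```python
import re

# Hypothetical title flags; the retraction protocol defines the real list.
TITLE_FLAGS = re.compile(r"^\s*(retracted|retraction|withdrawn)\b", re.IGNORECASE)


def dropped_retractions(earlier, later):
    """DOIs flagged as retracted in the earlier snapshot but absent from the later one."""
    later_set = {doi.lower() for doi in later}
    return sorted({doi.lower() for doi in earlier} - later_set)


def title_flag_agreement(records):
    """Split DOIs by whether a title flag agrees with the metadata retraction flag.

    records: iterable of (doi, title, is_retracted_in_metadata) tuples.
    """
    agree, disagree = [], []
    for doi, title, flagged in records:
        title_says_retracted = bool(TITLE_FLAGS.search(title))
        (agree if title_says_retracted == flagged else disagree).append(doi)
    return agree, disagree


# Toy snapshots for illustration only (not the published dataset).
april_2023 = ["10.1000/ABC", "10.1000/def", "10.1000/ghi"]
july_2024 = ["10.1000/abc", "10.1000/ghi"]
print(dropped_retractions(april_2023, july_2024))  # ['10.1000/def']
```

A title such as "Retraction notice for ..." trips the flag even when the DOI belongs to a notice rather than a retracted paper, which is why keyword agreement alone cannot settle retraction status.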
Psychological Motivation Factors Enhancing Confidence and Participation in Sports Among Students with Poor Physical Fitness
- Led a three-person undergraduate research team investigating psychological factors (motivation, self-efficacy, social environment) that encourage students with low physical fitness to participate in physical activity.
- Designed full research plan including study scope, hypotheses, participant recruitment strategy, and timeline.
- Developed survey questionnaires, data deposit workflows, analysis pipeline code, and the final technical report.
- Presented findings at the 2025 Undergraduate Research Symposium (URS), University of Illinois Urbana-Champaign.
Co-designing mHealth Applications to Empower Cancer Survivors
- Investigated the integration of generative AI into mHealth applications for the aging population, with a focus on promoting healthier lifestyles among cancer survivors.
- Developed a Flutter mobile application providing AI-powered health suggestions; designed the full user interface based on iterative user research with older adult participants.
- Fine-tuned open-source LLaMA language models to improve the quality, safety, and relevance of AI-generated healthcare guidance for elderly users.
Demand Forecasting for Inventory Management System Optimization
- Optimized stock management systems for a national food distributor by developing an inventory demand forecasting model using Random Forest, trained on historical sales data combined with macroeconomic external factors (stock market performance, GDP, import/export data).
- Translated model outputs into actionable operational recommendations for supply chain stakeholders; communicated findings in accessible executive briefings to client leadership.
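A Random Forest trained on sales history plus external factors first needs a tabular training set. The sketch below builds one from monthly data under stated assumptions: sales are sorted by month, and macro indicators (for example GDP growth or a market index level) are keyed by month. The function name, the three-month lag window, and the feature names are hypothetical choices for illustration, not the model's actual configuration.

```python
def build_training_rows(monthly_sales, macro, n_lags=3):
    """Pair each month's sales (target) with lagged sales and prior-month macro data.

    monthly_sales: list of (month, units) tuples sorted by month.
    macro: dict mapping month -> {indicator_name: value}.
    Months without n_lags of history or without prior-month macro data are skipped.
    """
    rows = []
    for i in range(n_lags, len(monthly_sales)):
        _, target = monthly_sales[i]
        prev_month = monthly_sales[i - 1][0]
        if prev_month not in macro:
            continue
        # lag_1 is last month's sales, lag_2 the month before, and so on.
        features = {f"lag_{k}": monthly_sales[i - k][1] for k in range(1, n_lags + 1)}
        features.update({f"{name}_prev": value for name, value in macro[prev_month].items()})
        rows.append((features, target))
    return rows


# Toy data for illustration only.
sales = [("2024-01", 100), ("2024-02", 110), ("2024-03", 120), ("2024-04", 130), ("2024-05", 125)]
macro = {"2024-03": {"gdp_growth": 0.5}, "2024-04": {"gdp_growth": 0.6}}
print(len(build_training_rows(sales, macro)))  # 2
```

Macro values are taken from the previous month so that no feature depends on figures that would still be unpublished at forecast time. The resulting feature dicts could be vectorized with scikit-learn's `DictVectorizer` and passed to `RandomForestRegressor.fit`; that pairing is a suggestion, not a description of the original model.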
Publications & Datasets
Manuscript Under Review
Si, L.; Salami, M. O.; Schneider, J. (2025). Tracking the Data Quality Landscape of Retracted Papers: Flag Usage in Titles and Changes in DOI Retraction Status. University of Illinois Urbana-Champaign.
Datasets
- Si, L.; Salami, M. O.; Schneider, J. (2025). Dataset tracking DOIs marked as retracted in Crossref as of April 2023 but no longer marked as retracted as of July 2024. University of Illinois Urbana-Champaign. doi:10.13012/B2IDB-5333456_V1
- Si, L.; Salami, M. O.; Schneider, J. (2025). Dataset on tracking the data quality landscape of retracted papers: Flag usage in titles and changes in DOI retraction status. University of Illinois Urbana-Champaign. doi:10.13012/B2IDB-2907908_V1
Conference Presentations & Posters
2025
- 2025 STEM Career Exploration and Symposium, UIUC · Jul 2025
  Data Quality Assessment of Retracted Papers: Patterns and Retraction Status Shifts in Titles
- 2025 Undergraduate Research Symposium (URS), UIUC · Apr 2025
  Analyzing the Consistency of Retraction Phrases Among Different Journals in Crossref Data · hdl:2142/129058
- 2025 Undergraduate Research Symposium (URS), UIUC · Apr 2025
  Psychological Motivation Factors that Enhance Confidence and Participation in Sports Among Students with Poor Physical Fitness · hdl:2142/128150
2024
- 2024 iSchool Research Showcase, UIUC · Nov 2024
  Distinguishing Retracted Publications from Retraction Notices in Crossref Data · hdl:2142/125134