projects

My project work runs the gamut from machine learning and artificial intelligence to healthcare, biology, and biological engineering. I am eternally grateful to all of the mentors and collaborators who have helped me grow and strengthen my research muscle.

Multimodal AI for clinical decision making

with Professor Paul Liang and David Dai @ MIT Media Lab's Multisensory Intelligence Group | Sep 2025 - Present

Multimodal, reasoning-based foundation models hold considerable promise for addressing key challenges in medical practice, yet their readiness for real-world deployment remains insufficiently explored. To bridge this gap, I am working on two foundation models that aim to excel in clinical generalizability and accuracy. In this work, I have preprocessed EHR data from MIMIC-IV for model training, analyzed model performance before and after supervised finetuning, and used LLM-as-a-judge to evaluate model robustness to hallucinations and the likelihood of perpetuating clinical inequities.

Second author on a manuscript submitted to npj Digital Medicine.

Python, Bash, PyTorch, vLLM, DeepEval, HPC (SLURM)
Multimodal AI, reasoning, LLM-as-a-judge, hallucination analysis, large-scale data preprocessing, finetuning, reinforcement learning

In silico protein evolution with reinforcement learning

6.7920 Reinforcement Learning Final Project | Dec 2025 - Present

Protein fitness landscapes are high-dimensional, discrete, and rugged: single amino acid substitutions can dramatically alter stability and activity, yet exhaustive experimental characterization is intractable. In this work, we use deep mutational scanning data as ground truth labels that cover a small fraction of the vast protein mutational space. We additionally apply the protein language model ESM-2 to represent protein sequences in a high-dimensional, descriptive latent space for modeling. Reinforcement learning (RL) provides an apt framework for efficient exploration under sparse, delayed rewards: protein sequences define states, mutations define actions, and experimental stability and activity provide the reward. Focusing on the protein AAV2 and the RL algorithms A2C and PPO, we found that A2C explores more novel variants than PPO during training, while PPO exploits actions with high reward. Our models appear to learn biologically meaningful signal: their generated variants contain point mutations that align with those reported in the literature to improve AAV2 fitness.
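
As a rough illustration of the state/action/reward framing described above, the sketch below treats a sequence as the state and a single substitution as the action. It is not the project's code: `toy_fitness` is a hypothetical stand-in for experimental stability and activity labels, and `greedy_rollout` is a simple placeholder where a learned A2C or PPO policy would normally choose the mutation.

```python
from dataclasses import dataclass

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


@dataclass(frozen=True)
class Mutation:
    """An action: substitute `residue` at zero-based `position`."""
    position: int
    residue: str


def apply_mutation(sequence: str, mutation: Mutation) -> str:
    """State transition: return the sequence with one residue replaced."""
    if mutation.residue not in AMINO_ACIDS:
        raise ValueError(f"unknown residue: {mutation.residue!r}")
    if not 0 <= mutation.position < len(sequence):
        raise IndexError("mutation position is outside the sequence")
    pos = mutation.position
    return sequence[:pos] + mutation.residue + sequence[pos + 1:]


def toy_fitness(sequence: str, target: str) -> float:
    """Hypothetical fitness: fraction of positions matching a target sequence.

    Assumption for illustration only; real rewards would come from
    measured or predicted stability and activity.
    """
    return sum(a == b for a, b in zip(sequence, target)) / len(target)


def step(sequence: str, mutation: Mutation, target: str) -> tuple[str, float]:
    """One environment step: next state and reward (change in fitness)."""
    next_sequence = apply_mutation(sequence, mutation)
    reward = toy_fitness(next_sequence, target) - toy_fitness(sequence, target)
    return next_sequence, reward


def greedy_rollout(start: str, target: str, n_steps: int) -> str:
    """Placeholder policy: take the best single mutation until none helps."""
    sequence = start
    for _ in range(n_steps):
        candidates = [
            Mutation(i, aa) for i in range(len(sequence)) for aa in AMINO_ACIDS
        ]
        best = max(candidates, key=lambda m: step(sequence, m, target)[1])
        next_sequence, reward = step(sequence, best, target)
        if reward <= 0:  # no mutation improves fitness, so stop
            break
        sequence = next_sequence
    return sequence
```

A greedy policy like this one only exploits; the contrast between A2C and PPO noted above concerns how learned policies trade exploration of novel variants against exploitation of known high-reward mutations.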

Python, Bash, Stable-Baselines3, ESM-2, HPC (SLURM)
Protein evolution, Reinforcement learning end-to-end pipeline design, A2C, PPO, ESM-2, Structural and functional landscape prediction

Multimodal vision model to predict diabetic retinopathy

6.4300 Computer Vision Final Project | May 2025

MultiRetNet is a novel multimodal deep learning pipeline that integrates retinal imaging, socioeconomic factors, and comorbidity data to accurately stage diabetic retinopathy. I led the design of the model and the evaluation pipeline, sparked by my interest in prioritizing safety in AI for healthcare. I evaluated three multimodal fusion strategies in PyTorch (cross-attention, fully connected layer, concatenation) and demonstrated that multimodal approaches reduced false negatives compared to unimodal baselines. Our results demonstrated state-of-the-art diagnostic accuracy (AUROC > 0.98) and potential for improving early detection and healthcare equity in underserved populations.
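
To make two of the fusion strategies named above concrete, here is a minimal dependency-free sketch rather than the MultiRetNet implementation. It assumes each modality has already been encoded into plain feature vectors, and it omits the learned projection layers a real cross-attention module would have; the function names are illustrative.

```python
import math


def concat_fusion(image_features, tabular_features):
    """Concatenation fusion: join both modality vectors into one."""
    return list(image_features) + list(tabular_features)


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention_fusion(query, keys, values):
    """Single-head scaled dot-product attention without learned weights.

    `query` comes from one modality (for example, a clinical feature
    vector); `keys` and `values` are tokens from the other modality
    (for example, image patch embeddings). Returns the attention-weighted
    average of `values`.
    """
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]
```

Concatenation treats every feature as equally relevant, while cross-attention lets one modality decide which parts of the other to weight more heavily.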

Python, PyTorch
Interpretability, Human-in-the-loop Deferral system, Safe AI for Healthcare, Shapley scores, Convolutional neural networks, Cross attention, Multimodal fusion

Early prognosis of metabolic dysfunction-associated fatty liver disease

6.7930 Machine Learning for Healthcare Final Project | May 2025

Metabolic dysfunction-associated fatty liver disease (MAFLD) affects 25% of adults in the United States, with disproportionately higher rates among those with Type 2 diabetes and class III obesity. We present a deep learning-based framework for early prognosis of MAFLD in adults using structured clinical data from Mass General Brigham. Our approach combines binary classification with neural networks, linear and logistic regression baselines, and survival modeling, and we experiment with methods for addressing class imbalance. The study supports early clinical risk stratification and reveals predictive biomarkers using SHAP interpretation.
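
One common way to address class imbalance is to weight each class inversely to its frequency in the loss. The sketch below shows that heuristic (the same formula scikit-learn uses for `class_weight="balanced"`); it is an illustrative assumption, not necessarily the method used in this project, and the function name is hypothetical.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count).

    Rare classes receive larger weights, so misclassifying them
    contributes more to a weighted loss.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}
```

For example, with three negative labels and one positive label, the positive class is weighted three times as heavily as the negative class.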

Python, HPC (SLURM)
EHR Data Preprocessing, Addressing Class Imbalance, Interpretability, Time-to-event prediction, Shapley scores

Transcriptomics-based histological scoring for metabolic dysfunction-associated steatohepatitis using machine learning

with Professor Doug Lauffenburger and Nikos Meimetis @ MIT Department of Biological Engineering | Feb 2024 - June 2025

MASH, the advanced stage of metabolic dysfunction-associated steatotic liver disease (MASLD), is characterized by severe accumulation of fat in the liver. Developing accurate in vitro liver models is crucial for better understanding disease progression and developing therapies. A patient's disease severity is determined by a doctor's histological scoring of a liver biopsy, which produces two clinical scores: a fibrosis stage and a NAFLD Activity Score (NAS). In this lab, I built several machine learning models (k-nearest neighbors, random forest, linear regression) that predict fibrosis stage and NAS from transcriptomic data, allowing us to interpret in vitro liver-on-a-chip models using in vivo clinical language.

Python, Bash, PyTorch, HPC (SLURM)
Machine learning, Transcriptomics, Bulk RNA-seq, Model benchmarking

Engineering and modeling chimeric antigen receptor macrophages to tackle cancer cachexia

MIT iGEM 2023 | Jan 2023 - Nov 2023

iGEM, an international synthetic biology competition for undergraduates, was my first exposure to entrepreneurship and research in academia. I led a three-person team to identify a new therapeutic target for cancer cachexia using synthetic biology. We developed a proof of concept for a novel immunotherapy: macrophages engineered to express IL-6-specific chimeric antigen receptors, taking inspiration from CAR-T cell therapy and Morrissey et al. (2018).

I returned to the project a few semesters later to apply newly acquired computational modeling skills to the ADME (Absorption, Distribution, Metabolism, and Excretion) behavior of a CAR-macrophage therapy in a patient. Using differential equations to model the projected behavior, we quantified the therapy volume expected to achieve the desired biological response while minimizing off-target effects.
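
For a sense of what differential equation modeling of this kind looks like, the sketch below integrates a generic one-compartment model with first-order absorption and elimination using forward Euler steps. It is a textbook simplification under assumed parameters, not the model from this project; the rate constants and function name are hypothetical.

```python
def simulate_one_compartment(dose, k_absorb, k_eliminate, dt, n_steps):
    """Forward-Euler integration of a two-state kinetic model.

    d(depot)/dt   = -k_absorb * depot
    d(central)/dt =  k_absorb * depot - k_eliminate * central

    `depot` is the amount at the administration site and `central` is the
    amount in circulation. Returns a list of (time, depot, central) tuples.
    """
    depot, central = float(dose), 0.0
    trajectory = [(0.0, depot, central)]
    for i in range(1, n_steps + 1):
        absorbed = k_absorb * depot * dt         # moved from depot to central
        eliminated = k_eliminate * central * dt  # cleared from central
        depot -= absorbed
        central += absorbed - eliminated
        trajectory.append((i * dt, depot, central))
    return trajectory
```

Sweeping `dose` in a model like this and reading off the peak of `central` is one simple way to relate an administered amount to a target exposure, in the spirit of the dose estimate described above.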

First author on a manuscript under review at Frontiers in Systems Biology.

Awarded a silver medal at the 2023 iGEM Grand Jamboree.

Python, Cell culture, Plasmid design, Gel electrophoresis, SDS-PAGE, Transforming E. coli and transfecting HEK293, Fluorescence microscopy, Flow cytometry
Synthetic biology, Differential equation modeling, Project management and leadership