Shresth Verma
I am a third-year PhD student at Harvard University advised by Prof. Milind Tambe, with research interests in Reinforcement Learning and LLMs for complex decision-making tasks. My work focuses on:
- Enhancing code generation by LLMs through inference-time reasoning
- LLM fine-tuning techniques that are robust to noisy human preferences
- Safe mechanisms for balancing tradeoffs in multi-objective preferences
- Enabling multi-turn, goal-oriented dialogues with LLMs
Previously, I spent two wonderful years at Google Research India, working in the AI for Social Good lab where I was grateful to be advised by Dr. Aparna Taneja. I developed and deployed robust bandit algorithms to plan targeted mobile health interventions for more than 100K beneficiaries from underserved communities in India.
Before that, I was a Data Scientist at United Health Group, where I worked on the Chief Medical Officer’s team modelling readmission risks for millions of beneficiaries. I also worked with data from the world’s largest healthcare graph database, designing graph-based analytics and tools to model and interpret patients’ longitudinal wellness journeys.
| Harvard University | Google Research India | United Health Group |
|---|---|---|
| 2023 - Present | 2021 - 2023 | 2020 - 2021 |
News
| Date | Update |
|---|---|
| Nov 10, 2025 | Our work on Preference-Robust DPO has been accepted at AAAI 2026! |
| May 12, 2025 | Our work on Portfolios for Multi-objective RL has been accepted at ICML 2025. See you in Vancouver! |
| Nov 3, 2024 | I will be presenting a demo on LLMs for RL Code Generation at AAAI 2025. See you in Philly! |
| Oct 19, 2024 | Presenting my work on the Social Choice Language Model at the NeurIPS 2024 GenAI for Health Workshop. See you in Vancouver! |
| Jul 24, 2024 | Our work on Group Fairness in Decision-Focused Learning has been accepted at UAI 2024! |
| Mar 24, 2024 | I’ll be attending Data Study Group at The Alan Turing Institute as a Facilitator! |
| Feb 5, 2024 | Got accepted into Harvard’s Spring 2024 Technical AI Safety Fellowship! |
Selected Publications
2026
- AAAI’26: Preference Robustness for DPO with Applications to Public Health. In AAAI Conference on Artificial Intelligence, 2026
2025
- AAAI’25: PRIORITY2REWARD: Incorporating Healthworker Preferences for Resource Allocation Planning. In Proceedings of the AAAI Conference on Artificial Intelligence, 2025
- ICML’25: Navigating the Social Welfare Frontier: Portfolios for Multi-objective Reinforcement Learning. In Proceedings of the 42nd International Conference on Machine Learning, 13–19 Jul 2025
2024
- UAI’24: Group Fairness in Predict-Then-Optimize Settings for Restless Bandits. In Proceedings of the Fortieth Conference on Uncertainty in Artificial Intelligence, 15–19 Jul 2024
- IAAI’24: Improving Health Information Access in the World’s Largest Maternal Mobile Health Program via Bandit Algorithms. In AAAI Conference on Artificial Intelligence, 2024
2023
- IJCAI’23: Limited Resource Allocation in a Non-Markovian World: The Case of Maternal and Child Healthcare. In International Joint Conference on Artificial Intelligence, 2023
- AAAI’23: Scalable decision-focused learning in restless multi-armed bandits with application to maternal and child health. In AAAI Conference on Artificial Intelligence, 2023
- AAAI’23: Robust planning over restless groups: engagement interventions for a large-scale maternal telehealth program. In AAAI Conference on Artificial Intelligence, 2023
- AAMAS’23: Restless Multi-Armed Bandits for Maternal and Child Health: Results from Decision-Focused Learning. In International Conference on Autonomous Agents and Multi Agent Systems, 2023
- IAAI’23: Increasing impact of mobile health programs: SAHELI for maternal and child care. 🏆 Best Deployed Application 🏆 In AAAI Conference on Artificial Intelligence, 2023
2022
- AAAI’22: Field study in deploying restless multi-armed bandits: Assisting non-profits in improving maternal and child health. In AAAI Conference on Artificial Intelligence, 2022
- TSRML-NeurIPS’22: Case study: Applying decision focused learning in the real world. In Workshop on Trustworthy and Socially Responsible Machine Learning at NeurIPS, 2022
2021
- AAMAS’21: Towards Sample Efficient Learners in Population based Referential Games through Action Advising. In International Conference on Autonomous Agents and Multi Agent Systems, 2021
2020
- CoDS-COMAD’20: Deep reinforcement learning for single-shot diagnosis and adaptation in damaged robots. In ACM International Joint Conference on Data Science and Management of Data, 2020
- LaREL-ICML’20: Emergence of Multilingualism in Population based Referential Games. In Workshop on Language in Reinforcement Learning at ICML, 2020
- AAAI’20: Emergence of Writing Systems through Multi-Agent Cooperation (Student Abstract). In AAAI Conference on Artificial Intelligence, 2020