Shresth Verma

I am a third-year PhD student at Harvard University, advised by Prof. Milind Tambe, with research interests in Reinforcement Learning and LLMs for complex decision-making tasks. My work focuses on:

Enhancing code generation by LLMs through inference-time reasoning
LLM fine-tuning techniques that are robust to noisy human preferences
Safe mechanisms for balancing tradeoffs in multi-objective preferences
Enabling multi-turn, goal-oriented dialogues with LLMs

Previously, I spent two wonderful years at Google Research India, working in the AI for Social Good lab where I was grateful to be advised by Dr. Aparna Taneja. I developed and deployed robust bandit algorithms to plan targeted mobile health interventions for more than 100K beneficiaries from underserved communities in India.

Before that, I was a Data Scientist at UnitedHealth Group, where I worked on the Chief Medical Officer’s team modelling readmission risks for millions of beneficiaries. I also worked with data from the world’s largest healthcare graph database, designing graph-based analytics and tools to model and interpret patients’ longitudinal wellness journeys.


Harvard University (2023 - Present)	Google Research India (2021 - 2023)	UnitedHealth Group (2020 - 2021)

News

Nov 10, 2025	Our work on Preference-Robust DPO has been accepted at AAAI 2026!
May 12, 2025	Our work on Portfolios for Multi-objective RL has been accepted at ICML 2025. See you in Vancouver!
Nov 3, 2024	I will be presenting a demo on LLMs for RL Code Generation at AAAI 2025. See you in Philly!
Oct 19, 2024	I will be presenting my work on the Social Choice Language Model at the NeurIPS 2024 GenAI for Health Workshop. See you in Vancouver!
Jul 24, 2024	Our work on Group Fairness in Decision-Focused Learning has been accepted at UAI 2024!
Mar 24, 2024	I’ll be attending the Data Study Group at The Alan Turing Institute as a facilitator!
Feb 5, 2024	Got accepted into Harvard’s Spring 2024 Technical AI Safety Fellowship!

Selected Publications

2026

AAAI’26

Preference Robustness for DPO with Applications to Public Health

Cheol Woo Kim*, Shresth Verma*, Mauricio Tec, and Milind Tambe

In AAAI Conference on Artificial Intelligence, 2026

2025

AAAI’25

PRIORITY2REWARD: Incorporating Healthworker Preferences for Resource Allocation Planning

Shresth Verma, Alayna Nguyen, Niclas Boehmer, Lingkai Kong, and Milind Tambe

In Proceedings of the AAAI Conference on Artificial Intelligence, 2025

ICML’25

Navigating the Social Welfare Frontier: Portfolios for Multi-objective Reinforcement Learning

Cheol Woo Kim, Jai Moondra, Shresth Verma, Madeleine Pollack, Lingkai Kong, and 2 more authors

In Proceedings of the 42nd International Conference on Machine Learning, 13–19 Jul 2025


2024

UAI’24

Group Fairness in Predict-Then-Optimize Settings for Restless Bandits

Shresth Verma, Yunfan Zhao, Sanket Shah, Niclas Boehmer, Aparna Taneja, and 1 more author

In Proceedings of the Fortieth Conference on Uncertainty in Artificial Intelligence, 15–19 Jul 2024

IAAI’24

Improving Health Information Access in the World’s Largest Maternal Mobile Health Program via Bandit Algorithms

Arshika Lalan*, Shresth Verma*, Paula Rodriguez Diaz, Panayiotis Danassis, Amrita Mahale, and 4 more authors

In AAAI Conference on Artificial Intelligence, 2024

2023

IJCAI’23

Limited Resource Allocation in a Non-Markovian World: The Case of Maternal and Child Healthcare

Panayiotis Danassis, Shresth Verma, Jackson A. Killian, Aparna Taneja, and Milind Tambe

In International Joint Conference on Artificial Intelligence, 2023

AAAI’23

Scalable decision-focused learning in restless multi-armed bandits with application to maternal and child health

Kai Wang*, Shresth Verma*, Aditya Mate, Sanket Shah, Aparna Taneja, and 3 more authors

In AAAI Conference on Artificial Intelligence, 2023

AAAI’23

Robust planning over restless groups: engagement interventions for a large-scale maternal telehealth program

Jackson A Killian*, Arpita Biswas*, Lily Xu*, Shresth Verma*, Vineet Nair, and 6 more authors

In AAAI Conference on Artificial Intelligence, 2023

AAMAS’23

Restless Multi-Armed Bandits for Maternal and Child Health: Results from Decision-Focused Learning.

Shresth Verma, Aditya Mate, Kai Wang, Neha Madhiwalla, Aparna Hegde, and 2 more authors

In International Conference on Autonomous Agents and Multi Agent Systems, 2023

IAAI’23

Increasing impact of mobile health programs: SAHELI for maternal and child care

Shresth Verma*, Gargi Singh*, Aditya Mate, Paritosh Verma, Sruthi Gorantla, and 6 more authors

🏆Best Deployed Application🏆

In AAAI Conference on Artificial Intelligence, 2023

2022

AAAI’22

Field study in deploying restless multi-armed bandits: Assisting non-profits in improving maternal and child health

Aditya Mate*, Lovish Madaan*, Aparna Taneja, Neha Madhiwalla, Shresth Verma, and 4 more authors

In AAAI Conference on Artificial Intelligence, 2022

TSRML-NeurIPS’22

Case study: Applying decision focused learning in the real world

Shresth Verma, Aditya Mate, Kai Wang, Aparna Taneja, and Milind Tambe

In Workshop on Trustworthy and Socially Responsible Machine Learning at NeurIPS, 2022

2021

AAMAS’21

Towards Sample Efficient Learners in Population based Referential Games through Action Advising

Shresth Verma

In International Conference on Autonomous Agents and Multi Agent Systems, 2021

2020

CoDS-COMAD’20

Deep reinforcement learning for single-shot diagnosis and adaptation in damaged robots

Shresth Verma, Haritha S Nair, Gaurav Agarwal, Joydip Dhar, and Anupam Shukla

In ACM International Joint Conference on Data Science and Management of Data, 2020

LaREL-ICML’20

Emergence of Multilingualism in Population based Referential Games

Shresth Verma

In Workshop on Language in Reinforcement Learning at ICML, 2020

AAAI’20

Emergence of Writing Systems through Multi-Agent Cooperation (Student Abstract)

Shresth Verma and Joydip Dhar

In AAAI Conference on Artificial Intelligence, 2020