Shresth Verma


I am a third-year PhD student at Harvard University, advised by Prof. Milind Tambe. My research interests are in Reinforcement Learning and LLMs for complex decision-making tasks. My work focuses on:

  • Enhancing code generation by LLMs through inference-time reasoning
  • LLM fine-tuning techniques that are robust to noisy human preferences
  • Safe mechanisms for balancing tradeoffs in multi-objective preferences
  • Enabling multi-turn, goal-oriented dialogues with LLMs

Previously, I spent two wonderful years at Google Research India, working in the AI for Social Good lab, where I was grateful to be advised by Dr. Aparna Taneja. There, I developed and deployed robust bandit algorithms to plan targeted mobile health interventions for more than 100K beneficiaries from underserved communities in India.

Before that, I was a Data Scientist at UnitedHealth Group, where I worked in the Chief Medical Officer’s team modelling readmission risks for millions of beneficiaries. I also worked with data from the world’s largest healthcare graph database, designing graph-based analytics and tools to model and interpret patients’ longitudinal wellness journeys.

Harvard University: 2023 - Present
Google AI / Meta: 2021 - 2023
UnitedHealth Group: 2020 - 2021

News

Nov 10, 2025 Our work on Preference-Robust DPO has been accepted at AAAI 2026!
May 12, 2025 Our work on Portfolios for Multi-objective RL has been accepted at ICML 2025. See you in Vancouver!
Nov 3, 2024 I will be presenting a demo on LLMs for RL Code Generation at AAAI 2025. See you in Philly!
Oct 19, 2024 I will be presenting my work on the Social Choice Language Model at the NeurIPS 2024 GenAI for Health Workshop. See you in Vancouver!
Jul 24, 2024 Our work on Group Fairness in Decision-Focused Learning has been accepted at UAI 2024!
Mar 24, 2024 I’ll be attending the Data Study Group at The Alan Turing Institute as a Facilitator!
Feb 5, 2024 I got accepted into Harvard’s Spring 2024 Technical AI Safety Fellowship!

Selected Publications

2026

  1. AAAI’26
    Preference Robustness for DPO with Applications to Public Health
    Cheol Woo Kim*, Shresth Verma*, Mauricio Tec, and Milind Tambe
    In AAAI Conference on Artificial Intelligence, 2026

2025

  1. AAAI’25
    PRIORITY2REWARD: Incorporating Healthworker Preferences for Resource Allocation Planning
    Shresth Verma, Alayna Nguyen, Niclas Boehmer, Lingkai Kong, and Milind Tambe
    In Proceedings of the AAAI Conference on Artificial Intelligence, 2025
  2. ICML’25
    Navigating the Social Welfare Frontier: Portfolios for Multi-objective Reinforcement Learning
    Cheol Woo Kim, Jai Moondra, Shresth Verma, Madeleine Pollack, Lingkai Kong, and 2 more authors
    In Proceedings of the 42nd International Conference on Machine Learning, 13–19 Jul 2025

2024

  1. UAI’24
    Group Fairness in Predict-Then-Optimize Settings for Restless Bandits
    Shresth Verma, Yunfan Zhao, Sanket Shah, Niclas Boehmer, Aparna Taneja, and 1 more author
    In Proceedings of the Fortieth Conference on Uncertainty in Artificial Intelligence, 15–19 Jul 2024
  2. IAAI’24
    Improving Health Information Access in the World’s Largest Maternal Mobile Health Program via Bandit Algorithms
    Arshika Lalan*, Shresth Verma*, Paula Rodriguez Diaz, Panayiotis Danassis, Amrita Mahale, and 4 more authors
    In AAAI Conference on Artificial Intelligence, 2024

2023

  1. IJCAI’23
    Limited Resource Allocation in a Non-Markovian World: The Case of Maternal and Child Healthcare
    Panayiotis Danassis, Shresth Verma, Jackson A. Killian, Aparna Taneja, and Milind Tambe
    In International Joint Conference on Artificial Intelligence, 2023
  2. AAAI’23
    Scalable decision-focused learning in restless multi-armed bandits with application to maternal and child health
    Kai Wang*, Shresth Verma*, Aditya Mate, Sanket Shah, Aparna Taneja, and 3 more authors
    In AAAI Conference on Artificial Intelligence, 2023
  3. AAAI’23
    Robust planning over restless groups: engagement interventions for a large-scale maternal telehealth program
    Jackson A Killian*, Arpita Biswas*, Lily Xu*, Shresth Verma*, Vineet Nair, and 6 more authors
    In AAAI Conference on Artificial Intelligence, 2023
  4. AAMAS’23
    Restless Multi-Armed Bandits for Maternal and Child Health: Results from Decision-Focused Learning
    Shresth Verma, Aditya Mate, Kai Wang, Neha Madhiwalla, Aparna Hegde, and 2 more authors
    In International Conference on Autonomous Agents and Multi Agent Systems, 2023
  5. IAAI’23
    Increasing impact of mobile health programs: SAHELI for maternal and child care
    Shresth Verma*, Gargi Singh*, Aditya Mate, Paritosh Verma, Sruthi Gorantla, and 6 more authors
    🏆Best Deployed Application🏆
    In AAAI Conference on Artificial Intelligence, 2023

2022

  1. AAAI’22
    Field study in deploying restless multi-armed bandits: Assisting non-profits in improving maternal and child health
    Aditya Mate*, Lovish Madaan*, Aparna Taneja, Neha Madhiwalla, Shresth Verma, and 4 more authors
    In AAAI Conference on Artificial Intelligence, 2022
  2. TSRML-NeurIPS’22
    Case study: Applying decision focused learning in the real world
    Shresth Verma, Aditya Mate, Kai Wang, Aparna Taneja, and Milind Tambe
    In Workshop on Trustworthy and Socially Responsible Machine Learning at NeurIPS, 2022

2021

  1. AAMAS’21
    Towards Sample Efficient Learners in Population based Referential Games through Action Advising
    Shresth Verma
    In International Conference on Autonomous Agents and Multi Agent Systems, 2021

2020

  1. CoDS-COMAD’20
    Deep reinforcement learning for single-shot diagnosis and adaptation in damaged robots
    Shresth Verma, Haritha S Nair, Gaurav Agarwal, Joydip Dhar, and Anupam Shukla
    In ACM International Joint Conference on Data Science and Management of Data, 2020
  2. LaREL-ICML’20
    Emergence of Multilingualism in Population based Referential Games
    Shresth Verma
    In Workshop on Language in Reinforcement Learning at ICML, 2020
  3. AAAI’20
    Emergence of Writing Systems through Multi-Agent Cooperation (Student Abstract)
    Shresth Verma and Joydip Dhar
    In AAAI Conference on Artificial Intelligence, 2020