Benchmarking Reinforcement Learning and Off Policy Evaluation for Medical Decision Making

Mason Hargrave

Date of Award

2025

Document Type

Thesis

Degree Name

Doctor of Philosophy (PhD)

Thesis Advisor

Marcelo O. Magnasco

Keywords

reinforcement learning, dynamic treatment regimes, offline RL, healthcare, off-policy evaluation, benchmark dataset

Abstract

Healthcare applications pose significant challenges to existing Reinforcement Learning (RL) methods due to implementation risks, low data availability, short treatment episodes, sparse rewards, partial observations, and heterogeneous treatment effects (HTE). Despite significant interest in developing Dynamic Treatment Regimes (DTRs) for longitudinal patient care scenarios, no standardized benchmark has yet been developed. To address this gap, this thesis introduces Episodes of Care (EpiCare), a benchmark designed to mimic the challenges associated with applying RL to longitudinal healthcare settings. I leverage this benchmark to test seven state-of-the-art offline RL models as well as five common off-policy evaluation (OPE) techniques. My results suggest that while offline RL may be capable of improving upon existing standards of care given large data availability, its applicability does not appear to extend to the moderate to low data regimes typical of healthcare settings. Additionally, I demonstrate that several OPE techniques which have become standard in the medical RL literature fail to perform adequately under simulated conditions. These results suggest that the performance of RL models in DTRs may be difficult to meaningfully evaluate using current OPE methods, indicating that RL for this application may still be in its early stages. It is my hope that these findings, along with the EpiCare benchmark itself, will facilitate the comparison of existing methods and inspire further research into techniques that increase the practical applicability of medical RL.

Comments

A thesis presented to the faculty of The Rockefeller University in partial fulfillment of the requirements for the degree of Doctor of Philosophy

DOI

10.48496/ywg9-1z58

License and Reuse Information

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

Recommended Citation

Hargrave, Mason, "Benchmarking Reinforcement Learning and Off Policy Evaluation for Medical Decision Making" (2025). Student Theses and Dissertations. 803.
https://doi.org/10.48496/ywg9-1z58

Included in

Life Sciences Commons
