Published: Dec. 4, 2020

Susan Murphy, Radcliffe Alumnae Professor at the Radcliffe Institute and Professor of Statistics and Computer Science, Harvard University

Challenges in Developing Learning Algorithms to Personalize Treatment in Real Time

Reinforcement learning and control face a variety of formidable challenges when used to design digital health interventions for individuals with chronic disorders. One challenge arises in settings where most treatments delivered by a smart device have immediate nonnegative (hopefully positive) effects, but the largest longer-term effects tend to be negative due to user burden. Furthermore, the resulting data must be amenable to a variety of statistical analyses, including causal inference and monitoring analyses. Other challenges include an immature domain science concerning the system dynamics, alongside the need to incorporate some domain science anyway because of a low signal-to-noise ratio and non-stationary, sparse data. Here we describe how we confront these challenges, including our use of low-variance proxies for the delayed effects on the reward (e.g., immediate response) in an online "bandit" learning algorithm for personalizing mobile health interventions.
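
For readers unfamiliar with online bandit learning, the following is a minimal illustrative sketch, not the algorithm presented in the talk. It assumes a generic Beta-Bernoulli Thompson sampling learner that chooses between two hypothetical actions (0 = do not send a prompt, 1 = send a prompt) and learns only from a binary immediate response used as the reward signal. The class name, the `simulate` helper, and the response probabilities are all made up for illustration. The sketch deliberately ignores user burden, delayed effects, context, and non-stationarity, which are the hard parts the abstract describes.

```python
import random


class BetaBernoulliBandit:
    """Generic Thompson sampling over a few actions with binary rewards."""

    def __init__(self, n_actions, seed=None):
        self.rng = random.Random(seed)
        # Beta(1, 1) prior on each action's probability of an immediate response.
        self.successes = [1] * n_actions
        self.failures = [1] * n_actions

    def select_action(self):
        # Draw one plausible response rate per action, then act greedily on the draws.
        draws = [
            self.rng.betavariate(a, b)
            for a, b in zip(self.successes, self.failures)
        ]
        return max(range(len(draws)), key=draws.__getitem__)

    def update(self, action, proxy_reward):
        # proxy_reward is the binary immediate response observed after acting.
        if proxy_reward:
            self.successes[action] += 1
        else:
            self.failures[action] += 1


def simulate(n_steps=2000, seed=0):
    """Run the learner against a toy environment; return how often each action was chosen."""
    # Hypothetical response probabilities, invented for this example only.
    true_response_prob = [0.2, 0.5]
    env_rng = random.Random(seed + 1)
    bandit = BetaBernoulliBandit(n_actions=2, seed=seed)
    counts = [0, 0]
    for _ in range(n_steps):
        action = bandit.select_action()
        proxy_reward = env_rng.random() < true_response_prob[action]
        bandit.update(action, proxy_reward)
        counts[action] += 1
    return counts


if __name__ == "__main__":
    print(simulate())
```

In this toy setting the learner gradually concentrates on the action with the higher immediate response rate. A learner that optimizes only the immediate signal in this way would overlook the negative longer-term effects of burden noted in the abstract.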