In recent sequential multiple assignment randomized trials (SMARTs), outcomes were assessed multiple times to evaluate longer-term impacts of dynamic treatment regimes (DTRs). Q-learning requires a scalar response to identify the optimal DTR. Inverse probability weighting may be used to estimate the optimal outcome trajectory, but it is inefficient, susceptible to model mis-specification, and unable to characterize how treatment effects manifest over time. We propose modified Q-learning with generalized estimating equations to address these limitations and apply it to the M-bridge trial, which evaluates adaptive interventions to prevent problematic drinking among college freshmen. Simulation studies demonstrate that the proposed method improves efficiency and robustness.
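For context, the sketch below illustrates the standard two-stage Q-learning with a single scalar end-of-study outcome that the abstract references as the baseline approach; it is not the authors' proposed modification with generalized estimating equations. The simulated data and variable names (X1, A1, X2, A2, Y) are assumptions for illustration only.

```python
# Minimal illustrative sketch of standard two-stage Q-learning via backward
# induction; simulated data and model forms are assumptions, not the M-bridge
# trial or the authors' modified method.
import numpy as np

rng = np.random.default_rng(0)
n = 500

# Simulated SMART data: baseline covariate X1, stage-1 treatment A1 in {-1, 1},
# intermediate covariate X2, stage-2 treatment A2 in {-1, 1}, scalar outcome Y.
X1 = rng.normal(size=n)
A1 = rng.choice([-1, 1], size=n)
X2 = 0.5 * X1 + 0.3 * A1 + rng.normal(size=n)
A2 = rng.choice([-1, 1], size=n)
Y = X1 + X2 + A2 * (0.4 + 0.6 * X2) + A1 * (0.2 - 0.5 * X1) + rng.normal(size=n)

def ols(design, response):
    """Ordinary least squares coefficients."""
    coef, *_ = np.linalg.lstsq(design, response, rcond=None)
    return coef

# Stage 2: regress Y on history and A2 (with interaction), then form the
# pseudo-outcome as the fitted Q-function maximized over the stage-2 treatment.
D2 = np.column_stack([np.ones(n), X1, X2, A2, A2 * X2])
beta2 = ols(D2, Y)
q2 = lambda a2: np.column_stack([np.ones(n), X1, X2, a2, a2 * X2]) @ beta2
pseudo_Y = np.maximum(q2(np.ones(n)), q2(-np.ones(n)))

# Stage 1: regress the pseudo-outcome on baseline history and A1.
D1 = np.column_stack([np.ones(n), X1, A1, A1 * X1])
beta1 = ols(D1, pseudo_Y)

# Estimated optimal decision rules at each stage.
opt_A1 = np.sign(beta1[2] + beta1[3] * X1)
opt_A2 = np.sign(beta2[3] + beta2[4] * X2)
print("Share assigned A1 = +1 under the estimated rule:", (opt_A1 == 1).mean())
```

Because this baseline formulation collapses repeated measures into a single scalar response, it cannot describe how treatment effects unfold over time, which is the limitation the proposed method targets.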
Citation
Zhang, Y., Vock, D. M., Patrick, M. E., Finestack, L. H., & Murray, T. A. (2023). Outcome trajectory estimation for optimal dynamic treatment regimes with repeated measures. Journal of the Royal Statistical Society. Series C: Applied Statistics, 72(4), 976–991. https://doi.org/10.1093/jrsssc/qlad037