Impact of discretization of the timeline for longitudinal causal inference methods
Article [Accepted Manuscript]
Is part ofStatistics in medicine
In longitudinal settings, causal inference methods usually rely on a discretization of the patient timeline that may not reflect the underlying data generation process. This paper investigates the estimation of causal parameters under discretized data. It presents the implicit assumptions practitioners make but do not acknowledge when discretizing data to assess longitudinal causal parameters. We illustrate that differences in point estimates under different discretizations are due to the data coarsening resulting in both a modified definition of the parameter of interest and loss of information about time-dependent confounders. We further investigate several tools to advise analysts in selecting a timeline discretization for use with pooled Longitudinal Targeted Maximum Likelihood Estimation for the estimation of the parameters of a marginal structural model. We use a simulation study to empirically evaluate bias at different discretizations and assess the use of the cross-validated variance as a measure of data support to select a discretization under a chosen data coarsening mechanism. We then apply our approach to a study on the relative effect of alternative asthma treatments during pregnancy on pregnancy duration. The results of the simulation study illustrate how coarsening changes the target parameter of interest as well as how it may create bias due to a lack of appropriate control for time-dependent confounders. We also observe evidence that the cross-validated variance acts well as a measure of support in the data, by being minimized at finer discretizations as the sample size increases.