Biostatistical methods in psychiatry

Analysis of correlated data with feedback for time-dependent covariates in psychiatry research

Abstract

In studies on psychiatry and neurodegenerative diseases, it is common to have data that are correlated due to the hierarchical structure in data collection or to repeated measures on the subject longitudinally. However, the feedback effect created due to time-dependent covariates in these studies is often overlooked and seldom modelled. This article reviews the methodological development of feedback effects with marginal models for longitudinal data and discusses their implementation.

Introduction

In the study of psychiatry and neurodegenerative diseases, it is common to have correlated observations. By correlation, one means that the mechanism that gives rise to the observation is not necessarily different from the one that gave rise to another observation. Observations may be correlated due to the hierarchical structure by which the data are obtained or because they are repeatedly measured.

Longitudinal models for psychiatry research

Longitudinal studies are a crucial component of psychiatric research.1 Some of the most important research questions in psychiatry and mental health investigate symptoms and behaviour change over time with time-dependent factors that influence the development of pathological and normal behaviours.2 Longitudinal studies are used to investigate mental health disorders, such as depression, schizophrenia, psychosis, bipolar disorder and post-traumatic stress disorder, among others.1 3

Longitudinal studies have been used in determining service use among patients with mental health diseases. Hospitalisation for psychiatric reasons and receiving psychiatric crisis services are two outcomes of interest for measuring service use. Such studies are important for efforts to reduce service use and are particularly central for consumer-run organisations, which are usually government-funded.4 Other longitudinal studies are used to understand the effectiveness of treatments and programmes in improving the quality of life of patients with mental health disorders.5 6

As an example, in a longitudinal study used to investigate the success of consumer-run organisations in promoting the mental health of their members, researchers gathered demographic data and obtained information about social support, community integration, personal empowerment, quality of life, symptom distress and service use.6 Higher levels of social support, community integration and quality of life might decrease the probability of being hospitalised for psychiatric reasons or receiving psychiatric crisis services over time. Increased levels of symptom distress might increase the likelihood of using such services. Further, using such services might, in turn, increase levels of symptom distress in the future, resulting in feedback from the service use outcomes to levels of symptom distress.7

The marginal models presented in this paper are used to investigate these longitudinal studies, especially when feedback is involved in the research questions in psychiatry. These marginal models account for the different sources of correlation encountered in longitudinal data.

Types of correlation in longitudinal data

There are different types of correlation in longitudinal data. When analysing longitudinal binary data, it is essential to account for both the correlation inherent in the repeated measures of the responses and the correlation that arises from the feedback between the responses at a particular time and the covariates at other times. Ignoring either of these correlations can lead to invalid conclusions. Such is the case, for example, when the covariates are time-dependent and the standard logistic regression model is used.

There are three types of correlations discussed in the paper: responses with responses, covariates with current and future responses, and responses with future covariates. A model to address these types of relationships is the aim of this article and of the book Marginal Models in the Analysis of Correlated Data with Time-dependent Covariates.8 The different types of correlation are shown in figure 1.8

Figure 1

Types of correlation structures.

  1. There are correlations among the responses, denoted by $Y_t$, as time $t$ goes from 1 to $T$.

  2. There are correlations between the covariate $X_s$ and the outcome $Y_t$ when covariates at time $s$ impact the outcomes at time $t$, with $s \le t$. If $s = t$, one refers to these correlations as the direct or cross-sectional effects of the covariate on the outcome. If $s < t$, these correlations are called the lagged effects of the covariate on the outcome.

  3. There are correlations between the response $Y_t$ and the covariate $X_s$ when the outcome at time $t$ impacts the covariate at time $s$ ($s > t$). These correlations are often referred to as feedback effects from $Y_t$ to the future $X_s$.
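The three cases above depend only on the ordering of the covariate time s and the response time t. A minimal sketch of that classification follows; the function name `classify_association` is a hypothetical label introduced here for illustration.

```python
def classify_association(s: int, t: int) -> str:
    """Classify the link between a covariate at time s and a response at time t."""
    if s == t:
        return "cross-sectional"  # covariate and response measured together
    if s < t:
        return "lagged"           # earlier covariate, later response
    return "feedback"             # earlier response, later covariate


# Enumerate every (covariate time, response time) pair for T = 3 visits.
T = 3
pairs = {
    (s, t): classify_association(s, t)
    for s in range(1, T + 1)
    for t in range(1, T + 1)
}
```

With T = 3 visits, the nine pairs split into three cross-sectional, three lagged and three feedback pairs.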

This paper provides an overview of modelling repeated responses with time-dependent and time-independent covariates and with feedback effects. With this review, we offer some guidance in analysing correlated data due to repeated measurements.

Correlated models for longitudinal data

There are two common approaches used to analyse longitudinal data: population-averaged models (also known as marginal models) and subject-specific models. Population-averaged models focus on understanding what affects the mean outcome of the population, while subject-specific models concentrate on determining what impacts the mean outcome of subpopulations.9 The basic idea of population-averaged models is that, instead of attempting to model the within-subject covariance structure, it is treated as a nuisance, and the focus turns to the marginal mean. In this framework, the covariance structure does not need to be specified correctly for one to get reasonable estimates of regression coefficients and standard errors. In contrast, the subject-specific model distinguishes observations belonging to the same or different subpopulations. Random effects are commonly used to estimate the subject-specific models. For repeated responses, hierarchical logistic regression models are used for multilevel analysis. There are two methods for estimating subject-specific models: the maximum likelihood approach (random-effects models) and the conditional likelihood procedure. This paper focuses on marginal models for longitudinal data.

Population-averaged or marginal model

For longitudinal data, Zeger and Liang9 proposed the generalised estimating equation (GEE) marginal model. The GEE approach extends generalised linear models to obtain population-averaged estimates while accounting for the dependency between the repeated measurements.10 Specifically, the dependency or correlation between repeated measures is accounted for by a robust estimation of the variances of the regression coefficients. The GEE approach treats the time dependency as a nuisance, and a ‘working’ correlation matrix for the vector of repeated observations from each subject is specified to account for the dependency among the repeated observations. The form of the ‘working’ correlation is assumed to be the same for all subjects, reflecting average dependence among the repeated observations over subjects. Several working correlation structures are possible, including independence, exchangeable, autoregressive and unstructured.
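To make the working correlation structures concrete, the following minimal sketch builds three of them for T repeated measures. The function name `working_correlation` and the parameter `alpha` are assumptions introduced here; in practice the correlation parameter is estimated from the data, and the unstructured form is omitted because it has no closed-form pattern.

```python
def working_correlation(structure, T, alpha=0.0):
    """Return a T x T working correlation matrix as nested lists.

    structure: "independence", "exchangeable" or "ar1"
    alpha: assumed common correlation (exchangeable) or lag-1 correlation (ar1)
    """
    if structure == "independence":
        # No correlation between different time points.
        return [[1.0 if s == t else 0.0 for t in range(T)] for s in range(T)]
    if structure == "exchangeable":
        # The same correlation alpha between any two time points.
        return [[1.0 if s == t else alpha for t in range(T)] for s in range(T)]
    if structure == "ar1":
        # Correlation decays as alpha ** |s - t| with the time gap.
        return [[alpha ** abs(s - t) for t in range(T)] for s in range(T)]
    raise ValueError(f"unknown structure: {structure}")


exch = working_correlation("exchangeable", 3, alpha=0.5)
ar1 = working_correlation("ar1", 3, alpha=0.5)
```

Under the exchangeable structure, visits 1 and 3 are as strongly correlated as visits 1 and 2; under the first-order autoregressive structure, the correlation weakens as the visits move further apart.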

Qu et al11 proposed a generalised method of moments (GMM) model for longitudinal data that provides reasonable estimates of the marginal regression coefficients and is more efficient than GEE. However, this model does not distinguish between time-dependent and time-independent covariates. Lai and Small12 extended the GMM model for longitudinal data with continuous outcomes to account for time-dependent covariates. Their model estimated regression coefficients for time-dependent covariates by classifying them into three different types, which determined the group of valid moment conditions used in the estimation process.12 Lalonde et al13 expanded this model to allow for the modelling of binary longitudinal outcomes in the presence of time-dependent covariates. Their approach lets researchers test the validity of moment conditions for time-dependent covariates individually, instead of assuming that a group of moment conditions is valid because of the type of time-dependent covariate. Although these marginal models provide reasonable estimates of the regression coefficients, they assume that the effects of time-dependent covariates are the same across time. In contrast, a partitioned coefficient model allows for the estimation of current and future effects of time-dependent covariates on binary and continuous outcomes, as discussed in Irimata et al.14 For a thorough discussion of these models, see Marginal Models in the Analysis of Correlated Data with Time-dependent Covariates.8
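The way a covariate's type determines its group of valid moment conditions can be sketched as follows. The text above does not list the rules, so the conditions in the code are an assumption based on how the three-type classification is usually stated: a moment condition pairs the covariate at time s with the residual at time t, and is taken as valid for all pairs (type I), for s ≥ t (type II) or only for s = t (type III). The function name `valid_moment_pairs` is hypothetical, and the rules should be checked against the cited work12 before use.

```python
def valid_moment_pairs(covariate_type: int, T: int):
    """List (s, t) pairs assumed valid for a covariate of the given type.

    s indexes the covariate time and t the residual time, both from 1 to T.
    Assumed rules: type 1 -> all pairs, type 2 -> s >= t, type 3 -> s == t.
    """
    rules = {
        1: lambda s, t: True,
        2: lambda s, t: s >= t,
        3: lambda s, t: s == t,
    }
    if covariate_type not in rules:
        raise ValueError("covariate_type must be 1, 2 or 3")
    keep = rules[covariate_type]
    return [(s, t) for s in range(1, T + 1) for t in range(1, T + 1) if keep(s, t)]


# Number of usable moment conditions per covariate type with T = 3 visits.
counts = {k: len(valid_moment_pairs(k, 3)) for k in (1, 2, 3)}
```

Under these assumed rules, more restrictive types leave fewer estimating equations available, which is why testing individual moment conditions can recover information that a blanket type assignment would discard.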

Feedback model with time-dependent covariates

When analysing longitudinal data with time-dependent covariates, there are usually three questions (Qs) of interest that researchers seek to answer15:

Q1. What is the cross-sectional relationship/association between the outcome $Y_t$ and the covariate $X_{jt}$ (both X and Y are measured at the same time)?

Q2. Is the outcome at time $t$, $Y_t$, affected by the time-dependent covariate measured at a previous time period $s$, $X_{js}$; $s < t$ (lagged covariates related/associated with future values of the outcome)?

Q3. Does the outcome at time $s$, $Y_s$, associate with the $j$th time-dependent covariate at time $t$, $X_{jt}$, where $s < t$ (feedback effect of outcome on future values of the time-dependent covariate)?

A two-stage model allows researchers to answer all three questions simultaneously.8 This two-stage model accounts for feedback effects while modelling the direct impact, as well as the delayed effect, of the covariates on future responses. However, modelling feedback might not always make sense or be significant. For example, higher levels of social support might decrease the probability of service use, but service use might not affect levels of social support in the future.

Model

Stage 1 of the model allows for the fit of the cross-sectional and the lagged effects of time-dependent covariates on the outcome of interest (Q1 and Q2). Let each covariate  Inline Formula  be measured at times  Inline Formula  resulting for subject i and covariate  Inline Formula  . Thus, the model

Display Formula

with  Inline Formula  so

Display Formula

where the  Inline Formula  matrix consists of a column of ones concatenated with a lower diagonal matrix as the systematic component, and  Inline Formula  is dependent on the regression coefficients  Inline Formula  with  Inline Formula , where s and t go from 1 to T . The coefficient  Inline Formula  denotes the effect of the covariate  Inline Formula  on the response  Inline Formula  when both are measured at the tth time period. However, when  Inline Formula , it does not necessarily follow that one should interpret the past, using two different time periods in the same way as when  Inline Formula  and  Inline Formula  are in the same time period,  Inline Formula . The impact of a covariate on the response from a previous time period is not intuitively the same as when they are measured in the same period. This is especially true in health research when time of dose will have impact on the reaction of the patient. Thus, current and future effects should not be combined but rather analysed separately. This is best explained by  Inline Formula  representing the effect of  Inline Formula  on  Inline Formula , and by  Inline Formula  representing the effect of  Inline Formula  on  Inline Formula  and so on. In general, one can consider the systematic component consisting of P covariates and let  Inline Formula  be the parameters associated with those covariates, with each  Inline Formula  having maximum length T . Thus, X is of maximum dimension  Inline Formula  and β is a vector of maximum dimension  Inline Formula .
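The description of the systematic component as a column of ones concatenated with a lower diagonal matrix can be illustrated for one subject and one covariate. This is a minimal sketch under an assumption: the lag-k column holds the covariate value from k periods earlier, and zero where no such measurement exists. The function name `partitioned_design` is hypothetical and is not taken from the authors' code.

```python
def partitioned_design(x):
    """Build the design rows for one subject and one time-dependent covariate.

    x: covariate values at times 1..T.
    Row t is [1, x_t, x_{t-1}, ..., x_{t-(T-1)}], with 0.0 where the lag
    reaches back before the first measurement (assumed convention).
    """
    T = len(x)
    rows = []
    for t in range(T):
        row = [1.0]  # intercept column of ones
        for k in range(T):  # k = 0 is the cross-sectional term, k >= 1 are lags
            row.append(float(x[t - k]) if t - k >= 0 else 0.0)
        rows.append(row)
    return rows


# Symptom-distress scores at three visits for a single hypothetical subject.
design = partitioned_design([2, 5, 7])
```

Each row has T + 1 entries, matching one intercept plus at most T coefficients per covariate, and the block after the first column is lower triangular because no lagged value exists before the first visit.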

Stage 1 of the model, based only on the valid moment conditions, is fitted as

 Inline Formula ( Inline Formula  when the valid moment conditions exist. In this model,  Inline Formula  denotes the regression parameter for the cross-sectional effects of time-dependent covariates on the outcome; these are cases where the moment conditions are always valid (the effect of the covariate in the same period as the response). The coefficient  Inline Formula  represents the lagged effects of the covariate on the response when the covariate is measured prior to the outcome ( Inline Formula ) and the moment conditions are valid.
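As a sketch in assumed notation, a stage 1 model of this kind for a single covariate $j$ could be written as below. The symbols $g$, $\mu_{it}$, $\beta_0$, $\beta_j^{tt}$ and $\beta_j^{[k]}$ are illustrative choices made here and may differ from the original formulation.

```latex
$$
g(\mu_{it}) = \beta_0 + \beta_j^{tt}\, x_{ijt}
            + \sum_{k=1}^{t-1} \beta_j^{[k]}\, x_{ij,t-k},
\qquad t = 1, \dots, T,
$$
```

Here $\mu_{it}$ is the mean response of subject $i$ at time $t$, $g$ is a link function (the logit for a binary outcome such as service use), $\beta_j^{tt}$ is the cross-sectional effect and $\beta_j^{[k]}$ is the effect of the covariate measured $k$ periods before the response. Only the lagged terms whose moment conditions are judged valid would be retained.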

Returning to the consumer-run organisation example, one can determine whether quality of life, social support, community integration and symptom distress measured 6, 12 and 18 months earlier had a positive or negative association with current service use. These associations are given by the coefficients in stage 1 of the model.

In stage 2 of the model, the feedback from the outcome to future values of the covariates (Q3) is addressed. The jth covariate measured at time s ,  Inline Formula , is fitted as

Display Formula

where  Inline Formula  is the mean of the covariate  Inline Formula . The coefficient  Inline Formula  represents the feedback of the outcome  Inline Formula  on the time-dependent covariate  Inline Formula  measured in the immediately following time period. The regression coefficient  Inline Formula  represents the feedback of the outcome on the covariate measured two time periods later, and so on. As an example, in the study referred to earlier, one may want to investigate whether service use increases levels of symptom distress in the future.
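A stage 2 model of this kind can be sketched, again in assumed notation, as a regression of the covariate mean on earlier outcomes. The symbols $h$, $\mu^{x}_{ijs}$, $\gamma_{j0}$ and $\gamma_{jk}$ are introduced here for illustration and are not necessarily those of the original formulation.

```latex
$$
h\!\left(\mu^{x}_{ijs}\right) = \gamma_{j0}
  + \sum_{k=1}^{s-1} \gamma_{jk}\, y_{i,s-k},
\qquad s = 2, \dots, T,
$$
```

In this sketch $\mu^{x}_{ijs}$ is the mean of the $j$th covariate for subject $i$ at time $s$, $h$ is a link function suited to the covariate's scale, $\gamma_{j1}$ is the feedback from the outcome to the covariate one period later and $\gamma_{j2}$ is the feedback two periods later. In the service use example, a positive $\gamma_{j1}$ for symptom distress would indicate that service use at one visit is associated with higher distress at the next.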

In both stages of the two-stage model, estimates are obtained using GMM after determining valid moment conditions. For modelling the feedback from the outcome to two or more time-dependent covariates, the estimates are obtained through the use of simultaneous GMM. Computing code to fit this model can be found online (https://github.com/ElsaVazquez29/Feedback-Code).

Conclusions

In psychiatry, the correlation inherent in repeated measures is further affected by the presence of time-dependent covariates. Left unmodelled, this correlation can seriously distort the interpretations a psychiatrist makes. In particular, the changes and feedback present when the covariates are time-dependent cannot be ignored. Often, the feedback effects go unchecked. However, any modelling of longitudinal data must address the impact of the feedback, as well as the immediate and the delayed effects of covariates on the responses. For feedback in particular, there is an advantage in using a two-stage model for correlated data with time-dependent covariates, as it allows one to use GMM methods to identify valid moments.

While there is merit in the models due to Lai and Small,12 Zhou et al,16 Lalonde et al13 and Irimata et al,14 they do not always account for the feedback. The two-stage GMM model allows one to account for the feedback effects across different time periods. It partitions the regression coefficients and allows one to identify directional and delayed effects.

We have developed a new approach to marginal regression analysis for time-dependent covariates with feedback. We use the GMM to make optimal use of the estimating equations that are made available by the covariates in both the direct part of the model and the feedback. We have focused on marginal regression analysis. The approach is also useful for obtaining more efficient estimates. This model conditions on part of the past history of covariates and outcomes. The partly conditional model is intermediate between the marginal model that conditions only on the covariates at time t and the transition model that conditions on the full history of covariates and outcomes at time t.

In summary, there exists a correlation when modelling dependent data with time-dependent and time-independent covariates with feedback effects, which cannot be ignored. With this review on the methodological development, we recommend the marginal models as extensively discussed in Marginal Models in the Analysis of Correlated Data with Time-dependent Covariates.8

Dr. Elsa Vazquez-Arreola is a biostatistician at the National Institute of Diabetes and Digestive and Kidney Diseases, USA. She obtained a Ph.D. degree in Statistics from Arizona State University, USA. She has consulted on numerous clinical and research studies and is currently a co-author of the book entitled “Marginal models in the analysis of correlated data with time-dependent covariates” with Dr. Jeffrey R. Wilson and Dr. Ding-Geng Chen. Her main research interests include models for correlated data, time-dependent covariates, generalized linear mixed models and propensity score models with applications in public health and behavioural health.
