Introduction
In the study of psychiatry and neurodegenerative diseases, it is common to have correlated observations. By correlation, one means that the mechanism that gives rise to the observation is not necessarily different from the one that gave rise to another observation. Observations may be correlated due to the hierarchical structure by which the data are obtained or because they are repeatedly measured.
Longitudinal models for psychiatry research
Longitudinal studies are a crucial component of psychiatric research.1 Some of the most important research questions in psychiatry and mental health concern how symptoms and behaviour change over time, and which time-dependent factors influence the development of pathological and normal behaviours.2 Longitudinal studies are used to investigate mental health disorders, such as depression, schizophrenia, psychosis, bipolar disorder and post-traumatic stress disorder, among others.1 3
Longitudinal studies have been used to determine service use among patients with mental health diseases. Hospitalisation for psychiatric reasons and receipt of psychiatric crisis services are two outcomes of interest for measuring service use. These outcomes are important in efforts to reduce service use and are particularly central for consumer-run organisations, as these are usually government-funded.4 Other longitudinal studies are used to understand the effectiveness of treatments and programmes in improving the quality of life of patients with mental health disorders.5 6
As an example, in a longitudinal study used to investigate the success of consumer-run organisations in promoting the mental health of their members, researchers gathered demographic data and obtained information about social support, community integration, personal empowerment, quality of life, symptom distress and service use.6 Higher levels of social support, community integration and quality of life might decrease the probability of being hospitalised for psychiatric reasons or receiving psychiatric crisis services over time. Increased levels of symptom distress might increase the likelihood of using such services. Further, using such services might, in turn, increase levels of symptom distress in the future, resulting in feedback from the service use outcomes to levels of symptom distress.7
The marginal models presented in this paper are used to analyse data from such longitudinal studies, especially when the research questions in psychiatry involve feedback. These marginal models account for the different sources of correlation encountered in longitudinal data.
Types of correlation in longitudinal data
There are different types of correlation in longitudinal data. When analysing longitudinal binary data, it is essential to account for both the correlation inherent in the repeated measures of the responses and the correlation that arises from the feedback between the responses at a particular time and the covariates at other times. Ignoring either of these correlations can lead to invalid conclusions. Such is the case, for example, when the covariates are time-dependent and the standard logistic regression model is used.
There are three types of correlation discussed in the paper: responses with responses, covariates with current and future responses, and responses with future covariates. A model to address these types of relationships is the aim of this article and of the book Marginal Models in the Analysis of Correlated Data with Time-dependent Covariates.8 The different types of correlation are shown in figure 1:8
There are correlations among the responses, which are denoted by $Y_t$ as time $t$ goes from one to $T$.
There are correlations between the covariate and outcome when covariates at time $s$ impact the outcomes at time $t$. If $s = t$, one refers to these correlations as the direct or cross-sectional effects of the covariate on the outcome. If $s < t$, these correlations are called the lagged effects of the covariate on the outcome.
There are correlations between response and covariate when the outcome at time $t$ impacts the covariate at time $s$, with $s > t$. These correlations are often referred to as feedback effects from $Y_t$ to the future $X_s$.
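The covariate-outcome relationships listed above depend only on the order of the two time indices. A minimal sketch of that bookkeeping follows; the function names and string labels are hypothetical and chosen purely for illustration.

```python
from typing import Dict, Tuple


def classify_relationship(covariate_time: int, outcome_time: int) -> str:
    """Label a covariate/outcome pairing by the order of the two time indices."""
    if covariate_time == outcome_time:
        return "cross-sectional"  # covariate and outcome measured together
    if covariate_time < outcome_time:
        return "lagged"           # earlier covariate, later outcome
    return "feedback"             # earlier outcome, later covariate


def relationship_table(T: int) -> Dict[Tuple[int, int], str]:
    """Classify every (covariate time, outcome time) pair for times 1..T."""
    return {
        (s, t): classify_relationship(s, t)
        for s in range(1, T + 1)
        for t in range(1, T + 1)
    }
```

With three measurement occasions, `relationship_table(3)` contains three pairs of each kind, which shows how quickly the number of candidate lagged and feedback relationships grows with the number of time points.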
This paper provides an overview of modelling repeated responses with time-dependent, time-independent covariates and feedback effects. With this review, we offer some guidance in analysing correlated data due to repeated measurements.
Correlated models for longitudinal data
There are two common approaches used to analyse longitudinal data: population-averaged models (also known as marginal models) and subject-specific models. Population-averaged models focus on understanding what affects the mean outcome of the population, while subject-specific models concentrate on determining what impacts the mean outcome of subpopulations.9 The basic idea of population-averaged models is that, instead of attempting to model the within-subject covariance structure, one treats it as a nuisance and turns the focus to the marginal mean. In this framework, the covariance structure does not need to be specified correctly for one to get reasonable estimates of the regression coefficients and their standard errors. In contrast, the subject-specific model distinguishes between observations belonging to the same or different subpopulations. Random effects are commonly used to estimate subject-specific models. For repeated responses, hierarchical logistic regression models are used for multilevel analysis. There are two methods for estimating subject-specific models: the maximum likelihood approach (random-effects models) and the conditional likelihood procedure. This paper focuses on marginal models for longitudinal data.
Population-averaged or marginal model
For longitudinal data, Zeger and Liang9 proposed the generalised estimating equation (GEE) marginal model. The GEE is an extension of generalised linear models that yields population-averaged estimates while accounting for the dependency between the repeated measurements.10 Specifically, the dependency or correlation between repeated measures is accounted for by a robust estimation of the variances of the regression coefficients. The GEE approach treats the time dependency as a nuisance, and a ‘working’ correlation matrix for the vector of repeated observations from each subject is specified to account for the dependency among the repeated observations. The form of the ‘working’ correlation is assumed to be the same for all subjects, reflecting average dependence among the repeated observations over subjects. Several different working correlation structures are possible, including independence, exchangeable, autoregressive and unstructured, to name a few.
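To make the named working correlation structures concrete, the following minimal sketch builds three of them for T repeated measurements. The function name and the single parameter `alpha` are assumptions made for illustration; the unstructured form is omitted because it needs a separate parameter for every pair of time points.

```python
from typing import List


def working_correlation(structure: str, T: int, alpha: float = 0.0) -> List[List[float]]:
    """Return a T x T working correlation matrix as nested lists.

    independence:   all off-diagonal entries are 0
    exchangeable:   all off-diagonal entries equal alpha
    autoregressive: entry (s, t) equals alpha ** |s - t|  (AR(1) pattern)
    """
    if structure == "independence":
        def off_diagonal(s: int, t: int) -> float:
            return 0.0
    elif structure == "exchangeable":
        def off_diagonal(s: int, t: int) -> float:
            return alpha
    elif structure == "autoregressive":
        def off_diagonal(s: int, t: int) -> float:
            return alpha ** abs(s - t)
    else:
        raise ValueError(f"unknown structure: {structure!r}")

    return [
        [1.0 if s == t else off_diagonal(s, t) for t in range(T)]
        for s in range(T)
    ]
```

For example, `working_correlation("autoregressive", 3, 0.5)` gives correlations that decay with the distance between measurement occasions, whereas the exchangeable form keeps them constant.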
A generalised method of moments (GMM) model for longitudinal data that provides reasonable estimates of the marginal regression coefficients and is more efficient than GEE is due to Qu et al.11 However, this model does not distinguish between time-dependent and time-independent covariates. The GMM model for longitudinal data with continuous outcomes has been extended to account for time-dependent covariates.12 That model estimates regression coefficients for time-dependent covariates by classifying them into three different types, which determine the group of valid moment conditions used in the estimation process.12 This model has in turn been expanded to allow for the modelling of binary longitudinal outcomes in the presence of time-dependent covariates.13 The expanded model lets researchers test the validity of moment conditions for time-dependent covariates individually, instead of assuming that a group of moment conditions is valid because of the type of time-dependent covariate. Although these marginal models provide reasonable estimates of the regression coefficients, they assume that the effects of time-dependent covariates are the same across time. However, a partitioned coefficient model allows for the estimation of current and future effects of time-dependent covariates on binary and continuous outcomes, as discussed in Irimata et al.14 For a thorough discussion of these models, see Marginal Models in the Analysis of Correlated Data with Time-dependent Covariates.8
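For readers unfamiliar with the term, a moment condition in this setting pairs the derivative of the marginal mean at one time with the residual at another. The display below is a sketch in notation assumed here for illustration ($y_{it}$ for the outcome, $\mu_{it}(\beta)$ for its marginal mean, $\beta_j$ for the coefficient of the jth covariate); it paraphrases the usual statement in the GMM literature and is not quoted from the cited papers.

```latex
% Assumed notation: y_{it} is the outcome of subject i at time t,
% \mu_{it}(\beta) its marginal mean, \beta_j the coefficient of covariate j.
E\left[ \frac{\partial \mu_{is}(\beta)}{\partial \beta_j}
        \,\bigl( y_{it} - \mu_{it}(\beta) \bigr) \right] = 0
```

As we understand the three-type classification, the condition is taken to hold for all pairs $(s, t)$ for one type of covariate, only for $s \ge t$ for a second type, and only for $s = t$ for a third. The pairs for which the condition holds are the valid moment conditions that enter the estimation.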
Feedback model with time-dependent covariates
When analysing longitudinal data with time-dependent covariates, there are usually three questions (Qs) of interest that researchers seek to answer15:
Q1. What is the cross-sectional relationship/association between the outcome and the covariate (both X and Y are measured at the same time)?
Q2. Is the outcome at time $t$, $Y_t$, affected by the time-dependent covariate measured at a previous time period $s < t$, $X_s$ (lagged covariates related to or associated with future values of the outcome)?
Q3. Does the outcome at time $s$ ($s < t$), $Y_s$, associate with the time-dependent covariate at time $t$, $X_t$ (feedback effect of the outcome on future values of the time-dependent covariate)?
A two-stage model allows researchers to answer all three questions simultaneously.8 This two-stage model accounts for feedback effects while modelling the direct impact, as well as the delayed effect, of the covariates on future responses. However, modelling feedback might not always make sense or be significant. For example, higher levels of social support might decrease the probability of service use, but service use might not affect levels of social support in the future.
Model
Stage 1 of the model allows for the fit of the cross-sectional and the lagged effects of time-dependent covariates on the outcome of interest (Q1 and Q2). Let each covariate be measured at times $1, 2, \ldots, T$, resulting in $X_{ij1}, \ldots, X_{ijT}$ for subject $i$ and covariate $j$. Thus, the model is

$$g(\mu_{it}) = \beta_0 + \beta_j^{tt} X_{ijt} + \sum_{k=1}^{t-1} \beta_j^{[k]} X_{ij,t-k}$$

with $\beta_j = (\beta_0, \beta_j^{tt}, \beta_j^{[1]}, \ldots, \beta_j^{[T-1]})'$, so

$$g(\mu_i) = \mathbf{X}_{ij}\beta_j,$$

where $\mu_{it}$ is the mean of the outcome for subject $i$ at time $t$, $g$ is the link function, the matrix $\mathbf{X}_{ij}$ consists of a column of ones concatenated with a lower diagonal matrix as the systematic component, and $\mu_{it}$ is dependent on the regression coefficients $\beta_j^{st}$ with $s \le t$, where $s$ and $t$ go from 1 to $T$ (written $\beta_j^{[k]}$ for lag $k = t - s$). The coefficient $\beta_j^{tt}$ denotes the effect of the covariate on the response when both are measured at the $t$th time period. However, when $s < t$, it does not necessarily follow that one should interpret the past, using two different time periods, in the same way as when $X$ and $Y$ are in the same time period, $s = t$. The impact of a covariate on the response from a previous time period is not intuitively the same as when they are measured in the same period. This is especially true in health research, where the time of a dose will have an impact on the reaction of the patient. Thus, current and future effects should not be combined but rather analysed separately. This is best explained by $\beta_j^{[1]}$ representing the effect of $X_{j,t-1}$ on $Y_t$, and by $\beta_j^{[2]}$ representing the effect of $X_{j,t-2}$ on $Y_t$ and so on. In general, one can consider the systematic component consisting of $P$ covariates and let $\beta_1, \ldots, \beta_P$ be the parameters associated with those covariates, with each having maximum length $T$. Thus, with $N$ denoting the number of rows (observations), $\mathbf{X}$ is of maximum dimension $N \times (PT + 1)$ and $\beta$ is a vector of maximum dimension $PT + 1$.
Stage 1 of the model, based only on the valid moment conditions, is fitted as

$$g(\mu_{it}) = \beta_0 + \sum_{j=1}^{P} \beta_j^{tt} X_{ijt} + \sum_{j=1}^{P} \sum_{k=1}^{t-1} \beta_j^{[k]} X_{ij,t-k},$$

where a lagged term is included only when the valid moment conditions exist. In this model, $\beta_j^{tt}$ denotes the regression parameter for the cross-sectional effects of time-dependent covariates on the outcome; these are cases where the moment conditions are always valid (the effect of the covariate in the same period as the response). The coefficient $\beta_j^{[k]}$ represents the lagged effects of the covariate on the response when the covariate is measured prior to the outcome ($s < t$) and the moment conditions are valid.
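The lower diagonal layout of the systematic component can be illustrated for a single subject and a single time-dependent covariate. The sketch below is an assumption-laden illustration, not the authors' code: the function names are hypothetical, the columns are ordered as intercept, same-period value, then lag 1, lag 2 and so on, and cells with no earlier measurement are filled with zero.

```python
from typing import List, Sequence


def partitioned_design(x: Sequence[float]) -> List[List[float]]:
    """Build one subject's design rows for a covariate measured at T times.

    Row t (0-based) is [1, x[t], x[t-1], ..., x[0], 0, ..., 0]:
    an intercept, the same-period value, then each available lag.
    """
    T = len(x)
    rows = []
    for t in range(T):                 # outcome time (0-based)
        row = [1.0]                    # column of ones
        for lag in range(T):
            s = t - lag                # covariate time paired with this lag
            row.append(float(x[s]) if s >= 0 else 0.0)
        rows.append(row)
    return rows


def linear_predictor(rows: List[List[float]], beta: Sequence[float]) -> List[float]:
    """Multiply each design row by the coefficient vector (intercept first)."""
    return [sum(value * b for value, b in zip(row, beta)) for row in rows]
```

For a covariate observed as `[2, 5, 7]`, `partitioned_design` returns rows `[1, 2, 0, 0]`, `[1, 5, 2, 0]` and `[1, 7, 5, 2]`, so each lag receives its own coefficient. In practice only the lag columns with valid moment conditions would be retained.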
Returning to the earlier example, one can determine whether quality of life, social support, community integration and symptom distress measured 6, 12 and 18 months earlier had a positive or negative association with current service use. These associations are given by the coefficients in stage 1 of the model.
In stage 2 of the model, the feedback from the outcome to future values of the covariates (Q3) is addressed. The $j$th covariate measured at time $s$, $X_{ijs}$, is fitted as

$$h(\mu^{X}_{ijs}) = \gamma_{j0} + \gamma_{j1} Y_{i,s-1} + \gamma_{j2} Y_{i,s-2} + \cdots + \gamma_{j,s-1} Y_{i1},$$

where $\mu^{X}_{ijs}$ is the mean of the covariate $X_{ijs}$ and $h$ is a link function. The coefficient $\gamma_{j1}$ represents the feedback of the outcome on the time-dependent covariate measured in the immediately following time period. The regression coefficient $\gamma_{j2}$ represents the feedback of the outcome on the covariate measured two time periods later, and so on. As an example, in the study referred to earlier, one may want to investigate whether service use increases levels of symptom distress in the future.
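The data arrangement behind stage 2 can be sketched as follows: each later covariate value is paired with the outcomes observed one, two or more periods earlier. This is a minimal illustration under assumed names (`feedback_rows`, `max_lag`), not the authors' implementation, and it keeps only time points with a complete outcome history up to `max_lag`.

```python
from typing import List, Sequence, Tuple


def feedback_rows(
    y: Sequence[float], x: Sequence[float], max_lag: int
) -> List[Tuple[float, List[float]]]:
    """Pair each covariate value x[s] with [1, y[s-1], ..., y[s-max_lag]].

    The first element of each pair is the stage 2 response (the covariate);
    the second is its regressor row: an intercept followed by lagged outcomes.
    """
    if len(y) != len(x):
        raise ValueError("y and x must cover the same time points")
    if max_lag < 1:
        raise ValueError("max_lag must be at least 1")

    rows = []
    for s in range(max_lag, len(x)):   # need max_lag earlier outcomes
        lagged_outcomes = [float(y[s - k]) for k in range(1, max_lag + 1)]
        rows.append((float(x[s]), [1.0] + lagged_outcomes))
    return rows
```

With outcomes `[0, 1, 1, 0]`, covariate values `[3, 4, 6, 5]` and `max_lag=2`, the third covariate value is paired with the outcomes from the second and first periods, and the fourth with those from the third and second.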
In both stages of the two-stage model, estimates are obtained using GMM after determining valid moment conditions. For modelling the feedback from the outcome to two or more time-dependent covariates, the estimates are obtained through the use of simultaneous GMM. Computing code to fit this model can be found online (https://github.com/ElsaVazquez29/Feedback-Code).