
## Abstract

In studies on psychiatry and neurodegenerative diseases, it is common to have data that are correlated because of the hierarchical structure of data collection or because subjects are measured repeatedly over time. However, the feedback effects that arise with time-dependent covariates in these studies are often overlooked and seldom modelled. This article reviews the methodological development of marginal models that account for feedback effects in longitudinal data and discusses their implementation.

- biostatistics
- longitudinal studies
- models, statistical
- statistics as topic

This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/.


## Introduction

In the study of psychiatry and neurodegenerative diseases, it is common to have correlated observations. Observations are correlated when the mechanism that gives rise to one observation is not necessarily different from the one that gives rise to another, so the observations cannot be treated as independent. Observations may be correlated because of the hierarchical structure by which the data are obtained or because they are measured repeatedly.

### Longitudinal models for psychiatry research

Longitudinal studies are a crucial component of psychiatric research.1 Some of the most important research questions in psychiatry and mental health investigate how symptoms and behaviour change over time and which time-dependent factors influence the development of pathological and normal behaviours.2 Longitudinal studies are used to investigate mental health disorders, such as depression, schizophrenia, psychosis, bipolar disorder and post-traumatic stress disorder, among others.1 3

Longitudinal studies have been used to determine service use among patients with mental health disorders. Hospitalisation for psychiatric reasons and receipt of psychiatric crisis services are two outcomes of interest for measuring service use. These outcomes matter for efforts to reduce service use and are particularly central for consumer-run organisations, which are usually government-funded.4 Other longitudinal studies are used to understand the effectiveness of treatments and programmes in improving the quality of life of patients with mental health disorders.5 6

As an example, in a longitudinal study used to investigate the success of consumer-run organisations in promoting the mental health of their members, researchers gathered demographic data and obtained information about social support, community integration, personal empowerment, quality of life, symptom distress and service use.6 Higher levels of social support, community integration and quality of life might decrease the probability of being hospitalised for psychiatric reasons or receiving psychiatric crisis services over time. Increased levels of symptom distress might increase the likelihood of using such services. Further, using such services might, in turn, increase levels of symptom distress in the future, resulting in feedback from the service use outcomes to levels of symptom distress.7

The marginal models presented in this paper can be used to analyse such longitudinal studies, especially when the research questions in psychiatry involve feedback. These models account for the different sources of correlation encountered in longitudinal data.

### Types of correlation in longitudinal data

There are different types of correlation in longitudinal data. When analysing longitudinal binary data, it is essential to account for both the correlation inherent in the repeated measures of the responses and the correlation arising from feedback between the responses at a particular time and the covariates at other times. Ignoring either of these correlations can lead to invalid conclusions. This happens, for example, when a standard logistic regression model is fitted to data with time-dependent covariates.

This paper discusses three types of correlation: responses with responses, covariates with current and future responses, and responses with future covariates. Modelling these relationships is the aim of this article and of the book *Marginal Models in the Analysis of Correlated Data with Time-dependent Covariates*.8 Figure 1 shows the different types of correlation.8

There are correlations among the responses, denoted by $Y_t$, as time $t$ goes from 1 to $T$.

There are correlations between the covariate and the outcome when covariates at time $s$, $X_s$, affect the outcomes at time $t \ge s$, $Y_t$. If $s = t$, one refers to these correlations as the direct or cross-sectional effects of the covariate on the outcome. If $s < t$, these correlations are called the lagged effects of the covariate on the outcome.

There are correlations between response and covariate when the outcome at time $t$ affects the covariate at a later time $s > t$. These correlations are often referred to as feedback effects from $Y_t$ to the future $X_s$.
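To make the three types of correlation concrete, the Python sketch below simulates a hypothetical panel in which a subject-level random effect correlates the responses over time, the covariate affects the response in the same period and one period later, and the previous response feeds back into the next covariate value. The function name, the coefficient values and the normal random intercept are illustrative assumptions, not taken from the studies cited.

```python
import numpy as np


def simulate_feedback_panel(n, T, beta_cross=0.8, beta_lag=0.5, gamma_fb=1.0, seed=0):
    """Hypothetical data-generating process showing all three correlation types.

    Returns x (n, T): continuous time-dependent covariate,
            y (n, T): binary response.
    """
    rng = np.random.default_rng(seed)
    b = rng.normal(0.0, 1.0, size=n)  # subject effect -> response-response correlation
    x = np.zeros((n, T))
    y = np.zeros((n, T), dtype=int)
    for t in range(T):
        # Feedback: the previous response shifts the current covariate value
        prev_y = y[:, t - 1] if t > 0 else 0
        x[:, t] = gamma_fb * prev_y + rng.normal(size=n)
        # Cross-sectional (x_t -> y_t) and lagged (x_{t-1} -> y_t) effects
        prev_x = x[:, t - 1] if t > 0 else 0.0
        eta = -0.5 + beta_cross * x[:, t] + beta_lag * prev_x + b
        y[:, t] = rng.binomial(1, 1.0 / (1.0 + np.exp(-eta)))
    return x, y


if __name__ == "__main__":
    x, y = simulate_feedback_panel(n=5000, T=3)
    corr = lambda a, c: np.corrcoef(a, c)[0, 1]
    print("Y1 with Y2 (responses):", corr(y[:, 0], y[:, 1]))
    print("X1 with Y2 (lagged effect):", corr(x[:, 0], y[:, 1]))
    print("Y1 with X2 (feedback):", corr(y[:, 0], x[:, 1]))
```

Setting `gamma_fb=0.0` removes the feedback path, so the correlation between $Y_1$ and $X_2$ should drop to about zero while the other two correlations remain.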

This paper provides an overview of modelling repeated responses with time-dependent and time-independent covariates and feedback effects. With this review, we offer some guidance on analysing data that are correlated because of repeated measurements.

### Correlated models for longitudinal data

There are two common approaches to analysing longitudinal data: population-averaged models (also known as marginal models) and subject-specific models. Population-averaged models focus on understanding what affects the mean outcome of the population, while subject-specific models concentrate on determining what affects the mean outcome of subpopulations.9 The basic idea of population-averaged models is that, instead of modelling the within-subject covariance structure, one treats it as a nuisance and focuses on the marginal mean. In this framework, the covariance structure does not need to be specified correctly to obtain reasonable estimates of the regression coefficients and their SEs. In contrast, the subject-specific model distinguishes between observations belonging to the same or different subpopulations. Random effects are commonly used to specify subject-specific models; for repeated binary responses, hierarchical logistic regression models are used for multilevel analysis. Two methods are used for estimating subject-specific models: the maximum likelihood approach (random-effects models) and the conditional likelihood procedure. This paper focuses on marginal models for longitudinal data.
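The two approaches estimate different quantities. For a logistic model with a normally distributed random intercept of variance $\sigma^2$, a well-known approximation relates a subject-specific coefficient $\beta_{SS}$ to its population-averaged counterpart as $\beta_{M} \approx \beta_{SS}/\sqrt{1 + c^2\sigma^2}$ with $c = 16\sqrt{3}/(15\pi)$. The short Python sketch below evaluates this approximation as background only; the function name is hypothetical and the approximation is not part of the methods reviewed here.

```python
import math

# Constant from the probit approximation to the logistic distribution
C = 16 * math.sqrt(3) / (15 * math.pi)


def marginal_from_subject_specific(beta_ss, sigma2):
    """Approximate population-averaged logit coefficient from a subject-specific one.

    Assumes a logistic random-intercept model with intercept variance sigma2.
    """
    return beta_ss / math.sqrt(1.0 + C**2 * sigma2)


print(marginal_from_subject_specific(1.0, 2.0))
```

A subject-specific log odds ratio of 1.0 with $\sigma^2 = 2$ corresponds to a population-averaged log odds ratio of roughly 0.77, which illustrates why marginal coefficients are typically smaller in magnitude.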

### Population-averaged or marginal model

For longitudinal data, Zeger and Liang9 proposed the generalised estimating equation (GEE) marginal model. GEE extends generalised linear models to produce population-averaged estimates while accounting for the dependency between repeated measurements.10 Specifically, the correlation between repeated measures is accounted for through robust estimation of the variances of the regression coefficients. The GEE approach treats the time dependency as a nuisance and specifies a ‘working’ correlation matrix for the vector of repeated observations from each subject to account for the dependency among them. The form of the ‘working correlation’ is assumed to be the same for all subjects, reflecting the average dependence among the repeated observations across subjects. Several working correlation structures are possible, including independence, exchangeable, autoregressive and unstructured, to name a few.
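As a minimal illustration of how GEE separates the mean model from the variance estimate, the Python sketch below fits a logistic marginal model under an independence working correlation and then computes the robust (sandwich) covariance from cluster-level score contributions. The function name is hypothetical, only the independence structure is implemented, and in practice one would use an established GEE implementation.

```python
import numpy as np


def gee_logistic_independence(X, y, ids, n_iter=50, tol=1e-10):
    """Logistic GEE with an independence working correlation and sandwich SEs.

    X   : (N, p) design matrix with an intercept column, one row per observation
    y   : (N,) binary outcomes
    ids : (N,) subject identifiers; rows sharing an id form one cluster
    """
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        mu = 1.0 / (1.0 + np.exp(-X @ beta))
        score = X.T @ (y - mu)
        info = X.T @ (X * (mu * (1.0 - mu))[:, None])  # model-based information
        step = np.linalg.solve(info, score)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break

    mu = 1.0 / (1.0 + np.exp(-X @ beta))
    info = X.T @ (X * (mu * (1.0 - mu))[:, None])  # "bread"
    meat = np.zeros_like(info)
    for cid in np.unique(ids):
        rows = ids == cid
        u = X[rows].T @ (y[rows] - mu[rows])       # cluster-level score
        meat += np.outer(u, u)                      # "meat" captures within-subject correlation
    bread_inv = np.linalg.inv(info)
    robust_cov = bread_inv @ meat @ bread_inv
    return beta, np.sqrt(np.diag(robust_cov))
```

With the independence structure, the point estimates coincide with ordinary logistic regression; only the SEs change to reflect the within-subject correlation.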

Qu *et al*11 proposed a generalised method of moments (GMM) model for longitudinal data that provides reasonable estimates of the marginal regression coefficients and is more efficient than GEE. However, this model does not distinguish between time-dependent and time-independent covariates. The GMM model for longitudinal data with continuous outcomes was later extended to account for time-dependent covariates.12 That model estimates regression coefficients for time-dependent covariates by classifying the covariates into three types, and the type determines which group of moment conditions is treated as valid in the estimation process.12 This approach was then expanded to binary longitudinal outcomes in the presence of time-dependent covariates.13 It lets researchers test the validity of moment conditions for each time-dependent covariate individually, instead of assuming that a group of moment conditions is valid because of the covariate's type. Although these marginal models provide reasonable estimates of the regression coefficients, they assume that the effects of time-dependent covariates are the same across time. By contrast, a partitioned coefficient model allows for the estimation of current and future effects of time-dependent covariates on binary and continuous outcomes, as discussed in Irimata *et al*.14 For a thorough discussion of these models, see *Marginal Models in the Analysis of Correlated Data with Time-dependent Covariates*.8
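The idea of checking moment conditions one at a time can be sketched as follows. This Python function is a simplified, hypothetical screening rule, not the exact test of Lai and Small12 or Lalonde *et al*13: it correlates each covariate value $x_s$ with the Pearson residual at time $t$ from a working model and keeps the condition $E[x_s(y_t - \mu_t)] = 0$ when a Fisher $z$ test does not reject zero correlation. Same-period conditions ($s = t$) are treated as always valid.

```python
import numpy as np


def screen_moment_conditions(x, y, mu, z_crit=1.96):
    """Flag (s, t) pairs whose moment condition E[x_s (y_t - mu_t)] = 0 looks valid.

    x, y, mu : (n, T) arrays of covariate values, binary outcomes and fitted
               probabilities from a working (e.g. independence) model.
    Returns a (T, T) boolean matrix; entry [s, t] is True when the condition is kept.
    """
    n, T = x.shape
    resid = (y - mu) / np.sqrt(mu * (1.0 - mu))  # Pearson residuals
    valid = np.zeros((T, T), dtype=bool)
    for s in range(T):
        for t in range(T):
            if s == t:
                valid[s, t] = True  # same-period conditions treated as always valid
                continue
            r = np.corrcoef(x[:, s], resid[:, t])[0, 1]
            z = np.arctanh(r) * np.sqrt(n - 3)  # Fisher z statistic for H0: rho = 0
            valid[s, t] = abs(z) < z_crit
    return valid
```

Each pair is tested at a nominal level with no multiplicity adjustment, so with many time points some valid conditions will be dropped by chance; that trade-off is a design choice of this sketch.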

### Feedback model with time-dependent covariates

When analysing longitudinal data with time-dependent covariates, there are usually three questions (Qs) of interest that researchers seek to answer15:

Q1. What is the cross-sectional relationship/association between the outcome and the covariate (both X and Y are measured at the same time)?

Q2. Is the outcome at time $t$, $Y_t$, affected by the time-dependent covariate measured at a previous time period $s < t$, $X_s$ (lagged covariates related to or associated with future values of the outcome)?

Q3. Is the outcome at time $s < t$, $Y_s$, associated with the time-dependent covariate at time $t$, $X_t$ (feedback effect of the outcome on future values of the time-dependent covariate)?

A two-stage model allows researchers to answer all three questions simultaneously.8 This two-stage model accounts for feedback effects while modelling the direct impact, as well as the delayed effect, of the covariates on future responses. However, modelling feedback might not always make sense or be significant. For example, higher levels of social support might decrease the probability of service use, but service use might not affect levels of social support in the future.

### Model

Stage 1 of the model fits the cross-sectional and the lagged effects of time-dependent covariates on the outcome of interest (Q1 and Q2). Let each covariate be measured at times $t = 1, \dots, T$, resulting in $\mathbf{x}_{ij} = (x_{ij1}, \dots, x_{ijT})'$ for subject $i$ and covariate $j$. Thus, the model is

$$
g(\boldsymbol{\mu}_i) = \mathbf{X}_{ij}\boldsymbol{\beta}_j,
$$

with $\boldsymbol{\mu}_i = (\mu_{i1}, \dots, \mu_{iT})'$, where $\mu_{it}$ is the mean response of subject $i$ at time $t$, so that

$$
g(\mu_{it}) = \beta_0 + \beta_j^{0}x_{ijt} + \beta_j^{1}x_{ij,t-1} + \cdots + \beta_j^{t-1}x_{ij1},
$$

where the $\mathbf{X}_{ij}$ matrix consists of a column of ones concatenated with a lower triangular matrix of covariate values as the systematic component, and $\mu_{it}$ depends on the regression coefficients $\beta_j^{k}$ with $k = t - s$, where $s$ and $t$ go from 1 to $T$ and $s \le t$. The coefficient $\beta_j^{0}$ denotes the effect of the covariate $x_{ijt}$ on the response $y_{it}$ when both are measured at the $t$th time period. However, when $s < t$, it does not necessarily follow that an effect spanning two different time periods should be interpreted in the same way as when $x_{ijs}$ and $y_{it}$ are in the same time period, $s = t$. The impact of a covariate on the response from a previous time period is not intuitively the same as when both are measured in the same period. This is especially true in health research, where the timing of a dose affects the patient's reaction. Thus, current and future effects should not be combined but rather analysed separately. This is done by letting $\beta_j^{0}$ represent the effect of $x_{ijt}$ on $y_{it}$, $\beta_j^{1}$ represent the effect of $x_{ij,t-1}$ on $y_{it}$, and so on. In general, one can consider a systematic component consisting of $P$ covariates and let $\boldsymbol{\beta}_1, \dots, \boldsymbol{\beta}_P$ be the parameters associated with those covariates, with each $\boldsymbol{\beta}_j$ having maximum length $T$. Thus, $\mathbf{X}$ is of maximum dimension $T \times (PT + 1)$ and $\boldsymbol{\beta}$ is a vector of maximum dimension $PT + 1$.
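The partitioned design for one subject and one covariate can be built mechanically. The Python sketch below assumes the column ordering intercept, lag 0, lag 1 and so on, with zeros where no earlier measurement exists; the function name is hypothetical.

```python
import numpy as np


def partitioned_design(x_ij):
    """Build the T x (T + 1) design matrix for one subject and one covariate.

    Column 0 is the intercept; column k + 1 holds the covariate lagged by k periods,
    so row t is [1, x_t, x_{t-1}, ..., x_1, 0, ..., 0].
    """
    x_ij = np.asarray(x_ij, dtype=float)
    T = x_ij.shape[0]
    X = np.zeros((T, T + 1))
    X[:, 0] = 1.0
    for t in range(T):          # 0-based response period
        for k in range(t + 1):  # lag k uses the covariate measured at period t - k
            X[t, k + 1] = x_ij[t - k]
    return X


print(partitioned_design([2.0, 3.0, 5.0]))
```

For $\mathbf{x}_{ij} = (2, 3, 5)'$ the rows are $(1, 2, 0, 0)$, $(1, 3, 2, 0)$ and $(1, 5, 3, 2)$, so multiplying by $(\beta_0, \beta_j^{0}, \beta_j^{1}, \beta_j^{2})'$ gives each period's linear predictor.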

Stage 1 of the model, based only on the valid moment conditions, is fitted as

$$
g(\mu_{it}) = \beta_0 + \sum_{j=1}^{P}\beta_j^{0}x_{ijt} + \sum_{j=1}^{P}\sum_{k=1}^{t-1}\beta_j^{k}x_{ij,t-k},
$$

where a lagged term $\beta_j^{k}x_{ij,t-k}$ ($k = t - s$) is included only when the corresponding valid moment conditions exist. In this model, $\beta_j^{0}$ denotes the regression parameter for the cross-sectional effect of the $j$th time-dependent covariate on the outcome; these are cases where the moment conditions are always valid (the covariate and the response are measured in the same period). The coefficient $\beta_j^{k}$, $k \ge 1$, represents the lagged effect of the covariate on the response when the covariate is measured before the outcome ($s < t$) and the moment conditions are valid.

Returning to the earlier example, one can determine whether quality of life, social support, community integration and symptom distress measured 6, 12 and 18 months earlier had a positive or negative association with current service use. These associations are given by the lagged coefficients in stage 1 of the model.
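A compact way to see how stage 1 uses only valid moment conditions is a generic two-step GMM fit. The Python sketch below is not the authors' code at the GitHub link: it handles a single time-dependent covariate with a fixed maximum lag, takes a user-supplied boolean matrix of valid $(s, t)$ pairs, adds one intercept moment per period, and minimises the GMM criterion by Gauss-Newton, first with identity weights and then with the inverse of the estimated moment covariance. The function names are hypothetical.

```python
import numpy as np


def _expit(z):
    return 1.0 / (1.0 + np.exp(-z))


def stage1_gmm(x, y, valid, max_lag=1, n_iter=100, tol=1e-10):
    """Two-step GMM sketch for logit(mu_t) = b0 + sum_{k=0}^{max_lag} beta^k * x_{t-k}.

    x, y  : (n, T) arrays for one time-dependent covariate and a binary outcome
    valid : (T, T) boolean array; valid[s, t] keeps E[x_s * (y_t - mu_t)] = 0
    Returns (b0, beta^0, ..., beta^max_lag).
    """
    n, T = x.shape
    p = max_lag + 2
    # D[i, t] = [1, x_t, x_{t-1}, ..., x_{t-max_lag}], zero where the lag precedes period 1
    D = np.zeros((n, T, p))
    D[:, :, 0] = 1.0
    for k in range(max_lag + 1):
        D[:, k:, k + 1] = x[:, : T - k]

    # One intercept moment per period plus one moment per valid (s, t) pair
    cols, times = [], []
    for t in range(T):
        cols.append(np.ones(n))
        times.append(t)
        for s in range(T):
            if valid[s, t]:
                cols.append(x[:, s])
                times.append(t)
    Z = np.stack(cols, axis=1)  # (n, m) instruments
    tt = np.array(times)        # response period of each moment

    def moments(theta):
        mu = _expit(D @ theta)                # (n, T)
        g = Z * (y - mu)[:, tt]               # (n, m) moment contributions
        w = (mu * (1.0 - mu))[:, tt]          # (n, m)
        # Jacobian of the mean moments with respect to theta, shape (m, p)
        G = -np.einsum("im,im,imp->mp", Z, w, D[:, tt, :]) / n
        return g, G

    def gauss_newton(W, theta):
        for _ in range(n_iter):
            g, G = moments(theta)
            step = np.linalg.solve(G.T @ W @ G, G.T @ W @ g.mean(axis=0))
            theta = theta - step
            if np.max(np.abs(step)) < tol:
                break
        return theta

    theta = gauss_newton(np.eye(Z.shape[1]), np.zeros(p))  # step 1: identity weight
    g, _ = moments(theta)
    S = g.T @ g / n                                         # estimated moment covariance
    return gauss_newton(np.linalg.inv(S), theta)            # step 2: efficient weight
```

Passing `valid = np.triu(np.ones((T, T), dtype=bool))` keeps only conditions with $s \le t$, a conservative choice that drops every condition pairing an outcome with a later covariate value, which is where feedback would invalidate the moments.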

In stage 2 of the model, the feedback from the outcome to future values of the covariates (Q3) is addressed. The $j$th covariate measured at time $s$, $x_{ijs}$, is fitted as

$$
g_j(\mu_{x_{ijs}}) = \gamma_{j0} + \gamma_j^{1}y_{i,s-1} + \gamma_j^{2}y_{i,s-2} + \cdots + \gamma_j^{s-1}y_{i1},
$$

where $\mu_{x_{ijs}}$ is the mean of the covariate $x_{ijs}$ and $g_j$ is a link function appropriate for covariate $j$. The coefficient $\gamma_j^{1}$ represents the feedback of the outcome on the time-dependent covariate measured in the immediately following time period. The coefficient $\gamma_j^{2}$ represents the feedback of the outcome on the covariate measured two time periods later, and so on. For instance, in the study referred to earlier, one may want to investigate whether service use increases levels of symptom distress in the future.
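For a continuous covariate, a stage 2 fit with an identity link reduces to a linear GMM problem with a closed-form solution at each step. The Python sketch below assumes feedback of lag one only ($\gamma_j^{1}$), uses $(1, y_{i,s-1})$ as instruments in each period and pools the periods into one over-identified system; the function name and these choices are illustrative assumptions rather than the authors' specification.

```python
import numpy as np


def stage2_gmm_linear(x_j, y):
    """Two-step linear GMM for the first-order feedback model x_s = g0 + g1 * y_{s-1} + error.

    x_j, y : (n, T) arrays for one continuous covariate and the binary outcome.
    Moments for each period s >= 2: E[(1, y_{s-1}) * (x_s - g0 - g1 * y_{s-1})] = 0.
    Returns the estimate (g0, g1).
    """
    n, T = y.shape
    b_parts, A_parts = [], []
    for s in range(1, T):
        prev = y[:, s - 1]
        for z in (np.ones(n), prev):
            b_parts.append(z * x_j[:, s])                   # (n,)
            A_parts.append(np.stack([z, z * prev], axis=1))  # (n, 2)
    b = np.stack(b_parts, axis=1)  # (n, m)
    A = np.stack(A_parts, axis=1)  # (n, m, 2); moments are b - A @ gamma
    b_bar, A_bar = b.mean(axis=0), A.mean(axis=0)

    def solve(W):
        # Minimiser of (b_bar - A_bar g)' W (b_bar - A_bar g)
        return np.linalg.solve(A_bar.T @ W @ A_bar, A_bar.T @ W @ b_bar)

    gamma = solve(np.eye(b.shape[1]))   # step 1: identity weight
    g = b - A @ gamma                   # (n, m) moment contributions
    S = g.T @ g / n
    return solve(np.linalg.inv(S))      # step 2: efficient weight
```

Adding further lags $y_{i,s-2}, \dots$ would extend both the instrument set and the parameter vector in the same pattern; with several covariates, stacking their moments into one system corresponds to the simultaneous GMM mentioned below.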

In both stages of the two-stage model, estimates are obtained using GMM after determining valid moment conditions. For modelling the feedback from the outcome to two or more time-dependent covariates, the estimates are obtained through the use of simultaneous GMM. Computing code to fit this model can be found online (https://github.com/ElsaVazquez29/Feedback-Code).

## Conclusions

In psychiatry, the correlation inherent in repeated measures is further affected by the presence of time-dependent covariates, and ignoring it can seriously distort the interpretations a psychiatrist makes. In particular, the changes and feedback that arise when covariates are time-dependent cannot be ignored, yet feedback effects often go unchecked. Any modelling of longitudinal data should address the feedback, as well as the immediate and the delayed effects of covariates on the responses. For feedback, a two-stage model for correlated data with time-dependent covariates has the advantage that GMM methods can be used to identify valid moment conditions.

While there is merit in the models due to Lai and Small,12 Zhou *et al*,16 Lalonde *et al*13 and Irimata *et al*,14 they do not always account for feedback. The two-stage GMM model allows one to account for feedback effects across different time periods. It partitions the regression coefficients and allows one to identify directional and delayed effects.

We have developed a new approach to marginal regression analysis for time-dependent covariates with feedback. We use GMM to make optimal use of the estimating equations made available by the covariates in both the direct part of the model and the feedback part. We have focused on marginal regression analysis, but the approach is also useful for obtaining more efficient estimates. This model conditions on part of the past history of covariates and outcomes. The resulting partly conditional model is intermediate between the marginal model, which conditions only on the covariates at time *t*, and the transition model, which conditions on the full history of covariates and outcomes at time *t*.

In summary, when modelling dependent data with time-dependent and time-independent covariates and feedback effects, the resulting correlation cannot be ignored. On the basis of this review of methodological developments, we recommend the marginal models discussed extensively in *Marginal Models in the Analysis of Correlated Data with Time-dependent Covariates*.8

## References

Dr. Elsa Vazquez-Arreola is a biostatistician at the National Institute of Diabetes and Digestive and Kidney Diseases, USA. She obtained a Ph.D. degree in Statistics from Arizona State University, USA. She has consulted on numerous clinical and research studies and is currently a co-author of the book entitled “Marginal models in the analysis of correlated data with time-dependent covariates” with Dr. Jeffrey R. Wilson and Dr. Ding-Geng Chen. Her main research interests include models for correlated data, time-dependent covariates, generalized linear mixed models and propensity score models with applications in public health and behavioural health.

## Footnotes

Contributors EV and JRW conceived of the presented idea and wrote the manuscript with support from D-GC.

Funding The authors have not declared a specific grant for this research from any funding agency in the public, commercial or not-for-profit sectors.

Competing interests None declared.

Patient consent for publication Not required.

Provenance and peer review Commissioned; externally peer reviewed.

Data availability statement No additional data are available.
