Machine learning methods in psychiatry: a brief introduction
  1. Zhirou Zhou1,
  2. Tsung-Chin Wu2,
  3. Bokai Wang1,
  4. Hongyue Wang1,
  5. Xin M Tu3,4 and
  6. Changyong Feng1
  1. 1 Department of Biostatistics and Computational Biology, University of Rochester, Rochester, New York, USA
  2. 2 Department of Mathematics, University of California San Diego, La Jolla, California, USA
  3. 3 Family Medicine and Public Health, University of California San Diego, La Jolla, California, USA
  4. 4 Naval Health Research Center, San Diego, California, USA
  1. Correspondence to Professor Changyong Feng; Changyong_Feng{at}


Machine learning (ML) techniques have been widely used to address mental health questions. We discuss two main aspects of ML in psychiatry in this paper, that is, supervised learning and unsupervised learning. Examples are used to illustrate how ML has been implemented in recent mental health research.

  • models
  • statistical
  • psychiatry

This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See:


The development of new technologies has significantly changed research and treatment methods in psychiatry. Technologies such as social media, smartphones and wearable devices enable psychiatric clinicians and researchers to collect a wide range of data on subjects/patients within a relatively short period of time, to monitor the mental status of clients or patients,1 and to offer more accurate and personalised treatments. While enjoying the convenience these technologies bring, we face the challenge of analysing the large data sets they generate and of making good predictions of outcomes for a new subject. Traditional statistical methods try to find a good fit to the data in order to interpret the association between the outcome and some potential features. Medical researchers and clinicians, by contrast, are often interested in predicting treatment choices (for example, the dosage of a drug) and treatment outcomes (eg, 5-year survival probability) given a comprehensive measurement of different features of a patient.

Machine learning (ML) combines advanced statistical methods with computer science techniques and is now widely used to analyse ‘big data’.2 The common types of ML techniques used in the psychiatric field include supervised learning (SL) and unsupervised learning (USL).3

SL is used for data with a labelled response variable. The purpose of SL is to develop a model in which the outcome is formulated as a function of the features (covariates), so that the model can predict the outcome in the future when only the features are given. For instance, suppose we are interested in classifying patients as having either major depressive disorder or no depression, based on the measurement of some patient factors. SL methods try to build a model relating the outcome (eg, depression or not) to a series of features, such as age, gender, education background, work type and so on, which are collected from different data sources. Commonly used SL algorithms include logistic regression (LR) and the support vector machine (SVM);4 LR was borrowed directly from traditional statistics and SVM was invented by computer scientists.5 We will discuss LR in detail in the next section.

USL is applied to data without a labelled outcome.6 The algorithms try to recognise similarities/dissimilarities between subjects through the input variables (features) without the aid of a labelled outcome. This is why it is called ‘unsupervised’. One of the most commonly used USL methods is k-means clustering, which partitions observations into k clusters by minimising the within-cluster variances. The lack of labelling makes USL more challenging, but it can also help to reveal the underlying data structure without a possible prior bias. We will discuss k-means clustering through a concrete example in a later section.

Supervised learning

LR is a widely used statistical method for modelling the conditional expectation of a binary outcome given the covariates. It has also been generalised to the case of polytomous outcomes.7 For this reason, it is a natural choice for a classification method in multivariate analysis. After estimating the parameters of the regression function from the training data, we predict the probability that a new subject belongs to each outcome category by substituting the subject's observed features into the regression function.
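The estimate-then-predict workflow described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the procedure of any cited study: the data are synthetic, the two features and the coefficients in `true_beta` are made up, and the fitting routine is plain gradient ascent on the log-likelihood rather than the Newton-type solver a statistical package would normally use.

```python
import math
import random

def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def fit_logistic(X, y, lr=0.5, epochs=800):
    """Estimate [intercept, slope_1, ...] by gradient ascent on the mean log-likelihood."""
    n, p = len(X), len(X[0])
    beta = [0.0] * (p + 1)
    for _ in range(epochs):
        grad = [0.0] * (p + 1)
        for xi, yi in zip(X, y):
            row = [1.0] + list(xi)  # leading 1 carries the intercept
            err = yi - sigmoid(sum(b * v for b, v in zip(beta, row)))
            for j, v in enumerate(row):
                grad[j] += err * v
        beta = [b + lr * g / n for b, g in zip(beta, grad)]
    return beta

def predict_proba(beta, x):
    """Predicted P(Y = 1) for one new subject with feature vector x."""
    return sigmoid(beta[0] + sum(b * v for b, v in zip(beta[1:], x)))

# Synthetic training data (assumption): one continuous and one binary feature.
rng = random.Random(0)
true_beta = [-1.0, 2.0, -1.5]  # made-up values used only to simulate outcomes
X = [[rng.gauss(0, 1), float(rng.random() < 0.5)] for _ in range(300)]
y = [1 if rng.random() < sigmoid(true_beta[0] + true_beta[1] * a + true_beta[2] * b) else 0
     for a, b in X]

beta_hat = fit_logistic(X, y)            # step 1: estimate parameters on training data
p_new = predict_proba(beta_hat, [0.5, 1.0])  # step 2: plug in a new subject's features
print(beta_hat, p_new)
```

The fitted coefficients should land near the simulated ones, up to sampling noise; the predicted probability for the new subject is then read directly off the fitted regression function.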

LR is one of the most popular SL tools in biomedical studies. Recently, Lee et al developed an LR model for adolescent suicide attempt prediction using sociodemographic characteristics, risk behaviours and psychological variables.8 This study is based on a sample of 247 222 subjects in the Korea Youth Risk Behavior Web-based Survey. The LR model was used to predict the risk of suicide with 13 different variables selected through univariate analysis screening.9

For simplicity, assume that, after variable selection, adolescent suicide attempt (a binary outcome variable $Y$, with 1 for ‘Yes’ and 0 for ‘No’) is strongly associated with age (a continuous variable $X_1$), gender (a binary variable $X_2$, with 1 for male and 0 for female), experience of violence (a binary variable $X_3$, with 1 for ‘Yes’ and 0 for ‘No’), feelings of sadness (a binary variable $X_4$, with 1 for ‘Yes’ and 0 for ‘No’) and current alcohol drinking (a binary variable $X_5$, with 1 for ‘Yes’ and 0 for ‘No’). The LR model assumes that the conditional distribution of $Y$ given the covariates is of the form

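$$P(Y = 1 \mid X_1, \ldots, X_5) = \frac{\exp(\eta)}{1 + \exp(\eta)},$$

where

$$\eta = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \beta_4 X_4 + \beta_5 X_5$$

is called the linear predictor, a linear combination of covariates.10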
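To make the link between the linear predictor and the predicted probability concrete, the sketch below evaluates a linear combination of the five covariates and passes it through the logistic function. Every number in it is hypothetical: the coefficients `beta0` and `betas` and the feature vector `x_new` are invented for illustration and are not the estimates of Lee et al or the contents of table 1.

```python
import math

def linear_predictor(beta0, betas, x):
    """eta = beta0 + beta1*x1 + ... + betap*xp."""
    return beta0 + sum(b * v for b, v in zip(betas, x))

def logistic(eta):
    """Map the linear predictor to a probability: exp(eta) / (1 + exp(eta))."""
    return 1.0 / (1.0 + math.exp(-eta))

# HYPOTHETICAL coefficients, ordered as (age, gender, violence, sadness, alcohol).
beta0 = -6.0
betas = [0.1, -0.4, 0.9, 1.2, 0.6]

# HYPOTHETICAL subject: age 16, male, no violence, feels sad, currently drinks.
x_new = [16, 1, 0, 1, 1]

eta = linear_predictor(beta0, betas, x_new)
prob = logistic(eta)
print(f"eta = {eta:.2f}, predicted probability = {prob:.4f}")
```

With these made-up inputs the linear predictor is -3.0 and the predicted probability is about 0.047. A unit increase in any covariate shifts the linear predictor by that covariate's coefficient, which is why coefficients are interpreted on the log-odds scale.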

Suppose we have a new subject with the features in table 1 and want to predict the probability of a suicide attempt.

Table 1

Features of a hypothesised subject

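Then, substituting these feature values and the estimated coefficients into the linear predictor to obtain $\hat{\eta}$, we have

$$\hat{P}(Y = 1 \mid X_1, \ldots, X_5) = \frac{\exp(\hat{\eta})}{1 + \exp(\hat{\eta})} = 0.0332.$$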

Therefore, the predicted probability of a suicide attempt for this person is 3.32%, which places the person in the high-risk group (>0.12%).8
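As a quick consistency check, the reported probability can be converted back to the scale of the linear predictor by inverting the logistic function. Only the 3.32% figure comes from the text above; the log-odds value is derived from it here and is not quoted from the cited study.

```latex
\hat{\eta} \;=\; \log\frac{\hat{p}}{1-\hat{p}}
          \;=\; \log\frac{0.0332}{0.9668}
          \;\approx\; -3.37
```

In other words, a predicted probability of 3.32% corresponds to odds of roughly 1 to 29.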

Unsupervised learning

K-means clustering is a statistical technique that has been used to recognise the patterns of different types or levels of severity of a specific illness, based on related variables with no outcome labels provided. Fuente-Tomas et al proposed an easy-to-use, cluster-based severity classification for bipolar disorder (BD) that may help clinicians in the processes of personalised medicine and shared decision-making.11 In this study, 224 subjects with a diagnosis of BD (Diagnostic and Statistical Manual of Mental Disorders, 4th Edition, Text Revision) under ambulatory treatment were classified into five different clusters based on 12 variables from five domains.

In the methods of that study, k-means clustering was used to reduce the dimension of four types of variables, including patients’ sociodemographic and BD characteristics, psychometric instruments and laboratory results. The criterion used by the k-means algorithm is the within-point scatter,

$$W(C) = \sum_{j=1}^{k} N_j \sum_{C(i) = j} \lVert x_i - \bar{x}_j \rVert^2,$$

where $\bar{x}_j$ is the mean vector of the jth cluster, $N_j = \sum_{i=1}^{n} I(C(i) = j)$ is the number of subjects in that cluster, and $C(i)$ refers to the cluster to which subject i is assigned. The number of clusters k can be chosen by the Elbow method,12 a heuristic that plots the criterion against k and looks for the point beyond which adding clusters yields little further reduction. K-means clustering aims to minimise the criterion by assigning n observations to k clusters in such a way that, within each cluster, the variance between the observations and the cluster mean is minimised.
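The assignment and update steps of k-means, and the elbow heuristic for choosing k, can be sketched as follows. This is an illustrative implementation of the standard Lloyd iteration on synthetic two-dimensional data, not the software or data used by Fuente-Tomas et al; the function names and the three simulated groups are assumptions. The `scatter` function weights each cluster's sum of squared distances by the cluster size, following the pairwise-distance form of within-point scatter; pass `weighted=False` for the plain within-cluster sum of squares, which is the quantity the Lloyd updates are guaranteed not to increase and the one usually plotted in an elbow curve.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((u - v) ** 2 for u, v in zip(a, b))

def mean_vector(pts):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(pts)
    return [sum(col) / n for col in zip(*pts)]

def kmeans(points, k, iters=100, seed=0):
    """Lloyd's algorithm: alternate nearest-centre assignment and mean update."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # initial centres: k observations drawn at random
    labels = None
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points]
        if new_labels == labels:  # assignments stable, so the centres are too
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, c in zip(points, labels) if c == j]
            if members:  # an empty cluster keeps its previous centre
                centers[j] = mean_vector(members)
    return labels, centers

def scatter(points, labels, centers, weighted=True):
    """Sum over clusters of squared distances to the centre, optionally times N_j."""
    total = 0.0
    for j, c in enumerate(centers):
        members = [p for p, lab in zip(points, labels) if lab == j]
        ss = sum(sq_dist(p, c) for p in members)
        total += (len(members) if weighted else 1) * ss
    return total

# Synthetic data (assumption): three well-separated groups of 30 points each.
rng = random.Random(1)
blobs = [(0, 0), (5, 5), (0, 5)]
points = [[cx + rng.gauss(0, 0.5), cy + rng.gauss(0, 0.5)]
          for cx, cy in blobs for _ in range(30)]

# Elbow heuristic: record the unweighted criterion for k = 1..5 and look for the bend.
wcss = {}
for k in range(1, 6):
    labels, centers = kmeans(points, k)
    wcss[k] = scatter(points, labels, centers, weighted=False)
    print(k, round(wcss[k], 1))
```

On data like these the criterion falls sharply up to the true number of groups and flattens afterwards. Because k-means can converge to a local optimum, in practice the algorithm is usually run from several random starts and the best solution kept.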

The variables were selected by testing the between-group differences using the χ² test or one-way analysis of variance. Along with other variables chosen by expert criteria, 12 variables were included in the global severity formula.11 These variables are listed in box 1.

Box 1

Variables in the global severity formula


(1) Clinical characteristics of the bipolar disorder (BD)

Number of hospitalisations (HospN)

Number of suicide attempts (SuicAttN)

Comorbid personality disorder (ComPD)

(2) Physical health

Body mass index (BMI)

Metabolic syndrome (MetS)

Number of comorbid physical illnesses (IllnessN)

(3) Cognition

Screen for Cognitive Impairment in Psychiatry score (SCIPTr4)

(4) Real-world functioning

Permanently disabled due to BD (PD ×BD)

Functioning Assessment Short Test Total Score (FASTT)

Functioning Assessment Short Test Leisure Time Subscale Score (FASTleisure)

(5) Health-related quality of life

SF-36 Physical Functioning Scale Score (SFPF)

SF-36 Mental Health Scale Score (SFMH).

Since each of the 12 selected variables takes values from 0 to 1 and all have equal weights, the sum of the variables needs to be multiplied by 10/12 so that the result represents severity on a scale from low (0) to high (10).

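$$\text{Global severity} = \frac{10}{12}\left(\text{HospN} + \text{SuicAttN} + \text{ComPD} + \text{BMI} + \text{MetS} + \text{IllnessN} + \text{SCIPTr4} + \text{PD×BD} + \text{FASTT} + \text{FASTleisure} + \text{SFPF} + \text{SFMH}\right)$$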

The severity clusters were defined by the 5th, 25th, 50th, 75th and 95th percentiles of the score calculated by this formula. Patients can be classified into different clusters by identifying which range (defined by the percentiles) their global severity score falls into.
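The scoring and percentile-based classification described above can be sketched as follows. This is an illustration under stated assumptions: the cohort scores are made up, the percentile function uses linear interpolation (one of several common conventions), and the band index returned by `severity_band` is only a position relative to the five cutpoints, since the mapping from these ranges to the five named clusters is not spelled out in the text.

```python
import bisect

def global_severity(values):
    """Equal-weight score on a 0-10 scale from 12 variables already scaled to [0, 1]."""
    if len(values) != 12:
        raise ValueError("expected exactly 12 variables")
    if any(v < 0 or v > 1 for v in values):
        raise ValueError("each variable must lie in [0, 1]")
    return 10.0 * sum(values) / 12.0

def percentile(sorted_scores, q):
    """q-th percentile of an ascending list, using linear interpolation."""
    pos = (len(sorted_scores) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(sorted_scores) - 1)
    return sorted_scores[lo] + (pos - lo) * (sorted_scores[hi] - sorted_scores[lo])

def severity_band(score, cutpoints):
    """Number of cutpoints that are less than or equal to the score (0 to len(cutpoints))."""
    return bisect.bisect_right(cutpoints, score)

# HYPOTHETICAL cohort scores used only to derive example cutpoints.
cohort = sorted(float(i) for i in range(11))  # 0.0, 1.0, ..., 10.0
cutpoints = [percentile(cohort, q) for q in (5, 25, 50, 75, 95)]

# HYPOTHETICAL patient whose 12 scaled variables all equal 0.5.
score = global_severity([0.5] * 12)
band = severity_band(score, cutpoints)
print(cutpoints, score, band)
```

In a real application the cutpoints would come from the reference sample's score distribution and then stay fixed, so that a new patient can be classified without re-running the clustering.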

Conclusion and discussion

In this paper we give a brief introduction to two types of ML methods, SL and USL, through LR and k-means clustering. Examples have been used to show how they can be applied in practice. As data structures become more and more complicated in mental health studies, we need advanced and flexible methods to analyse the data and to offer precise and personalised treatments for patients. We believe that ML, as a combination of statistical methods and computer science, will play an important role in psychiatry. We plan to introduce other developments in ML methods in future issues.


Zhirou Zhou obtained her BEc in Economic Statistics from Beijing University of Technology in 2018. She is currently a masters student in Statistics in the Department of Biostatistics and Computational Biology at the University of Rochester Medical Center. Her research interests include variable selection and causal inference.


  • Correction notice This article has been corrected since it was first published. The second equation under the section heading 'Unsupervised Learning' was missing an end parenthesis. This has since been updated.

  • Contributors ZZ and T-CW: collected the data and wrote the draft. BW and HW: reviewed and revised the draft. XMT: reviewed the article. CF: proposed the topic and reviewed the final draft.

  • Funding The authors have not declared a specific grant for this research from any funding agency in the public, commercial or not-for-profit sectors.

  • Competing interests None declared.

  • Patient consent for publication Not required.

  • Provenance and peer review Commissioned; internally peer reviewed.