# Machine learning methods in psychiatry: a brief introduction

* Zhirou Zhou
* Tsung-Chin Wu
* Bokai Wang
* Hongyue Wang
* Xin M Tu
* Changyong Feng

## Abstract

Machine learning (ML) techniques have been widely used to address mental health questions. In this paper we discuss two main types of ML used in psychiatry: supervised learning and unsupervised learning. Examples are used to illustrate how ML has been implemented in recent mental health research.

* models
* statistical
* psychiatry

## Introduction

The development of new technologies has significantly changed research and treatment methods in psychiatry. Advanced technologies such as social media, smartphones and wearable devices have enabled psychiatric clinicians and researchers to collect a wide range of data on subjects/patients within a relatively short period of time, to monitor the psychological status of clients or patients,1 and to offer more accurate and personalised treatments. While enjoying the convenience these technologies bring, we face the challenge of analysing the large data sets they generate and of making good predictions of outcomes for new subjects. Traditional statistical methods aim to find a good fit to the data in order to interpret the association between the outcome and potential features; in contrast, medical researchers and clinicians are often more interested in predicting treatment methods (for example, the dosage of a drug) and treatment outcomes (for example, 5-year survival probability) from a comprehensive set of measured features for a patient. ML combines advanced statistical methods with techniques from computer science, and is now widely used to analyse ‘big data’.2

The common types of ML techniques used in the psychiatric field are supervised learning (SL) and unsupervised learning (USL).3

SL is used for data with a labelled response variable. The purpose of SL is to develop a model in which the outcome is formulated as a function of the features (covariates), so that the model can predict the outcome for a new subject when only the features are given. For instance, suppose we are interested in classifying a patient as having either major depressive disorder or no depression, based on measurements of a set of patient factors. SL methods build a model linking the outcome (eg, depression or not) to a series of features, such as age, gender, education background, work type and so on, collected from different data sources. Commonly used SL algorithms include logistic regression (LR) and the support vector machine (SVM);4 LR was borrowed directly from traditional statistics, while the SVM was invented by computer scientists.5 We discuss LR in detail in the next section.

USL is applied to data without a labelled outcome.6 The algorithms try to recognise similarities/dissimilarities between subjects through the input variables (features) without the aid of a labelled outcome; this is why it is called ‘unsupervised’. One of the most commonly used USL methods is k-means clustering, which partitions observations into k clusters by minimising within-cluster variances. The lack of labels makes USL more challenging, but it can also reveal the underlying structure of the data without prior bias. We discuss k-means clustering through a concrete example in a later section, and a minimal code sketch contrasting the two approaches follows below.
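To make the SL/USL distinction concrete before turning to the detailed examples, the following is a minimal sketch in Python. The data, feature count and random seed are all hypothetical, and scikit-learn is assumed as the implementation; this is an illustration of the two paradigms, not the methods of the studies discussed later.

```python
# Minimal sketch of the SL/USL distinction on simulated data.
# All data below are hypothetical; scikit-learn is assumed.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))                         # 100 subjects, 4 features
y = (X[:, 0] + rng.normal(size=100) > 0).astype(int)  # labelled binary outcome

# Supervised: the labelled outcome y guides the fit.
clf = LogisticRegression().fit(X, y)
print(clf.predict_proba(X[:1]))     # predicted class probabilities for one subject

# Unsupervised: only the features are used; no labels.
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(km.labels_[:10])              # cluster assignments for the first 10 subjects
```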
## Supervised learning

LR is a widely used statistical method to model the conditional expectation of a binary outcome given the covariates. It has also been generalised to the case of polytomous outcomes.7 For this reason, it is a natural choice of classification method in multivariate analysis. After estimating the parameters of the regression function on the training data, we predict the probability that a new subject belongs to each outcome category by substituting the subject's observed features into the regression function.

LR is one of the most popular SL tools in biomedical studies. Recently, Lee *et al* developed an LR model for adolescent suicide attempt prediction using sociodemographic characteristics, risk behaviours and psychological variables.8 This study is based on a sample of 247 222 subjects in the Korea Youth Risk Behavior Web-based Survey. The LR model was used to predict the risk of a suicide attempt with 13 variables selected through univariate analysis screening.9

For simplicity, assume that after variable selection, adolescent suicide attempt (a binary outcome variable *Y*, with 1 for ‘Yes’ and 0 for ‘No’) is strongly associated with age (a continuous variable *X*1), gender (a binary variable *X*2, with 1 for male and 0 for female), experience of violence (a binary variable *X*3, with 1 for ‘Yes’ and 0 for ‘No’), feelings of sadness (a binary variable *X*4, with 1 for ‘Yes’ and 0 for ‘No’) and current alcohol drinking (a binary variable *X*5, with 1 for ‘Yes’ and 0 for ‘No’). The LR model assumes that the conditional distribution of *Y* given the covariates is of the form

$$P(Y = 1 \mid X_1, \ldots, X_5) = \frac{\exp(\eta)}{1 + \exp(\eta)},$$

where

$$\eta = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \beta_4 X_4 + \beta_5 X_5$$

is called the linear predictor, a linear combination of the covariates.10

Suppose we have a new subject with the features in [table 1](http://gpsych.bmj.com/content/33/1/e100171/T1), and we want to predict the probability of a suicide attempt.

[Table 1](http://gpsych.bmj.com/content/33/1/e100171/T1): Features of a hypothesised subject.

Then, substituting the estimated coefficients and this subject's feature values $x_1, \ldots, x_5$ into the linear predictor gives

$$\hat{\eta} = \hat{\beta}_0 + \hat{\beta}_1 x_1 + \hat{\beta}_2 x_2 + \hat{\beta}_3 x_3 + \hat{\beta}_4 x_4 + \hat{\beta}_5 x_5,$$

and the predicted probability is

$$\hat{P}(Y = 1 \mid X = x) = \frac{\exp(\hat{\eta})}{1 + \exp(\hat{\eta})} \approx 0.0332.$$

Therefore, the predicted probability of a suicide attempt for this subject is 3.32%, which places the subject in the high-risk group (>0.12%).8
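The prediction step can be sketched in a few lines of Python. The coefficient values and feature values below are hypothetical placeholders, not the estimates from Lee *et al* or the values in table 1; only the computation mirrors the formulas above.

```python
# Sketch of the LR prediction step. The coefficients and feature values
# are hypothetical placeholders, not the estimates from Lee et al.
import numpy as np

beta = np.array([-4.0, 0.05, 0.3, 0.8, 1.2, 0.4])  # beta_0, ..., beta_5 (hypothetical)
x = np.array([1.0, 15, 0, 1, 1, 0])                # 1 (intercept term), age, gender,
                                                   # violence, sadness, drinking

eta = beta @ x                        # linear predictor
p = np.exp(eta) / (1 + np.exp(eta))   # predicted probability that Y = 1
print(f"predicted risk: {p:.4f}")     # compare with a risk threshold to classify
```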
## Unsupervised learning

K-means clustering is a statistical technique that has been used to recognise patterns of different types or severity levels of a specific illness, based on related variables with no outcome labels provided. de la Fuente-Tomas *et al* proposed an easy-to-use, cluster-based severity classification for bipolar disorder (BD) that may help clinicians in the processes of personalised medicine and shared decision-making.11 In this study, 224 subjects with a diagnosis of BD (Diagnostic and Statistical Manual of Mental Disorders, 4th Edition, Text Revision) under ambulatory treatment were classified into five clusters based on 12 variables from five domains. In their methods, k-means clustering was applied to four types of variables: patients' sociodemographic and BD characteristics, psychometric instruments and laboratory results.

In the k-means algorithm, the criterion to be minimised is the within-point scatter,

$$W(C) = \sum_{j=1}^{k} \sum_{C(i) = j} \lVert x_i - \bar{x}_j \rVert^2,$$

where $\bar{x}_j$ is the mean vector of the *j*th cluster, $j = 1, \ldots, k$, and $C(i)$ denotes the cluster to which subject *i* is assigned. The number of clusters k can be chosen with the elbow method,12 a heuristic that helps assess the consistency of the cluster analysis.

K-means clustering minimises this criterion by assigning the n observations to k clusters in such a way that, within each cluster, the variance between the observations and the cluster mean is as small as possible.

The variables were selected by testing the between-group differences using the χ² test or one-way analysis of variance. Along with other variables chosen by expert criteria, 12 variables were included in the global severity formula;11 they are listed in box 1.

### Box 1 Variables in the global severity formula

**(1) Clinical characteristics of the bipolar disorder (BD)**
* Number of hospitalisations (*HospN*)
* Number of suicide attempts (*SuicAttN*)
* Comorbid personality disorder (*ComPD*)

**(2) Physical health**
* Body mass index (*BMI*)
* Metabolic syndrome (*MetS*)
* Number of comorbid physical illnesses (*IllnessN*)

**(3) Cognition**
* Screen for Cognitive Impairment in Psychiatry score (*SCIPTr4*)

**(4) Real-world functioning**
* Permanently disabled due to BD (*PD×BD*)
* Functioning Assessment Short Test total score (*FASTT*)
* Functioning Assessment Short Test leisure time subscale score (*FASTleisure*)

**(5) Health-related quality of life**
* SF-36 physical functioning scale score (*SFPF*)
* SF-36 mental health scale score (*SFMH*)

Since each of the 12 selected variables takes values from 0 to 1 and all have equal weight, the sum of the variables is multiplied by 10/12 so that the result represents severity from low (0) to high (10):

$$\text{Global severity score} = \frac{10}{12} \sum_{i=1}^{12} V_i,$$

where $V_i$ denotes the *i*th variable in box 1.

The severity clusters were defined by the 5th, 25th, 50th, 75th and 95th percentiles of the score calculated by this formula. Patients can be classified into different clusters by identifying which range (defined by these centiles) their global severity score falls into.
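A minimal Python sketch of this workflow follows. The data are simulated stand-ins for the 12 severity variables scaled to [0, 1] (entirely hypothetical), and scikit-learn is assumed for k-means; the elbow heuristic is read off the within-cluster scatter as k varies.

```python
# Sketch of k-means with the elbow heuristic; the data are simulated
# stand-ins for the 12 severity variables scaled to [0, 1] (hypothetical).
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(224, 12))    # 224 subjects x 12 scaled variables

# Elbow method: compute W(C) for a range of k and look for the bend.
for k in range(1, 9):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    print(k, round(km.inertia_, 2))      # inertia_ = within-cluster sum of squares

# With k chosen (the study used five clusters), assign each subject.
labels = KMeans(n_clusters=5, n_init=10, random_state=0).fit_predict(X)

# Global severity score: equally weighted sum of the 12 variables on a 0-10 scale.
severity = X.sum(axis=1) * 10 / 12
```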
## Conclusion and discussion

In this paper we give a brief introduction to two types of ML methods, SL and USL, through LR and k-means clustering. Examples have been used to show how they can be applied in practice. As data structures become more and more complicated in mental health studies, we need advanced and flexible methods to analyse the data and to offer precise and personalised treatments for patients. We believe that ML, as a combination of statistical methods and computer science, will play an important role in psychiatry. We plan to introduce other developments in ML methods in future issues.

## Footnotes

* Correction notice This article has been corrected since it was first published. The second equation under the section heading 'Unsupervised learning' was missing an end parenthesis. This has since been updated.
* Contributors ZZ and T-CW collected the data and wrote the draft. BW and HW reviewed and revised the draft. XMT reviewed the article. CF proposed the topic and reviewed the final draft.
* Funding The authors have not declared a specific grant for this research from any funding agency in the public, commercial or not-for-profit sectors.
* Competing interests None declared.
* Patient consent for publication Not required.
* Provenance and peer review Commissioned; internally peer reviewed.
* Received November 4, 2019.
* Accepted November 5, 2019.
* © Author(s) (or their employer(s)) 2020. Re-use permitted under CC BY-NC. No commercial re-use. See rights and permissions. Published by BMJ.

This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made are indicated, and the use is non-commercial. See: [http://creativecommons.org/licenses/by-nc/4.0/](http://creativecommons.org/licenses/by-nc/4.0/).

## References

1. Chen M, Mao S, Liu Y. Big data: a survey. Mobile Netw Appl 2014;19:171–209. doi:10.1007/s11036-013-0489-0
2. Jordan MI, Mitchell TM. Machine learning: trends, perspectives, and prospects. Science 2015;349:255–60. doi:10.1126/science.aaa8415
3. Cho G, Yim J, Choi Y, et al. Review of machine learning algorithms for diagnosing mental illness. Psychiatry Investig 2019;16:262–9. doi:10.30773/pi.2018.12.21.2
4. Bzdok D, Krzywinski M, Altman N. Machine learning: supervised methods. Nat Methods 2018;15:5–6. doi:10.1038/nmeth.4551
5. Cortes C, Vapnik V. Support-vector networks. Mach Learn 1995;20:273–97. doi:10.1007/BF00994018
6. Miotto R, Li L, Kidd BA, et al. Deep patient: an unsupervised representation to predict the future of patients from the electronic health records. Sci Rep 2016;6:26094. doi:10.1038/srep26094
7. Agresti A. An introduction to categorical data analysis. 3rd edn. New York: Wiley, 2018.
8. Lee J, Jang H, Kim J, et al. Development of a suicide index model in general adolescents using the South Korea 2012–2016 national representative survey data. Sci Rep 2019;9:1846. doi:10.1038/s41598-019-38886-z
9. Wang H, Peng J, Wang B, et al. Inconsistency between univariate and multiple logistic regressions. Shanghai Arch Psychiatry 2017;29:124–8. doi:10.11919/j.issn.1002-0829.217031
10. McCullagh P, Nelder JA. Generalized linear models. 2nd edn. Chapman and Hall/CRC, 1989.
11. de la Fuente-Tomas L, Arranz B, Safont G, et al. Classification of patients with bipolar disorder using k-means clustering. PLoS One 2019;14:e0210314. doi:10.1371/journal.pone.0210314
12. Thorndike RL. Who belongs in the family? Psychometrika 1953;18:267–76. doi:10.1007/BF02289263
Zhirou Zhou obtained her BEc in Economic Statistics from Beijing University of Technology in 2018. She is currently a master's student in Statistics in the Department of Biostatistics and Computational Biology at the University of Rochester Medical Center. Her research interests include variable selection and causal inference.