Introduction
Happy people tend to live longer and have better physical and mental health. Adolescence is a critical period, since some findings suggest that positive youth development can improve long-term health.1 Furthermore, adolescent depression is a strong predictor of mental disorders during adulthood.2 For example, many investigators have reported that undergraduate students suffer from depression and are vulnerable to suicide attempts and completed suicide.3–8 In a meta-analysis, Ibrahim et al likewise concluded that undergraduate students are prone to depression, with a high prevalence.9 Therefore, robust identification of unhappy students is critical for developing and applying specific interventions to at-risk individuals. To date, traditional approaches have relied on a single self-report scale, such as the Centre for Epidemiologic Studies Depression Scale (CES-D), the Satisfaction with Life Scale (SWL) or the Positive and Negative Affect Schedule (PANAS), which is not reliable because subjective well-being (SWB) is multifaceted.10 Indeed, SWB comprises several dimensions, including life satisfaction, positive emotion and negative emotion.11 For these reasons, identifying unhappy students requires multivariate approaches that adequately circumscribe the multifaceted construct of SWB.
Because SWB prediction is a multivariable, big-data problem,12 machine learning can offer solutions that outperform classical methods. Accordingly, previous studies have applied machine-learning approaches to predict SWB. For example, Bogomolov et al used machine learning to predict SWB from real-world and online mobile-phone data,13 14 Saputri and Lee adopted a similar approach to predict SWB at the country level,15 and Jatupaiboon et al trained models on electroencephalogram data.16 These studies showed that machine learning can predict SWB better than single-scale measurements. However, they focused on adult populations, and their application to preventive strategies for mental health has been limited. Moreover, more recent learning approaches, such as ensemble methods, have shown improved classification accuracy.
Ensemble methods have recently been widely adopted because of their good performance. The general idea of an ensemble method is to construct a set of simple classifiers and combine them: the final decision is given by a weighted or unweighted vote of the individual classifiers, which improves model accuracy.17 One of the most representative ensemble methods is the gradient boosting algorithm. It combines a set of simple classifiers, each trained sequentially to correct the errors of the previous ones, so that these weak classifiers together form one strong classifier that achieves higher accuracy than any single simple classifier.18 The gradient boosting algorithm has several advantages. First, it is insensitive to non-normally distributed data and outliers. Second, it requires no a priori hypotheses about the input variables. It is also robust against the addition of irrelevant input variables because of the properties of its underlying decision trees.19
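For illustration only, the boosting procedure described above can be sketched with an off-the-shelf implementation. The following minimal Python example uses scikit-learn's GradientBoostingClassifier on simulated data; the sample size, feature count and hyperparameters shown are hypothetical placeholders rather than the settings used in this study.

```python
# Minimal sketch of gradient boosting for a binary happy/unhappy label.
# All data and hyperparameters below are illustrative placeholders.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

# Simulated predictors standing in for psychological and physiological measures.
X, y = make_classification(n_samples=500, n_features=20, n_informative=8,
                           random_state=0)

# Each boosting stage fits a shallow decision tree to the errors of the
# previous stages; the weighted combination of these weak learners forms
# the final, stronger classifier.
model = GradientBoostingClassifier(n_estimators=200, learning_rate=0.05,
                                   max_depth=3, random_state=0)

# Cross-validated classification accuracy of the ensemble.
scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")
print(f"Mean cross-validated accuracy: {scores.mean():.3f}")
```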
By including both psychological and physiological parameters, we can take advantage of the gradient boosting algorithm to predict undergraduates' SWB with satisfactory accuracy.