Background Reproducibility is a cornerstone of scientific advancement; however, many published works may lack the core components needed for study reproducibility.
Aims In this study, we evaluate the state of transparency and reproducibility in the field of psychiatry using specific indicators as proxies for these practices.
Methods An increasing number of publications have investigated indicators of reproducibility, including research by Hardwicke et al, on which we based the methodology for our observational, cross-sectional study. From a random 5-year sample of 300 publications in PubMed-indexed psychiatry journals, two researchers extracted data in a duplicate, blinded fashion using a piloted Google form. The publications were examined for indicators of reproducibility and transparency, which included availability of: materials, data, protocol, analysis script, open-access, conflict of interest, funding and online preregistration.
Results This study ultimately evaluated 296 randomly selected publications with a 3.20 median impact factor. Only 107 were publicly available online. Most primary authors originated from the USA, the UK and the Netherlands. The top three publication types were cohort studies, surveys and clinical trials. Regarding indicators of reproducibility, 17 publications gave access to necessary materials, four provided in-depth protocol and one contained raw data required to reproduce the outcomes. One publication offered its analysis script on request; four provided a protocol availability statement. Thirteen publications were registered in online repositories, and four, ten and eight of these included their hypothesis, methods and analysis, respectively. Conflict of interest was addressed by 177 and reported by 31 publications. Of 185 publications with a funding statement, 153 publications were funded and 32 were unfunded.
Conclusions Currently, psychiatry research has significant potential to improve adherence to reproducibility and transparency practices. Thus, this study presents a reference point for the state of reproducibility and transparency in psychiatry literature. Future assessments are recommended to evaluate and encourage progress.
- retrospective studies
- sample size
- sampling studies
- research design
This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/.
Reproducibility is a cornerstone of scientific advancement1; however, many published works lack the core components needed for reproducibility and transparency. These barriers to reproducibility have presented serious immediate and long-term consequences for psychiatry, including poor credibility, reliability and accessibility.2 Fortunately, methods to improve reproducibility are practical and applicable to many research designs. For example, preregistration of studies provides public access to the protocol and analysis plan. Reproducibility promotes independent verification of results2 and successful replication,2 3 and it hedges against outcome switching.4 Supporting this need in the field of psychology, the Open Science Collaboration’s reproducibility project was an attempt to replicate the findings of 100 experimental and correlational studies published in three leading psychology journals. Researchers found that 97% of the original reports had statistically significant results, whereas only 37% of the replicated studies had significant results.5 With regard to outcome switching, a recent survey of 154 researchers investigating electrical brain stimulation found that less than half were able to replicate previous study findings. These researchers also admitted to selective reporting of study outcomes (41%), adjusting statistical analysis to alter results (43%) and adjusting their own statistical measurements to support certain outcomes.6 Leveraging good statistical practices and using methods that promote reproducibility, such as preregistration, are necessary to protect against similar incidents of selective reporting.7
Considerable progress has been made in promoting and endorsing reproducible and transparent research practices in the field of psychology. For example, the Center for Open Science, the Berkeley Institute for Transparency in the Social Sciences and the Society for the Improvement of Psychological Science have all worked vigorously to establish a culture of transparency and a system of reproducible research practices. However, mental health researchers, including psychiatry researchers, have not kept pace with their psychology counterparts.8 A few editorials have circulated to promote awareness of reproducibility and transparency within psychiatric literature.7–9 For example, The Lancet Psychiatry published an editorial addressing various topics of reproducibility, such as increasing the availability of materials, protocols, analysis scripts and raw data within online repositories. The editorial author argued that few limitations exist within psychiatry that would impede depositing study materials in a public repository. The author additionally countered a commonly held position that raw patient data should not be made available, noting that the use of appropriate de-identification can make the information anonymous.8 A second editorial published in JAMA Psychiatry7 argued for more robust statistical analysis and decision making to improve the reproducibility of psychiatry studies. The author of this editorial discussed several statistical considerations, including the effects of statistical assumption violations on validity and study power, the likelihood of spurious findings based on small sample sizes, a priori covariate selection, effect size reporting and cross-validation. These efforts are good first steps to create awareness of the problem, which was deemed a ‘reproducibility crisis’ by over 1000 scientists in a recent Nature survey10; however, further measures are needed.
A top-down approach to evaluating transparency and reproducibility would provide valuable information about the current state of the psychiatry literature. In our study, we examined a random sample of publications from the psychiatric literature to evaluate specific indicators of reproducibility and transparency within the field. Our results may be used both to identify current strengths and limitations and to serve as baseline data for subsequent investigations.
We conducted an observational study using a cross-sectional design based on methodology by Hardwicke et al.2 Our study is reported in accordance with guidelines for meta-epidemiological methodology research.11 We have made available protocols, materials and other pertinent information on Open Science Framework (https://osf.io/n4yh5/). This study was not subject to institutional review board oversight because it did not include human participants.
Journal and study selection
We used the National Library of Medicine (NLM) catalogue to search for all journals using the subject terms tag Psychiatry[ST]. This search was performed on May 29, 2019 by DT. The inclusion criteria required that journals were published in English and were MEDLINE indexed. The list of journals in the NLM catalogue was then extracted along with their electronic ISSN (or linking ISSN if the electronic ISSN was unavailable). The final ISSN search string was used to search PubMed to identify all publications between January 1, 2014, and December 31, 2018. DT then compiled a random sample of 300 publications from the selected journals.
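The search-and-sample step described above can be sketched in Python as follows. This is a minimal illustration and not the authors' actual procedure: the ISSNs are placeholders, the record identifiers are a stand-in numeric range, the seed is arbitrary, and the use of the PubMed field tags `[IS]` (ISSN) and `[DP]` (publication date) is an assumption about how such a query could be written.

```python
import random


def build_issn_query(issns, start="2014/01/01", end="2018/12/31"):
    """Join ISSNs with OR and restrict results to a publication-date range."""
    issn_clause = " OR ".join(f"{issn}[IS]" for issn in issns)
    return f"({issn_clause}) AND ({start}:{end}[DP])"


def draw_sample(record_ids, k=300, seed=2019):
    """Draw k distinct records without replacement, reproducibly."""
    rng = random.Random(seed)  # fixed seed so the draw can be repeated
    return rng.sample(sorted(record_ids), k)


# Hypothetical ISSNs and record identifiers, for illustration only.
query = build_issn_query(["0000-0001", "0000-0002"])
sample = draw_sample(range(1, 90282), k=300)  # 90 281 candidate records
```

Recording the seed alongside the query string would let another team regenerate the same 300-publication sample from the same search results.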
Prior to data extraction, two investigators (CES and JZP) underwent a full day of training to ensure inter-rater reliability. The training included a review of the study design, protocol, extraction form and the identification of information from two publications selected by DT. The two investigators were given three articles from which to extract data as examples. Following extraction, the pair reconciled all differences. The training session was recorded and listed online for reference (https://osf.io/jczx5/). Prior to extracting data from all studies, these two investigators extracted data from the first 10 publications from their specialty list. Discrepancies were resolved by discussion between the investigators.
Two investigators (CES, JZP) extracted data from the 300 publications in a duplicate and blinded fashion. Following extraction, a final consensus meeting was held by the pair to resolve disagreements. A third investigator (DT) was available for adjudication, but this process was not necessary. A pilot-tested Google form was created based on a study by Hardwicke et al,2 with additions. This form included the indicators of reproducibility and transparency (https://osf.io/3nfa5/) and items related to study characteristics. This assessment of reproducibility and transparency was developed according to key indicators in Hardwicke et al in addition to other indicators relevant to promoting transparent, collaborative, reproducible research. The indicators examined were: public accessibility, funding, conflict of interest, citation frequency and any statements for protocol, materials, data availability and preregistration. (See online supplementary table A for specific quantity and percentage). Further explanation of the frequency, value and relevance of these indicators of transparency and reproducibility is organised in online supplementary table B. The extracted data varied according to the study design, with studies that had no empirical data being excluded (eg, editorials, commentaries (without reanalysis), simulations, news, reviews and poems). We also expanded the study design options to include cohort, case series, secondary analysis, chart reviews and cross-sectional studies. Finally, we used the following funding categories: university, hospital, public, private/industry, or non-profit.
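As an illustration of how duplicate extraction can be reconciled, the sketch below compares two extractors' records and lists every field on which they disagree, so that a consensus meeting can work through the list. The field names and publication identifiers are hypothetical and are not taken from the study's extraction form.

```python
def find_discrepancies(extraction_a, extraction_b):
    """Return (publication_id, field, value_a, value_b) for every disagreement."""
    discrepancies = []
    for pub_id in sorted(extraction_a.keys() | extraction_b.keys()):
        rec_a = extraction_a.get(pub_id, {})
        rec_b = extraction_b.get(pub_id, {})
        for field in sorted(rec_a.keys() | rec_b.keys()):
            if rec_a.get(field) != rec_b.get(field):
                discrepancies.append(
                    (pub_id, field, rec_a.get(field), rec_b.get(field))
                )
    return discrepancies


# Hypothetical records from two blinded extractors.
extractor_1 = {
    "pub-001": {"data_statement": "no", "preregistered": "no"},
    "pub-002": {"data_statement": "yes", "preregistered": "no"},
}
extractor_2 = {
    "pub-001": {"data_statement": "no", "preregistered": "no"},
    "pub-002": {"data_statement": "no", "preregistered": "no"},
}
to_reconcile = find_discrepancies(extractor_1, extractor_2)
```

Here `to_reconcile` holds a single entry for `pub-002`, where the two extractors disagree on the data availability statement.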
Open access availability
We searched Open Access Button (https://openaccessbutton.org) to assess whether studies were available by open access. If Open Access Button was unable to access the article, then two investigators (CES, JZP) used the publication title and DOI to search Google or PubMed to find whether the full text version was publicly available.
Replication and evidence synthesis
We used the Web of Science (https://www.webofknowledge.com/) to determine whether the studies composing our sample were replication studies or were included in systematic reviews. Web of Science allowed us to determine the number and type of other studies that cited each publication we examined. Articles unavailable on Web of Science were located and examined via PubMed or other resources. We classified a publication as a replication study if the research was conducted to replicate aspects of a prior study’s design or findings. To determine inclusion in systematic reviews, we reviewed each publication that had cited the studies included in our sample using Web of Science’s citation listing feature. We performed this process in the same manner as data extraction, described previously.
We report descriptive statistics for each category along with 95% CIs of proportions, calculated using Microsoft Excel.
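The article does not state which interval method was used in Excel. As one common possibility, the sketch below computes a normal-approximation (Wald) 95% CI for a proportion, truncated to the range 0 to 1. The function name and the truncation are assumptions, so the output should not be expected to match the intervals reported in the results exactly.

```python
import math


def wald_ci(successes, n, z=1.96):
    """Return (proportion, lower, upper) using the normal approximation."""
    p = successes / n
    half_width = z * math.sqrt(p * (1 - p) / n)
    # Truncate so the interval never leaves the valid range for a proportion.
    return p, max(0.0, p - half_width), min(1.0, p + half_width)


# Example: 13 of 211 publications with a preregistration statement.
p, lower, upper = wald_ci(13, 211)
print(f"{p:.1%} (95% CI: {lower:.1%} to {upper:.1%})")
```

For small counts, such as a single publication out of 211, the Wald interval is known to behave poorly, and a Wilson or exact interval may be a better choice.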
Our search of the NLM catalogue identified 346 journals, with only 158 meeting the inclusion criteria (figure 1). The median 5-year impact factor for these journals was 3.20 (IQR: 2.2–4.2). Our PubMed search returned 407 656 studies initially, and this number was reduced to 90 281 after we applied the date limiter. From these search returns, 300 psychiatry research publications were randomly selected. Four were inaccessible, yielding a final sample size of 296 publications. The majority of the 296 publications had a primary author from the USA (155, 52%), the UK (86, 29%) and the Netherlands (32, 11%). The top three publication types were cohort studies (46, 16%), surveys (45, 15%) and clinical trials (36, 12%). With regard to accessibility, 107 (36%) of the 296 publications were publicly available, whereas the remaining 189 publications (64%) were only available behind a paywall. Remaining sample characteristics are displayed in table 1 and online supplementary table A.
Factors for reproducibility include the availability of materials, data, protocol, analysis script and preregistration. (See online supplementary table B detailing the relevance and value of each factor). Of the 296 publications, 185 were analysed for a materials availability statement and 211 were analysed for data availability, protocol availability, analysis script availability and preregistration statements (figure 1). These differences were the result of excluding particular study designs from certain analyses, such as excluding case studies from preregistration. Of the 185 publications analysed for a materials availability statement, 22 (12% (95% CI: 8.2% to 16%)) had a materials availability statement, yet only 17 provided an accessible materials document (table 2). Only 14 (6.6% (95% CI: 3.8% to 6.6%)) of the 211 publications provided a data availability statement, with just one study including all the raw data necessary to reproduce its findings. Only four of the 211 publications (1.9% (95% CI: 0.4% to 3.4%)) provided a protocol availability statement, and a single publication (0.47% (95% CI: 0% to 1.3%)) stated that its analysis script was available on request (table 2). Of the 211 publications for which preregistration was analysed, only 13 (6.2% (95% CI: 3.4% to 8.9%)) included a statement that the study was registered in publicly accessible repositories (table 2). All 13 publications were accessible; four (31% (95% CI: 26% to 36%)) included their hypothesis, 10 (77% (95% CI: 72% to 82%)) included their methods and eight (62% (95% CI: 56% to 67%)) included their analysis plan.
Conflict of interest and funding
All 296 publications were included in the conflict of interest and funding source analysis (figure 1). Of these studies, 177 (60% (95% CI: 41% to 69%)) included a conflict of interest statement, with 10% reporting a conflict of interest. With regard to funding, 185 (63% (95% CI: 43% to 82%)) of the 296 articles had a funding statement. Of the 296 publications, 153 (52% (95% CI: 36% to 68%)) were funded and 32 (11% (95% CI: 7.3% to 14%)) did not receive funding. The most common source of funding (72, 24%) was public. Additional results are presented in table 2.
Replication and evidence synthesis
Of the 296 publications, 211 were analysed for being a replication study, and 201 were analysed to determine how many had been cited in a meta-analysis or systematic review (figure 1). Four (1.9% (95% CI: 0.4% to 3.4%)) were identified as a replication study, and 82 (41% (95% CI: 30% to 51%)) were cited in at least one systematic review or meta-analysis (table 2).
Our results demonstrated that the majority of publications within psychiatry literature lack the necessary materials, raw data and detailed protocols to be easily reproducible. These findings are concerning, given the critical need for reproducible and transparent scientific research. In this section, we outline a few of the issues causing concern and offer suggestions to close this gap between standards of research and current practices.
To begin, we found that only 13 publications had a statement about preregistration. Preregistration allows for independent evaluation of the consistency between the registered plan and what was actually performed in the study. Selective reporting bias—upgrading, downgrading, removing, or adding study outcomes based on statistically significant findings—is particularly problematic. Comparisons between preregistration documents and published reports enable independent researchers to determine whether this form of bias has likely occurred. Multiple studies indicate that selective reporting bias is a pervasive problem in the medical literature12–16 including psychotherapy trials.17 Scott et al evaluated selective outcome reporting of clinical trials published in The American Journal of Psychiatry, Archives of General Psychiatry/JAMA Psychiatry, Biological Psychiatry, Journal of the American Academy of Child and Adolescent Psychiatry and The Journal of Clinical Psychiatry.18 They found that 28% of trials in their sample showed evidence of selective outcome reporting. As another example, the COMPare project was designed to evaluate all trials published in prestigious general medical journals. After completing evaluations for selective outcome reporting, members of the project drafted letters to the editor requesting clarification for discrepant endpoints. To date, they have identified 354 outcomes that were not reported and 357 outcomes that were silently added across 67 trials.19 To address this type of problem, stricter adherence to preregistration is needed. For example, although the Food and Drug Administration Amendments Act codified into law that all applicable clinical trials should be prospectively registered before trial commencement, penalties for non-compliant investigators have never been enacted.20 Given that this safeguard is already in place, greater enforcement is likely a viable first step toward improvement. 
Additionally, the International Committee of Medical Journal Editors (ICMJE) mandates that ICMJE-endorsing journals require prospective trial registration as a precondition for publication for all clinical trials.21 However, studies have found that journals do not always enforce registration policies.22 Given that journals are gatekeepers of scientific knowledge and advancement, we advocate for journals adopting mechanisms to enforce their policies. Additional training is also warranted for junior researchers and students who may not be aware of the inherent issues involved in the failure to preregister studies. Responsible conduct of research courses are required for trainees participating in fellowships and training programmes funded by the National Institutes of Health. For more established faculty, universities offer modules related to research ethics, human participant protections, data management, informed consent and anonymity. Such courses could likely incorporate training on issues involving preregistration, transparency and reproducibility. Academic conferences offer another avenue for training of all parties regarding open research practices.
Transparency of the methodological process, data collection and data analyses increases the credibility of study findings.8 Thus, access to the complete protocols and materials used to perform a study is imperative for replication attempts. This need is illustrated by the Reproducibility Project in Cancer Biology, which attempted to reproduce 50 landmark studies after concerns were raised by two drug companies regarding replication of cancer study findings.23 Replication of 32 of the 50 studies was abandoned, in large part because methodological details were not available from the original researchers in these published papers.5 In addition, a review of 441 biomedical publications from 2000 to 2014 found that only one study provided a full protocol, and none made all of their raw data available.24 Given the significant deficiency of materials availability in psychiatry, we suggest looking to other fields for ideas. For example, the American Journal of Political Science requires authors of manuscripts accepted for publication to provide sufficient materials to enable other researchers to verify all analytic results reported in the narrative and supporting documents.25 Furthermore, this journal requires the materials of the final draft manuscript to be verified to confirm that the analytic results are reproducible for each study. In this process, both the quantitative and qualitative analyses have verification processes conducted at universities. Following verification, the university staff release the final data for public access, and then final publication can occur.
With regard to strengths, this study included a random sample of the psychiatry literature from a large selection of journals. We used extensive training to ensure inter-rater reliability between investigators. Data were extracted in duplicate and blinded fashion with joint reconciliation to minimise human error. This double data extraction methodology is the gold standard in systematic reviews and is recommended by the Cochrane Handbook for Systematic Reviews of Interventions.26 Furthermore, all relevant study materials have been made available to ensure transparency and reproducibility.
We acknowledge that although our sample size was 50% greater than that of Hardwicke et al,2 it still only represents a small fraction of the published literature. In addition, the current indicators of reproducibility and transparency have not been completely established. We used factors previously identified in social sciences and applied them to psychiatry. Our study findings should also be interpreted in light of our sample, which included only MEDLINE-indexed journals and studies published during a set time period. Differences in indicators might exist in other journals or outside this timeframe.
In conclusion, we stress the importance of adopting transparent and reproducible practices in research. Certainly, if the public lacks trust in science, it could evolve into a lack of trust in clinical practices.27 Lack of transparency is not an unknown issue,1 but when faced with change, we must reform our current practices. This study presents a reference point for the state of reproducibility and transparency in psychiatry literature and future assessments are recommended to evaluate progress.
Caroline E. Sherry is a second year medical student at Oklahoma State University Center for Health Sciences. She graduated from the University of Notre Dame with a double major in Science and Theology, as well as a minor specialising in the science and practice of humanistic, compassionate medicine in 2018. Caroline became a member of the Vassar Research team as a first year medical student, enjoyed dedicated summer research time, and is looking forward to the opportunity to engage in research on rotations.
Presented at OSU-CHS Research Conference 2020
Contributors All authors have contributed substantially to the planning, conduct and reporting of the work described in the article, including, but not limited to, study design, data acquisition, data analysis, manuscript drafting and final manuscript approval. CES and JZP collaborated on the extraction, validation, organisation, analysis and interpretation of all data. CES was also responsible for team organisation and the manuscript formatting, revision and submission. DT designed methods, compiled the publication list, led data extraction training and assisted with data interpretation and manuscript editing. BKC and AP contributed to data interpretation and writing the introduction and discussion sections. MV provided advisement and leadership in data interpretation, scientific writing and manuscript editing.
Funding This study was funded through the 2019 Presidential Research Fellowship Mentor–Mentee Program at Oklahoma State University Center for Health Sciences.
Competing interests None declared.
Patient consent for publication Not required.
Provenance and peer review Not commissioned; externally peer reviewed.
Data availability statement All protocols, materials and other pertinent information are available on Open Science Framework (https://osf.io/n4yh5/). Comprehensive results are accessible online in Supplementary Tables A and B.