Table 2

Description of nine speech databases and their subsidiaries

| Rank | Corpus | Year | Language | Average age, Subjects (n), Speech length (hour) | Location | Media type | Description | Citations, n |
|---|---|---|---|---|---|---|---|---|
| 1 | AphasiaBank | | | 12170 | | | | |
| | Cantonese | 2016 | Cantonese | 7/2 | Hong Kong, China | Video | Native Cantonese speakers with stroke-induced aphasia | |
| | Croatian | 2017 | Croatian | 10/10 | Zagreb, Croatia | Video | Native Croatian speakers with stroke-induced aphasia | |
| | French | 2016 | French | 11/14 | France | Audio | Native French speakers with aphasia | |
| | Italian | 2011 | Italian | 10 | USA | Video | Native Italian speakers with aphasia | |
| | Mandarin | 2015 | Mandarin | 459 | China | Video | All patients with Mandarin as L1 and the aetiology is cerebral vascular accident (CVA) | |
| | Spanish | 2011 | Spanish | 4 | USA | Video | Communication impairments by monolingual and bilingual speakers of Spanish and/or English | |
| 2 | WRAP | | English | 5464/20030–40 | USA | Audio | Connected speech problems of patients with dementia | 168 |
| 3 | Orozco-Arroyave Database | 2014 | Spanish | 62; 6050/50>150 | Spain | Audio | Speech recordings of patients with Parkinson’s disease and healthy controls | 57 |
| 4 | DementiaBank | | | 70–8017 | | | | |
| | English Holland | 2016 | English | 68; 722 | USA | Video | Individuals with Alzheimer's disease: language tasks from a Telerounds presentation | |
| | English Kempler | 2016 | English | 816 | USA | Audio | Individuals with Alzheimer's disease: conversation and Cookie-Theft picture descriptions | |
| | English Pitt | 2016 | English | 208/104 | USA | Audio | Dementia and control data for four language tasks from a large longitudinal study | |
| | English PPA DePaul | 2016 | English | 661 | USA | Video | Individual with primary progressive aphasia longitudinal data | |
| | English PPA Hopkins | 2016 | English | 36 | USA | Audio | Individuals with primary progressive aphasia data | |
| | German PPA | 2016 | English | | Germany | Audio | Primary progressive aphasia data | |
| | Mandarin_Lu | 2016 | Mandarin | 52 | China | Audio | Individuals with dementia data | |
| | Spanish PerLA | 2012 | Spanish | 21 | Spain | Audio | Individuals with Alzheimer's disease and dementia data | |
| | Taiwanese_Lu | 2016 | Taiwanese | 16 | China | Audio | Individuals with dementia | |
| 5 | Cambridge Cookie-Theft Corpus | 2010 | English | 5487/22741.5 | Cambridge | Audio, brain scans | Individuals who have suffered from brain injury given the language task of picture description | 9 |
| 6 | CoDAS | 2006 | Dutch | 5460.5 | Netherlands and Flanders | Audio | A pilot study of six aphasic speakers with two levels of annotation: an orthographic-phonetic transcription and a part-of-speech (POS) tagging | 2 |
| 7 | GREECAD | 2016 | Greek | 5572/281 | Athens | Audio | An annotated Greek Corpus of Aphasic Discourse | 1 |
| 8 | FluencyBank | | | 461 | | | | |
| | POLER | 2013 | English | 725/25 | Washington, DC | Audio | Children with epilepsy and controls | |
| | IISRP | 2005 | English | 4100/50 | USA | Audio | Seminal study of children who stutter with controls | |
| | Ratner | 2012 | English | 323/15 | USA | Audio | Children who stutter and controls | |
| | Voices | 2006 | English | 4212 | USA | Video | Interviews from the Voices of Stuttering project | |
| | Ulm | 1997 | German | 694 | Ulm | Audio | Children who stutter from Ulm | |
| 9 | DAIC-WOZ | 2014 | English | 50 | USA | Video | Anxiety, depression and post-traumatic stress disorder in University of Southern California | 124 |

  • Year denotes the establishment of the speech database.

  • n refers to the number of subjects and citations.

  • – denotes that the information is not available or that the project is still ongoing and its number of subjects is increasing.

  • / separates the number of patients from the number of controls (patients/controls).

  • CoDAS, Corpus of Dutch Aphasic Speech; DAIC-WOZ, Distress Analysis Interview Corpus-Wizard of Oz; GREECAD, Greek Corpus of Aphasic Discourse; POLER, Plasticity of Language in Epilepsy Research; WRAP, Wisconsin Registry for Alzheimer’s Prevention.