Providence Corpus

Katherine Demuth
Macquarie University


Participants: 6
Type of Study: naturalistic
Location: Providence, RI
Media type: video
DOI: doi:10.21415/T5R30X

Browsable transcripts

Phon data

CHAT data

Link to media folder

Citation information

Börschinger, Benjamin, Johnson, Mark, & Demuth, Katherine. 2013. A Joint Model of Word Segmentation and Phonological Variation for English Word-final /t/-Deletion. In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics, Vol 1: Long Papers, 1508–1516, Sofia, Bulgaria.

Song, Jae Yung, Demuth, Katherine, Evans, Karen, & Shattuck-Hufnagel, Stefanie. 2013. Durational Cues to Fricative Codas in 2-year-olds’ American English: Voicing and Morphemic Factors. Journal of the Acoustical Society of America, 133: 2931-2946.

Song, Jae Yung, Demuth, Katherine, & Shattuck-Hufnagel, Stefanie. 2012. The Development of Acoustic Cues to Coda Contrasts in Young Children Learning American English. Journal of the Acoustical Society of America, 131(4): 3036-3050.

Evans, Karen, & Demuth, Katherine. 2012. Individual Differences in Pronoun Reversal: Evidence from Two Longitudinal Case Studies. Journal of Child Language, 39: 162-191.

Song, Jae Yung, Sundara, Megha, & Demuth, Katherine. 2009. Phonological Constraints on Children’s Production of English Third Person Singular -s. Journal of Speech, Language, and Hearing Research, 52(3): 623-642.

Demuth, Katherine, & McCullough, Elizabeth. 2009. The Prosodic (re)Organization of Children’s Early English Articles. Journal of Child Language, 36: 173-200.

Demuth, Katherine, Culbertson, Jennifer, & Alter, Jennifer. 2006. Word-minimality, Epenthesis, and Coda Licensing in the Acquisition of English. Language & Speech, 49: 137-174.

In accordance with TalkBank rules, any use of data from this corpus must be accompanied by at least one of the above references.

Project Description

Name      Age Range            Sessions   Sex
William   1;04.12 - 3;04.18    44         M
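Ages in the table above follow the CHAT convention of years;months.days, so 1;04.12 means 1 year, 4 months, and 12 days. The following is a minimal sketch of how such an age string might be parsed and converted to months for analysis. The function names `parse_chat_age` and `age_in_months` are hypothetical and not part of CLAN or any TalkBank tool, and the 30-day month used for the fractional part is a simplifying assumption.

```python
import re

# Matches ages such as "1;04.12" (years;months.days); the day part is optional.
AGE_PATTERN = re.compile(r"^(\d+);(\d{1,2})(?:\.(\d{1,2}))?$")


def parse_chat_age(age: str) -> tuple[int, int, int]:
    """Split a CHAT-style age string into (years, months, days)."""
    match = AGE_PATTERN.match(age.strip())
    if match is None:
        raise ValueError(f"Not a years;months.days age: {age!r}")
    years, months, days = match.groups()
    # A missing day part is treated as 0 days.
    return int(years), int(months), int(days or 0)


def age_in_months(age: str) -> float:
    """Convert a CHAT-style age to months.

    Assumption: a month is approximated as 30 days for the fractional part.
    """
    years, months, days = parse_chat_age(age)
    return years * 12 + months + days / 30


if __name__ == "__main__":
    first = age_in_months("1;04.12")
    last = age_in_months("3;04.18")
    print(f"Recording span: {last - first:.1f} months")
```

Converting to a single numeric scale makes it straightforward to sort sessions or plot measures against age. If exact day counts matter, a calendar-based calculation from the child's date of birth would be more accurate than the 30-day approximation.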

Katherine Demuth and her research assistants at Brown University compiled the Providence Corpus between 2002 and 2005. The corpus contains longitudinal audio/video recordings of the language development of 6 monolingual English-speaking children from ages 1 to 3, made during spontaneous interactions with their parents (usually their mothers) at home. The aim of the study was to provide a corpus of phonetically transcribed data, with linked acoustic files, for studying early phonological and morphological development.

The participants were 3 boys (Alex, Ethan, William) and 3 girls (Lily, Naima, Violet). Each child was recorded for 1 hour every 2 weeks, beginning at the onset of first words. Two of the children have denser corpora, with weekly recordings from 1;3 to 2;10 (Naima) and from 2;0 to 3;0 (Lily). The three girls (Lily, Naima, and Violet) were also recorded monthly from ages 3 to 4. The total corpus consists of 364 hours of speech. Audio is available for all children, and video is available for all children except Ethan, who was diagnosed with Asperger's syndrome at the age of 5.

Both adult and child utterances were orthographically transcribed using CLAN conventions, with the audio/video files linked. Trained transcribers then carried out a broad phonemic (SAMPA > Unicode) transcription of the child utterances. A second trained coder retranscribed 10% of each recording. Reliability scores for this retranscribed portion ranged from 80% to 98% (discounting voicing errors).


Collection and transcription of the Providence Corpus (and the similar Lyon Corpus for French) were supported by NIMH grant #1R01MH60922.