Providence Corpus

Katherine Demuth
Macquarie University


Participants: 6
Type of Study: naturalistic
Location: Providence, RI
Media type: video
DOI: doi:10.21415/T5R30X

Citation information

In accordance with TalkBank rules, any use of data from this corpus must be accompanied by at least one of the above references.

Project Description

Name      Age Range            Sessions   Sex
William   1;04.12 - 3;04.18    44         M
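Ages in the table are written as years;months.days, so 1;04.12 means 1 year, 4 months, 12 days. The following is a minimal sketch of how such age strings might be parsed for analysis. The names `Age` and `parse_age` are hypothetical helpers introduced here for illustration; they are not part of CLAN or any TalkBank tool.

```python
import re
from typing import NamedTuple


class Age(NamedTuple):
    """An age split into years, months, and days (hypothetical helper type)."""
    years: int
    months: int
    days: int

    @property
    def total_months(self) -> int:
        # Completed months only; the days component is ignored here.
        return self.years * 12 + self.months


# years;months with an optional .days part, e.g. "1;04.12" or "2;10"
_AGE_RE = re.compile(r"^(\d+);(\d{1,2})(?:\.(\d{1,2}))?$")


def parse_age(text: str) -> Age:
    """Parse a years;months.days age string into an Age tuple."""
    match = _AGE_RE.match(text.strip())
    if match is None:
        raise ValueError(f"not a years;months.days age: {text!r}")
    years, months, days = match.groups()
    # A missing days component is treated as 0 (an assumption of this sketch).
    return Age(int(years), int(months), int(days or 0))


start = parse_age("1;04.12")
end = parse_age("3;04.18")
print(start.total_months, end.total_months)  # 16 40
```

Because `Age` is a tuple, ages compare in chronological order, which makes it straightforward to sort sessions or filter them by age range.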

Katherine Demuth and her research assistants at Brown University compiled the Providence Corpus between 2002 and 2005. The corpus contains longitudinal audio/video recordings of 6 monolingual English-speaking children, aged 1 to 3 years, during spontaneous interactions with their parents (usually their mothers) at home. The aim of the study was to provide a corpus of phonetically transcribed data, with linked acoustic files, for the purpose of studying early phonological and morphological development.

The participants were 3 boys (Alex, Ethan, William) and 3 girls (Lily, Naima, Violet). Each child was recorded for 1 hour every 2 weeks, beginning at the onset of first words. Two of the children have denser corpora, with weekly recordings from 1;3 to 2;10 (Naima) and from 2;0 to 3;0 (Lily). The three girls were also recorded monthly from 3 to 4 years. The total corpus consists of 364 hours of speech. Audio is available for all children, and video is available for all children except Ethan, who was diagnosed with Asperger's Syndrome at the age of 5.

Both adult and child utterances were orthographically transcribed using CLAN conventions, with the audio/video files linked to the transcripts. Trained transcribers then carried out a broad phonemic (SAMPA > Unicode) transcription of the child utterances. A second trained coder retranscribed 10% of each recording; reliability scores on this retranscribed portion ranged from 80% to 98% (discounting voicing errors).


Collection and transcription of the Providence Corpus (and the similar Lyon Corpus for French) were supported by NIMH grant #1R01MH60922.