Lyon Corpus

Katherine Demuth
Macquarie University


Harriet Jisa
Dynamique du Langage
Université Lumière Lyon 2


Participants: 4
Type of Study: naturalistic
Location: France
Media type: video
DOI: doi:10.21415/T5M02D

Browsable transcripts

Phon data

CHAT data

Link to media folder

Citation information

Demuth, K. & A. Tremblay (2008). Prosodically-conditioned variability in children's production of French determiners. Journal of Child Language, 35, 99-127.

In accordance with TalkBank rules, any use of data from this corpus must be accompanied by at least one of the above references.

Project Description

The Lyon Corpus was compiled by Harriet Jisa and research assistants at the University of Lyon 2 from 2002 to 2005. The corpus contains longitudinal audio/video recordings of 5 monolingual French-speaking children, documenting their language development from 1 to 3 years of age during spontaneous interactions with their mothers at home. (Two additional children have been recorded, and transcription is currently in progress.) The aim of the study was to provide a corpus of phonetically transcribed data, with linked acoustic files, for the purpose of studying early phonological and morphological development.


The participants who have been fully transcribed include 2 boys (Theotime, Nathan) and 3 girls (Marie, Marilyn, and Anaïs). Each child was recorded for 1 hour every 2 weeks between the ages of 1 and 3, beginning at the onset of first words. The currently transcribed corpus consists of 185 hours of speech.


Both adult and child utterances were orthographically transcribed using CLAN conventions, with the audio/video files linked. Trained transcribers then carried out a broad phonemic transcription of the child utterances (in SAMPA, converted to Unicode). Ten percent of each recording was then retranscribed by a second trained coder. Segmental reliability scores ranged from 90% to 98%. The following Lyon videos are missing: NAT18a, NAT18b, NAT47b.

Problems with the Video

Anaïs has extra video at the end of these files: ana08b 6:50, ana12b 14:56, ana17b (a few seconds), ana19b 7:30, ana25b (a few seconds), ana26b 1:50.

Marie has extra video at the end of these files: mar11b 2:25, mar19b 3:55, mar21b 3:00, mar23b 4:14, mar25b 0:15, mar26b 5:18, mar37 1:14.
Also, mar20a begins with 11 seconds from another child.

Marilyn has partially transcribed files from after age 1;06.13 and 23 additional, entirely untranscribed files from before that age. In the transcripts, Marilyn appears under the name Marie. This can be confusing, since there is also another child called Marie in the Marie folder. Marilyn's date of birth is 28-FEB-2001.

Nathan is missing two videos: nat36b and nat43b. Also, these Nathan files have extra video at the end: nat01b 0:07, nat02b 0:37, nat03b 2:24, nat04b 3:27, nat08b 1:15, nat10b 2:00, nat11b 3:49, nat13b 8:44, nat14b 0:57, nat15b 5:10, nat17b 1:46, nat19b 1:26, nat35b 0:02, nat38b 2:40, nat40b 0:12, nat41b 3:50, nat42b 0:10.


Collection and transcription of the Lyon Corpus (and the similar Providence Corpus for English) were supported by funding from NIH R01MH60922 (to Katherine Demuth), two grants from Action Concertée Incitative (Terrains, techniques et théories et Internationalisation des sciences humaines et sociales), as well as support from the Délégation générale à la langue française and the Ministère de l'Enseignement Supérieur et de la Recherche.