AG 11 (Short session): Big data: new opportunities and challenges in language acquisition research

Session organizers

Christina Bergmann & Alex Cristia (Laboratoire de Sciences Cognitives et Psycholinguistique, Paris), and Sho Tsuji (RIKEN)

Language acquisition research on topics ranging from phonological processing to semantic knowledge has been built on the meticulous examination of small data sets, such as case studies. While we have learned a lot from such careful work, some limitations quickly became evident. Most saliently, each child follows a unique path, and since only a handful of children represent each language, it can be difficult to separate individual quirks from subtler, language-specific trends. A new horizon has opened as bigger, open data sets have begun to emerge. Early examples are the CHILDES and MCDI projects, which provide standardized tools for spontaneous conversations and for the receptive and productive lexicon, respectively. While these tools provide a common platform, they are as yet limited to a few well-documented languages. Today, new technologies pave the way for much-needed crosslinguistic extensions, for instance via automatic annotation of daylong audio (and video) recordings in seldom-described languages. It is also becoming possible to include other linguistic levels (e.g., receptive knowledge through open repositories of experimental results).

The present workshop will provide a platform for language acquisition researchers to assess progress towards high-quality, big, and open data sets, and to discuss solutions to current challenges. Researchers who have collected large linguistic data sets from infants and children will share current insights and perspectives. In addition to presenting results from their own research, they will discuss challenges (such as data standardization, anonymization, and barriers to data sharing) and consider how to facilitate crosslinguistic extensions.

Deadline: 15.8.2014


  14:00 Katherine Demuth (Macquarie U)  
    The challenges and rewards of Big Data approaches to language acquisition research  
  15:00 Rémi Lavalley & Kay Berkling (Cooperative State U Karlsruhe)  
    A hyper-linked web-interface to spellings of words written by children of different ages and school-types  
  15:30 Thomas Hills & Hanako Yoshida (U of Warwick)  
    Using semantic structure to understand language learning in monolinguals and bilinguals  
  16:00 Break  
  16:30 Michelle McGillion¹, Jane Herbert¹, Julian Pine², & Danielle Matthews¹ (¹University of Sheffield, UK; ²University of Liverpool, UK)  
    A RCT to test the causal role of caregiver contingent talk in infant language learning  
  17:00 Nienke Dijkstra¹˒², Titia Benders¹˒³, & Paula Fikkert¹ (¹Centre for Language Studies, Radboud University Nijmegen, Netherlands; ²International Max Planck Research School for Language Sciences, Nijmegen, Netherlands; ³School of Psychology, University of Newcastle, Australia)  
    How to handle a big set of baby language data  
  17:30 Sho Tsuji¹, Christina Bergmann², & Alejandrina Cristia² (¹RIKEN Brain Sciences Institute, Wako, Japan; ²Laboratoire de Sciences Cognitives et Psycholinguistique, EHESS, DEC-ENS, CNRS, Paris, France)  
    Looking back to move forward: community-augmented meta-analyses  
  18:00 Final discussion  

Deviation from the original schedule

The talk by Eisenbeiss et al. was cancelled.