Volume 1 Issue 9 - October 19, 2007
How Do We Talk? A Close Look into the Intricacies of the Word Production Factory
Jenn-Yeu Chen

Institute of Cognitive Science
  • Chen, J.-Y.*, & Dell, G. S. (2006). Word-form encoding in Chinese speech production. In P. Li, L. H. Tan, E. Bates, & O. J. L. Tzeng (Eds.), Handbook of East Asian Psycholinguistics (Vol. 1: Chinese) (pp. 165-174). Cambridge, UK: Cambridge University Press.
  • Chen, T.-M., & Chen, J.-Y.* (2006). Morphological Encoding in the Production of Compound Words in Mandarin Chinese. Journal of Memory and Language, 54, 491-514.
  • Chen, J.-Y.*, Chen, T.-M., & Dell, G. S. (2002). Word-form encoding in Mandarin Chinese as assessed by the implicit priming task. Journal of Memory and Language, 46, 751-781.

Most of us spend a good part of each day talking, and we have done so ever since our first year of life. We talk very fast, producing an average of 150 words per minute, or two and a half words per second. We also talk very reliably, making only one mistake in every 1000 words spoken. We talk with such ease that we don’t think much of it. Yet producing even a single word involves a cascade of processes controlled by the most complex and delicate factory in the world we know. Scientists (linguists, psychologists, and neuroscientists) have begun to understand how this factory is designed and how it operates.

Figure 1. A model of single word production (from Levelt, Roelofs, and Meyer, 1999, Figure 1).
A widely cited theory postulates six divisions in this factory, which work in a serial fashion to produce the sounds we hear as speech (Figure 1). The first division, called the conceptualization division, is responsible for putting together a concept that has a lexical match. For example, to express the concept of {FATHER}, the conceptualization division must make sure that the concept is specific enough and that an appropriate word exists for it. In this example, <father> is an appropriate word for the concept, but <male parent> is not. The latter involves two concepts, which require two words. Strictly speaking, {MALE PARENT} and {FATHER} are not identical concepts, even though they may be considered synonymous for every practical purpose.

The next division, called the lemma selection division, is in charge of selecting from the mental dictionary the word <father>, which matches the concept specified by the previous division. A lemma is a pointer that tells the next division which forms to retrieve for the word. It also carries the syntactic properties of the word (e.g., syntactic category, number, tense, gender), which allow the factory to build a sentence structure if a sentence is to be produced.

The third division, called the word-form encoding division, consists of two sub-divisions. The morphological encoding sub-division uncovers the morphological structure of the word. The phonological encoding sub-division retrieves the phonological contents (sound segments) of the word and assembles those segments into syllable-sized units. These units are used by the phonetic encoding division to find and retrieve the gestural scores (or motor programs) from memory. The last division, the articulatory division, executes the motor programs to give rise to the sounds we hear in a word.

The above theory, proposed by Willem Levelt and his students at the Max Planck Institute for Psycholinguistics in Nijmegen, the Netherlands, was developed solely on the basis of Dutch, a language that belongs to the same language family as English. In the past few years, my laboratory has been testing whether the Levelt theory generalizes to Chinese. There are good reasons to believe that the generality of the theory must be limited, because the linguistic design characteristics of Chinese are very different from those of English or Dutch. Our work started with the linguistic unit called the syllable.

A syllable consists of a vowel (V) surrounded by optional consonants (C). ‘I’ is a V syllable, ‘he’ is a CV syllable, ‘eat’ is a VC syllable, and ‘heat’ is a CVC syllable. A syllable can form a word, but a word is usually made up of more than one syllable. A question we ask is how the phonological contents of a word are organized and stored in the mental dictionary. Assuming that the sound segments (i.e., consonants and vowels) are the basic building blocks of a word, one possibility is that only the sound segments are stored. Another possibility is that the sound segments are organized into syllables, which are then organized into words. On this latter view, syllables are part of the stored phonological contents of a word.
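The C/V labels above apply to sound segments, not letters (‘eat’ has two vowel letters but one vowel sound). As a rough illustration, the short Python sketch below maps a word's segments to its CV pattern. The transcriptions and the tiny vowel inventory are simplifying assumptions made for this example only.

```python
# Toy vowel inventory covering only the four example words (an assumption,
# not a full phoneme set). "ai" stands for the diphthong in 'I', treated
# here as a single vowel segment.
VOWELS = {"ai", "i"}

def cv_skeleton(segments):
    """Map each sound segment to 'V' (vowel) or 'C' (consonant)."""
    return "".join("V" if seg in VOWELS else "C" for seg in segments)

# Hypothetical broad transcriptions, one list item per sound segment.
EXAMPLES = {
    "I": ["ai"],
    "he": ["h", "i"],
    "eat": ["i", "t"],
    "heat": ["h", "i", "t"],
}

for word, segments in EXAMPLES.items():
    print(word, cv_skeleton(segments))
# I V
# he CV
# eat VC
# heat CVC
```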

The Levelt theory adopts the first view, which we will call the segmental view to contrast it with the syllable view. The design characteristics of English and Dutch provide justification for the segmental view. In these languages, there are more than 10,000 different types of syllables. Syllable boundaries are sometimes ambiguous. For example, does /l/ in <syllable> belong to the first syllable /syl/ or the second syllable /la/, or both? Resyllabification is common in connected speech that involves more than one word. For example, /hit-it/ becomes /hi-tit/. These factors cast doubt on the plausibility of storing the syllables of a word in the mental dictionary. After all, if the syllables of a word are going to be torn apart and reassembled, what’s the point of storing them? Instead, the Levelt theory proposes that syllables are constructed on-line as the segments are assembled during word production.
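The /hit-it/ example can be made concrete with a toy syllabifier. The sketch below is not the algorithm of the Levelt theory. It is a deliberately simplified rule, assumed here for illustration, in which a consonant standing directly before a vowel becomes the onset of that vowel's syllable. The function name and the five-letter vowel set are hypothetical.

```python
# Toy vowel inventory (an assumption; real phoneme inventories are larger).
VOWELS = {"a", "e", "i", "o", "u"}

def syllabify(segments):
    """Split a segment sequence into syllables built on-line.

    Simplified rule: every vowel is a syllable nucleus, and a single
    consonant standing directly before a non-initial vowel is taken
    as that syllable's onset.
    """
    nuclei = [i for i, seg in enumerate(segments) if seg in VOWELS]
    if not nuclei:
        return ["".join(segments)]
    starts = [0]  # the first syllable takes any word-initial consonants
    for prev, cur in zip(nuclei, nuclei[1:]):
        # Start one segment early if a consonant precedes this vowel.
        starts.append(cur - 1 if cur - 1 > prev else cur)
    ends = starts[1:] + [len(segments)]
    return ["".join(segments[s:e]) for s, e in zip(starts, ends)]

# Each word syllabified in isolation: the /t/ of 'hit' is a coda.
print(syllabify(list("hit")), syllabify(list("it")))  # ['hit'] ['it']

# Segments assembled across the word boundary: the /t/ becomes an onset.
print(syllabify(list("hit") + list("it")))  # ['hi', 'tit']
```

Because the syllables come out differently depending on context, a system of this kind gains little from storing them in advance, which is the intuition behind the segmental view.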

However, the segmental view may not be adequate when applied to Chinese. There are only 1200 different types of syllables (400 if tones are disregarded) in Chinese, and the majority of them are open syllables with no consonantal coda (i.e., CV, but not CVC). Syllable boundaries are always clear and fixed. Most importantly, resyllabification is prohibited. Thus, /fan1-an4/ cannot be resyllabified and spoken as /fa1-nan4/. Tian-an-men spoken as Tia-nan-men would be considered foreign. With these design characteristics, it may be more plausible to postulate stored syllables in Chinese than in English.

Figure 2. An illustration of the subtle difference in the word form encoding division of the word production factory in English/Dutch and in Chinese.
Our research in the past few years has obtained critical evidence that supports the stored-syllable view in Chinese word production. Our research has also discovered that another linguistic unit, called the morpheme, plays a very different role in Chinese than in Dutch/English. In English, many words can be broken down into smaller units that bear meaning. For example, the word ‘resyllabified’ can be decomposed into four morphemes: [re-], [syllable], [-ify], and [-ed]. In Chinese, ‘gung1-yuan2’ (park) consists of two morphemes: [gung1] and [yuan2]. Our research shows that the English word production factory operates by retrieving the morphemes of a word before it goes on to retrieve the sound segments, whereas the Chinese word production factory seems to bypass the morphemes. We believe the word-form encoding division of the production factory requires a higher-order unit to control the retrieval and assembly of the sound segments of a word. The higher-order unit is the morpheme in English/Dutch and the syllable in Chinese (Figure 2). Thus, although all speakers are endowed with an efficient production factory, the detailed design of the factory differs between English and Chinese in a subtle but important way.
