McNeill, D., Bertenthal, B., Cole, J. and Gallagher, S. 2005. Gesture-first, but no gestures? 

Commentary on Michael A. Arbib.  Behavioral and Brain Sciences. 28: 138-39

 

Gesture-first, but no gestures?

David McNeill, Department of Psychology, University of Chicago

Bennett Bertenthal, Department of Psychology, University of Chicago

Jonathan Cole, Clinical Neurophysiology, Poole Hospital, UK

Shaun Gallagher, Department of Philosophy, University of Central Florida


 

ABSTRACT

Abstract: Although Arbib’s extension of the mirror-system hypothesis
neatly sidesteps one problem with the “gesture-first” theory of language
origins, it overlooks the importance of gestures that occur in current-day
human linguistic performance, and this lands it with another problem. We
argue that, instead of gesture-first, a system of combined vocalization and
gestures would have been a more natural evolutionary unit.

Michael Arbib’s extension of the mirror-system hypothesis for explaining the origin of language elegantly sets the stage for further discussion, but we think it overlooks a crucial source of data: the kinds of gestures that actually occur in current human linguistic performance. These data lead us to doubt a basic claim of the “gesture-first” theory, that language started as a gesture language that was gradually supplanted by speech. Arbib has modified this theory with his concept of an expanding spiral, but this new model does not go far enough in representing a speech-gesture system that evolved together.

Classic gesture-first. The enduring popularity of “gesture-first” seems to presuppose that gestures are simple and that as we humans, and language, became more complex, speech evolved and to an extent supplanted gesture. This belief emerged as part of the Enlightenment quest for the natural state of man and is credited to Condillac, and it has continued to draw adherents since (e.g., Hewes 1973; Armstrong et al. 1995; Corballis 2002). However, contrary to the traditional view, we contend that gesture and language, as they currently exist, belong to a single system of verbalized thinking and communication, and neither can be called the simple twin of the other. It is this system, in which both speech and gesture are crucial, that we should be explaining. It makes little sense to ask which part of an unbroken system is “simpler”; a better question is how the parts work together.

In this system, we find synchrony and coexpressiveness – gesture and speech conveying the same idea unit, at the same time. Gesture and speech exhibit what Wundt described long ago as the “simultaneous” and “sequential” sides of the sentence (Blumenthal 1970, p. 21) and Saussure, in notes recently discovered, termed “l’essence double du langage” (Harris 2002). Double essence, not enhancement, is the relationship, and we do not see how it could have evolved from the supplanting of gestures by speech. In the remainder of this commentary, we summarize three sources of evidence to support this assertion.

Figure 1. Gesture combining upward movement
and interiority. (Computer illustration from a video by Fey
Parrill, University of Chicago).

 

1. Consider the attached drawing (Fig. 1). The speaker was describing a cartoon episode in which one character tries to reach another character by climbing up inside a drainpipe. The speaker is saying, “and he goes up through the pipe this time,” with the gesture occurring during “up through” (the portion boldfaced in the original; the illustration captures the moment when the speaker says the vowel of “through”). Coexpressively with “up,” her hand rose upward, and coexpressively with “through,” her fingers spread outward to create an interior space. These took place together and were synchronized with “up through,” the linguistic package that combines the same meanings. The effect is a uniquely gestural way of packaging meaning, something like “rising hollowness,” which does not exist as a semantic package of English at all. Speech and gesture, at the moment of their synchronization, were coexpressive. The very fact there is shared reference to the character’s climbing up inside the pipe makes clear that it is being represented by the speaker in two ways simultaneously: analytic/combinatoric in speech and global/synthetic in gesture. We suggest it was this very simultaneous combination of opposites that evolution seized upon.

2. When signs and speech do combine in contemporary human performance, they do not synchronize. Kendon (1988) observed sign languages employed by aboriginal Australian women – full languages developed culturally for (rather frequent) speech taboos – which they sometimes combine with speech. The relevant point is that in producing these combinations, speech and sign start out synchronously, but then, as the utterance proceeds, speech outruns the semantically equivalent signs. The speaker stops speaking until the signs catch up and then starts over, only for speech and signs to pull apart again. If, in the evolution of language, there had been a similar doubling up of signs and speech, as the supplanting scenario implies, they too would have been driven apart rather than into synchrony, and for this reason, too, we doubt the replacement hypothesis.


3. The Wundt/Saussure “double essence” of gesture and language appears to be carried by a dedicated thought-hand-language circuit in the brain. This circuit strikes us as a prime candidate for an evolutionary selection at the foundation of language. It implies that the aforementioned combinations of speech and gesture were the selected units, not gesture first with speech supplanting or later joining it. We observe this circuit in the unique neurological case of I.W., who lost all proprioception and spatial position sense from the neck down at age 19, and has since taught himself to move using vision and cognition. The thought-language-hand link, located presumably in Broca’s area, ties together language and gesture, and, in I.W., survives and is partly dissociable from instrumental action.

We can address Arbib’s pantomime model by observing the kinds of gestures the dedicated link sustains in I.W.’s performance, in the absence of vision: his gestures are (1) coexpressive and synchronous with speech; (2) not supplemental; and (3) not derivable from pantomime. I.W. is unable to perform instrumental actions without vision but continues to perform speech-synchronized, coexpressive gestures that are virtually indistinguishable from normal (topokinetic accuracy is reduced but morphokinetic accuracy is preserved) (Cole et al. 2002). His gestures without vision, moreover, minimize the one quality that could be derived from pantomime, a so-called “first-person” or “character” viewpoint, in which a gesture replicates an action of a character (cf. McNeill 1992).

More generally, an abundance of evidence demonstrates that spontaneous, speech-synchronized gestures should be counted as part of language (McNeill 1992). Gestures are frequent (accompanying up to 90% of utterances in narrations). They synchronize exactly with coexpressive speech segments, implying that gesture and related linguistic content are coactive in time and jointly convey what is newsworthy in context. Gesture adds cohesion, gluing together potentially temporally separated but thematically related segments of discourse. Speech and gesture develop jointly in children, and decline jointly after brain injury. In contrast to cultural emblems, such as the “O.K.” sign, speech-synchronized gestures occur in all languages, so far as is known. Finally, gestures are not “signs” with an independent linguistic code. Gestures exist only in combination with speech, and are not themselves a coded system.

Arbib’s gesture-first. Arbib’s concept of an expanding spiral may avoid some of the problems of the supplanting mechanism. He speaks of scaffolding and spiral expansion, which appear to mean, in both cases, that one thing is preparing the ground for or propping up further developments of the other thing: speech to gesture, gesture to speech, and so on. This spiral, as now described, brings speech and gesture into temporal alignment (see Fig. 6 in the target article), but also implies two things juxtaposed rather than the evolution of a single “thing” with a double essence. Modification to produce a dialectic of speech and gesture, beyond scaffolding, does not seem impossible. However, the theory is still focused on gestures of the wrong kind for this dialectic: in terms of Kendon’s Continuum (see McNeill 2000 for two versions), signs, emblems, and pantomime. Because it regards all gestures as simplified and meaning-poor, it is difficult to see how the expanding spiral can expand to include the remaining point on the Continuum, “gesticulations,” the kind of speech-synchronized coexpressive gesture illustrated above. A compromise is that pantomime was the initial protolanguage but was replaced by speech plus gesture, leading to the thought-language-hand link that we have described. This hypothesis has the interesting implication that different evolutionary trajectories landed at different points along Kendon’s Continuum. One path led to pantomime, another to coexpressive and speech-synchronized gesticulation, and so on. These different evolutions are reflected today in distinct ways of combining movements with speech. Although we do not question the importance of extending the mirror system hypothesis, we have concerns about a theory that predicts, as far as gesture goes, the evolution of what did not evolve instead of what did.

 

Unpublished Appendix: Gestures and the Origin of Language

1. INTRODUCTION

The enduring popularity of the ‘gesture-first’ theory of language origins seems to presuppose that gestures are simple and that as we, and language, became more complex, speech evolved and to an extent supplanted gesture. The theory emerged as part of the Enlightenment quest for the natural state of man and is credited to Condillac. It has continued to draw adherents ever since, e.g., [1, 2, 3]. However, contrary to the traditional view, we contend that gesture and language, as they currently exist, belong to a single system of verbalized thinking and communication. Gesture and language developed together and currently exist together. Neither can be called the simple twin of the other. It is this system, in which both speech and gesture are crucial, that we should be explaining. It makes little sense to ask which part of an unbroken system is ‘simpler’; a better question is how the parts work together.

2. PROBLEMS WITH GESTURE FIRST

2.1. Speech-gesture combinations

In this system, we find synchrony and co-expressiveness: gesture and speech convey the same idea unit, at the same time. Gesture and speech exhibit what Wundt described long ago as the “simultaneous” and “sequential” sides of the sentence [4, p. 21] and Saussure, in notes recently discovered, termed “l’essence double du langage” [5].

Consider the attached drawing. The speaker was describing a cartoon episode in which one character tries to reach another character by climbing up inside a drainpipe. The speaker is saying, “and he tries going up thróugh it this time”, with the gesture occurring during “up thróugh” (the portion boldfaced in the original; the illustration captures the moment when the speaker says the vowel of “through”). Co-expressively with “up” her hand rose upward and co-expressively with “through” her fingers spread outward to create an interior space. These took place together, and were synchronized with “up through”, the linguistic package that combines the same meanings.

The effect is a uniquely gestural way of packaging meaning, something like ‘rising hollowness’, which does not exist as a semantic package of English at all. Speech and gesture, at the moment of their synchronization, were co-expressive. The very fact there is shared reference to the character’s climbing up inside the pipe makes clear that it is being represented by the speaker in two ways simultaneously: analytic/combinatoric in speech and global/synthetic in gesture. We suggest it was this very simultaneous combination of opposites that evolution seized upon.

2.2. Speech-sign non-combinations

When signs and speech do combine in contemporary human performance they do not synchronize. Kendon [6] observed sign languages employed by Aboriginal women, full languages developed culturally for (rather frequent) speech taboos, which they sometimes combine with speech. The relevant point is that in producing these combinations speech and sign start out synchronously but then, as the utterance proceeds, speech outruns the semantically equivalent signs. The speaker stops speaking until the signs catch up and then starts over, only for speech and signs to pull apart again. If, in the evolution of language, there had been a similar doubling up of signs and speech, as the supplanting scenario implies, they too would have been driven apart rather than into synchrony, and for this reason too we doubt the replacement hypothesis.

2.3. A dedicated thought-language-hand link

The Wundt/Saussure “double essence” of gesture and language appears to be carried by a dedicated thought-hand-language circuit in the brain. This circuit strikes us as a prime candidate for an evolutionary selection at the foundation of language. It implies that the aforementioned combinations of speech and gesture were the selected units, not gesture first with speech supplanting it. We observe this circuit in the unique neurological case of IW, who lost all proprioception and spatial position sense from the neck down at age 19, and has since taught himself to move using vision and cognition. The thought-language-hand link, located presumably in Broca’s Area, ties together language and gesture, and, in IW, survives and is partly dissociable from instrumental action. In the absence of vision his gestures are: a) co-expressive and synchronous with speech, b) not supplemental, and c) not derivable from pantomime. IW is unable to perform instrumental actions without vision but continues to perform speech-synchronized, co-expressive gestures that are virtually indistinguishable from normal (topokinetic accuracy is reduced but morphokinetic accuracy is preserved) [7]. His gestures without vision, moreover, minimize the one quality that could be derived from pantomime, a so-called ‘first-person’ or ‘character’ viewpoint, in which a gesture replicates an action of a character (cf. [8]).

In the following illustration, IW adjusts his rate of speech and gesture, without vision, in tandem, maintaining synchrony under conditions where seemingly the only pace-setting cues are his sense of how quickly a shared meaning is being presented in speech and gesture simultaneously. Note that his hand circles in and out at the same speech points in both the slow and fast speech rates.



Normal Speed (bracketed material = 0.56 sec., 5 syllables): “and [I’m startin’ to] use my hands now”

Slow Speed (bracketed material = 0.76 sec., 5 syllables): “because [I’m startin’ to] get into”

[Illustration panels: “I’m”, “startin’”, “to use” / “to get”]
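
The two timings quoted for the bracketed material can be turned into speech rates to show how large the slowdown is. The following is only a minimal arithmetic sketch: it takes the durations and syllable counts exactly as stated above, and the function name `syllable_rate` is a hypothetical helper introduced for illustration, not part of the original analysis.

```python
# Durations and syllable counts are those quoted for the bracketed material.
# syllable_rate is a hypothetical helper name, used here only for illustration.

def syllable_rate(syllables: int, seconds: float) -> float:
    """Return the speech rate in syllables per second."""
    return syllables / seconds

normal = syllable_rate(5, 0.56)   # normal-speed utterance
slow = syllable_rate(5, 0.76)     # slow-speed utterance
stretch = 0.76 / 0.56             # how much longer the slow version lasts

print(f"normal: {normal:.2f} syllables/sec")
print(f"slow:   {slow:.2f} syllables/sec")
print(f"slow version is {stretch:.2f} times as long")
```

On these figures the slow version lasts roughly a third longer than the normal one; the observation in the text is that the hand movement is reported to stretch along with the speech rather than keeping its own tempo.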

 

 



 

2.4. Gestures in the total language picture

More generally, an abundance of evidence demonstrates that spontaneous, speech-synchronized gestures should be counted as part of language [8]. Gestures are frequent (accompanying up to 90% of utterances in narrations). They synchronize exactly with co-expressive speech segments, implying that gesture and related linguistic content are co-active in time and jointly convey what is newsworthy in context. Gesture adds cohesion, gluing together potentially temporally separated but thematically related segments of discourse. Speech and gesture develop jointly in children, and decline jointly after brain injury. In contrast to cultural emblems, like the “O.K.” sign, speech-synchronized gestures occur in all languages, so far as is known. Finally, gestures are not ‘signs’, with an independent linguistic code. Gestures exist only in combination with speech, and are not themselves a coded system.

3. ARBIB’S VERSION OF GESTURE-FIRST

Michael Arbib [9] has presented a new version of ‘gesture-first’ in the form of an expanding evolutionary spiral, which may avoid some of the problems of the supplanting mechanism. However, because the theory regards all gestures as simplified and meaning-poor, it is difficult to see how the expanding spiral can include the kind of speech-synchronized co-expressive gestures illustrated above. A compromise theory is that pantomime was the initial proto-language, but was replaced by speech plus gesture, leading to the thought-language-hand link that we have described. This hypothesis has the interesting implication that different evolutionary trajectories landed at different points along Kendon’s Continuum (cf. [10]). One path led to pantomime, another to co-expressive and speech-synchronized gesticulation, etc. These different evolutions are reflected today in distinct ways of combining movements with speech.

 

REFERENCES

Arbib, Michael A. In press. From monkey-like action recognition to human language: An evolutionary framework for neurolinguistics. Behavioral and Brain Sciences.

Armstrong, David F., Stokoe, William C., & Wilcox, Sherman E. 1995. Gesture and the Nature of Language. Cambridge: Cambridge University Press.

Blumenthal, Arthur (ed. and trans.). 1970. Language and Psychology: Historical aspects of psycholinguistics. New York: John Wiley & Sons Ltd.

Cole, J., Gallagher, S., and McNeill, D. 2002. Gesture following deafferentation: A phenomenologically informed experimental study. Phenomenology and the Cognitive Sciences 1: 49-67.

Hewes, Gordon W. 1973. Primate communication and the gestural origins of language. Current Anthropology 14: 5-24.

Corballis, Michael C. 2002. From Hand to Mouth: The origins of language. Princeton, NJ: Princeton University Press.

Harris, Roy. 2002. Times Literary Supplement 26 July, 5182: 30.

Kendon, Adam. 1988. Sign Languages of Aboriginal Australia: Cultural, semiotic and communicative perspectives. Cambridge: Cambridge University Press.

McNeill, David. 1992. Hand and Mind: What gestures reveal about thought. Chicago: University of Chicago Press.

McNeill, David. 2000. Introduction. In D. McNeill (ed.), Language and Gesture, pp. 1-10. Cambridge: Cambridge University Press.