Application Notes
Section 3 - Implications of Using Auditory Feedback for the SLP

The Ninth Vocal Fold Physiology Symposium, organized and sponsored by the Voice Foundation and titled Vocal Fold Physiology: Controlling Complexity and Chaos, was held in 1995 in Sydney, Australia. Many of the papers presented considered the auditory influence on speech and voice production. These papers and discussions were published as a book edited by P. Davis and N. Fletcher, Vocal Fold Physiology, Controlling Complexity and Chaos, San Diego: Singular Publishing Group (1996). Excerpts (pages 348-357) dealing with auditory function and auditory feedback from Chapter 23 by Daniel R. Boone, "Clinical Relevance of Controlling Chaos and Complexity: Implications for the Speech Pathologist," are reproduced here with permission of the publisher. The references at the end of this section represent literature citations made for the entire Application manual.

It is hoped that some of the text will serve to underscore the need for a clinical instrument like the Facilitator with its five auditory components: real-time amplification, looping playback, delayed auditory feedback, speech-range masking, and metronomic pacing.

Auditory Monitoring and Shaping [excerpt beginning from middle of p. 349]

Professional users of voice, particularly actors and singers, would agree with the work of Wyke [5], who postulated that vocalization involves ‘prephonatory tuning’ and ‘acoustic automonitoring.’ Before we actually vocalize speech or singing patterns, it appears that the organization of the vocalization is auditorily governed. Voicing may well differ from the neural organization required in somatic motor behavior. It is well established that somatic motor behavior (such as throwing a ball) requires a premotor planning set which precedes production. Such motor function requires premotor cortical planning, demonstrated to be activated in the cortical premotor strip (Brodmann 6) anterior to the Brodmann 4 motor cortex [6]. Further, there is a precise temporal relationship between such premotor activity and the continuous, sequential movements required for a motor task (such as throwing the ball). May there not be an auditory-governance system, analogous to premotor cortex, that acts as an active prephonation modeling system directing the rapid sequence of phonations observed in speaking and singing? This auditory-governance system also plays a critical role in self-correction, enabling the vocal performer to adjust vocal production of fundamental frequency, inflection, duration, and prosody to match (or differ from) the internal prephonation model.

A number of papers at the Ninth Vocal Fold Physiology Symposium, Controlling Complexity and Chaos, have given credence to the existence of an auditory-governance system that provides direction for human voicing. For example, the work of Kawahara [7, 8] using transformed auditory feedback (TAF) and delayed auditory feedback (DAF) shows the influence of deviations in auditory feedback on the ongoing phonation of experimental subjects. Kawahara and Williams [9] have shown that fundamental frequency (Fo) feedback distortions can be corrected by subjects within a latency period of 100-250 ms. This landmark research suggests that auditory perception plays a role "in a system that automatically regulates voice fundamental frequency," perhaps pointing to a model of Fo in which laryngeal output is monitored by a component of the auditory system. While the TAF data support the role of auditory feedback in correcting adjustments of Fo, the DAF data have long documented the effect of auditory feedback on the melody and prosody of speech and song.
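To make the delayed auditory feedback (DAF) condition concrete, the following is a minimal sketch of a delay line: the speaker hears his or her own voice shifted later in time by a fixed interval. It assumes a mono signal held as a plain list of samples. The function name `delayed_feedback` and the sample rate and delay used in the usage line are illustrative assumptions, not values taken from the studies cited above.

```python
def delayed_feedback(samples, sample_rate_hz, delay_ms):
    """Return the signal a listener would hear under DAF (hypothetical helper).

    Each output sample is the input sample from `delay_ms` earlier.
    The start of the output is silence, because nothing has been
    spoken yet, and the output has the same length as the input.
    """
    delay_samples = int(round(sample_rate_hz * delay_ms / 1000.0))
    if delay_samples <= 0:
        return list(samples)
    # Leading silence, capped so the output never exceeds the input length.
    silence = [0.0] * min(delay_samples, len(samples))
    kept = max(len(samples) - delay_samples, 0)
    return silence + list(samples[:kept])


# Illustrative usage: a 2 ms delay at an assumed 1000 Hz sample rate
# shifts the signal by two samples.
heard = delayed_feedback([1.0, 2.0, 3.0, 4.0, 5.0], 1000, 2)
```

A real-time instrument would process the microphone stream block by block with a ring buffer; the list-based form here only shows the time shift itself.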

The TAF method, altering frequency feedback to a subject while phonating, was also used by others [10] who studied the reaction time of subjects attempting to maintain "vocalization with a steady pitch". Immediately (a latency of 50 or 100 ms) following a stimulus cue, the TAF occurred. Under this condition, pitch reactions to the tone cue occurred within 120 to 180 ms. The authors postulated that there may be ‘shared pathways of CNS systems involved in the control of voice Fo.’ Of some clinical relevance, however, is the observation that the vocal response (pitch) following a slight alteration of Fo feedback can be immediately (120-180 ms) corrected. The auditory perceptual monitoring system obviously plays a vital role in vocal production.

Other chapters from the conference look closely at the replicability of various voice productions, often using auditory modeling that would require some kind of ongoing auditory-governance. Each of the studies required some kind of human subject phonation response. Watson and Hixon [11], while focusing on respiratory function during singing, studied a trained singer in the process of learning an original aria: at first, singing it ‘cold’, followed by performing it as a memorized piece. The singer was urged to sing through the new piece several times until it could be performed from auditory memory, ‘achieving accurate goals with an economy of movement.’ While there would be some focus by the singer on the proprioceptive feedback related to breathing patterns, the primary reliance was on auditory memory. There was evidence that the singer had learned an auditory patterning: a combination of phonation prolongation, Fo shift, prosodic variations, duration shifts, and any other auditory component that is part of song. The study from one perspective was a study to see how quickly a trained singer could develop a prephonatory set. Once such a set was available to the singer, the aria could be performed.

Others [12] looked at replicability and accuracy of pitch patterns in professional singers, requiring three experienced singers to sing the same passages three times. Not only was Fo studied, but the consistency of vibrato was determined for each subject’s three repetitions of the passage. What (and where) are the neural controls for such fine gradations of pitch? Does the internal auditory prephonation set in the trained singer have a sensitivity for 3–10 Hz deviations between one’s prephonation target-set and actual pitch performance?

Another argument for an auditory-governance system for phonation can be seen in the expression of tonal languages, where intonations have specific coded-language meanings. Rose [13] writes in comparing two different groups of Chinese dialects (Yue and Wu) that one intonation difference can change ‘specific syntactically or phonologically defined environments.’ The complicated Yue (Hong Kong Cantonese) dialect often has six contrasting tones for a single word, each tone representing a different meaning. One can only postulate the fine discriminations in frequency and stress that the speaker must make to inflect the different coded meanings in a tonal language. The auditory-governance system must be finely tuned to permit such fine vocal adjustments.

Although a study by Estill and others [14] focused on temporal perturbation in varying modes of the singing voice, the remarkable differences in vocal product lend support to the probable existence of an auditory-governance system. Singers were taught six different vocal qualities: speech, opera, twang, belting, falsetto, and sob. The authors looked at continuous temporal alteration between two fundamental frequencies and ‘discontinuous and seemingly random switching between one mode of vibration to another within an utterance.’ Although one might argue that the singer, to produce such fine switches, must rely on proprioceptive feedback and mental imaging (switching to a new singing role), there is obviously heavy guidance from auditory self-hearing.

Neuroanatomic Locus of the Auditory-Governance System

Much of our knowledge of the auditory system has been gained from the study of the guinea pig, cat, dog, and primate [15]. Although much of the study in animals has been restricted to tonotopic organization and response, some of the studies of primates reported [16, 17] have shown greater relevance to the neuroanatomic locus of the auditory system in humans. Celesia’s description of the organization of the human auditory system [18] reviews the role of the medial geniculates in carrying tonotopic afferents to the primary auditory cortex, Brodmann areas 41 and 42 (Heschl’s gyrus). Surrounding this primary auditory cortex is area 22, known as Wernicke’s cortex, which apparently has great relevance to the understanding of the spoken word. The cytoarchitectonic organization of the human auditory cortex has been detailed to show [19] the thalamocortical connections with their absolute tonotopic cortical display. It was the definitive work of Minckler [20], however, that demonstrated a heretofore unidentified bundle of fibers radiating from the lateral pulvinar of the thalamus directly to Wernicke’s area of the temporal lobe, therefore bypassing primary auditory cortex. This human auditory bundle is posterior to that portion of area 22 from which radiates a bundle of fibers down to Broca’s convolution, known as the arcuate fasciculus. This Minckler-identified bundle of fibers is postulated to contain both afferent and efferent fibers between auditory association cortex and the pulvinar body of the thalamus. Can this bundle have some relevance to the neuroanatomical location of an auditory-governance system?

Since some kind of auditory-governance system appears to be a vital part of most human vocalization, there is obvious relevance of such a system to voice therapy. In voice therapy, we place heavy reliance on self-hearing and monitoring, external auditory modeling, and helping a patient hear a good voice from a bad one.

Influences on Respiration and Phonation

The mechanical role of the larynx is discussed [21] with particular emphasis given to its role of resistance in respiration. As a resistor, the larynx is placed in series with the lung and "contributes up to 25% of the total airway resistance." This resistance falls on inspiration and increases during expiration. This mechanism of laryngeal braking is prominent in neonatal life, becoming less so as the human matures. These data are of some clinical relevance to the speech pathologist’s participation in therapy for asthmatics and patients with paradoxical vocal fold function, where there may be greater laryngeal resistance to inspiratory airflow than to expiratory airflow.

Studies [22] of the neuronal loci for vocalization in the decerebrate cat have demonstrated that neurons in the lateral part of the intermediate periaqueductal gray (PAG) matter integrate respiratory-laryngeal-facial muscles responsible for vocalization. The results of the current study and previous studies [23] have shown that emotional vocalization seems to be monitored by sequenced neuronal templates within the PAG, resulting in altered breathing patterns observed as emotional vocalization. Of some clinical relevance are recent respiratory-vocalization studies [24] that suggest linguistic demands by the human speaker dictate changes in respiratory function to match ongoing linguistic needs while speaking. Once again, we see patterns of vocalization emerging as prosodic linguistic patterns, rather than as isolated vocalization or isolated phonemic movements.

Further study [25] of the PAG and emotional expression found that distinct coordinated patterns of skeletal, autonomic, and antinociceptive adjustments are mediated by longitudinal PAG neuronal columns, located lateral and ventrolateral to the aqueduct. The authors concluded that the PAG "lies at a crossroads for a multitude of neural circuits" and is required for animal survival, with emotional vocalizations used for coping with "stress, threat, and pain." It might be postulated that emotional situations trigger a series of respiratory and other movements as a reaction to stress that could interfere with such higher cortical directives as linguistic vocalization.

Studies [26] contrasting laryngeal muscle patterning during speech with nonspeech laryngeal gestures find profound differences between the two behaviors. Using a focal stimulator, laryngeal muscle movements during speech demonstrated a rapidly conducting neural pathway from the cerebral cortex to the periphery. Nonspeech laryngeal gestures, including respiration, sniffing, throat clearing, and voluntary cough, showed much more neuronal patterning at lower brain levels, lacking the direct neural flow from cortex to periphery seen during speech voicing. Once again, we must appreciate the existence of sequenced motor patterning of vocal responses. In normal speech, one might postulate that the equivalent premotor control system required for somatic movements is supplemented by an auditory-governance system that provides the silent modeling needed for on-target vocalization.

Through electrical stimulation of the midbrain of the anesthetized dog [27], howls, growls, and whines have been observed. At cortical levels, there appear to be discrete areas of dog motor cortex that, when stimulated, can produce changes in dog phonation. In developing a cortical map of dog phonation, Luschei hopes eventually to demonstrate some neural connection between dog cortex and midbrain structures. The wolf or hound dog, which seem to possess the most complex vocal system, may exhibit cortically directed vocalization. However, human volitional phonation, which requires obvious cortical initiation and can become phonologically elaborate, probably still requires sequential neural templates in the midbrain for its actual motoric execution.

Traditional voice analysis of vocal fold and laryngeal function has received a complementary assist from nonlinear dynamics. So much of vocal function can be said to be deterministic yet unpredictable, with the term ‘chaotic’ well characterizing these nonlinear data. The acoustics of our chaotic voicing system is shaped by complex, nonlinear phonation coupled with "multimode resonators both downstream and upstream from the glottis" [28]. Add to the equation human performance variability, and we can appreciate the dilemma of the clinician who attempts to make order and predictability out of chaotic performance.

Herzel [28] analyzed voice signals from a nonlinear dynamics point of view, concluding that rough voice may be caused by a number of physical instabilities. In attempting to measure vocal roughness, he found that widely used jitter and shimmer calculations measure only the amount of perturbation ("but not its correlations") and, therefore, are not sufficient to quantify roughness. It appears clinically that not only do bifurcations and chaos contribute to patient voice roughness, but the patient’s motivations, internal homeostasis, and fatigue will vary with the time of day, clinic noise levels, and the overall interactive effectiveness of the voice clinician. Our clinic-laboratory hardware is among the more stable components of the clinical scene.
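As a rough illustration of the kind of jitter and shimmer calculation referred to above, the sketch below computes a "local" perturbation percentage: the mean absolute difference between consecutive cycles, divided by the mean value. This particular definition is an assumption chosen for illustration (it is one commonly used form, not necessarily the one Herzel examined), and the function name `local_perturbation_percent` and the sample values are hypothetical.

```python
def local_perturbation_percent(values):
    """Mean absolute cycle-to-cycle difference as a percentage of the mean.

    Pass glottal cycle periods (in seconds) for a jitter-style figure,
    or cycle peak amplitudes for a shimmer-style figure.
    """
    if len(values) < 2:
        raise ValueError("need at least two cycles")
    # Absolute difference between each cycle and the one that follows it.
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    mean_value = sum(values) / len(values)
    return 100.0 * (sum(diffs) / len(diffs)) / mean_value


# Illustrative (made-up) cycle periods around 10 ms, i.e. roughly 100 Hz.
periods_s = [0.0100, 0.0102, 0.0098, 0.0100]
jitter_pct = local_perturbation_percent(periods_s)
```

Note that the result is a single number summarizing the size of the cycle-to-cycle variation; on its own it does not say whether the variation follows a regular alternating pattern or an irregular one, which is consistent with the limitation described in the paragraph above.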

Discussion and Summary

Since some kind of auditory-governing system appears to be a vital part of most human vocalization, there is obvious relevance of such a system to voice therapy. In voice therapy, we place heavy reliance on self-hearing and monitoring, external auditory modeling, and helping the patient hear a good voice from a bad one.

What we hear is what we say and sing. The use of transformed auditory and delayed auditory feedback tells us how immediately one can correct vocal production to compensate for distortions of auditory feedback. It appears that the human has a silent auditory system that provides the modeling required for specific vocalization. Some faulty voices may be the result of a faulty auditory-governance system. The use of masking noise seems to defeat the impact of the faulty auditory model, and under masking conditions the patient may show a much improved voice. We record the patient’s voice under masking conditions, and if this voice appears to be desirable phonation, we then use it as a model for the voice patient to copy.

The tape recorder enables us to use the patient’s voice as a model. By using various voice therapy facilitating approaches [29,30], the patient is often able to produce a target voice (the voice the clinician thinks would be good for the patient to use). When using a specific therapy approach, we record the patient’s vocal responses. Once the target voice is produced, we stop recording and play back for the patient his or her target voice. We use the patient’s "best" voice as an auditory model to which the patient listens and then matches. Intensive practice repeating the target model will often provide the patient the "feeling" of what the target voice should feel like, as well as practice in matching an internal-external auditory model.

My focus on using auditory modeling in therapy has led me to use amplification with patients wearing earphones in voice therapy. The slight amplification provided seems to help the patient focus on the auditory aspects of voicing. With slight ongoing amplification, as the patients speak, they often exhibit a stronger, clearer voice with less perturbation. The earphones are obviously useful for patients who are using either masking or an auditory model in therapy.

There appears to be an auditory-governance system that monitors and directs the vocalization of speech and song. Such an auditory system needs to be described, neuroanatomically located, and tested for its function and effects. Meanwhile, for patients with faulty vocalization, judicious use of masking noise will often help particular patients produce a better-sounding voice. Use of the auditory-governance system in voice therapy often produces desirable phonation, by employing auditory modeling (preferably of the patient’s own voice) and/or by using amplification of the patient’s vocalization attempts.

References

[1]. Hubbell R. "Language and linguistics" in Speech, Language, and Hearing. 2nd. ed., Eds. P. Skinner and R. Shelton, (Wiley, New York 1985).

[2]. Crystal D. "Linguistic mythology and the first year of life", British J Disorders of Communic. 8, 29-36 (1973).

[3]. Blount, B. "Emotional expression" in Language Development, Vol. 2: Language, Thought, and Culture, Ed. S. Kuczaj, (Erlbaum, Hillsdale, NJ 1982).

[4]. Boone, D.R. and Plante, E. Human Communication and its Disorders, 2nd ed, (Prentice Hall, Englewood Cliffs, NJ 1993).

[5]. Wyke, B. "Advances in the neurology of phonation: Phonatory reflex mechanisms in the larynx", British J. Communic. 2, 2-14 (1967).

[6]. Mesulam, M.M. Principles of Behavioral Neurology, (F.A. Davis, Philadelphia 1985).

[7]. Kawahara, H. "Interactions between speech production and perception under auditory feedback perturbations on fundamental frequencies", J. Acoust. Soc. Japan 15, 201-202 (1994).

[8]. Kawahara, H. "Transformed auditory feedback: Effects of fundamental frequency perturbation," ATR Tech. Rep. 12, 1-14 (1993).

[9]. Kawahara, H. and Williams, J.C. "Effects of auditory feedback on voice pitch trajectories: Characteristic responses to pitch perturbations", in Vocal Fold Physiology, Controlling Complexity and Chaos, Ch. 18, (Singular Publishing Group, San Diego, 1996).

[10]. Larson, C.R., White, J.P., Freedland, M.B., and Burnett, T.A. "Interactions between voluntary pitch modulations and pitch-shifted feedback signals: Implications for neural control of voice pitch", in Vocal Fold Physiology, Controlling Complexity and Chaos, Ch. 19, (Singular Publishing Group, San Diego, 1996).

[11]. Watson, P.J. and Hixon, T.J. "Respiratory behavior during the learning of a novel aria by a highly trained classical singer" in Vocal Fold Physiology, Controlling Complexity and Chaos, Ch. 22, (Singular Publishing Group, San Diego, 1996).

[12]. Sundberg, J., Prame, E., and Iwarsson, J. "Replicability and accuracy of pitch patterns in professional singers", in Vocal Fold Physiology, Controlling Complexity and Chaos, Ch. 20 (Singular Publishing Group, San Diego, 1996).

[13]. Rose, P. "Between- and within-speaker variation in the fundamental frequency of Cantonese citation tones", in Vocal Fold Physiology, Controlling Complexity and Chaos, Ch. 21 (Singular Publishing Group, San Diego, 1996).

[14]. Estill, J., Fujimura, O., Sawada, M., and Beechler, K. "Temporal perturbation and voice qualities", in Vocal Fold Physiology, Controlling Complexity and Chaos, Ch. 16, (Singular Publishing Group, San Diego, 1996).

[15]. Schreiner, C.E. and Cynader, M.S. "Basic fundamental organization of second auditory cortical field (AII) of the cat", J. Neurophysiol. 51, 1284-1305 (1984).

[16]. Kasdon, D.L. and Jacobson, S. "The thalamic efferents to the inferior parietal lobule of the rhesus monkey", J. Comp. Neurol. 177, 685-706 (1978).

[17]. Pfingst, B.E., Altschuler, R.A., Watkin, K.L., and Larson, C.R. "Neuroanatomic bases of hearing and speech" in Handbook of Speech-Language Pathology and Audiology, Eds. N.J. Lass, L.V. McReynolds, J.L. Northern, and D.E. Yoder, (Decker, Philadelphia, 1988), 77-127.

[18]. Celesia, G.G. "Organization of auditory cortical areas in man", Brain 99, 403-414 (1976).

[19]. Galaburda, A. and Sanides, E. "Cytoarchitectonic organization of the human auditory cortex", J. Comp. Neurol. 227, 511-539 (1984).

[20]. Minckler, J. "Functional organization and maintenance", Introduction to Neuroscience (C.V. Mosby, St. Louis, 1972).

[21]. Brancatisano, A. "Respiratory control of the larynx", in Vocal Fold Physiology, Controlling Complexity and Chaos, Ch. 8 (Singular Publishing Group, San Diego, 1996).

[22]. Davis, P., Zhang, S.P., and Bandler, R. "Midbrain and medullary regulation of vocalization", in Vocal Fold Physiology, Controlling Complexity and Chaos, Ch. 8 (Singular Publishing Group, San Diego, 1996).

[23]. Zhang, S.P., Davis, P.J., Bandler, R., and Carrive, P. "Brain stem integration of vocalization: Role of the midbrain periaqueductal gray", J. Neurophysiol. 72, 1337-1356 (1994).

[24]. Winkworth, A.L., Davis, P.J., Adams, R.D., and Ellis, E. "Breathing patterns during spontaneous speech", J. Speech and Hearing Res. 38, 124-144 (1995).

[25]. Bandler, R., Keay, K.A., Vaughan, C.W., and Shipley, M.T. "Columnar organization of PAG neurons regulating emotional and vocal expression", in Vocal Fold Physiology, Controlling Complexity and Chaos, Ch. 10 (Singular Publishing Group, San Diego, 1996).

[26]. Ludlow, C. and Lou, G. "Observations on human laryngeal muscle control", in Vocal Fold Physiology, Controlling Complexity and Chaos, Ch. 14, (Singular Publishing Group, San Diego, 1996).

[27]. Jaffe, D.M., Soloman, N.R., and Luschei, E.S. "Activation of laryngeal muscle by electrical stimulation of the canine motor cortex", in Vocal Fold Physiology, Controlling Complexity and Chaos, Ch. 13 (Singular Publishing Group, San Diego, 1996).

[28]. Herzel, H. "Possible mechanisms of vocal instabilities", in Vocal Fold Physiology, Controlling Complexity and Chaos, Ch. 5 (Singular Publishing Group, San Diego, 1996).

[29]. Boone, D.R. and McFarlane, S.C. The Voice and Voice Therapy, 5th ed., (Prentice Hall, Englewood Cliffs, NJ, 1994).

[30]. Colton, R.H. and Casper, J.K. Understanding Voice Problems, (Williams and Wilkins, Baltimore, 1990).

Copyright © 1996-2011 KayPENTAX, a Division of PENTAX Medical Company. All rights reserved.