Section 3 - Implications of Using Auditory Feedback for the SLP
The Ninth Vocal Fold Physiology Symposium, organized and
sponsored by the Voice Foundation, titled Vocal Fold Physiology: Controlling Complexity
and Chaos, was held in 1995 in Sydney, Australia. There were many papers presented
that considered the auditory influence on speech and voice production. These papers and
discussions were prepared as a book publication, edited by P. Davis and N. Fletcher,
Vocal Fold Physiology, Controlling Complexity and Chaos, San Diego: Singular Publishing
Group (1996). Excerpts (pages 348-357) dealing with auditory function
and auditory feedback from Chapter 23 by Daniel R. Boone, "Clinical Relevance of
Controlling Chaos and Complexity: Implications for the Speech Pathologist," are reproduced
here with permission of the publisher. The references at the end of this section represent
literature citations made for the entire Application manual.
It is hoped that some of the text will serve to underscore the need for a clinical
instrument like the Facilitator with its five auditory components: real-time
amplification, looping playback, delayed auditory feedback, speech-range masking, and
metronomic pacing.
Auditory Monitoring and Shaping
[excerpt beginning from middle of p. 349]
Professional users of voice, particularly actors and singers, would agree with the work
of Wyke [5], who postulated that vocalization involves prephonatory tuning and acoustic
automonitoring.
Before we actually vocalize speech or singing patterns, it appears that the organization
of the vocalization is auditorily governed. Voicing may well differ from the neural
organization required in somatic motor behavior. It is well established that somatic motor
behavior (such as throwing a ball) requires a premotor planning set which precedes
production. Such motor function requires premotor cortical planning, demonstrated to be
activated in the cortical premotor strip (Brodmann 6) anterior to the Brodmann 4 motor
cortex [6]. Further, there is a precise temporal relationship between such premotor
activity and the continuous, sequential movements required for a motor task (such as
throwing the ball). May there not be an auditory-governance system, analogous to premotor
cortex, that acts as an active prephonation modeling system directing the rapid sequence
of phonations observed in speaking and singing? This auditory-governance system also plays
a critical role in self-correction, enabling the vocal performer to adjust vocal
production of fundamental frequency, inflection, duration, and prosody to match (or differ
from) the internal prephonation model.
A number of papers at the Ninth Vocal Fold Physiology Symposium, Controlling Chaos and
Complexity, have given credence to the existence of an auditory-governance system that
provides direction for human voicing. For example, the work of Kawahara [7,8] using
transformed auditory feedback (TAF) and delayed auditory feedback (DAF) shows the
influence of deviations in auditory feedback on the ongoing phonation of experimental
subjects. Kawahara and Williams [9] have shown fundamental frequency (Fo) feedback
distortions can be corrected by subjects within a latency period of 100-250 ms. This
landmark research suggests that auditory perception plays a role "in a system that
automatically regulates voice fundamental frequency," perhaps pointing to a model of
Fo in which laryngeal output is monitored by a component of the auditory system. While the
TAF data support the role of auditory feedback in correcting adjustments of Fo, the DAF
data have long documented the effect of auditory feedback on the melody and
prosody of speech and song.
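As a rough illustration of what delayed auditory feedback means at the signal level, the sketch below treats DAF as a plain delay line: the speaker hears his or her own signal shifted later in time by a fixed number of milliseconds. This is only a minimal sketch under stated assumptions. The function name, the sample values, and the 1 kHz sampling rate are hypothetical, and nothing here is taken from the cited studies or from any clinical instrument.

```python
def delayed_feedback(samples, sample_rate_hz, delay_ms):
    """Return the signal heard under a fixed feedback delay.

    The output is the input shifted later by delay_ms, padded with
    silence at the start and truncated to the input length.
    """
    if delay_ms < 0:
        raise ValueError("delay_ms must be non-negative")
    # Convert the delay from milliseconds to a whole number of samples.
    delay_samples = round(sample_rate_hz * delay_ms / 1000.0)
    delayed = [0.0] * delay_samples + list(samples)
    return delayed[:len(samples)]


# Hypothetical example: a short ramp sampled at 1 kHz, delayed by 3 ms.
signal = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
heard = delayed_feedback(signal, sample_rate_hz=1000, delay_ms=3)
```

With these assumed values the first three samples heard are silence and the ramp follows three samples late. Transformed auditory feedback differs in kind: it alters the frequency content of what is heard rather than its timing, and is not modeled here.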
The TAF method, altering frequency feedback to a subject while phonating, was also used
by others [10] who studied the reaction time of subjects attempting to maintain
"vocalization with a steady pitch". Immediately (a latency of 50 or 100 ms)
following a stimulus cue, the TAF occurred. Under this condition, pitch reactions to the
tone cue occurred within 120 to 180 ms. The authors postulated that there may be
shared pathways of CNS systems involved in the control of voice Fo. Of some
clinical relevance, however, is the observation that the vocal response (pitch) following
a slight alteration of Fo feedback can be immediately (120-180 ms) corrected. The auditory
perceptual monitoring system obviously plays a vital role in vocal production.
Other chapters from the conference look closely at the replicability of various voice
productions, often using auditory modeling that would require some kind of ongoing
auditory-governance. Each of the studies required some kind of human subject phonation
response. Watson and Hixon [11], while focusing on respiratory function during singing,
studied a trained singer in the process of learning an original aria: at first, singing it
cold, followed by performing it as a memorized piece. The singer was urged to
sing through the new piece several times until it could be performed from auditory memory,
achieving accurate goals with an economy of movement. While there would be
some focus by the singer on the proprioceptive feedback related to breathing patterns, the
primary reliance was on auditory memory. There was evidence that the singer had learned an
auditory patterning: a combination of phonation prolongation, Fo shift, prosodic
variations, duration shifts, and any other auditory component that is part of song. The
study from one perspective was a study to see how quickly a trained singer could develop a
prephonatory set. Once such a set was available to the singer, the aria could be
performed.
Others [12] looked at replicability and accuracy of pitch patterns in professional
singers, requiring three experienced singers to sing the same passages three times. Not
only was Fo studied, but the consistency of vibrato was determined for each subject's
three repetitions of the passage. What (and where) are the neural controls for such fine
gradations of pitch? Does the internal auditory prephonation set in the trained singer
have a sensitivity for 3-10 Hz deviations between one's prephonation target-set and
actual pitch performance?
Another argument for an auditory-governance system for phonation can be seen in the
expression of tonal languages, where intonations have specific coded-language meanings.
Rose [13] writes in comparing two different groups of Chinese dialects (Yue and Wu) that
one intonation difference can change specific syntactically or phonologically
defined environments. The complicated Yue (Hong Kong Cantonese) dialect consists
often of six contrasting tones for a single word, each tone representing a different
meaning. One can only postulate the fine discriminations in frequency and stress that the
speaker must make to inflect the different coded meanings in a tonal language. The
auditory-governance system must be finely tuned to permit such fine vocal adjustments.
Although a study by Estill and others [14] focused on temporal perturbation
in varying modes of the singing voice, the remarkable differences in vocal product lend
support to the probable existence of an auditory-governance system. Singers were taught
six different vocal qualities: speech, opera, twang, belting, falsetto, and sob. The
authors looked at continuous temporal alteration between two fundamental frequencies and
discontinuous and seemingly random switching from one mode of vibration to
another within an utterance. Although one might argue that the singer, to produce
such fine switches, must rely on proprioceptive feedback and mental imaging (switching to
a new singing role), there is obviously heavy guidance from auditory self-hearing.
Neuroanatomic Locus of the Auditory-Governance System
Much of our knowledge of the auditory system has been gained from the study of the
guinea pig, cat, dog, and primate [15]. Although much of the study in animals has been
restricted to tonotopic organization and response, some of the studies of primates
reported [16, 17] have shown greater relevance to the neuroanatomic locus of the auditory
system in humans. The organization of the human auditory system as described by Celesia
[18] reviews the role of the medial geniculates in carrying tonotopic afferents to the
primary auditory cortex of Brodmann 41 and 42 (Heschl's gyrus). Surrounding this
primary auditory cortex is area 22, known as Wernicke's cortex, which apparently has
great relevance to the understanding of the spoken word. The cytoarchitectonic
organization of the human auditory cortex has been detailed [19] to show the
thalamocortical connections with their absolute tonotopic cortical display. It was the
definitive work of Minckler [20], however, that demonstrated a heretofore unidentified
bundle of fibers radiating from the lateral pulvinar of the thalamus directly to
Wernicke's area of the temporal lobe, therefore bypassing primary auditory cortex.
This human auditory bundle is posterior to that portion of area 22 from which radiates a
bundle of fibers down to Broca's convolution, known as the arcuate fasciculus. This
Minckler-identified bundle of fibers is postulated to contain both afferent and efferent
fibers between auditory association cortex and the pulvinar body of the thalamus. Can this
bundle have some relevance to the neuroanatomical location of an auditory-governance
system?
Since some kind of auditory-governance system appears to be a vital part of most human
vocalization, there is obvious relevance of such a system to voice therapy. In voice
therapy, we place heavy reliance on self-hearing and monitoring, external auditory
modeling, and helping a patient hear a good voice from a bad one.
Influences on Respiration and Phonation
The mechanical role of the larynx is discussed [21] with particular emphasis given to
its role of resistance in respiration. As a resistor, the larynx is placed in series with
the lung and "contributes up to 25% of the total airway resistance." This
resistance falls on inspiration and increases during expiration. This mechanism of
laryngeal braking is prominent in neonatal life, becoming less so as the human matures. Of
some clinical relevance is the bearing of these data on speech pathologist
participation in therapy for asthmatics and patients with paradoxical vocal fold function,
where there may be greater laryngeal resistance to inspiratory than to expiratory
airflow.
Studies [22] of the neuronal loci for vocalization in the decerebrate cat have
demonstrated that neurons in the lateral part of the intermediate periaqueductal gray
(PAG) matter integrate respiratory-laryngeal-facial muscles responsible for vocalization.
The results of the current study and previous studies [23] have shown that emotional
vocalization seems to be monitored by sequenced neuronal templates within the PAG,
resulting in altered breathing patterns observed as emotional vocalization. Of some
clinical relevance are recent respiratory-vocalization studies [24] that suggest
linguistic demands by the human speaker dictate changes in respiratory function to match
ongoing linguistic needs while speaking. Once again, we see patterns of vocalization
emerging as prosodic linguistic patterns, rather than as isolated vocalization or isolated
phonemic movements.
Further study [25] of the PAG and emotional expression found that distinct coordinated
patterns of skeletal, autonomic, and antinociceptive adjustments are mediated by
longitudinal PAG neuronal columns, located lateral and ventrolateral to the aqueduct. They
concluded that the PAG "lies at a crossroads for a multitude of neural circuits"
and is required for animal survival with emotional vocalizations used for coping with
"stress, threat, and pain." It might be postulated that emotional situations
trigger a series of respiratory and other movements as a reaction to stress, that could
interfere with such higher cortical directives as linguistic vocalization.
Studies [26] contrasting laryngeal muscle patterning during speech with nonspeech
laryngeal gestures find profound differences between the two behaviors. Using a focal
stimulator, laryngeal muscle movements during speech demonstrated a rapidly conducting
neural pathway from the cerebral cortex to the periphery. Nonspeech laryngeal gestures,
including respiration, sniffing, throat clearing, and voluntary cough, showed much more
neuronal patterning at lower brain levels, lacking the direct neural flow from cortex to
periphery seen during speech voicing. Once again, we must appreciate the existence of
sequenced motor patterning of vocal responses. In normal speech, one might postulate that
the equivalent premotor control system required for somatic movements is supplemented by
an auditory-governance system that provides the silent modeling needed for on-target
vocalization.
Through electrical stimulation of the midbrain of the anesthetized dog [27], howls,
growls, and whines have been observed. At cortical levels, there appear to be discrete
areas of dog motor cortex that when stimulated can produce changes in dog phonation. In
developing a cortical map of dog phonation, Luschei hopes eventually to demonstrate some
neural connection between dog cortex and midbrain structures. The wolf or hound dog, which
seem to possess the most complex vocal system, may exhibit cortically directed
vocalization. However, human volitional phonation requiring obvious cortical initiation
that has the capability of becoming phonologically elaborate probably still requires
sequential neural templates in the midbrain for its actual motoric execution.
Traditional voice analysis of vocal fold and laryngeal function has received a
complementary assist from nonlinear dynamics. So much of vocal function can be said to be
deterministic and unpredictable, with the term "chaotic" well characterizing
these nonlinear data. The acoustics of our chaotic voicing system is shaped by complex,
nonlinear phonation coupled with "multimode resonators both downstream and upstream
from the glottis" [28]. Add to the equation human performance variability, and we can
appreciate the dilemma of the clinician who attempts to make order and predictability out
of chaotic performance.
Herzel [28] analyzed voice signals from a nonlinear dynamics point of view, concluding
that rough voice may be caused by a number of physical instabilities. In attempting to
measure vocal roughness, he found that widely used jitter and shimmer calculations measure
only the amount of perturbation ("but not its correlations") and, therefore, are
not sufficient to quantify roughness. It appears clinically that not only do bifurcations
and chaos contribute to patient voice roughness, but the patient's motivations,
internal homeostasis, and fatigue will vary according to the time of day, clinic noise
levels, and with the overall interactive effectiveness of the voice clinician. Our
clinic-laboratory hardware is among the more stable components of the clinical scene.
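The remark that jitter and shimmer capture only the amount of perturbation, and not its correlations, can be illustrated with a small calculation. The sketch below uses one common definition (mean absolute difference between consecutive cycles, divided by the mean cycle value); applied to cycle periods it gives a jitter-like figure, and applied to cycle peak amplitudes it gives a shimmer-like figure. The function name and both period sequences are hypothetical and are not measured data or the method of the cited chapter.

```python
def relative_perturbation(values):
    """Mean absolute cycle-to-cycle difference divided by the mean value."""
    if len(values) < 2:
        raise ValueError("need at least two cycles")
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    mean_diff = sum(diffs) / len(diffs)
    mean_value = sum(values) / len(values)
    return mean_diff / mean_value


# Hypothetical cycle periods in microseconds (illustrative only).
alternating = [10000, 10200] * 4               # strict long-short alternation
wandering = [9900, 10100, 10300, 10100] * 2    # slower drift up and down

jitter_alternating = relative_perturbation(alternating)
jitter_wandering = relative_perturbation(wandering)
```

Both sequences have the same mean period and the same 200-microsecond step between neighboring cycles, so the measure returns the same value for each, even though the cycle-to-cycle ordering (a strict alternation versus a slower drift) is quite different. That ordering is exactly the kind of correlation the measure does not see.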
Discussion and Summary
Since some kind of auditory-governing system appears to be a vital part of most human
vocalization, there is obvious relevance of such a system to voice therapy. In voice
therapy, we place heavy reliance on self-hearing and monitoring, external auditory
modeling, and helping the patient hear a good voice from a bad one.
What we hear is what we say and sing. The use of transformed auditory and delayed
auditory feedback tells us how immediately one can correct vocal production to compensate
for distortions of auditory feedback. It appears that the human has a silent auditory
system that provides the modeling required for specific vocalization. Some faulty voices
may be the result of a faulty auditory-governance system. The use of masking noise seems
to defeat the impact of the faulty auditory model, and under masking conditions the
patient may show a much improved voice. We record the patient's voice under masking
conditions, and if this voice appears to be desirable phonation, we then use it as a model
for the voice patient to copy.
The tape recorder enables us to use the patient's voice as a model. By using
various voice therapy facilitating approaches [29,30], the patient is often able to
produce a target voice (the voice the clinician thinks would be good for the patient to
use). When using a specific therapy approach, we record the patient's vocal
responses. Once the target voice is produced, we stop recording and play back for the
patient his or her target voice. We use the patient's "best" voice as an
auditory model to which the patient listens and then matches. Intensive practice repeating
the target model will often provide the patient the "feeling" of what the target
voice should feel like, as well as practice in matching an internal-external auditory
model.
My focus on using auditory modeling in therapy has led me to use amplification with
patients wearing earphones in voice therapy. The slight amplification provided seems to
help the patient focus on the auditory aspects of voicing. With slight ongoing
amplification, as the patients speak, they often exhibit a stronger, clearer voice with
less perturbation. The earphones are obviously useful for patients who are using either
masking or an auditory model in therapy.
There appears to be an auditory-governance system that monitors and directs the
vocalization of speech and song. Such an auditory system needs to be described,
neuroanatomically located, and tested for its function and effects. Meanwhile, for
patients with faulty vocalization, judicious use of masking noise will often help
particular patients produce a better-sounding voice. Use of the auditory-governance system
in voice therapy often produces desirable phonation, by employing auditory modeling
(preferably of the patient's own voice) and/or by using amplification of the
patient's vocalization attempts.
References
[1]. Hubbell, R. "Language and linguistics" in Speech,
Language, and Hearing, 2nd ed., Eds. P. Skinner and R. Shelton, (Wiley, New York,
1985).
[2]. Crystal, D. "Linguistic mythology and the first year of life", British
J. Disorders of Communic. 8, 29-36 (1973).
[3]. Blount, B. "Emotional expression" in Language Development, Vol. 2:
Language, Thought, and Culture, Ed. S. Kuczaj, (Erlbaum, Hillsdale, NJ, 1982).
[4]. Boone, D.R. and Plante, E. Human Communication and its Disorders, 2nd ed.,
(Prentice Hall, Englewood Cliffs, NJ, 1993).
[5]. Wyke, B. "Advances in the neurology of phonation: Phonatory reflex
mechanisms in the larynx", British J. Communic. 2, 2-14 (1967).
[6]. Mesulam, M.M. Principles of Behavioral Neurology, (F.A. Davis, Philadelphia,
1985).
[7]. Kawahara, H. "Interactions between speech production and perception under
auditory feedback perturbations on fundamental frequencies", J. Acoust. Soc. Japan 15,
201-202 (1994).
[8]. Kawahara, H. "Transformed auditory feedback: Effects of fundamental frequency
perturbation," ATR Tech. Rep. 12, 1-14 (1993).
[9]. Kawahara, H. and Williams, J.C. "Effects of auditory feedback on voice pitch
trajectories: Characteristic responses to pitch perturbations", in Vocal Fold
Physiology, Controlling Complexity and Chaos, Ch. 18, (Singular Publishing Group, San
Diego, 1996).
[10]. Larson, C.R., White, J.P., Freedland, M.B., and Burnett, T.A. "Interactions
between voluntary pitch modulations and pitch-shifted feedback signals: Implications for
neural control of voice pitch", in Vocal Fold Physiology, Controlling Complexity
and Chaos, Ch. 19, (Singular Publishing Group, San Diego, 1996).
[11]. Watson, P.J. and Hixon, T.J. "Respiratory behavior during the learning of a
novel aria by a highly trained classical singer", in Vocal Fold Physiology,
Controlling Complexity and Chaos, Ch. 22, (Singular Publishing Group, San Diego,
1996).
[12]. Sundberg, J., Prame, E., and Iwarsson, J. "Replicability and accuracy of
pitch patterns in professional singers", in Vocal Fold Physiology, Controlling
Complexity and Chaos, Ch. 20, (Singular Publishing Group, San Diego, 1996).
[13]. Rose, P. "Between- and within-speaker variation in the fundamental frequency
of Cantonese citation tones", in Vocal Fold Physiology, Controlling Complexity and
Chaos, Ch. 21, (Singular Publishing Group, San Diego, 1996).
[14]. Estill, J., Fujimura, O., Sawada, M., and Beechler, K. "Temporal
perturbation and voice qualities", in Vocal Fold Physiology, Controlling
Complexity and Chaos, Ch. 16, (Singular Publishing Group, San Diego, 1996).
[15]. Schreiner, C.E. and Cynader, M.S. "Basic fundamental organization of second
auditory cortical field (AII) of the cat", J. Neurophysiol. 51, 1284-1305
(1984).
[16]. Kasdon, D.L. and Jacobson, S. "The thalamic efferents to the inferior
parietal lobule of the rhesus monkey", J. Comp. Neurol. 177, 685-706
(1978).
[17]. Pfingst, B.E., Altschuler, R.A., Watkin, K.L., and Larson, C.R.
"Neuroanatomic bases of hearing and speech" in Handbook of Speech-Language
Pathology and Audiology, Eds. N.J. Lass, L.V. McReynolds, J.L. Northern, and D.E.
Yoder, (Decker, Philadelphia, 1988), 77-127.
[18]. Celesia, G.G. "Organization of auditory cortical areas in man", Brain
99, 403-414 (1976).
[19]. Galaburda, A. and Sanides, E. "Cytoarchitectonic organization of the
human auditory cortex", J. Comp. Neurol. 227, 511-539 (1984).
[20]. Minckler, J. "Functional organization and maintenance", Introduction
to Neuroscience (C.V. Mosby, St. Louis, 1972).
[21]. Brancatisano, A. "Respiratory control of the larynx", in Vocal Fold
Physiology, Controlling Complexity and Chaos, Ch. 8 (Singular Publishing Group, San
Diego, 1996).
[22]. Davis, P., Zhang, S.P., and Bandler, R. "Midbrain and medullary regulation
of vocalization", in Vocal Fold Physiology, Controlling Complexity and Chaos,
Ch. 8 (Singular Publishing Group, San Diego, 1996).
[23]. Zhang, S.P., Davis, P.J., Bandler, R., and Carrive, P. "Brain stem
integration of vocalization: Role of the midbrain periaqueductal gray", J.
Neurophysiol. 72, 1337-1356 (1994).
[24]. Winkworth, A.L., Davis, P.J., Adams, R.D., and Ellis, E. "Breathing patterns
during spontaneous speech", J. Speech and Hearing Res. 38, 124-144 (1995).
[25]. Bandler, R., Keay, K.A., Vaughan, C.W., and Shipley, M.T. "Columnar
organization of PAG neurons regulating emotional and vocal expression", in Vocal
Fold Physiology, Controlling Complexity and Chaos, Ch. 10, (Singular Publishing Group,
San Diego, 1996).
[26]. Ludlow, C. and Lou, G. "Observations on human laryngeal muscle
control", in Vocal Fold Physiology, Controlling Complexity and Chaos, Ch. 14,
(Singular Publishing Group, San Diego, 1996).
[27]. Jaffe, D.M., Soloman, N.R., and Luschei, E.S. "Activation of laryngeal
muscle by electrical stimulation of the canine motor cortex", in Vocal Fold
Physiology, Controlling Complexity and Chaos, Ch. 13, (Singular Publishing Group, San
Diego, 1996).
[28]. Herzel, H. "Possible mechanisms of vocal instabilities", in Vocal
Fold Physiology, Controlling Complexity and Chaos, Ch. 5, (Singular Publishing Group,
San Diego, 1996).
[29]. Boone, D.R. and McFarlane, S.C. The Voice and Voice Therapy, 5th ed.,
(Prentice Hall, Englewood Cliffs, NJ, 1994).
[30]. Colton, R.H. and Casper, J.K. Understanding Voice Problems, (Williams and
Wilkins, Baltimore, 1990).