Chapter 3: COMPUTER MANIPULATIONS OF A DIGITIZED AUDIO PERFORMANCE OF A POETRY READING
Aspects of text reading and music performance
Electro-acoustic examples of speech manipulation
Altered voice: I am sitting in a room
Enhanced voice: Smalltalk and Late August
Composition of the piece Someone
The computer and composition processes used in Someone
The algorithms used for Someone and their functions
Speed of the different glissandi used in Part One
Construction of the Nine Parts
This chapter looks at the process of deriving a musical composition from a
poetry reading. To this end a recording of a reading of the poem Saint
Dymphna's Bells by its author, Barry Dickins, is manipulated using algorithms
built with the IRCAM Signal Processing Workstation (ISPW) to produce the
musical work Someone (compact discs Three, Four, Five, Six and Seven). The
whole poem and a ten second segment from the beginning of the poem are
repeated and adjusted in several ways to produce an installation of
indeterminate duration.
The installation is made up of nine separate parts of different durations,
each using similar techniques in a different way; the parts and the techniques
used to produce them are discussed below in the section titled The computer
and composition processes used in Someone.
In the paper Music and speech performance: Parallels and contrasts, Rolf
Carlson and others put forward that:
in speech many different
prosodic factors are mixed together as one single acoustic parameter. [For
example] segmental inherent pitch, word tone, sentence type, lexical stress,
emphasis etc. can all be signalled in one single parameter, such as the voice
fundamental frequency. In the same way, the duration of speech sounds is
affected by a variety of conditions including stress, position in the
utterance, and local phonetic context.
The
same applies to music. There are many different reasons to lengthen or shorten
a note beyond its nominal duration [For example] emphasis, marking of phrase
endings, and sharpening the contrast between categories[1].
Lengthening or shortening note durations is just one of the many tools
available to musicians when interpreting music. Other tools include varying the
pitch, amplitude and/or timbre of a note or phrase. These tools allow each
musician to imbue a composition with their own expressiveness.
The degree to which a musician can interpret a composition depends on the
idiom in which he or she is playing. For example, jazz musicians are expected
to interpret and extemporise on a pre-existing melody while remaining within
the jazz idiom. The degree of variation permitted and the palette of
variations available are quite broad, depending on the subset of the jazz
idiom in which they are playing.
On the other hand,
non-improvising musicians, such as classical orchestral musicians and classical
music soloists, are expected to interpret with a smaller palette. Here the art
of interpretation is far more critical. A musician whose role is as part of an
orchestra is expected to subject their own interpretation of a composition to
that of the conductor, who in turn is subject to the composer's act of
self-expression, that is, the composition.
Table 3.1 lists some of the
variables that are available to musicians and speakers in adding a degree of
self-expression when performing a text or composition. Under each heading three
possible variables are listed; there are, of course, other options available to
the performing speaker or musician.
| Possible Speech Variables | Possible Musical Variables |
| 3.1.1 Amplitude variation: a) varying the emphasis placed on certain syllables within a word or phrase; b) wholly increasing or decreasing the amplitude of a word or phrase; c) varying the amplitude within a syllable. | Amplitude variation: a) varying the emphasis placed on certain notes within a section or phrase; b) wholly increasing or decreasing the amplitude of a note or phrase; c) varying the amplitude within a note. |
| 3.1.2 Pitch variation: a) varying the pitch of certain syllables within a word or phrase; b) wholly raising or lowering the pitch of a word or phrase; c) varying the pitch within a syllable. | Pitch variation: a) varying the pitch of certain notes within a section or phrase; b) wholly raising or lowering the pitch of a section or phrase; c) varying the pitch within a note. |
| 3.1.3 Rhythmic variation: a) varying the inter-onset time between certain syllables within a word or phrase; b) wholly increasing or decreasing the duration of a word or phrase; c) varying the inter-onset times or durations of syllables. | Rhythmic variation: a) varying the inter-onset time between certain notes within a section or phrase; b) wholly increasing or decreasing the duration of a note, section or phrase; c) varying the inter-onset times or durations of notes. |
| 3.1.4 Timbre variation: a) varying the timbre of certain syllables within a word, or words within a phrase; b) wholly changing the timbre of a word or phrase; c) varying the timbre of a syllable. | Timbre variation: a) varying the timbre of certain notes within a section or phrase; b) wholly changing the timbre of a note or phrase; c) varying the timbre of a note. |
| 3.1.5 Lexical variation: varying certain words or certain syllables within a word or phrase. | Melodic variation: varying the role of certain notes within a section or phrase. |
For the most part these
variations are generated intuitively by the performer. The performer's
intuitions are rooted in cultural knowledge of the effect of intonation on the
listener. This is well observed when listening to Eberhard Blum's performance
of John Cage's Sixty-two mesostics re Merce Cunningham.[2]
Here the text reading is manipulated in all of the ways given in Table 3.1 and
more. Blum uses the text as a vehicle for a wide variety of vocal expressions.
Phones, syllables and larger groups of vocal sounds are stretched, bent,
constricted and, in general, distorted from their normal use in the English
language so as to become a hybrid language existing somewhere between music and
English.
The manipulation of vocal sounds
via electronic media has been happening since recording technology became
available. Early practitioners, such as Henri Chopin, used the analogue audio
tape domain with striking results. As digital electronics became available the
palette became broader and composers were able to enhance and alter the vocal,
speech or textual input in a wider variety of ways. This can be seen in the
more contemporary works of Paul Lansky and Roger Reynolds, among others.
The use of speech in electro-acoustic composition can be divided into three
categories: direct voice, altered voice and enhanced voice. I have chosen works
by three composers, each of which exemplifies one of these categories. The
discussion of the pieces below serves to give a background to my approach to
using voice and is not intended as a proper or definitive analysis of the
pieces. The pieces discussed here are: Come Out[3], by Steve Reich; I am
sitting in a room[4], by Alvin Lucier; and Smalltalk and Late August[5], by
Paul Lansky.
In each of these pieces the
composer creates an environment, through electro-acoustic media, in which the
text mutates without too much guidance from the composer, with the possible
exception of Smalltalk and Late August. By reducing the input of
the composer during the composition of the piece the changes that occur are
created by either the text or the environment used in the recording. This means
that what the listener hears is not so much driven by the taste of the composer
as by the text itself.
These works are precedents to
the pieces presented here and exemplify the processes of electro-acoustic
composition used for my dissertation. For each piece I list important changes
as they occur in the form of a timeline.
Steve Reich's Come Out
uses an analogue tape recording of a man saying the sentence "I had to let
the bruise blood come out to show them". His technique is to overlay
repetitions of the words "come out to show them" in such a way that
the layers move in and out of phase with themselves. This results in shifting
rhythmic patterns which draw the listener's attention away from the lexical
meaning of the words and towards the interplay of sonic patterns found within
the words. Reich describes the piece thus:
The phrase 'come out to show them' was recorded on both
channels, first in unison then channel 2 slowly beginning to move ahead. As the
phase begins to shift, a gradually increasing reverberation is heard which
slowly passes into a sort of canon or round. Eventually the two voices divide
into four and then eight[6].
The main structural and driving element of the piece is the rhythmic
counterpoint between the voices. As this counterpoint progresses, through the
perceived adding of more iterations, the text is increasingly obscured until it
becomes unintelligible as text. Figure 3.1 is a rough transcription of the
rhythm and melodic contour of the main motif, "Come out to show them".

This rhythmic motif lends itself readily to use as a hocket, which could be the
main reason for Reich's use of it.
As the piece progresses three distinct sections become apparent, as Reich
describes. In the first section we hear the voice gradually gain spatial depth
through perceived, not actual, reverberation, then lose its textual
characteristics for musical characteristics as repetitions of the voice
increase, or "divide". This "division" is the most important compositional
process used in Come Out. The entire acoustic signal used for Come Out is made
up of simply one, two, four or eight iterations of the phrase "Come out to show
them". These iterations operate as a very finely displaced hocket, creating
illusions of traditional signal processing devices even though signal
processing plays no part in the composition.
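The phasing process can be sketched numerically. The following is a minimal illustration and not a reconstruction of Reich's tape set-up: it assumes two identical loops, one of which runs very slightly faster than the other, and reports how far the faster loop has moved ahead after each repetition of the slower one. The function name, the loop length and the speed ratio are hypothetical, chosen only for illustration.

```python
def phase_lead(loop_seconds, speed_ratio, repetitions):
    """Return the lead, in seconds, of the faster loop over the slower loop
    after each repetition of the slower loop, wrapped to the loop length.

    loop_seconds and speed_ratio are illustrative assumptions, not values
    measured from Come Out.
    """
    leads = []
    for n in range(1, repetitions + 1):
        elapsed = n * loop_seconds             # time taken by the slower loop
        lead = elapsed * (speed_ratio - 1.0)   # extra distance covered by the faster loop
        leads.append(lead % loop_seconds)      # a lead of one whole loop is unison again
    return leads


if __name__ == "__main__":
    # A hypothetical 1.5 second loop with one copy running 1% fast.
    for n, lead in enumerate(phase_lead(1.5, 1.01, 5), start=1):
        print(f"repetition {n}: channel 2 leads by {lead * 1000:.1f} ms")
```

While the lead is only a few milliseconds the two copies fuse and the ear hears something like flanging or reverberation; as the lead grows the copies separate into an echo and then into a canon, which matches the order of events Reich describes.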
Apart from choosing the sound
source, setting up the tape machines and then switching them on, Reich's only
other compositional input to Come Out was to decide when the piece
should "gradually [pass] into a canon or round for two voices, then four
voices, and finally eight"[7].
When listening to one side of
the stereo recording the doubling of the voice from one to two to four and
finally eight is striking, but is obscured when listening in stereo. Table 3.2
shows an approximate timeline of the perceived changes as Come Out
progresses; these changes are what the ear is drawn to over the duration of the
piece.
| Time | Perceived changes |
| | Single voice |
| 0" | Complete phrase "I had to let the bruise blood come out to show them" repeated 3 times; the phasing effect is not used. |
| 21" | Two voices are now heard. |
| 21" | "Come out to show them" phrase begins and phasing effect begins. |
| | Slight shifts in stereo placement of the voice. |
| | Flanging slowly transferring into a delay. |
| | Depth is added through the phasing technique simulating 'reverberation'. |
| 1' 45" | Two voices appear, but their role as distinct voices is not apparent. |
| 1' 50" | "sh" sound becomes prominent. |
| 2' 0" | The two voices become distinct as two voices. |
| 2' 19" | The tempo and beat division of the phrase becomes ambiguous, seeming to slightly slow down and speed up regularly. |
| 2' 59" | Four voices are now heard. |
| 3' 0" | Placement of voice moves in stereo field. |
| 3' 20" | "Come out" and "show them" become two distinct phrases. |
| 3' 55" | The whole piece begins to be heard in a more 'reverberant' space. |
| 4' 30" | Text becomes hard to distinguish, gradually losing its meaning. |
| 4' 17" | 'Reverberation' becomes an important part of the overall composition. |
| 5' 6" | The text becomes less intelligible and more like a musical sound source. |
| 6' 0" | Two similar rhythmic motifs appear: come-a-come out and show-de-show them, forming the most obvious hocket. |
| 6' 57" | The two motifs move in stereo space. |
| 7' 10" | "sh" sound becomes prominent again. |
| 7' 20" | The rhythmic motifs fracture. |
| 8' 0" | Each phone has a rhythmic pattern of its own; the piece uses repetition to build intensity and its hocket nature becomes less of a driving force. |
| 8' 37" | Eight voices are now heard. |
| 8' 40" | Each phone, particularly the voiced and vowel phones of the text, glissandos downward. |
| 9' 10" | The downward glissandi begin to sound more scale-like. |
| 11' 0" | The piece sounds more like a pulsing timbre than a succession of musical events; the effect of the phasing has reached its peak and a very gradual fade begins. |
| 12' 54" | Piece ends. |
This descriptive, foreground analysis of Come Out shows that Reich's process
has features associated with more usual musical composition processes, namely
the use of motivic and phrase repetition and variation.
While there are the obvious repetitions of the text, other aspects of the sonic
palette are also repeated. 'Reverberation', or proximity to the listener, is
the foreground feature at 3' 55" and 4' 17"; the perception of different
numbers of voices at 2' 0", 3' 20" and 6' 0"; motion in stereo space at 21" and
6' 57"; and the "sh" sound at 1' 50" and 7' 10".
According to Richard Boulanger
"an important aspect of Reich's Come Out is that the natural
declamation of the text is preserved yet the speech undergoes a unique and
significant transformation"[8].
By taking this approach Reich has maintained the integrity of the text and its
reading and in doing so produced music which has evolved out of speech.
Reich's use of the repetition of a short phrase in Come Out influenced the
composition of the installation Someone, which is presented here. In Come Out
shifting repetitions of a phrase eventually obscure the textual meaning of the
phrase, and mutate the phrase from language into music. This method of
transformation by repetition of a single phrase is used and extended in
Someone. In Someone there are many more iterations of the phrase and the phrase
is repeated in many different ways. Its speed, its pitch, the number of phrases
being repeated at the same time and its stereo placement all vary greatly when
compared to Come Out, where there are only two iterations of the phrase being
repeated, no pitch or speed variations, and each phrase is panned hard left or
hard right.
Throughout Someone the text is heard in part and, in some sections, in full.
The repetitions are heard in and out of phase with one another, just as the
repetitions of the text are heard in and then out of phase in Come Out. This
process serves
to spread the text over the listening area both spatially and temporally:
segments are heard in one area and then repeated in another, or a series of
segments are heard concurrently, depending on the listener's position. Segments
may follow in the order of the text or be reordered so as to lose the intended
flow of the text. Segments may also play in close temporal proximity to each
other, creating an effect similar to that of Come Out depending
on the placement of the listener.
I am sitting in a room is built on an analogue tape recording of Alvin Lucier
describing what he is doing and why. The process Lucier used was:
to record his voice onto one tape player, play it back on
another tape player through a loudspeaker and record that rendition onto the
first tape player[9].
This
process was carried out nine times for the performance of I am sitting in a
room, as it is presented in Source: music of the avant garde. The
recording was done in Lucier's living room.
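One way to see why repeated re-recording brings out the resonances of the room is to model the room, very roughly, as a fixed gain applied to each frequency band on every pass. This model is an assumption made purely for illustration (a real room, loudspeaker and tape machine are not this simple), and the band labels and gain values below are invented. Under this assumption the gain of a band after n passes is its single-pass gain raised to the power n, so small differences between bands grow with each rendition.

```python
def relative_levels(band_gains, passes):
    """Return each band's level after `passes` re-recordings, normalised so
    that the strongest band is 1.0.

    band_gains maps a band label to a hypothetical single-pass gain.
    """
    raised = {band: gain ** passes for band, gain in band_gains.items()}
    peak = max(raised.values())
    return {band: level / peak for band, level in raised.items()}


if __name__ == "__main__":
    # Invented single-pass gains: the room favours one band only slightly.
    room = {"low": 0.90, "resonant": 1.00, "high": 0.80}
    for passes in (1, 5, 9):
        levels = relative_levels(room, passes)
        print(passes, {band: round(level, 3) for band, level in levels.items()})
```

Even with gains this close together, the favoured band stands well clear of the others after nine passes, the number of renditions in the recording discussed here.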
The purpose of Lucier's
composition is described in the composition itself. Lucier uses this
description as the underlying and driving element of the composition. It is
what he says into the first tape recorder, creating the seed of the
composition, and is given below.
I am sitting in a room different from the one you are in
now. I am recording the sound of my speaking voice and I am going to play it
back into the room again and again until the resonant frequencies of the room
reinforce themselves so that any semblance of my speech, with perhaps the
exception of rhythm, is destroyed. What you will hear, then, are the natural
resonant frequencies of the room articulated by speech. I regard this activity
not so much as a demonstration of a physical fact, but, more as a way to smooth
out any irregularities my speech might have[10].
As each further rendition is heard the sense of the text disappears. The
cascading effect of the resonances first causes distinct pitches to be heard;
these form motifs, which then metamorphose into a musical composition. Table
3.3 below looks in more detail at each section, taking each rendition of the
text as a section. References to pitches are approximate.
| Rendition | Perceived changes |
| Rendition 1 | Normal text reading is recorded. |
| Rendition 2 | Reverberation of the room becomes apparent; the mid-range frequencies of the voice are also accentuated. Lucier's sibilants are accentuated. |
| Rendition 3 | Room reverberation increases creating an impression of distance; it is now a distinctive part of the sound of the voice. A single pitch is now heard, triggered by the voice, which creates a harmony for the reading. |
| Rendition 4 | The background noise becomes a feature in the piece. The text is becoming obscured but is intelligible. More pitches are heard, still obviously triggered by the voice, which create a melodic motif, or "accompaniment", around the voice. |
| Rendition 5 | Text is barely intelligible; whatever remaining intelligibility there is may be a result of previous exposure to the text. The intonational pitch changes of the reading now more obviously affect the melodic pitch changes of the "accompaniment". |
| Rendition 6 | The background noise is now a prominent feature and appears to have a number of distinct pitches, forming a non-tempered cluster around C# 5. (F# and C# seem to be the most resonant pitches of the room). The "accompaniment" is now the main feature of the piece; the text serves as part of the timbre of the "notes" of the "accompaniment". |
| From this point on it becomes difficult to separate the sections one from the other. Between sections six and seven there is a tape glitch or break which defines the beginning of the new section. |
| Rendition 7 | The background noise now has two distinct pitches which it glides between regularly. The text is completely obscured but the rhythm of the reading continues to drive the piece. Three distinct parts now run simultaneously: the background noise, the "accompaniment" and now an adjunct to the "accompaniment" which follows it but seems to have a different motif. |
| Rendition 8 and Rendition 9 | From here there is a general smoothing of the overall sound of the piece. The text has degraded to the point of being unintelligible and it is now difficult to hear even the intonational aspects, which are now obscured as the piece takes on all the aspects of music and loses all aspects of speech. |
Alvin Lucier uses repetition as
the main structural element in I am sitting in a room. He uses the many
and changing resonances that are produced within a room by the intonational
aspects of his text reading to create music from speech. This transformation of
speech into music using resonance is reflected in Someone. Here forty
artificial rooms have been created for the whole text and text segment to
resonate in. As in I am sitting in a room the pitch and duration of each
resonance is affected by the intonational aspects of the reading, creating
melodies and harmonies which transform the text reading into a musical piece.
In both Someone and I am sitting in a room the resonances created by the voice
are amplified through repetition.
The difference in Someone
is in the characteristics of the artificial rooms. The dimensions of each room
change as the intonational aspects (rhythm, timbre, pitch and amplitude) of the
reading change, thus changing the qualities of each resonance and its effect on
the reading. By doing this the reading becomes the only causal aspect in the
piece. In I am sitting in a room the reading and the room have equal
roles in creating the piece; in Someone the reading has become the only
agent in creating the piece.
I am sitting in a room influences Someone in two ways: first, in the
repetition of a long phrase (in the case of Someone the whole poem is
repeated); and second, in the use of the intonational aspects of the text
reading as the trigger to alter the sound of both the whole poem and the
segment of the poem. This process is discussed later in this chapter under the
heading "The computer and composition processes used in Someone".
Paul Lansky's Smalltalk and Late August also alter the voice so that it becomes
unintelligible in a lexical sense. According to Lansky:
Conversation [has the] ability
to change its nature when one no longer concentrates on the meaning of the
words. [What is heard is the] intonations, rhythms and contours of the speech.[11]
Smalltalk
is based on a recording of a domestic conversation between Lansky and his
wife, Hannah MacKay. The recording was treated on a DEC Micro Vax II running
software Lansky wrote for the project. This resulted in obscuring "the
words we spoke while capturing the rhythms, pitches and contours of our
conversation"[12].
Late
August resulted from Lansky wondering
what would happen if I tried the
same sort of thing with another language, say Chinese, in which pitch and
contour have different meanings. [The result] is similar to Smalltalk on
the surface, but quite different in substance; the sound world of the Chinese
language led to a very different kind of music[13].
While
the music may be of a different substance and kind this surface similarity can
make distinguishing the two pieces difficult, especially in the first few times
they are listened to. This may be due to Lansky's use of conversational rather
than highly structured and stylised text, such as a poem. His desire to create
tonal centres that do not appear to be based in the pitch field of the voices
also obscures the difference between the substance of Smalltalk and Late
August.
Table
3.4 describes the perceived changes over time in both Smalltalk and Late
August. In this description of the changes over time in both pieces it is
not essential that any of the textual foreground be mentioned. This aspect of
the pieces is subject to and obscured by the effects that Lansky applies to it.
To include a more detailed description of the foreground melody or harmonic
changes would also obscure the description of Lansky's large scale structural
composition of the pieces.
| Smalltalk | | Late August | |
| Time | Perceived changes | Time | Perceived changes |
| 0'0" | Melodic foreground and middle tessitura. | 0'0" | Melodic foreground with high, wide tessitura. The accompaniment is based on aspirant-like sounds. |
| 38" | Faint single note accompaniment background. This accompaniment is based on back vowel-like sounds[14]. | | |
| 1'13" | Accompaniment background becomes lower in pitch. | | |
| 1'22" | Return to original accompaniment pitch. | | |
| 1'35" | Accompaniment becomes more dense. | | |
| 1'45" | Accompaniment background an octave lower. | | |
| 2'07" | Complete change in background, moves to a different scale/key. | | |
| 2'35" | Background accompaniment increases in activity. | | |
| | | 3'02" | Large change in accompaniment. |
| 3'29" | Return to original background accompaniment harmony/scale. | | |
| 3'54" | Accompaniment shift. | | |
| | | 4'02" | Change in accompaniment harmony. |
| | | 4'36" | Increase in foreground and background activity. |
| | | 4'51" | Accompaniment lowers in tessitura. |
| | | 5'08" | Accompaniment raises in tessitura. |
| 5'19" | New accompaniment harmony. | | |
| | | 6'18" | Accompaniment change. |
| 6'32" | Loud background accompaniment, increased harmonic motion. | | |
| 7'50" | Background tessitura raised, the effect of the panning becomes less pronounced. This could be due to the ear getting accustomed to the panning activity. | | |
| | | 8'17" | Accompaniment drops out. |
| 10'0" | Changed accompaniment background as if moving to a different key. | | |
| | Gradual fade out begins. | | |
| | | 11'18" | Change in accompaniment harmony. |
| 12'44" | End. | | |
| | | 13'45" | End. |
Lansky
uses a long segment of improvised text as the structuring element for both Smalltalk
and Late August. In each case natural speech is altered through an
imposed and entirely computer dependent process to produce the audible surface.
The
techniques used in this process appear to be mostly comb filters with short
feedback times and synthetic or sampled voice-like sounds. The comb filters are
tuned around the frequencies of the speech used as well as being reflected in
the background harmonic drone.
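For readers unfamiliar with the device, a feedback comb filter can be sketched in a few lines. This is the generic textbook form, offered as an assumption about the kind of filter meant; it is not Lansky's code. The filter adds a delayed, scaled copy of its own output to the input, which produces resonant peaks at multiples of the sample rate divided by the delay length. The function names and parameter values are illustrative.

```python
def feedback_comb(signal, delay_samples, feedback):
    """Feedback comb filter: y[n] = x[n] + feedback * y[n - delay_samples].

    Resonant peaks fall at multiples of sample_rate / delay_samples.
    Keep abs(feedback) below 1 so that the output decays rather than grows.
    """
    output = []
    for n, x in enumerate(signal):
        delayed = output[n - delay_samples] if n >= delay_samples else 0.0
        output.append(x + feedback * delayed)
    return output


def resonant_frequency(sample_rate, delay_samples):
    """Spacing in Hz between the resonant peaks of the comb filter above."""
    return sample_rate / delay_samples


if __name__ == "__main__":
    impulse = [1.0] + [0.0] * 11
    print(feedback_comb(impulse, delay_samples=4, feedback=0.5))
    print(resonant_frequency(44100, 100))
```

A short delay places the peaks far apart and high in frequency, so tuning the delay length to a pitch found in the speech makes the filter ring at that pitch whenever the voice excites it.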
Both Late August and Smalltalk use quartal or triad based harmonic sequences in
the voice-like background drone. By using these fairly traditional and well
understood processes Lansky has been able to present an unusual, and perhaps
challenging, idea in a better known and less challenging context. This makes
the more challenging aspects of the two compositions easier to digest.
Some
of the techniques used by Lansky are also used in Someone. In Someone
sets of comb filters driven by the spoken text are used to create a harmonic,
pitch based foreground. In this foreground the text is obscured, though not as
heavily as in Lansky's two pieces; this is especially evident in Parts Five,
Six, Seven and Eight where only the text is used.
The
composition methods and style of Someone are influenced by the methods
and styles of the three voice- and text-based electro-acoustic compositions
given above. Each of the three pieces offers a precedent which has been
expanded upon in Someone exclusively in the digital domain.
While it may not be immediately obvious how each of the three styles of
electro-acoustic composition (the Direct Voice, Altered Voice and the Enhanced
Voice) is used, their processes are either directly appropriated or used as a
starting point from which the composition processes used in Someone are drawn.
Someone uses a reading by
Barry Dickins of his poem Saint Dymphna's Bells. The poem is Dickins'
commentary on the last execution carried out in the state of Victoria, the
execution of Ronald Ryan for the murder of a prison guard while attempting his
escape. The reading is very expressive; the pitch, volume, timbre and tempo
fluctuations enhance the sense of impotence, tragedy and perverse justice that
informs the poem.
The composition presented is an
eight channel installation of indeterminate duration designed to create an
aural environment for a transient or stationary audience. It is designed to be
heard in a large space either as the main focus for the audience or as part of
a performance or exhibition involving other art forms such as a dance
performance, a video presentation, or a painting or sculpture exhibition. It is
presented on four stereo compact discs which may also be listened to
individually.
The piece uses a ten second
sample, the first four lines, of Dickins' reading to provide a setting
in which the entire text can be heard. For large sections of the piece this
setting is all that is heard, and therefore it functions both as a background
and as a foreground.
The text segment used is:
Someone rang Saint Dymphna's Bells,
Someone did.
At precisely eight in the country morning,
Someone did.
The entire text is given in Appendix 6.
When looked at purely as a set of phonemes, that is, when only the sound of the
text and not its connotative or denotative meaning is considered, the
repetition of the four lines of the poem creates a phonemic motif around which
the other phonemes are based. Table 3.5 gives a large scale structural analysis
of the text segment used. In deciding on the sections A and B of this analysis,
the rhythm and pitch contours of the reading are taken into account. These
aspects are examined in Table 3.6.
A1 / Aextension: Someone rang / Saint Dymphna's Bells,
A2: Someone did.
B: At precisely eight in the country morning,
A2: Someone did.
Table 3.6 shows a large scale description of the intonation characteristics of
the text segment. Pitch curve is shown by the relative height of the line.
Rhythm and elision are shown by the gaps in the line. A continuous, static
rhythm is shown by a continuous line, as is the reader's use of elision. Broken
speech is shown by a broken line.

The simple sonata-like form of this segment of the reading creates the sense of
direction and return inherent within that form. As the text segment used has a
major structural role it is essential that the segment be recognised to have
the attributes of a well constructed musical phrase and that it be easily
recognisable as a musical structure in itself. This is because repetitions of
the phrase are fed through a set of continually changing signal processing
algorithms, altering the surface sound of the phrase. The use of repetition
within the phrase, the word Someone, gives the listener a particular sound
within the phrase to become familiar with; as the repetition of the phrase
continues this familiarity extends to the phrase itself.
The repetitions in the text can
be heard at up to twenty-eight different speeds at the one time throughout the
piece. As well as these differing speeds the text is heard harmonised in up to
eight different ways and at an almost infinite set of pitch levels due to the
twenty-eight possible glissandi speeds at which the text is iterated.
As mentioned above, Someone was created using algorithms built on the IRCAM
Signal Processing Workstation (ISPW) and then edited using Digidesign's
ProTools™ and SoundDesigner™. The ISPW algorithms were created using the
standard libraries available in release 0.24. These algorithms produced near
final pieces; ProTools and SoundDesigner were used only for topping, tailing
and normalising.
The ISPW algorithm can be
divided into four sections or sub-algorithms: a granular process which acts
mainly as a time stretching and glissando producing device; the Fast Fourier
Transform process; two banks of comb filters, which act as a harmonising
device; and a panning algorithm. The result of the first three sub-algorithms
is finally fed through the panning algorithm. Figure 3.2 shows an overview of
the main algorithm and the flow of sonic information through the
sub-algorithms.

Here the digitally recorded sound file of the poetry reading is repeated as a
six minute loop. This allows a twenty second gap between each repetition of the
sound file. These repetitions of the sound file are occasionally heard
throughout the piece, either after being filtered through the comb filters or
in their natural state, or both. The sound file is also used to vary the amount
of feedback of each of the two banks of comb filters, as discussed below in the
section titled Comb filters.
Here the ten second sample of the sound file is played through seven playback
algorithms in sixty millisecond grains. The beginning point of each sixty
millisecond grain moves at varying speeds through the ten second sample. This
means that the sample appears to be stretched or shortened, depending on the
speed with which the starting point moves through the sample. The seven sample
playback units, called samplePlay, play the ten second sample at different
speeds and pitches, as discussed below.
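The granular playback described above can be sketched as follows. This is a schematic illustration and not the ISPW patch: it assumes that the grain start point advances linearly through the sample and that each grain begins as the previous one ends, neither of which is stated in the text. The function name and its parameters are hypothetical.

```python
def grain_start_times(sample_seconds, grain_seconds, traversal_seconds):
    """Return the start point (seconds into the sample) of each successive grain.

    Grains are assumed to play back to back, so a new grain begins every
    grain_seconds of output time. Over traversal_seconds of output the start
    point moves linearly from 0 to sample_seconds - grain_seconds.
    """
    grain_count = int(round(traversal_seconds / grain_seconds))
    if grain_count < 2:
        return [0.0]
    last_start = sample_seconds - grain_seconds
    step = last_start / (grain_count - 1)
    return [i * step for i in range(grain_count)]


if __name__ == "__main__":
    # Ten second sample, sixty millisecond grains, two hypothetical traversal times.
    stretched = grain_start_times(10.0, 0.06, 20.0)
    shortened = grain_start_times(10.0, 0.06, 5.0)
    print(len(stretched), stretched[1] - stretched[0])
    print(len(shortened), shortened[1] - shortened[0])
```

When the start point advances by less than one grain length between grains, successive grains overlap in the source and the sample is heard stretched; when it advances by more, material is skipped and the sample is heard shortened.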
The speed at which each grain moves through the sample is set by dividing the
total duration of the Part by successive numbers from the Fibonacci series.
Part One, for example, uses eight repetitions of the poem and has a duration of
forty-eight minutes (2880 seconds) before it repeats. The time each samplePlay
unit takes to play through the ten second sample is listed below. The
samplePlay units are numbered according to the Fibonacci series.
Describing the duration of each
sample playback in Part One:
samplePlay unit one will take
the full 2880 seconds to play through the ten second sample;
samplePlay unit two will take
1440 seconds to play through the sample (2880 divided by 2);
samplePlay unit three will take
960 seconds to play through the sample (2880 divided by 3);
samplePlay unit five will take
576 seconds to play through the sample (2880 divided by 5);
samplePlay unit eight will take
360 seconds to play through the sample (2880 divided by 8);
samplePlay unit thirteen will
take 221.538[15] seconds to play
through the sample (2880 divided by 13);
samplePlay unit twenty one will
take 137.142 seconds to play through the sample (2880 divided by 21).
This list uses
Part One as an example; the process of dividing the total duration of the Part
by increasing numbers from the Fibonacci series is identical in each Part.
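The divisions listed above can be checked with a short sketch. The names FIB_DIVISORS, truncate3 and traversal_times are illustrative only and are not taken from the ISPW algorithm; values are truncated rather than rounded, which appears to be the convention used for the figures in this chapter.

```python
import math

# Fibonacci numbers used to label the seven samplePlay units.
FIB_DIVISORS = [1, 2, 3, 5, 8, 13, 21]


def truncate3(value):
    """Truncate (not round) a value to three decimal places."""
    return math.floor(value * 1000) / 1000


def traversal_times(part_duration_s):
    """Seconds each samplePlay unit takes to move through the ten second sample."""
    return {n: truncate3(part_duration_s / n) for n in FIB_DIVISORS}


if __name__ == "__main__":
    # Part One lasts 2880 seconds.
    for unit, seconds in traversal_times(2880).items():
        print(f"samplePlay unit {unit}: {seconds} s")
```

Passing the duration of any other Part to traversal_times gives the corresponding figures for that Part.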
Each grain of the sample is
played back at speeds varying between 15 milliseconds and 240 milliseconds.
Changing the speed of sample playback causes pitch shifts and glissandi to be
heard; in this case the glissandi span four octaves. Figure 3.3 shows this
process; the black playback window, or grain, loops continuously from its
starting point.
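The four octave figure follows from the ratio of the two playback limits. The sketch below only measures the size of the interval implied by two playback durations; it assumes that halving or doubling the playback duration of a grain shifts its pitch by one octave, and it makes no claim about which limit the ISPW algorithm treats as the higher pitch. The function name is hypothetical.

```python
import math

NOMINAL_MS = 60.0      # grain length given in the text
LOWER_LIMIT_MS = 15.0  # playback limits given in the text
UPPER_LIMIT_MS = 240.0


def octaves_between(a_ms, b_ms):
    """Size in octaves of the interval implied by two playback durations."""
    return abs(math.log2(b_ms / a_ms))


if __name__ == "__main__":
    print("full span:", octaves_between(LOWER_LIMIT_MS, UPPER_LIMIT_MS))
    print("nominal to lower limit:", octaves_between(NOMINAL_MS, LOWER_LIMIT_MS))
    print("nominal to upper limit:", octaves_between(NOMINAL_MS, UPPER_LIMIT_MS))
```

Under this assumption the sixty millisecond grain sits at the centre of the range, two octaves from either limit.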

Each
samplePlay unit reproduces the granulated sample with different glissandi. The
width of the glissandi is four octaves, ranging from two octaves below to two
octaves above the nominal pitch of the incoming granular signal. The speed of
the glissandi is set by dividing the duration of each Part by a process similar
to that used to set the overall length of each Part. For example: if the
duration of the Part is 2880 seconds, as in Part One, the glissandi speed of
each of the seven samplePlay units uses two adjacent numbers from the Fibonacci
series to divide the total duration of the Part. This creates an elliptically
shaped loop.
SamplePlay unit one does not use
any glissandi; it maintains the nominal pitch throughout the Part.
The
glissando process for samplePlay unit two divides the total duration of the
Part by 3 and glides between the playback limits of 15 and 240. This results in
an ascending glissando of two octaves. To return from 240 to 15 the total
duration of the Part is divided by 2, resulting in a descending glissando of two
octaves. This means that in the case of Part One, for
example, which lasts 48 minutes, samplePlay unit two takes 960 seconds to go from 15
to 240 and 1440 seconds to return from 240 to 15.
The glissando for samplePlay unit three takes the total duration of the Part
divided by 5 to go from 15 to 240 and divided by 3 to return.
The glissando for samplePlay unit five takes the total duration of the Part
divided by 8 to go from 15 to 240 and divided by 5 to return.
The glissando for samplePlay unit eight takes the total duration of the Part
divided by 13 to go from 15 to 240 and divided by 8 to return.
The glissando for samplePlay unit thirteen takes the total duration of the Part
divided by 21 to go from 15 to 240 and divided by 13 to return.
The glissando for samplePlay unit twenty one takes the total duration of the Part
divided by 34 to go from 15 to 240 and divided by 21 to return.
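The pairing of adjacent Fibonacci numbers described above can be written out as a small sketch. The names FIB and glissando_times are illustrative and are not part of the ISPW algorithm.

```python
# Fibonacci numbers labelling the samplePlay units, plus the next one (34),
# which is needed as the outward divisor for unit twenty one.
FIB = [1, 2, 3, 5, 8, 13, 21, 34]


def glissando_times(part_duration_s):
    """Map each unit to (seconds from 15 to 240, seconds from 240 back to 15).

    Unit one is omitted because it holds the nominal pitch throughout.
    """
    times = {}
    for i in range(1, len(FIB) - 1):
        unit, next_fib = FIB[i], FIB[i + 1]
        times[unit] = (part_duration_s / next_fib, part_duration_s / unit)
    return times


if __name__ == "__main__":
    # Part One lasts 2880 seconds.
    for unit, (out_s, back_s) in glissando_times(2880).items():
        print(f"unit {unit}: {out_s:.3f} s out, {back_s:.3f} s back")
```

Because the outward and return journeys use different divisors, each loop is asymmetric: the outward glissando is always quicker than the return.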
"Fourier transformation can
be used to associate a unique spectrum with any waveform. The spectrum shows,
in effect, how to construct the analysed (sic) waveform out of a set of
sinusoidal harmonics, each with a particular amplitude and phase"[16]. In this case
the spectrum is represented in ten sets of sinusoidal harmonics, each of which
is tuned to be sensitive to a distinct region of the spectrum of the reader's voice.
The spectral motion of the
poetry reading is sampled every fifteen milliseconds, and this sample is
represented numerically by the sampeek~ object. As these numbers change
according to the changing spectra of the poetry reading they set the amount of
feedback each comb filter is allowed. The amount of feedback is then scaled to
create ever changing overlaps in the pitch field created by each of the ten
comb filters. This results in shifting harmonies, triggered by the voice of the
reader, being heard.
In this composition the Max fft~
object is used to analyse the changing
spectra of the reader's voice. The resulting information is then used to control
the bank of ten comb filters.
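The general idea of letting spectral content set comb filter feedback can be sketched as follows. This is not the ISPW patch: the equal-width band layout, the maximum feedback of 0.9, the normalisation to the loudest band and both function names are assumptions made purely for illustration, and a naive discrete Fourier transform stands in for the fft~ object.

```python
import cmath
import math


def magnitude_spectrum(frame):
    """Naive DFT magnitudes for bins 0 to N/2 - 1 (adequate for a short frame)."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame)))
        for k in range(n // 2)
    ]


def band_feedbacks(frame, bands=10, max_feedback=0.9):
    """Scale the energy in equal-width bands to feedback gains in [0, max_feedback].

    The frame should hold at least 2 * bands samples so every band has a bin.
    """
    mags = magnitude_spectrum(frame)
    width = len(mags) // bands
    energy = [sum(mags[b * width:(b + 1) * width]) for b in range(bands)]
    peak = max(energy) or 1.0  # avoid dividing by zero on silence
    return [max_feedback * e / peak for e in energy]


if __name__ == "__main__":
    # Hypothetical test frame: a sine wave that falls in the second band.
    frame = [math.sin(2 * math.pi * 5 * t / 80) for t in range(80)]
    print([round(f, 3) for f in band_feedbacks(frame)])
```

Calling band_feedbacks on a new frame every fifteen milliseconds would give one feedback value per comb filter that follows the changing spectrum of the voice.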
A comb filter is "similar
to a tape loop delay echo. As long as the feedback gain is less than [the
amplitude of the signal] the impulse response consists of a series of repeating
echoes that [change in inter-onset time and decrease in amplitude, or feedback]"[17].
If the delay time used in a comb filter corresponds to a frequency in the
range that produces pitches, say twenty Hertz to twenty kiloHertz, an
extra pitch produced by the delay can be heard along with the input signal. By
altering the amount of feedback of the comb filter the duration of the extra pitch
can be altered. Here the feedback of the ten filters is changed dynamically by
the FFT process given above.
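A textbook feedback comb filter illustrates both points: the delay time fixes the extra pitch and the feedback gain fixes how long it rings. The sketch below is a generic form, y[n] = x[n] + g * y[n - D], and is not claimed to match the ISPW implementation; the function names are hypothetical.

```python
def delay_ms_to_hz(delay_ms):
    """Frequency whose period equals the given delay time."""
    return 1000.0 / delay_ms


def feedback_comb(signal, delay_samples, feedback):
    """Generic feedback comb filter: y[n] = x[n] + feedback * y[n - delay_samples]."""
    out = []
    for n, x in enumerate(signal):
        delayed = out[n - delay_samples] if n >= delay_samples else 0.0
        out.append(x + feedback * delayed)
    return out


if __name__ == "__main__":
    # A single impulse produces a series of echoes that decrease in amplitude.
    impulse = [1.0] + [0.0] * 12
    print(feedback_comb(impulse, delay_samples=4, feedback=0.5))
    print(f"a 6 ms delay corresponds to {delay_ms_to_hz(6):.3f} Hz")
```

With a feedback gain closer to one the echoes take longer to die away, so the extra pitch is heard for longer.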
The ten filters are divided into
two banks of five; each bank has a seed value to set its resonance, this value
remains static for the duration of its respective Part. This seed value is
multiplied by a floating point number derived by reversing the ratios used to
build a scale in the Pythagorean tuning system[18]. For example:
if the seed value for one bank is 6 and is multiplied by 0.75, the first filter
resonates at 6 milliseconds (166.666 Hz), the second at 4.5 milliseconds
(222.222 Hz), the third at 3.375 milliseconds (296.296 Hz), the
fourth at 2.53125 milliseconds (395.061 Hz) and the fifth at 1.8984375
milliseconds (526.748 Hz). This results in a set of stacked fourths.
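The arithmetic of this example can be reproduced with a few lines. The function name comb_bank is illustrative; the seed of 6 milliseconds and the multiplier of 0.75 come from the example above.

```python
def comb_bank(seed_ms, ratio, size=5):
    """Delay times (ms) and matching frequencies (Hz) for one bank of comb filters.

    Each filter's delay is the previous delay multiplied by `ratio`.
    """
    delays = [seed_ms * ratio ** i for i in range(size)]
    return [(delay, 1000.0 / delay) for delay in delays]


if __name__ == "__main__":
    for delay_ms, freq_hz in comb_bank(6, 0.75):
        print(f"{delay_ms} ms -> {freq_hz:.4f} Hz")
```

Multiplying the delay by 0.75 raises the frequency by a ratio of 4:3 at each step, which is the perfect fourth of the Pythagorean system.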
The tuning of each stack is set
to oscillate 0.001 either side of the multiplication number used to create the
proper interval for the stack. For example in the case of the stacked fourths
mentioned above, the tuning of each filter in the stack oscillates between 0.749
and 0.751. By moving in and out of tune, additional harmonies and melodies are
created by the beat frequencies that occur; these often sound like sine tones.
The speed of the oscillation
around the interval is set by the panning process, which is defined below. For
example: in Part One, where there are stacked out of tune fourths an octave
apart, the tuning takes the total time of the Part divided by 55 to travel from
beneath the perfect fourth to above it and the total time divided by 34 to
return.
The speed of the oscillations is
set by the overall length of the part. In Part One, for example, one bank takes
approximately 84.705 seconds to travel from 0.749 to 0.751 and approximately
52.363 seconds to return; the other bank reverses this, taking 52.363 seconds
to travel from 0.749 to 0.751 and 84.705 seconds to return.
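The movement of the tuning multiplier over time can be sketched as an asymmetric ramp. The linear shape of the ramp is an assumption, since the exact curve is not specified here, and the function name and parameters are hypothetical; the divisors 55 and 34 and the limits 0.749 and 0.751 are those given for Part One.

```python
def ratio_at(t, part_duration_s, low=0.749, high=0.751, up_div=55, down_div=34):
    """Tuning multiplier at time t (seconds), assuming linear ramps.

    The multiplier rises from `low` to `high` in part_duration_s / up_div seconds
    and falls back in part_duration_s / down_div seconds, then repeats.
    """
    up = part_duration_s / up_div
    down = part_duration_s / down_div
    phase = t % (up + down)
    if phase < up:
        return low + (high - low) * phase / up
    return high - (high - low) * (phase - up) / down


if __name__ == "__main__":
    # Part One lasts 2880 seconds.
    for t in (0, 26, 52, 95, 137):
        print(f"t = {t:>3} s: multiplier {ratio_at(t, 2880):.5f}")
```

Swapping up_div and down_div gives the reversed motion of the other bank.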
The audio signal from each bank
of comb filters moves in the stereo space. This panning process uses
elliptically shaped loops based on Fibonacci divisions of the overall duration
of each of the nine Parts used in the installation. The panning of one
signal from the comb filter banks across the stereo spread takes the total
duration of the Part divided by 55 to get from one side to the other and total
duration divided by 34 to return. The signal from the other bank of comb
filters follows the same panning motion but is delayed by the total duration of
the Part divided by 89.
For example: in Part Four, which
has a total duration of 727.992 seconds, the signal from one comb filter bank
takes 13.236 seconds to move from one side of the stereo field to the other
and 21.411 seconds to return. This same movement is delayed by 8.179 seconds
for the signal coming from the other filter bank.
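The three panning figures for any Part follow directly from its total duration and the divisors 55, 34 and 89. The sketch below uses hypothetical names and simply performs those divisions, so that the figures for a given Part can be checked.

```python
def panning_times(part_duration_s):
    """Panning figures in seconds, from the Fibonacci divisors given in the text."""
    return {
        "out_s": part_duration_s / 55,     # one side of the stereo field to the other
        "back_s": part_duration_s / 34,    # return journey
        "offset_s": part_duration_s / 89,  # delay applied to the second filter bank
    }


if __name__ == "__main__":
    # Total duration quoted for Part Four.
    for name, seconds in panning_times(727.992).items():
        print(f"{name}: {seconds:.3f}")
```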
Someone is made up of nine
Parts, which are stored across four compact discs. The number of repetitions
of the text in each Part is based on the Fibonacci series:
Part One lasts forty-eight
minutes and is made up of eight repetitions of the poem. The actual length of
the reading is five minutes and fifty-one seconds; adding the extra nine
seconds allows time to delineate the repetitions with a short period of less
activity. The comb filter banks are tuned to be an octave apart, the seed
values for each bank being set to six and twelve and then multiplied by a
number between 0.749 and 0.751 to produce a stack of intervals of slightly out
of tune fourths. The information under the heading Comb filters above
gives a more detailed description of the process used in creating the stacks; a
similar process is used in all the other parts except Part Nine.
Part Two has five repetitions
with the comb filter bank seed values set to 7.992[19] and 5.322,
which produce intervals of a fifth from the frequency of the seed value of
twelve used in Part One, and multiplied by a number between 0.665 and 0.667 to
produce a stack of intervals of slightly out of tune fifths.
Part Three has three repetitions
and lasts eighteen minutes. The comb filter bank seed values are set to 10.125
and 8.542, producing intervals of a minor third from the original seed value of
twelve. Each bank is multiplied by a number between 0.842 and 0.844, producing
a stack of intervals of slightly out of tune minor thirds.
Part Four has two repetitions
with the comb filter bank seed values set to 7.593 and 4.804, which produce
intervals of a minor sixth from the seed value of twelve, and multiplied by a
number between 0.631 and 0.633 to produce a stack of intervals of slightly out
of tune minor sixths.
Part Five lasts six minutes and
is one rendition of the poem. This rendition is fed through the same set of
comb filter banks as Part One. In this Part, as in Parts Six, Seven and Eight,
the text reading and granulation of the text segment for Part One is fed
through the comb filters.
Part Six also lasts six minutes
and is one rendition of the poem. This rendition uses the same comb filter
tuning as Part Two. The text reading and granulation of the text segment for
Part Two is fed through the comb filters.
Part Seven lasts six minutes, is
one rendition of the poem and uses the same comb filter tuning as Part Three.
The text reading and granulation of the text segment for Part Three is fed
through the comb filters.
Part Eight lasts six minutes, is
one rendition of the poem and uses the same comb filter tuning as Part Four.
The text reading and granulation of the text segment for Part Four is fed
through the comb filters.
Part Nine lasts six minutes and
is one rendition of the poem without any adjustments or modifications: it is
the actual recording of the poetry reading.
On the four compact discs which
make up Someone, Parts One to Four are each represented twice and Parts
Five to Eight once; Part Nine is represented on each disc:
Disc one, channels one and two, contains
Part One, Part Four, Part Five and Part Nine;
Disc two, channels three and
four, contains Part Two, Part Three, Part Six and Part Nine;
Disc three, channels five and
six, contains Part Three, Part Two, Part Seven and Part Nine;
Disc four, channels seven and
eight, contains Part Four, Part One, Part Eight and Part Nine.
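For anyone preparing the installation, the disc scheme above can be held in a small table and queried. The structure and names below are illustrative only; the contents are copied from the list above.

```python
from collections import Counter

# Disc number -> output channels and the Parts stored on that disc, in order.
DISC_LAYOUT = {
    1: {"channels": (1, 2), "parts": ["One", "Four", "Five", "Nine"]},
    2: {"channels": (3, 4), "parts": ["Two", "Three", "Six", "Nine"]},
    3: {"channels": (5, 6), "parts": ["Three", "Two", "Seven", "Nine"]},
    4: {"channels": (7, 8), "parts": ["Four", "One", "Eight", "Nine"]},
}


def part_counts(layout):
    """Count how many discs each Part appears on."""
    return Counter(part for disc in layout.values() for part in disc["parts"])


def discs_containing(layout, part):
    """List the disc numbers on which a given Part appears."""
    return [number for number, disc in layout.items() if part in disc["parts"]]


if __name__ == "__main__":
    print(part_counts(DISC_LAYOUT))
    print("Part One is on discs", discs_containing(DISC_LAYOUT, "One"))
```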
The normal performance of Someone
is as an installation of indeterminate duration, using the four compact discs
as prescribed in the scheme above. A submitted compact disc, Compact Disc Two,
provides a fifteen minute study made up of a mix of all the Parts of Someone.
This is the version of the piece discussed in the evaluation.
Each Part on each compact disc
is a stereo rendition of Someone and can be listened to as an individual
stereo version of the piece. If this is the listening choice then it should be
listened to as a part of the aural environment. This approach to composition
follows in the traditions of "ambient" music such as Music for
Airports, composed by Brian Eno, and Vexations by Erik Satie.
While Someone is not
necessarily designed to be listened to as the sole focus of the listener,
excerpts of it can be used in a more traditional concert setting. If it is
presented in this way then any of the Parts can be used, as can sections of any
Part. It is also possible to present several Parts at once in a traditional
concert setting. In this case any number of Parts can be selected, either
randomly or intentionally, and mixed together to be performed simultaneously
through whatever sound system is available for the concert.
When Someone is presented
to an audience as an installation each of the stereo channels should be
regarded as separate and distinct mono parts, even though there are definite
stereo relationships within each Part. Figure 3.4 shows the placement of each
speaker in a rectangular room. The speakers should be placed as far
apart from each other as possible and the amplitude for each channel should be
equal and set to such a level that the two nearest speakers can be easily heard
but not so loud as to drown out conversation.

It is not necessary for Someone
to be presented in a square or rectangular room; however, it is essential that
the speakers for channels one and two, and three and four, be in the corners of
the room and as far apart as possible. This will create a diagonal movement of
the sound. The speakers for channels five and six must be as far from each
other as possible and in the centre of the walls of the room; the same applies
to the speakers for channels seven and eight. In this case the stereo movement
of the sound should create a circular motion. It is important that the eight
channels be balanced so that a listener standing in the centre of the room
hears each channel at an equal amplitude.
If Someone is presented
in a number of rooms then it is important that each of the channels be grouped together
as close to their numerical order as possible. For example: channels one, two
and three in one room, channels four and five in another room, and channels
six, seven and eight in another room.
The Parts on each compact disc
can be selected randomly or played in the order in which they appear on the
compact discs. If the Parts are played in order the compact disc players should
be set to continually repeat for the duration of the installation, otherwise
they should be set to play randomly for the duration of the installation.
An
evaluation of Someone must examine three separate and distinct areas:
first, how effective this process of creating a piece of music from a text
reading is; second, how effective the result is in reflecting the content of
the text; and third, the effect of its presentation as an installation. The
second area is not essential to the goal of the thesis presented here, the
creation of a musical composition from a text, but is important as the
intentions of the poem are influential in the composition processes taken. An
obvious example of this influence, from a compositional point of view, is the
decision to make the poem audible in one section of Someone, Part Nine of
the composition. Also, the content of the text is of great influence in
Dickins' reading. His intonations are generated by the mood of this content and
it is these intonations that drive both the underlying, structural aspects and
the audible surface of Someone.
Because Someone
is presented as an installation the role of the listener in their
perception of the piece must also be accounted for. The listener is able to
move about within the audio area and thereby to select, at first somewhat
randomly and then by either conscious or unconscious design, the Parts of the
piece they hear. In doing so they create their own personal mix of Someone.
The
rendition of Someone presented on Compact Disc Two is discussed for this
evaluation. This rendition attempts to simulate a listener's experience of the
piece. Each of the Parts is mixed together in such a way as to simulate the
movement of a listener through a virtual room in which Someone is being
performed.
The
harmonies that are created in Someone vary in colour and emotional
impact, from the dark minor thirds and minor sixths to the lighter fourths and
fifths. These intervals were chosen to exemplify the opposing emotional states
of the poem, the horror of murder and execution and the compassion shown by the
anonymous ringing of Saint Dymphna's bells. The tuning shifts of each of the
intervals also create other ancillary harmonies; these are repercussions of the
overall harmonic action.
The
listener can move to a place where the mix of the Parts suits them best. This
is where Someone's success lies, in the ability of the listener to
create their own relationship with the piece and its content.
[1] Rolf Carlson,
Anders Friberg, Lars Frydén, Björn Granström and Johan Sundberg, 'Music and
speech performance: Parallels and contrasts.' Contemporary Music Review
4, 1989, pp. 391-404.
[2] John Cage, Sixty-two
Mesostics re Merce Cunningham for Voice Unaccompanied using microphone. Hat
Hut Records Ltd, 1991.
[3] Steve Reich,
'Come Out.' Music Of Our Time: New Sounds In Electronic Music. Producer
David Behrman. CBS, Inc., 1972.
[4] Alvin Lucier,
'I Am Sitting in a Room.' Source record number three. Source, 4, 7. BMI,
1970.
[5] Paul Lansky,
'Smalltalk.' Smalltalk. New Albion Records, 1990.
[6] Reich, op.
cit., liner notes.
[7] Steve Reich,
'Come Out.' Early Works. Nonesuch Records, 1987. Liner notes, page 2.
[8] Richard
Boulanger, The Transformation of Speech Into Music: A Physical Exploration
and Interpretation of Two Recent Digital Filtering Techniques. PhD thesis,
UCSD, 19.4.85, p. 31.
[9] Lucier, op.
cit., p. 60.
[10] Alvin Lucier,
'I Am Sitting In A Room.' I Am Sitting In A Room, Lovely Music, 1990,
Liner notes.
[11] Lansky, op.
cit., liner notes, p. 1.
[12] ibid., p.1.
[13] ibid., p.1.
[14] A back vowel is
produced in the back of the vocal tract, nearest the glottis.
[15] Frequencies and
durations are taken to three decimal places.
[16] F. Richard
Moore, Elements of Computer Music. Englewood Cliffs, New Jersey,
Prentice-Hall, 1990, p. 29.
[17] ibid., p. 381.
[18] Don Randel, Harvard
Concise Dictionary of Music. Cambridge, Mass., Belknap Press, 1978, p. 238.
[19] Again the
frequencies are to three decimal places. As the tuning of each bank is
continually shifting the actual frequency is not critical.