Chapter 3: COMPUTER MANIPULATIONS OF A DIGITIZED AUDIO PERFORMANCE OF A POETRY READING.

Aspects of text reading and music performance

Electro-acoustic examples of speech manipulation

Direct voice: Come Out

Altered voice: I am sitting in a room

Enhanced voice: Smalltalk and Late August

Composition of the piece Someone

The computer and composition processes used in Someone

The algorithms used for Someone and their functions

Sound file playback

Granulation process

Glissandi process

Speed of the different glissandi used in Part One

FFT process

Comb filters

Panning

Construction of the Nine Parts

EVALUATION

Chapter 3

COMPUTER MANIPULATIONS OF A DIGITIZED AUDIO PERFORMANCE OF A POETRY READING

                This chapter looks at the process followed in deriving a musical composition from a poetry reading. To this end a recording of a reading of the poem Saint Dymphna's Bells by its author, Barry Dickins, is manipulated using algorithms built with the IRCAM Signal Processing Workstation (ISPW) to produce the musical work Someone (compact discs Three, Four, Five, Six and Seven). The whole poem and a ten second segment from the beginning of the poem are repeated and adjusted in several ways to produce an installation of indeterminate duration.

                The installation is made up of nine separate parts of different durations, each using similar techniques in a different way; the parts and the techniques used to produce them are discussed below in the section titled The computer and composition processes used in Someone.

Aspects of text reading and music performance

                In the paper Music and speech performance: Parallels and contrasts, Rolf Carlson and others put forward that:

in speech many different prosodic factors are mixed together as one single acoustic parameter. [For example] segmental inherent pitch, word tone, sentence type, lexical stress, emphasis etc. can all be signalled in one single parameter, such as the voice fundamental frequency. In the same way, the duration of speech sounds is affected by a variety of conditions including stress, position in the utterance, and local phonetic context.

                The same applies to music. There are many different reasons to lengthen or shorten a note beyond its nominal duration [For example] emphasis, marking of phrase endings, and sharpening the contrast between categories[1].

                Lengthening or shortening note durations is just one of the many tools available to musicians when interpreting music. Other tools include varying the pitch, amplitude and/or timbre of a note or phrase. These tools allow each musician to imbue a composition with their own expressiveness.

                The degree to which a musician can interpret a composition is dependent on the idiom in which he or she is playing. For example, jazz musicians are expected to interpret and extemporise on a pre-existing melody while remaining idiomatically correct. The degree of variation permitted and the palette of variations available in this idiom are quite broad, depending on the subset of the jazz idiom in which they are playing.

                On the other hand, non-improvising musicians, such as classical orchestral musicians and classical music soloists, are expected to interpret with a smaller palette. Here the art of interpretation is far more critical. A musician whose role is as part of an orchestra is expected to subject their own interpretation of a composition to that of the conductor, who in turn is subject to the composer's act of self-expression, that is, the composition.

                Table 3.1 lists some of the variables that are available to musicians and speakers in adding a degree of self-expression when performing a text or composition. Under each heading three possible variables are listed; there are, of course, other options available to the performing speaker or musician.

Table 3.1 Variables in music and speech performance and composition.

| Possible Speech Variables | Possible Musical Variables |

| 3.1.1 Amplitude variation | Amplitude variation |
| a) varying the emphasis placed on certain syllables within a word or phrase; | a) varying the emphasis placed on certain notes within a section or phrase; |
| b) wholly increasing or decreasing the amplitude of a word or phrase; | b) wholly increasing or decreasing the amplitude of a note or phrase; |
| c) varying the amplitude within a syllable. | c) varying the amplitude within a note. |

| 3.1.2 Pitch variation | Pitch variation |
| a) varying the pitch of certain syllables within a word or phrase; | a) varying the pitch of certain notes within a section or phrase; |
| b) wholly raising or lowering the pitch of a word or phrase; | b) wholly raising or lowering the pitch of a section or phrase; |
| c) varying the pitch within a syllable. | c) varying the pitch within a note. |

| 3.1.3 Rhythmic variation | Rhythmic variation |
| a) varying the inter-onset time between certain syllables within a word or phrase; | a) varying the inter-onset time between certain notes within a section or phrase; |
| b) wholly increasing or decreasing the duration of a word or phrase; | b) wholly increasing or decreasing the duration of a note, section or phrase; |
| c) varying the inter-onset times or durations of syllables. | c) varying the inter-onset times or durations of notes. |

| 3.1.4 Timbre variation | Timbre variation |
| a) varying the timbre of certain syllables within a word, or words within a phrase; | a) varying the timbre of certain notes within a section or phrase; |
| b) wholly changing the timbre of a word or phrase; | b) wholly changing the timbre of a note or phrase; |
| c) varying the timbre of a syllable. | c) varying the timbre of a note. |

| 3.1.5 Lexical variation | Melodic variation |
| varying certain words or certain syllables within a word or phrase; | varying the role of certain notes within a section or phrase; |

                For the most part these variations are generated intuitively by the performer. The performer's intuitions are rooted in cultural knowledge of the effect of intonation on the listener. This is well observed when listening to Eberhard Blum's performance of John Cage's Sixty-two mesostics re Merce Cunningham.[2] Here the text reading is manipulated in all of the ways given in Table 3.1 and more. Blum uses the text as a vehicle for a wide variety of vocal expressions. Phones, syllables and larger groups of vocal sounds are stretched, bent, constricted and, in general, distorted from their normal use in the English language so as to become a hybrid language existing somewhere between music and English.

Electro-acoustic examples of speech manipulation

                The manipulation of vocal sounds via electronic media has been happening since recording technology became available. Early practitioners, such as Henri Chopin, used the analogue audio tape domain with striking results. As digital electronics became available the palette became broader and composers were able to enhance and alter the vocal, speech or textual input in a wider variety of ways. This can be seen in the more contemporary works of Paul Lansky and Roger Reynolds, among others.

                The use of speech in electro-acoustic composition can be divided into three categories: direct voice, altered voice and enhanced voice. I have chosen works by three composers, each of which exemplifies one of these categories. The discussion of the pieces below serves to give a background to my approach to using voice and is not intended as a proper or definitive analysis of the pieces. The pieces discussed here are: Come Out[3], by Steve Reich; I am sitting in a room[4], by Alvin Lucier; and Smalltalk and Late August[5], by Paul Lansky.

                In each of these pieces the composer creates an environment, through electro-acoustic media, in which the text mutates without too much guidance from the composer, with the possible exception of Smalltalk and Late August. By reducing the input of the composer during the composition of the piece the changes that occur are created by either the text or the environment used in the recording. This means that what the listener hears is not so much driven by the taste of the composer as by the text itself.

                These works are precedents to the pieces presented here and exemplify the processes of electro-acoustic composition used for my dissertation. For each piece I list important changes as they occur in the form of a timeline.

Direct voice: Come Out

                Steve Reich's Come Out uses an analogue tape recording of a man saying the sentence "I had to let the bruise blood come out to show them". His technique is to overlay repetitions of the words "come out to show them" in such a way that the layers move in and out of phase with themselves. This results in shifting rhythmic patterns which draw the listener's attention away from the lexical meaning of the words and towards the interplay of sonic patterns found within the words. Reich describes the piece thus:

The phrase 'come out to show them' was recorded on both channels, first in unison then channel 2 slowly beginning to move ahead. As the phase begins to shift, a gradually increasing reverberation is heard which slowly passes into a sort of canon or round. Eventually the two voices divide into four and then eight[6].

The main structural and driving element of the piece is the rhythmic counterpoint between the voices. As this counterpoint progresses, through the perceived adding of more iterations, the text is increasingly obscured until it becomes unintelligible as text. Figure 3.1 is a rough transcription of the rhythm and melodic contour of the main motif, "Come out to show them".

Figure 3.1 Melodic structure of Come out.

                This rhythmic motif fits very comfortably into use as a hocket, which could be the main reason for Reich's use of it.

                As the piece progresses the three distinct sections that Reich describes become apparent. In the first section we hear the voice gradually gain spatial depth through perceived, not actual, reverberation. It then loses its textual characteristics and takes on musical characteristics as repetitions of the voice increase, or "divide"; this "division" is the most important compositional process used in Come Out. The entire acoustic signal used for Come Out is made up of simply one, two, four or eight iterations of the phrase "Come out to show them". These iterations operate as a very finely displaced hocket, creating illusions of traditional signal processing devices even though signal processing plays no part in the composition.
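
                The gradual phase shift Reich describes can be illustrated numerically. The following is a minimal sketch in Python, not a reconstruction of Reich's tape set-up: the function name, the loop length and the speed ratio between the two channels are hypothetical values, chosen only to show how a small speed difference accumulates into an audible offset.

```python
def channel_lead(elapsed_s, loop_s=1.5, speed_ratio=1.002):
    """Seconds by which channel 2 leads channel 1, wrapped to one loop length.

    loop_s and speed_ratio are illustrative assumptions, not measured values.
    """
    lead = elapsed_s * (speed_ratio - 1.0)  # total drift accumulated so far
    return lead % loop_s                    # a lead of one whole loop is unison again


# Print the offset at a few arbitrary moments in the piece.
for t in (0, 30, 120, 600):
    print(f"{t:4d} s elapsed: channel 2 leads by {channel_lead(t) * 1000:.0f} ms")
```

                Very small offsets tend to be heard as a colouring of the sound and larger ones as a distinct echo or canon, which is consistent with the progression Reich describes.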

                Apart from choosing the sound source, setting up the tape machines and then switching them on, Reich's only other compositional input to Come Out was to decide when the piece should "gradually [pass] into a canon or round for two voices, then four voices, and finally eight"[7].

                When listening to one side of the stereo recording the doubling of the voice from one to two to four and finally eight is striking, but is obscured when listening in stereo. Table 3.2 shows an approximate timeline of the perceived changes as Come Out progresses; these changes are what the ear is drawn to over the duration of the piece.

Table 3.2 Timeline of perceived changes in Come Out.

| Time | Perceived changes |
| | Single voice |
| 0" | Complete phrase "I had to let the bruise blood come out to show them" repeated 3 times; the phasing effect is not used. |
| 21" | Two voices are now heard. |
| 21" | "Come out to show them" phrase begins and phasing effect begins. |
| | Slight shifts in stereo placement of the voice. |
| | Flanging slowly transferring into a delay. |
| | Depth is added through the phasing technique simulating 'reverberation'. |
| 1' 45" | Two voices appear, but their role as distinct voices is not apparent. |
| 1' 50" | "sh" sound becomes prominent. |
| 2' 0" | The two voices become distinct as two voices. |
| 2' 19" | The tempo and beat division of the phrase becomes ambiguous, seeming to slightly slow down and speed up regularly. |
| 2' 59" | Four voices are now heard. |
| 3' 0" | Placement of voice moves in stereo field. |
| 3' 20" | "Come out" and "show them" become two distinct phrases. |
| 3' 55" | The whole piece begins to be heard in a more 'reverberant' space. |
| 4' 30" | Text becomes hard to distinguish, gradually losing its meaning. |
| 4' 17" | 'Reverberation' becomes an important part of the overall composition. |
| 5' 6" | The text becomes less intelligible and more like a musical sound source. |
| 6' 0" | Two similar rhythmic motifs appear: come-a-come out and show-de-show them, forming the most obvious hocket. |
| 6' 57" | The two motifs move in stereo space. |
| 7' 10" | "sh" sound becomes prominent again. |
| 7' 20" | The rhythmic motifs fracture. |
| 8' 0" | Each phone has a rhythmic pattern of its own; the piece uses repetition to build intensity and its hocket nature becomes less of a driving force. |
| 8' 37" | Eight voices are now heard. |
| 8' 40" | Each phone, particularly the voiced and vowel phones of the text, glissandos downward. |
| 9' 10" | The downward glissandi begin to sound more scale-like. |
| 11' 0" | The piece sounds more like a pulsing timbre than a succession of musical events; the effect of the phasing has reached its peak and a very gradual fade begins. |
| 12' 54" | Piece ends. |

                This descriptive, foreground analysis of Come Out shows that Reich's process exhibits features associated with more usual musical composition processes, namely the use of motivic and phrase repetition and variation.

                While there are the obvious repetitions of the text, other aspects of the sonic palette are also repeated. 'Reverberation', or proximity to the listener, is the foreground feature at 3' 55" and 4' 17"; the perception of different numbers of voices at 2' 0", 3' 20" and 6' 0"; motion in stereo space at 21" and 6' 57"; and the "sh" sound at 1' 50" and 7' 10".

                According to Richard Boulanger "an important aspect of Reich's Come Out is that the natural declamation of the text is preserved yet the speech undergoes a unique and significant transformation"[8]. By taking this approach Reich has maintained the integrity of the text and its reading and in doing so produced music which has evolved out of speech.

                Reich's use of the repetition of a short phrase in Come Out was influential in composing the installation Someone, which is presented here. In Come Out shifting repetitions of a phrase eventually obscure the textual meaning of the phrase, and mutate the phrase from language into music. This method of transformation by repetition of a single phrase is used and extended in Someone. In Someone there are many more iterations of the phrase and the phrase is repeated in many different ways. Its speed, its pitch, its stereo placement and the number of phrases being repeated at the same time all vary greatly when compared to Come Out, where there are only two iterations of the phrase being repeated, no pitch or speed variations, and each phrase is panned hard left or hard right.

                Throughout Someone the text is heard in part and, in some sections, in full. The repetitions are heard in and out of phase with each other, just as the repetitions of the text are heard in and then out of phase in Come Out. This process serves to spread the text over the listening area both spatially and temporally: segments are heard in one area and then repeated in another, or a series of segments are heard concurrently, depending on the listener's position. Segments may follow in the order of the text or be reordered so as to lose the intended flow of the text. Segments may also play in close temporal proximity to each other, creating an effect similar to that of Come Out depending on the placement of the listener.

Altered voice: I am sitting in a room

                This piece is built on an analogue tape recording of Alvin Lucier describing what he is doing and why. The process Lucier used was:

to record his voice onto one tape player, play it back on another tape player through a loudspeaker and record that rendition onto the first tape player[9].

This process was carried out nine times for the performance of I am sitting in a room, as it is presented in Source: music of the avant garde. The recording was done in Lucier's living room.

                The purpose of Lucier's composition is described in the composition itself. Lucier uses this description as the underlying and driving element of the composition. It is what he says into the first tape recorder, creating the seed of the composition, and is given below.

I am sitting in a room different from the one you are in now. I am recording the sound of my speaking voice and I am going to play it back into the room again and again until the resonant frequencies of the room reinforce themselves so that any semblance of my speech, with perhaps the exception of rhythm, is destroyed. What you will hear, then, are the natural resonant frequencies of the room articulated by speech. I regard this activity not so much as a demonstration of a physical fact, but, more as a way to smooth out any irregularities my speech might have[10].
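
                The reinforcement described in this text can be sketched with simple arithmetic. The sketch below treats the room as a fixed gain applied to each frequency band on every pass, which is an idealisation; the function name, the band names and the gain values are hypothetical and are not measurements of Lucier's room.

```python
def relative_levels(room_gain, passes):
    """Level of each band after `passes` re-recordings, relative to the strongest band.

    room_gain maps a band name to the (assumed) gain the room applies per pass.
    """
    raw = {band: gain ** passes for band, gain in room_gain.items()}
    peak = max(raw.values())
    return {band: level / peak for band, level in raw.items()}


# Hypothetical room: one resonant band and two weaker ones (illustrative values only).
room = {"resonant": 1.0, "neutral": 0.8, "damped": 0.5}
for n in (1, 3, 9):
    print(n, relative_levels(room, n))
```

                Because the gain compounds on every pass, a band that the room favours only slightly comes to dominate after a handful of renditions, while everything else, including the detail that carries the words, falls away.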

                As each further rendition is heard the sense of the text disappears. The cascading effect of the resonances first causes distinct pitches to be heard; these create motifs, which then metamorphose into a musical composition. Table 3.3 below looks in more detail at each section, taking each rendition of the text as a section. References to pitches are approximate.

Table 3.3 Timeline of perceived changes in I am sitting in a room.

| Rendition | Perceived changes |
| Rendition 1 | Normal text reading is recorded. |
| Rendition 2 | Reverberation of the room becomes apparent and the mid-range frequencies of the voice are accentuated. The sound of Lucier's sibilants is also accentuated. |
| Rendition 3 | Room reverberation increases creating an impression of distance; it is now a distinctive part of the sound of the voice. A single pitch is now heard, triggered by the voice, which creates a harmony for the reading. |
| Rendition 4 | The background noise becomes a feature in the piece. The text is becoming obscured but is intelligible. More pitches are heard, still obviously triggered by the voice, which create a melodic motif, or "accompaniment", around the voice. |
| Rendition 5 | Text is barely intelligible; whatever remaining intelligibility there is may be a result of previous exposure to the text. The intonational pitch changes of the reading now more obviously affect the melodic pitch changes of the "accompaniment". |
| Rendition 6 | The background noise is now a prominent feature and appears to have a number of distinct pitches, forming a non-tempered cluster around C# 5. (F# and C# seem to be the most resonant pitches of the room.) The "accompaniment" is now the main feature of the piece; the text serves as part of the timbre of the "notes" of the "accompaniment". |

From this point on it becomes difficult to separate the sections one from the other. Between sections six and seven there is a tape glitch or break which defines the beginning of the new section.

| Rendition 7 | The background noise now has two distinct pitches which it glides between regularly. The text is completely obscured but the rhythm of the reading continues to drive the piece. Three distinct parts now run simultaneously: the background noise, the "accompaniment", and an adjunct to the "accompaniment" that follows it but seems to have a different motif. |
| Rendition 8 and Rendition 9 | From here there is a general smoothing of the overall sound of the piece. The text has degraded to the point of being unintelligible and it is now difficult to hear even the intonational aspects, which are now obscured as the piece takes on all the aspects of music and loses all aspects of speech. |

                Alvin Lucier uses repetition as the main structural element in I am sitting in a room. He uses the many and changing resonances that are produced within a room by the intonational aspects of his text reading to create music from speech. This transformation of speech into music using resonance is reflected in Someone. Here forty artificial rooms have been created for the whole text and text segment to resonate in. As in I am sitting in a room the pitch and duration of each resonance is affected by the intonational aspects of the reading, creating melodies and harmonies which transform the text reading into a musical piece. In Someone and I am sitting in a room the resonances that we hear created by the voice are amplified through repetition.

                The difference in Someone is in the characteristics of the artificial rooms. The dimensions of each room change as the intonational aspects (rhythm, timbre, pitch and amplitude) of the reading change, thus changing the qualities of each resonance and its effect on the reading. By doing this the reading becomes the only causal aspect in the piece. In I am sitting in a room the reading and the room have equal roles in creating the piece; in Someone the reading has become the only agent in creating the piece.

                I am sitting in a room influences Someone in two ways: first, in the use of repetition of a long phrase (in the case of Someone the whole poem is repeated); and second, in the use of the intonational aspects of the text reading as the trigger to alter the sound of both the whole poem and the segment of the poem. This process is discussed later in this chapter under the heading "The computer and composition processes used in Someone".

Enhanced voice: Smalltalk and Late August

                Paul Lansky's Smalltalk and Late August also alter the voice so that it becomes unintelligible in a lexical sense. According to Lansky:

Conversation [has the] ability to change its nature when one no longer concentrates on the meaning of the words. [What is heard is the] intonations, rhythms and contours of the speech.[11]

                Smalltalk is based on a recording of a domestic conversation between Lansky and his wife, Hannah MacKay. The recording was treated on a DEC MicroVAX II running software Lansky wrote for the project. This resulted in obscuring "the words we spoke while capturing the rhythms, pitches and contours of our conversation"[12].

                Late August resulted from Lansky wondering

what would happen if I tried the same sort of thing with another language, say Chinese, in which pitch and contour have different meanings. [The result] is similar to Smalltalk on the surface, but quite different in substance the sound world of the Chinese language led to a very different kind of music[13].

                While the music may be of a different substance and kind, this surface similarity can make distinguishing the two pieces difficult, especially on the first few hearings. This may be due to Lansky's use of conversational rather than highly structured and stylised text, such as a poem. His desire to create tonal centres that do not appear to be based in the pitch field of the voices also obscures the difference between the substance of Smalltalk and Late August.

                Table 3.4 describes the perceived changes over time in both Smalltalk and Late August. In this description of the changes over time in both pieces it is not essential that any of the textual foreground be mentioned. This aspect of the pieces is subject to and obscured by the effects that Lansky applies to it. To include a more detailed description of the foreground melody or harmonic changes would also obscure the description of Lansky's large scale structural composition of the pieces.

Table 3.4 Timeline comparing perceived changes in Smalltalk and Late August.

| Smalltalk: Time | Smalltalk: Perceived changes | Late August: Time | Late August: Perceived changes |
| 0'0" | Melodic foreground and middle tessitura. | 0'0" | Melodic foreground with high, wide tessitura. The accompaniment is based on aspirant-like sounds. |
| 38" | Faint single note accompaniment background. This accompaniment is based on back vowel-like sounds[14]. | | |
| 1'13" | Accompaniment background becomes lower in pitch. | | |
| 1'22" | Return to original accompaniment pitch. | | |
| 1'35" | Accompaniment becomes more dense. | | |
| 1'45" | Accompaniment background an octave lower. | | |
| 2'07" | Complete change in background, moves to a different scale/key. | | |
| 2'35" | Background accompaniment increases in activity. | | |
| | | 3'02" | Large change in accompaniment. |
| 3'29" | Return to original background accompaniment harmony/scale. | | |
| 3'54" | Accompaniment shift. | | |
| | | 4'02" | Change in accompaniment harmony. |
| | | 4'36" | Increase in foreground and background activity. |
| | | 4'51" | Accompaniment lowers in tessitura. |
| | | 5'08" | Accompaniment raises in tessitura. |
| 5'19" | New accompaniment harmony. | | |
| | | 6'18" | Accompaniment change. |
| 6'32" | Loud background accompaniment, increased harmonic motion. | | |
| 7'50" | Background tessitura raised, the effect of the panning becomes less pronounced. This could be due to the ear getting accustomed to the panning activity. | | |
| | | 8'17" | Accompaniment drops out. |
| 10'0" | Changed accompaniment background as if moving to a different key. | | |
| | Gradual fade out begins. | | |
| | | 11'18" | Change in accompaniment harmony. |
| 12'44" | End. | | |
| | | 13'45" | End. |

                Lansky uses a long segment of improvised text as the structuring element for both Smalltalk and Late August. In each case natural speech is altered through an imposed and entirely computer dependent process to produce the audible surface.

                The techniques used in this process appear to be mostly comb filters with short feedback times and synthetic or sampled voice-like sounds. The comb filters are tuned around the frequencies of the speech used, and their tuning is reflected in the background harmonic drone.

                Both Late August and Smalltalk use quartal or triad based harmonic sequences in the voice-like background drone. By using these fairly traditional and well understood processes Lansky has been able to present an unusual, and perhaps challenging, idea in a better known and less challenging context. This makes the more challenging aspects of the two compositions easier to digest.
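
                For readers unfamiliar with the device, a feedback comb filter can be sketched in a few lines of Python. This is the generic textbook form and is not Lansky's software; the function names and the delay and feedback values used with them are assumptions made for illustration.

```python
def feedback_comb(signal, delay_samples, feedback):
    """Feedback comb filter: y[n] = x[n] + feedback * y[n - delay_samples]."""
    out = []
    for n, x in enumerate(signal):
        # Before the delay line has filled there is nothing to feed back.
        delayed = out[n - delay_samples] if n >= delay_samples else 0.0
        out.append(x + feedback * delayed)
    return out


def comb_fundamental_hz(sample_rate, delay_samples):
    """Lowest non-zero resonant frequency; further peaks fall at its whole-number multiples."""
    return sample_rate / delay_samples


# A single click fed through the filter returns as a decaying train of echoes.
print(feedback_comb([1.0, 0, 0, 0, 0, 0, 0], delay_samples=3, feedback=0.5))
print(comb_fundamental_hz(44100, 100))
```

                A short delay gives a high fundamental, so speech passed through such a filter tends to ring at a definite pitch whenever its energy excites the filter.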

Composition of the piece Someone

                Some of the techniques used by Lansky are also used in Someone. In Someone sets of comb filters driven by the spoken text are used to create a harmonic, pitch based foreground. In this foreground the text is obscured, though not as heavily as in Lansky's two pieces; this is especially evident in Parts Five, Six, Seven and Eight where only the text is used.

                The composition methods and style of Someone are influenced by the methods and styles of the three voice- and text-based electro-acoustic compositions given above. Each of the three pieces offers a precedent which has been expanded upon in Someone exclusively in the digital domain.

                While it may not be immediately obvious how each of the three styles of electro-acoustic composition (the Direct Voice, Altered Voice and the Enhanced Voice) is used, their processes are either directly appropriated or used as a starting point from which the composition processes used in Someone are drawn.

The computer and composition processes used in Someone

                Someone uses a reading by Barry Dickins of his poem Saint Dymphna's Bells. The poem is Dickins' commentary on the last execution carried out in the state of Victoria, the execution of Ronald Ryan for the murder of a prison guard while attempting his escape. The reading is very expressive; the pitch, volume, timbre and tempo fluctuations enhance the sense of impotence, tragedy and perverse justice that informs the poem.

                The composition presented is an eight channel installation of indeterminate duration designed to create an aural environment for a transient or stationary audience. It is designed to be heard in a large space either as the main focus for the audience or as part of a performance or exhibition involving other art forms such as a dance performance, a video presentation, or a painting or sculpture exhibition. It is presented on four stereo compact discs which may also be listened to individually.

                The piece uses a ten second sample, the first four lines, of Dickins' reading to provide a setting in which the entire text can be heard. For large sections of the piece this setting is all that is heard, and therefore it functions both as a background and as a foreground.

                The text segment used is:

Someone rang Saint Dymphna's Bells,

Someone did.

At precisely eight in the country morning,

Someone did.

The entire text is given in Appendix 6.

                When looked at purely as a set of phonemes, that is, when only the sound of the text and not its connotative or denotative meaning is considered, the repetition of the four lines of the poem creates a phonemic motif around which the other phonemes are based. Table 3.5 gives a large scale structural analysis of the text segment used. In deciding on the sections A and B of this analysis, the rhythm and pitch contours of the reading are taken into account. These aspects are examined in Table 3.6.

Table 3.5 Large scale structural analysis of the text segment used.

A1              / Aextension

Someone rang / Saint Dymphna's Bells,

A2

Someone did.

B

At precisely eight in the country morning,

A2

Someone did.

                Table 3.6 shows a large scale description of the intonation characteristics of the text segment. Pitch curve is shown by the relative height of the line. Rhythm and elision are shown by the gaps in the line. A continuous, static rhythm is shown by a continuous line, as is the reader's use of elision. Broken speech is shown by a broken line.

Table 3.6 Intonational characteristics of the text segment used in Someone.

                The simple sonata-like form of this segment of the reading creates the sense of direction and return inherent within that form. As the text segment used has a major structural role it is essential that the segment be recognised to have the attributes of a well constructed musical phrase and that it be easily recognisable as a musical structure in itself. This is because repetitions of the phrase are fed through a set of continually changing signal processing algorithms, altering the surface sound of the phrase. The use of repetition within the phrase, the word Someone, gives the listener a particular sound within the phrase to become familiar with; as the repetition of the phrase continues this familiarity extends to the phrase itself.

                The repetitions in the text can be heard at up to twenty-eight different speeds at one time throughout the piece. As well as these differing speeds the text is heard harmonised in up to eight different ways and at an almost infinite set of pitch levels due to the twenty-eight possible glissandi speeds at which the text is iterated.

                As mentioned above, Someone was created using algorithms built with the IRCAM Signal Processing Workstation (ISPW) and then edited using Digidesign's ProTools™ and SoundDesigner™. The ISPW algorithms were created using the standard libraries available in release 0.24. These algorithms produced near final pieces; ProTools and SoundDesigner were used to perform topping and tailing and normalising duties.

                The ISPW algorithm can be divided into four sections or sub-algorithms: a granular process which acts mainly as a time stretching and glissando producing device; the Fast Fourier Transform process; two banks of comb filters, which act as a harmonising device; and a panning algorithm. The result of the first three sub-algorithms is finally fed through the panning algorithm. Figure 3.2 shows an overview of the main algorithm and the flow of sonic information through the sub-algorithms.

Figure 3.2 Overview of the main algorithm and the flow of sonic information through the sub-algorithms used in Someone.

The algorithms used for Someone and their functions

                Sound file playback

                Here the digitally recorded sound file of the poetry reading is repeated as a six minute loop. This allows a twenty second gap between each repetition of the sound file. These repetitions are occasionally heard throughout the piece, either after being filtered through the comb filters, in their natural state, or both. The sound file is also used to vary the amount of feedback of each of the two banks of comb filters, as discussed below in the section titled Comb filters.

                Granulation process

                Here the ten second samples of the sound file are played through seven playback algorithms in sixty millisecond grains. The beginning point of each sixty millisecond grain moves at varying speeds through the ten second sample. This means that the sample appears to be stretched or shortened, depending on the speed with which the starting point moves through the sample. The seven sample playback units, called samplePlay, play the ten second sample at different speeds and pitches, as discussed below.

                The speed at which each grain moves through the sample is set by dividing the total duration of the Part by increasing numbers from the Fibonacci series. Part One, for example, uses eight repetitions of the poem and has a duration of forty-eight minutes (2880 seconds) before it repeats. The time each samplePlay unit takes to move through the ten second sample is listed below. The samplePlay units are numbered according to the Fibonacci series.

                Describing the duration of each sample playback in Part One:

                samplePlay unit one will take the full 2880 seconds to play through the ten second sample;

                samplePlay unit two will take 1440 seconds to play through the sample (2880 divided by 2);

                samplePlay unit three will take 960 seconds to play through the sample (2880 divided by 3);

                samplePlay unit five will take 576 seconds to play through the sample (2880 divided by 5);

                samplePlay unit eight will take 360 seconds to play through the sample (2880 divided by 8);

                samplePlay unit thirteen will take 221.538[15] seconds to play through the sample (2880 divided by 13);

                samplePlay unit twenty one will take 137.142 seconds to play through the sample (2880 divided by 21).

This list uses Part One as an example; the process of dividing the total duration of the Part by increasing numbers from the Fibonacci series is identical in each Part.
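                The division described above can be sketched in a few lines of Python. This is an illustration rather than the ISPW patch itself: the names FIBONACCI_DIVISORS, truncate3 and traversal_times are hypothetical, and the truncation to three decimal places follows the convention stated in footnote 15.

```python
import math

# One divisor per samplePlay unit, as listed in the text.
FIBONACCI_DIVISORS = [1, 2, 3, 5, 8, 13, 21]


def truncate3(value):
    """Truncate (not round) a value to three decimal places."""
    return math.floor(value * 1000) / 1000


def traversal_times(part_duration_s, divisors=FIBONACCI_DIVISORS):
    """Seconds each samplePlay unit takes to move through the ten second sample."""
    return {d: truncate3(part_duration_s / d) for d in divisors}


part_one = traversal_times(2880)
for unit, seconds in part_one.items():
    print(f"samplePlay unit {unit}: {seconds} s")
```

                Passing the total duration of another Part to traversal_times gives the corresponding figures for that Part.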

                Each grain of the sample is played back at speeds varying between 15 milliseconds and 240 milliseconds. Changing the speed of sample playback causes pitch shifts and glissandi to be heard; in this case the glissandi span four octaves. Figure 3.3 shows this process; the black playback window, or grain, loops continuously from its starting point.

Figure 3.3 Example of the motion of the 60 msec grain through the ten second sample window.
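                The relationship between the playback limits and the four octave span can be checked with a short calculation. The sketch below assumes conventional varispeed behaviour, in which a sixty millisecond grain replayed in a shorter time sounds higher; the text does not state which limit is the upper one, so the sign of the result is an assumption, and the name pitch_shift_octaves is hypothetical.

```python
import math

NOMINAL_GRAIN_MS = 60.0  # grain length given in the text


def pitch_shift_octaves(playback_ms, nominal_ms=NOMINAL_GRAIN_MS):
    """Pitch offset in octaves when a grain of nominal_ms is replayed in playback_ms.

    Assumes speed ratio = nominal / playback, so shorter playback means higher pitch.
    """
    return math.log2(nominal_ms / playback_ms)


fast_limit = pitch_shift_octaves(15)    # two octaves from nominal
slow_limit = pitch_shift_octaves(240)   # two octaves from nominal, opposite direction
span = abs(fast_limit - slow_limit)     # total width of the glissando range
print(fast_limit, slow_limit, span)
```

                Whatever the sign convention, the ratio of 240 to 15 is sixteen, which corresponds to the four octave width given in the text.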

                Glissandi process

                Each samplePlay unit reproduces the granulated sample with different glissandi. The width of the glissandi is four octaves, ranging from two octaves below to two octaves above the nominal pitch of the incoming granular signal. The speed of the glissandi is set by dividing the duration of each Part by a process similar to that used to set the overall length of each Part. For example: if the duration of the Part is 2880 seconds, as in Part One, the glissandi speed of each of the seven samplePlay units uses two adjacent numbers from the Fibonacci series to divide the total duration of the Part. This creates an elliptically shaped loop.

                Speed of the different glissandi used in Part One

                SamplePlay unit one does not use any glissandi; it maintains its nominal pitch throughout the Part.

                The glissandi process for samplePlay unit two divides the total duration of the Part by 3 and glides between the playback limits of 15 and 240. This results in an ascending glissando of two octaves. To return from 240 to 15 the total duration of the Part is divided by 2, resulting in a descending glissando of two octaves. In the case of Part One, for example, which lasts 48 minutes, samplePlay unit two takes 960 seconds to go from 15 to 240 and 1440 seconds to return from 240 to 15.

                The glissando for samplePlay unit three takes the total duration of the Part divided by 5 to go from 15 to 240 and divided by 3 to return.

                The glissando for samplePlay unit five takes the total duration of the Part divided by 8 to go from 15 to 240 and divided by 5 to return.

                The glissando for samplePlay unit eight takes the total duration of the Part divided by 13 to go from 15 to 240 and divided by 8 to return.

                The glissando for samplePlay unit thirteen takes the total duration of the Part divided by 21 to go from 15 to 240 and divided by 13 to return.

                The glissando for samplePlay unit twenty one takes the total duration of the Part divided by 34 to go from 15 to 240 and divided by 21 to return.
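                The pairing of adjacent Fibonacci numbers listed above can be expressed compactly. The following sketch is illustrative only; the names FIB and glissando_times are hypothetical. For each samplePlay unit it returns the time taken to go from 15 to 240 and the time taken to return.

```python
# Fibonacci numbers used by the samplePlay units, plus the next one (34).
FIB = [1, 2, 3, 5, 8, 13, 21, 34]


def glissando_times(part_duration_s):
    """Map each unit number to (outward_seconds, return_seconds).

    The outward leg divides the Part by the next Fibonacci number,
    the return leg by the unit's own number. Unit one has no glissando.
    """
    times = {}
    for unit, next_number in zip(FIB[1:-1], FIB[2:]):
        times[unit] = (part_duration_s / next_number, part_duration_s / unit)
    return times


part_one = glissando_times(2880)
for unit, (outward, back) in part_one.items():
    print(f"unit {unit}: {outward:.3f} s out, {back:.3f} s back")
```

                Because the outward and return legs always differ in length, each loop is asymmetrical, which is the elliptical shape referred to in the text.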

                FFT process

                "Fourier transformation can be used to associate a unique spectrum with any waveform. The spectrum shows, in effect, how to construct the analysed (sic) waveform out of a set of sinusoidal harmonics, each with a particular amplitude and phase"[16]. In this case the spectrum is represented by ten sets of sinusoidal harmonics, each of which is tuned to be sensitive to a distinct region of the spectrum of the reader's voice.

                The spectral motion of the poetry reading is sampled every fifteen milliseconds, and this sample is represented numerically by the sampeek~ object. As these numbers change according to the changing spectra of the poetry reading they set the amount of feedback each comb filter is allowed. The amount of feedback is then scaled to create ever changing overlaps in the pitch field created by each of the ten comb filters. This results in shifting harmonies, triggered by the voice of the reader, being heard.

                In this composition the Max fft~ object is used to analyse the changing spectra of the reader's voice. The resulting information is then used to control the bank of ten comb filters.
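                The control path from spectral analysis to comb filter feedback might be sketched as follows. This is an illustration only and not a reconstruction of the ISPW patch: a naive discrete Fourier transform stands in for the Max fft~ object, and the equal-width band split, the normalisation, the 0.95 ceiling and the names magnitude_spectrum and band_feedback are all assumptions.

```python
import cmath
import math


def magnitude_spectrum(frame):
    """Magnitudes of the first half of a naive DFT of the frame."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(frame)))
        for k in range(n // 2)
    ]


def band_feedback(frame, bands=10, max_feedback=0.95):
    """Map the energy in each of `bands` equal-width bands to a feedback amount.

    Assumption: the strongest band gets max_feedback, the rest scale linearly.
    """
    mags = magnitude_spectrum(frame)
    size = len(mags) // bands
    energies = [sum(mags[b * size:(b + 1) * size]) for b in range(bands)]
    peak = max(energies) or 1.0  # avoid dividing by zero on a silent frame
    return [max_feedback * e / peak for e in energies]


# A test frame: a sine wave that falls in DFT bin 3 of a 40 sample frame.
frame = [math.sin(2 * math.pi * 3 * i / 40) for i in range(40)]
feedback = band_feedback(frame)
print([round(f, 3) for f in feedback])
```

                In the piece this analysis is repeated every fifteen milliseconds, so the ten feedback amounts follow the reader's voice.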

                Comb filters

                A comb filter is "similar to a tape loop delay echo. As long as the feedback gain is less than [the amplitude of the signal] the impulse response consists of a series of repeating echoes that [change in inter-onset time and decrease in amplitude, or feedback]"[17].

                If the delay time of a comb filter is set so that its reciprocal falls within the range of audible pitch, say twenty Hertz to twenty kilohertz, an extra pitch produced by the delay can be heard along with the input signal. By altering the amount of feedback of the comb filter, the duration of the extra pitch can be altered. Here the feedback of the ten filters is changed dynamically by the FFT process given above.
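                The behaviour described in the quotation can be demonstrated with a textbook feedback comb filter. The sketch below is a generic difference equation, y[n] = x[n] + g·y[n - D], and is an assumption about the general technique rather than the ISPW object used in Someone; the name feedback_comb and its parameters are hypothetical.

```python
def feedback_comb(signal, delay_samples, feedback):
    """Feedback comb filter: y[n] = x[n] + feedback * y[n - delay_samples]."""
    if not abs(feedback) < 1:
        raise ValueError("feedback magnitude must be below 1 for a decaying response")
    out = []
    for n, x in enumerate(signal):
        # Before the first delay has elapsed there is nothing to feed back.
        delayed = out[n - delay_samples] if n >= delay_samples else 0.0
        out.append(x + feedback * delayed)
    return out


# A single impulse followed by silence shows the repeating, decaying echoes.
impulse = [1.0] + [0.0] * 9
echoes = feedback_comb(impulse, delay_samples=3, feedback=0.5)
print(echoes)
```

                With a delay of three samples and a feedback of 0.5, the impulse returns every three samples at half its previous amplitude. Raising the feedback towards 1 makes the echoes, and therefore the added pitch, last longer.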

                The ten filters are divided into two banks of five; each bank has a seed value to set its resonance, and this value remains static for the duration of its respective Part. This seed value is multiplied by a floating point number derived by reversing the ratios used to build a scale in the Pythagorean tuning system[18]. For example: if the seed value for one bank is 6 and is multiplied by 0.75, the first filter resonates at 6 milliseconds (166.666 Hz), the second at 4.5 milliseconds (222.222 Hz), the third at 3.375 milliseconds (296.296 Hz), the fourth at 2.53125 milliseconds (395.061 Hz) and the fifth at 1.8984375 milliseconds (526.748 Hz). This results in a set of stacked fourths.
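                The stack of delay times and the frequencies they produce can be reproduced as follows. The sketch is illustrative; the name comb_stack is hypothetical, and the frequency is taken simply as the reciprocal of the delay time.

```python
def comb_stack(seed_ms, ratio, count=5):
    """Return (delay_ms, frequency_hz) pairs for one bank of comb filters.

    Each delay is the previous one multiplied by `ratio`;
    the resonant frequency is 1000 / delay_ms.
    """
    delays = [seed_ms * ratio ** k for k in range(count)]
    return [(delay, 1000.0 / delay) for delay in delays]


fourths = comb_stack(seed_ms=6, ratio=0.75)
for delay_ms, freq_hz in fourths:
    print(f"{delay_ms} ms -> {freq_hz:.3f} Hz")
```

                Multiplying the delay by 0.75 multiplies the frequency by 4/3, the ratio of a perfect fourth in Pythagorean tuning, which is why the five filters form a stack of fourths.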

                The tuning of each stack is set to oscillate 0.001 either side of the multiplication number used to create the proper interval for the stack. For example, in the case of the stacked fourths mentioned above, the tuning of each filter in the stack oscillates between 0.749 and 0.751. By moving in and out of tune, additional harmonies and melodies are created by the beat frequencies that occur; these often sound like sine tones.

                The speed of the oscillation around the interval is set by the panning process, which is defined below. For example: in Part One, where there are stacked out of tune fourths an octave apart, the tuning takes the total time of the Part divided by 55 to travel from beneath the perfect fourth to above it and the total time divided by 34 to return.

                The speed of the oscillations is set by the overall length of the Part. In Part One, for example, one bank takes approximately 84.705 seconds to travel from 0.749 to 0.751 and approximately 52.363 seconds to return; the other bank reverses this, taking 52.363 seconds to travel from 0.749 to 0.751 and 84.705 seconds to return.
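                The asymmetrical tuning oscillation can be modelled as a repeating ramp. The text gives only the end points and the travel times, so the straight-line interpolation used here is an assumption, as is the name tuning_multiplier. The default divisors of 34 and 55 describe the first bank in Part One; swapping them describes the other bank.

```python
def tuning_multiplier(t, total_s, up_div=34, down_div=55, low=0.749, high=0.751):
    """Tuning multiplier at time t (seconds) for one comb filter bank.

    Rises from low to high over total_s / up_div seconds, then falls
    back over total_s / down_div seconds, and repeats.
    Assumption: both legs are linear ramps.
    """
    up_s = total_s / up_div
    down_s = total_s / down_div
    phase = t % (up_s + down_s)
    if phase < up_s:
        return low + (high - low) * phase / up_s
    return high - (high - low) * (phase - up_s) / down_s


PART_ONE_S = 2880
rise_s = PART_ONE_S / 34   # about 84.705 seconds
fall_s = PART_ONE_S / 55   # about 52.363 seconds
print(tuning_multiplier(0, PART_ONE_S))
print(tuning_multiplier(rise_s / 2, PART_ONE_S))
print(tuning_multiplier(rise_s, PART_ONE_S))
```

                Half way through the rising leg the multiplier passes through 0.75, the point at which the stack is momentarily in tune.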

                Panning

                The audio signal from each bank of comb filters moves in the stereo space. This panning process uses elliptically shaped loops based on Fibonacci divisions of the overall duration of each of the nine Parts used in the installation. The panning of one signal from the comb filter banks across the stereo spread takes the total duration of the Part divided by 55 to get from one side to the other and total duration divided by 34 to return. The signal from the other bank of comb filters follows the same panning motion but is delayed by the total duration of the Part divided by 89.

                For example: in Part Four, which has a total duration of 727.992 seconds, the signal from one comb filter bank takes 13.236 seconds (727.992 divided by 55) to move from one side of the stereo field to the other and 21.411 seconds (727.992 divided by 34) to return. This same movement is delayed by 8.179 seconds (727.992 divided by 89) for the signal coming from the other filter bank.
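                The three panning times follow directly from the divisors 55, 34 and 89 given above. The sketch below simply carries out those divisions for a Part of a given duration; the names truncate3 and panning_times are hypothetical, and values are truncated to three decimal places as in footnote 15.

```python
import math


def truncate3(value):
    """Truncate (not round) a value to three decimal places."""
    return math.floor(value * 1000) / 1000


def panning_times(part_duration_s):
    """Panning times in seconds for one Part, from the divisors in the text."""
    return {
        "across": truncate3(part_duration_s / 55),  # one side to the other
        "back": truncate3(part_duration_s / 34),    # return journey
        "delay": truncate3(part_duration_s / 89),   # offset of the second bank
    }


part_four = panning_times(727.992)
print(part_four)
```

                The same function applied to the 2880 seconds of Part One gives the 52.363 and 84.705 second figures quoted for the tuning oscillation, since both processes share the divisors 55 and 34.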

Construction of the Nine Parts

                Someone is made up of nine Parts, which are stored across four compact discs. The number of repetitions of the text in each Part is based on the Fibonacci series:

                Part One lasts forty-eight minutes and is made up of eight repetitions of the poem. The actual length of the reading is five minutes and fifty-one seconds; adding the extra nine seconds allows time to delineate the repetitions with a short period of less activity. The comb filter banks are tuned to be an octave apart, the seed values for each bank being set to six and twelve and then multiplied by a number between 0.749 and 0.751 to produce a stack of intervals of slightly out of tune fourths. The information under the heading Comb filters above gives a more detailed description of the process used in creating the stacks; a similar process is used in all the other parts except Part Nine.

                Part Two has five repetitions with the comb filter bank seed values set to 7.992[19] and 5.322, which produce intervals of a fifth from the frequency of the seed value of twelve used in Part One, and multiplied by a number between 0.665 and 0.667 to produce a stack of intervals of slightly out of tune fifths.

                Part Three has three repetitions and lasts eighteen minutes. The comb filter bank seed values are set to 10.125 and 8.542, producing intervals of a minor third from the original seed value of twelve. Each bank is multiplied by a number between 0.842 and 0.844, producing a stack of intervals of slightly out of tune minor thirds.

                Part Four has two repetitions with the comb filter bank seed values set to 7.593 and 4.804, which produce intervals of a minor sixth from the seed value of twelve, and multiplied by a number between 0.631 and 0.633 to produce a stack of intervals of slightly out of tune minor sixths.

                Part Five lasts six minutes and is one rendition of the poem. This rendition is fed through the same set of comb filter banks as Part One. In this Part, as in Parts Six, Seven and Eight, the text reading and the granulation of the text segment for Part One are fed through the comb filters.

                Part Six also lasts six minutes and is one rendition of the poem. This rendition uses the same comb filter tuning as Part Two. The text reading and the granulation of the text segment for Part Two are fed through the comb filters.

                Part Seven lasts six minutes, is one rendition of the poem and uses the same comb filter tuning as Part Three. The text reading and the granulation of the text segment for Part Three are fed through the comb filters.

                Part Eight lasts six minutes, is one rendition of the poem and uses the same comb filter tuning as Part Four. The text reading and the granulation of the text segment for Part Four are fed through the comb filters.

                Part Nine lasts six minutes and is one rendition of the poem without any adjustments or modifications: it is the actual recording of the poetry reading.

                On the four compact discs which make up Someone, Parts One to Four are each represented twice, Parts Five to Eight once each, and Part Nine on every disc:

                Disc one, channels one and two, contains Part One, Part Four, Part Five and Part Nine;

                Disc two, channels three and four, contains Part Two, Part Three, Part Six and Part Nine;

                Disc three, channels five and six, contains Part Three, Part Two, Part Seven and Part Nine;

                Disc four, channels seven and eight, contains Part Four, Part One, Part Eight and Part Nine.
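                The disc layout above can be written down as a small table and checked mechanically. The sketch is only a restatement of the list; the names DISCS and appearances are hypothetical.

```python
from collections import Counter

# Parts on each disc, in the order given in the text.
# Disc n carries channels 2n - 1 and 2n.
DISCS = {
    1: ["One", "Four", "Five", "Nine"],
    2: ["Two", "Three", "Six", "Nine"],
    3: ["Three", "Two", "Seven", "Nine"],
    4: ["Four", "One", "Eight", "Nine"],
}

# Count how many discs each Part appears on.
appearances = Counter(part for parts in DISCS.values() for part in parts)
for part, count in appearances.items():
    print(f"Part {part}: {count} disc(s)")
```

                Counting the entries shows that Parts One to Four each appear on two discs, Parts Five to Eight on one disc each, and Part Nine on all four.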

                The normal performance of Someone is as an installation of indeterminate duration, using the four compact discs as prescribed in the scheme above. A submitted compact disc, Compact Disc Two, provides a fifteen minute study made up of a mix of all the Parts of Someone. This is the version of the piece discussed in the evaluation.

                Each Part on each compact disc is a stereo rendition of Someone and can be listened to as an individual stereo version of the piece. If this is the listening choice then it should be listened to as a part of the aural environment. This approach to composition follows in the tradition of "ambient" music such as Music for Airports, composed by Brian Eno, and Vexations by Erik Satie.

                While Someone is not necessarily designed to be listened to as the sole focus of the listener, excerpts of it can be used in a more traditional concert setting. If it is presented in this way then any of the Parts can be used, as can sections of any Part. It is also possible to present any number of Parts in a traditional concert setting. In this case any number of Parts can be selected, either randomly or intentionally, and mixed together to be performed simultaneously through whatever sound system is available for the concert.

                When Someone is presented to an audience as an installation, each of the stereo channels should be regarded as a separate and distinct mono part, even though there are definite stereo relationships within each Part. Figure 3.4 shows the placement of each speaker in a rectangular room. The speakers should be placed as far apart from each other as possible, and the amplitude for each channel should be equal and set to such a level that the two nearest speakers can be easily heard but are not so loud as to drown out conversation.

Figure 3.4 Speaker placement for Someone.

                It is not necessary for Someone to be presented in a square or rectangular room; however, it is essential that the speakers for channels one and two, and three and four, be in the corners of the room and as far apart as possible. This will create a diagonal movement of the sound. The speakers for channels five and six must be as far from each other as possible and in the centre of the walls of the room; the same applies to the speakers for channels seven and eight. In this case the stereo movement of the sound should create a circular motion. It is important that the eight channels be balanced so that a listener standing in the centre of the room hears each channel at an equal amplitude.

                If Someone is presented in a number of rooms then it is important that each of the channels be grouped together as close to their numerical order as possible. For example: channels one, two and three in one room, channels four and five in another room, and channels six, seven and eight in another room.

                The Parts on each compact disc can be selected randomly or played in the order in which they appear on the compact discs. If the Parts are played in order the compact disc players should be set to continually repeat for the duration of the installation, otherwise they should be set to play randomly for the duration of the installation.

EVALUATION

                An evaluation of Someone must examine three separate and distinct areas: first, how effective this process of creating a piece of music from a text reading is; second, how effective the result is in reflecting the content of the text; and third, the effect of its presentation as an installation. The second area is not essential to the goal of the thesis presented here, the creation of a musical composition from a text, but is important as the intentions of the poem are influential in the composition processes taken. An obvious example of this influence, from a compositional point of view, is the decision to make the poem audible in one section of Someone, Part Nine of the composition. Also, the content of the text is of great influence in Dickins' reading. His intonations are generated by the mood of this content and it is these intonations that drive both the underlying, structural aspects and the audible surface of Someone.

                Because Someone is presented as an installation the role of the listener in their perception of the piece must also be accounted for. The listener is able to move about within the audio area and thereby to select, at first somewhat randomly and then by either conscious or unconscious design, the Parts of the piece they hear. In doing so they create their own personal mix of Someone.

                The rendition of Someone presented on compact disc two is discussed for this evaluation. This rendition attempts to simulate a listener's experience of the piece. Each of the Parts is mixed together in such a way as to simulate the movement of a listener through a virtual room in which Someone is being performed.

                The harmonies that are created in Someone vary in colour and emotional impact, from the dark minor thirds and minor sixths to the lighter fourths and fifths. These intervals were chosen to exemplify the opposing emotional states of the poem, the horror of murder and execution and the compassion shown by the anonymous ringing of Saint Dymphna's bells. The tuning shifts of each of the intervals also create other ancillary harmonies; these are repercussions of the overall harmonic action.

                The listener can move to a place where the mix of the Parts suits them best. This is where Someone's success lies, in the ability of the listener to create their own relationship with the piece and its content.

 

 



[1] Rolf Carlson, Anders Friberg, Lars Frydén, Björn Granström and Johan Sundberg, 'Music and speech performance: Parallels and contrasts.' Contemporary Music Review 4, 1989, pp. 391-404.

[2] John Cage, Sixty-two Mesostics re Merce Cunningham for Voice Unaccompanied using microphone. Hat Hut Records Ltd, 1991.

[3] Steve Reich, 'Come Out.' Music Of Our Time: New Sounds In Electronic Music. Producer David Behrman. CBS, Inc., 1972.

[4] Alvin Lucier, 'I Am Sitting in a Room.' Source record number three. Source, 4, 7. BMI, 1970.

[5] Paul Lansky, 'Smalltalk.' Smalltalk. New Albion Records, 1990.

[6] Reich, op. cit., liner notes.

[7] Steve Reich, 'Come Out.' Early Works. Nonesuch Records, 1987. Liner notes, page 2.

[8] Richard Boulanger, The Transformation of Speech Into Music: A Physical Exploration and Interpretation of Two Recent Digital Filtering Techniques. PhD thesis, UCSD, 19.4.85, p. 31.

[9] Lucier, op. cit., p. 60.

[10] Alvin Lucier, 'I Am Sitting In A Room.' I Am Sitting In A Room, Lovely Music, 1990, Liner notes.

[11] Lansky, op. cit., liner notes, p. 1.

[12] ibid., p.1.

[13] ibid., p.1.

[14] A back vowel is produced in the back of the vocal tract, nearest the glottis.

[15] Frequencies and durations are taken to three decimal places.

[16] F. Richard Moore, Elements of Computer Music. Englewood Cliffs, New Jersey, Prentice-Hall, 1990, p. 29.

[17] ibid., p. 381.

[18] Don Randel, Harvard Concise Dictionary of Music. Cambridge, Mass., Belknap Press, 1978, p. 238.

[19] Again the frequencies are given to three decimal places. As the tuning of each bank is continually shifting, the actual frequency is not critical.