Exploded sounds: spatialised partials in two recent multi-channel installations

Abstract

I discuss two recent sound installations that both explore a spectral sound diffusion technique based on partial tracking that allows individual partials of a sampled sound to occupy individual locations in space. The two installations, The Exploded Sound (60 channels) and Significant Birds (12 channels), use similar techniques and modes of presentation to different ends. The former creates an essentially static listening environment, in which the listeners’ movements in space allow them to explore the inner structure of sound, while the latter focuses on the aural illusion that results from this approach, presenting decomposed speech as electronic “birdsong”, which is reconstructed by the brain into intelligible words. I will discuss technical aspects of the approach alongside the aesthetic aims, describing the research process and some conclusions that have been drawn from experiencing the works in situ. The discussion locates the work within my wider research on navigable sonic structures.


Introduction

In this paper I discuss two recent sound installations that both explore a spectral sound diffusion technique based on partial tracking that allows individual partials of a sampled sound to occupy individual locations in space. The two installations, The Exploded Sound (60 channels), presented at the Jacopic Gallery in Ljubljana as part of ICMC 2012, and Significant Birds (12 channels), first presented at the Science Gallery, Trinity College, Dublin as part of their ILLUSION exhibition, use similar techniques and modes of presentation to different ends. The former creates an essentially static listening environment, in which the listeners’ movements in space allow them to explore the inner structure of sound, while the latter focuses on the aural illusion that results from this approach, presenting decomposed speech as electronic “birdsong”, which is reconstructed by the brain into intelligible words.

I will discuss technical aspects of the installations alongside the aesthetic aims, describing the research process and some conclusions that have been drawn from experiencing the works in situ. The discussion locates the work within my wider research on navigable sonic structures.

Background and impetus

In 1998 I created a site-specific sound installation at the Kew Bridge Steam Museum in London [1] in which the processed sounds of steam engines were presented in a dialogue with their live counterparts in the space. The primary aesthetic driving force for this composition was the dialogue between ambiguous transformed materials and the very obvious presence of the original sound sources in the space. At one moment in this piece an engine sound was gradually built up by introducing groups of partials located in four of the eight speakers, starting with a single partial, adding two more in a different location, and then adding ever larger groups of partials on each iteration of the sound, always in a different corner of the room. This was achieved quite simply by successively un-muting different groups of FFT bins in each speaker, using the Spectral Assistant in Tom Erbe’s SoundHack. [2]

A similar process is described by Robert Normandeau, who has developed the technique in numerous works in the last decade under the term “spectral diffusion”, an extreme variant of the spectral panning now available in plug-ins such as iZotope’s Spectron. [3] The effect is frequently achieved through band-pass filtering or FFT resynthesis techniques, sending different frequency ranges to discrete channels. The installations discussed in this article can be seen as a specialised form of the spectral diffusion technique in which individual partials are spatialised as point sources.

Topper et al. have examined the control of individual partials using physical models of singing bowls. Many of their findings are confirmed by my work on The Exploded Sound, which implements partial tracking analysis/re-synthesis techniques to extend the idea to sampled sounds. It was observed, for example, that the ambiguity of the spatial information increased as the source of the sound became recognisable. The temporal coherence or “common fate” of the partials leads the listener to perceive a single source while at the same time the diffusion of individual partials inhibits our ability to localise the sound. This inability to locate a single sound source, coupled with the equally strong drive to perceive the individual partials as a perceptually fused entity, became one of the most important features of The Exploded Sound installation and later led me to address the idea explicitly in Significant Birds.

In addition to my earlier experience of spectral diffusion in Living Steam, this research was greatly inspired by the visual image of Cornelia Parker’s installation Cold Dark Matter: An Exploded View (1991). [4] In this installation, the fragments of a shed, which has been blown apart by a violent explosion, are suspended in space to reflect their positions at a single point in time just moments after the explosion. I wanted similarly to freeze a single moment of sound in space, allowing listeners to explore its inner structure of partials by walking among the fragments of The Exploded Sound.

Navigating sonic structures

My desire to walk around inside a sound relates to many of the ideas I have been exploring in my installation work over the past 15 years. I have argued that the development of sound installation practice itself reflects a fundamental shift in the conceptual metaphors through which we understand musical time, from one in which music is seen as moving past a stationary observer to one in which music is conceived of as a fixed landscape that is explored by a moving subject. I suggest that the latter perspective, which Johnson and Larson identify as that of the analyst (while the former is more associated with the listener), came to dominate the musical discourse of the 1950s avant-garde, laying the foundation for many of the spatial practices of sound installation and locative media.

The modular and reconfigurable open form approaches to musical composition in the 1950s and 60s, which M.J. Grant suggests arose as a direct consequence of serial thought, seem to emerge naturally from a notion of music in which the listener is an active participant: the explorer of a musical landscape rather than the stationary observer of a musical journey undertaken by an unacknowledged protagonist embodied in the musical material itself. In the context of concert music this shift in perspective may provide a useful metaphor with which to conceptualise the works of this period, in which the teleology of tonality and classical musical narrative are absent; sound installation practice, however, might open these spatial structures up to direct embodied experience by allowing the physical exploration of musical space. My installation The Exploded Sound in turn attempts to extend the spatial exploration of musical structures by offering the possibility of navigating the structure of sound itself.

Early experiments

Preliminary experiments were carried out in 2010 and 2011 in an Ambisonic cube. As the project grew this approach was abandoned in favour of the multiple point sources of the final installations; however, much was learnt from these early experiments about the potential and psychoacoustics of spatialised partials, and the basic techniques were developed on this system.

The Ambisonic cube was selected for a number of reasons. The availability of a cube at the Lansdown Centre for Electronic Arts at Middlesex University was a practical consideration. The cube’s strong sense of a “bounded acoustic space,” which in itself reminded me of the Cornelia Parker piece, was also a driving force behind the idea: I am interested in the way that Ambisonics defines an area of acoustic activity that seems to have an identity even when encountered from outside the cube. On a more practical level, Ambisonics makes it possible to define an arbitrary number of separate spatial positions or trajectories without needing to define individual channels or tracks, thus keeping file sizes manageable. A four-channel file (WXYZ) can contain any number of separately moving partials; increasing the number of locations does not equate to increasing the number of channels. Finally, though this was not done, it would have been a trivial task to scale up to higher-order systems and more speakers, giving greater definition without having to completely recompose the pieces.
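This property of B-format can be sketched briefly. The following Python fragment (an illustration of the standard first-order encoding equations, not code from the installation) folds several sine-wave partials at arbitrary azimuths into a single four-channel WXYZ signal; adding a partial never adds a channel:

```python
import math

def encode_bformat(partials, dur=0.1, sr=8000):
    """Fold any number of sine partials into one first-order B-format
    (WXYZ) signal. Each partial is (freq_hz, amp, azimuth_rad,
    elevation_rad). A low sample rate keeps the sketch fast."""
    n = int(dur * sr)
    w, x, y, z = ([0.0] * n for _ in range(4))
    for freq, amp, az, el in partials:
        for i in range(n):
            s = amp * math.sin(2 * math.pi * freq * i / sr)
            w[i] += s / math.sqrt(2)                 # omnidirectional
            x[i] += s * math.cos(az) * math.cos(el)  # front-back
            y[i] += s * math.sin(az) * math.cos(el)  # left-right
            z[i] += s * math.sin(el)                 # up-down
    return w, x, y, z

# three partials of one sound, each at its own azimuth, in four channels
w, x, y, z = encode_bformat([(220.0, 0.3, 0.0, 0.0),
                             (440.0, 0.2, math.pi / 2, 0.0),
                             (660.0, 0.1, math.pi, 0.0)])
```

However many partials are passed in, the file stays four channels wide; the decoder for a given speaker array reconstructs the directions at playback time.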

For the tests in the cube, sounds were analysed using Spear, with its implementation of the partial tracking algorithm of McAulay and Quatieri, and re-synthesised in Csound for playback. This combination was also adopted in The Exploded Sound installation, with additional realtime control supplied by Max/MSP. [5] The use of partial tracking techniques allowed me to treat each partial as a single object with a fixed (or moving) location, rather than locating frequency bins from a raw FFT; such bins may be inactive for certain periods or contain only fragments of the moving partials of dynamic sounds – glissandi and vibrato for example. Ambisonics, meanwhile, allowed me to experiment with an unlimited number of locations without a massive diffusion system.
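The core of this kind of partial-based resynthesis is easy to sketch. The Python below is a minimal illustration (not the Csound used in the work): one sine oscillator per tracked partial, driven by breakpoint data, with phase accumulated so that glissandi and vibrato remain continuous:

```python
import math

def interp(bkpts, t):
    """Linear interpolation over (time, value) breakpoints; flat outside."""
    if t <= bkpts[0][0]:
        return bkpts[0][1]
    if t >= bkpts[-1][0]:
        return bkpts[-1][1]
    for (t0, v0), (t1, v1) in zip(bkpts, bkpts[1:]):
        if t0 <= t <= t1:
            return v0 + (v1 - v0) * (t - t0) / (t1 - t0)

def synth_partial(freqs, amps, dur, sr=8000):
    """Resynthesise one tracked partial from frequency and amplitude
    breakpoints. Phase is accumulated sample by sample, so frequency
    glides stay continuous rather than clicking between frames."""
    out, phase = [], 0.0
    for i in range(int(dur * sr)):
        t = i / sr
        phase += 2 * math.pi * interp(freqs, t) / sr
        out.append(interp(amps, t) * math.sin(phase))
    return out

# a partial gliding 200 -> 300 Hz while fading out over half a second
sig = synth_partial([(0.0, 200.0), (0.5, 300.0)],
                    [(0.0, 0.5), (0.5, 0.0)], dur=0.5)
```

A full resynthesis simply sums (or, in the installations, spatially distributes) many such oscillators, one per tracked partial.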

Because my goal to “freeze” a sound was already clear, time-frozen musical sounds with a broad but stable spectrum formed the basis for many of these investigations. A single note sung by a soprano proved very satisfying when broken down into just 32 of its strongest partials. Orchestral and ensemble chords from Morton Feldman’s Rothko Chapel and Handel’s Zadok the Priest also yielded interesting results, the latter being analysed into 300 partials, which were particularly effective when introduced singly, building the sound up gradually one partial at a time. The idea of introducing partials gradually and modulating their amplitudes dynamically became a feature of the work that added a temporal dimension to complement the spatial.

Analysis of the original file was done in Spear and the output saved as a textfile. Spear offers two formats for textfile output, one sorted by analysis frame and the other sorted by partial. Earlier experiments with realtime resynthesis in Max/MSP, using the frame-sorted data to drive a bank of oscillators, proved too processor-hungry on the machines available to me at the time to allow individual control of every partial. Instead a non-realtime approach was adopted in which the frequency and amplitude data for each partial were read into function tables in Csound and used to drive a simple sinewave oscillator.

Translation from Spear to a Csound score file was accomplished using a simple Java program, which creates pitch and amplitude function tables and instrument calls for each partial. The orchestra is thus very simple, consisting of a single instrument containing at its most basic a single oscillator. For this, Spear’s alternative textfile format was used. This describes each partial separately. The variable nature of the time stamping in this format means that, in addition to tables for pitch and amplitude, a separate function table containing delta-times was required, so that each partial could be resynthesised correctly.
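A minimal version of this translation step can be sketched as follows (the original tool was written in Java; Python is used here for brevity). The input layout assumed below follows Spear’s partial-sorted text export – a short header, then for each partial a line giving index, point count, start and end times, followed by a line of time–frequency–amplitude triples – and the table and instrument numbering is illustrative:

```python
def spear_to_csound(partial_text):
    """Turn Spear partial-sorted text data into a Csound score sketch.

    For each partial three tables are written - frequency, amplitude
    and delta-time (needed because Spear's time stamps are not evenly
    spaced) - plus one i-statement passing the table numbers.
    (Real Csound GEN02 tables may need padding to a power of two.)"""
    lines = [l for l in partial_text.strip().splitlines() if l.strip()]
    while lines and not lines[0][0].isdigit():   # skip header lines
        lines.pop(0)
    score, table = [], 1
    for head, data in zip(lines[::2], lines[1::2]):
        idx, npts, t_start, t_end = head.split()  # idx/npts unused here
        vals = [float(v) for v in data.split()]
        times, freqs, amps = vals[0::3], vals[1::3], vals[2::3]
        deltas = [t1 - t0 for t0, t1 in zip(times, times[1:])]
        n = len(times)
        score.append("f %d 0 %d -2 %s" % (table, n,
                     " ".join("%g" % f for f in freqs)))
        score.append("f %d 0 %d -2 %s" % (table + 1, n,
                     " ".join("%g" % a for a in amps)))
        score.append("f %d 0 %d -2 %s" % (table + 2, n - 1,
                     " ".join("%g" % d for d in deltas)))
        score.append("i1 %s %g %d %d %d" % (t_start,
                     float(t_end) - float(t_start),
                     table, table + 1, table + 2))
        table += 3
    return "\n".join(score)

demo = """par-text-partials-format
partials-count 1
partials-data
0 3 0.0 0.2
0.0 440.0 0.5 0.1 442.0 0.4 0.2 441.0 0.3
"""
score_text = spear_to_csound(demo)
```

The single Csound instrument then reads its three tables, using the delta-time table to step through the breakpoints at the correct rate.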

To achieve the time-freezing, the index to the function tables scans back and forth across a number of frames of data, allowing pitch fluctuations in the frozen segment (typically of between one and three seconds) to be heard. Inevitably there is a trade-off between the realism or recognisability of the sound and the effectiveness of the freeze. The more frames are used, the more recognisable the sound. However, using more frames increases the likelihood of repetitions of patterns of pitch fluctuation being perceived, giving a “loopy” result. In sounds with a large number of partials some success was had in reducing this loopiness by using different scan lengths for each partial. This gives a slightly less realistic rendering of the original sound because the partials are de-correlated; however, the source is still clearly recognisable and the sensation of looping is greatly reduced. In The Exploded Sound some sounds use this de-correlated approach while others are strictly synchronised. The de-correlated sounds add a shimmer to the overall effect that proves attractive.
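The scanning itself can be sketched as follows (an illustration of the principle, assuming one frame of data per partial per tick); giving each partial its own scan length is what de-correlates the partials:

```python
def pingpong(length):
    """Yield indices that sweep forward then backward over `length`
    frames forever (0, 1, ..., length-1, length-2, ..., 0, 1, ...)."""
    i, step = 0, 1
    while True:
        yield i
        if length > 1:
            if not 0 <= i + step < length:
                step = -step
            i += step

def frozen_scan(partial_frames, scan_lengths, n_out):
    """Freeze a sound: each partial sweeps back and forth over its own
    window of analysis frames. Equal scan_lengths keep the partials
    synchronised (more 'loopy'); unequal ones de-correlate them."""
    gens = [pingpong(length) for length in scan_lengths]
    return [[frames[next(g)] for frames, g in zip(partial_frames, gens)]
            for _ in range(n_out)]

# frame data here is just one frequency per frame (illustrative values)
freqs_a = [440.0, 441.0, 442.0, 443.0]
freqs_b = [660.0, 661.0, 662.0, 663.0]
hybrid = frozen_scan([freqs_a, freqs_b], scan_lengths=[4, 3], n_out=8)
```

With scan lengths of 4 and 3, the two partials drift out of step immediately, so no common repetition period is heard for many ticks.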

The most stable results were achieved when the majority of partials were fairly consistent over the scanned period; however, interesting effects were also achieved by highlighting inconsistencies. In the earliest experiments using a single soprano note the analysis file was reduced to 32 stable harmonics, which were present for the duration of the scan period. In later experiments, such as the Zadok chord, up to 300 partials were resynthesised, including those that had start or end points within the scan period. When isolated, these created rhythmic patterns, but they could be re-merged into the original sound to interesting effect. Listeners reported that this experience was closer to the visual inspiration of the Cornelia Parker exploded shed, presumably because the partials appear more fragmented when not all are present.

The experiments carried out in the Ambisonic cube confirmed the findings of Topper et al. with synthesised sound, that rotation reduces the tendency to interpret the sound as a single fused entity and increases the awareness of individual partials and their locations. Static positioning of the partials tended to cause sounds to be perceived simply as a single, if spatially ambiguous whole and was in itself sonically rather uninteresting. While this stasis was a problem in the cube, it became a central feature of the later installations in which it was possible for the listener to physically move around and thus impart the necessary perspectival shifts through dynamic exploration of the space rather than through virtual movement of the sound itself. Setting the partials in motion allows the brain to separate them into individual units while still retaining a sense of the whole. In the early cube experiments this was mostly achieved through wholesale rotation of the soundfield. In the installations, the movement is physical and belongs to the listener, allowing a genuine exploration of the sonic structure.

Experiments in the Ambisonic cube were encouraging, but a number of limitations of the system led me to seek alternative ways of presenting these sounds. First, the lack of spatial precision of first-order Ambisonics was noted. In my opinion first-order Ambisonics achieves a high level of spatiality – a representation, perhaps, of the complexity of real-world sonic spatial behaviour – but without great specificity in terms of location. The realism of soundfield recordings derives not from locating sounds precisely but from giving the impression that reflections are coming off many surfaces. For many purposes this is not a problem and suits the vagaries of the human auditory system. Sine waves, however, are particularly hard to localise, which was a specific issue in my experiments. Indeed, there seems to be something perverse about using phase relationships between eight speakers to localise individual sine waves, which effectively multiplies each simple component of the exploded sound into eight actual locations.

The Exploded Sound

Most importantly, however, I wanted to bring in the possibility of listener movement through the field of harmonics. I had made some experiments in which the partials were reproduced from a limited number of point sources. In one simple piece, entitled Harmonic Study, I simply assigned half the partials to one speaker and half to another. Even this was found to be surprisingly effective as small changes to the listening position revealed different nodes and interference patterns, which strongly affected the spectral balance.

The breakthrough came when I was introduced to the work of Jamie Campbell, who has developed a portable digital eight-channel amplifier that takes an ADAT input and outputs directly to eight speakers. Using a (now sadly discontinued) M-Audio ProFire Lightbridge interface it is possible to get up to 64 channels (on eight amplifiers) from a single Firewire port. This enabled me to design The Exploded Sound installation, which would run 60 channels of audio from a single Mac Mini computer.

The Exploded Sound was created for ICMC 2012/Earzoom Festival in Ljubljana and consists of 60 small loudspeakers hung at various heights in a roughly 5x4 metre area (see figure 1). In the current version of the piece each speaker is mounted in a plastic baffle and suspended by its own audio cable. Listeners walk among the speakers experiencing individual sine waves or, in some cases, short patterns consisting of two or three frequencies per speaker, while never losing awareness of the composite sound. Six chords were used in the final piece, three orchestral and three choral. These gradually replace each other one partial at a time according to an algorithmic system controlled by a Max/MSP patch, constantly creating new hybrids that gradually resolve onto the pure chords, before moving on.

Figure 1 Exploded Sound, Jacopic Gallery

As before, Spear was used for the analysis of the static sounds and Csound for the resynthesis. This allowed quite detailed work to be carried out in the graphic environment of Spear before the analysis was exported, selecting the partials that persist over longer periods of time and rejecting those that are too high or too quiet to contribute. Rather than control the overall evolution of the piece in Csound, however, 60-channel soundfiles of each of the frozen chords were created, which are played back over the system with realtime enveloping and amplitude modulation applied by Max/MSP.

Having put together this methodology and a hardware system capable of handling 60 channels of audio, two focused research periods at CRiSAP, LCC, in the University of the Arts London, were used to develop the Max patch and design the algorithms that control the temporal development of the material. At the end of the first period a small showing was organised at which I also played some speech through the system using a different technique. This was deemed particularly interesting by some listeners: because the source was so obviously a human voice, the principle of decomposition was particularly clearly observed. I personally still preferred the more obviously “beautiful” sound of the frozen chords, considering the speech resynthesis more of a scientific demonstration and the frozen chords the intended artwork. In the second period at CRiSAP, however, I decided the “scientific demonstration” would be useful as a way of helping people understand what they were hearing, so I devised a structure in which some text (the Radio 4 Shipping Forecast) occasionally emerged, using time shifting of individual partials to create transitions between a rippling texture and intelligible speech.
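The partial-at-a-time replacement of one chord by another can be illustrated with a short sketch (Python here for brevity; the installation used a Max/MSP patch, and the details below are hypothetical):

```python
import random

def chord_transition(current, target, rng=random):
    """Yield hybrid chords that resolve onto `target` by replacing one
    partial (i.e. one speaker's frequency) at a time, in random order."""
    chord = list(current)
    pending = [i for i in range(len(chord)) if chord[i] != target[i]]
    rng.shuffle(pending)
    for i in pending:
        chord[i] = target[i]
        yield list(chord)   # one new hybrid per replaced partial

# four speakers for brevity; the installation used 60
chord_a = [220.0, 330.0, 440.0, 550.0]
chord_b = [233.1, 349.2, 466.2, 587.3]
steps = list(chord_transition(chord_a, chord_b, random.Random(1)))
```

Each yielded list is a momentary hybrid of the two chords; the last is the pure target chord, at which point the system can move on to the next.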

The method used for this, which was explored more fully in the second installation, Significant Birds, was originally developed for a live performance by myself and the Mongolian Khoomii (overtone) singer Michael Ormiston on a 77-speaker soundwall at the Science Museum in London. The method used in Significant Birds is described below.

Significant Birds

The installation Significant Birds was created specifically for the exhibition ILLUSION, which ran at Dublin’s Science Gallery, a part of Trinity College, from July to September 2013 (see figure 2). A touring version of the exhibition is planned for 2014/15. The Science Gallery is a unique exhibition space that focuses on artistic projects that illustrate and communicate scientific ideas. The ILLUSION exhibition specifically addressed illusions as a means of elucidating the processes of human perception. I devised Significant Birds as a response to an open call, drawing specifically on what I had considered the more illustrative aspect of The Exploded Sound: the disintegration of human speech.

Figure 2 Significant Birds, ILLUSION

In Significant Birds, 12 loudspeakers in individual birdcages are suspended in the installation space. Each speaker produces a chirping sound. By manipulating amplitude and timing a human voice that is a composite of the individual “birds” is brought in and out of focus. The cages act as visual cues to one possible interpretation of the sounds, emphasising their separate natures, while the brain cannot help but reassemble the sounds into coherent speech. The idea of the birdcage came from another small installation I had made for Trinity Laban Conservatoire’s Out of the Cage festival in November 2012, entitled Bird:Cage, in which a single partial of John Cage’s voice was isolated and presented in a cage hanging from a very long rope in one of the baroque stairwells of the Royal Naval College, Greenwich (see figure 3).

Figure 3 Bird:Cage, Royal Naval College

For the speech material in both pieces (and Bird:Cage) I used Sigmund~ in Max/MSP to analyse a soundfile in real time, producing the sine wave partials directly. This technique would have been too unstable for the chords, as partials may be picked up and dropped and appear again in different tracks of the analysis. It is this very instability, coupled with the rapidly changing nature of the source sound, that gives each partial an uncanny resemblance to birdsong.

Significant Birds takes a unique approach to the demonstration of psychoacoustics. Visitors are presented with an aural illusion that allows them to become aware of their own perceptual processes. The sound they hear is created in their brains rather than in physical space. This can be verified by listening to each speaker and attempting to find the source of the speech – an impossible task. The brain is given conflicting stimuli: the spatial information provided by the numerous sound sources is overridden by the brain’s propensity to group the partials into a coherent whole. Speech in particular is almost impossible to hear as separate point sources, even when the number of partials is severely reduced. To retain some sense of the birds, the number of speakers was reduced to 12, and two techniques were used to make transitions between the perception of separate “birds” and coherent speech. The first was to gradually reduce the number of voices to one and then build the sound up again one cage at a time. The second was to delay each partial with respect to the others to create a wild chattering and then gradually bring them into phase, as I had done in The Exploded Sound. With the 60 speakers of The Exploded Sound this created rippling and sweeping textures, whereas with 12 it has the intended effect of creating an electronic aviary that gradually coalesces into recognisable speech. It is interesting to note that the speech remains intelligible even when quite significant delays are introduced (about 10ms between each partial, so with 12 channels 120ms between the most distant partials).
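The second device reduces to a per-channel delay ramp that is gradually flattened. A minimal sketch, using the delay step quoted above (the function name and the `progress` control are hypothetical, not taken from the installation patch):

```python
def dephase_offsets(n_channels=12, max_step=0.010, progress=0.0):
    """Per-channel onset delays (in seconds) for the aviary-to-speech
    transition. progress = 0.0: channel k is delayed by k * max_step,
    giving the chattering 'birds'; progress = 1.0: all offsets are
    zero, and the partials re-fuse into intelligible speech."""
    return [k * max_step * (1.0 - progress) for k in range(n_channels)]

fully_dephased = dephase_offsets(progress=0.0)  # wild chattering
in_sync = dephase_offsets(progress=1.0)         # coherent speech
```

Sweeping `progress` from 0 to 1 over several seconds is enough to stage the perceptual flip the installation is built around.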

Up close, or when the partials are de-correlated as just described, the spatial information becomes dominant and separate sounds (the chirping “birds”) are perceived. The work is designed to highlight the moment at which perception flips between the isolated partials and the coherent virtual gestalt. The cages give a visual cue to this alternative reading of the sound.

The text I used was an excerpt from Hermann von Helmholtz’s On the Sensations of Tone, which deals with the way the ear breaks down sound into individual partials and was based on Helmholtz’s own pioneering experiments synthesising vowel sounds with tuning forks.

One of the most striking features of the work is the way that, as in The Exploded Sound, the vocal source of the sound is clearly recognised but non-locatable because it emanates from multiple points. While this effect was planned and formed a major part of the illusion, it had an unintended consequence that requires further investigation. I described above how, when the partials are in phase, the ear is incapable of perceiving them separately and will always hear the voice. There does seem to be an exception to this rule. At the private viewing, at which visitors were conversing in the vicinity of the installation, it was very difficult to perceive the illusion. The “chirping” of the individual partials became dominant, while the interpretation of the sound as a voice was almost impossible. The “cocktail party effect”, by which we can isolate an individual voice in a crowd, appears not to work when that voice is spatialised in this way, suggesting that localisation of the source is more important to this effect than the recognition of timbral quality, for example. The lack of a single locatable source for the resynthesised voice prevents it from being isolated from the conversations that surround it. The presence of other voices causes a loss of intelligibility in the diffuse partials, which become individualised elements in the noisy room.

Conclusion

The two installations The Exploded Sound and Significant Birds use similar techniques of spectral diffusion, based on partial-tracked recordings, to quite different ends. The former focuses on the navigation of a sonic structure, giving access to the internal frequency detail of complex sounds. The behaviour of the listeners reflects this: they tend to walk slowly through the space, leaning in to hear the subtle sonic details, even picking up individual speakers and holding them to their ears. Significant Birds is more of a demonstration of a principle of perception: the fusion of complex stimuli into a perceived whole, often referred to as the binding problem. The installation stands as an analogy for the cochlear membrane, separating out the frequencies that make up a complex wave into individual components and thus highlighting the role of the brain in reconstituting the perceived whole.

References

Cage, J. (1978) Silence: Lectures and Writings, new edition. London: Marion Boyars.

Engel, A.K. and Singer, W. (2001) Temporal binding and the neural correlates of sensory awareness. Trends in Cognitive Sciences, 5(1), pp. 16–25.

Gann, K. (1996) The Outer Edge of Consonance. In Duckworth, R.F. and Fleming, R. (eds) Sound and Light: La Monte Young and Marian Zazeela. Lewisburg, Penn.: Bucknell University Press, pp. 153–194.

Grant, M.J. (2005) Serial Music, Serial Aesthetics: Compositional Theory in Post-War Europe. Cambridge: Cambridge University Press.

Helmholtz, H.L.F. (1954) On the Sensations of Tone as a Physiological Basis for the Theory of Music. New York: Dover.

Johnson, M.L. and Larson, S. (2003) “Something in the Way She Moves”: Metaphors of Musical Motion. Metaphor and Symbol, 18(2), pp. 63–84.

Klingbeil, M. (2009) Spectral Analysis, Editing, and Resynthesis: Methods and Applications. PhD, Columbia University.

McAulay, R.J. and Quatieri, T.F. (1986) Speech Analysis/Synthesis Based on A Sinusoidal Representation. IEEE Transactions on Acoustics, Speech, and Signal Processing, 34(4), pp. 744–754.

Normandeau, R. (2009) Timbre Spatialisation: The Medium is the Space. Organised Sound, 14(3), pp. 277–285.

Parry, N. et al. (2008) Locating Drama: A Demonstration of Location-Aware Audio Drama. In Spierling, U. and Szilas, N. (eds) Interactive Storytelling. Lecture Notes in Computer Science. Springer Berlin Heidelberg, pp. 41–43. Available at: http://link.springer.com/chapter/10.1007/978-3-540-89454-4_6 [Accessed 4 December 2013].

Parry, N. (2014) Navigating Sound: Locative and Translocational Approaches to Interactive Audio. In Collins, K., Kapralos, B. and Tessler, H. (eds) The Oxford Handbook of Interactive Audio. New York: Oxford University Press, pp. 31–44.

Parry, N. (2003) The Relocation of Concrete Music in the Environment, Boomtown and Living Steam: Two Site Specific Installations. In Enders, B. and Stange-Elbe, J. (eds) Global Village Global Brain Global Music, KlangArt- Kongress 1999. Osnabrück, Germany: Epos, pp. 370–381.

Puckette, M., Lippe, C. and Apel, T. (n.d.) Sigmund~ Sinusoidal Analysis and Pitch Tracking. Version 0.05 for Max/MSP 5 (Mac). Software application.

Shepard, R. (2001) In Cook, P.R. (ed.) Music, Cognition, and Computerized Sound: An Introduction to Psychoacoustics. Cambridge, Mass.: MIT Press, pp. 21–36.

Tittel, C. (2009) Sound Art as Sonification, and the Artistic Treatment of Features in our Surroundings. Organised Sound, 14(1), pp. 57–64.

Topper, D., Burtner, M. and Serafin, S. (2002) Spatio-operational Spectral (S.O.S.) Synthesis. In Proceedings of the 5th International Conference on Digital Audio Effects (DAFX-02). Hamburg, Germany.


Links

Significant Birds documentation: http://www.youtube.com/watch?v=vLlYExpYnSw

The Exploded Sound video and audio: http://www.nyeparry.com/exploded/

Binaural recording of The Exploded Sound: https://soundcloud.com/nyesonic/exploded-sound-binaural

The Exploded Sound 32-channel research show and tell video: http://www.youtube.com/watch?v=d11C3iXudPk

Stereo mockup demonstration of processes used in Significant Birds: https://soundcloud.com/nyesonic/significantbirdsdemo



About the Author:

Nye Parry has made sound installations for major museums in the UK including the National Maritime Museum, the Science Museum and the British Museum in addition to concert works and more than 20 contemporary dance scores. From 2011 to 2013 he was research fellow at CRiSAP (Centre for Research in Sound Arts Practice), University of the Arts, London, during which time he developed the installations “The Exploded Sound” and “Significant Birds” for the Science Gallery Dublin. The exhibition is currently touring museums around the world including in San Diego, North Carolina and Kuala Lumpur. His writing has appeared in journals and books including Organised Sound, Neue Zeitschrift für Musik and The Oxford Handbook of Interactive Audio. From 2003 to 2011 he was programme leader for the MA in Sonic Arts at Middlesex University. He has a PhD in Composition from City University and teaches at the Guildhall School of Music and Drama and Trinity Laban Conservatoire. www.nyeparry.com