An alternative approach to 3D audio recording and reproduction

Posted on 1st December 2014 (updated 9th May 2019) by Augustine Leudar (Queen's University)

DOI: 10.5920/divp.2015.34

Abstract

The following paper provides an overview of an alternative method of recording 3D sound scenes using several separate SD card microphones as opposed to using single multi capsule ambisonic or surround sound microphones. Instructions are provided on how to set the microphones up, appropriate directivity and positioning, and speaker setup for reproduction. The advantages and limitations of the approach compared to other sound spatialisation techniques the artist has tried, such as wavefield synthesis and ambisonics, are discussed. Two sound installations that use the technique are used as case studies to illustrate how to implement it effectively. A description of the effectiveness of the spatialisation and public response is also given. The paper is intended to be of practical use to sound artists, sound designers and composers who work with multichannel audio, especially those who create site specific sound installations.

Introduction

When an artist first decides to branch into multichannel audio and the recording of 3D soundscenes, they are usually presented with two options: ambisonic microphones such as the Soundfield, or quadraphonic microphones such as the Zoom H2. These usually consist of a single mic placed in a central location which records the soundscene around it. However, there is an alternative technique which uses multiple microphones. Although it is simple and effective, it is not well represented in the literature, probably because it has only recently become easy to implement with the advent of small, portable, standalone SD card microphones. Such technology has allowed for new ways to record relatively large areas at a relatively low cost, making technology and techniques available to artists that were not previously accessible, or which were prohibitively expensive.

The original (and on-going) project, which inspired the technique, required recording a 3D soundfield of 10,000m² of tropical rainforest and reproducing it as faithfully as possible in 10,000m² of tropical botanic garden. The author decided to experiment with different approaches to recording dynamic 3D audio scenes due to dissatisfaction with commonly used techniques. Encouraging initial results from this on-going project led to a diverse range of applications which shall be discussed later.

Proximity illusions

One of the main challenges encountered in spatial audio composition is creating “proximity illusions”, or what are known in wavefield synthesis (WFS) as “focused sources”. In layman’s terms this consists of making a sound seem as if it is coming close to the listener. Creating a sense of depth and layering within the listening area, such as the sound of a person walking around a room with accurate localisation, can be problematic, as in most multichannel facilities the speakers are lined up around the periphery of the room and ceiling. Ambisonics, and WFS with its focused sources, were not found to be suitable for many site specific applications due to the irregular shape of the spaces involved and the availability of more effective alternatives. One way to describe this recording technique is to say that it is to a distance-based amplitude panning (DBAP) panner what a soundfield mic is to a VST ambisonic panner [1].
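For readers unfamiliar with DBAP, the following is a minimal sketch of how such a panner derives speaker gains from distance alone, based on the general formulation in Lossius et al. [1] as understood here. It is not part of the recording technique itself, where the microphones capture these level differences acoustically. The function name, the example coordinates, the `blur` value and the omission of per-speaker weights are all assumptions made for illustration.

```python
import math

def dbap_gains(source, speakers, rolloff_db=6.0, blur=0.1):
    """Return one amplitude gain per speaker for a virtual source (2D sketch)."""
    # Rolloff exponent: 6 dB per doubling of distance gives a value close to 1.
    a = rolloff_db / (20.0 * math.log10(2.0))
    # Distance from the source to each speaker; `blur` keeps it above zero
    # when the source sits exactly on a speaker.
    dists = [
        math.sqrt((sx - source[0]) ** 2 + (sy - source[1]) ** 2 + blur ** 2)
        for sx, sy in speakers
    ]
    # Normalise so that the squared gains always sum to 1 (constant power).
    k = 1.0 / math.sqrt(sum(1.0 / d ** (2 * a) for d in dists))
    return [k / d ** a for d in dists]

# Hypothetical irregular layout (metres) with a source near the first speaker.
speakers = [(0.0, 0.0), (4.0, 0.0), (0.0, 3.0), (4.0, 3.0)]
gains = dbap_gains((1.0, 0.5), speakers)
```

The closer a speaker is to the virtual source, the larger its gain, with no assumption that the speakers form a regular ring around the listener.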

The technique consists of placing many microphones around a soundscene and then placing speakers in exactly the same relative positions as the microphones were on playback. This is not simply the distribution of point sources around a room but seeks to fulfil the same function as an ambisonic mic in that it is capable of recording dynamic moving soundscenes in 3D spaces. Though it requires its own fairly precise speaker placement, it does not require the same degree of accuracy of speaker placement and calibration as other techniques such as WFS and Ambisonics. However, due to its irregular speaker arrays which will change from installation to installation, it has little application in bringing audio into people’s homes or in cinema. Where it does prove very useful is in sound installation, theatre and any site specific event, especially where it is not easy to implement precise peripheral speaker arrays such as the ones required by ambisonics and WFS and where convincing proximity illusions are desirable.

Case Study: Ulster American Folk Park and the Royal Opera House

A commission for a sound installation at the Ulster American Folk Park provided an opportunity to experiment with this method. For this installation, the desired effect was of a Presbyterian meeting house (1850s) filling up with people followed by the congregation taking to their seats and conversing a while before a pastor walks up the steps of the pulpit and begins his sermon. Another element of the installation plays with the cocktail party effect so that the listener could hear either the murmur of the room, or sit beside an individual conversation and “eavesdrop”, each conversation containing pertinent historical data. A similar installation was also put in a Catholic mass-house in the same folk park, though this used a different technique.

Microphone and speaker placement and type

Eight microphones and eight speakers were used (Fig. 1), and the speakers were placed in the positions the microphones had originally occupied. In general, the microphones should not be placed in a grid or any regular type of configuration; instead they should be placed nearest to the most important sound sources in the scene. In this case they were placed at the entrance of the door (to capture people’s footsteps as they walked in), at the pulpit (to capture the sound of the pastor walking up the steps and beginning his sermon) and then in six separate pews (to capture the sounds of people filling the spaces, sitting down and conversing).

Figure 1. A top view of microphone placement in a church amongst the pews and pulpit. Speakers subsequently occupied the same positions

Synchronisation

SD card microphones can be synced by placing all of them on a table, pressing record on each one and then clapping before deploying them round the sound scene. These recordings can later be easily lined up in a DAW.
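The clap alignment is normally done by eye in a DAW, but it can also be estimated numerically. The sketch below is one possible approach using cross-correlation, not the author's workflow; the function name, the 48 kHz rate and the synthetic "clap" (a decaying noise burst) are assumptions for illustration.

```python
import numpy as np

def clap_offset(reference, other):
    """Samples by which the clap in `other` lags the clap in `reference`."""
    # The cross-correlation peaks at the lag where the two claps line up.
    corr = np.correlate(other, reference, mode="full")
    # Index len(reference) - 1 of the full correlation corresponds to zero lag.
    return int(np.argmax(corr)) - (len(reference) - 1)

# Synthetic demonstration: the same clap placed 350 samples later in `oth`.
rate = 48_000  # assumed sample rate in Hz
rng = np.random.default_rng(0)
clap = rng.standard_normal(200) * np.exp(-np.arange(200) / 40.0)
ref = np.zeros(4000)
ref[1000:1200] = clap
oth = np.zeros(4000)
oth[1350:1550] = clap

offset = clap_offset(ref, oth)
offset_ms = 1000.0 * offset / rate
```

A positive offset means the second recorder started earlier, so that many samples would be trimmed from the start of its file. On real recordings only a short window around the clap would be correlated, since direct correlation of full-length files is slow.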

A test in which a second clap was made at the end of the recording showed that after 50 minutes on some recordings a slight desynchronisation between some machines occurred. This drift never exceeded 3 milliseconds. This desynchronisation is unlikely to cause noticeable problems unless very long recordings are made and does not occur at all on shorter recordings (e.g. less than ten minutes). The degree to which this error occurs is likely to vary from machine to machine and it is advisable to test recording devices to see how long you can record for before audible desynchronisation occurs.
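To put the reported figure in context, the worst-case drift can be expressed as a clock mismatch in parts per million. The short calculation below uses only the numbers given above (3 milliseconds over 50 minutes); the 48 kHz sample rate and the function names are assumptions, and the same helper can be applied to start and end clap offsets measured on other devices.

```python
def drift_ppm(drift_seconds, duration_seconds):
    """Relative clock mismatch between two recorders, in parts per million."""
    return drift_seconds / duration_seconds * 1e6

def drift_from_claps(start_offset, end_offset, rate):
    """Drift in seconds, from the clap offsets (in samples) at start and end."""
    return (end_offset - start_offset) / rate

# Worst case reported in the test above: 3 ms after 50 minutes.
worst_case_ppm = drift_ppm(0.003, 50 * 60)

# The same drift expressed in samples, assuming 48 kHz recordings.
drift_samples = 0.003 * 48_000
```

Under these assumptions the worst case corresponds to roughly one part per million, or about 144 samples at 48 kHz by the end of the recording.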

The panning law is defined by the directivity of the microphones and the dispersion pattern of the speakers:

In general the directivity pattern of the microphones should be matched to the dispersion pattern of the speakers being used, though this should also be adjusted to reduce bleedthrough from other areas of the room. For example, the microphone by the pulpit should not pick up the sound of people entering the church by the door, because on playback that sound should come only from the entrance and not from the pulpit. Care must be taken that each microphone has the correct directivity and, as far as possible, records only the sounds in its immediate vicinity. One solution is to use a fairly directional microphone hanging from the ceiling, pointing downwards, two or three metres above the ground (Fig. 2). In this diagram the microphones are spaced in a regular pattern; this would not be the case in most situations. The same principle could be applied by pointing microphones upwards or in various different directions depending on the peculiarities of the soundscene to be recorded.

Figure 2. With careful consideration of microphone directivity and placement, interlocking ‘soundpools’ can be recorded by pointing microphones downwards and only recording what is immediately below them, thus reducing bleedthrough from other areas of the room.

Inevitably there will be some bleedthrough from other areas of the room due to room reflections, etc., no matter what precautions are taken, but the more this effect is minimised the better. Conversely if microphone directivity is too narrow, panning information will be lost as there will be a ‘hole’ between the two pools of sound where neither microphone has picked up, so it is a balancing act between reducing bleedthrough and retaining the approach of moving sound elements to recreate natural panning. Experimenting with different microphones, microphone placement, speakers and directivity could help to further improve the technique in future.
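The balance between bleedthrough and 'holes' can be roughed out geometrically before hanging any microphones. The sketch below assumes a heavily idealised model in which a downward-pointing microphone picks up everything inside a cone and nothing outside it; real microphones roll off gradually, so the numbers are only a starting point. The function names and the example heights, angles and spacing are hypothetical.

```python
import math

def pool_radius(height_m, half_angle_deg):
    """Radius on the floor of the 'soundpool' under a downward-pointing mic."""
    return height_m * math.tan(math.radians(half_angle_deg))

def gap_between_pools(spacing_m, height_m, half_angle_deg):
    """Distance between the edges of two neighbouring pools.

    Positive: a 'hole' that neither microphone covers.
    Negative: the pools overlap by that amount.
    """
    return spacing_m - 2.0 * pool_radius(height_m, half_angle_deg)

# Two microphones hung 2.5 m above the floor and 4 m apart (assumed values).
wide = gap_between_pools(4.0, 2.5, 45.0)    # wider pickup: pools overlap
narrow = gap_between_pools(4.0, 2.5, 30.0)  # narrower pickup: a hole appears
```

In this simplified picture, narrowing the pickup from a 45 degree to a 30 degree half-angle turns an overlap of about a metre into a hole of about a metre, which is the trade-off described above.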

Speaker placement

Once the recording is made, it is important to make sure that the speakers are placed in exactly the same relative positions as the microphones were in whilst recording, or in the case of a permanent sound installation in a museum, as close as the builders and curator will allow. In this case the speakers were placed just out of sight under the pew in front of the original microphone position, approximately 12 inches from their original position. This did not overly distort the spatialisation. Generally they should also be placed so that the driver emits sound in the opposite direction to that from which the microphone received it, though this is not always strictly necessary.

Royal Opera House

Another sound installation involved a theatre production at the Royal Opera House in Belfast. A steam train was recorded using the technique and then, by placing speakers under the audience’s seats and above them in a similar configuration to the microphones, the effect of a steam train passing directly through the audience was achieved. The effect was made all the more convincing by the vibrations of the seats caused by the speakers.

Enhancing the recordings in the studio

Some microphones may record unwanted bleedthrough, such as the above example of the pulpit microphone picking up very faint sounds of people entering despite attempts to adjust directivity. These can simply be faded out in your DAW at key moments in the soundscape, such as when people are entering the room, though it will be impossible to remove them entirely.
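For those who prefer to script such fades rather than draw them in a DAW, the sketch below shows one way to attenuate a single channel over a chosen time window. It is an illustration only; the function name, the linear ramps, the half-second ramp time and the example timings are assumptions.

```python
import numpy as np

def duck(channel, rate, start_s, end_s, floor=0.0, ramp_s=0.5):
    """Fade `channel` down to `floor` between start_s and end_s, then back up."""
    t = np.arange(len(channel)) / rate
    # Envelope breakpoints: full level, ramp down, hold at floor, ramp back up.
    times = [start_s - ramp_s, start_s, end_s, end_s + ramp_s]
    levels = [1.0, floor, floor, 1.0]
    # np.interp holds the first and last levels outside the breakpoint range.
    gain = np.interp(t, times, levels)
    return channel * gain

# Example: silence a (hypothetical) pulpit channel from 3 s to 5 s.
rate = 1000  # low rate just to keep the demonstration small
pulpit = np.ones(10 * rate)
faded = duck(pulpit, rate, 3.0, 5.0)
```

Setting `floor` to a small non-zero value instead of zero keeps some of the room tone in that channel, which may sound less abrupt than a complete fade.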

Wherever possible speakers should be hidden as this enhances the cognitive effects and believability of sound illusions.

Related work

Despite an extensive and on-going search, almost no literature on this technique has been found by the author to date. This is likely because only recent developments in technology have made the technique easy to implement. However, there are some earlier developments and related techniques. The first was Harvey Fletcher’s “Curtain of sound” in the 1930s, which sought to reproduce the sound of an orchestra by recording with a curtain of microphones, each microphone being reproduced by one speaker (see Fig. 3).

Figure 3. Harvey Fletcher’s “Curtain of sound”

A similar, more recent example is Erwin Roebroeks’ 2011 “sonic window” [2], in which he placed many microphones in a two-dimensional square array to record a “window”. Speakers were then placed in the same positions on playback to reproduce the same sound window. In the current case this 2D soundfield is extended into three dimensions, dispensing with the gridlike array and omitting microphones from places where there are fewer significant sound events. Another related work is Gilbert Briggs’ “live vs recorded” demonstrations in the 1950s, whereby he recorded four instruments and subsequently played them back with the speakers in the same places as the players had been sitting [3] (see Fig. 4).

Figure 4. Gilbert Briggs’ “live vs recorded”

This of course was not a dynamic moving 3D soundscene, but rather four static point sources.

The most similar technique found was that used by sound artist Jean-Marc Duchenne, who replied to an email from the author:

“I actually work on some recordings using a similar technique, … I try to keep most of the original spaces, while making them overlap between each other.

The recordings are 10 channels, in different spatial and microphones arrangements, and the diffusion will be 16 or 18 channels.” [4]

This technique, although similar, differs in its use of diffusion. In the case of the current technique, the playback channels/speakers and the microphones must correspond precisely, in both number and location, or the effect is lost.

Results

The soundscape in the meeting house portrays people walking into the meeting house, congregating and taking their seats next to you. Members of the public expressed bewilderment and even fear as if the meeting house was filling with ghosts. As the speakers are placed irregularly throughout the space, the sensation of layers of sound and of closeness and distance is present. The listener can walk around some speakers and sit on the outside or the inside of parts of the array. As a result, there is no ‘sweet spot’. The panning relationships are reasonably well preserved; you can hear people walk to distant parts of the room and then sit down, and then you can go and sit down and eavesdrop on their conversation.

For this installation, as most of the recordings were created in the reproduction space, the sounds recorded on-location have a slightly more convincing aspect than if they had been recorded elsewhere, such as in a studio. The sound of feet on the concrete floor is on exactly the correct concrete floor, so the sound is appropriate. The sound of the pews being slammed shut actually is the sound that those very same pews make when a member of the public slams them shut. The sound of the pastor walking up the steps is the sound of someone walking on those exact steps. In this way the sound installation is, although perhaps subtly, more convincing to the audience.

The train effects in the Opera House installation, although played back in a different location, proved extremely effective in showing that to some extent the technique is transferrable as long as the original speaker configuration can be preserved. The effect was no doubt also enhanced due to the fact it was listened to in complete darkness. Public response and reviews were positive:

“Wireless Mystery Theatre use sound to great effect in this show. Using surround sound technology, trains rattle through the theatre, making the audience feel very much like they’re right in the middle of the action.” [5]

“The audio soundscape is wonderfully evocative. The sound of steam trains rumble through the theatre.” [6]

Discussion of advantages and disadvantages

One disadvantage is that room reverb will be doubled: once on the original recording, and again on playback in the space. However, as long as the reverb is not too long, this is not particularly noticeable. There will always be some bleedthrough from other areas of the room onto all microphones; despite this, the author found the technique the most suitable for site-specific applications.

Another limitation is that the 3D soundfield may be tailored to match the acoustics and environment of a specific space, meaning that it may not be easy to reproduce the results anywhere else unless the environment closely matches the original space. This of course is also an advantage, in that it will perfectly match the space for which it is designed. For this same reason, whereas elements such as narrative might be better recorded in the studio, as much as possible any final mixing should also be done in the space and not in the studio.

Reproduction, however, need not be restricted to exactly the same space in all cases. Obviously in the case of a church it would be extremely difficult to convincingly reproduce sounds elsewhere without building another church with the same features. However, as the example of the steam train in the theatre demonstrates, it is possible to recreate the illusions elsewhere in some cases. Potentially conflicting visual cues can also be eliminated by listening to certain elements of installations in darkness. If another environment is reasonably similar, such as a tropical forest soundfield transferred to a tropical botanic garden, as long as the speaker positions are maintained, reproduction should still be effective.

Another disadvantage is that if it is desired that a sound is to come close to the listener, a speaker has to be placed close to that listener; this presents obvious practical and aesthetic problems in open spaces, especially since it is desirable that the speakers be hidden.

The main advantage is that it creates a much more authentic representation of the original 3D soundfield than other techniques the author has used or heard especially with regard to height and proximity.

Future directions

Current projects involve recording large scale sound scenes in the Amazon rainforest with a view to recreating these sound ‘maps’ in tropical botanic gardens in the UK (see Fig. 5).

Figure 5. Microphone placement in the rainforest and subsequent reproduction in a similar area of tropical botanic garden. Microphones are placed near important sources – here, a toucan, frogs, water and a wasp nest. This sonic map is transferred to the garden.

Miniatures

Experiments have also begun in recording large soundscenes and then playing them back in miniature environments: for example, 10,000m² of rainforest played back in 1m² of bonsai forest in a gallery. Initial results have proven encouraging. As long as the initial relative positions of the microphones are replicated and the distance between them scaled down proportionately, the miniature soundscape retains coherent spatialisation.
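The proportional scaling can be worked through as a short calculation. Note that an area ratio of 10,000 to 1 corresponds to a linear scale factor of 1/100, since distances scale with the square root of area. The sketch below assumes the two areas have the same shape; the function names and the microphone coordinates are hypothetical.

```python
import math

def linear_scale(source_area_m2, target_area_m2):
    """Factor by which distances shrink when an area is scaled down."""
    return math.sqrt(target_area_m2 / source_area_m2)

def scale_positions(positions, factor):
    """Scale (x, y, z) positions about the origin, preserving relative layout."""
    return [tuple(coord * factor for coord in pos) for pos in positions]

# 10,000 m² of forest reproduced in 1 m² of gallery space.
factor = linear_scale(10_000.0, 1.0)

# Hypothetical microphone positions in metres (x, y, height).
mic_positions = [(0.0, 0.0, 1.5), (40.0, 25.0, 3.0), (100.0, 100.0, 10.0)]
speaker_positions = scale_positions(mic_positions, factor)
```

With these assumed positions, a microphone 100 m from the origin along each horizontal axis becomes a speaker 1 m away, and a 10 m recording height becomes 10 cm.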

Notes:

[1] Lossius, Trond, Pascal Baltazar, and Théo de la Hogue. DBAP–distance-based Amplitude Panning. Ann Arbor, MI: MPublishing, University of Michigan Library, 2009.

[2] Roebroeks E, 2011, Last accessed Sep 2013, http://www.roebroeks.nl/?paged=2

[3] Gearplus 2007 The 70 year history of Gilbert Briggs and his company – Wharfedale
http://www.gearplus.com.au/products/wharfedale/history/0-history-wharfedale.htm Last accessed Sep 2013

[4] Jean-Marc Duchenne 2013 Personal email communication

[5] http://classygenes.blogspot.ie/2014/02/noel-cowards-brief-encounter-wireless.html Last accessed 09/03/14

[6] http://www.culturenorthernireland.org/article/6260/theatre-review-brief-encounter Last accessed 09/03/14

About the Author:

Augustine Leudar is a sound artist specialising in 3D soundart, audio holograms, and sonic illusions. Extensive experience delivering events and exhibitions has led to a very pragmatic approach to spatial audio. Augustine’s work has been exhibited internationally, including at the National Gallery of the Czech Republic, Glastonbury festival UK, and many other galleries and venues across Latin America and Europe. In 2010, Augustine delivered the world’s largest walk-through multichannel sound installation at the Eden Project, UK, which covered over 4 acres of indoor rainforest. Augustine is currently in the third year of his PhD researching plant electrophysiology and spatial audio at the Sonic Arts Research Centre at Queen’s University. Recent work has focused on making tangible unseen processes in the biosphere such as communication across the mycorrhizal network. Recent multichannel sound installations have used electrical signalling in plants to influence the composition and spatialisation of the installations.