Sound and image relations: a history of convergence and divergence


This paper examines the topic of sound-image relations in its evolution towards the contemporary context of digital computational audiovisuality and its interactive forms. It addresses the multiplicity of sound and image relations and their different conceptions, and then focuses on aesthetic artefacts that propose interactive experiences articulated through images and sounds. It begins with an overview of the topic in light of a convergence between artistic forms of expression and media technologies, while addressing their different models of sound and image articulation. This overview highlights not only a convergence of sound and image, and their modes of creative production and live presentation, but also a divergence of themes of creative exploration that shape this history. Finally, it focuses on practices that use software as a creative medium for devising dynamic and interactive experiences, their audiovisual modes of expression and the performative nature of their dynamic behaviour.


Overview: Themes and models

The topic of sound-image relations is characterised by the multiplicity of its conceptions. Its possible histories cross the domains of art, technology, and perception, entailing changes in the theoretical and practical foundation on which these relations rest (Daniels and Naumann, 2010). From a diachronic viewpoint its multiple themes comprise debates on the merging of the arts, technological innovation, and the search for new artistic forms of expression. Their foundations and models can range from sensory, structural, or conceptual analogies, to the coupling, transformation or direct manipulation of sound and image through technological means, which already point towards the process-based and interactive nature of contemporary forms of audiovisuality.

This paper summarises the topic, and considers that these contemporary forms do not necessarily claim historical themes of audiovisuality, but creatively reshape it, by exploring the possibilities of software and proposing potentially unique dynamic configurations of images and sounds. Rather than mapping their diversity, the paper highlights strands of artistic and technological development that explore what we term performative connections between sound and image: a convergence between modes of creative production and presentation, and ultimately the performative nature of their procedurally enacted dynamic behaviour.

A meta-overview of the topic is outlined by mapping the paradigms that shape its evolution, while stressing not only a convergence of sound and image, but also a divergence of their models of articulation. Since digital technologies expand all previously conceived possibilities for linking and manipulating the visual and auditory, the starting point of this overview is a contrasting media-imposed separation between sound and image and the concurrent ideal of their aesthetic synthesis.

Separation, correspondence, and synthesis

The contemporary ubiquity of the audiovisual can be seen as an offspring of the beginning of the media society towards the end of the nineteenth century (with Edison’s inventions developing into standardised machines) and the concurrent conceptual interweaving of the arts (with Wagner’s aesthetic effect) whose historical impact bypassed a history of artistic tinkering with technical apparatuses for correlating the visual and auditory realms (Daniels, 2005). These developments relate to the ideas of a technical separation, artistic synthesis, and correspondence between the visual and auditory as put into practice with experimental devices.

The aspiration to see sound as a graphic trace gives way to that of hearing sound when, in 1877, Thomas Edison devised a mode of inscription that enabled the reproduction of sound. Edison later described the idea of a “Kinetoscope” that would “do for the eye what the phonograph does for the ear” (Zielinski, 1999, p.63), and the Lumière and the Skladanowsky brothers merely had to “add a means of projection in order to turn Edison’s invention into cinema” (Kittler, 1999, p.3). These apparatuses “stored sensory data as media fabricated sense perceptions” and “autonomized” ears and eyes (Kittler, 1999, p.3). They imposed a technical separation of sound and image whose re-wiring would be achieved only in the early twentieth century.

The history that led to the invention of the phonograph illustrates a technical separation between sound and image, as the first sound reproduction device could only become functional at the expense of the visibility of its graphic trace. A prerequisite for this invention can be traced back to Ernst Chladni’s experiments in the eighteenth century that first provided an objective relationship between sound vibration and its graphic transcriptionality (Levin, 2003).

In contrast to this objective relation between acoustic and optical phenomena, Louis-Bertrand Castel’s colour-tone correspondences were essentially subjective, as a substitution of sounds by colours. Influenced by models of colour-tone analogies such as those proposed by Aristotle or Athanasius Kircher, Castel aspired to a “mathematically, physically and aesthetically compelling model” of correspondence, and proposed a device for its practical application (Daniels, 2011, p.11). Around 1725, he designed a Clavecin Oculaire that would give colour a lively quality in correspondence to the notes of the western musical scale; performing colour as a musique muette.

Figure 1: Louis-Bertrand Castel's 1740 colour concept compared to Newton's

Following Castel, various artists and inventors deployed devices that either produced light and sound simultaneously or explored the aesthetic qualities of colour and light in a purely visual manner.[1] Within these developments there is a gradual shift from strict colour-tone correspondences to a free play of kinetic light and colour, as a new art form emancipated from music, as proposed by Thomas Wilfred with the art of Lumia. In spite of the proliferation of these devices in the beginning of the twentieth century,[2] these inventions remained tied to their individual creators, failing to be widely adopted as performance tools. Moved by fascination rather than proof, almost every artist or inventor developed his own model of correspondences that, in their lack of compatibility, cancelled each other (Daniels, 2011, p.12).

Conversely, Wagner’s aesthetic ideas had an influential impact, while opposing to (synesthetic) correspondences the ideal of a (synthetic) fusional artistic form. The Gesamtkunstwerk would address in a unified manner both eye and ear: the “entirety of the human capacity for artistic receptivity, not just one element of it” (quoted in Shaw-Miller, 2002, p.50). Within this synthesis, music came to assume a prime position as the integrative banner for the arts, as the “condition to which all arts aspire”, addressing itself to the general, the universal, through its perceived immateriality (Shaw-Miller, 2002, p.56).[3]

Through these developments in modernism, two tendencies coexist: synthesis and purity (Shaw-Miller, 2011). Beyond an integrative synthetic role, seen in operatic performances as holistic sensory experiences, music also became the model for the pursuit of purity and maturation of each art’s material or mode, namely in the visual arts. Concerning conception rather than reception, music becomes “a model as a method, rather than some kind of effect” (Shaw-Miller, 2011, p.31). Freed from the obligation to imitate nature, painting would regard colour, form, and composition per se (Gottdang, 2010), through a formal and procedural analogy to the musical.

Analogies and hybridity

The “ubiquity of the musical in painting” first appeared merely by evocation (musical titles and depictions of instruments), evolving as an effort to integrate the time dimension, namely rhythm or temporal sequences (Gottdang, 2010). Music provided a method to organise pictorial material according to a set of rules, from Bach’s musical rules and poetics to Schoenberg’s move towards atonality and serialism (Maur, 2004). The latter represented, especially for Kandinsky, a way for the arts to learn mutually, “not superficially but fundamentally” (Gottdang, 2010, p.250). Seeking to emulate the non-materiality of music, Kandinsky established parallels between the elements of pictorial and musical vocabularies based on their sensory impressions. This connection then evolved from a spiritual basis towards a formal analogy; a search for an “explicit formal grammar” as a foundation for each art, wherein both the elements of the musical and pictorial are thought of in terms of an abstract structure (Holtzman, 1995, pp.78–84).

These formal analogies moved towards more rational, systematic, and structural approaches. For example, Paul Klee defined relationships between elements, formal sequences, and compositional arrangements by transposing principles such as polyphony or counterpoint into pictorial space (Maur, 2004). Ultimately, new forms of abstraction emerged following more constructive or concrete approaches, as seen in the work of Mondrian, although not necessarily claiming structural links to the musical (Gottdang, 2010).

Jewanski and Naumann (2010) refer to this “gradual transfer of structural methods of creative production”, both in painting and in music, as “structural analogies” that include the qualities of the elements and formal vocabulary of each art, the integration of the space and time dimension, and a borrowing of techniques, methods, and procedures.[4]

These developments reflect a convergence of the arts characterised by Parrat (1994) as both “phenomenological and structural”, in that it concerns their “sensory” and “rational” foundations, from their formal vocabulary and space and time dimensions to their structural “logical-mathematical” principles of organisation. This search for a formal language with explicit rules is seen by Steven Holtzman (1995) as an anticipation of rule-governed creativity, as developed in the abstract, digital computational realms.

As each art explored its non-exclusive dimensions, or what they shared as a formal and procedural interdependency, the established conceptions of the visual concerning the spatial and the acoustic concerning the temporal were gradually defied. As Shaw-Miller (2002) argues, even within the specialisation strains of modernism, music and the visual arts were not so ontologically discrete or mutually exclusive, but rather hybrids transgressing media-demarcated boundaries.

Other forms of hybridity emerged, entailing not only the expansion of materials but also of the procedures and processes for making art. In Futurist and Dadaist “poly-sensory, open forms” (Bosseur, 2006) music provides the model for “performative acts” (Shaw-Miller, 2010), as a performative connection based on actions and events.[5]

The futurist “modernism of sensation” was also a direct reaction to the intensity of machines and media-technologies of the industrialised society, as “media gradually started taking over an area of human perception that used to be reserved for the classical arts and their various genres” (Daniels, 2004). As media take over, art responds by conquering new possibilities for aesthetic creation.

New media, new art forms

New media and new art forms emerged, particularly with the new medium of film. Walter Ruttmann proposed a new form of “painting with time” (Malerei mit Zeit) described in 1919 as a way of “bringing an entirely new kind of life-feeling into artistic form” (Helfert, 2004). In continuity with the idea of the musical as a model for a visual art, Ruttmann first embraced the film medium in its silent form, developing hand-made animations such as Light-Play Opus 1 (1921). He later abandoned this form of manual production, exploring representation of the concrete world that could only be achieved with media-technologies, with the film Berlin: Symphony of a Metropolis (1927), or Weekend (1930) as a concrete sound montage (Daniels, 2005). His work was that of a new kind of artist, not only embracing both sound and image, but also exploring the creative possibilities of media technologies.

Figure 2: Light-Play Opus 1 (1921) by Walter Ruttmann [Stills]

Abstraction as non-narrative form

The cinematic abstraction developed by Ruttmann was termed Absolute Film (by analogy with absolute music), as a non-representational art form without any reference to things other than the intrinsic values of light, colour, form developing in time, or “things which could be expressed uniquely with cinematographic means” (Moritz, 1999). Its developments in the 1920s already suggested what Malcolm Le Grice (2001) defines as two broad directions of experimental film as an art. The first concerned “image as abstraction”, or a dynamic non-representational form based on colour, shape, movement, and rhythm that Walter Ruttmann, Viking Eggeling, or Hans Richter devised by analogy to music as a “non-narrative temporal structure”. The second direction entailed incorporating (and often abstracting) filmed images through montage in a non-narrative, or even “anti-narrative” fashion, as in Ballet Mécanique (1924) by Fernand Léger (Le Grice, 2001).

Both these conceptions of absolute film, as developed in its silent (with sound) form,[6] find their continuity with the advent of sound. The idea of an abstract art uniting the visual and auditory is explored through synchronisation practices. The possibilities of montage, already explored as an art in itself, would extend to auditory elements, namely under the notion of a contrapuntal, asynchronous use of sound (Jutz, 2009).

Audiovisual coupling and transformation

In line with this break with the naturalistic effect and mere reproductive function of the cinematic medium, new productive possibilities emerged with optical sound-technology.[7] The inscription of sound and image in the same medium gave rise to a diversity of machine-based image and sound relations: not only their coupling and simultaneity but also new possibilities for their transformation and synthesis.

Sound and image could be synchronised in order to achieve a “sensorial totality”, an abstract sensory cinema (Lista, 2004). According to Michel Chion, synchronisation promotes specific phenomena of audio-vision,[8] where the concomitance of synchronous audio and visual events leads to the forging of synchresis, as a form of perceptual synthesis. It allows a flexibility in the conjugation of sound and images that “strictly speaking have nothing to do with each other” but may form “inevitable and irresistible agglomerations in our perception” (Chion, 1994, p.63).

This flexibility is particularly explored through either tight or loose synchronisation of abstract visual forms with pre-existing or specially devised musical soundtracks. For example, Oskar Fischinger started using strict synchronisation of coloured forms with popular music, but soon moved towards free forms of association.[9] His idea of a complementary nature of form, as opposed to a direct correspondence or illustration of music, can also be seen in the work of Mary Ellen Bute, who explored different ways of “seeing sound” (Naumann, 2009).

Image-to-sound and sound-to-image conversion

Optical sound also allowed for a direct analogue translation between sound and image as a form of technical and aesthetic synthesis. Oskar Fischinger’s artistic intents, along with Rudolf Pfenninger’s scientific investigation in the 1930s, pertain to the scope of experiments often named “hand-drawn”, “animated”, “ornamental”, and/or “synthetic” sound, which used the photocell as an image-to-sound converter.[10] Norman McLaren would explore this possibility, namely by drawing sound modulations on the filmstrip, later producing one of the best-known synthetic sound films with Synchromy (1971). He also made experimental use of a cathode-ray oscillograph for generating visual patterns in Around is Around (1951); the technique also finds continuity in the work of Mary Ellen Bute. She found in the oscilloscope the “light brush she had always dreamed of” as a means to visualise tones directly as well as to combine art, technology and science (Naumann, 2009, p.48).[11]

Experiments with synthetic sound gave way to electronic image generation with cathode ray oscilloscopes that allowed the observation of waveforms. Sound could be used as an input, but even if there is no clear correlation between the frequency’s perceived pitch and the visual figures, these processes were aesthetically explored when filmed and integrated in abstract animations. These forms of abstraction not only were based on mathematical principles of music composition, in Bute’s case inspired by Joseph Schillinger’s mathematical basis of the arts (1948), but also explored the waveform as a foundation interlinking the visual and auditory (Thoben, 2010).
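The loose relation between pitch and figure noted above can be illustrated with a minimal sketch. It assumes an oscilloscope used in X-Y mode, with one sine signal deflecting the beam horizontally and another vertically; the source does not specify this setup, and the function name, the integer frequencies, and the sample count are hypothetical choices made only for illustration. Under these assumptions, the traced figure depends on the ratio and phase of the two frequencies rather than on how high or low they sound.

```python
import math


def lissajous_points(freq_x, freq_y, phase=0.0, samples=1000):
    """Sample one full repetition of the X-Y trace of two sine signals.

    freq_x and freq_y are integer frequencies in Hz (an assumption of this
    sketch, so that the repetition period can be derived from their gcd).
    """
    # The figure repeats with the period of the common (gcd) frequency.
    duration = 1.0 / math.gcd(int(freq_x), int(freq_y))
    points = []
    for i in range(samples):
        t = duration * i / samples
        x = math.sin(2 * math.pi * freq_x * t + phase)  # horizontal deflection
        y = math.sin(2 * math.pi * freq_y * t)          # vertical deflection
        points.append((x, y))
    return points


# Two signal pairs an octave apart, both in the ratio 2:3.
low = lissajous_points(200, 300)
high = lissajous_points(400, 600)
```

Here `low` and `high` describe the same figure although the second pair of tones sounds an octave higher, which is one way of reading why such visual patterns do not track perceived pitch in any obvious manner.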

While the mathematical basis of music still provided a method for structuring time (as continued by John Whitney), optical sound technology and electronic image generation presented a technical counterpoint to this model, as an image-to-sound and sound-to-image conversion. These were new foundations for linking auditory and visual phenomena, intimately tied to the materiality and operative possibilities of media technologies.

Operative possibilities of film and video

As media-technological operations become the model of sound-image articulation in the middle of the twentieth century, we can observe two tendencies: the exploration of film as a perceptual device through operative analogies between visual and auditory processes, and the exploration of the analogue electronic unicity of the audiovisual, emphasising transformation and paving the way towards interaction (Lista, 2004).

A new form of engagement with the film medium involved a shift from the musical analogy towards the material foundations and operative processes for structuring audio-visions.[12] In this sense, structural film “expressly makes its form its content” (Helfert, 2004). Within these practices, the study of film perception played a central role in the exploration of “operative analogies between seeing and hearing” (Lista, 2004, p.71). This play with the process of perception becomes a means of participation in the work; a strategy that is particularly evident in the flicker films of the 1960s and 70s.

Material and structure: film as perception device

Peter Kubelka’s film Arnulf Rainer (1958–60) promoted a discontinuity of the visual and auditory, as effects perceived differently by eye and ear. Alternatively, the display of the film on a wall reveals the nature of its serial method of structural organisation; a metric process contrasting with the flickering effect of its projection. Tony Conrad also extended the idea of film to the viewer as a cinema of perception with his film The Flicker (1966), where patterns of frames are organised by analogy with the combination-tone effects that are responsible for consonance in musical sound, producing an accelerating stroboscopic effect (Conrad, 1996).

Figure 3: Arnulf Rainer (1958–60) by Peter Kubelka, as exhibited at See this Sound, Lentos Art Museum, Linz, 2009

These flicker films aimed at “making the spectator conscious of the preconditions of film technology” by playing with the human perceptive apparatus (Helfert, 2004), implicating the audience in an active reception process. Paul Sharits took this notion further with his installation Shutter Interface (1975), where partially overlapping projections of flickering colours induce chromatic phenomena, promoting a type of reception that involves the perceptual and physical activity of the viewer.

According to Buchmann and Bellenbaum (2010), these ways of emphasising the formal structural elements of cinema relate to a broader spectrum of “conceptual correlations between sound and image”, in affinity with the formal and processual principles of minimalism “based on repetition premised on reception”, transposed to media-technological procedures.[13] As a countercurrent to the former, the categories of intuition and chance served as the motor for a gradual movement towards open, process-based, and participatory forms (Buchmann and Bellenbaum, 2010).[14] While structural approaches to film promoted perception as a form of participation, the emergence of analogue electronic technologies provided the means for the interactive manipulation of audiovisual signals.

Electronic unicity: video as interaction device

While music provided a method for structuring time in the film medium, electronic sound, in its openness to interference and interaction, would provide the operative model for video. This is reflected in the way that Nam June Paik transfers the principles of Cage’s experimental music to electronic television, while opposing indeterminism to serial formalism: “INDETERMINISM and VARIABILITY is the very UNDERDEVELOPED parameter in the optical art, although this has been the central problem in music for the last ten years”, therefore “a new decade of electronic television should follow the past decade of electronic music” (quoted in Daniels, 2005). This is achieved in the exhibition Exposition of Music – Electronic Television (1963) through an interactive repurposing of the broadcasting functions of TV and the reproductive functions of record players and tape recorders that were directly manipulated by the audience. Due to the lack of recording technology these first experiments were with television as a continuous flux of electronic signals, whose processual nature allowed for real-time manipulation (Spielmann, 2010). By subsequently creating a number of acoustic-oriented interferences in the image process, Paik “inaugurates the road to manipulable images through sound” (Kwastek, 2010, p.165).

Video is defined by its manipulation of electronic signals and, as Spielmann (2010) explains, it can be simply signal processing rather than recording; artists soon engaged in an exploration of these aspects through the development of video synthesisers and image processing techniques. The early work of Steina and Woody Vasulka reflected an interest in immediately processed and displayed video; they used feedback loops and audio synthesiser inputs to generate and alter the video signal, as in Soundsize (1974), while also merging camera-fed imagery further processed with video noise, as in Noise Fields (1974).

Figure 4: Soundsize (1974) by Steina and Woody Vasulka [still]

The new electronic medium then represented a new stage in the machine-supported manipulation of sounds and images in which direct manipulation of real-time processes is paramount.[15] As Woody Vasulka has stated, “there is an unprecedented affinity between electronic sound and image-making. […] this time the material, i.e. the frequencies, voltages and instruments which organised the material were identical” (Vasulka, 1992, p.12). This “unicity” of the raw material of video, “noise, as an unformed electronic signal”, forms the basis of electronic audiovisuality (Spielmann, 2010, p.318). It is this technical continuity between sound and image that allows a conception of video as interaction device. However, contrary to the forms of audience interaction promoted by Paik, in the work of Steina Vasulka, for example Violin Power (1970–78), interaction is applied to the creative process, while playing the video as an instrument.

Many of the inherent characteristics of electronic technologies already contain possibilities that would be extended with digital technologies, namely the “transformative, process-oriented, multidimensional, open-ended” forms of audiovisuality they allow (Spielmann, 2004). The real-time manipulation of audio and visual signals stresses their performative connection, as the simultaneous live presentation of a production process.

Figure 5: Violin Power (1970–78) by Steina Vasulka [stills]

Digital audiovisuality

With the medium of digital computation all previously conceived relations can coexist and be reformulated, rendering virtually infinite the possibilities for linking, generating, and manipulating the visual and auditory. We can see this as a gradual convergence towards a virtualisation and “digital fusion” of sound and image on an underlying amodal level (Zénouda, 2006); when transcoded into a numerical representation and governed by algorithmic procedures, the optical and the acoustic become “calculable, transformable, and manipulable at will” (Daniels and Naumann, 2010, p.8).

Due to this creative potential, new forms of audiovisuality arise in both continuity and rupture with previous forms, as they evoke but also reshape its themes, in particular concerning digital computational (software-based) audiovisuality and the creation of interactive audiovisual experiences.[16] These themes ultimately converge, although they pertain to different motivations and relate to conceptually differentiated phases of the artistic use of computers: as proposed by Mathias Weiss (2005), the first concerns the creation of form through algorithmic processes, and the second the use of the computer as a dynamic interactive medium, as explored by Myron Krueger.

Digital computational audiovisuality

Many of the artists that first had access to computer laboratories focused on the algorithmic generation of images and not particularly on the use of computers to articulate relations between sound and image. In this context, John Whitney emerges as a paradigmatic pioneer whose interest was in the music-like qualities of abstract dynamic form. He finds in the computer the means to define precise compositional relations between visual elements and music. Initially, these were mathematically structured animations conceived and designed in relation to pre-existing music, such as Permutations (1966–68) assisted by Jack Citron at IBM Labs, or Arabesque (1975), assisted by Larry Cuba.[17] Only in the 1980s, with the advent of personal computing and real-time graphics, was Whitney able to fully develop his ideas of “digital harmony” and the search for a common syntax where “tone-for-tone, played against action-for-action” (quoted in Levin, 2010, p.279), as demonstrated in Spirals (1987) or MoonDrum (1989).

Figure 6: Permutations (1966–68) by John Whitney [still]

In spite of technical advances in computer graphics and computer music during the 1960s and 70s, it was not yet possible to generate both sound and image in real-time. In line with these motivations, Lillian Schwartz and Ken Knowlton often mixed computer-generated imagery with animations developed in collaboration with computer musicians, for example Pixillation or Mathoms (1970–77) with music by F. Richard Moore, or Mis-Takes (1972) with Max V. Mathews.[18] Schwartz later used these animations in a live performance context with On-line (1976), accompanied by musical improvisations.

A computer system capable of synthesising both animation and sound in real-time was developed by Laurie Spiegel at Bell Laboratories. The VAMPIRE (1974–76)[19] included a number of controls to modulate and perform image and sound parameters in real-time. Even if it remained confined to the laboratory, Spiegel defines it as an “unrecordable room sized live performance visual instrument” (Spiegel, 1998), emphasising the real-time production of sound and image. From then on, and with the introduction of the personal computer, the scope of real-time audiovisual software significantly broadened, as did the possibilities for the composition or generation of sound and image as well as their interactive manipulation, not only on the part of their creators but also on the part of the audience.

Audiovisuality and interactivity

We then encounter a theme of creative exploration that concerns the artistic use of the computer as a dynamic medium, through the creation of interactive experiences articulated through images and sounds. However, its focus is not necessarily on the creation of relations between the visual and auditory, but rather on the interactive experience itself. As suggested by Myron Krueger, “Response is the medium! … It is the composition of these relationships between action and response that is important … The beauty of the visual and aural response is secondary” (Krueger, 2003, p.385).

Krueger was one of the pioneers of the artistic application of real-time interaction through graphic interfaces in the 1970s, with Videoplace, an installation that allowed the audience to interact with their silhouettes and other graphical objects (Kwastek, 2010). The system was gradually perfected as a continuous experimentation in interactive art, using various techniques of image processing to mediate the interaction of distinct interactive cause-effect relations, while also introducing audio responses (Krueger, 2003). With a similar aim of exploring intuitive forms of interaction with computers, David Rokeby developed an interactive installation named Very Nervous System (1986–90), where the video image (hidden to the user) is the interface for sound as an extension of the body. In contrast to Krueger’s concern with a precise attribution of cause and effect, Rokeby intrigued the audience exclusively with the immediacy of sound responses to their bodily movements.

Figure 7: Projection [stills] from Videoplace by Myron Krueger

These works extend the idea of a performative connection between sound and image towards the role of the audience, according to the notion of interactive performativity proposed by Levin (2010). This notion is used to encompass a diversity of audiovisual interactive systems as artworks, experimental creations, games, or interfaces, that often move between different forms and contexts. It underlines what they share as the “quality of a feedback loop” that can be established between the system and its users, “allowing them to explore the possibility-space of an open work” while “discovering their potential as actors” (Levin, 2010, p.271).[20] Interactive performativity emphasises a dimension of the experience of a work that is performed by its audience; the system depends on, and allows the user to perform, its outcomes. Sound and image are both the means through which the user interacts and the products of interaction, as an aesthetic experience.

Practices, principles, and medium

In its contemporary manifestations audiovisuality becomes ubiquitous, while encompassing a diversity of genealogies and aesthetic purposes; “owing to this diversity of origins and intents, the formal scope of what might be considered audiovisual software art is quite large as well” (Levin, 2010, p.271). In their diversity, these are artefacts whose subject matter is not necessarily tied to relations between sound and image. However, by being prospective in exploring the possibilities of software, they devise potentially unique dynamic configurations of images and sounds.

In order to draw attention to the specificity of these software-driven systems and their diversified nature as aesthetic artefacts, we resort to the “principles” that motivate their development, as proposed by Levin (2010). These comprise sound and music visualisation, the transmutability of digital data, generative autonomy, and interactive performativity. These correspond to the use of sound or music “to generate aesthetic or analytic visualizations”, to works that “map ‘real-world’ data signals to graphics and sound”, or works that “use human performances to govern the synthesis of animation and music”, as well as to “generative artworks [that] produce animations and/or sound autonomously – from their own intrinsic rule-sets” (Levin, 2010, p.277).

These principles emphasise the creative possibilities of a medium in which “data and process are the major site of authoring” (Wardrip-Fruin, 2006, p.381); they correspond to different ways of exploring the mapping of a given input data or source information into visual and auditory form, and to the possibility of devising dynamic audiovisual behaviours and responses to interaction.
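As a minimal sketch of what mapping source data into visual and auditory form can involve, the following example assigns each value in a short data series both a pitch and a colour. Everything in it is an assumption made for illustration: the function names, the 220 to 880 Hz pitch range, and the red-to-blue hue range are hypothetical and are not drawn from Levin or from any of the works discussed.

```python
import colorsys


def normalise(values):
    """Rescale a data series to the range 0.0 to 1.0."""
    low, high = min(values), max(values)
    if high == low:
        # A constant series carries no variation to map.
        return [0.0 for _ in values]
    return [(v - low) / (high - low) for v in values]


def to_pitch(level, low_hz=220.0, high_hz=880.0):
    """Map a level in 0..1 to a frequency, spaced evenly in musical interval."""
    return low_hz * (high_hz / low_hz) ** level


def to_colour(level):
    """Map a level in 0..1 to an RGB triple, with hue running from red to blue."""
    red, green, blue = colorsys.hsv_to_rgb(level * 2.0 / 3.0, 1.0, 1.0)
    return (round(red * 255), round(green * 255), round(blue * 255))


def map_series(values):
    """Give every data point one auditory and one visual rendering."""
    return [(to_pitch(level), to_colour(level)) for level in normalise(values)]


# Hypothetical input data, e.g. three sensor readings.
readings = [0, 5, 10]
events = map_series(readings)
```

The same numbers drive both outputs, so the choice of mapping functions, rather than the data alone, determines how the series is seen and heard; replacing `to_pitch` or `to_colour` changes the resulting configuration without touching the source data.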

The notion of transmutability (including visualisation and sonification practices) puts an emphasis on data as information or content, its mode of representation and perception, and the mediating transformational process as subject matter. In turn, generative autonomy and interactivity emphasise processes, as observable activities performed by the work, defining its surface and supporting interaction.[21] Rather than mere generalisations of sound and image relations (as data mappings), these notions underline how sound and image acquire meaning through action, as the products of processes or operations performed by the work. Sound and image, in their relations, become procedurally enacted dynamic articulations of visual and auditory modes that can be subjected to interaction.

Creative possibilities and aesthetic qualities

On one level, what is highlighted is the possibility to create behaviour, whether autonomous or interactive. On another level, what is emphasised as a distinctive quality of these systems is the dynamics of their behaviour. In contrast to other time-based forms of audiovisuality, they have not only a transient, but also a variable nature, whose experience entails the temporal simultaneity and spatial co-attendance of the user. In other words, these works’ “content is their behavior”, and not merely the output that streams out (Hunicke et al., 2004). Their experience is not limited to the audiovisual or sensorial qualities of expression, but also extends to the procedural ones.[22]

Sound and image are then a tangible surface expression of the “expressive processes”, which according to Wardrip-Fruin (2006) are those that more evidently contribute to the work’s meaning and experience. As aesthetic materials, sound and image are tied to the “processual” and “performative aesthetic qualities” of works that occur while running as processes performed in real-time (Broeckmann, 2005). In contrast to the notion of interactive performativity discussed earlier, which emphasises the role of the audience as user, the view of performativity proposed by Broeckmann (2005) expresses a quality of an artefact in operation, and the live dimension of the presentation of an execution (with or without the participation of the audience). The expression and experience of these artefacts are then shaped by their variable (autonomous or interactive) behaviour in its visual and auditory realisation.

Convergence and divergence

This narrative stressed one of the possible histories of convergence of sound and image, artistic forms of expression, and media technologies. Its aim was also to reflect a divergence regarding the plurality of its themes of creative exploration, entailing different conceptions of sound and image relations as well as cultural foundations and media-induced models of articulation. This text summarises the topic of sound and image relations considering contemporary practices that, in their diversity, often move ahead of theory; they creatively reshape audiovisuality within the digital computational medium, beyond its dominant themes or approaches.

By navigating the paradigms and models that shape the evolution of the topic towards these contemporary practices, this overview stresses a gradual performative connection between sound and image, concerning their modes of creative production and live presentation.

In a move from a sensory basis towards structural methods and media-technological procedures, the ideal of a synthesis between the arts and the senses finds a counterpart in media technologies as models of articulation of sound and image. The discrete nature of film contrasts with the processual flux of video as an audiovisual medium, emphasising interference and interaction as a model emancipated from the musical. However, music continues to play its role in the structuring of audio-visual experiences. In this sense, Sandra Naumann (2011) stresses the idea of a growing “musicalization of the visual arts” as an incorporation of musical aspects, such as “abstraction, time, expansion into space, and real-time production and liveness”. This process can also be interpreted with a consideration that these aspects, rather than being strictly musical, pertain to non-exclusive dimensions of artistic forms of expression; as discussed earlier, they explore what they latently share, as dimensions, structures, methods, and procedures, as well as technological foundations (material and processual).

The topics mentioned by Naumann can be read accordingly, where the move towards “non-representationality, having music as a model” reflects the search for an explicit formal grammar underlying each art form; an abstract structure that concerns conception and method of creative production. The integration of “the musical dimension” of time through the “use of compositional principles” in film becomes tied to the exploration of its structural, discrete units, as a media-technologically based model of audiovisual articulation. The “expansion of the visual into space” reflects a move towards immersive experiences involving both sound and image; the creation of multi-sensory experiences, multimedia environments, or audiovisual performances.

Finally, the strand of “improvisation and real-time production as performance” stresses a performative connection as an operative logic of the live production and presentation of both sound and image. As discussed earlier, this performative connection moves from a procedural interdependency between modes of creative production towards the live dimension of these actions. Throughout this narrative, the notion of performativity also shifts from human control to a quality of an aesthetic artefact, as the live dimension of its autonomous or interactive performance in its visual and auditory realisation.

The “musicalization of the visual” is thus accompanied by a gradual emancipation of the musical towards an anchoring in the creative possibilities of media technologies and ultimately in their “digital material and computational (software-driven) logic” (Manovich, 2001). The topic of audiovisuality is reshaped with practices that are particularly speculative and prospective in exploring the possibilities of software in their different aesthetic intents. Within the potential diversity of these aesthetic artefacts, sound and image become the expression of their distinctive dynamics, of the variable (and often indeterminable) behaviour that defines their meaning and experience.


Bogost, I. (2008) 'The Rhetoric of Video Games.' In Salen, K. (ed.) The Ecology of Games: Connecting Youth, Games, and Learning. Cambridge, Massachusetts: MIT Press.

Bosseur, J.-Y. (2006) Musique et arts plastiques: Interactions au XXe Siècle. Paris: Minerve.

Broeckmann, A. (2005) Image, Process, Performance, Machine. Aspects of a Machinic Aesthetics. Refresh! 2005. International conference on the histories of media art, science and technology. Canada: Media Art Histories Archive.

Buchmann, S. and Bellenbaum, R. (2010) 'Conceptual correlations of sound and image.' In Daniels, D. and Naumann, S. (eds.) Audiovisuology: compendium. Cologne: Verlag der Buchhandlung Walther König.

Chion, M. (1994) Audio-Vision: Sound on Screen. New York, Columbia University Press.

Conran, T. (1996) Interview by Brian Duguid. [Online] Available here [Accessed 8 July 2013].

Daniels, D. (2004) 'Media → Art / Art → Media: Forerunners of media art in the first half of the twentieth century.' Media Art Net. [Online] Overview of Media Art. Available here [Accessed 20 April 2008].

Daniels, D. (2005) Sound & Vision in Avantgarde & Mainstream. Media Art Net. [Online] Sound and Image. Available here [Accessed 24 March 2008].

Daniels, D. (2011) 'Prologue. Hybrids of Art, Science, Technology, Perception, Entertainment, and Commerce at the Interface of Sound and Vision.' In Daniels, D., Naumann, S. and Thoben, J. (eds.) Audiovisuology 2: Essays. Cologne: Verlag der Buchhandlung Walther König.

Daniels, D. and Naumann, S. (2010) 'Introduction.' In Daniels, D. and Naumann, S. (eds.) Audiovisuology: compendium. Cologne: Verlag der Buchhandlung Walther König.

Gottdang, A. (2010) 'Painting and music.' In Daniels, D. and Naumann, S. (eds.) Audiovisuology: compendium. Cologne: Verlag der Buchhandlung Walther König.

Helfert, H. (2004) Technological Constructions of Space-Time: Aspects of Perception. Media Art Net. [Online] Overview of Media Art. Available here [Accessed 5 June 2008].

Holtzman, S. R. (1995) Digital mantras: the languages of abstract and virtual worlds. Cambridge, MA, MIT Press.

Hunicke, R., Leblanc, M. and Zubek, R. (2004) 'MDA: A Formal Approach to Game Design and Game Research.' Proceedings of the Challenges in Games AI Workshop, Nineteenth National Conference of Artificial Intelligence. San Jose, CA: AAAI Press.

Jewanski, J. (2010) 'Color organs: from the clavecin oculaire to autonomous light kinetics.' In Daniels, D. and Naumann, S. (eds.) Audiovisuology: compendium. Cologne: Verlag der Buchhandlung Walther König.

Jewanski, J. and Naumann, S. (2010) 'Structural analogies between music and the visual arts.' In Daniels, D. and Naumann, S. (eds.) Audiovisuology: compendium. Cologne: Verlag der Buchhandlung Walther König.

Jutz, G. (2009) 'Not married: Image-sound relations in avant-garde film.' In Rainer, C., Rolling, S., Daniels, D. and Ammer, M. (eds.) See this sound. Cologne: Verlag der Buchhandlung Walther König.

Kittler, F. A. (1999) Gramophone, Film, Typewriter. Stanford, CA: Stanford University Press.

Krueger, M. (2003) 'Responsive Environments' (1977). In Wardrip-Fruin, N. and Montfort, N. (eds.) The New Media Reader. Cambridge, MA: MIT Press.

Kwastek, K. (2009) 'Embodiment and Instrumentality.' Digital Arts and Culture Conference. DAC'09. Irvine: UC Irvine.

Kwastek, K. (2010) 'Sound-image relations in interactive art.' In Daniels, D. and Naumann, S. (eds.) Audiovisuology: compendium. Cologne: Verlag der Buchhandlung Walther König.

Le Grice, M. (2001) Experimental Cinema in the Digital Age. London, British Film Institute.

Levin, G. (2010) 'Audiovisual software art.' In Daniels, D. and Naumann, S. (eds.) Audiovisuology: compendium. Cologne: Verlag der Buchhandlung Walther König.

Levin, T. Y. (2003) Tones from out of Nowhere: Rudolph Pfenninger and the Archaeology of Synthetic Sound. Grey Room, 32-79.

Lista, M. (2004) 'Empreintes sonores et métaphores tactiles.' In Duplaix, S. and Lista, M. (eds.) Sons et Lumières: Une histoire du son dans l’Art du XXème Siècle. Paris: Editions du Centre Pompidou.

Manovich, L. (2001) The Language of New Media. Cambridge, MA: MIT Press.

Maur, K. V. (2004) 'Bach et l'art de la fugue: modèle structurel d'un language abstrait.' In Duplaix, S. and Lista, M. (eds.) Sons et Lumières: une histoire du son dans l’Art du XXème Siècle. Paris: Editions du Centre Pompidou.

Moritz, W. (1979) Non-Objective Film: The Second Generation. Film as Film, Formal Experiment in Film, 1910-1975. London: Hayward Gallery.

Moritz, W. (1999) The Absolute Film. WRO99, Media Art Biennale. Wrocław, Poland.

Murray, J. H. (1997) Hamlet on the Holodeck, The Future of Narrative in Cyberspace. Cambridge, MA: MIT Press.

Naumann, S. (2009) 'Seeing Sound: the short films of Mary Ellen Bute.' In Lund, C. and Lund, H. (eds.) Audio-Visual: On visual music and related media. Stuttgart: Arnoldsche art publishers.

Naumann, S. (2011) 'The Expanded Image: On the musicalization of the Visual Arts in the Twentieth Century.' In Daniels, D., Naumann, S. and Thoben, J. (eds.) Audiovisuology 2: Essays. Cologne: Verlag der Buchhandlung Walther König.

Parrat, J. (1994) Des relations entre la peinture et la musique dans l'art contemporain. Nice: Z'éditions.

Shaw-Miller, S. (2002) Visible Deeds of Music: art and music from Wagner to Cage. New Haven, CT: Yale University Press.

Shaw-Miller, S. (2010) 'Performance art at the interface between the visual and the auditive.' In Daniels, D. and Naumann, S. (eds.) Audiovisuology: compendium. Cologne: Verlag der Buchhandlung Walther König.

Shaw-Miller, S. (2011) 'Separation and Conjunction: Music and Art, circa 1800-2010.' In Daniels, D., Naumann, S. and Thoben, J. (eds.) Audiovisuology 2: Essays. Cologne: Verlag der Buchhandlung Walther König.

Spiegel, L. (1998) 'Graphical Groove: Memorium for a Visual Music System.' Organised Sound, 3, pp.187-191.

Spielmann, Y. (2004) Video and Computer: The Aesthetics of Steina and Woody Vasulka. Daniel Langlois Foundation [Online]. Available here [Accessed 14 May 2010].

Spielmann, Y. (2010) 'Video, an audiovisual medium.' In Daniels, D. and Naumann, S. (eds.) Audiovisuology: compendium. Cologne: Verlag der Buchhandlung Walther König.

Thoben, J. (2010) 'Technical sound-image transformations.' In Daniels, D. and Naumann, S. (eds.) Audiovisuology: compendium. Cologne: Verlag der Buchhandlung Walther König.

Vasulka, W. (1992) 'Curatorial Statement.' In Dunn, D., Vasulka, W., Vasulka, S. and Weibel, P. (eds.) Eigenwelt der Apparatewelt: pioneers of electronic art. Linz: Ars Electronica.

Wardrip-Fruin, N. (2006) 'Expressive Processing: On Process-Intensive Literature and Digital Media.' PhD, Brown University.

Weibel, P. (1992) 'The apparatus world - a world unto itself.' In Dunn, D., Vasulka, W., Vasulka, S. and Weibel, P. (eds.) Eigenwelt der Apparatewelt: pioneers of electronic art. Linz: Ars Electronica.

Weiss, M. (2005) 'What is Computer Art? An attempt towards an answer and examples of interpretation.' Media Art Net. [Online] Generative Tools. Available here [Accessed 12 June 2008].

Zénouda, H. (2006) 'Images et sons dans les hypermédias: de la correspondance à la fusion.' Thèse de doctorat: Université Paris XIII.

Zielinski, S. (1999) Audiovisions: Cinema and Television as Entr'actes in History. Amsterdam: Amsterdam University Press.


[1] The former can be exemplified by Frédéric Kastner’s Pyrophone (1870), Mary Hallock-Greenewalt’s Sarabet (1919), Alexander László’s Sonchromatoscope (1925), or Lloyd G. Cross’s Sonovision (1968). Artists such as Alexander Wallace Rimington, with his Colour-Organ (1893), or Bainbridge Bishop with the concept of painting music (1877) explored free forms of association. In turn, a free play of colour and light is seen from Thomas Wilfred’s Clavilux (started in 1919), Vladimir Baranoff-Rossiné’s Piano Optophonique (1920), Zdeněk Pešánek’s Spectrophone (1926), and Charles Dockum’s MobilColor Projectors (started in 1936), to Fischinger’s Lumigraph performances (of the 1950s).

[2] As Jewanski explains, many factors led to the proliferation of colour organs and kinetic light apparatuses in the twentieth century, such as developments in electricity, renewed interest in Pythagorean and theosophical ideas of harmony and cosmic order, and beliefs in synesthesia and studies in sensory physiology (Jewanski, 2010).

[3] Repercussions of these ideas would extend to Scriabin’s forms of simultaneity, or to correspondences between sensory impressions as proposed by Baudelaire, through a form of parallelism to musicality as a quality of all art forms. While the latter formulated correspondences based on analogies between the sensory impressions, the conception that all art has musicality found its apotheosis in Scriabin’s operatic performances as holistic sensory experiences.

[4] From the point of view of painting, these analogies imply a borrowing of the time dimension, relations between the elements of the vocabulary of each art, and the transfer of compositional principles. In music, there is a similar move for the inclusion of the spatial, including relations between elements, analogies to colour contrasts and nuances, the borrowing of methods and procedures (e.g. collage), and, ultimately, methods to deal with new media materials, as in tape music (Jewanski and Naumann, 2010).

[5] Performative acts involve “confrontation (direct engagement with the audience) and simultaneity (more than one thing happening at the same time)”; according to Shaw-Miller, Futurism and Dada maintain the essential aesthetic characteristics “of simultaneity, noise, humour, provocation, and an aspiration to join art and life” that would later be central for artistic practices characterised as in flux and inter-media (Shaw-Miller, 2010, pp.260–261).

[6] These are silent (with sound) films, because they were often presented with accompanying music, pre-existing or especially composed for the film (Jutz, 2009).

[7] The Tri-Ergon sound-on-film process was an optical recording process, therefore allowing for the inscription of sound and image in the filmstrip (Jutz, 2009).

[8] The term audio-vision makes reference to a perceptual mode of reception that sound and image, in their artificially constructed relations as audio-visual forms, place the spectator in. Michel Chion explains this notion referring to an audio-visual contract, as “the opposite of a natural relationship” or “a sort of symbolic pact to which the audio-spectator agrees” assuming that sound and image cooperate in one and the same entity or world (Chion, 1994).

[9] According to Moritz, his films had an influential impact on experimental filmmakers concerned especially with the interplay between sound and image, as diverse as Len Lye, Mary Ellen Bute, and Norman McLaren. Similarly this influence continues to a second generation of abstract filmmakers such as James Whitney, Jordan Belson, and Harry Smith, who shared his mystical beliefs (Moritz, 1979).

[10] Previous ideas on these creative productive possibilities can be traced to Raoul Hausmann’s Optophonic theory and device for transforming visible forms into sound and vice-versa. László Moholy-Nagy’s text Production-Reproduction (1922) also advocated the production of original sounds by means of opto-acoustic notation (see Levin, 2003).

[11] This interdisciplinary enterprise is illustrated by several collaborations, namely those with Thomas Wilfred, Lev Theremin, Joseph Schillinger, or Norman McLaren.

[12] This idea contrasts with practices implying direct intervention in the material surface of film and exploring rhythmic, tactile, kinetic sensations (as done by Len Lye, Hy Hirsch, or Harry Smith), that are also based on improvisational methods.

[13] These conceptual correlations borrowed the process-oriented formal idiom of minimalism in an exploration of the “phenomenological materiality of sound-image correlations” (Buchmann and Bellenbaum, 2010, p.351).

[14] These are in line with Marcel Duchamp’s notion of “anti-retinal” art and the constitutive role of the viewer in the work, which, aligned with John Cage’s work and George Brecht’s events, extends to Fluxus and intermedia art (Buchmann and Bellenbaum, 2010).

[15] As Peter Weibel stresses, “the signal itself is no longer a carrier for depicting the object world but rather the image itself; autonomous worlds of sound and image that can be manipulated by both the observer and the machine. An artificial world of sound and images is emerging, one which can be generated by machines alone” (Weibel, 1992, p.17).

[16] We are referring to a creative domain that “relies on computer software as its medium, and is primarily concerned with (or is articulated through) relationships between sound and image” (Levin, 2010, p.270), but also addressing the creation of “process oriented and participatory forms that involve the manipulation of acoustic and visual information by the audience” (Kwastek, 2010, p.163).

[17] However, in the 1960s the processing capability of computers did not yet allow for the generation of complex imagery in real-time. Whitney therefore had to use the computer to create individual frames that were then animated on film.

[18] Similarly, Stan VanDerBeek also collaborated with Knowlton at the Bell Laboratories on a series of digital structuralist computer animations: Poem Field (1964–67). Other artists used computers to produce abstract imagery (or computed kinetic abstraction) in close relation with music concepts or sound, such as Calculated Movements (1985) by Larry Cuba. These experiments were soon transferred to a performance context, where computer generated visuals were accompanied by musical improvisations. Herbert W. Franke, for example, described in 1976 the potential of combining computer-based synthesis and live-performance as “graphic music” films, in which musical improvisation corresponded to the simultaneous projection of patterns of abstract animated lines.

[19] The Video and Music Program for Interactive Real-time Exploration/Experimentation included animation routines by Ken Knowlton and was built on the basis of the GROOVE computer music system, created by Max Mathews.

[20] According to Kwastek, these systems can also be considered apparatuses (comparable to, but different from, instruments) whose operative possibilities and functionality as “production devices” are potentially “unique and novel” to the user, therefore inciting exploration (Kwastek, 2009).

[21] Processuality refers to the processes or operations performed by the work, as an observable time-based evolution or sequence of events that is the result of ongoing computations (Broeckmann, 2005).

[22] This idea highlights the subordination of audiovisuality to procedurality, as the computer’s “defining ability” to execute rules that model the way things behave (Murray, 1997, p.71). We then move towards an aesthetic level that is tied to their “procedural rhetoric” or “the practice of using processes expressively” (Bogost, 2008, pp.122-124).

About the author: 

Dr Luísa Ribas holds a PhD in Art & Design (2012), a Master in Multimedia Art (2002) and a Degree in Communication Design (1996) from FBAUP (Faculty of Fine Arts, University of Porto). She is a member of ID+ (Research Institute for Design, Media and Culture), researching sound-image relations in digital interactive systems. She contributes to events and publications with articles on digital art and design. As a professor at FBAUL (Faculty of Fine-Arts, University of Lisbon) she teaches Communication Design, focusing on print and digital computational media, namely in the domains of editorial design and audiovisuality.