Empty Room: how to compose and spatialize electroacoustic music in VR in ambisonic and binaural

Presentation of the Empty Room project, dedicated to the exploration of new musical composition and spatialization methods in virtual 3D spaces developed on multiplayer or single-player technologies such as Second Life, Open Simulator and Unity 3D, using the latest video game and virtual reality technologies.


Empty Room is a project initiated by the composer Christine Webster at the EnsadLab Spatial Media laboratory in 2013. The project has led to a thesis started in 2016 at CICM/Musidance Paris 8 ( 1 ). LE CUBE, artistic center of digital arts in Issy-les-Moulineaux, is the official artistic partner of the project. Since the 2000s, massively multiplayer online games and metaverses have given users unprecedented experiences [1]. Through the representation of an avatar in a dynamic world that changes in real time, the resident can interact with the other avatars sharing the same space, but must also interact with the 3D structure of this environment, whose rendering quality varies. This structure is perceived by the brain as a sensitive space that stimulates the senses as a physical space would. But the sensation of immersion would not be complete without the presence of sound. Sound contains space and is space in essence: it gives the virtual world a second layer of texture, depth and space, and it reinforces credibility, sometimes going far beyond the representation itself. Video games and online virtual worlds have used sound for oral communication between avatars, local sound effects, ambiances and music. But what if we consider this 3D scene no longer as a "game thing" where sound only serves the visual, as it does in cinema, but rather as a new blank page, a new space for musical composition and spatialization? In this situation, what becomes of the idea of the "sound object" [2], [3] when it is associated with a virtual 3D space? How do we spatialize it? How does it behave with the other sounds in the composition? How will the embedded 3D tools influence the creative process? How will this space be sensed and how will listening be transformed? The purpose of our research is to confront the practice of electroacoustic music with the virtual 3D graphic space, which we consider a genuine place of sound generation, fixation and diffusion.
Following our introduction, we describe in the second section of this paper the technological and experimental background from which Empty Room has emerged. We discuss the heritage of the multimedia audio scene from the year 2000 and how we created our first immersive sound works between 2008 and 2014 on Second Life and OpenSim ( 2 ).
In the third section we describe the compositional process and content, the spatialization structure and the Alpha version audiovisual apparatus. In the fourth section we present a synthesis of the Alpha version user feedback where we discuss the user responsiveness in the virtual 3D space (embodiment, interaction, sound field perception). In the fifth section we discuss how the concept of sound object evolves in a virtual 3D context. We finish with the last section discussing the possibilities of direct stem integration in the 3D audio scene coming from Bi-Pan techniques or CICM's HOA library (3).

II. EMPTY ROOM'S SEARCH FIELD CONTEXT 1. The multimedia audio scene
By multimedia audio scene we mean the whole set of spatialization practices in use, from radio techniques to cinema post-production, and especially those used in video games. To understand how these practices have colonized video games and virtual reality, we take the example of the sound panel as it was developed for massively multiplayer games like World of Warcraft and later adapted for Second Life's specific purposes. Since 2000, multiple spatialization techniques have coexisted in video games and virtual reality. Game development platforms are able to mix different sorts of sound streams (mono, stereo, 5.1) in the same 3D scene. Among this abundance of possibilities, ambisonic material showed the best performance for 3D audio perception [4]. Second Life's sound panel is typical of massively multiplayer games. Through the panel (Fig. 2) the user can adjust all the available sound layers in real time. The panel displays a master volume controlling all channels at once plus one volume control per audio layer: one for the user interface sounds, one for general ambient sounds, one for the FX sounds, one for stereo audio and media streaming, and one for the voice chat. With the panel it is also possible to adjust the listener's perspective: from the virtual camera position or from the avatar position, in subjective view mode or in third-person view mode. In our research we focused particularly on the Sound Effects channel which, together with the Voice Chat channel, is able to render 3D audio in binaural directly from the 3D scene (in perspective/projection mode). The other sound panel channels do not render spatialized sounds from the 3D scene. Their sounds get their inherent spatiality from audio post-production processes (panning + reverb) or, in the case of a live stream, from the live recording and diffusion settings.
The Sound Effects channel was originally intended to place in the 3D scene the foley, FX or diegetic sounds closely related to the fictional universe (an explosion, waves lapping on the seashore, a radio playing from a house...).
With this embedded spatialization system the user can program multiple sound objects in the 3D scene. We chose this technique to create our first electroacoustic compositions immersed in a virtual 3D space in Second Life.

Earlier projects created in Second Life before Empty Room
The first compositions were realized in Second Life between 2008 and 2011, created with Second Life's 3D building tools and LSL, the Linden Scripting Language developed by Linden Lab. These tools are freely available to users to generate, manage and share virtual content in a creative way.
Our creation/spatialization process went through three steps. As a first step we had to import all our sound files in mono to the Linden Lab server, where they were converted, tagged with an ID and renamed manually from the user inventory.
As a next step we simply had to drag and drop one or several sounds from the inventory into a primitive (Fig. 3); this virtual object can be handled manually in six degrees of freedom in the 3D scene.
The final step concerns the scripting process. Each sound or series of sounds attached to the virtual object can be controlled by one or more scripts, while the object itself can be controlled physically by another one. From there it became possible to organize simple to complex spatial arrangements (Fig. 4). The sounds can be freely organized in the 3D scene in any possible way (position and movement). They did not follow stereophonic or multichannel conventions. The spatialization takes the visual 3D environment into consideration, playing with the topology if necessary. The only constraint we had to cope with was the maximum audio file length authorized at import by Linden Lab, 9 seconds. Therefore we favored timer and looping controls to create a generative and interactive sound field. Following these rules we created three compositions : Talking

Sound artists in Second Life
Created in 2003, Second Life is an online virtual world entirely shaped by its users: the resident is the potential content creator. Linden Lab's motto "your world, your imagination" sets the limits, which are only those of the building tools. Most SL artists are visual 3D builders, who have largely expressed themselves through the medium. For sound artists things went a little differently. Music takes a huge place in SL but is essentially used through streaming by DJs, live instrumentalists or singers, groups, etc. However, some sound artists have investigated Second Life since its creation, putting forward new approaches for sound art produced in a virtual context. We can mention:
-Adam Nash, Australian sound artist ( 5 ). He created numerous audiovisual sound sculptures in Second Life. Nash approaches the question of the sonic nature of the metaverse: is it the sound of the machines or the Schaefferian écoute réduite? [5] Nash's minimalist sound sculptures are a poetic invitation to confront the virtual with the real through a mise en abyme.
-Electronic music composer NnoizPapp ( 6 ) used Second Life to stage his stereo compositions, streamed on parcels of his simulations. He also built a series of individual sound objects that he played through MIDI or that the visitor could trigger manually.
-The Avatar Orchestra Metaverse ( 7 ), active since 2008, is a global collaboration of composers that approaches Second Life as an instrument in itself. The Orchestra conceives, designs and builds its own virtual instruments, featuring sounds, visuals and animations. The last performance given by the AOM in

III. EMPTY ROOM THE COMPOSITIONAL PROCESS 1. Introduction
Empty Room is the continuation of the sound works achieved in Second Life. Our positive results allowed us to transpose the spatialization process to the Unity 3D game development platform at the end of 2014. This decision was motivated by two reasons: the new audio features present in version 5 and the compatibility with the Oculus Rift and later the HTC Vive VR headsets. Due to this engine shift we had to reconsider some aspects of our upcoming project. It will not be a multiplayer experience, and the walkthrough area will be constrained to a 40 m² platform, in order to be experienced by the users in a standing position.

Description of the experience
When we experience Empty Room we consider two aspects of reality sensed at the same time, one by the body and one by the brain: we call Room 1 the space of reality and Room 2 the virtual space. These two sensations of space intertwine like mixed realities. This creates a cognitive paradox, a bi-local perception more or less in sync between the real and the virtual. The virtual space stimulates vision and hearing while the real space stimulates body contact (walk, touch). The avatar and the physical body try to unite as much as possible within the limits of the experience. With Empty Room the goal for the user is to feel wholeheartedly the virtual space and its spatial potential through the sound and visuals, to be part of it, to realize the user/avatar fusion in the 3D space. The project also wants to free the user from wires in order to experience a whole autonomous body, reducing as much as possible the need for a man-machine interface, or testing better adapted ones, for the benefit of immersion. All the sounds, objects and visual structures present in Empty Room will guide, attract, trick or confuse the "promeneur écoutant" [6], introducing the concept of shared composition: the user's experience, path or walk-around becomes part of the creation and is as important as the composer's musical proposition. Each immersion in Empty Room is a non-linear, interactive and unique experiment, where sound contributes 99% to shaping the sensation of space and the emotions that go with it. Empty Room is a paradox revealing at the same time a no-place, the atopy sensed during the virtual immersion, and in spite of this the feeling of being "somewhere", both validated by the cognitive process.

Composition
We started by defining a musical structure divided into three parts, making a 10' composition. The composition and the visual environment were linked:
-Part 1, 4': large and aerial, environment color Blue.
-Part 2, 4': closed, shifted perspective, loss of spatial references, environment color Red.
-Part 3, 2': generative panic room, confinement, environment color Grey-White.
We also created a spatialization map (Fig. 5): a diffusion over 64 virtual channels plus 10 mobile channels. In Part 1 the sounds are mostly monophonic, playing solo or in a quadri/octo question/answer mode tested with the Ircam Spat ( 9 ). In Part 2 we introduced two stereo pairs sliding across the platform. The whole spatial device acts like an acousmonium ( 10 ).
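The structure described above can be written down as a small data model. This is only an illustrative sketch: the class and function names are hypothetical, and it assumes the three parts play in sequence on a single timeline, which the text implies but does not state as an implementation detail.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Part:
    name: str
    duration_min: int  # duration in minutes, as given in the text
    character: str
    color: str


# Values taken from the description of the composition.
PARTS = [
    Part("Part 1", 4, "large and aerial", "blue"),
    Part("Part 2", 4, "closed, shifted perspective, loss of spatial references", "red"),
    Part("Part 3", 2, "generative panic room, confinement", "grey-white"),
]

FIXED_CHANNELS = 64   # virtual diffusion channels on the spatialization map
MOBILE_CHANNELS = 10  # channels attached to moving objects


def part_at(minute: float) -> Part:
    """Return the part playing at a given time, in minutes from the start.

    Assumes the parts follow each other without overlap (an assumption).
    """
    start = 0.0
    for part in PARTS:
        if start <= minute < start + part.duration_min:
            return part
        start += part.duration_min
    raise ValueError("time is outside the composition")


total_minutes = sum(p.duration_min for p in PARTS)
total_channels = FIXED_CHANNELS + MOBILE_CHANNELS
```

With these values the timeline adds up to the 10 minutes stated in the text, and the map holds 74 virtual channels in total.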

Sound Design
We first designed the overall ambiances during the prototyping phase of the generative visual structure on Francogrid ( 11 ) with OpenSim technology. This structure was later adapted to Unity 3D. The entire sound material was produced on a modular synth we built for the purpose of our project. After being recorded in Pro Tools ( 12 ), the sounds were edited and premixed in stereo. Once the general balance was found, each individual sound or stem was exported and named to fit the spatialization map. The genre of the composition goes from abstract to pointillism, with a touch of ambient and noise. The sonic layer in Empty Room plays in counterpoint with the 3D visuals.

Spatialization
Before the sound integration we set up the virtual transducer mapping (Fig. 6) in Unity 3D, following the spatialization map. Once this step was finished we could import all the sound material and link each sound to its 3D sound channel position. Then we used Unity 3D's integrated audio manager to set the spatial features of each sound (3D mode, perception curve, volume) and its behaviors (play at launch, loop, timer). In order to spare precious CPU we decided to bypass Unity's 3D reverb zone features, mixing the reverb into the sounds during the design phase. For Empty Room's Alpha version we used Unity's default spatializer in version 5 set in 3D sound mode, delivering a 2D ambisonic sound field with a binaural output.
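To make the notion of a per-sound "perception curve" concrete, the sketch below shows two generic distance attenuation models often used in game audio: an inverse-distance law and a linear fade. The function names and default distances are hypothetical, and these formulas are not claimed to be the exact curves used by Unity or in Empty Room.

```python
def inverse_distance_gain(distance: float, min_distance: float = 1.0) -> float:
    """Gain that stays at 1.0 inside min_distance, then falls as 1/distance."""
    if min_distance <= 0:
        raise ValueError("min_distance must be positive")
    if distance <= min_distance:
        return 1.0
    return min_distance / distance


def linear_gain(distance: float, min_distance: float = 1.0,
                max_distance: float = 10.0) -> float:
    """Gain that fades linearly from 1.0 at min_distance to 0.0 at max_distance."""
    if max_distance <= min_distance:
        raise ValueError("max_distance must exceed min_distance")
    if distance <= min_distance:
        return 1.0
    if distance >= max_distance:
        return 0.0
    return 1.0 - (distance - min_distance) / (max_distance - min_distance)
```

Under the inverse-distance model a source twice as far as its minimum distance is heard at half the gain, while the linear model guarantees silence beyond a chosen radius, which can help keep neighboring sources from masking each other on a small platform.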

The Alpha version
We released an  Fig. 7. The seated position is highly recommended to avoid vertigo effects.

Displacements
In the Alpha version the user's displacement possibilities are the following: with the gamepad, along 2 axes (left/right and forward/backward) programmed on the left joystick; with the Oculus Rift DK2 headset's position tracking system, the user's head and torso can move in 6 degrees of freedom: left/right, forward/backward, up/down, plus rotation on the 3 axes.

Embodiment
The user moves through the 3D scene in subjective mode without virtual body feedback. This issue will be worked out during the Beta version. In order to stay in harmony with the audiovisual 3D environment the body feedback must not be invasive. We tested virtual hand interaction with Leap Motion and came to the conclusion that the visual occlusion of the hands, when projected in front of the user, must be reduced as much as possible. The question of body feedback is still quite challenging nowadays because synchronicity between the real and the virtual body needs optimal fluidity.

Interactions
Interaction in the Alpha version essentially concerns the global interaction the user senses with the 3D space: how the user moves within the limits of the virtual 40 m² platform, his awareness of the sound field, the reactions of the body in VR (vertigo caused by the impression of height, feelings of being there, touch reactions) and the interaction with the moving objects we have placed in the 3D scene, such as the three Monolithes in Part 1 (Fig. 8), the Hectic Cube in Part 2 and the feeling of oppression rendered by the cubes in Part 3. The results are discussed in the next section.

Alpha version user feedback
More than 4000 people have experienced Empty Room with the Alpha version. The audience was quite heterogeneous, mostly general public, but also included some VR professionals, sound artists, academics and researchers. The overall user response was positive. Very few users rejected the experience: we counted only 4 people who were unable to immerse themselves properly or to understand the purpose. Before immersion each user was briefed about the general purpose of the experience (Empty Room is neither a game nor a VR demo but an invitation to experience an electroacoustic concert from inside the composition) and about the use of the gamepad. Additional warnings were given about motion sickness due to the VR headset and about the possibility of stopping the experience in case of emergency with a panic button. We classified the user feedback into two categories: feedback concerning embodiment and interaction, and feedback concerning sound perception in the 3D scene.

1. Embodiment and interaction
During the immersion the users were able to adapt and respond rapidly to the visual environment. They evaluated the borders of the platform and interacted emotionally and physically with the black Monolithes passing by and with the perception of space given by the 3D architecture. Many users naturally stretched their arm out to touch the Monolithes. The sense of presence was fully experienced, to the point that some users felt real vertigo when looking into the depths from the border of the platform. The lack of virtual body feedback was not an issue: most users liked the sensation of "floating like a free spirit" through space. However, more audacious users who got up from the chair were immediately lost without body feedback. Another result of deep interaction with Empty Room was the amazing creativity with which users could describe their immersive experience. Overall we can say that the users engaged fully and are willing to sense more (direct touch interactions, virtual body feedback).

2. Sound perception in the 3D scene
The introduction of a head-tracking device (Oculus Rift) into a 3D scene introduces an interaction between the binaural rendering system and the listener, who controls part of the simulation procedure not only by moving the head but also by changing position in the scene [7]. The virtual world and the listener are connected. Without being audio experts, many users experiencing Empty Room could tell the difference between a regular stereo stream and our binaural output. They could clearly localize the perspective: inside the platform, the direct sounds playing with more or less aerial content and force, and from the platform border, the sense of reverberated distant sounds positioned rearwards. To emphasize the perception of directivity and distance we placed sounds with a light Doppler effect in two types of moving 3D objects, the Monolithes in P1 and the Hectic Cube in P2. The user could associate these objects with their corresponding sounds without any latency, following them with the head. In order to avoid a loss of homogeneity in the sound field perception when moving from one source to another, we placed a stereo Air ambient sound file in 2D. At this stage the listener achieves authentic auditory reproduction, as described by Blauert [8].
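As a reminder of what the Doppler effect does to a moving source, the following sketch applies the classical formula for sound in still air. It is a textbook model, not the game engine's implementation; the function name, parameter names and the example speeds are hypothetical.

```python
SPEED_OF_SOUND = 343.0  # m/s, approximate value in dry air at about 20 °C


def doppler_frequency(source_hz: float,
                      source_speed: float = 0.0,
                      listener_speed: float = 0.0,
                      c: float = SPEED_OF_SOUND) -> float:
    """Frequency heard by the listener.

    Both speeds are radial components in m/s, positive when the source
    (respectively the listener) moves towards the other, negative when
    moving away. Valid only for speeds below the speed of sound.
    """
    if abs(source_speed) >= c or abs(listener_speed) >= c:
        raise ValueError("speeds must stay below the speed of sound")
    return source_hz * (c + listener_speed) / (c - source_speed)


# Illustrative values: a 440 Hz source drifting at 2 m/s.
approaching = doppler_frequency(440.0, source_speed=2.0)
receding = doppler_frequency(440.0, source_speed=-2.0)
```

At the slow speeds typical of objects drifting across a small platform the shift amounts to a few hertz, which is consistent with the idea of a "light" effect that cues movement without detuning the musical material.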

V. EMPTY ROOM THE SOUND OBJECT ANALYSIS
According to Schaeffer's definition, all the sounds produced for Empty Room are sound objects separated from their original electronic production sources, and they are proposed for acousmatic listening in the virtual 3D scene ( 13 ). This state does not differ much from listening today to a CD or an MP3, online or offline. Schaeffer distinguished two perceptions of spatiality, when we listen to the composition in the sound studio and when the composition is played live; each carries its intrinsic diffusion qualities. At this point we can make an interesting observation: listening to Empty Room from the 3D scene differs from both the studio and the live context. The immersed concert, where the listener can freely move from one sound to another or between the sound layers through a 3D scene, acts like an audio mix session which is entirely deployed or stretched out from the 2D context of a DAW into the 3D context of the game engine, acquiring a specific spatial rendering. And if we refer to Vaggione's [3] definition or classification of a digital sound object, Empty Room's spatialization mapping is a sound object (containing sound objects) encapsulated in a series of sound objects that we can enumerate as follows: concrete sound object (audio file), associated with a virtual object (container), controlled and spatialized by plugin parameters, included in a 3D scene (container), crossed by a virtual camera (container) delivering the binaural output. This collection of objects responds to three computing concepts identified by Vaggione: the ability of these objects to be linked (encapsulation), to inherit and to be polymorphic. As a result we can say that the whole compositional structure of Empty Room uses the sound object concept as defined by both Schaeffer and Vaggione. Empty Room is a genuine compositional space open to sound experimentation.

VI. EMPTY ROOM BETA VERSION AND NEW AUDIO STEM INTEGRATION 1. The Beta version
We started to work on the Beta version in September 2016. This standing version will be optimized for the latest wireless HTC Vive VR headset, which allows tracking of human displacements over 25 m². The Beta version will address numerous issues raised by the Alpha users' feedback: we will add direct touch interaction with the Monolithes (touch will create granular sounds harmonized with the overall composition) and additional zone and position interactivity (specific sounds only perceptible within a small range). We will finalize the user's avatar body, which has to be in sync with the human one. The most crucial step will be to go from a 2D ambisonic/binaural sound field perception to a 3D one with the perception of elevation.

Bi-panned stem integration in Unity 3D
We have tested numerous 3D spatialization toolkits compatible with Unity 3D V5; among others we can mention those proposed for the Oculus Rift DK2 and the RealSpace 3D audio plugin. We chose the model developed by the French company 3D Sound Labs and their VRAudioKit plugin, which gave the best results for the perception of elevation in a 3D ambisonic sound field. With this toolkit we made a preliminary test in Unity 3D with Jean-Christophe Messonnier from the CNSMD in Paris ( 14 ). Messonnier works on the recording and spatialization of complete audio scenes with the Binaural/Transaural system developed by Jean-Marc Lyzwa (CNSMDP) and Alexis Baskind ( 15 ). We transposed an audio scene previously recorded and spatialized by Messonnier with the Transaural Bi-pan panner into a Unity 3D scene using 3D Sound Labs' VRAudioKit. The scene consisted of 16 monophonic stems recorded from a jazz trio, including the direct sounds, the early reflections, the later reflections and the high reverberations.
The placement of the stems in Unity 3D (Fig. 9) creates a very realistic, transparent and homogeneous 3D sound scene through which the virtual camera linked to the binaural audio output can move freely in all dimensions, with 6 degrees of freedom, between the recorded layers, without audio clicks or loss of directivity. Most important of all, in this situation and from the listener's perspective, we entirely get rid of any sweet spot ( 16 ) constraint. Following this successful preliminary test we decided to add to our Unity 3D scene a second sound scene recorded by Messonnier, this time with two pianos, a cello and a harp. We included this scene alongside the first one. Thus we had two completely different sound scenes, each with its own reverb and reflection characteristics, which we could position in the 3D space, walking freely in subjective view from one to the other. With 16 stems per audio scene we had a total of 32 stems playing at the same time in the 3D space. These preliminary tests encourage us to go further and to adapt this technique to achieve Empty Room's Part 3.

Empty Room part 3
Working together with Messonnier on Part 3, we had to adapt his recording and spatialization process to Empty Room's dramaturgy. Originally Part 3 takes place in a sort of containment chamber, panic room or anechoic chamber. Because we need a few reflections to work with the Transaural Bi-pan panner, we chose to record a session with very close miking on 5 Tibetan percussion instruments, a gong and 4 bowls. For each instrument take we used a close microphone to record the direct sound and, all around, 6 omnidirectional microphones to record the primary reflections. From this session Messonnier built a new sound scene with the Transaural Bi-pan panner, placing the 5 elements together with their primary reflections and adding digital reverb to glue the whole scene together and to work on the height reverb perception at +45° elevation.
In the end we had 15 stems comprising 5 direct instrument sounds, 6 grounded reverbs and 4 height reverbs at +45° elevation, ready to be set in Unity 3D. After verifying in Unity 3D that all the stems fit together perfectly, we went back to some sound design in the studio, working on glitch and sound distortion effects applied to some parts of the recorded stems. As a result we have two qualities of sound and space perception evolving during Part 3: we first perceive the sound scene as something very close and intimate, yet containing some air and transparency, which evolves into something glitchy and noisy that gets thicker and thicker. In the end we completely lose the sense of space and distance.
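Placing a stem "at +45° elevation" in a 3D scene comes down to converting a direction and a distance into Cartesian coordinates. The sketch below shows one way to do it. The axis convention (y up, z forward, x to the right), the function name, and the azimuths and distance in the example are assumptions made for illustration; the text does not give the actual positions of the stems.

```python
import math


def stem_position(azimuth_deg: float, elevation_deg: float,
                  distance_m: float) -> tuple[float, float, float]:
    """Convert listener-centered spherical coordinates to (x, y, z).

    Assumed convention: y is up, z is forward, x is to the right;
    azimuth 0 is straight ahead and increases towards the right.
    """
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    horizontal = distance_m * math.cos(el)  # projection on the ground plane
    return (horizontal * math.sin(az),
            distance_m * math.sin(el),
            horizontal * math.cos(az))


# Hypothetical layout: four height stems at +45 degrees elevation, 2 m from
# the listener and spread evenly around them. Illustrative values only.
height_stems = [stem_position(az, 45.0, 2.0) for az in (45.0, 135.0, 225.0, 315.0)]
```

All four positions end up at the same height above the listener, so only the azimuth distinguishes them.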

Empty Room Spatial Beta
During August 2017 we finalized the first spatial Beta version: the 3 scenes playing in Empty Room now run in full 3D ambisonic mode.
Going from 2D to 3D required readjusting each sound from P1 and P2 in order to get a homogeneous sound field, now that we can place part of the sounds in elevation. The first Spatial Beta was released for the Oculus Rift DK2 and we are now working on the HTC Vive version at the CICM-Paris 8 studio hosted at MSH Paris Nord. Using the HTC Vive in room-scale mode will force us to reconsider the surface of the walking area, since HTC allows a maximum scale of 5 meters by 5 meters, which gives us 25 m² instead of our initial 40 m² in seated mode.
Figure 11. CICM's HOA 16-channel semi-dome.

CICM's HOA library integration
Other stem integrations will be studied, in particular those coming from CICM's HOA Library. We will work on an order-7 higher-order ambisonic spatialization in a 3D configuration, with a binaural output optimized for supra-aural headphones. Empty Room's Part 1 sound spatialization mapping would be a great candidate for a CICM HOA version that we could spatialize on CICM's 16-channel semi-dome (Fig. 11) and then integrate into Empty Room's Unity 3D scene, as we did with the Transaural Bi-pan panned scenes. At its final stage the spatial Beta will contain a CICM HOA spatialization mapping in Part 1, a free-format mapping in Part 2 and a mixed version with Transaural Bi-pan + free-format mapping in Part 3.
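For orientation, the number of signals carried by an ambisonic stream follows directly from its order: 2N + 1 circular harmonics in 2D and (N + 1)² spherical harmonics in 3D. The helper below is a small sketch of that arithmetic; its name is hypothetical and it is not part of CICM's HOA Library.

```python
def ambisonic_channels(order: int, three_d: bool = True) -> int:
    """Number of harmonic components for a given ambisonic order.

    3D (spherical harmonics): (order + 1) ** 2
    2D (circular harmonics):  2 * order + 1
    """
    if order < 0:
        raise ValueError("order must be non-negative")
    return (order + 1) ** 2 if three_d else 2 * order + 1


order_7_3d = ambisonic_channels(7)                 # 3D, order 7
order_7_2d = ambisonic_channels(7, three_d=False)  # 2D, order 7
```

An order-7 encoding therefore carries 64 components in 3D against 15 in 2D, which gives an idea of the extra processing involved when moving from a 2D to a 3D sound field.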

Benefits and issues
Exploring new spatialization techniques in Empty Room, we have to consider numerous relevant aspects: some are an immediate benefit, while others are serious issues which will have to be worked out in the future.
As immediate benefits we have the liberation from the sweet spot constraint and the variety of audio dissemination formats which can be used and merged together in the same virtual 3D scene. With the possibility of shaping the reverberation before entering the 3D scene in Unity, we also spare a huge amount of CPU processing resources. But if we want to stick to a more realistic sound scene reproduction, we have to integrate physical body masking effects as well as directivity cone spreading possibilities, both adjustable on each sound emitter in the scene. These aspects can be worked out in the future with 3D Sound Labs.

VII. CONCLUSIONS
Empty Room demonstrates that virtual reality is a real opportunity for experimental and contemporary music composition when it is introduced in a 3D scene in perspective/projection mode. Electroacoustic music in association with the 3D medium delivers innovative spatialization possibilities, especially when it comes to mixing several spatialization techniques together. The binaural/ambisonic pairing can integrate, handle and render numerous audio dissemination formats: mono, stereo, multichannel stems, multiphonic, quadriphonic, octophonic. The flexibility of the spatialization system, which takes advantage of complex digital sound objects, allows us to envisage spatialization structures closely fitted to the virtual 3D topology. This experience opens the way to a new harmonization in a compositional space that is at the same time a creation/diffusion platform, suited to the restitution of complex sound spatialization, and a custom-made sound space seeking to be freed from classical post-production techniques.