A Conceptual Framework for Immersive Acoustic Auralisation: Investigating the Key Attributes

In architectural acoustics, the assessment of sound propagation in interior and environmental conditions has become progressively more prominent over the past few decades, in response to the development of advanced prediction tools. For adaptation to virtual reality (VR) systems, prediction and simulation software must be considerably accelerated, and data analysis must remain flexible and responsive during simulation and 3D audio sensory projection. To generate ideal immersion in a simulated virtual environment, the stimuli generated across all senses should be coherent. Accordingly, in the domain of acoustics in virtual reality, the sound simulation system must be constructed efficiently in order to convey the auditory stimuli to the user in an appropriate manner. This necessitates the implementation of a virtual reality system as an advanced prediction tool that can accurately anticipate and replicate realistic audio experiences. Therefore, this study explores the realm of acoustic virtual reality (AVR) through a critical review with the purpose of elucidating design attributes and determining factors in generating immersive acoustic VR experiences. In light of these findings, the aim of this paper is to develop a comprehensive conceptual framework that will serve as a guide and road map for future researchers and developers in the field.


Introduction
The utilisation of Virtual Reality (VR) in acoustics-related research has garnered significant interest over the past few years due to its capability to generate an authentic acoustic perception within the virtual environment. The majority of recent immersive technology interventions in built environments, particularly within the architecture domain, have focused extensively on achieving visual fidelity. This emphasis has led to the creation of virtual simulations that closely emulate the real world, utilising advanced graphics and visual effects to attain a high level of realism [1], [2]. Other modalities, such as acoustics or sound, are frequently given less importance and are often treated as secondary or supplemental aspects, without a firm foundation in real-world properties [3]. Notwithstanding its potential for enhancing other sensory perceptions, such as visual, haptic, and tactile perceptions, acoustic perception is not widely exploited [4]. Non-visual modalities do not receive the same amount of consideration and integration as visual depiction, resulting in a discrepancy between the visual and non-visual components of the virtual world.
Recent developments in computerisation have enabled the generation of comprehensive and lifelike three-dimensional virtual depictions encompassing both indoor and outdoor spatial domains. The developed models are commonly utilised in applications such as visual representation and physics-based simulation. Acoustic Virtual Reality, the focus of this study, is a relatively new and developing domain. It combines physics and acoustic engineering techniques with psychoacoustic concepts, including the use of electroacoustic sound reproduction [5]. Simulation, auralisation, and spatial sound are the most prominent keywords related to Acoustic Virtual Reality. The term "auralisation" was coined in 1993 [6] and refers to simulating acoustics by rendering sound propagation in a simulated environment. It is comparable to the concept of "visualisation": visualisation is the production of graphical or visual representations of data or information, whereas auralisation is the rendering of acoustic effects or sound signals into auditory outputs. It is commonly acknowledged that synchronizing auditory inputs with visual information substantially improves the overall sensory experience of presence and immersion, which are fundamental components of virtual reality [7].
The degree of immersion in a virtual reality (VR) setting is dependent on the reliability and capacity of the hardware system components as well as the complexity of the auditory rendering and processing algorithms. Integration and utilisation of both software and hardware are required for a truly immersive aural experience [8], [9]. VR hardware, such as headsets and controllers, is the basis for visual experience and user interfaces, enabling accurate tracking and spatial awareness [3]. Software, such as gaming platforms and programming tools, supplies features for developing the simulated environment, incorporating spatial audio, and facilitating user interaction [10], [11]. Moreover, attaining a fully immersive experience for users in the acoustic virtual world is highly dependent on accurate sound localisation: a user's ability to recognise the direction of a sound and respond accordingly is crucial, as it determines how specific acoustic design features influence the general auditory experience. This is where audio processing techniques such as binaural rendering, head-related transfer functions (HRTFs), and accurate and realistic sound placement are applied to replicate high-fidelity sound sources [12]. Accordingly, audio systems consisting of both software and hardware components are required for producing authentic sound experiences. They comprise headphones or loudspeakers for high-quality audio production and spatial audio algorithms enabling accurate sound localisation and acoustic effect simulation.
Ultimately, to successfully establish an immersive acoustic auralisation that accurately portrays the measured acoustic conditions, both virtual design and acoustic properties must be carefully considered. As the field of VR intervention in acoustic simulation garners increasing attention, understanding the core attributes that enhance immersive acoustic auralisation is crucial. This paper investigates the domain of Acoustic Virtual Reality (AVR) to discover design attributes and determining elements that have the ability to significantly enhance the user's sense of immersion and presence. Consequently, the primary objective of this paper is to construct a conceptual framework that can serve as a valuable reference for virtual acoustic design and practice. The findings of this study could therefore contribute to the advancement and refinement of immersive auditory experiences within virtual environments.

Review process and literature search method
An extensive literature search strategy was undertaken to identify relevant publications on the key attributes of immersive auralisation utilizing VR as an advanced acoustic simulation tool. The search was initiated by selecting relevant keywords and utilizing prominent digital libraries, namely Scopus, IEEE Xplore, and Web of Science. These databases were chosen based on their capability to sufficiently address the search requirements [13]. The search process involved combining relevant keywords using the Boolean operators "AND" and "OR." The selected keywords included terms such as "virtual reality" OR "VR" AND "acoustic*" OR "noise" AND "virtual acoustic*" OR "auditory acoustic" OR "auralisation" AND "building acoustic*" OR "room acoustic*". No exclusion criteria were applied to the initial search. The results encompassed review articles, academic papers, conference proceedings, conference reviews, articles, and book chapters published in English, amounting to a total of 972 articles. Figure 1 illustrates the overall procedure of developing the databases on VR intervention in designing immersive auralisation.
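The keyword string above does not state how the "AND" and "OR" operators are grouped. The following minimal Python sketch shows one plausible reading, in which each set of synonyms is joined with OR inside parentheses and the sets are then joined with AND. The grouping, the function name `build_query`, and the variable `KEYWORD_GROUPS` are illustrative assumptions, and the exact syntax accepted by each database may differ.

```python
def build_query(groups):
    """Join each group of synonyms with OR, then join the groups with AND."""
    clauses = []
    for terms in groups:
        quoted = " OR ".join(f'"{term}"' for term in terms)
        clauses.append(f"({quoted})")
    return " AND ".join(clauses)


# Keyword groups taken from the search terms listed in the text;
# the grouping into four OR-sets is an assumption.
KEYWORD_GROUPS = [
    ["virtual reality", "VR"],
    ["acoustic*", "noise"],
    ["virtual acoustic*", "auditory acoustic", "auralisation"],
    ["building acoustic*", "room acoustic*"],
]

query = build_query(KEYWORD_GROUPS)
print(query)
```

Making the parentheses explicit removes any dependence on a database's default operator precedence.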
Figure 1 illustrates that 94 papers were eliminated due to duplication and unrelated titles. Additionally, any papers published before 2016 were excluded, resulting in the inclusion of only those published in 2016 and later. The decision to focus on this timeframe is supported by a Google Trends search, which indicated a rising interest in VR since the release of the HTC Vive headset in 2016 [13]. A total of 878 papers therefore entered the abstract-based screening process outlined in Figure 1. During this screening, 763 papers were removed based on specific criteria. These criteria encompassed factors such as unrelated keywords, the lack of relevance to the area of study concerning VR implementation in acoustic research, and the absence of a specified research method or process for assessing acoustic VR.
As a result of this rigorous screening, the number of papers was reduced to 115, indicating that these papers satisfied the specified criteria and were considered appropriate for further full-text manual analysis of their content and outcome in the context of VR implementation in acoustic research. During the eligibility stage, the papers underwent a manual review, and exclusion decisions were made based on the criteria outlined in Figure 1. Specifically, papers were excluded if they did not adequately describe VR computing devices and components, lacked a well-structured and comprehensive assessment methodology, or failed to evaluate the effectiveness of VR implementation in the context of acoustic studies. Through this review process, 37 papers that did not meet these criteria were excluded from further consideration. The remaining 78 papers, which satisfied the eligibility criteria, were retrieved and comprehensively reviewed to extract their key findings, which were then incorporated into the current study. Upon close analysis of these articles, it became evident that the choice of research methodologies varied based on the specific acoustic domains being measured. Furthermore, there were differences in the software and hardware utilized, particularly in terms of VR and acoustic systems. However, despite the variations in methodologies, software, and hardware, all of the selected papers conducted their studies on acoustics by incorporating VR as an advanced acoustic simulation tool. This highlights the broad range of approaches and technologies being employed to explore the intersection of VR and acoustics in research. To conclude, the research process for this paper involved analyzing the titles, abstracts, and full content of selected journal articles obtained from various fields related to the topic under investigation. These articles were collected from reputable sources such as Scopus, IEEE Xplore, and Web of Science to ensure comprehensive coverage of research in the field. During the review process, this paper highlighted the significant attributes related to design features, emphasizing their fundamental importance in creating immersive acoustic VR experiences. By examining these articles, this paper aimed to gain insights into the various aspects of design that contribute to the quality and realism of acoustic simulations in VR.
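The record counts at each stage of the screening can be checked with simple arithmetic. The sketch below is illustrative only: it uses the 972 records identified, the 94 records removed for duplication and unrelated titles, the 763 records excluded at abstract screening as reported in the Figure 1 flowchart, and the 37 records excluded at full-text eligibility. The function and variable names are hypothetical.

```python
IDENTIFIED = 972  # records returned by the initial database search

# (stage label, records removed at that stage)
STAGES = [
    ("duplicates and unrelated titles", 94),
    ("abstract screening", 763),      # count reported in the Figure 1 flowchart
    ("full-text eligibility", 37),
]


def remaining_after(identified, stages):
    """Return (label, records remaining) after each successive stage."""
    counts = []
    remaining = identified
    for label, removed in stages:
        remaining -= removed
        counts.append((label, remaining))
    return counts


flow = remaining_after(IDENTIFIED, STAGES)
for label, remaining in flow:
    print(f"after {label}: {remaining} records")
```

With these inputs the successive totals are 878, 115, and 78, which agree with the numbers of papers stated for each stage of the review.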

Acoustic virtual reality auralisation
Acoustic virtual reality (AVR) is grounded on auralisation, which simulates and delivers realistic acoustic environments in a virtual world [3]. There have been substantial advancements in room acoustics prediction technologies and auralisation techniques, enabling the modelling of physically accurate virtual environments in real-time. This advancement has been essential to attain realistic acoustic modelling, a crucial component for spatializing sound efficiently in virtual worlds. From the perspective of immersive auralisation, academics and professionals have conducted studies integrating acoustic simulation with virtual reality (VR) interventions to create immersive auditory experiences. These studies involve simulating realistic soundscapes and spatial audio within virtual environments for the purpose of enhancing users' sense of presence and immersion [3,11,14-16]. Utilizing advanced algorithms and embedded systems software, academics have successfully modelled sound transmission, reflections, and occlusion, thereby simulating real-world acoustic events in virtual worlds [5]. The collaboration between different disciplines in science and technology has opened up possibilities for diversifying into a wide range of auralisation applications.
Accordingly, when developing immersive auralisation for studying acoustics in Virtual Reality (VR), it is important to take into account a number of essential attributes and properties. Given that acoustic interventions can be expensive, it is important to determine the appropriate type of acoustic intervention for different settings, users, and tasks [16]. The key considerations encompass acoustic scene modelling, spatial audio sources, sound localisation and sound reproduction. Nevertheless, the approach and implementation in designing acoustic VR systems vary depending on the specific acoustic criteria being measured. The following sections describe how these elements come together to create compelling and realistic acoustic simulations within the VR environment.

Accurate virtual acoustic scene modelling
Virtual acoustic scene modelling entails reproducing the acoustic qualities of multiple settings, such as concert halls, living areas [17], or outdoor environments [18]. The fundamental elements that characterize the visual aspect of a virtual environment include sources of illumination, 3D geometry, and the light transmission attributes of surfaces and materials [19]. For an accurate representation of the acoustical virtual environment, different techniques and approaches have been used, including the creation of a 3D reconstruction of the desired setting through captured panoramic or 360° views [20-22] or 3D modelling [16,22,23].
Designing a 3D geometric model for an acoustic VR environment involves creating a virtual space that accurately represents the geometry and acoustic properties of a real-world environment. One approach is to use recorded 360° images of the physical environment to reconstruct a 3D model geometry, as demonstrated in the study by Kim et al. [21]. The study used convolutional neural networks (CNN) to estimate room geometry and acoustics, employing depth estimation and semantic labeling based on visual properties to categorize materials in scene images [24]. Alternatively, Hong et al. [20] recorded high-quality omnidirectional videos and sound sources of a physical environment using a spherical panoramic camera. Unlike other studies, 3D reconstruction was not involved, as they focused on outdoor soundscape perception rather than on analysing acoustic material properties.
Another approach is to manually create a digital representation of the environment using 3D modelling software. For instance, Jeon and Jo [22] modelled a 3D virtual room using Sketchup software to investigate road traffic noise in urban high-rise residential buildings. However, their study focused on simulating recorded sounds in the virtual space without considering the real acoustic characteristics of the physical environment. In contrast, Dogget et al. [16] undertook research that specifically addressed the effects of acoustic material interventions on cognitive performance and well-being. A 3D virtual classroom environment was constructed using the Unity game engine, while ODEON software [25] was used for rendering the acoustic properties assigned in the 3D space.
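As a rough illustration of why the acoustic properties assigned to surfaces in a 3D model matter, the sketch below estimates reverberation time with Sabine's formula, RT60 = 0.161 V / A, where V is the room volume in cubic metres and A is the total absorption (surface area multiplied by absorption coefficient, summed over all surfaces). This is a textbook approximation and is not presented as the method used by ODEON or by any of the reviewed studies. The room dimensions and absorption coefficients are hypothetical values chosen only for the example.

```python
def sabine_rt60(volume_m3, surfaces):
    """Estimate reverberation time (s) from Sabine's formula.

    surfaces: iterable of (area_m2, absorption_coefficient) pairs.
    """
    total_absorption = sum(area * alpha for area, alpha in surfaces)
    return 0.161 * volume_m3 / total_absorption


# Hypothetical 8 m x 6 m x 3 m classroom.
VOLUME = 8 * 6 * 3                    # 144 m^3
FLOOR = CEILING = 8 * 6               # 48 m^2 each
WALLS = 2 * (8 * 3) + 2 * (6 * 3)     # 84 m^2 in total

# Hypothetical absorption coefficients: hard surfaces everywhere,
# then an absorptive ceiling treatment.
bare = sabine_rt60(VOLUME, [(FLOOR, 0.05), (CEILING, 0.05), (WALLS, 0.05)])
treated = sabine_rt60(VOLUME, [(FLOOR, 0.05), (CEILING, 0.70), (WALLS, 0.05)])

print(f"bare room:       {bare:.2f} s")
print(f"treated ceiling: {treated:.2f} s")
```

Changing a single material assignment shifts the estimated reverberation time substantially, which is the kind of effect a material intervention study would need the virtual model to capture.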
Using images from omni-directional cameras or 3D modelling software displayed on head-mounted devices offers an immersive and lifelike experience. However, image quality, field of view, and motion sickness concerns require careful consideration for a well-balanced outcome. To establish a sense of absolute realism, the accompanying audio must possess a minimum degree of spatial quality to enhance this immersive visual experience [26]. The next section discusses the sound sources and techniques employed prior to audio signal processing and reproduction.

Realistic spatial audio sources
Sound simulation deals with the synthesis, propagation and rendering of audio effects. Therefore, in immersive acoustic VR, it is imperative to use realistic sound samples to generate a realistic virtual environment. Employing spatial audio techniques is one method of sound simulation in acoustic VR research. Spatial audio entails the recording and reproduction of sound in a manner that gives the listener a sense of localisation and spatialization [27,28]; it aims to replicate a real-world acoustic environment through sound recording, or to synthesize realistic new ones. Spatialized sounds heighten the sense of presence in auditory virtual environments, and there is a significant relation between physiological and psychological responses to spatialized sounds [29]. In earlier studies, sound sources were predominantly taken from actual recordings in order to reproduce and evaluate acoustic settings [30]. When recording sound sources in a real environment is not feasible, a reconstruction technique relying on physically measured data becomes necessary [31]. The incorporation of realistic sound samples within the virtual world contributes significantly to the perception of presence, the fundamental aspect of virtual reality that encapsulates the sense of "being there" within the simulated environment [32,33].
Conveying an immersive auditory experience necessitates the integration of action sounds and environmental cues [34,35]. This process involves incorporating a soundscape that effectively enhances the realism of the virtual environment. Serafin et al. [34] categorised sound samples into two forms: action sounds and environmental sounds. These samples range from user action sounds, such as footsteps and doors opening and closing, to environmental sounds, such as birds chirping and the noise of a car's engine [18,34]. By strategically curating and blending these sounds, a dynamic auditory experience is created, thus heightening the listener's senses. Table 1 lists the sound sources used in several acoustic virtual reality research domains.
A variety of devices have been demonstrated to be effective for spatial sound capture [40]. Jeon et al. [37] employed binaural recording techniques with microphone arrays and portable sound level meters to capture environmental noise experienced by inhabitants in the actual indoor environment to be simulated in a virtual world. In a separate soundscape study by Kern et al. [18], the sound sources were not directly recorded from their real-life settings. Instead, they were pre-recorded sound samples from a SoundCloud library, encompassing ambient nature and walking sounds. In another study, Zhou et al. [36] investigated the impact of the acoustic environment on patients by recording the environmental and action sounds within a hospital ward, encompassing mechanical, artificial, and natural sounds, to deliver an immersive auditory experience that represents the typical sources of sound in a hospital setting [41]. Recorded sound offers authenticity, capturing the genuine acoustic qualities of specific environments, while sample sound offers convenience and adaptability, enabling easy integration into diverse projects. Nevertheless, a recorded sound source is limited to delivering a particular sound effect, without the ability to distinguish the sound's direction, the listener's placement, or spatial dimensions. Consequently, the manipulation of sound source models is vital.

Dynamic sound localisation
Precise localisation perception is an essential element for crafting a truly immersive auditory experience through spatial audio [42]. It entails the precise prediction of the spatial position of sound sources in a three-dimensional environment, allowing listeners to perceive sound as they do in the actual world. Current spatial audio rendering technologies are usually able to deliver perceptually realistic simulations with audio stimuli reconstructed from actual recordings [43]. However, a recorded sound source on its own is restricted to producing a specific sound effect, without the capacity to differentiate the sound's origin, the listener's position, or the spatial dimensions. Thus, the manipulation of sound source models is important.
Apart from accurately modelling the sound source, achieving precise signal reproduction is vital for conveying all spatial information present in the signal to the listener's ears in a realistic manner [44]. In terms of spatial sound presentation, the binaural system enables a relatively close resemblance to natural human hearing. It relies mostly on producing localisation cues, which are generated by sound interactions with the listener's body, head, and ears as sound waves travel toward the ear canal and reach the eardrums [45]. These cues are the interaural time difference (ITD), interaural level difference (ILD), and frequency-dependent filtering (FDF) [46]. ITD uses ear-to-ear time differences, ILD relies on intensity variations, and FDF involves frequency changes due to ear and head shape, collectively helping us locate sounds in our environment [47-49].
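To make the two interaural cues concrete, the following minimal sketch computes an ITD from Woodworth's spherical-head approximation, ITD = (a / c)(θ + sin θ), and an ILD as the level ratio of the two ear signals in decibels. The formula is a standard textbook model and is not drawn from the reviewed studies; the head radius of 0.0875 m, the speed of sound of 343 m/s, and all function names are assumptions made for the example.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at room temperature
HEAD_RADIUS = 0.0875    # m, assumed average head radius


def woodworth_itd(azimuth_rad, head_radius=HEAD_RADIUS, c=SPEED_OF_SOUND):
    """Interaural time difference (s) for a far-field source.

    Spherical-head approximation, intended for azimuths between
    0 (straight ahead) and pi/2 (directly to one side).
    """
    return (head_radius / c) * (azimuth_rad + math.sin(azimuth_rad))


def rms(signal):
    """Root-mean-square amplitude of a sequence of samples."""
    return math.sqrt(sum(v * v for v in signal) / len(signal))


def ild_db(left, right):
    """Interaural level difference in dB (positive when the left ear is louder)."""
    return 20.0 * math.log10(rms(left) / rms(right))


itd_side = woodworth_itd(math.pi / 2)
print(f"ITD for a source at 90 degrees: {itd_side * 1e6:.0f} microseconds")
print(f"ILD for a 2:1 amplitude ratio: {ild_db([2.0, -2.0], [1.0, -1.0]):.2f} dB")
```

The ITD is zero for a source straight ahead and grows to roughly two thirds of a millisecond for a source directly to one side, which indicates the time resolution a binaural renderer has to preserve.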
Nevertheless, while binaural recording provides a captivating spatial audio experience, it frequently fails to accommodate specific listener preferences and to facilitate precise head tracking [26]. In the research conducted by Jeon et al. [37] concerning environmental noise assessment, the incorporation of Head-Related Transfer Functions (HRTFs) was found to improve the identification of road traffic noise direction and spatialisation. Moreover, in a study conducted by Hong et al. [20], the researchers employed an ambisonics recording technique to capture soundscapes. The recorded sounds were subsequently converted from their initial first-order ambisonics (FOA) format into various other formats, including binaural headphone reproduction, in which HRTFs were utilized. In contrast, Kern et al.'s study [18] utilised basic Unity techniques to position sounds in a virtual environment's auditory space; audio rendering was disregarded and the study relied solely on pre-recorded sounds. This could undermine realism, accuracy, spatial audio perception, and overall immersion. In essence, appropriate audio signal reproduction is a key element in achieving accurate audio localization, vital for creating convincing and immersive virtual experiences.
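One reason a first-order ambisonics representation suits head-tracked playback is that the whole sound field can be rotated with a few multiplications before it is decoded to the ears. The sketch below is a generic illustration and does not reproduce the processing chain of any reviewed study. It assumes the AmbiX convention (channels W, Y, Z, X with SN3D normalisation, azimuth measured counterclockwise from straight ahead), and the function names are hypothetical.

```python
import math


def encode_foa(samples, azimuth, elevation=0.0):
    """Encode a mono signal as first-order ambisonics (assumed AmbiX: W, Y, Z, X)."""
    gain_y = math.sin(azimuth) * math.cos(elevation)
    gain_z = math.sin(elevation)
    gain_x = math.cos(azimuth) * math.cos(elevation)
    return {
        "W": [s for s in samples],           # omnidirectional component
        "Y": [gain_y * s for s in samples],  # left-right component
        "Z": [gain_z * s for s in samples],  # up-down component
        "X": [gain_x * s for s in samples],  # front-back component
    }


def rotate_for_head_yaw(foa, yaw):
    """Counter-rotate the sound field for a listener whose head has turned by yaw.

    A source at azimuth a is heard at azimuth (a - yaw) relative to the head.
    """
    c, s = math.cos(yaw), math.sin(yaw)
    x = [c * xv + s * yv for xv, yv in zip(foa["X"], foa["Y"])]
    y = [c * yv - s * xv for xv, yv in zip(foa["X"], foa["Y"])]
    return {"W": list(foa["W"]), "Y": y, "Z": list(foa["Z"]), "X": x}


# A source directly to the listener's left (azimuth 90 degrees) ...
field = encode_foa([1.0, 0.5], azimuth=math.pi / 2)
# ... appears straight ahead once the listener turns the head 90 degrees left.
turned = rotate_for_head_yaw(field, yaw=math.pi / 2)
print(turned["X"], turned["Y"])
```

After rotation the signal energy moves from the Y channel to the X channel, meaning the source is now in front of the listener. A binaural decoder using HRTFs would then be applied to the rotated channels; that stage is omitted here.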

3D sound reproduction
Sound is essential in creating a realistic virtual reality (VR) experience, as auditory stimuli contribute to the perception of presence and immersion in a real, physical environment [50]. In a study by Lokki et al. [19], the researchers define 3D sound in auditory technology as the capability to properly recognise the direction of sound signals, providing a realistic perception of spatial position that improves sound localisation capabilities. This involves utilising sound reproduction devices that can be broadly categorized into two main methods: stereo headphones and surrounding loudspeakers, as illustrated in Figure 2 below [51,52].
The application of headphone-based setups for virtual acoustics and their impacts on the perception of sound have received significant consideration in a variety of research endeavours [53]. Headphone-based reproduction uses headphones to create an immersive auditory environment, enabling precise control of sound reproduction and of the binaural cues that reach each ear of the listener. For instance, in order to examine the effect of room acoustics on cognitive performance and well-being, Dogget et al. [16] used stereo headphones to deliver a virtual soundscape that was coupled with a VR headset, thereby creating an immersive and realistic auditory experience for the users. While headphone reproduction entails transmitting audio directly to each participant's ears via stereo headphones, surrounding loudspeakers are the alternative for sound reproduction, distributing sound waves throughout the room. Kirsch et al. [54] examined spatial resolution within virtual acoustic environments by employing a subset of an 86-channel spherical loudspeaker array, emphasising the significance of a sufficient number of loudspeakers to authentically convey the spatial characteristics of delayed reverberation in virtual scenarios. In contrast, Lentz et al. [44] investigated the efficacy of loudspeaker-based auditory reproduction systems in CAVE-like settings utilising only four strategically placed loudspeakers.
Ultimately, the utilisation and configuration of headphones and loudspeakers in virtual reality acoustic research enable the production of realistic audio-visual environments and provide a platform to investigate various aspects of auditory perception. The selection of headphone and loudspeaker configuration, in conjunction with techniques such as ambisonics and binaural rendering, as discussed in the preceding section, can have a substantial effect on the quality and realism of the virtual audio environment [51]. By carefully designing and implementing these technologies, researchers can create realistic and ecologically valid virtual environments for studying auditory perception and evaluating hearing systems [55].
This paper provides a comprehensive exploration of the fundamental design attributes crucial for creating an immersive acoustic environment that heightens the sense of presence within auditory virtual realms. Drawing predominantly on English-language sources, the study conducts an extensive review of prominent databases to rigorously analyse a range of publications, revealing diverse attributes that are integral to virtual environment design. The research findings emphasize a series of key attributes in the context of an acoustic virtual environment. These attributes encompass modelling a precise virtual acoustic scene, capturing or synthesizing spatial sound sources for simulation within the virtual acoustic space, employing techniques to enhance sound source localization, and ultimately achieving accurate sound reproduction through either headphones or loudspeakers. Figure 3 summarises these four core attributes, which are of paramount importance in the creation of truly immersive auditory experiences.

Conclusion
In this study, 78 scholarly articles were examined to create a conceptual framework for the field of immersive acoustic auralisation. By combining insights from a wide variety of sources, a set of key factors essential for designing immersive acoustic virtual environments was identified. These factors provide guidance for future researchers and developers in this area. Nonetheless, it is important to recognize that this paper represents the initial phase of an ongoing research journey. The identified factors establish a fundamental base for further exploration, requiring in-depth investigations that involve on-site measurements of psychological and physical aspects. The combined work of thoroughly reviewing existing literature and conducting practical measurements has the capability to reveal subtle insights, ultimately advancing the understanding and real-world application of immersive acoustic experiences.

Figure 1. Screening and review process flowchart. Records excluded at abstract screening (n = 763); exclusion criteria: a) irrelevant to area of study; b) unrelated keywords; c) did not describe VR implementation in acoustic research; d) did not specify research method or process to assess acoustic VR; e) did not mention outcome. Exclusion criteria at full-text eligibility: a) did not describe VR computing devices and components; b) lacked adequate assessment and well-structured methods; c) did not evaluate the effectiveness of VR implementation in acoustic study.
Noise, Vibration and Comfort (NVC 2023), Journal of Physics: Conference Series 2721 (2024) 012015, IOP Publishing, doi:10.1088/1742-6596/2721/1/012015

Figure 2. Sound reproduction method through headphones and speakers.

Table 1. Sound sources for acoustic virtual reality simulation.