Towards Immersive Environments: Ambisonics-driven Room Acoustic Modeling and Spatial Audio Capture
Abstract
Spatial audio technology has gained significant importance across fields such as extended reality, gaming, telecommunications, and immersive media, where delivering realistic auditory experiences is crucial. Capturing and reproducing authentic acoustic environments for these applications typically requires extensive, resource-intensive microphone measurements. This thesis addresses the need for more efficient and scalable methods by developing solutions that balance precision with practicality, making high-quality spatial audio accessible to a wider range of applications. In line with evolving spatial audio technologies, we focus the research on three core challenges: (i) achieving comprehensive room acoustic characterization with minimal measurements, (ii) enabling versatile auditory experiences beyond the measurement locations via Room Impulse Response (RIR) extrapolation, and (iii) facilitating spatial audio capture using Head-Worn Devices (HwDs). All of our methods are grounded in solutions of the acoustic wave equation expressed in spherical harmonic basis functions.
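As a small illustration of what "spherical harmonic basis functions" means in ambisonic practice, the sketch below encodes a mono plane wave into first-order ambisonics. It assumes the common ACN channel order and SN3D normalization; the function names are hypothetical, and the thesis itself works at higher orders than the first order shown here.

```python
import math


def sh_first_order_sn3d(azimuth, elevation):
    """Real first-order spherical harmonics in ACN order (W, Y, Z, X), SN3D normalization.

    Angles are in radians; azimuth is counter-clockwise from the front (x-axis),
    elevation is measured upward from the horizontal plane.
    """
    ce = math.cos(elevation)
    return [
        1.0,                     # ACN 0: W, omnidirectional
        math.sin(azimuth) * ce,  # ACN 1: Y, left-right dipole
        math.sin(elevation),     # ACN 2: Z, up-down dipole
        math.cos(azimuth) * ce,  # ACN 3: X, front-back dipole
    ]


def encode_plane_wave(signal, azimuth, elevation):
    """Encode a mono signal arriving from one direction into four ambisonic channels."""
    gains = sh_first_order_sn3d(azimuth, elevation)
    # Each channel is the signal weighted by one basis function evaluated at the direction.
    return [[g * s for s in signal] for g in gains]
```

For a frontal source, `encode_plane_wave([1.0, 2.0], 0.0, 0.0)` yields `[[1.0, 2.0], [0.0, 0.0], [0.0, 0.0], [1.0, 2.0]]`: only the omnidirectional and front-back channels carry signal.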
A key contribution of this thesis is a room acoustic analyzer based on higher-order eigenbeams (ambisonics). The analyzer provides detailed insight into the directional characteristics of early reflections and late reverberation, with an emphasis on their frequency-dependent behavior, and can be adapted to various microphone arrays and measurement setups capable of ambisonic capture. For high-precision 3D analysis, we demonstrate its use with a Spherical Microphone Array (SMA) in comprehensive studies conducted in a small laboratory room and an empty classroom. The analyzer's scalability is further shown in a comparative study of a first-order ambisonic array and a higher-order SMA in a recording studio with variable acoustics. Additionally, we introduce a novel metric, termed the "directivity time-span", as a more accurate alternative to traditional estimates of the mixing or transition time, improving the precision of room acoustic analysis.
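The kind of time-windowed directional analysis described above can be illustrated, in a much simplified form, with the pseudo-intensity vector computed from first-order ambisonic channels. This is a swapped-in, widely used first-order technique, not the higher-order eigenbeam analyzer of the thesis, and it does not implement the directivity time-span metric. It assumes ACN/SN3D channels and non-overlapping analysis windows; the function name is hypothetical.

```python
import math


def pseudo_intensity_doa(w, y, z, x, win):
    """Per-window direction of arrival (degrees) from first-order ACN/SN3D channels.

    The pseudo-intensity vector is the windowed sum of the omnidirectional
    channel W times the dipole channels (X, Y, Z). With the SN3D encoding
    convention, a single plane wave yields a vector pointing toward its source.
    Returns one (azimuth_deg, elevation_deg) pair per full window.
    """
    doas = []
    for start in range(0, len(w) - win + 1, win):
        seg = range(start, start + win)
        ix = sum(w[n] * x[n] for n in seg)
        iy = sum(w[n] * y[n] for n in seg)
        iz = sum(w[n] * z[n] for n in seg)
        az = math.degrees(math.atan2(iy, ix))
        el = math.degrees(math.atan2(iz, math.hypot(ix, iy)))
        doas.append((az, el))
    return doas
```

Applied to an RIR, short windows around the direct sound and early reflections give stable directions, whereas in the late, diffuse part the estimates scatter; quantifying that transition is the kind of question the thesis's analyzer addresses with higher-order data.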
Using the reflection characteristics learned by the proposed analyzer, we present a method for extrapolating RIRs from a single SMA measurement. Our reconstruction keeps early reflections and late reverberation coherent and preserves perceptually critical objective parameters, even as the extrapolation point moves farther from the original measurement location.
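The geometric core of moving a single early reflection to a new listening position can be sketched as follows, assuming the reflection behaves as a specular image source whose direction and delay were estimated at the measurement point. This is a generic illustration, not the thesis's reconstruction method, which additionally keeps the late reverberation coherent; the function name and the 343 m/s speed of sound are assumptions.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value


def extrapolate_reflection(p0, doa, delay, p1, c=SPEED_OF_SOUND):
    """Move one detected reflection from measurement point p0 to listener point p1.

    Assumes the reflection comes from an image source located at
    p0 + c * delay * doa, where doa is a unit vector toward the arrival direction.
    Returns (new_delay_s, new_doa_unit_vector, amplitude_scale); the amplitude
    scale follows 1/r spherical spreading.
    """
    d0 = c * delay
    image = [p + d0 * u for p, u in zip(p0, doa)]  # estimated image-source position
    vec = [i - q for i, q in zip(image, p1)]       # listener-to-image vector
    d1 = math.sqrt(sum(v * v for v in vec))
    new_doa = [v / d1 for v in vec]
    return d1 / c, new_doa, d0 / d1
```

For example, a frontal reflection arriving after 10 ms (about 3.43 m) and a listener who steps 1 m toward it gives a new path length of about 2.43 m, so the reflection arrives earlier and its amplitude rises by a factor of about 3.43 / 2.43.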
To further advance spatial audio capture, we propose a method for estimating perceptually accurate higher-order ambisonics with HwDs. The method overcomes the challenges posed by the arbitrary geometry and complex scattering effects of HwDs by leveraging the inherent diversity of wearable-device-related transfer functions, optimized with a magnitude least-squares approach. Simulation results and listening tests confirm that the binaural audio produced by this method is on par with that obtained from traditional SMAs, establishing HwDs as a viable option for seamless spatial audio capture and reproduction in extended reality applications.
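Magnitude least squares (MagLS) fits only the magnitudes of a linear model to a target, leaving the phase free, which matters at high frequencies where phase is perceptually less relevant. The sketch below is a generic alternating form of MagLS at a single frequency: fix the phase of the current model output, solve an ordinary regularized least-squares problem, and repeat. The matrix `A` (for instance, hypothetical transfer functions with one row per direction and one column per weight), the zero-phase start, and all function names are assumptions for illustration, not the thesis's formulation.

```python
import cmath


def solve(M, v):
    """Solve the square complex system M x = v by Gaussian elimination with partial pivoting."""
    n = len(v)
    a = [row[:] + [v[i]] for i, row in enumerate(M)]  # augmented copy; inputs untouched
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        for r in range(col + 1, n):
            f = a[r][col] / a[col][col]
            for c in range(col, n + 1):
                a[r][c] -= f * a[col][c]
    x = [0j] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = a[r][n] - sum(a[r][c] * x[c] for c in range(r + 1, n))
        x[r] = s / a[r][r]
    return x


def regularized_ls(A, t, lam):
    """Return w minimizing ||A w - t||^2 + lam * ||w||^2 via the normal equations."""
    rows, m = len(A), len(A[0])
    # (A^H A + lam I) w = A^H t
    AhA = [[sum(A[k][i].conjugate() * A[k][j] for k in range(rows)) + (lam if i == j else 0.0)
            for j in range(m)] for i in range(m)]
    Aht = [sum(A[k][i].conjugate() * t[k] for k in range(rows)) for i in range(m)]
    return solve(AhA, Aht)


def magls(A, target_mag, lam=1e-6, iters=50):
    """Fit |A w| to target_mag by alternating phase updates and least-squares solves."""
    phase = [0.0] * len(A)  # assumed initial phase guess
    w = None
    for _ in range(iters):
        # Complex target: desired magnitude with the phase of the current model output.
        t = [mag * cmath.exp(1j * ph) for mag, ph in zip(target_mag, phase)]
        w = regularized_ls(A, t, lam)
        y = [sum(a * wi for a, wi in zip(row, w)) for row in A]
        phase = [cmath.phase(v) for v in y]
    return w
```

With `A = [[1 + 0j], [1j]]` and unit target magnitudes, plain least squares against a zero-phase target reaches only about 0.707 in magnitude on both rows, whereas the MagLS iteration adjusts the phase and reaches unit magnitude on both.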
The contributions in this thesis collectively pave the way for more sophisticated and adaptive acoustic modeling, supporting the development of highly immersive, personalized, and real-time spatial audio solutions.