Acoustic Volume Rendering for Neural Impulse Response Fields

1University of Pennsylvania   2University of Washington  

NeurIPS'24 Spotlight

TL;DR: We propose acoustic volume rendering for impulse response synthesis. By incorporating wave propagation principles into the volume rendering, our method ensures multi-pose consistency of the acoustic signal and models acoustic propagation more faithfully.

Abstract

As technologies like virtual reality continue to advance, realistic synthesis of spatial sound has become crucial for creating immersive experiences. Synthesizing the sound received at any position relies on estimating the impulse response (IR), which describes how sound propagates in a scene. While various learning-based approaches have been proposed, their generalization to unseen poses remains unsatisfactory. In this paper, we present Acoustic Volume Rendering (AVR) for impulse response rendering. AVR enables the construction of an impulse response field that follows wave propagation principles and achieves state-of-the-art performance in synthesizing impulse responses for unseen poses. We introduce frequency-domain signal rendering and spherical integration specifically for acoustic volume rendering. Experiments show that AVR surpasses current leading methods by a substantial margin. Additionally, we develop an acoustic simulation platform, AcoustiX, which simulates more realistic impulse responses than existing simulators.
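A core component is rendering signals in the frequency domain, where a propagation delay becomes a per-frequency phase shift and can therefore take continuous, sub-sample values. Below is a minimal, self-contained sketch of this idea (generic signal processing, not the paper's implementation):

```python
import numpy as np

def delay_in_frequency_domain(signal: np.ndarray, delay: float, fs: float) -> np.ndarray:
    """Delay a 1-D signal sampled at `fs` Hz by `delay` seconds.

    In the frequency domain, a time delay tau is the phase factor
    e^{-j 2 pi f tau}, so the delay is not restricted to integer samples.
    """
    spectrum = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    spectrum *= np.exp(-2j * np.pi * freqs * delay)  # apply the phase shift
    return np.fft.irfft(spectrum, n=len(signal))
```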

Examples of Rendered Audio


Impulse responses rendered by our method on the Real Acoustic Field dataset. Headphones are strongly recommended.





Zero-Shot Spatial Binaural Audio Rendering


Our model renders binaural audio zero-shot by simply synthesizing the sound heard at the left and right ears separately.
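The sketch below shows one way to assemble such a binaural signal; `render_ir` is a hypothetical interface standing in for the trained impulse response field, not the released API:

```python
import numpy as np
from scipy.signal import fftconvolve

def render_binaural(render_ir, head_pos, interaural_axis, dry_audio, half_spacing=0.09):
    """Render binaural audio by querying the IR field once per ear.

    `render_ir(pos)` (assumed interface) returns the impulse response at a
    3-D listener position; each ear's IR is convolved with the dry source.
    """
    axis = interaural_axis / np.linalg.norm(interaural_axis)
    ir_left = render_ir(head_pos - half_spacing * axis)   # left ear position
    ir_right = render_ir(head_pos + half_spacing * axis)  # right ear position
    left = fftconvolve(dry_audio, ir_left)
    right = fftconvolve(dry_audio, ir_right)
    return np.stack([left, right], axis=0)  # (2, T) stereo signal
```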




Overview


Left: Task illustration. From observations of the sound emitted by a speaker, our model constructs an impulse response field that can synthesize the signal received at any listener position.
Right: Waveform visualization. We transform the signal into the frequency domain and visualize the phase and amplitude distributions at a specific wavelength. Our method predicts the correct spatial signal distributions.
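For reference, a minimal sketch of this visualization, assuming impulse responses `irs` are available on an (H, W) grid of listener positions sampled at `fs` Hz:

```python
import numpy as np
import matplotlib.pyplot as plt

def plot_field_at_frequency(irs: np.ndarray, fs: float, target_freq: float):
    """Show phase and amplitude of the complex field at one frequency.

    `irs` has shape (H, W, T): one impulse response per grid position.
    """
    spectra = np.fft.rfft(irs, axis=-1)                 # (H, W, F)
    freqs = np.fft.rfftfreq(irs.shape[-1], d=1.0 / fs)
    k = np.argmin(np.abs(freqs - target_freq))          # nearest frequency bin
    field = spectra[..., k]                             # complex field at that wavelength
    fig, (ax1, ax2) = plt.subplots(1, 2)
    ax1.imshow(np.angle(field), cmap="twilight"); ax1.set_title("Phase")
    ax2.imshow(np.abs(field)); ax2.set_title("Amplitude")
    plt.show()
```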



Method


Rendering pipeline. We sample points along each ray shot from the microphone and query the network to obtain signals and densities. A time delay is applied to each point's signal to account for wave propagation. We then combine the delayed signals and densities through acoustic volume rendering along each ray to obtain a directional signal, and finally integrate over the sphere, weighting each direction by the microphone's gain pattern, to obtain the rendered impulse response.
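A high-level sketch of this pipeline is given below. The interfaces are assumptions for illustration: `network(points, direction)` returns per-point frequency-domain signals and densities, and `gain(direction)` is the microphone's gain pattern; neither is the released code.

```python
import numpy as np

def render_impulse_response(network, gain, mic_pos, sphere_dirs, ts, freqs, c=343.0):
    """Acoustic volume rendering sketch (assumed interfaces, see above).

    For each ray direction, sample points at travel times `ts`, delay each
    point's spectrum by its propagation time, alpha-composite along the ray,
    then average over sphere directions weighted by the gain pattern.
    """
    total = np.zeros(len(freqs), dtype=complex)
    dt = ts[1] - ts[0]
    for d in sphere_dirs:                                   # rays shot from the microphone
        points = mic_pos + np.outer(ts * c, d)              # sample points along the ray
        signals, sigma = network(points, d)                 # (N, F) spectra, (N,) densities
        delays = np.exp(-2j * np.pi * np.outer(ts, freqs))  # time delay as phase shift
        alpha = 1.0 - np.exp(-sigma * c * dt)               # opacity of each segment
        trans = np.cumprod(np.concatenate([[1.0], 1.0 - alpha[:-1]]))  # transmittance
        weights = alpha * trans                             # volume rendering weights
        total += gain(d) * (weights[:, None] * signals * delays).sum(axis=0)
    return np.fft.irfft(total / len(sphere_dirs))           # time-domain impulse response
```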




More Qualitative Results


Visualization of spatial signal distributions




Synthesized impulse responses across different methods




BibTeX