Research

P-HRTF: Efficient Personalized HRTF Computation for High-Fidelity Spatial Sound

Published in Proceedings of the 2014 IEEE International Symposium on Mixed and Augmented Reality (ISMAR), pages 53-61 

Accurate rendering of 3D spatial audio for interactive virtual auditory displays requires the use of personalized head-related transfer functions (HRTFs). We present a new approach to compute personalized HRTFs for any individual by combining state-of-the-art image-based 3D modeling with an efficient numerical simulation pipeline. Our 3D modeling framework enables capture of the listener's head and torso using consumer-grade digital cameras to estimate a high-resolution, non-parametric surface representation of the head, including the extended vicinity of the listener's ear. We leverage sparse structure-from-motion and dense surface reconstruction techniques to generate a 3D mesh. This mesh is used as input to a numerical sound propagation solver, which uses acoustic reciprocity along with the Kirchhoff surface integral representation to efficiently compute the personalized HRTF of an individual. The overall computation takes tens of minutes on a multi-core desktop machine. We have used our approach to compute the personalized HRTFs of a few individuals and present a preliminary evaluation. To the best of our knowledge, this is the first commodity technique that can be used to compute personalized HRTFs in a lab or home setting.


Binaural Audio Processing: Application in Localization and Model for Synthesis

Human hearing is binaural (two ears). The field of Binaural Audio Processing focuses on sound processing and perception from this perspective. It finds applications in diverse areas, including perception, 3D sound, and virtual reality. 

For my final-year project at IITB, I focused on applications in perception and 3D sound, implementing systems that performed Localization and Binaural Audio Synthesis (Audio Spatialization).

Localization

Humans are capable of Sound Localization, that is, finding the relative location of a sound source using only auditory information. Much research in this area focuses on building systems that can localize sounds as well. Using one such technique described in Duda and Chau's paper, I implemented a system that looks for the monaural and binaural features introduced into a sound's spectrum by the Head-Related Transfer Function (HRTF) in order to estimate the sound source's location. The KEMAR HRTF data measured at the MIT Media Lab was used for this purpose.
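As a simplified illustration of the binaural side of this idea (not the Duda and Chau algorithm itself, which also exploits monaural spectral cues), the sketch below estimates the interaural time difference (ITD) by cross-correlating the two ear signals and then inverts the Woodworth spherical-head formula to recover an azimuth. The head radius, sample rate, and synthetic test signal are assumed values for the sketch:

```python
import numpy as np

def estimate_itd(left, right, fs):
    """Estimate the interaural time difference (seconds) by cross-correlating
    the two ear signals; positive means the left ear received the sound first."""
    corr = np.correlate(left, right, mode="full")
    lag = np.argmax(corr) - (len(right) - 1)  # samples by which left LAGS right
    return -lag / fs

def itd_to_azimuth(itd, head_radius=0.0875, c=343.0):
    """Invert the Woodworth spherical-head relation itd = (a/c)(theta + sin theta)
    numerically to recover the source azimuth theta in radians (+ = left)."""
    thetas = np.linspace(-np.pi / 2, np.pi / 2, 10001)
    curve = (head_radius / c) * (thetas + np.sin(thetas))
    return thetas[np.argmin(np.abs(curve - itd))]

# Synthetic check: a noise burst whose right-ear copy is delayed by 20 samples.
fs = 44100
rng = np.random.default_rng(0)
burst = rng.standard_normal(2048)
delay = 20
left = np.concatenate([burst, np.zeros(delay)])
right = np.concatenate([np.zeros(delay), burst])
itd = estimate_itd(left, right, fs)        # recovers 20 / 44100 s
azimuth = np.degrees(itd_to_azimuth(itd))  # roughly 55 degrees to the left
```

A real localizer would work with HRTF-filtered signals (e.g. the KEMAR set) and combine the ITD estimate with interaural level and spectral cues to resolve front/back and elevation ambiguities, which a single ITD cannot.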

Binaural Synthesis

Another application of Binaural Processing is in the area of binaural synthesis and 3D sound. Since binaural hearing allows us to perceive the relative location of a sound, models can be built that do the same artificially: any sound can be modified to appear as if coming from a chosen direction (binaural synthesis). I implemented a configurable model for binaural synthesis described in Brown and Duda's paper. While convolution with HRTFs is one of the simplest ways to spatialize audio and is regularly used in many applications, including virtual reality, measured HRTF data is bulky and impractical to use. Furthermore, HRTFs vary from person to person, depending on physical attributes such as head size, so a generalized HRTF model or data set isn't as effective as a personalized HRTF. A configurable model can be adjusted to a person's physical attributes, which can lead to better audio spatialization.
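To make the "configurable model" idea concrete, here is a minimal sketch of two ingredients such structural models use: an interaural delay (Woodworth formula) and a one-pole/one-zero head-shadow filter per ear, with the analog prototype H(s) = (alpha*s + beta)/(s + beta), beta = 2c/a, alpha(theta) = 1 + cos(theta). This is not the full implementation from Brown and Duda's paper; the head radius, angle conventions, and bilinear-transform discretization are my assumptions here. The head radius `a` is exactly the kind of per-person parameter that makes the model configurable:

```python
import numpy as np

def head_shadow(x, theta, fs, a=0.0875, c=343.0):
    """Head-shadow filter: analog prototype H(s) = (alpha*s + beta)/(s + beta),
    beta = 2c/a, alpha(theta) = 1 + cos(theta), discretized with the bilinear
    transform. theta is the source angle relative to this ear (0 = ipsilateral)."""
    beta = 2.0 * c / a
    alpha = 1.0 + np.cos(theta)
    k = 2.0 * fs  # bilinear transform: s -> k * (1 - z^-1) / (1 + z^-1)
    a0 = k + beta
    b0 = (alpha * k + beta) / a0
    b1 = (beta - alpha * k) / a0
    a1 = (beta - k) / a0
    y = np.zeros_like(x)
    for n in range(len(x)):
        y[n] = b0 * x[n] + (b1 * x[n - 1] - a1 * y[n - 1] if n else 0.0)
    return y

def spatialize(mono, azimuth_deg, fs, a=0.0875, c=343.0):
    """Place a mono signal at an azimuth (0 = front, +90 = hard left) using an
    interaural delay plus per-ear head shadowing. Valid for |azimuth| <= 90."""
    az = np.radians(azimuth_deg)
    itd = (a / c) * (az + np.sin(az))  # Woodworth ITD; positive: left ear leads
    d = int(round(abs(itd) * fs))      # far-ear delay in whole samples
    left = head_shadow(mono, az - np.pi / 2, fs, a, c)   # left ear at +90 deg
    right = head_shadow(mono, az + np.pi / 2, fs, a, c)  # right ear at -90 deg
    if itd > 0:   # source on the left: the right ear hears it later
        right = np.concatenate([np.zeros(d), right])[:len(mono)]
    elif itd < 0:
        left = np.concatenate([np.zeros(d), left])[:len(mono)]
    return left, right

# Impulse placed 60 degrees to the left: the right channel is delayed and,
# because of head shadowing, carries less high-frequency energy.
fs = 44100
mono = np.zeros(512)
mono[0] = 1.0
left, right = spatialize(mono, 60.0, fs)
```

The full structural model adds further stages (e.g. pinna and torso reflections), each again parameterized by simple anthropometric quantities rather than a measured HRTF database.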