Psychoacoustic Effects and Models for Processing and Coding of 3-Dimensional Audio

Document Type
Doctoral Thesis
Dick, Sascha

Immersive audio systems enable the creation and reproduction of sound scenes that envelop listeners with dozens or even hundreds of spatially distributed sound sources. This provides a realistic and immersive listening experience, but also poses technical challenges that require efficient methods for transmission, processing and rendering of immersive sound scenes. The localization accuracy of human hearing is known to be limited and position dependent; for example, horizontal localization is more accurate than vertical localization. Perception-based processing algorithms can exploit these limitations to enable efficient transmission and rendering of immersive audio at high perceptual quality. In this thesis, psychoacoustic effects in spatial hearing were investigated, and psychoacoustic models and perception-based algorithms were developed on that basis. As a foundation for the model development, subjective listening experiments were conducted to investigate localization accuracy, masking effects and perceived localization differences for spatially distributed sound sources. The subjective experiments were complemented by objective analysis of measured localization cue differences. Psychoacoustic models were then derived from the experimental and analytical results. A first major contribution is the introduction of a Perceptual Coordinate System (PCS), which models perceptual differences by warping coordinates so that they represent perceptual properties rather than geometric positions. This makes it possible to operate directly in a perceptual domain for efficient calculation of perceptual differences, manipulation of sound source positions, modeling of loudness distributions, and estimation of spatial masking effects. Based on the developed components, a perceptual spatial distortion metric was obtained to determine the perceptual impact of sound source location changes within the context of an overall sound scene.
As a second major contribution, the novel psychoacoustic models were employed in perception-based processing algorithms for immersive audio. Perception-based clustering of object-based audio substantially reduces the number of individual sound sources in a scene representation by combining audio objects based on the spatial distortion metric. Furthermore, multi-channel perceptual audio coding is enhanced by a novel spatial masking model based on the PCS. Subjective quality evaluations in listening tests and computation time measurements confirm that high perceptual quality is achieved at high computational efficiency, making the methods suitable for interactive, real-time applications.
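
The general idea of clustering audio objects in a warped coordinate domain can be illustrated with a toy sketch. This is not the thesis's PCS or its distortion metric: the warp (a fixed down-weighting of elevation), the factor 0.5, the greedy pairwise merge, the energy-weighted merge position, and all function names are assumptions chosen only for illustration. Azimuth wrap-around is ignored, and energies are assumed to be positive.

```python
import math

# Hypothetical warp factor: elevation differences count half as much as
# azimuth differences. Illustrative assumption, not a value from the thesis.
ELEVATION_WEIGHT = 0.5


def warp(azimuth_deg, elevation_deg):
    """Map a geometric direction to toy 'perceptual' coordinates."""
    return (azimuth_deg, ELEVATION_WEIGHT * elevation_deg)


def perceptual_distance(a, b):
    """Euclidean distance between two directions after warping."""
    (ax, ay), (bx, by) = warp(*a), warp(*b)
    return math.hypot(ax - bx, ay - by)


def cluster_objects(objects, max_clusters):
    """Greedily merge the closest pair (in warped space) until few enough remain.

    objects: list of (azimuth_deg, elevation_deg, energy) tuples, energy > 0.
    A merged object sits at the energy-weighted mean position of its parts.
    """
    clusters = [tuple(o) for o in objects]
    while len(clusters) > max_clusters:
        pairs = (
            (i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        i, j = min(
            pairs,
            key=lambda p: perceptual_distance(clusters[p[0]][:2], clusters[p[1]][:2]),
        )
        (az1, el1, e1), (az2, el2, e2) = clusters[i], clusters[j]
        total = e1 + e2
        merged = (
            (az1 * e1 + az2 * e2) / total,
            (el1 * e1 + el2 * e2) / total,
            total,
        )
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters
```

With three equal-energy objects at (0°, 0°), (0°, 20°) and (15°, 0°), a purely geometric distance would merge the two objects separated by 15° in azimuth. Under this toy warp, the pair separated by 20° in elevation is merged instead, because the elevation difference counts for only 10 units in the warped domain.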
