
PhD Thesis

Structure Learning in Audio

From

Cognitive Systems, Department of Informatics and Mathematical Modeling, Technical University of Denmark¹

Department of Informatics and Mathematical Modeling, Technical University of Denmark²

Abstract

By having information about the setting a user is in, a computer is able to make decisions proactively to facilitate tasks for the user. Two approaches are taken in this thesis to obtain more information about an audio environment. One approach is audio classification, for which a new method using pitch dynamics is suggested.

The other approach is finding structures in the mixings of multiple sources, based on an assumption of statistical independence of the sources. Three different audio classification tasks have been investigated. The first is classification of audio into three classes (music, noise and speech) using novel features based on pitch dynamics.

Within instrument classification, two different harmonic models have been compared. Finally, voiced/unvoiced segmentation of popular music is done based on mel-frequency cepstral coefficients (MFCCs) and autoregressive (AR) coefficients. The structures in the mixings of multiple sources have also been investigated. A fast and computationally simple approach has been developed that compares recordings and determines whether they are from the same audio environment. It shows very high accuracy and can synchronize recordings made on devices that are not connected to each other.

A more general model based on Independent Component Analysis is proposed. It relies on sequential pruning of the parameters in the mixing matrix, and two versions are presented: one based on a fixed source distribution and one based on a parameterized distribution. The parameterized version has the advantage of modeling both sub- and super-Gaussian source distributions, allowing a much wider use of the method.
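
As an illustration of the sub-/super-Gaussian distinction mentioned above, the sketch below estimates excess kurtosis, which is negative for sub-Gaussian sources, zero for a Gaussian, and positive for super-Gaussian sources. This is not taken from the thesis: the function name `excess_kurtosis`, the choice of example distributions, and the sample size are assumptions made purely for illustration.

```python
import random


def excess_kurtosis(samples):
    """Sample excess kurtosis: fourth standardized moment minus 3."""
    n = len(samples)
    mean = sum(samples) / n
    m2 = sum((x - mean) ** 2 for x in samples) / n  # second central moment
    m4 = sum((x - mean) ** 4 for x in samples) / n  # fourth central moment
    return m4 / (m2 ** 2) - 3.0


# Illustrative data only (hypothetical, not from the thesis).
rng = random.Random(0)
n = 100_000

# Uniform: sub-Gaussian, theoretical excess kurtosis -1.2.
uniform = [rng.uniform(-1.0, 1.0) for _ in range(n)]
# Gaussian: theoretical excess kurtosis 0.
gaussian = [rng.gauss(0.0, 1.0) for _ in range(n)]
# Laplace (difference of two unit exponentials): super-Gaussian,
# theoretical excess kurtosis +3.
laplace = [rng.expovariate(1.0) - rng.expovariate(1.0) for _ in range(n)]

for name, data in [("uniform", uniform), ("gaussian", gaussian), ("laplace", laplace)]:
    print(f"{name:8s} excess kurtosis = {excess_kurtosis(data):+.2f}")
```

A model with a fixed source distribution commits to one sign of the excess kurtosis, whereas a parameterized distribution can, in principle, adapt to either case.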

All methods use a variety of classification models and model selection algorithms, which is a common theme of the thesis.

Language:	English
Publisher:	Technical University of Denmark, DTU Informatics, Building 321
Year:	2009
Series:	Imm-phd-2008-208
Types:	PhD Thesis
