Conference paper
Cognitive components of speech at different time scales
Cognitive component analysis (COCA) is defined as unsupervised grouping of data that leads to a group structure well aligned with the one resulting from human cognitive activity. Here we focus on speech at different time scales, looking for possible hidden ‘cognitive structure’. Statistical regularities have previously been revealed at multiple time scales, corresponding to phoneme, gender, height and speaker identity.
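To make "well aligned" concrete, one could compare the groups found without supervision against human-assigned labels. The sketch below uses cluster purity, a standard agreement score chosen here purely for illustration. It is an assumption, not necessarily the measure used in the paper. The function name `purity` and the toy labels are hypothetical.

```python
from collections import Counter


def purity(cluster_ids, human_labels):
    """Fraction of items whose label matches the majority human label of their cluster.

    Illustrative alignment score (assumed, not taken from the paper):
    1.0 means every unsupervised cluster contains a single human-defined class.
    """
    if len(cluster_ids) != len(human_labels):
        raise ValueError("cluster_ids and human_labels must have equal length")
    if not cluster_ids:
        raise ValueError("need at least one item")
    # Group the human labels by the cluster each item was assigned to.
    by_cluster = {}
    for c, y in zip(cluster_ids, human_labels):
        by_cluster.setdefault(c, []).append(y)
    # For each cluster, count the items carrying its most common human label.
    majority_total = sum(
        Counter(labels).most_common(1)[0][1] for labels in by_cluster.values()
    )
    return majority_total / len(cluster_ids)


# Toy example: hypothetical cluster assignments vs. hypothetical gender labels.
clusters = [0, 0, 0, 1, 1, 1]
gender = ["f", "f", "m", "m", "m", "m"]
print(purity(clusters, gender))  # 5 of the 6 items agree with their cluster's majority label
```

A purity near 1.0 would indicate that the unsupervised grouping largely reproduces the human categorization. Purity rewards many small clusters, so in practice it is usually read alongside the number of clusters.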
Here we show that the same simple unsupervised learning algorithm can detect these cues. Our basic features are 25-dimensional short-time Mel-frequency weighted cepstral coefficients, assumed to model the basic representation of the human auditory system. These basic features are aggregated in time to obtain features at longer time scales.
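The abstract does not state how the temporal aggregation is done. One simple option, assumed here for illustration, is to stack a fixed number of consecutive 25-dimensional frames into a single longer vector. Larger stacks then correspond to longer time scales. The function `stack_frames`, the window length, and the use of non-overlapping windows are all hypothetical choices.

```python
def stack_frames(frames, n):
    """Concatenate n consecutive short-time feature vectors into one longer vector.

    frames: list of equal-length lists (e.g. 25 cepstral coefficients per frame).
    Uses non-overlapping windows (an assumption); a trailing partial window is dropped.
    """
    if n < 1:
        raise ValueError("n must be >= 1")
    stacked = []
    for start in range(0, len(frames) - n + 1, n):
        vec = []
        for frame in frames[start:start + n]:
            vec.extend(frame)
        stacked.append(vec)
    return stacked


# Toy input: 10 frames of 25 dummy coefficients each (not real cepstral features).
frames = [[float(t)] * 25 for t in range(10)]
longer = stack_frames(frames, 4)
print(len(longer), len(longer[0]))  # 2 windows, each 4 * 25 = 100 values
```

Stacking keeps the temporal order inside each window. Averaging frames, another common aggregation, would instead discard that order.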
Simple energy-based filtering is used to achieve a sparse representation. Our hypothesis is ecological in nature: we hypothesize that features that are essentially independent in a reasonable ensemble can be efficiently coded using a sparse independent component representation. The representations obtained by supervised learning (invoking cognitive activity) and by unsupervised learning (statistical regularities) are indeed shown to be very similar, lending additional support to our cognitive component hypothesis.
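A minimal sketch of energy-based filtering follows, under two assumptions. First, "energy" is taken to be the sum of squared entries of each aggregated feature vector. Second, only a fixed fraction of the highest-energy vectors is kept. The unit being filtered, the kept fraction, and the name `energy_filter` are illustrative choices, not details confirmed by the abstract.

```python
def energy_filter(vectors, keep_fraction):
    """Keep the keep_fraction of vectors with the largest energy (sum of squares).

    Returns the kept vectors in their original order. Energy definition and
    fraction-based threshold are assumptions made for illustration.
    """
    if not 0.0 < keep_fraction <= 1.0:
        raise ValueError("keep_fraction must be in (0, 1]")
    energies = [sum(x * x for x in v) for v in vectors]
    n_keep = max(1, round(keep_fraction * len(vectors)))
    # Indices of the n_keep highest-energy vectors.
    top = sorted(range(len(vectors)), key=lambda i: energies[i], reverse=True)[:n_keep]
    keep = set(top)
    return [v for i, v in enumerate(vectors) if i in keep]


vectors = [[0.1, 0.0], [3.0, 4.0], [0.0, 0.2], [1.0, 1.0]]
print(energy_filter(vectors, 0.5))  # [[3.0, 4.0], [1.0, 1.0]]
```

Discarding low-energy vectors removes near-silent segments. The remaining points then tend to spread out along a few directions, which is the kind of sparse structure an independent component representation can exploit.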
Field | Value |
---|---|
Language: | English |
Year: | 2007 |
Pages: | 983-988 |
Proceedings: | 29th Annual Conference of the Cognitive Science Society |
ISBN: | 097683183X and 9780976831839 |
Types: | Conference paper |
ORCIDs: | Hansen, Lars Kai |