In this report, the speaker-sensitive ANN introduced in Ström (1994b)
is investigated further. The general framework, an ANN for phoneme evaluation
with a set of extra input units that characterize the speaker, is inspired
by the work of Carlson and Glass (1992a,b); it is also used in Paper 5
and discussed in Paper 4.
In this report, the speaker parameters supplied by the extra input units
are automatically extracted, i.e., they have no explicit relation to any
knowledge-based parameters. However, using a novel analysis-by-synthesis
method, the influence of the automatically extracted parameters was visualized
in the formant space. An ensemble of synthetic vowels was generated by
a formant synthesizer driven by an LF voice source (Fant, Liljencrants
and Lin, 1985). The individual vowels of the ensemble were varied only
in F1 and F2 (the first and second formant frequencies). The synthetic
vowels were then fed to the ANN and the vowel classification was recorded.
This makes it possible to draw a map with the phoneme boundaries in the
F1/F2 space for the ANN. By repeating the procedure with different speaker
parameters, the effect of the speaker parameters on the phoneme boundaries
can be studied. It was found that, in agreement with theory, the phoneme
boundaries were lower in frequency for speaker parameters corresponding
to male voices than for those corresponding to female voices. In another
analysis, a correlation was also found between the knowledge-based parameter
fundamental frequency and one of the two automatically generated parameters.
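
The mapping procedure described above can be illustrated with a minimal sketch. It is not the implementation used in the report: the ANN is replaced by a hypothetical nearest-prototype classifier (toy_classify), the synthesis step is skipped so that F1 and F2 are passed to the classifier directly, and the speaker parameters are reduced to a single assumed scale factor (speaker_scale). The vowel prototypes and grid ranges are illustrative values only. What the sketch does show is the structure of the experiment: classify every point of an F1/F2 grid with the speaker setting held fixed, then repeat with a different setting and compare the resulting label maps.

```python
# Toy stand-in for the analysis-by-synthesis mapping. All names and numbers
# here are hypothetical; the report's ANN and synthesizer are not reproduced.

# Rough, illustrative vowel prototypes as (F1, F2) in Hz (assumed values).
PROTOTYPES = {"i": (300.0, 2300.0), "a": (700.0, 1200.0), "u": (320.0, 800.0)}


def toy_classify(f1, f2, speaker_scale):
    """Stand-in for the ANN: nearest prototype after scaling the prototypes.

    speaker_scale plays the role of a speaker parameter: values above 1.0
    shift the prototypes (and hence the boundaries) upward in frequency.
    """
    def dist(label):
        p1, p2 = PROTOTYPES[label]
        return (f1 - p1 * speaker_scale) ** 2 + (f2 - p2 * speaker_scale) ** 2
    return min(PROTOTYPES, key=dist)


def boundary_map(classify, f1_values, f2_values, speaker_scale):
    """Classify every (F1, F2) grid point for one fixed speaker setting."""
    return [[classify(f1, f2, speaker_scale) for f2 in f2_values]
            for f1 in f1_values]


def grid(start, stop, step):
    """Inclusive list of frequencies from start to stop in steps of step."""
    return [float(v) for v in range(start, stop + 1, step)]


f1_values = grid(200, 900, 50)
f2_values = grid(600, 2600, 100)

# Repeat the mapping for two speaker settings and compare the label maps.
low_map = boundary_map(toy_classify, f1_values, f2_values, 1.0)
high_map = boundary_map(toy_classify, f1_values, f2_values, 1.15)

changed = sum(a != b
              for row_a, row_b in zip(low_map, high_map)
              for a, b in zip(row_a, row_b))
print(f"grid points whose label changed: {changed}")
```

Grid points whose label differs between the two maps lie in the region swept by a boundary as the speaker setting changes, which is the effect the report studies with the real ANN and synthesized vowels.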
This report was written during my stay as a guest researcher at ATR, Kyoto,
Japan. As a curiosity I can mention that, as part of my attempt to learn
the Japanese language, it was written on a Macintosh with only Japanese
labels on all buttons and menu items. However, I never quite succeeded
in learning the kanji characters, so sometimes some pretty spectacular
things happened on the screen. This was how I learned to recognize the
"undo" character.
"A Speaker Sensitive
Artificial Neural Network Architecture for Speaker Adaptation," ATR
Technical Report, TR-IT-0116, 1995, ATR, Japan.