9. References

Ahadi S. M. and Woodland P. C. (1995) : "Rapid speaker adaptation using model prediction," Proc. ICASSP 1995, pp. 684-687.

Aust H., Oerder M., Seide F., and Steinbiss V. (1994) : "Experience with the Philips automatic train timetable information system," Proc. of IEEE Workshop on Interactive Voice Technology for Telecommunications Applications (IVTTA94), pp. 67-72.

Bertenstam J., Blomberg M., Carlson R., Elenius K., Granström B., Gustafson J., Hunnicutt S., Högberg J., Lindell R., Neovius L., de Serpa-Leitao A., and Ström N. (1995a) : "The Waxholm application database," Proc. EUROSPEECH '95, Madrid, pp. 833-836.

Bertenstam J., Blomberg M., Carlson R., Elenius K., Granström B., Gustafson J., Hunnicutt S., Högberg J., Lindell R., Neovius L., de Serpa-Leitao A., Nord L., and Ström N. (1995b) : "Spoken dialogue data collection in the Waxholm project," STL-QPSR, KTH, 1/1995, pp. 50-73.

Blomberg M., Carlson R., Elenius K., Granström B., Gustafson J., Hunnicutt S., Lindell R., and Neovius L. (1993) : "An experimental dialogue system: Waxholm," Proc. EUROSPEECH '93, pp. 1867-1870.

Bourlard H. (1995) : "Towards increasing speech recognition error rates," Proc. EUROSPEECH '95, pp. 883-894.

Bourlard H. and Morgan N. (1993) : "Continuous speech recognition by connectionist statistical methods," IEEE Trans. on Neural Networks, Vol. 4(6), pp. 893-909.

Bourlard H. and Wellekens C. J. (1988) : "Links between Markov Models and Multilayer Perceptrons," IEEE Trans. on PAMI, 12(12), pp. 1167-1178.

Brown P. F., Lee C.-H., and Spohrer J. C. (1983) : "Bayesian adaptation in speech recognition," Proc. ICASSP 1983, pp. 761-764.

Carlson R. and Glass J. (1992a) : "Vowel classification based on analysis-by-synthesis," STL-QPSR 4/1992, pp. 17-27, Dept. of Speech Communication and Music Acoustics, KTH, Sweden.

Carlson R. and Glass J. (1992b) : "Vowel classification based on analysis-by-synthesis," Proc. ICSLP 1992, pp. 575-578.

Carlson R. and Hunnicutt S. (1996) : "Generic and domain-specific aspects of the Waxholm NLP and dialog modules," Proc. ICSLP 1996, pp. 677-680.

Chang J. and Glass J. (1997) : "Segmentation and modeling in segment-based recognition," Proc. EUROSPEECH 1997, pp. 1199-1202.

Cohen J. (1996) : "The summers of our discontent," Proc. ICSLP 1996, distributed on CDROM version.

Dalsgaard P. and Baekgaard A. (1994) : "Spoken language dialogue systems," Proc. of Artificial Intelligence, Infix. Presented at the CRIM/FORWISS workshop on Progress and Prospects of Speech Research and Technology, Munich.

Digalakis V. V., Ostendorf M., and Rohlicek J. R. (1992) : "Fast algorithms for phone classification and recognition using segment-based models," IEEE Trans. on Signal Processing, Vol. 40, pp. 2885-2896.

Elenius K. and Takacs G. (1990) : "Acoustic-phonetic recognition of continuous speech by artificial neural networks," STL-QPSR 2-3/1990, pp. 1-44, KTH, Dept. of Speech Communication and Music Acoustics, Sweden.

English T. M. and Boggess L. C. (1992) : "Back-propagation training of a neural network for word spotting," Proc. ICASSP '92, Vol. 2, pp. 357-360.

Fant G., Liljencrants J., and Lin Q. (1985) : "A four-parameter model of glottal flow," STL-QPSR 4/85, pp. 1-13, KTH, Dept. of Speech, Music and Hearing, Sweden.

Gauvain J. L. and Lee C. H. (1994) : "Maximum a posteriori estimation for multivariate Gaussian observations of Markov chains," IEEE Trans. Speech and Audio Processing, Vol. 2(2), pp. 806-814.

Gish H. (1990) : "A probabilistic approach to the understanding and training of neural network classifiers," Proc. ICASSP '90, pp. 1361-1364.

Glass J., Chang J., and McCandless M. (1996) : "A probabilistic framework for feature-based speech recognition," Proc. ICSLP '96, pp. 2277-2280.

Glass J., Flammia G., Goodine D., Phillips M., Polifroni J., Sakai S., Seneff S., and Zue V. (1995) : "Multilingual spoken language understanding in the MIT voyager system," Speech Communication 17/1-2, pp. 1-18.

Hazen T. J. and Glass J. R. (1997) : "A comparison of novel techniques for instantaneous speaker adaptation," Proc. EUROSPEECH 1997, pp. 2047-2050.

Hetherington L. and McCandless M. (1996) : "SAPPHIRE: An extensible speech analysis and recognition tool based on Tcl/Tk," Proc. ICSLP '96, pp. 1942-1945.

Hetherington L., Phillips M., Glass J., and Zue V. (1993) : "A* word network search for continuous speech recognition," Proc. ICASSP '93, pp. 1533-1536.

Hopcroft J. and Ullman J. (1979) : Introduction to automata theory, languages and computation, Addison-Wesley, ISBN 0-201-02988-X.

Huang X. D. and Lee K. F. (1991) : "On speaker-independent, speaker-dependent and speaker-adaptive speech recognition," Proc. ICASSP 1991, pp. 877-880.

Kershaw D. J., Hochberg M. M., and Robinson A. J. (1996) : "Context-dependent classes in a hybrid recurrent network-HMM speech recognition system," In: Advances in Neural Information Processing Systems, Vol. 8, eds: Touretzky D. S., Mozer M. C., and Hasselmo M. E., Morgan Kaufmann.

Ladefoged P. and Broadbent D. E. (1957) : "Information conveyed by vowels," JASA 29(1), pp. 99-104.

Le Cun Y., Denker J. S., and Solla S. A. (1990) : "Optimal brain damage," In: Advances in Neural Information Processing Systems Vol. II, ed: Touretzky D. S., pp. 589-605, San Mateo, California, Morgan Kaufmann.

Leggetter C. J. and Woodland P. C. (1994) : "Speaker adaptation of continuous density HMMs using multivariate linear regression," Proc. ICSLP 1994, pp. 451-454.

Levin E. (1990) : "Word recognition using hidden control neural architecture," Proc. ICASSP '90, Vol. 1, pp. 433-436.

Li K. P., Naylor J. A., and Rossen M. L. (1992) : "A whole word recurrent neural network for keyword spotting," Proc. ICASSP '92, Vol. 2, pp. 81-84.

Mitchell C. D., Harper M. P., and Jamieson L. H. (1996) : "Stochastic observation hidden Markov models," Proc. ICASSP '96, pp. 617-620.

Necioglu B. F., Ostendorf M., and Rohlicek J. R. (1992) : "A Bayesian approach to speaker adaptation for the stochastic segment model," Proc. ICASSP 1992, pp. I-437 - I-440.

Ney H. and Aubert X. (1994) : "A word graph algorithm for large vocabulary, continuous speech recognition," Proc. ICSLP '94, pp. 1355-1358.

Peckham J. (1993) : "A new generation of spoken dialog systems: results and lessons from the SUNDIAL Project," Proc. EUROSPEECH '93, pp. 33-40.

Richard M. D. and Lippmann R. P. (1991) : "Neural network classifiers estimate Bayesian a posteriori probabilities," Neural Computation, Vol. 3, pp. 461-483.

Robinson A. J. (1994) : "An application of recurrent nets to phone probability estimation," IEEE Trans. on Neural Networks, Vol. 5(2), pp. 298-305.

Robinson T. and Fallside F. (1991) : "A recurrent error propagation network speech recognition system," Computer Speech & Language 5:3, pp. 259-274.

Schiel F. (1993) : "A new approach to speaker adaptation by modelling pronunciation in automatic speech recognition," Speech Communication Vol. 13, pp. 281-286.

Sietsma J. and Dow R. J. F. (1991) : "Creating artificial neural networks that generalize," Neural Networks, 4(1), pp. 67-69.

Sjölander K. and Gustafson J. (1997) : "An integrated system for teaching spoken dialogue systems technology," Proc. EUROSPEECH '97, pp. 1927-1930.

Soong F. K. and Huang E.-F. (1991) : "A tree-trellis based fast search for finding the N best sentence hypotheses in continuous speech recognition," Proc. ICASSP '91, pp. 713-716.

Strange W. (1989) : "Evolving theories of vowel perception," JASA 85(5), pp. 2081-2087.

Ström N. (1992) : "Development of a recurrent time-delay neural net speech recognition system," STL-QPSR 2-3/1992, pp. 1-44, KTH, Dept. of Speech Communication and Music Acoustics, Sweden.

Ström N. (1994a) : "Optimising the lexical representation to improve A* lexical search," STL-QPSR 2-3/1994, pp. 113-124.

Ström N. (1994b) : "Experiments with a new algorithm for fast speaker adaptation," Proc. ICSLP 1994, pp. 459-462.

Ström N. (1995) : "Generation and minimisation of word graphs in continuous speech recognition," Proc. Workshop on Automatic Speech Recognition, pp. 125-126, Snowbird, Utah.

Ström N. (1997) : "A tonotopic artificial neural network architecture for phoneme probability estimation," To appear in Proc. of the 1997 IEEE Workshop on Speech Recognition and Understanding, Santa Barbara, CA.

Sutton S., de Villiers J., Schalkwyk J., Fanty M., Novick D., and Cole R. (1996) : "Technical specification of the CSLU toolkit," Tech. Report No. CSLU-013096, CSLU, Dept. of Computer Science and Engineering, Oregon Graduate Institute of Science and Technology, Portland, OR.

Tebelskis J. and Waibel A. (1990) : "Large vocabulary recognition using linked predictive neural networks," Proc. ICASSP '90, Vol. 1, pp. 437-440.

Verbrugge R. R. and Strange W. (1976) : "What information enables a listener to map a talker's vowel space," JASA 60(1), pp. 198-212.

Viterbi A. J. (1967) : "Error bounds for convolutional codes and an asymptotically optimum decoding algorithm," IEEE Trans. Information Theory, Vol. IT-13, pp. 260-269.

Waibel A., Hanazawa T., Hinton G., Shikano K., and Lang K. (1987) : "Phoneme recognition using time-delay neural networks," ATR Technical Report TR-006, ATR, Japan.