Nikko Ström (1996): "Continuous Speech Recognition in the WAXHOLM Dialogue System," STL QPSR 4/1996

Continuous Speech Recognition in the WAXHOLM Dialogue System

Nikko Ström

Abstract -- This paper presents the status of the continuous speech recognition engine of the WAXHOLM project. The engine is a software-only system written in portable C code. The design is flexible, and different modes for phonetic pattern matching are available. In particular, artificial neural networks and standard multiple Gaussian mixtures are implemented for phone probability estimation, and for research purposes, a general mode in which the input consists of a phone graph also exists. A lexicon with multiple pronunciations for many words and a class bigram grammar are used. The lexicon and grammar constraints are represented by a lexical graph, optimised for efficient lexical decoding. The decoding is performed in a two-pass search. The first pass is a Viterbi beam search and the second is an A* stack-decoding search. Pruning strategies and memory management in the two passes are discussed in the report. Several different output formats are available. Results can be reported at either the word or the phoneme level, with or without time-alignment information. Multiple hypotheses can be output either as standard N-best lists or in a more compact word-graph format. Continuous speech recognition can be performed on a standard UNIX workstation in real time with a lexicon of about 100 words.
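
To make the first decoding pass more concrete, the following is a minimal sketch of a time-synchronous Viterbi search with beam pruning, written in Python for brevity. It is an illustration of the general technique only, not the WAXHOLM engine's code (which is written in C and searches a lexical graph). The function name `viterbi_beam`, the `beam` parameter, and the two-state toy model are all assumptions made for this example.

```python
import math

def viterbi_beam(frames, trans, start, beam=5.0):
    """Time-synchronous Viterbi search with beam pruning (illustrative sketch).

    frames: list of dicts, one per frame: {state: log P(observation | state)}
    trans:  dict {state: {next_state: log transition probability}}
    start:  dict {state: log initial probability}
    beam:   hypotheses scoring more than `beam` below the best are dropped
    Returns (best_log_score, best_state_path).
    """
    # Active hypotheses after the first frame: state -> (log score, path).
    active = {s: (lp + frames[0][s], [s])
              for s, lp in start.items() if s in frames[0]}
    for frame in frames[1:]:
        if not active:
            raise ValueError("all hypotheses were pruned")
        # Beam pruning: keep only hypotheses close to the current best score.
        best = max(score for score, _ in active.values())
        active = {s: h for s, h in active.items() if h[0] >= best - beam}
        # Viterbi recursion: keep the best-scoring predecessor per state.
        expanded = {}
        for s, (score, path) in active.items():
            for t, lp in trans.get(s, {}).items():
                if t not in frame:
                    continue
                cand = score + lp + frame[t]
                if t not in expanded or cand > expanded[t][0]:
                    expanded[t] = (cand, path + [t])
        active = expanded
    if not active:
        raise ValueError("no surviving hypothesis")
    final = max(active, key=lambda s: active[s][0])
    return active[final]

# Hypothetical two-state toy model (not taken from the paper).
toy_start = {"a": math.log(1.0)}
toy_trans = {"a": {"a": math.log(0.5), "b": math.log(0.5)},
             "b": {"b": math.log(1.0)}}
toy_frames = [{"a": math.log(0.9), "b": math.log(0.1)},
              {"a": math.log(0.2), "b": math.log(0.8)},
              {"a": math.log(0.1), "b": math.log(0.9)}]

toy_score, toy_path = viterbi_beam(toy_frames, toy_trans, toy_start, beam=5.0)
```

A narrower `beam` discards more hypotheses per frame, which reduces computation at the risk of pruning the path that would eventually have scored best. A second pass such as the A* stack decoding mentioned above would typically reuse scores from this pass, but that step is not shown here.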