6. Applications

6.1 The WAXHOLM dialogue system

The WAXHOLM human-machine dialogue demonstrator is built on a generic framework for human-machine spoken dialogue under continuous development by the speech group at the Department of Speech, Music and Hearing at KTH. The domain of the WAXHOLM application is boat traffic and tourist information about hotels, camping grounds and restaurants in the Stockholm archipelago. The application database includes timetables for a fleet of some twenty boats from the Waxholm company, which serves about two hundred ports. User input to the system is spoken language exclusively, while the responses from the system include synthetic speech as well as pictures, maps, charts and timetables (see Figure 16).
The ASR module of the system, described in detail in Paper 4, has a domain-dependent vocabulary of about 1000 words. The application is similar in scope to the ATIS domain within the ARPA community, the Voyager system from MIT (Glass et al., 1995), and European systems such as SUNDIAL (Peckham, 1993), Philips's train timetable information system (Aust et al., 1994) and the Danish dialogue project (Dalsgaard and Baekgaard, 1994). Summaries of the WAXHOLM dialogue system and the WAXHOLM project database can be found in Bertenstam et al. (1995a, b), and an early reference is Blomberg et al. (1993).
The demonstration system is now mature enough to be displayed and tested outside the laboratory by completely novice users. One such successful attempt was made at "Tekniska Mässan" (the technology fair) in Älvsjö in October 1996, where visitors with no prior experience of the system were invited to try the demonstrator in a rather noisy environment.
 
Figure 16. Overview of the WAXHOLM demonstrator system. See the main text for details.
 

6.2 An instructional system for teaching spoken dialogue systems technology

Human-machine dialogue systems are large, complex software projects that have traditionally required considerable expertise to design and develop. Recently, however, efforts have been made to make this type of user interface available to developers who are not experts in the area. Central to this development are modular toolkits such as OGI's CSLUsh (Sutton et al., 1996) and MIT's Sapphire (Hetherington and McCandless, 1996). A toolkit in the same spirit is being developed at the Department of Speech, Music and Hearing (Sjölander and Gustafson, 1997). Existing components for speech recognition, speech synthesis, visual speech synthesis and NLP have been extracted and re-designed to fit into a common framework under the Tcl language. Tcl has many shortcomings, but it is convenient for rapid prototyping and development at the system-integration level, and it offers good graphical support through the accompanying Tk widget set. The ASR system developed for the WAXHOLM system and described in the previous sections is the underlying speech recognition module of the toolkit.
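The central idea of such toolkits is that heterogeneous components (recognizer, parser, dialogue manager, synthesizer) are wrapped behind one uniform interface so that an application script can simply chain them. The following is a minimal Python sketch of that modular pattern; the actual toolkit is Tcl-based, and every class, method and string below is illustrative rather than taken from it.

```python
# Hypothetical sketch of the modular-toolkit pattern: every component
# exposes the same process() interface, so the application level can
# wire components together without knowing their internals.

class Module:
    """Common interface shared by all toolkit components."""
    def process(self, data):
        raise NotImplementedError

class Recognizer(Module):
    # Stand-in for the ASR module; a real toolkit would wrap the
    # recognition engine described in the previous sections.
    def process(self, audio):
        return "när går båten till vaxholm"  # recognized word string

class Parser(Module):
    # Toy semantic analysis: extract a destination keyword.
    def process(self, text):
        words = text.split()
        return {"destination": words[-1]} if words else {}

class DialogueManager(Module):
    # Toy dialogue manager: turn the semantic frame into a response.
    def process(self, frame):
        dest = frame.get("destination", "?")
        return f"Searching timetables for boats to {dest}."

def run_pipeline(modules, data):
    # The application level simply chains module outputs to inputs.
    for module in modules:
        data = module.process(data)
    return data

print(run_pipeline([Recognizer(), Parser(), DialogueManager()], b"..."))
```

Because each module only depends on the shared interface, a component can be replaced (say, a different recognizer) without touching the rest of the application, which is what makes such toolkits usable by non-experts.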
The increased availability of the technology makes student courses on the subject possible. A simple but fully functioning instructional dialogue system has been developed using the toolkit for educational purposes (Sjölander and Gustafson, 1997). The system has been used in courses at the MSc level at KTH (the Royal Institute of Technology) and at Linköping University in Sweden. In this environment, students are presented with a simple spoken dialogue application for yellow-pages search on a few selected topics using voice input. The application is accompanied by a development environment that allows the students to interactively follow the processing in the system and modify the different modules even while the system is running. A screen-shot of the system in use is shown in Figure 17.
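Modifying modules in a running system is natural in an interpreted environment like Tcl, where procedures can be redefined on the fly. The same idea can be sketched in Python (all names here are illustrative, not part of the actual instructional system): because the modules share one interface, a component slot in the pipeline can be reassigned without restarting anything.

```python
# Hypothetical illustration of live module replacement: the pipeline
# holds module objects, so a lesson environment can swap one out
# while the application keeps running.

class Echo:
    def process(self, text):
        return text

class Shout:
    def process(self, text):
        return text.upper()

pipeline = [Echo()]
assert pipeline[0].process("hello") == "hello"

pipeline[0] = Shout()   # interactive replacement, no restart needed
assert pipeline[0].process("hello") == "HELLO"
```

In the instructional setting this is what lets students experiment with, for example, an alternative parsing rule and immediately observe its effect on the dialogue.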
 
 
 

Figure 17. A screen-shot of the instructional dialogue system. Upper left: the control window. Right: the dialogue application. Bottom left: the speech recognition module.