Distributed Speech Recognition for wired/wireless data networks (Web/GPRS) using 2 kbps coding


Technical University of Crete
Electronics and Computer Engineering Department
Network & Telecommunication Division

Perakakis Manolis
(supervised by Digalakis Vassilios)
http://www.telecom.tuc.gr/~perak


Today speech recognition is the dominant technology for applications on voice networks. One example of such an application is information retrieval: a customer calls a specific phone number and asks for information on a particular domain of interest (weather, theaters, air travel tickets, etc.). A recognition server recognizes the spoken utterance, retrieves the requested information, and speaks the result back to the customer. The whole procedure takes place over a voice network (PSTN or GSM) using a phone (fixed or mobile).

This scheme works well but ignores the part of the current communication environment that involves non-voice (data) networks and devices that are more than plain phones, that is, devices with data processing and communication capabilities. Examples of such data networks, wired and wireless, are the Web and GPRS, which will bring data services to GSM mobile networks. The devices range from high-end PDAs to smartphones and GPRS-enabled mobile phones. They are mobile devices on which users need access to information. They usually have a poor user interface, so speech recognition appears to be an ideal technology for information retrieval on them.

The motivation of this work is therefore to adapt speech recognition for information retrieval from voice networks and devices to data networks and devices. Two key ideas are used to achieve this. The first stems from the fact that speech recognition cannot run directly on these devices: it is a complex technology whose CPU and memory requirements they do not meet. The idea is to decouple the front end of a speech recognition system from the rest of the recognition process, using a client-server model. The front-end process is lightweight and can run on most such devices (clients). The acoustic information produced on the client side is transmitted over a data network (Web, GPRS) to a back-end recognizer (server), where the rest of the recognition process takes place. For this model to succeed, however, the data rate of the acoustic information needs to be lowered. This is the second idea, which constitutes the research part of this work. A speech coder designed specifically for speech recognition was implemented with a data rate of only 2 kbps, using vector quantization techniques, which in addition allow recognition to be sped up by using discrete acoustic models.
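
The coding step can be illustrated with a minimal Python sketch of split vector quantization: each feature vector is cut into subvectors, and each subvector is replaced by the index of its nearest codeword. This is only an illustration under stated assumptions, not the coder implemented in this work. The toy codebooks, the split sizes, the 100 frames per second rate, and the 20 bits per frame allocation are all hypothetical values chosen to show how a rate of 2 kbps could arise.

```python
def nearest_codeword(subvector, codebook):
    """Return the index of the codeword closest to subvector (squared Euclidean distance)."""
    best_index, best_dist = 0, float("inf")
    for index, codeword in enumerate(codebook):
        dist = sum((x - c) ** 2 for x, c in zip(subvector, codeword))
        if dist < best_dist:
            best_index, best_dist = index, dist
    return best_index


def encode_frame(features, split_sizes, codebooks):
    """Split one feature vector into subvectors and quantize each with its own codebook."""
    indices, start = [], 0
    for size, codebook in zip(split_sizes, codebooks):
        indices.append(nearest_codeword(features[start:start + size], codebook))
        start += size
    return indices


def bits_per_frame(codebooks):
    """Bits needed to transmit one index per codebook."""
    return sum((len(codebook) - 1).bit_length() for codebook in codebooks)


def bit_rate_bps(bits_in_frame, frames_per_second):
    """Data rate in bits per second for a fixed number of bits per frame."""
    return bits_in_frame * frames_per_second


# Hypothetical toy codebooks: a real coder would train them on speech features.
TOY_CODEBOOKS = [
    [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)],  # 4 codewords, 2 bits
    [(-1.0, -1.0), (1.0, 1.0)],                        # 2 codewords, 1 bit
]

if __name__ == "__main__":
    frame = [0.9, 0.1, 0.8, 1.2]  # made-up 4-dimensional feature vector
    print(encode_frame(frame, [2, 2], TOY_CODEBOOKS))
    print(bits_per_frame(TOY_CODEBOOKS))
    # Assumed allocation: 20 bits per frame at 100 frames per second.
    print(bit_rate_bps(20, 100))
```

Only the codeword indices need to be transmitted, so the rate depends on the codebook sizes and the frame rate rather than on the precision of the original features. On the server side, such indices could be used directly as the symbols of discrete acoustic models.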

We therefore now have the technology needed for the low-cost deployment of speech services on data networks and devices. A speech recognition demo for the Internet domain was presented, using a Java API that implements the front end and the speech coding on the client side and SRI's Decipher speech recognition system on the server side. A similar demo for wireless data networks (GPRS) and PDA/smartphone devices may also become available in the future.
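
To make the client-to-server exchange more concrete, the following Python sketch packs the codeword indices of one frame into a compact byte payload on the client and unpacks them on the server. It is a hypothetical illustration and does not describe the wire format of the demo's Java API. The function names, the five 4-bit fields, and the big-endian layout are assumptions.

```python
def pack_indices(indices, bit_widths):
    """Pack codeword indices into bytes, first index in the most significant bits."""
    value, total_bits = 0, 0
    for index, width in zip(indices, bit_widths):
        if not 0 <= index < (1 << width):
            raise ValueError(f"index {index} does not fit in {width} bits")
        value = (value << width) | index
        total_bits += width
    padding = (-total_bits) % 8  # zero bits added to reach a whole number of bytes
    return (value << padding).to_bytes((total_bits + padding) // 8, "big")


def unpack_indices(payload, bit_widths):
    """Recover the codeword indices from a payload produced by pack_indices."""
    total_bits = sum(bit_widths)
    value = int.from_bytes(payload, "big") >> (len(payload) * 8 - total_bits)
    indices = []
    for width in reversed(bit_widths):
        indices.append(value & ((1 << width) - 1))
        value >>= width
    return indices[::-1]


if __name__ == "__main__":
    widths = [4, 4, 4, 4, 4]          # assumed layout: 20 bits per frame
    frame_indices = [3, 15, 0, 9, 6]  # made-up codeword indices
    payload = pack_indices(frame_indices, widths)
    print(len(payload), unpack_indices(payload, widths))
```

With this layout a single 20-bit frame is padded to 3 bytes, so a client would probably pack several frames into one payload to avoid spending bits on padding.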

Last Modified : 2001-06-22