Distributed Speech Recognition
The purpose of this page is to share some useful information about speech technologies, and in particular about distributed speech recognition.
Speech Recognition technology and applications
Speech recognition is an exciting technology, only recently introduced to a critical mass of people. The most popular applications of speech recognition technology are dictation systems and information retrieval services over the phone. The latter is a very promising application and integrates technologies from the whole "speech spectrum", including language understanding, speech synthesis and more. By using VoiceXML technology, speech information retrieval can exploit the vast amount of information available on the web via voice portals. This is really "cool", since by using a voice network (PSTN/mobile) and a voice device (wire-line/mobile phone) one can have access to Web information! But wait a moment, something is missing! Why not adapt speech recognition to use data networks instead of only voice networks? Data networks are definitely the future! Even voice may soon be transmitted over data networks (Voice over IP). So why not just try this alternative? This is what I do!
Distributed Speech Recognition
The scheme my work uses is called "Distributed Speech Recognition (D.S.R.)" and is based on the idea of decoupling the front-end processing from the rest of the recognition mechanism, using a client-server model over a data network. This way one can run the front-end processing on a lightweight device such as a PDA and have access to speech information retrieval services through a back-end recognition server. The data network is not restricted to be wire-line. For example, GPRS will "data enable" mobile networks, making these services available to GPRS-enabled devices like tomorrow's mobile phones. Hold on! There is more! To make this technology even more appealing to end users, it should be cost effective too! To accomplish this, the stream of acoustic information transmitted from the client to the server needs to be compacted as much as possible. To achieve this we designed a speech coding scheme specifically for recognition purposes and lowered the data rate to just 2 Kbps!
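To get a feel for what 2 Kbps means for a feature stream, here is a back-of-the-envelope sketch. The frame rate and coefficient count below are common textbook values I am assuming for illustration; they are not the actual parameters of the coding scheme described on this page.

```python
# Bit-budget sketch for a DSR front-end at 2 Kbps.
# All numbers are illustrative assumptions, not this page's scheme.

FRAME_RATE_HZ = 100      # assume one feature vector every 10 ms
TARGET_BPS = 2000        # the 2 Kbps target mentioned above

# The whole feature vector must fit in this many bits:
bits_per_frame = TARGET_BPS / FRAME_RATE_HZ   # 20 bits per frame

# With, say, 13 cepstral coefficients per frame, a naive 8-bit scalar
# quantizer would already need 13 * 8 = 104 bits/frame (~10.4 Kbps).
n_coeffs = 13
scalar_bits = n_coeffs * 8
naive_bps = scalar_bits * FRAME_RATE_HZ

print(bits_per_frame)   # 20.0
print(naive_bps)        # 10400
```

The gap between 104 and 20 bits per frame shows why a recognition-specific coder (for example, vector quantization of coefficient sub-vectors with small codebooks) is needed rather than straightforward scalar coding.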
One image is worth a thousand words!
The following pictures highlight the two models I just described for speech information retrieval services. Model 1 uses a voice network and voice devices, while Model 2 uses devices with data processing and communication capabilities, which act as the client part of the client-server scenario. The front-end mechanism located in these devices transmits acoustic information to the server using the 2 Kbps coding scheme we designed.
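The Model 2 exchange can be sketched as a minimal client-server program: the front-end streams small quantized feature frames over TCP and the back-end returns a recognition result. The framing (3 bytes per frame, 20 bits rounded up), port handling, and message format below are my own assumptions for illustration, not the actual protocol used in this work.

```python
# Minimal client-server sketch of Model 2 (hypothetical framing).
import socket
import struct
import threading

FRAME_BYTES = 3   # 20 bits/frame rounded up to whole bytes (assumption)
N_FRAMES = 5

def recv_exact(conn, n):
    """Read exactly n bytes from a socket."""
    buf = b""
    while len(buf) < n:
        chunk = conn.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed early")
        buf += chunk
    return buf

# Back-end recognition server: bind/listen first so the client cannot
# race the server thread.
srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
srv.bind(("127.0.0.1", 0))        # pick any free port
srv.listen(1)
port = srv.getsockname()[1]

def serve():
    conn, _ = srv.accept()
    frames = [recv_exact(conn, FRAME_BYTES) for _ in range(N_FRAMES)]
    # A real server would decode the features and run the recognizer.
    conn.sendall(b"RESULT %d frames" % len(frames))
    conn.close()

t = threading.Thread(target=serve)
t.start()

# Front-end client: send dummy quantized frames, then read the result.
cli = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
cli.connect(("127.0.0.1", port))
for i in range(N_FRAMES):
    cli.sendall(struct.pack(">BH", i, 0))  # 3-byte placeholder frame
result = cli.recv(64)
cli.close()
t.join()
srv.close()
print(result)
```

At 3 bytes per 10 ms frame, this toy framing stays within the 2 Kbps budget (2400 bits/s raw, before any header overhead), which is the whole point of pushing only compact acoustic features, rather than audio, over the data network.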
[ NOTE ]: The three bottom-left images are copyright of Symbian. They are Symbian's reference devices, which roughly correspond to smartphone, handheld and PDA devices.
Prime time for D.S.R.!
So it seems it is now prime time for D.S.R.! Recently there has been noticeable activity in this field, although previously this alternative didn't seem to have much effect on speech researchers. Right now there should be quite a few people and research labs around the world designing D.S.R. applications and services (although most of them are in early stages, I guess). It will be fascinating to watch such near-future attempts, which I hope will turn out successful! As for me, I will try to design and implement such an end-to-end application, hopefully overcoming all the challenges!
To find out more, please go to:
Manolis Perakakis <>
Last modified: Sun Jun 24 19:01:44 GMT-5 2001