5. Speech Recognition Software
5.1. Free Software
Much of the free software listed here is available for download at: http://sunsite.uio.no/pub/Linux/sound/apps/speech/
5.1.1. XVoice
XVoice is a dictation/continuous speech recognizer that can be used with a variety of X Window applications. It allows user-defined macros. This is a fine program with a definite future. Once set up, it performs with adequate accuracy.
XVoice requires that you download and install IBM's (free) ViaVoice for Linux (see the Commercial Software section), and ViaVoice must be configured correctly before XVoice will work. Lesstif/Motif (libXm) is also required. Note that because this program interacts with X windows, you must leave X resources open on your machine, so use caution if you run it on a networked or multi-user machine.
This software is primarily for users. An RPM is available.
Homepage: http://www.compapp.dcu.ie/~tdoris/Xvoice/ and http://www.zachary.com/creemer/xvoice.html
Project: http://xvoice.sourceforge.net
Community: http://www.onelist.com/community/xvoice
5.1.2. CVoiceControl/KVoiceControl
CVoiceControl (which stands for Console Voice Control) started its life as KVoiceControl (KDE Voice Control). It is a basic speech recognition system that allows a user to execute Linux commands by using spoken commands. CVoiceControl replaces KVoiceControl.
The software includes a microphone level configuration utility, a vocabulary "model editor" for adding new commands and utterances, and the speech recognition system.
CVoiceControl is an excellent starting point for experienced users looking to get started in ASR. It is not the most user-friendly program, but once it has been trained correctly, it can be very helpful. Be sure to read the documentation while setting it up.
This software is primarily for users.
Homepage: http://www.kiecza.de/daniel/linux/index.html
Documents: http://www.kiecza.de/daniel/linux/cvoicecontrol/index.html
5.1.3. Open Mind Speech
Started in late 1999, Open Mind Speech has changed names several times (it was VoiceControl, then SpeechInput, and then FreeSpeech), and is now part of the "Open Mind Initiative". This is an open source project. Currently it isn't completely operational.
This software is primarily for developers.
Homepage: http://freespeech.sourceforge.net
5.1.4. GVoice
GVoice is an ASR library that uses IBM's (free) ViaVoice SDK to control Gtk/GNOME applications. It includes libraries for initialization, the recognition engine, vocabulary manipulation, and panel control. Development on this has been idle for over a year.
This software is primarily for developers.
Homepage: http://www.cse.ogi.edu/~omega/gnome/gvoice/
5.1.5. ISIP
The Institute for Signal and Information Processing at Mississippi State University has made its speech recognition engine available. The toolkit includes a front-end, a decoder, and a training module. It's a functional toolkit.
This software is primarily for developers.
The toolkit (and more information about ISIP) is available at: http://www.isip.msstate.edu/project/speech/
5.1.6. CMU Sphinx
Sphinx was originally developed at CMU and has recently been released as open source. This is a fairly large package that includes a lot of tools and information. It is still "in development", but includes trainers, recognizers, acoustic models, language models, and some limited documentation.
This software is primarily for developers.
Homepage: http://www.speech.cs.cmu.edu/sphinx/Sphinx.html
Source: http://download.sourceforge.net/cmusphinx/sphinx2-0.1a.tar.gz
5.1.7. Ears
Although Ears isn't fully developed, it is a good starting point for programmers wishing to start in ASR.
This software is primarily for developers.
FTP site: ftp://svr-ftp.eng.cam.ac.uk/comp.speech/recognition/
5.1.8. NICO ANN Toolkit
The NICO Artificial Neural Network toolkit is a flexible back propagation neural network toolkit optimized for speech recognition applications.
This software is primarily for developers.
Homepage: http://www.speech.kth.se/NICO/index.html
5.1.9. Myers' Hidden Markov Model Software
This software by Richard Myers implements HMM algorithms in C++. It serves as an example and learning tool for the HMMs described in the L. Rabiner book "Fundamentals of Speech Recognition".
This software is primarily for developers.
Information is available at: http://www.itl.atr.co.jp/comp.speech/Section6/Recognition/myers.hmm.html
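For readers new to HMMs, the following is a minimal sketch of one of the standard algorithms for them, the forward algorithm, which computes the probability of an observation sequence given a discrete HMM. It is written in Python for brevity and is not taken from Myers' C++ code. The function name and the toy two-state model are assumptions made up for illustration.

```python
def forward(obs, start_p, trans_p, emit_p):
    """Return P(obs | model) for a discrete HMM using the forward algorithm.

    obs      -- list of observation symbol indices
    start_p  -- start_p[i] is the probability of starting in state i
    trans_p  -- trans_p[i][j] is the probability of moving from state i to j
    emit_p   -- emit_p[i][k] is the probability of state i emitting symbol k
    """
    if not obs:
        raise ValueError("observation sequence must not be empty")
    n_states = len(start_p)
    # Initialization: alpha(i) = start_p[i] * emit_p[i][first symbol]
    alpha = [start_p[i] * emit_p[i][obs[0]] for i in range(n_states)]
    # Induction: fold in one observation at a time
    for symbol in obs[1:]:
        alpha = [
            sum(alpha[i] * trans_p[i][j] for i in range(n_states))
            * emit_p[j][symbol]
            for j in range(n_states)
        ]
    # Termination: sum over all possible final states
    return sum(alpha)


# Hypothetical toy model: 2 states, 2 observation symbols (0 and 1)
START = [0.6, 0.4]
TRANS = [[0.7, 0.3], [0.4, 0.6]]
EMIT = [[0.5, 0.5], [0.1, 0.9]]

if __name__ == "__main__":
    print(forward([0, 1], START, TRANS, EMIT))
```

In a recognizer, one such model is typically trained per word or phone, and the model that assigns the highest probability to the incoming observations wins. Real systems work with log probabilities or scaling to avoid numerical underflow on long sequences, which this sketch omits.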
5.1.10. Jialong He's Speech Recognition Research Tool
Although not originally written for Linux, this research tool can be compiled on Linux. It contains three different types of recognizers: Dynamic Time Warping (DTW), Dynamic Hidden Markov Model, and Continuous Density Hidden Markov Model. It is intended for research and development use, as it is not a fully functional ASR system. The toolkit contains some very useful tools.
This software is primarily for developers.
More information is available at: http://www.itl.atr.co.jp/comp.speech/Section6/Recognition/jialong.html
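To give a feel for the simplest of these recognizer types, here is a minimal sketch of DTW-based template matching in Python. It is not taken from Jialong He's tool. The function names, the use of plain numbers in place of real acoustic feature vectors, and the sample templates are all assumptions made for illustration.

```python
def dtw_distance(a, b):
    """Return the dynamic time warping distance between two number sequences."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] is the minimal accumulated cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three allowed predecessor alignments
            cost[i][j] = local + min(
                cost[i - 1][j],      # a advances, b repeats
                cost[i][j - 1],      # b advances, a repeats
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


def recognize(utterance, templates):
    """Return the label of the stored template closest to the utterance."""
    return min(templates, key=lambda label: dtw_distance(utterance, templates[label]))


if __name__ == "__main__":
    # Hypothetical one-dimensional "feature" sequences for two words
    templates = {"yes": [1, 3, 4, 3, 1], "no": [5, 5, 2, 1, 1]}
    print(recognize([1, 1, 3, 4, 4, 3, 1], templates))
```

Because the alignment may repeat elements of either sequence, the same word spoken faster or slower still lands close to its template, which is why DTW suits small-vocabulary, speaker-dependent recognition.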
5.1.11. More Free Software?
If you know of free software that isn't included in the above list, please send me a note at: scook@gear21.com. If you're in the mood, you can also tell me where to get a copy of the software, and any impressions you may have about it. Thanks!
5.2. Commercial Software
5.2.1. IBM ViaVoice
IBM has made good on their promise to support Linux with their series of ViaVoice products for Linux, though the future of their SDKs isn't set in stone (their licensing agreement for developers isn't officially released as of this date - more to come).
Their commercial (non-free) product, IBM ViaVoice Dictation for Linux (available at http://www-4.ibm.com/software/speech/linux/dictation.html), performs very well, but has sizeable system requirements compared to the more basic ASR systems (64M RAM and a 233MHz Pentium). The $59.95US price tag also gets you an Andrea NC-8 microphone. It allows multiple users (I haven't tried it with multiple users, so if anyone has any experience please give me a shout). The package includes documentation (PDF), a trainer, the dictation system, and installation scripts. The latest release also adds support for additional Linux distributions based on 2.2 kernels.
The ASR SDK is available for free, and includes IBM's SMAPI, grammar API, documentation, and a variety of sample programs. The ViaVoice Run Time Kit provides an ASR engine and data files for dictation functions, and user utilities. The ViaVoice Command & Control Run Time Kit includes the ASR engine and data files for command and control functions, and user utilities. The SDK and Kits require 128M RAM and a Linux 2.2 or better kernel.
The SDKs and Kits are available for free at: http://www-4.ibm.com/software/speech/dev/sdk_linux.html
5.2.2. Vocalis Speechware
More information on Vocalis and Vocalis Speechware is available at: http://www.vocalisspeechware.com and http://www.vocalis.com.
5.2.3. Babel Technologies
Babel Technologies has a Linux SDK available called Babear. It is a speaker-independent system based on Hybrid Markov Models and Artificial Neural Networks technology. They also have a variety of products for text-to-speech, speaker verification, and phoneme analysis. More information is available at: http://www.babeltech.com.
5.2.4. SpeechWorks
I didn't see anything on their website that specifically mentioned Linux, but their "OpenSpeech Recognizer" uses VoiceXML, which is an open standard. More information is available at: http://www.speechworks.com.
5.2.5. Nuance
Nuance offers a speech recognition/natural language product (currently Nuance 8.0) for a variety of *nix platforms. It can handle very large vocabularies and uses a unique distributed architecture for scalability and fault tolerance. More information is available at: http://www.nuance.com.
5.2.6. Abbot/AbbotDemo
Abbot is a very large vocabulary, speaker-independent ASR system. It was originally developed by the Connectionist Speech Group at Cambridge University and was later transferred to SoftSound for commercialization. More information is available at: http://www.softsound.com.
AbbotDemo is a demonstration package of Abbot. This demo system has a vocabulary of about 5000 words and uses the connectionist/HMM continuous speech algorithm. This is a demonstration program with no source code.
5.2.7. Entropic
The fine people over at Entropic have been bought out by Micro$oft... Their products and support services have all but disappeared. Their support for HTK and ESPS/waves+ is gone, and their future is in the hands of M$. Their old website at http://www.entropic.com has more information.
K.K. Chin advised me that the original developers of the HTK (the Speech, Vision and Robotics Group at Cambridge) are still providing support for it. There is also a "free" version available at: http://htk.eng.cam.ac.uk. Also note that Microsoft still owns the copyright to the current HTK code...
5.2.8. More Commercial Products
There are rumors of more commercial ASR products becoming available in the near future (including L&H). I talked with a couple of L&H representatives at Comdex 2000 (Vegas) and none of them could give me any information on a Linux release, or even if they planned on releasing any products for Linux. If you have any further information, please send any details to me at scook@gear21.com.