Speak to me, Linux

Voice control is the next step in human interaction with computers. Voice recognition, and its flip side, speech synthesis, can help you streamline your day-to-day work and organize your Linux desktop in a better way.

To begin conversing with your Linux desktop, download the Sphinx-2 speech recognition engine and the Festival text-to-speech application. Although the CMU Sphinx Group provides several versions of Sphinx (Sphinx-2, -3, and -4), I use only Sphinx-2, as it is the fastest. It is not as accurate as Sphinx-3 or Sphinx-4, but it runs in real time and therefore works well with live applications.

Installing Sphinx-2 and Festival should be trivial; most distributions already have binaries, and even compiling from source should not be difficult. Debian users might find Festival a little tricky to install if they have an onboard sound card with an AC97 codec. (The symptom is speech that sounds twice as fast as it should, no matter what speed you set; unfortunately, I couldn't find any solution other than changing the sound card.)

Happily, the normal desktop user will not have to learn Festival's command-line interface, as applications such as KDE Text-to-Speech System (KTTS) and Perlbox Voice fill this gap. KDE 3.4 will talk to you via Festival, Festival Lite (flite), or FreeTTS (another free speech synthesizer, written in Java), in a multitude of languages and accents. If you want to use KTTS with your present KDE desktop, sources as well as binaries for Debian, SUSE, and Mandrake are available at KTTS's home page.
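For readers curious about what sits underneath those front ends, here is a minimal Python sketch of driving Festival directly. It relies on Festival's `--tts` mode, which reads the text to speak from standard input. The function names `festival_tts_command` and `speak` are hypothetical helpers made up for this illustration, and the sketch assumes the `festival` binary is on your PATH.

```python
import shutil
import subprocess


def festival_tts_command():
    """Return the argument list that starts Festival in text-to-speech mode."""
    return ["festival", "--tts"]


def speak(text):
    """Speak text through Festival; return False if Festival is not installed."""
    if shutil.which("festival") is None:
        return False
    # In --tts mode Festival reads the text to be spoken from standard input.
    subprocess.run(festival_tts_command(), input=text.encode("utf-8"), check=True)
    return True
```

Calling `speak("Hello from your Linux desktop.")` should produce audio on a machine with a working Festival installation, and simply return False where Festival is missing.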

The KTTS Interface

KTTS works by sending the text to be spoken via DCOP to the KTTS daemon. KTTS can read aloud pop-ups from KNotify, Web sites, or any other text. You can open Konqueror, navigate to a Web site, select Tools, and choose Speak Text. If you haven't highlighted anything, KTTS will read the whole page.

In KDE Text-to-Speech Manager (kttsmgr) you can manage your languages, your speech engine, and what your computer reads to you. Your computer can act as your private secretary and read your email messages while you work in other applications. You can use multiple languages, which can be useful if, for example, you are a native German speaker but need to listen to an English Web page. If you need voices for languages other than English, take a look at the MBROLA Project. MBROLA aims to obtain a set of speech synthesizers for as many languages as possible, and to provide them free for non-commercial applications. You can use MBROLA voices with Festival.

By adding Sphinx-2 and Perlbox Voice to Festival, you can make your computer listen to what you tell it and act accordingly. Perlbox Voice provides a transparent interface to several open source speech systems, but it is mainly an easy-to-use front end to Sphinx-2, built with Perl and Tk; you'll need to have both installed before you install Perlbox. Perlbox comes with a Perl installation script, which copies the files to the right locations. When that completes, fire up the application with the perlbox-voice command.

Perlbox Interface

To get your Linux desktop to perform an action, write in one Perlbox box the magic words to invoke the action, and in another what the computer is expected to do when it hears them. For example, in the "When you say" box you write "Web" and in the "Computer does" box, Konqueror. Then start Perlbox's listener via the Control tab and say "Web"; Konqueror will start.
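The pairing of a spoken phrase with an action is essentially a lookup table. The following Python sketch illustrates the idea only; it is not Perlbox's actual implementation (Perlbox is written in Perl), and the `COMMANDS` table and `command_for` function are hypothetical names chosen for this example.

```python
import shlex

# Hypothetical table mirroring the "When you say" / "Computer does" boxes.
COMMANDS = {
    "web": "konqueror",
}


def command_for(phrase, table=COMMANDS):
    """Return the argument list for a recognized phrase, or None if unknown."""
    key = phrase.strip().lower()
    command = table.get(key)
    if command is None:
        return None
    # Split the command line into arguments, respecting shell-style quoting.
    return shlex.split(command)
```

A listener loop could pass the returned list to `subprocess.Popen` to launch the program, and ignore any phrase for which `command_for` returns None.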

Perlbox comes with a KDE plug-in that allows you to change from one desktop to another, invoke the K Menu, refresh the desktop, and more, all via voice commands. You can extend the plug-in with more commands by modifying a single file. If you use another desktop environment, Perlbox comes with good documentation on how to build your own plug-ins.

In noisy environments, you can use a magic word to activate Perlbox, which keeps the application from taking actions when it mishears random words.
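One way to picture the magic-word idea is as a small gate in front of the command handler. The Python sketch below is an assumption about how such a gate could behave, not a description of Perlbox's internals: it ignores everything until it hears the activation word, passes exactly one following phrase through, and then closes again. The class name `WakeWordGate` is hypothetical.

```python
class WakeWordGate:
    """Pass a phrase through only if it directly follows the activation word."""

    def __init__(self, wake_word):
        self.wake_word = wake_word.strip().lower()
        self.armed = False

    def filter(self, phrase):
        """Return the normalized phrase to act on, or None to ignore it."""
        heard = phrase.strip().lower()
        if heard == self.wake_word:
            # Hearing the activation word opens (or keeps open) the gate.
            self.armed = True
            return None
        if not self.armed:
            # Gate is closed: treat the phrase as background noise.
            return None
        # Gate is open: let this one phrase through, then close again.
        self.armed = False
        return heard
```

With `gate = WakeWordGate("computer")`, a stray "web" is dropped, while "computer" followed by "web" lets "web" through once.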

The ease of using these speech engines and speech recognition systems could make Linux the preferred OS for the visually impaired. Open licenses allow rapid improvement and development. And it's just a lot of fun to talk to your closest co-worker: your computer.
