Speech SDK 5.1, No. 1: SAPI Overview, mainly about speech recognizers

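The SAPI API greatly reduces the code an application needs in order to use speech recognition and text-to-speech, which makes speech technology easier to adopt. It provides a high-level interface between applications and speech engines and manages the details of real-time operation. There are two kinds of engines, TTS systems and speech recognizers, which synthesize speech and convert speech to text, respectively.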


From the official documentation (sapi.chm)

    


    The SAPI application programming interface (API) dramatically reduces the code overhead required for an application to use speech recognition and text-to-speech, making speech technology more accessible and robust for a wide range of applications.


API Overview

    The SAPI API provides a high-level interface between an application and speech engines. SAPI implements all the low-level details needed to control and manage the real-time operations of various speech engines.

    The two basic types of SAPI engines are text-to-speech (TTS) systems and speech recognizers. TTS systems synthesize text strings and files into spoken audio using synthetic voices. Speech recognizers convert human spoken audio into readable text strings and files.
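    As a small illustration of the TTS side, the sketch below speaks one string through ISpVoice. It assumes a Windows build with the Speech SDK headers available and COM already initialized on the calling thread. The helper name SpeakText is hypothetical, and the `_WIN32` guard is only there because SAPI does not exist on other platforms.

```cpp
// Windows-only sketch; typically linked with ole32.lib and sapi.lib.
#ifdef _WIN32
#include <windows.h>
#include <sapi.h>

// Speaks one string synchronously through the default voice.
// Assumes COM is already initialized on the calling thread.
HRESULT SpeakText(const wchar_t* text)
{
    ISpVoice* voice = nullptr;
    HRESULT hr = CoCreateInstance(CLSID_SpVoice, nullptr, CLSCTX_ALL,
                                  IID_ISpVoice, reinterpret_cast<void**>(&voice));
    if (FAILED(hr)) return hr;

    // SPF_DEFAULT: plain text, call returns when speaking has finished.
    hr = voice->Speak(text, SPF_DEFAULT, nullptr);
    voice->Release();
    return hr;
}
#endif
```

    A caller would wrap this in CoInitializeEx / CoUninitialize and check the returned HRESULT with SUCCEEDED.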




API for Speech Recognition


    Just as ISpVoice is the main interface for speech synthesis, ISpRecoContext is the main interface for speech recognition. Like ISpVoice, it is an ISpEventSource, which means that it is the speech application's vehicle for receiving notifications for the requested speech recognition events.

    An application can choose between two types of speech recognition engine (ISpRecognizer). A shared recognizer, which can be shared with other speech recognition applications, is recommended for most speech applications. To create an ISpRecoContext for a shared ISpRecognizer, an application need only call COM's CoCreateInstance on the component CLSID_SpSharedRecoContext. In this case, SAPI sets up the audio input stream, using SAPI's default audio input stream.

    For large server applications that run alone on a system, and for which performance is key, an InProc speech recognition engine is more appropriate. To create an ISpRecoContext for an InProc ISpRecognizer, the application must first call CoCreateInstance on the component CLSID_SpInprocRecognizer (sapi.chm prints this as CLSID_SpInprocRecoInstance, a name that sapi.h does not declare) to create its own InProc ISpRecognizer. Then the application must call ISpRecognizer::SetInput (see also ISpObjectToken) to set up the audio input. Finally, the application can call ISpRecognizer::CreateRecoContext to obtain an ISpRecoContext.
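    The two creation paths can be sketched as follows. This is a minimal sketch rather than the SDK's sample code: the function names are hypothetical, COM is assumed to be initialized already, and the InProc path assumes the default multimedia audio-in object (CLSID_SpMMAudioIn) is an acceptable input. The InProc CLSID is written as CLSID_SpInprocRecognizer, which is how sapi.h spells it as far as I can tell.

```cpp
// Windows-only sketch; typically linked with ole32.lib and sapi.lib.
#ifdef _WIN32
#include <windows.h>
#include <sapi.h>

// Shared recognizer: a single call, SAPI attaches its default audio input.
HRESULT CreateSharedContext(ISpRecoContext** context)
{
    return CoCreateInstance(CLSID_SpSharedRecoContext, nullptr, CLSCTX_ALL,
                            IID_ISpRecoContext, reinterpret_cast<void**>(context));
}

// InProc recognizer: create the engine, give it audio input,
// then ask it for a context. The caller releases both outputs.
HRESULT CreateInprocContext(ISpRecognizer** recognizer, ISpRecoContext** context)
{
    *recognizer = nullptr;
    *context = nullptr;

    ISpRecognizer* reco = nullptr;
    HRESULT hr = CoCreateInstance(CLSID_SpInprocRecognizer, nullptr, CLSCTX_ALL,
                                  IID_ISpRecognizer, reinterpret_cast<void**>(&reco));
    if (FAILED(hr)) return hr;

    // Assumption: use the default multimedia audio-in object.
    // An ISpObjectToken for a specific device could be passed instead.
    ISpAudio* audio = nullptr;
    hr = CoCreateInstance(CLSID_SpMMAudioIn, nullptr, CLSCTX_ALL,
                          IID_ISpAudio, reinterpret_cast<void**>(&audio));
    if (SUCCEEDED(hr)) {
        hr = reco->SetInput(audio, TRUE); // TRUE: SAPI may change the audio format
        audio->Release();                 // the recognizer keeps its own reference
    }
    if (SUCCEEDED(hr)) hr = reco->CreateRecoContext(context);

    if (FAILED(hr)) {
        *context = nullptr;
        reco->Release();
        return hr;
    }
    *recognizer = reco;
    return hr;
}
#endif
```

    The shared path needs no recognizer pointer at all, which is one reason it is the simpler default for desktop applications.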

    The next step is to set up notifications for events the application is interested in. As the ISpRecoContext is also an ISpEventSource, which in turn is an ISpNotifySource, the application can call one of the ISpNotifySource methods from its ISpRecoContext to indicate where the events for that ISpRecoContext should be reported. Then it should call ISpEventSource::SetInterest to indicate which events it needs to be notified of. The most important event is SPEI_RECOGNITION, which indicates that the ISpRecognizer has recognized some speech for this ISpRecoContext. See SPEVENTENUM for details on the other available speech recognition events.
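    A minimal sketch of this step, assuming the application wants to wait on a Win32 event and cares only about SPEI_RECOGNITION. The function name is hypothetical; SetNotifyWin32Event is just one of the ISpNotifySource options (others include SetNotifyWindowMessage and SetNotifyCallbackFunction).

```cpp
// Windows-only sketch; typically linked with ole32.lib and sapi.lib.
#ifdef _WIN32
#include <windows.h>
#include <sapi.h>

// Routes the context's notifications to a Win32 event handle and
// subscribes to SPEI_RECOGNITION only.
HRESULT SubscribeToRecognition(ISpRecoContext* context)
{
    HRESULT hr = context->SetNotifyWin32Event();
    if (FAILED(hr)) return hr;

    const ULONGLONG interest = SPFEI(SPEI_RECOGNITION);
    // First argument: events that trigger a notification.
    // Second argument: events that are kept in the event queue.
    return context->SetInterest(interest, interest);
}
#endif
```

    After this call the application can block in ISpNotifySource::WaitForNotifyEvent with a timeout, or fetch the handle with GetNotifyEventHandle and wait on it together with other handles.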

    Finally, a speech application must create, load, and activate an ISpRecoGrammar, which essentially indicates what type of utterances to recognize, i.e., dictation or a command and control grammar. First, the application creates an ISpRecoGrammar using ISpRecoContext::CreateGrammar. Then, the application loads the appropriate grammar, either by calling ISpRecoGrammar::LoadDictation for dictation or one of the ISpRecoGrammar::LoadCmdxxx methods for command and control. Finally, in order to activate these grammars so that recognition can start, the application calls ISpRecoGrammar::SetDictationState for dictation or ISpRecoGrammar::SetRuleState or ISpRecoGrammar::SetRuleIdState for command and control.
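    The create, load, activate sequence might look like the sketch below, once for dictation and once for a command and control grammar loaded from a file. The function names, the grammar IDs 1 and 2, and the idea of passing in a grammar file path are all assumptions for illustration; LoadCmdFromFile is only one of the LoadCmdxxx methods.

```cpp
// Windows-only sketch; typically linked with ole32.lib and sapi.lib.
#ifdef _WIN32
#include <windows.h>
#include <sapi.h>

// Dictation: create the grammar, load the default dictation topic, activate it.
HRESULT StartDictation(ISpRecoContext* context, ISpRecoGrammar** grammar)
{
    *grammar = nullptr;
    ISpRecoGrammar* g = nullptr;
    HRESULT hr = context->CreateGrammar(1 /* application-chosen id */, &g);
    if (FAILED(hr)) return hr;

    hr = g->LoadDictation(nullptr, SPLO_STATIC); // nullptr: default topic
    if (SUCCEEDED(hr)) hr = g->SetDictationState(SPRS_ACTIVE);

    if (FAILED(hr)) { g->Release(); return hr; }
    *grammar = g;
    return hr;
}

// Command and control: same three steps with a grammar file.
HRESULT StartCommands(ISpRecoContext* context, const wchar_t* grammarFile,
                      ISpRecoGrammar** grammar)
{
    *grammar = nullptr;
    ISpRecoGrammar* g = nullptr;
    HRESULT hr = context->CreateGrammar(2 /* application-chosen id */, &g);
    if (FAILED(hr)) return hr;

    hr = g->LoadCmdFromFile(grammarFile, SPLO_STATIC);
    // A null rule name addresses the grammar's top-level rules as a group.
    if (SUCCEEDED(hr)) hr = g->SetRuleState(nullptr, nullptr, SPRS_ACTIVE);

    if (FAILED(hr)) { g->Release(); return hr; }
    *grammar = g;
    return hr;
}
#endif
```

    Keeping the returned ISpRecoGrammar pointer alive matters: releasing it unloads the grammar, so recognition for it stops.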

    When recognitions come back to the application by means of the requested notification mechanism, the lParam member of the SPEVENT structure will be an ISpRecoResult by which the application can determine what was recognized and for which ISpRecoGrammar of the ISpRecoContext.
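    One way to consume these events without the sphelper.h wrappers is to drain the queue with ISpEventSource::GetEvents and cast lParam by hand, as sketched below. The function names are hypothetical, and the cleanup helper reflects my understanding of what each elParamType value owns, so treat it as an assumption to check against the SPEVENT documentation.

```cpp
// Windows-only sketch; typically linked with ole32.lib and sapi.lib.
#ifdef _WIN32
#include <windows.h>
#include <sapi.h>
#include <cstdio>

// Frees whatever the event's lParam owns, based on elParamType.
static void ReleaseEventParam(SPEVENT& evt)
{
    switch (evt.elParamType) {
    case SPET_LPARAM_IS_TOKEN:
    case SPET_LPARAM_IS_OBJECT:
        if (evt.lParam) reinterpret_cast<IUnknown*>(evt.lParam)->Release();
        break;
    case SPET_LPARAM_IS_POINTER:
    case SPET_LPARAM_IS_STRING:
        if (evt.lParam) CoTaskMemFree(reinterpret_cast<void*>(evt.lParam));
        break;
    default:
        break;
    }
    evt.lParam = 0;
}

// Drains the context's event queue and prints each recognized phrase
// with the ID of the grammar that produced it.
// Returns the number of recognitions printed.
int DrainRecognitions(ISpRecoContext* context)
{
    int handled = 0;
    SPEVENT evt = {};
    ULONG fetched = 0;
    while (context->GetEvents(1, &evt, &fetched) == S_OK && fetched == 1) {
        if (evt.eEventId == SPEI_RECOGNITION &&
            evt.elParamType == SPET_LPARAM_IS_OBJECT && evt.lParam) {
            ISpRecoResult* result = reinterpret_cast<ISpRecoResult*>(evt.lParam);
            SPPHRASE* phrase = nullptr;
            LPWSTR text = nullptr;
            const ULONG whole = static_cast<ULONG>(SP_GETWHOLEPHRASE);
            if (SUCCEEDED(result->GetPhrase(&phrase)) && phrase &&
                SUCCEEDED(result->GetText(whole, whole, TRUE, &text, nullptr)) && text) {
                wprintf(L"grammar %llu: %ls\n",
                        static_cast<unsigned long long>(phrase->ullGrammarID), text);
                ++handled;
            }
            CoTaskMemFree(text);   // safe when null
            CoTaskMemFree(phrase); // safe when null
        }
        ReleaseEventParam(evt);    // drops the ISpRecoResult reference
    }
    return handled;
}
#endif
```

    The ullGrammarID field is the ID that was passed to ISpRecoContext::CreateGrammar, which is how an application with several grammars can tell them apart.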

    An ISpRecognizer, whether shared or InProc, can have multiple ISpRecoContexts associated with it, and each one can be notified in its own way of events pertaining to it. An ISpRecoContext can have multiple ISpRecoGrammars created from it, each one for recognizing different types of utterances.
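    The one-to-many relationships can be sketched with a single helper that adds a context, with its own notification handle and one dictation grammar, to an existing recognizer. The helper name and the choice of a dictation grammar are assumptions for illustration.

```cpp
// Windows-only sketch; typically linked with ole32.lib and sapi.lib.
#ifdef _WIN32
#include <windows.h>
#include <sapi.h>

// Adds one more context to an existing recognizer. The context gets its
// own Win32 event for notifications and one grammar with the given ID.
// The caller releases both outputs.
HRESULT AddContextWithGrammar(ISpRecognizer* recognizer, ULONGLONG grammarId,
                              ISpRecoContext** context, ISpRecoGrammar** grammar)
{
    *context = nullptr;
    *grammar = nullptr;

    ISpRecoContext* ctx = nullptr;
    HRESULT hr = recognizer->CreateRecoContext(&ctx);
    if (FAILED(hr)) return hr;

    ISpRecoGrammar* g = nullptr;
    hr = ctx->SetNotifyWin32Event(); // notifications private to this context
    if (SUCCEEDED(hr)) hr = ctx->SetInterest(SPFEI(SPEI_RECOGNITION),
                                             SPFEI(SPEI_RECOGNITION));
    if (SUCCEEDED(hr)) hr = ctx->CreateGrammar(grammarId, &g);
    if (SUCCEEDED(hr)) hr = g->LoadDictation(nullptr, SPLO_STATIC);

    if (FAILED(hr)) {
        if (g) g->Release();
        ctx->Release();
        return hr;
    }
    *context = ctx;
    *grammar = g;
    return hr;
}
#endif
```

    Calling the helper twice on the same ISpRecognizer yields two independent contexts; calling CreateGrammar again on one of them with a different ID yields a second grammar for that context.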





