Voiceprint identification

Voiceprint identification can be defined as a combination of both aural (listening) and spectrographic (instrumental) comparison of one or more known voices with an unknown voice for the purpose of identification or elimination. Developed by Bell Laboratories in the late 1940s for military intelligence purposes, the modern-day forensic utilization of the technique did not start until the late 1960s following its adoption by the Michigan State Police. From 1967 until the present, more than 5,000 law enforcement related voice identification cases have been processed by certified voiceprint examiners.

Voice identification has been used in a variety of criminal cases, including murder, rape, extortion, drug smuggling, wagering-gambling investigations, political corruption, money-laundering, tax evasion, burglary, bomb threats, terrorist activities and organized crime activities. It is part of a larger forensic role known as acoustic analyses, which involves tape filtering and enhancement, tape authentication, gunshot acoustics, reconstruction of conversations and the analysis of any other questioned acoustic event.

Theory

The fundamental theory for voice identification rests on the premise that every voice is individually characteristic enough to distinguish it from others through voiceprint analysis. There are two general factors involved in the process of human speech. The first factor in determining voice uniqueness lies in the sizes of the vocal cavities, such as the throat, nasal and oral cavities, and the shape, length and tension of the individual's vocal cords located in the larynx. The vocal cavities are resonators, much like organ pipes, which reinforce some of the overtones produced by the vocal cords; these reinforced overtones appear as formants, or voiceprint bars. The likelihood that two people would have all their vocal cavities the same size and configuration and coupled identically appears very remote.
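The organ-pipe comparison can be made concrete with a rough sketch. The code below assumes a textbook idealization that is not part of the original article: the vocal tract is treated as a uniform tube closed at one end, whose resonances fall at odd multiples of the quarter-wavelength frequency. The function name, the tube lengths and the speed of sound are illustrative assumptions; real vocal tracts are not uniform tubes.

```python
def tube_resonances(length_m, speed_of_sound=350.0, count=3):
    """Resonant frequencies (Hz) of an idealized uniform tube closed at one end.

    Assumption: quarter-wavelength resonator, F_n = (2n - 1) * c / (4 * L).
    This is a simplified model, not a measurement of any real voice.
    """
    return [(2 * n - 1) * speed_of_sound / (4.0 * length_m)
            for n in range(1, count + 1)]


# Two hypothetical tube lengths, in centimeters, chosen only for illustration.
for length_cm in (17.5, 15.0):
    resonances = tube_resonances(length_cm / 100.0)
    print(length_cm, "cm:", [round(f) for f in resonances], "Hz")
```

Under these assumptions the 17.5 cm tube resonates near 500, 1500 and 2500 Hz, and the shorter tube shifts every resonance upward. This illustrates, in a simplified way, why differences in cavity size leave different patterns in the formant bars.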

The second factor in determining voice uniqueness lies in the manner in which the articulators or muscles of speech are manipulated during speech. The articulators include the lips, teeth, tongue, soft palate and jaw muscles whose controlled interplay produces intelligible speech. Intelligible speech is developed by the random learning process of imitating others who are communicating. The likelihood that two people could develop identical use patterns of their articulators also appears very remote.

Therefore, the chance that two speakers would have identical vocal cavity dimensions and configurations coupled with identical articulator use patterns appears extremely remote. While there have been claims that several voices have been found to be indistinguishable, no evidence to support such allegations has been published, offered for examination or demonstrated to the authors.

Several published studies have demonstrated the ability to reliably identify voices under certain conditions, and a Federal Bureau of Investigation survey of its own performance in the examination of 2,000 forensic cases revealed an error rate of 0.31 percent for false identifications and 0.53 percent for false eliminations. (See Koenig, B.E., 1986, Spectrographic Voice Identification: a forensic survey, Journal of the Acoustical Society of America, 79:2088-2090.)

While there is disagreement in the so-called "scientific community" on the degree of accuracy with which examiners can identify speakers under all conditions, there is agreement that voices can, in fact, be identified.

To facilitate the visual comparison of voices, a sound spectrograph is used to analyze the complex speech waveform into a pictorial display referred to as a spectrogram. The spectrogram displays the speech signal with time along the horizontal axis, frequency on the vertical axis, and relative amplitude indicated by the degree of gray shading on the display. The resonance of the speaker's voice is displayed in the form of vertical signal impressions or markings for consonant sounds, and horizontal bars or formants for vowel sounds. The visible configurations displayed are characteristic of the articulation involved for the speaker producing the words and phrases. The spectrograms serve as a permanent record of the words spoken and facilitate the visual comparison of similar words spoken by an unknown and a known speaker.
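The time, frequency and amplitude layout described above can be illustrated with a minimal digital sketch. It is not the analog sound spectrograph the examiners use. It assumes a short-time Fourier transform with a Hann window, and the function names, frame sizes and synthetic test tone are all illustrative assumptions rather than anything taken from laboratory practice.

```python
import cmath
import math


def hann(n):
    """Hann window of length n, used to taper each frame."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def dft_magnitudes(frame):
    """Magnitudes of the non-negative-frequency bins (naive O(n^2) DFT)."""
    n = len(frame)
    mags = []
    for k in range(n // 2 + 1):
        acc = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n)
                  for t in range(n))
        mags.append(abs(acc))
    return mags


def spectrogram(samples, sample_rate, frame_len=64, hop=32):
    """Return (times, freqs, rows); rows[i][k] is the magnitude at
    times[i] seconds (horizontal axis) and freqs[k] Hz (vertical axis)."""
    window = hann(frame_len)
    times, rows = [], []
    for start in range(0, len(samples) - frame_len + 1, hop):
        chunk = samples[start:start + frame_len]
        frame = [s * w for s, w in zip(chunk, window)]
        rows.append(dft_magnitudes(frame))
        times.append((start + frame_len / 2) / sample_rate)
    freqs = [k * sample_rate / frame_len for k in range(frame_len // 2 + 1)]
    return times, freqs, rows


# Synthetic input (an assumption, not speech): a steady 1000 Hz tone
# sampled at 8000 Hz.
SAMPLE_RATE = 8000
tone = [math.sin(2 * math.pi * 1000 * n / SAMPLE_RATE) for n in range(512)]
times, freqs, rows = spectrogram(tone, SAMPLE_RATE)

# Frequency of the strongest bin in each time frame.
peak_freqs = [freqs[max(range(len(row)), key=row.__getitem__)] for row in rows]
print(peak_freqs[:3])
```

A steady tone yields the same peak frequency in every frame, which on a gray-scale display would appear as a single horizontal bar. Vowel formants appear as several such bars whose positions shift over time.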

Procedural Guidelines

The acoustic environment in many cases can be controlled at the receiving end of the speech signal. Shutting off the radio, television or other noise-generating devices will reduce or eliminate unwanted background speech or noise. While not always possible, the investigator should attempt to select a reasonably quiet environment for controlled activities such as drug buys or other illegal operations being investigated. Many times these types of activities are carried out in bars, restaurants, car washes, billiard rooms and the like, and the investigator cannot always dictate the location.

An investigation may require the recording of telephone conversations or face-to-face encounters under a variety of acoustic conditions in which someone is wearing a body recorder or transmitting the conversation via radio frequency to a remote location. Unfortunately, in many cases the investigators cannot control the acoustic environment. In situations involving an adverse environment, investigators should use high-technology stereo equipment to optimize recording capability.

The attempt to produce samples as parallel to the unknown as possible actually assists the examiner in his task because speaker variables are reduced to a minimum. Numerous studies have been conducted that indicate very reliable decisions can be made by trained professional examiners when samples are obtained in the manner described.

The notion proposed by some opponents that duplicating the unknown as closely as possible may cause error is not supported by any available evidence. Research studies have produced strong evidence that even very good mimics cannot duplicate another's speech patterns.

In an attempt to obtain proper speech samples, investigators should not hesitate to ask suspects for the samples they need. Surprisingly, many suspects will voluntarily give a sample of their voice for comparison purposes.

In the event you are dealing with some type of vocal disguise, attempt to obtain a similarly produced known exemplar in addition to the suspect's normal voice. It should be noted that vocal disguises can be very difficult for the examiner to deal with and the probability of determination is less than with normal voice samples.

If a suspect refuses to cooperate with the investigator, a court order may be acquired compelling the suspect to produce voice recordings for the purpose of comparison. Courts have repeatedly held that requiring the accused to submit voice exemplars for the purpose of comparison for identification or elimination does not violate the suspect's Fifth Amendment rights. In United States v. Wade, 388 U.S. 218 (1967), the Court held that the privilege against self-incrimination offers no protection from compulsion to submit to speaking for the purpose of voice identification, or to writing, photographing, fingerprinting and measurements.

Several problems have been encountered in obtaining known voice exemplars even with the use of a court order. If the court order is vague, the suspect may utter a few words of the text involved, speak too softly, too fast, or too slowly, or otherwise disguise the sample and claim compliance with the order.

To prevent such problems, the investigator is wise to request that the court order specify in detail that the suspect give a sample of his or her voice, repeating the phrases of the questioned call in a natural conversational voice (or in a similar disguise, if that is the case), and that such sample be given at least three times and to the reasonable satisfaction of the investigator. Voice exemplars obtained with such specific instructions are usually very satisfactory for comparison purposes.

Before terminating the recording session, check the recording to determine whether or not a satisfactory exemplar was obtained. Remember that once a suspect is released, a second known sample may be very difficult to obtain.

Whatever the recording circumstance, background noise and the distance between the talker and the receiving device should be minimized for optimal recording. Good quality tape recording equipment should be used, as well as magnetic recording tape. As a rule of thumb, recording tape with standard 120 equalization, normal bias and no more than a 5 dB drop at 6 kHz should be used.
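The 5 dB figure can be related to measured signal levels with a short sketch. It assumes that the drop is expressed as an amplitude ratio relative to a reference level, using the usual 20 log10 convention. The function names and the sample amplitudes are hypothetical and do not come from any tape specification.

```python
import math


def level_drop_db(reference_amplitude, measured_amplitude):
    """Drop in dB of a measured amplitude relative to a reference level.

    Positive values mean the measured signal is weaker than the reference.
    """
    return -20.0 * math.log10(measured_amplitude / reference_amplitude)


def within_rule_of_thumb(reference_amplitude, amplitude_at_6khz,
                         max_drop_db=5.0):
    """True if the response at 6 kHz has dropped by no more than max_drop_db."""
    return level_drop_db(reference_amplitude, amplitude_at_6khz) <= max_drop_db


# Hypothetical playback amplitudes, normalized to the reference level.
for amplitude in (0.6, 0.5):
    drop = level_drop_db(1.0, amplitude)
    print(amplitude, round(drop, 2), within_rule_of_thumb(1.0, amplitude))
```

Under this convention a 5 dB drop corresponds to roughly 56 percent of the reference amplitude, so a response that has fallen to half the reference level at 6 kHz (about 6 dB down) would not satisfy the rule of thumb.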

After the development of a suspect, the next task is to properly obtain known voice samples for comparison purposes. Do not hesitate to ask a suspect for a speech sample. If the suspect refuses, a court order may be obtained requiring compliance with the request. See Schmerber v. California, 384 U.S. 757 (1966), and Gilbert v. California, 388 U.S. 263 (1967). Both are landmark cases. There are also many additional decisions at both state and federal court levels that may be cited to support such a request. Court orders should clearly spell out the minimum number of samples to be obtained, the manner of speech, and the method to be employed.

The next task for the investigator is to obtain proper speech samples for comparison purposes. Probably the best guide here is attempting to duplicate the recording of the questioned call. Known samples should be obtained via the telephone and recorded in the same manner as the questioned call. If possible, the same recorder and telephone pickup should be used. In some cases, even the same telephone has been employed. If there is room on the questioned tape, the known sample may be placed on it. If there is not, another tape of the same type and brand should be used if at all possible.

Speech samples obtained should contain exactly the same words and phrases as those in the questioned sample because only like speech sounds are used for comparison. Because the voice, like handwriting, is dynamic and variant, several samples of each spoken phrase are desired for analysis. Unless the questioned call sounds like a read statement, the suspect should not be allowed to read the phrases from a transcript but should repeat each phrase after it is spoken by someone else. To avoid an unnatural verbal response, the suspect should repeat the first phrase and proceed in the same manner with each successive phrase.

When all phrases have been recorded, the same procedure should be repeated at least two more times beginning with the first word or phrase. The suspect may be asked to read the phrases if a very poor job of repeating is done. Some people do a better job of reading than repeating the phrases.

It is important that the known sample be spoken in the same manner as the questioned sample; therefore, the investigator should be familiar with the voice, manner of speech and the text. If the caller's voice was disguised, the suspect should give a normal sample and a disguised one as in the questioned call.

Recorded evidence should be wrapped in tinfoil to protect it from possible contact with a magnetic field if it is submitted by mail. The evidence should be shipped in a secure container that will prevent the evidence from tearing through the packaging material. Do not submit a copy of your investigative report with the evidence. The examiner does not want to know the details of the case. It is important, however, to provide the examiner with information regarding the recording method, the number of calls and suspects involved, and any other information that may assist the examiner in the examination of the evidence.

Upon receipt of the evidence by the laboratory, it is properly marked and a case number is assigned. The analysis and comparison of known and questioned voice samples may take several hours or days to complete, depending on the number of samples involved and the complexity of the examination. Both an aural (listening) and a visual (spectrographic) examination and comparison are conducted. The aural and spectrographic cues examined should complement one another if the voices are in fact the same.

As with the identification of fingerprints, there is presently no universal standard for the number of words required for identification. It does, however, vary from a minimum of 10 for some agencies to 20 for others. The Internal Revenue Service has chosen to use 20 or more like speech sounds between an unknown and a known sample, with the degree of certainty based on the quality and excellence of the evidence examined. Obtaining a second, independent decision is standard practice in this field as in other forensic sciences.

Visual comparison of spectrograms involves, in general, the examination of spectrographic features of like sounds as portrayed in spectrograms in terms of time, frequency and amplitude. Specific features, which result from producing consonants, vowels and semi-vowels in isolation or in combination (co-articulation), include, but are certainly not limited to, the following: pitch, bandwidth, mean frequency, trajectory of vowel formants, distribution of formant energy, nasal resonance, stops, plosives, fricatives, pauses, interformant features and other idiosyncratic and pathological features.

Special aural comparison tapes are prepared facilitating comparison of psycholinguistic features via short-term memory. Aural cues compared include resonance quality, pitch, temporal factors, inflection, dialect, articulation, syllable grouping, breath pattern, disguise, pathologies and other peculiar speech characteristics.

Some agencies offer court testimony, others do not. The IRS laboratory is the only federal agency that presently offers testimony. All other certified examiners, whether in state agencies or in private practice, also offer court testimony.


Court Admissibility

Court testimony involving aural-spectrographic voice comparison essentially started having an impact on the courts after the Tosi Study in December 1970. Since then there have been between 150 and 200 trials in local, state or federal courts. Because of differences in evidentiary philosophy, some courts have admitted aural-spectrographic voice evidence and others have not.

There are two general "rules" or "standards" by which scientific evidence is accepted in courts of law in the United States. The first, commonly referred to as the Frye "rule" or "test," is based on a 1923 District of Columbia case and basically requires "general acceptance in the particular field in which it belongs." See Frye v. United States, 54 App. D.C. 46, 293 F. 1013 (1923). The second is based on the argument of McCormick (See "McCormick on Evidence," 3rd Ed., 203 at 608.) McCormick states: "General scientific acceptance is a proper condition for taking judicial notice of scientific facts, but it is not a suitable criterion for the admissibility of scientific evidence. Any relevant conclusion supported by a qualified expert witness should be received unless there are distinct reasons for exclusion." See Rule 702 of the Federal Rules of Evidence.

Many state and federal courts have abandoned Frye and adopted the argument of McCormick. The supreme courts of Minnesota, Maine, Ohio and Rhode Island have admitted aural-spectrographic voice evidence following McCormick. Intermediate appellate courts in California, Maryland and Michigan admitted such evidence following Frye but were reversed by their respective supreme courts, which held that the Frye test had not been met. The Massachusetts Supreme Court held aural-spectrographic voice evidence admissible applying the Frye test, while those of Arizona, Indiana and Pennsylvania did not.

In the federal court system, we are aware of 30 trials in which the question of aural-spectrographic voice evidence was addressed. All but three admitted the evidence based on Frye or McCormick. On appeal, the Second, Fourth and Sixth Circuits held the evidence admissible, applying McCormick, while the District of Columbia Circuit did not, applying Frye. See United States v. Williams, 583 F.2d 1194 (2d Cir.), cert. denied 439 U.S. 1117 (1978); United States v. Baller, 519 F.2d 463 (4th Cir.), cert. denied 423 U.S. 1019 (1975); United States v. Franks, 511 F.2d 25 (6th Cir.), cert. denied 422 U.S. 1042 (1975); and United States v. McDaniel, 538 F.2d 408 (D.C. Cir. 1976).

In United States v. Williams, supra at 1198, the court said: "The 'Frye' test is usually construed as necessitating a survey and categorization of the subjective views of a number of scientists, assuring thereby a reserve of experts available to testify. Difficulty in applying the 'Frye' test has led a number of courts to its implicit modification." Also see United States v. Baller, supra at n.6.

Since 1970, the forensic application of aural-spectrographic voice identification has been reliably applied in the investigation of several thousand cases. While there is disagreement on the reliability of the method under all conditions, there is agreement that voices can be identified and eliminated when the proper conditions exist and the analysis is carefully conducted by qualified examiners.

Several state appellate and supreme courts have admitted the evidence, as have three of four federal appellate courts. The United States Supreme Court has refused to review and decide the three cases brought before it. While the admission of aural-spectrographic voice evidence continues to be decided in various courts, the method continues to be a very important tool in the arsenal against crime.

Other areas of acoustic analysis include, in part, gunshot analysis, tape enhancement and tape authentication. While not discussed in this article, it should be noted that laboratory analysis related to these problems is available in some laboratories.

 

By:  Steve Cain   Email: info@tapeexpert.com
