Plotting a Spectrogram with Python

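1. Steps:

1) Import the required modules
2) Read the audio file and get its parameters
3) Convert the audio into a form that can be processed (note that readframes() returns raw bytes, which must be converted to an integer type such as int or short)

The code is as follows: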


 
 
    import numpy as np
    import matplotlib.pyplot as plt
    import os
    import wave
    # Read the audio.
    path = r"E:\SpeechWarehouse\zmkm"  # raw string, so the backslashes are kept literally
    name = 'zmkm0.wav'
    # My audio file is at E:\SpeechWarehouse\zmkm\zmkm0.wav
    filename = os.path.join(path, name)
    # Open the speech file.
    f = wave.open(filename, 'rb')
    # Get the audio parameters.
    params = f.getparams()
    nchannels, sampwidth, framerate, nframes = params[:4]
    # Convert the raw bytes to integers (np.short assumes 16-bit samples).
    strData = f.readframes(nframes)
    waveData = np.frombuffer(strData, dtype=np.short)  # np.fromstring is deprecated for binary data
    # Normalize to the range [-1, 1].
    waveData = waveData * 1.0 / max(abs(waveData))
    # Arrange the signal so that each row holds one channel's samples, nchannels rows in total.
    waveData = np.reshape(waveData, [nframes, nchannels]).T  # .T is the transpose
    f.close()  # close the file
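
The listing above hard-codes a 16-bit sample type. As a minimal sketch, the conversion in step 3 could instead pick the dtype from sampwidth. The helper name decode_frames and the SAMPLE_DTYPES table are assumptions made for this illustration, not part of the wave module, and the demo builds a tiny stereo WAV in memory rather than reading the author's file:

```python
import io
import wave

import numpy as np

# Assumed mapping from sample width (bytes) to a NumPy dtype.
# 8-bit WAV samples are unsigned; 16- and 32-bit samples are signed little-endian.
SAMPLE_DTYPES = {1: np.uint8, 2: "<i2", 4: "<i4"}


def decode_frames(raw, nchannels, sampwidth):
    """Turn bytes from readframes() into a float array of shape (nchannels, nframes)."""
    if sampwidth not in SAMPLE_DTYPES:
        raise ValueError("unsupported sample width: %d bytes" % sampwidth)
    samples = np.frombuffer(raw, dtype=SAMPLE_DTYPES[sampwidth]).astype(np.float64)
    if sampwidth == 1:
        samples -= 128.0  # centre unsigned 8-bit data on zero
    peak = np.max(np.abs(samples))
    if peak > 0:
        samples /= peak  # normalize to [-1, 1]
    # Samples are interleaved (L R L R ...), so reshape and transpose.
    return samples.reshape(-1, nchannels).T


# Build a three-frame stereo WAV in memory for the demonstration.
buf = io.BytesIO()
with wave.open(buf, "wb") as w:
    w.setnchannels(2)
    w.setsampwidth(2)
    w.setframerate(8000)
    frames = np.array([[1000, -1000], [2000, -2000], [4000, -4000]], dtype="<i2")
    w.writeframes(frames.tobytes())

buf.seek(0)
with wave.open(buf, "rb") as r:
    data = decode_frames(r.readframes(r.getnframes()), r.getnchannels(), r.getsampwidth())

print(data.shape)
print(data)
```

Each row of data is one channel, matching the layout of waveData in the listing above. 24-bit files are deliberately not handled in this sketch.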

The getparams() method and the getters it combines are documented as follows:

getnchannels()  -- returns number of audio channels (1 for mono, 2 for stereo)
getsampwidth()  -- returns sample width in bytes
getframerate()  -- returns sampling frequency
getnframes()    -- returns number of audio frames
getparams()     -- returns a namedtuple consisting of all of the above in the above order

In short:
    nchannels: the number of audio channels, getnchannels()
    sampwidth: the number of bytes per audio sample, getsampwidth()
    framerate: the sampling frequency, getframerate()
    nframes: the number of audio frames (sample points per channel), getnframes()
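
To see these fields without needing a particular file on disk, the following sketch writes one second of silence to an in-memory WAV (an assumption made for the demo) and reads the parameters back. The values shown are those of the synthetic file, not of the author's recording:

```python
import io
import wave

# Write one second of 16-bit mono silence at 16 kHz into memory.
buf = io.BytesIO()
with wave.open(buf, "wb") as w:
    w.setnchannels(1)
    w.setsampwidth(2)
    w.setframerate(16000)
    w.writeframes(b"\x00\x00" * 16000)

buf.seek(0)
with wave.open(buf, "rb") as r:
    params = r.getparams()

# getparams() returns a namedtuple, so fields can be read by name or by slicing.
nchannels, sampwidth, framerate, nframes = params[:4]
duration = params.nframes / params.framerate  # length in seconds

print(params.nchannels, params.sampwidth, params.framerate, params.nframes)
print(duration)
```

Reading the fields by name avoids relying on the tuple order that the slice params[:4] assumes.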

4) Plot the time-domain waveform:

4.1) Compute the time axis: t = n/fs
4.2) Plot


 
 
    '''Plot the speech waveform'''
    time = np.arange(0, nframes) * (1.0 / framerate)  # t = n / fs
    time = np.reshape(time, [nframes, 1]).T
    plt.plot(time[0, :nframes], waveData[0, :nframes], c="b")  # first channel only
    plt.xlabel("time(seconds)")
    plt.ylabel("amplitude")
    plt.title("Original wave")
    plt.show()

The time-domain waveform is shown in the figure:
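
As a small worked check of t = n/fs, the sketch below computes the time axis for a toy signal. The function name time_axis and the sample values are assumptions chosen only to keep the numbers easy to read:

```python
import numpy as np


def time_axis(nframes, framerate):
    """Return the time in seconds of each sample index n, i.e. t = n / fs."""
    return np.arange(nframes) / float(framerate)


# 8 samples at 4 Hz: the samples sit 0.25 s apart.
t = time_axis(nframes=8, framerate=4)
print(t)
```

The last sample falls at (nframes - 1) / fs, one sample period short of the total duration nframes / fs.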

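5) Plot the spectrogram:

5.1) Determine the frame length, typically 20~30 ms
     N = t*fs: the number of points per frame equals the frame duration multiplied by the sampling rate
    The overlap between frames is typically 1/3~1/2 of the points per frame
    The number of FFT points equals the number of points per frame (i.e. no zero padding)
5.2) Plot the spectrogram with the specgram() method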


 
 
    # Plot the spectrogram
    print("plotting spectrogram...")
    framelength = 0.025  # frame length, 20~30 ms
    framesize = framelength * framerate  # points per frame, N = t*fs; usually 256 or 512, must equal NFFT
    # NFFT should preferably be a power of 2, so framesize should be a power of 2 as well.
    # Find the power of 2 closest to the current framesize.
    nfftdict = {}
    lists = [32, 64, 128, 256, 512, 1024]
    for i in lists:
        nfftdict[i] = abs(framesize - i)
    sortlist = sorted(nfftdict.items(), key=lambda x: x[1])  # sort by distance to framesize, ascending
    framesize = int(sortlist[0][0])  # the closest power of 2 becomes the new framesize
    NFFT = framesize  # NFFT must equal the number of time-domain points, i.e. an FFT without zero padding
    overlapSize = 1.0 / 3 * framesize  # overlap is about 1/3~1/2 of the points per frame
    overlapSize = int(round(overlapSize))  # round to an integer
    # The fourth return value (named fig here) is the plotted image.
    spectrum, freqs, ts, fig = plt.specgram(waveData[0], NFFT=NFFT, Fs=framerate, window=np.hanning(M=framesize), noverlap=overlapSize, mode='default', scale_by_freq=True, sides='default', scale='dB', xextent=None)  # draw the spectrogram
    plt.ylabel('Frequency')
    plt.xlabel('Time(s)')
    plt.title('Spectrogram')
    plt.show()
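
The frame-size arithmetic in the listing above can be checked for a few common sampling rates. This is a minimal sketch: the helper name frame_parameters is an assumption, and it uses min() with a key function in place of the dictionary sort, which picks the same candidate:

```python
def frame_parameters(framerate, framelength=0.025,
                     candidates=(32, 64, 128, 256, 512, 1024)):
    """Return (framesize, overlap) following the rules in step 5.1."""
    target = framelength * framerate  # N = t * fs
    # Closest power of 2 from the candidate list.
    framesize = min(candidates, key=lambda n: abs(target - n))
    overlap = int(round(framesize / 3.0))  # one third of the frame
    return framesize, overlap


for fs in (8000, 16000, 44100):
    framesize, overlap = frame_parameters(fs)
    print(fs, framesize, overlap, "%.1f ms" % (1000.0 * framesize / fs))
```

At 16 kHz a 25 ms frame is 400 points, which rounds to 512, so the effective frame is 32 ms, slightly above the 20~30 ms guideline. Because the candidate list stops at 1024, a 44.1 kHz signal ends up with a frame of about 23 ms.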

An overview of the specgram() method follows; see the official Matplotlib documentation for details.

matplotlib.pyplot.specgram(x, NFFT=None, Fs=None, Fc=None, detrend=None, window=None, noverlap=None, cmap=None, xextent=None, pad_to=None, sides=None, scale_by_freq=None, mode=None, scale=None, vmin=None, vmax=None, *, data=None, **kwargs)
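# Parameters:
x: the signal, a 1-D array or sequence
NFFT: number of FFT points per frame, default 256. It should not be used for zero padding (use pad_to for that); a power of 2 is preferable
Fs: sampling frequency, default 2
Fc: center frequency of the signal x, default 0; used to shift the frequency axis of the plot
window: window function; when given as an array, its length must equal NFFT (the frame length). Defaults to a Hanning window
        window_hanning(), window_none(), numpy.blackman(), numpy.hamming(), numpy.bartlett(), functions in scipy.signal such as scipy.signal.get_window(), etc. 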
sides : {'default', 'onesided', 'twosided'} one-sided or two-sided spectrum
        Default gives the default behavior, which returns one-sided for real data and both for complex data. 
        'onesided' forces the return of a one-sided spectrum, 
        while 'twosided' forces two-sided.
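pad_to : number of points the data is padded to when the FFT is performed; it may differ from NFFT (zero padding does not increase the spectral resolution, but it reduces the picket-fence effect; the default None means equal to NFFT)
scale_by_freq : bool, optional. Whether to scale the density values by the scaling frequency; the default is True, for MATLAB compatibility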
        Specifies whether the resulting density values should be scaled by the scaling frequency, which gives density in units of Hz^-1. 
        This allows for integration over the returned frequency values. The default is True for MATLAB compatibility.
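mode : which kind of spectrum to use; the default is the PSD (power spectral density) {'default', 'psd', 'magnitude', 'angle', 'phase'}
        'psd' returns the power spectral density ('complex' is not accepted by specgram(), because a complex spectrum cannot be plotted). 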
        'magnitude' returns the magnitude spectrum. 
        'angle' returns the phase spectrum without unwrapping. 
        'phase' returns the phase spectrum with unwrapping.
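noverlap : number of overlapping points between frames, default 128
scale : {'default', 'linear', 'dB'} scaling of the plotted spectrum values; the default is dB for the 'psd' and 'magnitude' modes
xextent : None or (xmin, xmax), the extent of the image along the x-axis
cmap : a matplotlib.colors.Colormap instance; if None, use the default determined by rc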
detrend : {'default', 'constant', 'mean', 'linear', 'none'} 
        The function applied to each segment before fft-ing, designed to remove the mean or linear trend. 
        Unlike in MATLAB, where the detrend parameter is a vector, in Matplotlib it is a function. 
        The mlab module defines detrend_none(), detrend_mean(), and detrend_linear(), but you can use a custom function as well. 
        You can also use a string to choose one of the functions. 'default', 'constant', and 'mean' call detrend_mean(). 'linear' calls detrend_linear(). 'none' calls detrend_none()
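# Returns:
spectrum: the spectrum matrix (a 2-D array; each column is the spectrum of one frame)
freqs: the frequency corresponding to each row of the spectrum matrix
ts: the time corresponding to each column of the spectrum matrix
fig: the image (a Matplotlib AxesImage, not a Figure)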
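
To make the return values concrete, here is a minimal sketch that runs specgram() on a synthetic 1 kHz tone instead of a recording. The tone, the 16 kHz sampling rate, and the off-screen Agg backend are assumptions made so that the example runs without an audio file or a display:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # render off-screen; no window is opened
import matplotlib.pyplot as plt

fs = 16000                          # assumed sampling rate
t = np.arange(fs) / float(fs)       # one second of samples
x = np.sin(2 * np.pi * 1000 * t)    # 1 kHz test tone

NFFT = 512
spectrum, freqs, ts, im = plt.specgram(
    x, NFFT=NFFT, Fs=fs, window=np.hanning(NFFT), noverlap=171)

# Rows of spectrum follow freqs, columns follow ts.
print(spectrum.shape, len(freqs), len(ts))

# The strongest row, averaged over time, should sit near 1000 Hz.
peak_hz = freqs[np.argmax(spectrum.mean(axis=1))]
print(peak_hz)

plt.close("all")
```

For a real input with a one-sided spectrum, freqs has NFFT/2 + 1 entries running from 0 to Fs/2. The returned spectrum holds linear power values, while the dB scaling applies only to the drawn image.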

The result is shown in the figure:

2. Complete code


 
 
    import numpy as np
    import matplotlib.pyplot as plt
    import os
    import wave
    # Read the audio.
    path = r"E:\SpeechWarehouse\zmkm"  # raw string, so the backslashes are kept literally
    name = 'zmkm0.wav'
    # My audio file is at E:\SpeechWarehouse\zmkm\zmkm0.wav
    filename = os.path.join(path, name)
    # Open the speech file.
    f = wave.open(filename, 'rb')
    # Get the audio parameters.
    params = f.getparams()
    nchannels, sampwidth, framerate, nframes = params[:4]
    #---------------------------------------------------------------#
    # Convert the raw bytes to integers (np.short assumes 16-bit samples).
    print("reading wav file......")
    strData = f.readframes(nframes)
    waveData = np.frombuffer(strData, dtype=np.short)  # np.fromstring is deprecated for binary data
    # Normalize to the range [-1, 1].
    waveData = waveData * 1.0 / max(abs(waveData))
    # Arrange the signal so that each row holds one channel's samples, nchannels rows in total.
    waveData = np.reshape(waveData, [nframes, nchannels]).T  # .T is the transpose
    f.close()  # close the file
    print("file is closed!")
    #----------------------------------------------------------------#
    '''Plot the speech waveform'''
    print("plotting signal wave...")
    time = np.arange(0, nframes) * (1.0 / framerate)  # compute the time axis, t = n / fs
    time = np.reshape(time, [nframes, 1]).T
    plt.plot(time[0, :nframes], waveData[0, :nframes], c="b")  # first channel only
    plt.xlabel("time")
    plt.ylabel("amplitude")
    plt.title("Original wave")
    plt.show()
    #--------------------------------------------------------------#
    '''
    Plot the spectrogram
    1. Determine the frame length and the overlap. The number of FFT points equals the points per frame (no zero padding)
    2. Draw the spectrogram
    '''
    print("plotting spectrogram...")
    framelength = 0.025  # frame length, 20~30 ms
    framesize = framelength * framerate  # points per frame, N = t*fs; usually 256 or 512, must equal NFFT
    # NFFT should preferably be a power of 2, so framesize should be a power of 2 as well.
    # Find the power of 2 closest to the current framesize.
    nfftdict = {}
    lists = [32, 64, 128, 256, 512, 1024]
    for i in lists:
        nfftdict[i] = abs(framesize - i)
    sortlist = sorted(nfftdict.items(), key=lambda x: x[1])  # sort by distance to framesize, ascending
    framesize = int(sortlist[0][0])  # the closest power of 2 becomes the new framesize
    NFFT = framesize  # NFFT must equal the number of time-domain points, i.e. an FFT without zero padding
    overlapSize = 1.0 / 3 * framesize  # overlap is about 1/3~1/2 of the points per frame
    overlapSize = int(round(overlapSize))  # round to an integer
    print("frame size: {}, overlap: {}, FFT points: {}".format(framesize, overlapSize, NFFT))
    # The fourth return value (named fig here) is the plotted image.
    spectrum, freqs, ts, fig = plt.specgram(waveData[0], NFFT=NFFT, Fs=framerate, window=np.hanning(M=framesize), noverlap=overlapSize, mode='default', scale_by_freq=True, sides='default', scale='dB', xextent=None)  # draw the spectrogram
    plt.ylabel('Frequency')
    plt.xlabel('Time')
    plt.title("Spectrogram")
    plt.show()

 

 
