Getting frames from a sound file

Does anyone know how to split a sound file into frames and convert each frame into a detailed array of the specific sound frequencies it contains?

For example, with cv2 I can split a film clip into frames and save them as a library of images. This code does the job well enough that I can easily get a color histogram of each image afterwards.

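import os
import cv2

filepath1 = input('Please enter the filepath for where the frames should be saved: ')
name = input('Please enter the name of the clip: ')

# `clip` is assumed to be a cv2.VideoCapture that was opened earlier.
ret, frame = clip.read()
count = 0
while ret:
    # Write the frame that was just read, then fetch the next one, so that
    # the first frame is not skipped and a failed read is never written.
    cv2.imwrite(os.path.join(filepath1, name + '%d.png' % count), frame)
    count += 1
    ret, frame = clip.read()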

However, I can't easily find anything equivalent for sound files. Does anyone have suggestions on how (or whether) this can be done?

Source

2017-02-28 Lodore66

Just to be sure - you do **not** want to create a series of images from the sound file, right? – kazemakase

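Yes, that's right. If I were trying to get a series of images from a sound file, I'd have far bigger problems than the one posted here. I'm using the example purely as an analogy: I want the sound-file equivalent of film frames. – Lodore66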

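Strictly speaking, the sound-file equivalent of a film frame is an audio sample. That is only one value per channel, so I'm not sure it is really what you need. My guess is that what you want to achieve is to analyze how the frequency content of the file changes over time.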
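To make the distinction concrete, here is a minimal sketch using synthetic data. The sample rate, the 25 fps video rate, and the two test tones are assumptions chosen for illustration; a real file would be loaded with something like `scipy.io.wavfile.read`, which returns stereo data in the same `(n_samples, n_channels)` layout.

```python
import numpy as np

sr = 44100   # sample rate in Hz (assumed)
fps = 25     # video frame rate used for the analogy (assumed)

# One second of synthetic stereo audio, shaped (n_samples, n_channels).
t = np.arange(sr) / sr
audio = np.stack([np.sin(2 * np.pi * 440 * t),
                  np.sin(2 * np.pi * 880 * t)], axis=1)

one_sample = audio[1000]              # a single audio sample: one value per channel
samples_per_video_frame = sr // fps   # audio samples spanned by one film frame

print(one_sample.shape)               # (2,)
print(samples_per_video_frame)        # 1764
```

A single sample carries no frequency information on its own; a stretch of many consecutive samples does, and that stretch is what gets analyzed.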

Maybe you want to look at a spectrogram? In that case, the following script, taken from www.frank-zalkow.de, may do exactly what you want, or at least give you some ideas for how to get started.

#!/usr/bin/env python 
#coding: utf-8 
""" This work is licensed under a Creative Commons Attribution 3.0 Unported License. 
    Frank Zalkow, 2012-2013 """ 

import numpy as np 
from matplotlib import pyplot as plt 
import scipy.io.wavfile as wav 
from numpy.lib import stride_tricks 

""" short time fourier transform of audio signal """ 
def stft(sig, frameSize, overlapFac=0.5, window=np.hanning): 
    win = window(frameSize) 
    hopSize = int(frameSize - np.floor(overlapFac * frameSize)) 

    # zeros at beginning (thus center of 1st window should be for sample nr. 0) 
    samples = np.append(np.zeros(int(np.floor(frameSize/2.0))), sig)  # np.zeros needs an integer length
    # cols for windowing 
    cols = int(np.ceil((len(samples) - frameSize)/float(hopSize))) + 1  # must be an int to be used in a shape
    # zeros at end (thus samples can be fully covered by frames) 
    samples = np.append(samples, np.zeros(frameSize)) 

    frames = stride_tricks.as_strided(samples, shape=(cols, frameSize), strides=(samples.strides[0]*hopSize, samples.strides[0])).copy() 
    frames *= win 

    return np.fft.rfft(frames)  

""" scale frequency axis logarithmically """  
def logscale_spec(spec, sr=44100, factor=20.): 
    timebins, freqbins = np.shape(spec) 

    scale = np.linspace(0, 1, freqbins) ** factor 
    scale *= (freqbins-1)/max(scale) 
    scale = np.unique(np.round(scale)).astype(int)  # integer bin edges, usable as slice indices

    # create spectrogram with new freq bins 
    newspec = np.complex128(np.zeros([timebins, len(scale)])) 
    for i in range(0, len(scale)):
        if i == len(scale)-1:
            newspec[:,i] = np.sum(spec[:,scale[i]:], axis=1)
        else:
            newspec[:,i] = np.sum(spec[:,scale[i]:scale[i+1]], axis=1)

    # list center freq of bins 
    allfreqs = np.abs(np.fft.fftfreq(freqbins*2, 1./sr)[:freqbins+1]) 
    freqs = [] 
    for i in range(0, len(scale)):
        if i == len(scale)-1:
            freqs += [np.mean(allfreqs[scale[i]:])]
        else:
            freqs += [np.mean(allfreqs[scale[i]:scale[i+1]])]

    return newspec, freqs 

""" plot spectrogram""" 
def plotstft(audiopath, binsize=2**10, plotpath=None, colormap="jet"): 
    samplerate, samples = wav.read(audiopath) 
    s = stft(samples, binsize) 

    sshow, freq = logscale_spec(s, factor=1.0, sr=samplerate) 
    ims = 20.*np.log10(np.abs(sshow)/10e-6) # amplitude to decibel 

    timebins, freqbins = np.shape(ims) 

    plt.figure(figsize=(15, 7.5)) 
    plt.imshow(np.transpose(ims), origin="lower", aspect="auto", cmap=colormap, interpolation="none") 
    plt.colorbar() 

    plt.xlabel("time (s)") 
    plt.ylabel("frequency (hz)") 
    plt.xlim([0, timebins-1]) 
    plt.ylim([0, freqbins]) 

    xlocs = np.float32(np.linspace(0, timebins-1, 5)) 
    plt.xticks(xlocs, ["%.02f" % l for l in ((xlocs*len(samples)/timebins)+(0.5*binsize))/samplerate]) 
    ylocs = np.int16(np.round(np.linspace(0, freqbins-1, 10))) 
    plt.yticks(ylocs, ["%.02f" % freq[i] for i in ylocs]) 

    if plotpath:
        plt.savefig(plotpath, bbox_inches="tight")
    else:
        plt.show()

    plt.clf() 

plotstft("my_audio_file.wav")
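If the goal is the numbers rather than the picture, the same idea can be reduced to a function that returns one magnitude spectrum per frame. The following is a minimal, self-contained sketch and not part of the script above: the name `frame_spectra`, its parameters, and the synthetic 1 kHz test tone are assumptions made for illustration, and it expects a mono signal at least `frame_size` samples long.

```python
import numpy as np

def frame_spectra(signal, sr, frame_size=1024, hop=512):
    """Return (freqs, times, mags) for overlapping, Hann-windowed frames."""
    signal = np.asarray(signal, dtype=float)
    n_frames = 1 + (len(signal) - frame_size) // hop
    window = np.hanning(frame_size)
    # Stack the frames into a 2-D array of shape (n_frames, frame_size).
    frames = np.stack([signal[i * hop:i * hop + frame_size]
                       for i in range(n_frames)])
    mags = np.abs(np.fft.rfft(frames * window, axis=1))
    freqs = np.fft.rfftfreq(frame_size, d=1.0 / sr)  # frequency in Hz of each column
    times = np.arange(n_frames) * hop / sr           # start time of each frame in seconds
    return freqs, times, mags

sr = 8000
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 1000 * t)                  # synthetic 1 kHz test tone

freqs, times, mags = frame_spectra(tone, sr)
peak_hz = freqs[mags.argmax(axis=1)]                 # strongest frequency in every frame
print(mags.shape)                                    # (14, 513)
```

Each row of `mags` is one frame and each column is one frequency bin, so `mags[k]` is the "detailed array of frequencies" for frame `k`, with `freqs` giving the matching frequency axis.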

Source

2017-02-28 14:44:34 kazemakase

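However, a spectrogram only makes sense for a longer audio recording, not for a single time index. What the OP probably wants instead is to extract a short stretch of audio for each frame, run an FFT on it, and get the resulting frequency spectrum for that frame. The spectrum can then be plotted or saved to a file. This [blog post](http://www.cbcity.de/die-fft-mit-python-einfach-erklaert) describes most of what is needed for this task (never mind the first paragraph, which is in German). – Schmuddi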
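The per-frame approach described in this comment could look roughly like the sketch below. It is an illustration under stated assumptions, not code from the linked blog post: the function name `spectrum_for_video_frame`, the 25 fps frame rate, and the synthetic two-tone signal are all hypothetical, and the audio is assumed to be mono.

```python
import numpy as np

def spectrum_for_video_frame(samples, sr, fps, frame_index):
    """Magnitude spectrum of the audio that plays during one video frame."""
    n = int(round(sr / fps))                  # audio samples per video frame
    start = frame_index * n
    chunk = np.asarray(samples[start:start + n], dtype=float)
    if len(chunk) < n:
        raise ValueError("frame_index is past the end of the audio")
    # Window the stretch to reduce spectral leakage, then take the FFT.
    mags = np.abs(np.fft.rfft(chunk * np.hanning(n)))
    freqs = np.fft.rfftfreq(n, d=1.0 / sr)
    return freqs, mags

# Synthetic mono signal: 500 Hz for the first half second, 2000 Hz afterwards.
sr, fps = 8000, 25
t = np.arange(sr) / sr
samples = np.where(t < 0.5,
                   np.sin(2 * np.pi * 500 * t),
                   np.sin(2 * np.pi * 2000 * t))

freqs, early = spectrum_for_video_frame(samples, sr, fps, frame_index=2)
_, late = spectrum_for_video_frame(samples, sr, fps, frame_index=20)
early_peak = freqs[early.argmax()]            # dominant frequency during frame 2
late_peak = freqs[late.argmax()]              # dominant frequency during frame 20
```

To keep a spectrum on disk, analogous to writing one PNG per film frame, something like `np.savetxt("frame_0002.csv", np.column_stack([freqs, early]), delimiter=",")` would work; the file name is only an example.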

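@Schmuddi That's right. *Extracting a short stretch of audio for each frame and running an FFT to get the resulting frequency spectrum for that frame* is exactly what a spectrogram shows. Basically, each column of the resulting image is the FFT/PSD of one "stretch" of audio. – kazemakase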

Thank you. These answers are all useful, and both offer valuable (and complementary) ways of doing what I'm interested in. – Lodore66
