{ "cells": [ { "attachments": {}, "cell_type": "markdown", "id": "c0c76943", "metadata": { "id": "1d08fdcb" }, "source": [ "# Signall Processing工具使用" ] }, { "attachments": {}, "cell_type": "markdown", "id": "9d5eb2bc", "metadata": { "id": "857881d5" }, "source": [ "在理論課程中已有補充數位信號處裡的基礎與常見分析方式。\n", "\n", "這邊我們介紹用python完成這些訊號處裡的一些工具,方便後面AI modeling時使用。\n", "\n", "我們主要會聚焦在聲音處裡上,不過其中很多概念與工具都可以用在其他訊號的處理。" ] }, { "attachments": {}, "cell_type": "markdown", "id": "e8fe1b3b", "metadata": { "id": "a664ddb8" }, "source": [ "課程包含以下內容:\n", "* Audio data\n", "* Up/Down Sampling\n", "* Fast Fourier Transform (FFT)\n", "* Short-Time Fourier Transform (STFT)\n", "* Mel-Spectrogram\n", "* Mel-Frequency Cepstral Coefficient (MFCC)" ] }, { "cell_type": "code", "execution_count": null, "id": "ec4b9d61", "metadata": { "id": "42dc1557", "scrolled": true }, "outputs": [], "source": [ "# 處理音訊最常用librosa\n", "!pip install librosa" ] }, { "cell_type": "code", "execution_count": null, "id": "fad1e04c", "metadata": { "id": "19889f25" }, "outputs": [], "source": [ "import librosa\n", "import IPython.display as idp # 播音工具\n", "import librosa.display as ldp # 畫頻譜圖工具\n", "import numpy as np # 輔助運算\n", "import matplotlib.pyplot as plt # 輔助畫圖\n", "from plotly import express as px" ] }, { "cell_type": "code", "execution_count": null, "id": "bb3395ee", "metadata": { "id": "QcqLSsCzOem7" }, "outputs": [], "source": [ "def plot_signal(signal, sr, start=0, end=None, labels=None, title=None):\n", " # Visualizes signal data\n", " # Args:\n", " # signal (list of array of int) - 時間點對應的訊號,列表內時間序列數量為D,每筆資料長度為T,若非為列表則轉為列表\n", " # start (int) - 開始的資料序(第幾筆)\n", " # end (int) - 結束繪製的資料序(第幾筆)\n", " # labels (list of strings)- 對於多時間序列或多維度的標註\n", " # title (string)- 圖片標題\n", "\n", " # 若資料只有一筆,則轉為list\n", " if type(signal) != list:\n", " signal = [signal]\n", "\n", " if not end:\n", " end = len(signal[0])\n", " time = (np.arange(len(signal[0]))/sr)[start:end]\n", " if labels:\n", " # 設立dictionary, 讓plotly畫訊號線時可以標註label\n", " dictionary = {\"time\": time}\n", " for idx, l in enumerate(labels):\n", " # 截斷資料,保留想看的部分,並分段紀錄於dictionary中\n", " dictionary.update({l: signal[idx][start:end]})\n", " # 畫訊號線\n", " fig = px.line(dictionary,\n", " x=\"time\",\n", " y=list(dictionary.keys())[1:],\n", " width=1000,\n", " height=400,\n", " title=title)\n", " else:\n", " # 畫訊號線\n", " fig = px.line(x=time,\n", " y=signal,\n", " width=1000,\n", " height=400,\n", " title=title)\n", " fig.show()" ] }, { "attachments": {}, "cell_type": "markdown", "id": "1a2b3ae4", "metadata": { "id": "HyKOaGqmBr6l" }, "source": [ "## Audio Data" ] }, { "attachments": {}, "cell_type": "markdown", "id": "50dd8c6e", "metadata": { "id": "l4q7DGb6ByUF" }, "source": [ "我們可以使用Librosa讀取資料,這邊提供範例資料:\n", "* google小姐的一段語音 (以這個為示範) - data/mrs_google_aia.mp3\n", "* 鋼琴聲(Mozart, Sonata K.331 - I. Andante grazioso) (可以拿來自行練習) - data/signal_googl.wav" ] }, { "cell_type": "code", "execution_count": null, "id": "0c9d1053", "metadata": { "id": "ecde300e" }, "outputs": [], "source": [ "# 下載檔案並存到data資料夾\n", "!mkdir -p data\n", "!wget -O data/demo.zip https://github.com/TA-aiacademy/course_3.0/releases/download/TSRNN/sound_demo.zip\n", "!unzip data/demo.zip -d data" ] }, { "cell_type": "code", "execution_count": null, "id": "03060bec", "metadata": { "id": "V5YigFrPB7mf" }, "outputs": [], "source": [ "# 使用librosa.load可以讀取檔案,讀出來預設是float32格式\n", "# 'path' - 檔名\n", "# 'sr' - 可以指定聲音檔讀出來後的sampling rate\n", "\n", "googl_, sr = librosa.load(\"data/mrs_google_aia.mp3\") # 讀出訊號檔以及sampling rate\n", "piano_, sr = librosa.load(\"data/signal_piano.wav\") # 讀出訊號檔以及sampling rate" ] }, { "cell_type": "code", "execution_count": null, "id": "4721c2c5", "metadata": { "id": "tTMdnU7tb3e7" }, "outputs": [], "source": [ "# 使用IPython的display.Audio可以放出來試聽\n", "idp.Audio(\"data/signal_piano.wav\")" ] }, { "cell_type": "code", "execution_count": null, "id": "338c316c", "metadata": { "id": "8aU9ZBNgb4ED" }, "outputs": [], "source": [ "idp.Audio(\"data/mrs_google_aia.mp3\")" ] }, { "cell_type": "code", "execution_count": null, "id": "fe47f64d", "metadata": { "id": "6i4yI_JEB_WH" }, "outputs": [], "source": [ "%matplotlib inline\n", "# 使用librosa的display.waveplot可以將波形的波封畫出來,並且對上實際時間做為參考\n", "ldp.waveshow(googl_, sr=sr)" ] }, { "cell_type": "code", "execution_count": null, "id": "74b028af", "metadata": { "id": "vpfNWgFOMDIK" }, "outputs": [], "source": [ "# librosa那邊的圖不太能操控或縮放,所以也可以自己用plotly寫的比較好操控,可以Zoom-in\n", "plot_signal(googl_, sr, labels=['google'])" ] }, { "attachments": {}, "cell_type": "markdown", "id": "763d7ef3", "metadata": { "id": "1XlSHdUrQ3Pj" }, "source": [ "把訊號畫出來對解析很重要,但每種情況可能需求都不太一樣\n", "\n", "例如音訊可能十分密集,所以可能可以不需要看得太細,可以稍微re-sample一下" ] }, { "attachments": {}, "cell_type": "markdown", "id": "1415b19f", "metadata": { "id": "sqwPN5MHXY2U" }, "source": [ "## Re-Sampling" ] }, { "attachments": {}, "cell_type": "markdown", "id": "a0b45b1b", "metadata": { "id": "-LyLuoopcivt" }, "source": [ "Sampling rate的調整是處理訊號重要的一環,要是要將兩個訊號疊加,但發現兩者雖然sample數相同sampling rage卻不相符則會有錯誤的疊加效果。" ] }, { "attachments": {}, "cell_type": "markdown", "id": "2a853e17", "metadata": { "id": "fuAAH1cAdGwN" }, "source": [ "Re- sampling有各種方法,對訊號皆有擾動,詳細可參考librosa官網參數```res_type```:\n", "https://librosa.org/doc/main/generated/librosa.resample.html\n", "\n", "預設的就一般訊號分析來講很夠用了。" ] }, { "attachments": {}, "cell_type": "markdown", "id": "2b6dba06", "metadata": { "id": "UxMss50vfatG" }, "source": [ "### Downsample" ] }, { "cell_type": "code", "execution_count": null, "id": "708118b6", "metadata": { "id": "KL3Hrw-OXYTs" }, "outputs": [], "source": [ "# Downsample 減少sampling rate,也減少資料點數\n", "googl_3000 = librosa.resample(googl_, orig_sr=sr, target_sr=3000)" ] }, { "cell_type": "code", "execution_count": null, "id": "137c844f", "metadata": { "id": "eVJhSrU2eS0G" }, "outputs": [], "source": [ "# 同樣是10秒錄音,在resample後點數會變少\n", "print(len(googl_), len(googl_3000))\n", "# 若是需求的sampling rate剛好是原本的因數,那也可以直接依downsample的倍率做sample\n", "googl_11025 = googl_[::2]\n", "print(len(googl_), len(googl_11025))" ] }, { "cell_type": "code", "execution_count": null, "id": "7ab0f894", "metadata": { "id": "dGxuGmA7P5hr" }, "outputs": [], "source": [ "# 可聽出聲音變得比較朦朧,第一是downsample後失真,第二也是人類對特定頻段較有知覺,剛好有些頻段被截掉了\n", "# 參考 https://en.wikipedia.org/wiki/Equal-loudness_contour\n", "idp.Audio(googl_3000, rate=3000)" ] }, { "cell_type": "code", "execution_count": null, "id": "093946e2", "metadata": { "id": "zcG4jP0DX-zA" }, "outputs": [], "source": [ "# 從waveplot上可以看到在經過downsample後 有點失真了\n", "ldp.waveshow(googl_[:int(0.2*sr)], sr=sr)\n", "ldp.waveshow(googl_3000[:int(0.2*3000)], sr=3000)" ] }, { "attachments": {}, "cell_type": "markdown", "id": "115168a7", "metadata": { "id": "Yty3j8OXfo6Z" }, "source": [ "### Upsample" ] }, { "cell_type": "code", "execution_count": null, "id": "cae32792", "metadata": { "id": "fDu37RRFYCqN" }, "outputs": [], "source": [ "# Upsample 增加sampling rate,也增加資料點數\n", "googl_44100 = librosa.resample(googl_, orig_sr=sr, target_sr=44100)" ] }, { "cell_type": "code", "execution_count": null, "id": "b703a4c8", "metadata": { "id": "ix-kQSSaf4zG" }, "outputs": [], "source": [ "# 可看一下資料數\n", "print(len(googl_), len(googl_44100))" ] }, { "cell_type": "code", "execution_count": null, "id": "25b2765e", "metadata": { "id": "LLi9eBjCZQZN" }, "outputs": [], "source": [ "# 在upsample時不會有失真的情況,因為是類似在資料間插值補值\n", "ldp.waveshow(googl_44100[:int(0.2*44100)], sr=44100)\n", "ldp.waveshow(googl_[:int(0.2*sr)], sr=sr)" ] }, { "attachments": {}, "cell_type": "markdown", "id": "7826d5c6", "metadata": { "id": "umjSkvAyjiGX" }, "source": [ "一般來講upsampling都不存在標準答案,因為就是sampling rate不夠才要補。\n", "\n", "Upsampling本身後來在AI領域也變成一個議題,叫Super Resolution,使用訓練好的神經網路模型來做upsampling增加解析度。" ] }, { "attachments": {}, "cell_type": "markdown", "id": "7573761c", "metadata": { "id": "Dh79YwxhkHaA" }, "source": [ "## Fast Fourier Transform" ] }, { "attachments": {}, "cell_type": "markdown", "id": "8bd27568", "metadata": { "id": "b0pX0BkVm2-h" }, "source": [ "分析訊號的成分最常見的方式是頻譜分析,連續訊號可以藉由Fourier Transform得到頻譜,也就是訊號在頻率上的分布。\n", "\n", "而對於數位訊號,我們可以藉由Discrete Fourier Transform得到頻譜\n", "$$𝐗_k≔\\sum\\limits_{𝑛=0}^{𝑁−1}𝑥_𝑛 𝑒^{(−𝑗2𝜋𝑘𝑛/𝑁)}$$\n", "\n", "\n" ] }, { "attachments": {}, "cell_type": "markdown", "id": "2bebc15b", "metadata": { "id": "EJAA1gQjkJrn" }, "source": [ "Scipy或者其他套件包有提供一些方式做快速的Discrete Fourier Transform,稱為Fast Fourier Transform(FFT)。\n", "\n", "若是訊號分析則用Scipy就好,若是要整合到神經網路上可能就得使用Tensorflow或Pytorch內建的fft。" ] }, { "cell_type": "code", "execution_count": null, "id": "2a72b27d", "metadata": { "id": "Cib8M4uqkI3_" }, "outputs": [], "source": [ "from scipy import fft" ] }, { "attachments": {}, "cell_type": "markdown", "id": "fe5ea14d", "metadata": { "id": "2K_3FHIHgrQ7" }, "source": [ "$$$$這邊因為我們做的是離散的Fourier轉換,若要看到原本單位(seconds, Hz)則需進行轉換。\n", "* 時間: $n=f_s t$\n", "* 頻率: $k=f N /f_s$\n", "\n", "N為參與FFT的點數,$f_s$則是sampling rate。" ] }, { "cell_type": "code", "execution_count": null, "id": "51865602", "metadata": { "id": "A23Krx1Olixg" }, "outputs": [], "source": [ "# 使用scipy.fft.fft可以對訊號做fft\n", "# 'x' - 資料\n", "# 'n' - FFT點數,通常不指定,預設為資料總點數\n", "# 記得做完Fourier Transform後m, 出來都是複數 (預設為complex64格式)\n", "# (不管是continuous/discrete/fast Fourier Transfor)\n", "N = len(googl_)\n", "googl_f = fft.fft(googl_[:N])\n", "frequency = np.linspace(0, sr, N)\n", "max_f = sr/2 # 設定想看到的最大頻率 (Hz)\n", "max_k = int(max_f*N/sr) # 轉成k\n", "\n", "# 通常我們是看這個複數的magnitude, 取absolute就可以做到\n", "plt.plot(frequency[:max_k], abs(googl_f[:max_k]))\n", "plt.xlabel(\"frequency(Hz)\")" ] }, { "attachments": {}, "cell_type": "markdown", "id": "dba05c3e", "metadata": { "id": "R8ds4Y68eOLx" }, "source": [ "在頻譜圖中我們可以看到較為高峰的點就表示訊號有較多該頻率成分。\n", "\n", "我們可以試著把做頻譜的時間縮點一點,來看看最前面一段的頻譜" ] }, { "cell_type": "code", "execution_count": null, "id": "6a518724", "metadata": { "id": "AmXILhgio_wB" }, "outputs": [], "source": [ "N1 = int(0.12*sr)\n", "N2 = int(0.15*sr)\n", "plt.plot(np.arange(N1, N2)/sr, googl_[N1:N2])\n", "plt.xlabel(\"time(s)\")\n", "plt.show()\n", "googl_f2 = fft.fft(googl_[N1:N2])\n", "frequency2 = np.linspace(0, sr, N2-N1)\n", "max_f = 4000 # 設定想看到的最大頻率 (Hz)\n", "max_k2 = int(max_f*(N2-N1)/sr) # 轉成k\n", "plt.plot(frequency2[:max_k2], abs(googl_f2[:max_k2]))\n", "plt.xlabel(\"frequency(Hz)\")" ] }, { "attachments": {}, "cell_type": "markdown", "id": "8fc2a805", "metadata": { "id": "sWsStQmZkE-l" }, "source": [ "我們取第一個音的頻譜來看可看到一個主頻率,就語音來講會比較模糊一點" ] }, { "cell_type": "code", "execution_count": null, "id": "1318c817", "metadata": { "id": "xMtdRiFEfvjD" }, "outputs": [], "source": [ "plt.plot(frequency[:max_k], abs(googl_f[:max_k]))\n", "plt.plot(frequency2[:max_k2], abs(googl_f2[:max_k2])*200)\n", "plt.xlabel(\"frequency(Hz)\")" ] }, { "attachments": {}, "cell_type": "markdown", "id": "fb2ca9bf", "metadata": { "id": "0UpCO6uEkZ_D" }, "source": [ "擷取與整段對比:\n", "* 整段時長較長,堆積出的能量比較高\n", "* 整段音調混雜,彼此交錯" ] }, { "attachments": {}, "cell_type": "markdown", "id": "9ececd9d", "metadata": { "id": "HZDVSP1by_UA" }, "source": [ "亦可用```librosa.power_to_db```轉換成分貝頻譜來看,會使scale差比較多的部分較為緩和,方便比較" ] }, { "cell_type": "code", "execution_count": null, "id": "c037bb8a", "metadata": { "id": "3GHhRC3Gy-oX" }, "outputs": [], "source": [ "librosa.power_to_db(abs(googl_f2))\n", "plt.plot(frequency[:max_k], librosa.power_to_db(abs(googl_f[:max_k])))\n", "plt.plot(frequency2[:max_k2], librosa.power_to_db(abs(googl_f2[:max_k2])))\n", "plt.xlabel(\"frequency(Hz)\")" ] }, { "attachments": {}, "cell_type": "markdown", "id": "6dd89730", "metadata": { "id": "TyxKFZ9glOXk" }, "source": [ "## Short-Time Fourier Transform" ] }, { "attachments": {}, "cell_type": "markdown", "id": "bb4a11cb", "metadata": { "id": "8T0FWW9clFWD" }, "source": [ "若是想統計出一段內出現哪些音符,則這個FFT就夠用了,但通常想知道的是在每個時間點頻率長什麼樣子就必須要做時頻分析。\n", "\n", "時頻分析中最常見的方式就是Sort-Time Fourier Transform:\n", "1. Windowing\n", "2. 個別做頻譜\n", "\n", "公式:\n", "\n", "$𝑋[q,k] = \\sum\\limits_{𝑛^′=⌈−𝑁/2⌉}^{⌈𝑁/2⌉−1}𝑥[𝑛^′+𝑞𝐻]𝑤[𝑛^′] 𝑒^\\frac{−𝑗2𝜋𝑘𝑛^′}{𝑁}$\n", "\n", "* q- 每個window的離散時間點\n", "* k- 離散頻率點\n", "* n'- 原訊號的離散時間點\n", "* N- window內FFT點數,同時是window size\n", "* H- hop size,每個window離多少n'\n", "* w- windowing function\n", "* x- signal\n", "\n", "\n" ] }, { "attachments": {}, "cell_type": "markdown", "id": "60dddc84", "metadata": { "id": "Kg2Iq4oml24E" }, "source": [ "Librosa裡面沒有FFT但有很多時頻分析的工具。\n", "\n", "我們這邊使用```librosa.stft```來做時頻分析" ] }, { "cell_type": "code", "execution_count": null, "id": "7a619928", "metadata": { "id": "aO6mWtO4i8gj" }, "outputs": [], "source": [ "# 使用librosa.stft可以對訊號做stft\n", "# 'y' - 資料\n", "# 'n_fft' - FFT點數,同時是window長度,這邊就一定要指定了,預設2048,可根據自己想看到的頻率範圍調整\n", "# 'hop_length' - 每個window間要跳多長\n", "# 'window' - windowing function,預設是 'hann'\n", "# 出來是複數 (預設為complex64格式)\n", "\n", "N = 2048\n", "H = 512\n", "googl_S = librosa.stft(googl_, n_fft=N, hop_length=H)" ] }, { "cell_type": "code", "execution_count": null, "id": "81d09202", "metadata": { "id": "LupiQhJRpLWO" }, "outputs": [], "source": [ "# 可以看出,我們使用librosa STFT可生出一個K x Q 的矩陣\n", "print(googl_S.shape)\n", "print(type(googl_S))\n", "print(type(googl_S[0, 0]))" ] }, { "cell_type": "code", "execution_count": null, "id": "22038b7b", "metadata": { "id": "1YJ3lxQRrTkO" }, "outputs": [], "source": [ "# K的大小來自N points/2,因為大於sampling rate/2的訊號會aliase\n", "print(N//2+1)\n", "# Q的大小來自總長度除以hop length,就是windowing的次數\n", "print(len(googl_)//H+1)" ] }, { "attachments": {}, "cell_type": "markdown", "id": "ff00baf6", "metadata": { "id": "o_4PTzc2smnX" }, "source": [ "使用```librosa.display.specshow```可以畫出頻譜圖" ] }, { "cell_type": "code", "execution_count": null, "id": "430617b7", "metadata": { "id": "ExdDODgYs0Ym" }, "outputs": [], "source": [ "import librosa.display as ldp # 顯示訊號工具" ] }, { "cell_type": "code", "execution_count": null, "id": "07263895", "metadata": { "id": "RYGf63yUpOt1" }, "outputs": [], "source": [ "# 使用librosa.display.specshow畫出Spectrogram\n", "# 'data' - Spectrogram\n", "# 'sr' - sampling rate\n", "# 'x_axis' - x 軸刻度單位\n", "# 'y_axis' - y 軸刻度單位,預設為 'hz',若要使用log scale可以用 'log'\n", "# 'cmap' - color map\n", "\n", "plt.figure(figsize=(6, 5))\n", "ldp.specshow(abs(googl_S),\n", " sr=sr,\n", " x_axis=\"s\",\n", " y_axis=\"hz\",\n", " cmap=\"jet\")\n", "plt.colorbar(format=\"%+4.f\")\n", "plt.show()" ] }, { "attachments": {}, "cell_type": "markdown", "id": "201ede00", "metadata": { "id": "10s-uLFpthQ-" }, "source": [ "若全圖檢視可以看到所有包含的頻率,但其中Y軸的scale有點太大使得較低頻較有資訊的部分看不見,可以轉而使用log scale frequency" ] }, { "cell_type": "code", "execution_count": null, "id": "c9be02d6", "metadata": { "id": "IpEb4PCus1_I" }, "outputs": [], "source": [ "plt.figure(figsize=(6, 5))\n", "ldp.specshow(googl_S, sr=sr, x_axis=\"s\", y_axis=\"log\", cmap=\"jet\") # **\n", "plt.colorbar(format=\"%+4.f\")\n", "plt.clim([5, 98]) # **\n", "plt.show()" ] }, { "attachments": {}, "cell_type": "markdown", "id": "caf01c8c", "metadata": { "id": "efmGlOQp0Lff" }, "source": [ "結果如果對比不夠明顯,可以使用```librosa.power_to_db```轉成分貝來看" ] }, { "cell_type": "code", "execution_count": null, "id": "d9e41e36", "metadata": { "id": "qKSvw99Quyj3" }, "outputs": [], "source": [ "googl_S_db = librosa.power_to_db(abs(googl_S)) # **\n", "plt.figure(figsize=(6, 5))\n", "ldp.specshow(googl_S_db, sr=sr, x_axis=\"s\", y_axis=\"log\", cmap=\"jet\")\n", "plt.colorbar(format=\"%+4.f\")\n", "plt.show()" ] }, { "attachments": {}, "cell_type": "markdown", "id": "b83278ad", "metadata": { "id": "iIHp677w478C" }, "source": [ "如果對於power較小的部分不想看到,可以調整```clim```" ] }, { "cell_type": "code", "execution_count": null, "id": "5913e1a7", "metadata": { "id": "UjvTObYV0GmI" }, "outputs": [], "source": [ "googl_S_db = librosa.power_to_db(abs(googl_S))\n", "plt.figure(figsize=(6, 5))\n", "ldp.specshow(googl_S_db, sr=sr, x_axis=\"s\", y_axis=\"log\", cmap=\"jet\")\n", "plt.colorbar(format=\"%+2.f\")\n", "plt.clim([-10, 10]) # **\n", "plt.show()" ] }, { "attachments": {}, "cell_type": "markdown", "id": "632dc54b", "metadata": { "id": "UgUwQJrGSBOQ" }, "source": [ "### Spectral-centroid" ] }, { "attachments": {}, "cell_type": "markdown", "id": "bd8f8eb6", "metadata": { "id": "ACJNQUgPSH74" }, "source": [ "若想要知道這個Spectrogram中頻率大致上在哪邊以及頻率的變化可以計算spectral centroid\n", "\n", "$SC[q] = \\sum\\limits_{k=1}^N{\\frac{F[q,k]}{\\sum_{k'=1}^N{F[q,k']}}\\times k}$\n", "\n", "librosa提供```librosa.feature.spectral_centroid```這個功能" ] }, { "cell_type": "code", "execution_count": null, "id": "456ec37c", "metadata": { "id": "KYglegPlSAcA" }, "outputs": [], "source": [ "# 使用librosa.feature.spectral_centroid計算頻率中心隨時間的變化\n", "# 'y' or 'S' - 可以選擇由資料y從計算Spectrogram開始,也可以直接丟Spectrogram S 的magnitude\n", "# 'sr' - sampling rate\n", "# 若使用y當input會有以下做STFT的argument可以使用:\n", "# 'n_fft' - FFT點數,同時是window長度,這邊就一定要指定了,預設2048,可根據自己想看到的頻率範圍調整\n", "# 'hop_length' - 每個window間要跳多長\n", "# 'window' - windowing function,預設是 'hann'\n", "# 回傳單位是頻率 Hz\n", "mag_S = abs(googl_S)\n", "googl_sc = librosa.feature.spectral_centroid(S=mag_S, sr=sr)\n", "# N = 2048\n", "# H = 512\n", "# googl_sc = librosa.feature.spectral_centroid(y=googl_,\n", "# sr=sr,\n", "# n_fft=N,\n", "# hop_length=H,\n", "# window='hann' ) # 也可以從訊號開始" ] }, { "cell_type": "code", "execution_count": null, "id": "51c0b2f3", "metadata": { "id": "smaiX3ShUogx" }, "outputs": [], "source": [ "print(googl_S.shape, googl_sc.shape)\n", "print(googl_sc.max(), googl_sc.min())" ] }, { "attachments": {}, "cell_type": "markdown", "id": "a06bae58", "metadata": { "id": "mkfxp4MAWt9B" }, "source": [ "可以跟spectrogram一起畫,比較著看\n", "\n", "當然,因為每個聲音有他的harmonic在裡面會干擾頻率波動的準度 (例如沒有聲音時,整體頻率會掉到比較低的地方),所以這個centroid還是當參考或者使用centroid來當新的feature提供訓練使用而已。" ] }, { "cell_type": "code", "execution_count": null, "id": "568c60d0", "metadata": { "id": "Y11aBaTyVb6Y" }, "outputs": [], "source": [ "times = librosa.times_like(googl_sc)\n", "fig, ax = plt.subplots()\n", "ldp.specshow(googl_S_db, sr=sr, x_axis=\"s\", y_axis=\"log\", cmap='jet')\n", "ax.plot(times, googl_sc.T, label='Spectral centroid', color='k')\n", "ax.legend(loc='upper right')" ] }, { "attachments": {}, "cell_type": "markdown", "id": "814ea0aa", "metadata": { "id": "5yM8lROpQ30x" }, "source": [ "以上是最常見做頻譜圖的方式。不過就像課堂中講的一樣會有頻率受到window function而有Spectral leakage的問題,並且也有頻率解析度與時間解析度的trade-off,在使用時須多加注意。" ] }, { "cell_type": "code", "execution_count": null, "id": "5e7ad487", "metadata": { "id": "TSuqO6iU27ZM" }, "outputs": [], "source": [ "# 這邊試一個Window比較大的情況,會使得頻率的解析度較高,但時間解析度會較差\n", "N = 8192\n", "H = 512\n", "\n", "googl_S = librosa.stft(y=googl_, n_fft=N, hop_length=H)\n", "mag_S = abs(googl_S)\n", "googl_sc = librosa.feature.spectral_centroid(S=mag_S, sr=sr)\n", "googl_S_db = librosa.power_to_db(abs(googl_S)) # **\n", "times = librosa.times_like(googl_sc)\n", "fig, ax = plt.subplots()\n", "ldp.specshow(googl_S_db, sr=sr, x_axis=\"s\", y_axis=\"log\", cmap='jet')\n", "ax.plot(times, googl_sc.T, label='Spectral centroid', color='k')\n", "ax.legend(loc='upper right')" ] }, { "attachments": {}, "cell_type": "markdown", "id": "5becdcbd", "metadata": { "id": "5_5ewDEh5tM7" }, "source": [ "## Mel-Spectrogram" ] }, { "attachments": {}, "cell_type": "markdown", "id": "9c77cf13", "metadata": { "id": "gSs06V2D4Xb8" }, "source": [ "在聲音訊號上,為模擬人類聽覺,常使用模擬人耳對頻率變化敏感度的Mel-Spectrogram來做時頻分析。\n", "\n", "其中會用到Mel-scale,是Herz經過log轉換後的單位,是因應頻率變化敏感度exponential上升的調整。" ] }, { "attachments": {}, "cell_type": "markdown", "id": "2c7225d1", "metadata": { "id": "ChOXBuAyA8hc" }, "source": [ "" ] }, { "attachments": {}, "cell_type": "markdown", "id": "1df16326", "metadata": { "id": "d9s-_hqvCGiF" }, "source": [ "在做完STFT後,對K'個等長頻段 (在Mel-Scale上等長) 進行triangle filter就是Mel-Spectrogram。\n", "\n", "使用的filter bank如下圖所示。\n", "\n", "e.g. 10 bins Mel filter bank for 65~1K Hz\n", "\n", "" ] }, { "attachments": {}, "cell_type": "markdown", "id": "ba0451a8", "metadata": { "id": "xBg3tWVEQ1Jh" }, "source": [ "使用```librosa.feature.melspectrogram```可以針對訊號計算Mel Spectrogram。" ] }, { "cell_type": "code", "execution_count": null, "id": "c1c4fc7c", "metadata": { "id": "APEh2IrK5Lmp" }, "outputs": [], "source": [ "# 使用librosa.feature.melspectrogram計算Mel-Spectrogram\n", "# 'y' or 'S' - 可以選擇由資料y從計算Spectrogram開始,也可以直接丟Spectrogram S 的magnitude\n", "# 'sr' - sampling rate\n", "# 'n_mels' - 要有多少個bin,就是前述的K'\n", "# 若使用y當input會有以下做STFT的argument可以使用:\n", "# 'n_fft' - FFT點數,同時是window長度,這邊就一定要指定了,預設2048,可根據自己想看到的頻率範圍調整\n", "# 'hop_length' - 每個window間要跳多長\n", "# 'window' - windowing function,預設是 'hann'\n", "# 回傳Mel-Spectrum,是實數矩陣,預設為float32的格式\n", "N = 2048\n", "H = 512\n", "K_ = 256 # 這Mel-spectrum的bin數一定會比前面N來得少,才有辦法做filter,不然中間會有很多頻段有缺值\n", "googl_mel_S = librosa.feature.melspectrogram(y=googl_,\n", " sr=sr,\n", " n_mels=K_,\n", " n_fft=N,\n", " hop_length=H,\n", " window='hann')" ] }, { "cell_type": "code", "execution_count": null, "id": "bab7b289", "metadata": { "id": "J1-TglWmETt3", "scrolled": true }, "outputs": [], "source": [ "# 可以看一些基本性質\n", "print(googl_mel_S.shape)\n", "print(googl_mel_S.min(), googl_mel_S.max())\n", "print(type(googl_mel_S[0, 0]))" ] }, { "attachments": {}, "cell_type": "markdown", "id": "2b113af6", "metadata": { "id": "6roHHBw6JoCn" }, "source": [ "在畫spectrogram時記得把y軸scaling換成mel (指的是input單位),這樣他才會幫忙轉回Hz,並且以Mel scaling排列顯示\n", "\n", "librosa設定的mel scale跟log scale會有稍微不同,較專注於較高頻的部分。" ] }, { "cell_type": "code", "execution_count": null, "id": "2ad77cd1", "metadata": { "id": "UuuN9yvyEogd" }, "outputs": [], "source": [ "googl_mel_S_db = librosa.power_to_db(googl_mel_S)\n", "plt.figure(figsize=(6, 5))\n", "ldp.specshow(googl_mel_S_db,\n", " sr=sr,\n", " x_axis=\"s\",\n", " y_axis=\"mel\",\n", " cmap=\"jet\") # **\n", "plt.colorbar(format=\"%+4.f\")\n", "plt.show()" ] }, { "attachments": {}, "cell_type": "markdown", "id": "5f7c8bf4", "metadata": { "id": "hSdJkXW52L1s" }, "source": [ "可以看看到底Mel-Spectrum套用怎樣的filter bank" ] }, { "cell_type": "code", "execution_count": null, "id": "25f19a94", "metadata": { "id": "SXQ9_JGk14do" }, "outputs": [], "source": [ "mel_banks = librosa.filters.mel(n_fft=2048, sr=sr, n_mels=32)\n", "print(mel_banks.shape)" ] }, { "cell_type": "code", "execution_count": null, "id": "428c72ae", "metadata": { "id": "dLwEXTmH2Y-t" }, "outputs": [], "source": [ "plt.figure(figsize=(20, 5))\n", "ldp.specshow(mel_banks.T, sr=sr, y_axis=\"hz\")\n", "plt.colorbar(format=\"%+2.6f\")\n", "plt.xticks(range(32))\n", "plt.show()" ] }, { "attachments": {}, "cell_type": "markdown", "id": "015eff36", "metadata": { "id": "T4VKJE0f2n1N" }, "source": [ "如同前面設定的一樣,每個頻段的filter的頻寬都不同,負責越高頻的filter (上圖後面的)頻寬較長。" ] }, { "attachments": {}, "cell_type": "markdown", "id": "eef8c2ed", "metadata": { "id": "EQY2zkC-R40U" }, "source": [ "## Mel-Frequency Cepstral Coefficient" ] }, { "attachments": {}, "cell_type": "markdown", "id": "c74c78ba", "metadata": { "id": "x3de2RaASBzq" }, "source": [ "在人聲產生過程中,會經過各種器官的合成,其作用相當於經過一層層的filter。\n", "\n", "Mel-Frequency Cepstral Coefficient (MFCC) 來自於地震波的研究,可以用來分離個別filter的響應,或者也可以視為對頻譜進行頻率分析抓出新的feature。" ] }, { "attachments": {}, "cell_type": "markdown", "id": "839c66d5", "metadata": { "id": "UUgkkVDnV_x6" }, "source": [ "### Cepstrum" ] }, { "attachments": {}, "cell_type": "markdown", "id": "6a735bc0", "metadata": { "id": "GLuxMoyTUGIR" }, "source": [ "\n", "Cepstrum就是去做頻譜的頻譜:\n", "\n", "$C\\{x(t)\\}=IFT\\{\\log(FT\\{x(t)\\})\\}$\n", "\n", "\n", "\n", "通常在自然訊號中若有週期者,除非是十分完美的sinusoidal波形,在頻譜上都能看得到harmonics。那也就是說這個頻譜上也會具有一定週期性。\n", "\n", "Cepstrum的想法就是對log power spectrum進行IFT,這樣就能擷取在頻譜上的週期性特徵。\n", "\n", "(稍微說明下,如果對complex spectrum進行IFT則會變回原訊號)" ] }, { "attachments": {}, "cell_type": "markdown", "id": "56450340", "metadata": { "id": "lgS74D_ITQ7p" }, "source": [ "為了做頻譜上的頻率分析,衍伸出了很多相關名詞列在下面,都是一些異體字:\n", "\n", "" ] }, { "attachments": {}, "cell_type": "markdown", "id": "6c9aaf4a", "metadata": { "id": "HsDYWwoHWFb6" }, "source": [ "### Mel-Frequency Cepstral Coefficient (MFCC)\n" ] }, { "attachments": {}, "cell_type": "markdown", "id": "235394a7", "metadata": { "id": "eUfK4HY1MYoP" }, "source": [ "MFCC是一種Cepstrum (倒頻譜),是從Mel-Spectrum建立的Cepstrum。\n", "就是在計算完Mel-Spectrum後取log乘以係數轉換成分貝,再做頻域的spectrum。\n", "\n", "這邊做spectrum的方式不是做iDFT,而是簡化為discrete cosine transform,沒有虛部,也是因為後來基本比較不看phase頻譜而是magnitude。\n", "\n", "DCT: (有各種方式計算,在這邊是舉個範例)\n", "* $X_k=\\sum_{N-1}^{n=0}{x_ncos({\\pi \\over N}(n+{1 \\over 2})k)}$\n", "\n", "https://en.wikipedia.org/wiki/Discrete_cosine_transform\n", "\n", "" ] }, { "cell_type": "code", "execution_count": null, "id": "6ffe5a26", "metadata": { "id": "QN55A3K8SBSp" }, "outputs": [], "source": [ "# librosa.feature.mfcc計算MFCC\n", "# 'y' or 'S' - 可以選擇由資料y從計算Spectrogram開始,也可以直接丟Spectrogram S 的magnitude\n", "# 'sr' - sampling rate\n", "# 'n_mfcc' - 要有多少個bin,就是做iDFT的點數,常用39\n", "# 若使用y當input會有以下做STFT的argument可以使用:\n", "# 'n_fft' - FFT點數,同時是window長度,這邊就一定要指定了,預設2048,可根據自己想看到的頻率範圍調整\n", "# 'hop_length' - 每個window間要跳多長\n", "# 'window' - windowing function,預設是 'hann'\n", "# 回傳Mel-Spectrum,是實數矩陣,預設為float32的格式\n", "\n", "googl_mfcc = librosa.feature.mfcc(y=googl_,\n", " sr=sr,\n", " n_mfcc=39,\n", " n_fft=2048,\n", " hop_length=256)" ] }, { "cell_type": "code", "execution_count": null, "id": "4b805233", "metadata": { "id": "aAEn5ELSFqLt" }, "outputs": [], "source": [ "plt.figure(figsize=(6, 5))\n", "ldp.specshow(googl_mfcc[:, 0:300], x_axis=\"time\", cmap=\"jet\")\n", "plt.colorbar(format=\"%+4.f\")\n", "plt.show()" ] }, { "attachments": {}, "cell_type": "markdown", "id": "13fdc078", "metadata": { "id": "T-05rIS6Pf9n" }, "source": [ "Cepstrum對頻譜做頻譜,其中第一個rhamonic出現的倒頻就已經代表了harmonics出現在頻譜中的週期,所以通常只看比較前面的倒頻率部分,因此就只有看39個bin而已。" ] }, { "cell_type": "code", "execution_count": null, "id": "24b108c6", "metadata": { "id": "xybP9uNnLT9U" }, "outputs": [], "source": [ "# 可進一步取MFCC對倒頻率空間的differencing。\n", "googl_delta_mfcc = librosa.feature.delta(googl_mfcc)\n", "plt.figure(figsize=(6, 5))\n", "ldp.specshow(googl_delta_mfcc[:, 0:300], x_axis=\"time\", cmap=\"jet\")\n", "plt.colorbar(format=\"%+4.f\")\n", "plt.show()" ] }, { "attachments": {}, "cell_type": "markdown", "id": "f765bef2", "metadata": { "id": "qGxsCQ1dSuMC" }, "source": [ "以上就是在做ML之前可以做的音訊前處理,處理完之後可以使用DNN、CNN、RNN去做後續分類、迴歸、生成等任務。" ] }, { "attachments": {}, "cell_type": "markdown", "id": "cb150f99", "metadata": { "id": "yB5CjIAGQY1P" }, "source": [ "## Reference\n", "* M. Mueller, Fundamentals of Music Processing, Springer 2015\n", "* 前述課本附Notebook- https://www.audiolabs-erlangen.de/resources/MIR/FMP/C0/C0.html\n", "* Librosa官網- https://github.com/librosa/librosa\n", "* Acoustics for Musicians and Artists, by Miller Puckette, UCSD\n", "* Youtube 熱門音訊AI課程- https://github.com/musikalkemist/AudioSignalProcessingForML" ] }, { "cell_type": "code", "execution_count": null, "id": "6e2897fd", "metadata": { "id": "zAZEaEXtSqwm" }, "outputs": [], "source": [] } ], "metadata": { "colab": { "provenance": [] }, "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.12" } }, "nbformat": 4, "nbformat_minor": 5 }