{ "cells": [ { "cell_type": "markdown", "id": "98289536", "metadata": { "id": "98289536" }, "source": [ "# **Overfitting**\n", "Model tuning requires inspecting performance on both the training set and the validation set. When the model performs worse on the validation set than on the training set, one likely cause is that it has overfit the training data. This notebook introduces techniques for suppressing overfitting when it occurs.\n", "\n", "## Chapter Outline\n", "* ### [Regularization](#Regularization)\n", "* ### [Early Stopping](#EarlyStopping)\n", "* ### [Dropout](#Dropout)\n", "* ### [Parameter Initialization](#ParameterInitialization)\n", "* ### [Batch Normalization](#BatchNormalization)\n", "-----------------" ] }, { "cell_type": "markdown", "id": "7648b4fe", "metadata": { "id": "7648b4fe" }, "source": [ "## Import Packages" ] }, { "cell_type": "code", "execution_count": null, "id": "80780447", "metadata": { "id": "80780447" }, "outputs": [], "source": [ "import numpy as np\n", "import pandas as pd\n", "import matplotlib.pyplot as plt\n", "from tqdm.auto import tqdm\n", "\n", "# PyTorch packages\n", "import torch\n", "import torch.nn as nn\n", "import torch.nn.functional as F" ] }, { "cell_type": "markdown", "id": "39d096f5", "metadata": { "id": "39d096f5" }, "source": [ "## Dataset Creation / Loading" ] }, { "cell_type": "code", "source": [ "# Download the data\n", "!wget -q https://github.com/TA-aiacademy/course_3.0/releases/download/DL/Data_part3.zip\n", "!unzip -q Data_part3.zip" ], "metadata": { "id": "3tzFaHNZu6GR" }, "id": "3tzFaHNZu6GR", "execution_count": null, "outputs": [] }, { "cell_type": "code", "execution_count": null, "id": "0942624d", "metadata": { "id": "0942624d" }, "outputs": [], "source": [ "train_df = pd.read_csv('./Data/News_train.csv')\n", "test_df = pd.read_csv('./Data/News_test.csv')" ] }, { "cell_type": "code", "execution_count": null, "id": "772c8373", "metadata": { "id": "772c8373" }, "outputs": [], "source": [ "train_df.head()" ] }, { "cell_type": "code", "execution_count": null, "id": "b167f802", "metadata": { "id": "b167f802" }, "outputs": [], "source": [ "X_df = train_df.iloc[:, :-1].values\n", "y_df = train_df.y_category.values" ] }, {
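"cell_type": "markdown", "id": "f0e1d2c3", "metadata": { "id": "f0e1d2c3" }, "source": [ "As a quick sanity check before splitting, it can help to look at the label distribution (a minimal sketch; it assumes only that `y_category` holds the class labels, as in the cell above):" ] }, { "cell_type": "code", "execution_count": null, "id": "f0e1d2c4", "metadata": { "id": "f0e1d2c4" }, "outputs": [], "source": [ "# Count samples per class; a strong imbalance would motivate the\n", "# stratified split (stratify=y_df) used later in this notebook.\n", "print(train_df['y_category'].value_counts())" ] }, {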
"cell_type": "code", "execution_count": null, "id": "1adce20a", "metadata": { "id": "1adce20a" }, "outputs": [], "source": [ "X_test = test_df.iloc[:, :-1].values\n", "y_test = test_df.y_category.values" ] }, { "cell_type": "markdown", "id": "3e803aaa", "metadata": { "id": "3e803aaa" }, "source": [ "## Data Preprocessing" ] }, { "cell_type": "code", "execution_count": null, "id": "e096abf4", "metadata": { "id": "e096abf4" }, "outputs": [], "source": [ "from sklearn.preprocessing import StandardScaler, MinMaxScaler\n", "# Feature scaling (fit on the training features only)\n", "sc = StandardScaler()\n", "X_scale = sc.fit_transform(X_df)\n", "X_test_scale = sc.transform(X_test)" ] }, { "cell_type": "code", "execution_count": null, "id": "b693ebeb", "metadata": { "id": "b693ebeb" }, "outputs": [], "source": [ "# train, valid/test dataset split\n", "from sklearn.model_selection import train_test_split\n", "X_train, X_valid, y_train, y_valid = train_test_split(X_scale, y_df,\n", " test_size=0.2,\n", " random_state=5566,\n", " stratify=y_df)" ] }, { "cell_type": "code", "execution_count": null, "id": "0dc32d3f", "metadata": { "id": "0dc32d3f" }, "outputs": [], "source": [ "print(f'X_train shape: {X_train.shape}')\n", "print(f'X_valid shape: {X_valid.shape}')\n", "print(f'y_train shape: {y_train.shape}')\n", "print(f'y_valid shape: {y_valid.shape}')" ] }, { "cell_type": "code", "source": [ "# build dataset and dataloader\n", "train_ds = torch.utils.data.TensorDataset(torch.tensor(X_train, dtype=torch.float32),\n", " torch.tensor(y_train, dtype=torch.long))\n", "valid_ds = torch.utils.data.TensorDataset(torch.tensor(X_valid, dtype=torch.float32),\n", " torch.tensor(y_valid, dtype=torch.long))\n", "test_ds = torch.utils.data.TensorDataset(torch.tensor(X_test_scale, dtype=torch.float32),\n", " torch.tensor(y_test, dtype=torch.long))\n", "\n", "BATCH_SIZE = 64\n", "train_loader = torch.utils.data.DataLoader(train_ds, batch_size=BATCH_SIZE, shuffle=True)\n", "valid_loader = 
torch.utils.data.DataLoader(valid_ds, batch_size=BATCH_SIZE)\n", "test_loader = torch.utils.data.DataLoader(test_ds, batch_size=BATCH_SIZE)" ], "metadata": { "id": "68zV8HasUPDH" }, "id": "68zV8HasUPDH", "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "id": "36fa6c65", "metadata": { "id": "36fa6c65" }, "source": [ "## Model Building" ] }, { "cell_type": "code", "execution_count": null, "id": "3aa4c3be", "metadata": { "id": "3aa4c3be" }, "outputs": [], "source": [ "NUM_CLASS = 11\n", "\n", "def build_model(input_shape, num_class):\n", " torch.manual_seed(5566)\n", " model = nn.Sequential(\n", " nn.Linear(input_shape, 64),\n", " nn.Tanh(),\n", " nn.Linear(64, 64),\n", " nn.Tanh(),\n", " nn.Linear(64, num_class),\n", " )\n", " return model" ] }, { "cell_type": "code", "execution_count": null, "id": "def4ac11", "metadata": { "id": "def4ac11" }, "outputs": [], "source": [ "model = build_model(X_train.shape[1], NUM_CLASS)\n", "print(model)" ] }, { "cell_type": "markdown", "id": "8a8c0484", "metadata": { "id": "8a8c0484" }, "source": [ "## Model Training" ] }, { "cell_type": "code", "source": [ "optimizer = torch.optim.NAdam(model.parameters(), lr=0.001)\n", "loss_fn = nn.CrossEntropyLoss() # multi-class classification loss\n", "\n", "device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')\n", "print(f'device: {device}')\n", "model = model.to(device)" ], "metadata": { "id": "woTo09ykaEg2" }, "id": "woTo09ykaEg2", "execution_count": null, "outputs": [] }, { "cell_type": "code", "execution_count": null, "id": "04261902", "metadata": { "id": "04261902" }, "outputs": [], "source": [ "def train_epoch(model, optimizer, loss_fn, train_dataloader):\n", " # train for one epoch\n", " model.train()\n", " total_train_loss = 0\n", " total_train_correct = 0\n", " for x, y in tqdm(train_dataloader, leave=False):\n", " optimizer.zero_grad() # zero the gradients\n", " x, y = x.to(device), y.to(device) # move the data to the GPU\n", " y_pred = model(x) # forward pass\n", " loss = loss_fn(y_pred, y) # compute the loss\n", " loss.backward() # backpropagate to compute gradients\n", " optimizer.step() # update the model parameters\n", " total_train_loss += loss.item()\n", " # argmax gives the predicted class index; compare it with the labels\n", " total_train_correct += ((y_pred.argmax(dim=1) == y).sum().item())\n", "\n", " avg_train_loss = total_train_loss / len(train_dataloader)\n", " avg_train_acc = total_train_correct / len(train_dataloader.dataset)\n", "\n", " return avg_train_loss, avg_train_acc\n", "\n", "def test_epoch(model, loss_fn, val_dataloader):\n", " # evaluate for one epoch\n", " model.eval()\n", " total_val_loss = 0\n", " total_val_correct = 0\n", " # disable gradient computation to speed things up\n", " with torch.no_grad():\n", " for x, y in val_dataloader:\n", " x, y = x.to(device), y.to(device)\n", " y_pred = model(x)\n", " loss = loss_fn(y_pred, y)\n", " total_val_loss += loss.item()\n", " # argmax gives the predicted class index; compare it with the labels\n", " total_val_correct += ((y_pred.argmax(dim=1) == y).sum().item())\n", "\n", " avg_val_loss = total_val_loss / len(val_dataloader)\n", " avg_val_acc = total_val_correct / len(val_dataloader.dataset)\n", "\n", " return avg_val_loss, avg_val_acc\n", "\n", "def run(model, optimizer, loss_fn, train_loader, valid_loader, verbose=1):\n", " train_loss_log = []\n", " val_loss_log = []\n", " train_acc_log = []\n", " val_acc_log = []\n", " for epoch in tqdm(range(20)):\n", " avg_train_loss, avg_train_acc = train_epoch(model, optimizer, loss_fn, train_loader)\n", " avg_val_loss, avg_val_acc = test_epoch(model, loss_fn, valid_loader)\n", " train_loss_log.append(avg_train_loss)\n", " val_loss_log.append(avg_val_loss)\n", " train_acc_log.append(avg_train_acc)\n", " val_acc_log.append(avg_val_acc)\n", " if verbose == 1:\n", " print(f'Epoch: {epoch}, Train Loss: {avg_train_loss:.3f}, Val Loss: {avg_val_loss:.3f} | Train Acc: {avg_train_acc:.3f}, Val Acc: {avg_val_acc:.3f}')\n", " return train_loss_log, train_acc_log, val_loss_log, val_acc_log" ] }, { "cell_type": "code", "execution_count": null, "id": "66c1b3ef", "metadata": { "id": "66c1b3ef" }, "outputs": [], "source": [ 
"train_loss_log, train_acc_log, val_loss_log, val_acc_log = run(model, optimizer, loss_fn, train_loader, valid_loader)" ] }, { "cell_type": "markdown", "id": "c26459a6", "metadata": { "id": "c26459a6" }, "source": [ "## Model Evaluation" ] }, { "cell_type": "code", "execution_count": null, "id": "23a78ed7", "metadata": { "id": "23a78ed7" }, "outputs": [], "source": [ "plt.figure(figsize=(15, 4))\n", "plt.subplot(1, 2, 1)\n", "plt.plot(range(len(train_loss_log)), train_loss_log, label='train_loss')\n", "plt.plot(range(len(val_loss_log)), val_loss_log, label='valid_loss')\n", "plt.xlabel('Epochs')\n", "plt.ylabel('Cross entropy')\n", "plt.legend()\n", "\n", "plt.subplot(1, 2, 2)\n", "plt.plot(range(len(train_acc_log)), train_acc_log, label='train_acc')\n", "plt.plot(range(len(val_acc_log)), val_acc_log, label='valid_acc')\n", "plt.xlabel('Epochs')\n", "plt.ylabel('Accuracy')\n", "plt.legend()\n", "plt.show()" ] }, { "cell_type": "code", "execution_count": null, "id": "50017993", "metadata": { "id": "50017993" }, "outputs": [], "source": [ "# Print the results of testing data\n", "print('============================')\n", "print('Testing data')\n", "print('============================')\n", "test_loss, test_acc = test_epoch(model, loss_fn, test_loader)\n", "print(f'loss: {test_loss}')\n", "print(f'acc: {test_acc}')" ] }, { "cell_type": "markdown", "id": "c2d332c7", "metadata": { "id": "c2d332c7" }, "source": [ "## Overfitting Suppression Strategies" ] }, { "cell_type": "markdown", "id": "a4127de0", "metadata": { "id": "a4127de0" }, "source": [ "![](https://hackmd.io/_uploads/B1rmk5Ubp.png)\n" ] }, { "cell_type": "markdown", "id": "6ed71fa3", "metadata": { "id": "6ed71fa3" }, "source": [ "\n", "* ## Regularization\n", "" ] }, { "cell_type": "code", "source": [ "# compute the L1/L2 regularization penalty\n", "def add_regularization(loss, model, l1_alpha=0, l2_alpha=0):\n", " l1_alpha = float(l1_alpha)\n", " l2_alpha = float(l2_alpha)\n", " l1_norm = sum(torch.linalg.norm(p, ord=1) for p in 
model.parameters())\n", " l2_norm = sum(p.pow(2).sum() for p in model.parameters())\n", " regularization = l1_alpha * l1_norm + l2_alpha * l2_norm\n", " return loss + regularization\n", "\n", "# training/evaluation loops with the L1/L2 penalty added to the loss\n", "def train_epoch(model, optimizer, loss_fn, train_dataloader, l1_alpha=0, l2_alpha=0):\n", " # train for one epoch\n", " model.train()\n", " total_train_loss = 0\n", " total_train_correct = 0\n", " for x, y in tqdm(train_dataloader, leave=False):\n", " x, y = x.to(device), y.to(device) # move the data to the GPU\n", " y_pred = model(x) # forward pass\n", " loss = loss_fn(y_pred, y) # compute the loss\n", " loss = add_regularization(loss, model, l1_alpha, l2_alpha) # L1, L2 regularization\n", " optimizer.zero_grad() # zero the gradients\n", " loss.backward() # backpropagate to compute gradients\n", " optimizer.step() # update the model parameters\n", "\n", " total_train_loss += loss.item()\n", " # argmax gives the predicted class index; compare it with the labels\n", " total_train_correct += ((y_pred.argmax(dim=1) == y).sum().item())\n", "\n", " avg_train_loss = total_train_loss / len(train_dataloader)\n", " avg_train_acc = total_train_correct / len(train_dataloader.dataset)\n", "\n", " return avg_train_loss, avg_train_acc\n", "\n", "def test_epoch(model, loss_fn, val_dataloader, l1_alpha=0, l2_alpha=0):\n", " # evaluate for one epoch\n", " model.eval()\n", " total_val_loss = 0\n", " total_val_correct = 0\n", " # disable gradient computation to speed things up\n", " with torch.no_grad():\n", " for x, y in val_dataloader:\n", " x, y = x.to(device), y.to(device)\n", " y_pred = model(x)\n", " loss = loss_fn(y_pred, y)\n", " loss = add_regularization(loss, model, l1_alpha, l2_alpha) # L1, L2 regularization\n", " total_val_loss += loss.item()\n", " # argmax gives the predicted class index; compare it with the labels\n", " total_val_correct += ((y_pred.argmax(dim=1) == y).sum().item())\n", "\n", " avg_val_loss = total_val_loss / len(val_dataloader)\n", " avg_val_acc = total_val_correct / len(val_dataloader.dataset)\n", "\n", " return avg_val_loss, avg_val_acc\n", "\n", "def run(epochs, model, optimizer, loss_fn, train_loader, valid_loader, l1_alpha, l2_alpha, verbose=1):\n", " 
train_loss_log = []\n", " train_acc_log = []\n", " val_loss_log = []\n", " val_acc_log = []\n", " for epoch in tqdm(range(epochs)):\n", " avg_train_loss, avg_train_acc = train_epoch(model, optimizer, loss_fn, train_loader, l1_alpha, l2_alpha)\n", " avg_val_loss, avg_val_acc = test_epoch(model, loss_fn, valid_loader, l1_alpha, l2_alpha)\n", " train_loss_log.append(avg_train_loss)\n", " train_acc_log.append(avg_train_acc)\n", " val_loss_log.append(avg_val_loss)\n", " val_acc_log.append(avg_val_acc)\n", " if verbose == 1:\n", " print(f'Epoch: {epoch}, Train Loss: {avg_train_loss:.3f}, Val Loss: {avg_val_loss:.3f} | Train Acc: {avg_train_acc:.3f}, Val Acc: {avg_val_acc:.3f}')\n", " return train_loss_log, train_acc_log, val_loss_log, val_acc_log" ], "metadata": { "id": "xUYSr24kcleL" }, "id": "xUYSr24kcleL", "execution_count": null, "outputs": [] }, { "cell_type": "code", "execution_count": null, "id": "c145bd62", "metadata": { "id": "c145bd62" }, "outputs": [], "source": [ "# regularizer settings to compare\n", "l1_l2_list = [(0, 0), (1e-3, 0), (0, 1e-2), (1e-3, 1e-2)]\n", "\n", "# two lists to record the training results for each regularizer setting\n", "train_loss_list = []\n", "train_acc_list = []\n", "\n", "# two lists to record the validation results for each regularizer setting\n", "valid_loss_list = []\n", "valid_acc_list = []\n", "\n", "# a list to record the test results for each regularizer setting\n", "test_eval = []\n", "\n", "# train a model for each regularizer setting\n", "for l1_alpha, l2_alpha in tqdm(l1_l2_list):\n", " print('Training a model with regularizer L1: {}, L2: {}'\n", " .format(l1_alpha, l2_alpha))\n", "\n", " # build a fresh model each time instead of continuing from the previous one\n", " model = build_model(X_train.shape[1], NUM_CLASS)\n", " model = model.to(device)\n", " optimizer = torch.optim.NAdam(model.parameters(), lr=0.001)\n", " loss_fn = nn.CrossEntropyLoss()\n", " history = run(20, model, optimizer, loss_fn, train_loader, valid_loader,\n", " l1_alpha, l2_alpha,\n", " verbose=0)\n", " # record the training history\n", " train_loss_list.append(history[0])\n", " train_acc_list.append(history[1])\n", " 
valid_loss_list.append(history[2])\n", " valid_acc_list.append(history[3])\n", " test_eval.append(test_epoch(model, loss_fn, test_loader,\n", " l1_alpha, l2_alpha))\n", "print('----------------- training done! -----------------')" ] }, { "cell_type": "code", "execution_count": null, "id": "5f2b62a2", "metadata": { "id": "5f2b62a2" }, "outputs": [], "source": [ "# visualize the training history\n", "plt.figure(figsize=(15, 7))\n", "\n", "train_line = ()\n", "valid_line = ()\n", "\n", "# plot the training loss\n", "plt.subplot(121)\n", "for k in range(len(l1_l2_list)):\n", " l1, l2 = l1_l2_list[k]\n", " loss = train_loss_list[k]\n", " val_loss = valid_loss_list[k]\n", " train_l = plt.plot(\n", " range(len(loss)), loss,\n", " label=f'Training L1: {l1}, L2: {l2}')\n", " valid_l = plt.plot(\n", " range(len(val_loss)), val_loss, '--',\n", " label=f'Validation L1: {l1}, L2: {l2}')\n", "\n", " train_line += tuple(train_l)\n", " valid_line += tuple(valid_l)\n", "plt.title('Loss')\n", "\n", "# plot the training accuracy\n", "plt.subplot(122)\n", "train_acc_line = []\n", "valid_acc_line = []\n", "for k in range(len(l1_l2_list)):\n", " l1, l2 = l1_l2_list[k]\n", " acc = train_acc_list[k]\n", " val_acc = valid_acc_list[k]\n", " plt.plot(range(len(acc)), acc,\n", " label=f'Training L1: {l1}, L2: {l2}')\n", " plt.plot(range(len(val_acc)), val_acc, '--',\n", " label=f'Validation L1: {l1}, L2: {l2}')\n", "plt.title('Accuracy')\n", "\n", "first_legend = plt.legend(handles=train_line,\n", " bbox_to_anchor=(1.05, 1))\n", "\n", "plt.gca().add_artist(first_legend)\n", "plt.legend(handles=valid_line,\n", " bbox_to_anchor=(1.05, 0.8))\n", "plt.show()" ] }, { "cell_type": "code", "execution_count": null, "id": "800bec37", "metadata": { "id": "800bec37" }, "outputs": [], "source": [ "# Print the results of testing data\n", "for k in range(len(l1_l2_list)):\n", " print('============================')\n", " print(f'(l1, l2) = {l1_l2_list[k]}')\n", " print('============================')\n", " print(f'loss: 
{test_eval[k][0]}')\n", " print(f'acc: {test_eval[k][1]}\\n')" ] }, { "cell_type": "markdown", "id": "16ee28bf", "metadata": { "id": "16ee28bf" }, "source": [ "\n", "* ## Early Stopping" ] }, { "cell_type": "code", "execution_count": null, "id": "63c5ef65", "metadata": { "id": "63c5ef65" }, "outputs": [], "source": [ "model = build_model(X_train.shape[1], NUM_CLASS)\n", "model = model.to(device)\n", "optimizer = torch.optim.NAdam(model.parameters(), lr=0.001)\n", "loss_fn = nn.CrossEntropyLoss()" ] }, { "cell_type": "code", "execution_count": null, "id": "479bd6a9", "metadata": { "id": "479bd6a9" }, "outputs": [], "source": [ "# run() with early stopping added\n", "def run(epochs, model, optimizer, loss_fn,\n", " train_loader, valid_loader,\n", " l1_alpha=0, l2_alpha=0,\n", " early_stop=True,\n", " n_patience=5, # stop after n_patience epochs without improvement\n", " verbose=1):\n", " train_loss_log = []\n", " train_acc_log = []\n", " val_loss_log = []\n", " val_acc_log = []\n", " # early stopping setup\n", " best_val_loss = float('inf')\n", " patience = 0\n", "\n", " for epoch in tqdm(range(epochs)):\n", " avg_train_loss, avg_train_acc = train_epoch(model, optimizer, loss_fn, train_loader, l1_alpha, l2_alpha)\n", " avg_val_loss, avg_val_acc = test_epoch(model, loss_fn, valid_loader, l1_alpha, l2_alpha)\n", " train_loss_log.append(avg_train_loss)\n", " train_acc_log.append(avg_train_acc)\n", " val_loss_log.append(avg_val_loss)\n", " val_acc_log.append(avg_val_acc)\n", " if verbose == 1:\n", " print(f'Epoch: {epoch}, Train Loss: {avg_train_loss:.3f}, Val Loss: {avg_val_loss:.3f} | Train Acc: {avg_train_acc:.3f}, Val Acc: {avg_val_acc:.3f}')\n", " if early_stop:\n", " # early stopping check\n", " if avg_val_loss < best_val_loss:\n", " best_val_loss = avg_val_loss\n", " patience = 0\n", " else:\n", " patience += 1\n", " if patience >= n_patience:\n", " print(f'Early stopping at epoch {epoch}')\n", " break\n", " return train_loss_log, train_acc_log, val_loss_log, val_acc_log" ] }, { "cell_type": "code", "source": [ 
"train_loss, train_acc, valid_loss, valid_acc = run(20, model, optimizer, loss_fn, train_loader, valid_loader,\n", " l1_alpha=0, l2_alpha=0)" ], "metadata": { "id": "_6RmY6tZBD3b" }, "id": "_6RmY6tZBD3b", "execution_count": null, "outputs": [] }, { "cell_type": "code", "execution_count": null, "id": "2acf3b51", "metadata": { "id": "2acf3b51" }, "outputs": [], "source": [ "plt.figure(figsize=(15, 4))\n", "plt.subplot(1, 2, 1)\n", "plt.plot(range(len(train_loss)), train_loss, label='train_loss')\n", "plt.plot(range(len(valid_loss)), valid_loss, label='valid_loss')\n", "plt.xlabel('Epochs')\n", "plt.ylabel('Loss')\n", "plt.legend()\n", "\n", "plt.subplot(1, 2, 2)\n", "plt.plot(range(len(train_acc)), train_acc, label='train_acc')\n", "plt.plot(range(len(valid_acc)), valid_acc, label='valid_acc')\n", "plt.xlabel('Epochs')\n", "plt.ylabel('Accuracy')\n", "plt.legend()\n", "plt.show()" ] }, { "cell_type": "code", "execution_count": null, "id": "655d7982", "metadata": { "id": "655d7982" }, "outputs": [], "source": [ "# Print the results of testing data\n", "print('============================')\n", "print('Testing data')\n", "print('============================')\n", "test_loss, test_acc = test_epoch(model, loss_fn, test_loader)\n", "print(f'loss: {test_loss}')\n", "print(f'acc : {test_acc}')" ] }, { "cell_type": "markdown", "id": "1b321f9a", "metadata": { "id": "1b321f9a" }, "source": [ "\n", "* ## Dropout\n", "![](https://hackmd.io/_uploads/HJePycUba.png)\n" ] }, { "cell_type": "code", "execution_count": null, "id": "95f89986", "metadata": { "id": "95f89986" }, "outputs": [], "source": [ "NUM_CLASS = 11\n", "\n", "def build_model_dropout(input_shape, num_class, droprate):\n", " torch.manual_seed(5566)\n", " model = nn.Sequential(\n", " nn.Linear(input_shape, 64),\n", " nn.Dropout(droprate),\n", " nn.Tanh(),\n", " nn.Linear(64, 64),\n", " nn.Dropout(droprate),\n", " nn.Tanh(),\n", " nn.Linear(64, num_class),\n", " )\n", " return model" ] }, { "cell_type": "code", 
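"execution_count": null, "id": "d0e1f2a3", "metadata": { "id": "d0e1f2a3" }, "outputs": [], "source": [ "# A note on Dropout placement: build_model_dropout above inserts Dropout\n", "# between the Linear layer and the Tanh activation. Placing Dropout after\n", "# the activation is also common; a minimal sketch (the helper name\n", "# build_model_dropout_post is ours, and it is not used in the cells below):\n", "def build_model_dropout_post(input_shape, num_class, droprate):\n", " torch.manual_seed(5566)\n", " model = nn.Sequential(\n", " nn.Linear(input_shape, 64),\n", " nn.Tanh(),\n", " nn.Dropout(droprate),\n", " nn.Linear(64, 64),\n", " nn.Tanh(),\n", " nn.Dropout(droprate),\n", " nn.Linear(64, num_class),\n", " )\n", " return model" ] }, { "cell_type": "code", 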
"execution_count": null, "id": "b629990f", "metadata": { "id": "b629990f" }, "outputs": [], "source": [ "# dropout rates to compare\n", "dropout_rates = [0, 0.1, 0.3, 0.5]\n", "\n", "# two lists to record the training results for each dropout rate\n", "train_loss_list = []\n", "train_acc_list = []\n", "\n", "# two lists to record the validation results for each dropout rate\n", "valid_loss_list = []\n", "valid_acc_list = []\n", "\n", "# a list to record the test results for each dropout rate\n", "test_eval = []\n", "\n", "# train a model for each dropout rate\n", "for drop_r in dropout_rates:\n", " print('Training a model with dropout rate: {}'\n", " .format(drop_r))\n", "\n", " # build a fresh model each time instead of continuing from the previous one\n", " model = build_model_dropout(X_train.shape[1],\n", " NUM_CLASS,\n", " drop_r)\n", " model = model.to(device)\n", " optimizer = torch.optim.NAdam(model.parameters(), lr=0.001)\n", " loss_fn = nn.CrossEntropyLoss()\n", "\n", " # use the same settings for every run\n", " history = run(20, model, optimizer, loss_fn, train_loader, valid_loader,\n", " early_stop=False,\n", " verbose=1)\n", "\n", " # record the training results\n", " train_loss_list.append(history[0])\n", " train_acc_list.append(history[1])\n", " valid_loss_list.append(history[2])\n", " valid_acc_list.append(history[3])\n", " test_eval.append(test_epoch(model, loss_fn, test_loader))\n", "print('----------------- training done! 
-----------------')" ] }, { "cell_type": "code", "execution_count": null, "id": "7c313846", "metadata": { "id": "7c313846" }, "outputs": [], "source": [ "# visualize the training history\n", "plt.figure(figsize=(15, 7))\n", "\n", "train_line = ()\n", "valid_line = ()\n", "\n", "# plot the training loss\n", "plt.subplot(121)\n", "for k in range(len(dropout_rates)):\n", " loss = train_loss_list[k]\n", " val_loss = valid_loss_list[k]\n", " train_l = plt.plot(\n", " range(len(loss)), loss,\n", " label=f'Training dropout rate:{dropout_rates[k]}')\n", " valid_l = plt.plot(\n", " range(len(val_loss)), val_loss, '--',\n", " label=f'Validation dropout rate:{dropout_rates[k]}')\n", "\n", " train_line += tuple(train_l)\n", " valid_line += tuple(valid_l)\n", "plt.title('Loss')\n", "\n", "# plot the training accuracy\n", "plt.subplot(122)\n", "train_acc_line = []\n", "valid_acc_line = []\n", "for k in range(len(dropout_rates)):\n", " acc = train_acc_list[k]\n", " val_acc = valid_acc_list[k]\n", " plt.plot(range(len(acc)), acc,\n", " label=f'Training dropout rate:{dropout_rates[k]}')\n", " plt.plot(range(len(val_acc)), val_acc, '--',\n", " label=f'Validation dropout rate:{dropout_rates[k]}')\n", "plt.title('Accuracy')\n", "\n", "first_legend = plt.legend(handles=train_line,\n", " bbox_to_anchor=(1.05, 1))\n", "\n", "plt.gca().add_artist(first_legend)\n", "plt.legend(handles=valid_line,\n", " bbox_to_anchor=(1.05, 0.8))\n", "plt.show()" ] }, { "cell_type": "code", "execution_count": null, "id": "21014c14", "metadata": { "id": "21014c14" }, "outputs": [], "source": [ "# Print the results of testing data\n", "for k in range(len(dropout_rates)):\n", " print('============================')\n", " print(f'dropout_rate = {dropout_rates[k]}')\n", " print('============================')\n", " print(f'loss: {test_eval[k][0]}')\n", " print(f'acc: {test_eval[k][1]}\\n')" ] }, { "cell_type": "markdown", "id": "b6710c34", "metadata": { "id": "b6710c34" }, "source": [ "\n", "* ## Parameter Initialization\n", "torch.nn.init: 
https://pytorch.org/docs/stable/nn.init.html" ] }, { "cell_type": "code", "execution_count": null, "id": "f8f68a5c", "metadata": { "id": "f8f68a5c" }, "outputs": [], "source": [ "import functools\n", "import math\n", "\n", "def lecun_normal_(tensor: torch.Tensor) -> torch.Tensor:\n", " # Assuming that the weights' input dimension is the last.\n", " input_size = tensor.shape[-1]\n", " std = math.sqrt(1 / input_size)\n", " with torch.no_grad():\n", " # LeCun normal: zero-mean normal with std = sqrt(1 / fan_in)\n", " return tensor.normal_(0, std)\n", "\n", "# apply the chosen initializer to each Linear layer\n", "def weights_init(m, init_fn):\n", " if isinstance(m, nn.Linear):\n", " init_fn(m.weight)\n", " torch.nn.init.zeros_(m.bias)\n", "\n", "def build_model_init(input_shape, num_class, init_fn):\n", " torch.manual_seed(5566)\n", " model = nn.Sequential(\n", " nn.Linear(input_shape, 64),\n", " nn.Tanh(),\n", " nn.Linear(64, 64),\n", " nn.Tanh(),\n", " nn.Linear(64, num_class),\n", " )\n", " model.apply(functools.partial(weights_init, init_fn=init_fn))\n", " return model" ] }, { "cell_type": "code", "execution_count": null, "id": "636543a7", "metadata": { "id": "636543a7" }, "outputs": [], "source": [ "# initializers to compare\n", "init_l = [\n", " torch.nn.init.xavier_normal_, # glorot init\n", " torch.nn.init.kaiming_normal_, # he init\n", " lecun_normal_,\n", " torch.nn.init.normal_,\n", " torch.nn.init.trunc_normal_,\n", "]\n", "\n", "# two lists to record the training results for each initializer\n", "train_loss_list = []\n", "train_acc_list = []\n", "\n", "# two lists to record the validation results for each initializer\n", "valid_loss_list = []\n", "valid_acc_list = []\n", "\n", "# a list to record the test results for each initializer\n", "test_eval = []\n", "\n", "# train a model for each initializer\n", "for init in init_l:\n", " print(f'Training model, init = {init}')\n", "\n", " # build a fresh model each time instead of continuing from the previous one\n", " model = build_model_init(X_train.shape[1],\n", " NUM_CLASS,\n", " init)\n", " model = model.to(device)\n", " optimizer = torch.optim.NAdam(model.parameters(), lr=0.001)\n", " loss_fn = nn.CrossEntropyLoss()\n", " history = 
run(20, model, optimizer, loss_fn, train_loader, valid_loader,\n", " early_stop=False,\n", " verbose=0)\n", "\n", " # record the training results\n", " train_loss_list.append(history[0])\n", " train_acc_list.append(history[1])\n", " valid_loss_list.append(history[2])\n", " valid_acc_list.append(history[3])\n", " test_eval.append(test_epoch(model, loss_fn, test_loader))\n", "print('----------------- training done! -----------------')" ] }, { "cell_type": "code", "execution_count": null, "id": "1ac5fe3d", "metadata": { "id": "1ac5fe3d" }, "outputs": [], "source": [ "# visualize the training history\n", "plt.figure(figsize=(15, 7))\n", "\n", "train_line = ()\n", "valid_line = ()\n", "\n", "# plot the training loss\n", "plt.subplot(121)\n", "for k in range(len(init_l)):\n", " loss = train_loss_list[k]\n", " val_loss = valid_loss_list[k]\n", " train_l = plt.plot(\n", " range(len(loss)), loss,\n", " label=f'Training init: {init_l[k]}')\n", " valid_l = plt.plot(\n", " range(len(val_loss)), val_loss, '--',\n", " label=f'Validation init: {init_l[k]}')\n", "\n", " train_line += tuple(train_l)\n", " valid_line += tuple(valid_l)\n", "plt.title('Loss')\n", "\n", "# plot the training accuracy\n", "plt.subplot(122)\n", "train_acc_line = []\n", "valid_acc_line = []\n", "for k in range(len(init_l)):\n", " acc = train_acc_list[k]\n", " val_acc = valid_acc_list[k]\n", " plt.plot(range(len(acc)), acc,\n", " label=f'Training init: {init_l[k]}')\n", " plt.plot(range(len(val_acc)), val_acc, '--',\n", " label=f'Validation init: {init_l[k]}')\n", "plt.title('Accuracy')\n", "\n", "first_legend = plt.legend(handles=train_line,\n", " bbox_to_anchor=(1.05, 1))\n", "\n", "plt.gca().add_artist(first_legend)\n", "plt.legend(handles=valid_line,\n", " bbox_to_anchor=(1.05, 0.75))\n", "plt.show()" ] }, { "cell_type": "code", "execution_count": null, "id": "2c7840e9", "metadata": { "id": "2c7840e9" }, "outputs": [], "source": [ "# Print the results of testing data\n", "for k in range(len(init_l)):\n", " print('============================')\n", " 
print(f'initializer = {init_l[k]}')\n", " print('============================')\n", " print(f'loss: {test_eval[k][0]}')\n", " print(f'acc: {test_eval[k][1]}\\n')" ] }, { "cell_type": "markdown", "id": "4aed4aa8", "metadata": { "id": "4aed4aa8" }, "source": [ "\n", "* ## Batch Normalization" ] }, { "cell_type": "code", "execution_count": null, "id": "b88dfdec", "metadata": { "id": "b88dfdec" }, "outputs": [], "source": [ "class LinearBN(nn.Module):\n", " def __init__(self, in_features, out_features, bn=True):\n", " super().__init__()\n", " self.linear = nn.Linear(in_features, out_features)\n", " if bn:\n", " self.bn = nn.BatchNorm1d(out_features)\n", " else:\n", " self.bn = nn.Identity()\n", " self.act = nn.Tanh()\n", "\n", " def forward(self, x):\n", " return self.act(self.bn(self.linear(x)))\n", "\n", "\n", "def build_model_bn(input_shape, num_class, bn=True):\n", " torch.manual_seed(5566)\n", " model = nn.Sequential(\n", " LinearBN(input_shape, 64, bn),\n", " LinearBN(64, 64, bn),\n", " nn.Linear(64, num_class),\n", " )\n", " return model" ] }, { "cell_type": "code", "execution_count": null, "id": "adf4cb5e", "metadata": { "id": "adf4cb5e" }, "outputs": [], "source": [ "BN = [False, True]\n", "\n", "# two lists to record the training results with and without BatchNormalization\n", "train_loss_list = []\n", "train_acc_list = []\n", "\n", "# two lists to record the validation results with and without BatchNormalization\n", "valid_loss_list = []\n", "valid_acc_list = []\n", "\n", "# a list to record the test results with and without BatchNormalization\n", "test_eval = []\n", "\n", "# train a model with and without BatchNormalization\n", "for bn in BN:\n", " print('Training a model with BatchNormalization: {}'\n", " .format(str(bn)))\n", "\n", " # build a fresh model each time instead of continuing from the previous one\n", " model = build_model_bn(X_train.shape[1], NUM_CLASS, bn)\n", " model = model.to(device)\n", " optimizer = torch.optim.NAdam(model.parameters(), lr=0.001)\n", " loss_fn = nn.CrossEntropyLoss()\n", " history = run(20, model, optimizer, loss_fn, train_loader, valid_loader,\n", " early_stop=False,\n", " verbose=0)\n", 
"\n", " train_loss_list.append(history[0])\n", " train_acc_list.append(history[1])\n", " valid_loss_list.append(history[2])\n", " valid_acc_list.append(history[3])\n", " test_eval.append(test_epoch(model, loss_fn, test_loader))\n", "print('----------------- training done! -----------------')" ] }, { "cell_type": "code", "execution_count": null, "id": "f2699c27", "metadata": { "id": "f2699c27" }, "outputs": [], "source": [ "# visualize the training history\n", "plt.figure(figsize=(15, 7))\n", "\n", "train_line = ()\n", "valid_line = ()\n", "\n", "# plot the training loss\n", "plt.subplot(121)\n", "for k in range(len(BN)):\n", " loss = train_loss_list[k]\n", " val_loss = valid_loss_list[k]\n", " train_l = plt.plot(\n", " range(len(loss)), loss,\n", " label=f'Training BatchNormalization:{str(BN[k])}')\n", " valid_l = plt.plot(\n", " range(len(val_loss)), val_loss, '--',\n", " label=f'Validation BatchNormalization:{str(BN[k])}')\n", "\n", " train_line += tuple(train_l)\n", " valid_line += tuple(valid_l)\n", "plt.title('Loss')\n", "\n", "# plot the training accuracy\n", "plt.subplot(122)\n", "train_acc_line = []\n", "valid_acc_line = []\n", "for k in range(len(BN)):\n", " acc = train_acc_list[k]\n", " val_acc = valid_acc_list[k]\n", " plt.plot(range(len(acc)), acc,\n", " label=f'Training BatchNormalization:{str(BN[k])}')\n", " plt.plot(range(len(val_acc)), val_acc, '--',\n", " label=f'Validation BatchNormalization:{str(BN[k])}')\n", "plt.title('Accuracy')\n", "\n", "first_legend = plt.legend(handles=train_line,\n", " bbox_to_anchor=(1.05, 1))\n", "\n", "plt.gca().add_artist(first_legend)\n", "plt.legend(handles=valid_line,\n", " bbox_to_anchor=(1.05, 0.75))\n", "plt.show()" ] }, { "cell_type": "code", "execution_count": null, "id": "93b68689", "metadata": { "id": "93b68689" }, "outputs": [], "source": [ "# Print the results of testing data\n", "for k in range(len(BN)):\n", " print('============================')\n", " print(f'BatchNormalization = {BN[k]}')\n", " print('============================')\n", 
" print(f'loss: {test_eval[k][0]}')\n", " print(f'acc: {test_eval[k][1]}\\n')" ] }, { "cell_type": "markdown", "id": "8883e69e", "metadata": { "id": "8883e69e" }, "source": [ "---\n", "### Quiz\n", "Using Data/pkgo_train.csv, build a multi-class classifier that predicts five Pokémon species, and validate the results with Data/pkgo_test.csv.\n", "\n", "If overfitting occurs, try the overfitting-suppression methods above to adjust your training strategy." ] }, { "cell_type": "code", "execution_count": null, "id": "dd2ab8ac", "metadata": { "id": "dd2ab8ac" }, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.12" }, "colab": { "provenance": [] }, "accelerator": "GPU", "gpuClass": "standard" }, "nbformat": 4, "nbformat_minor": 5 }