{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "86b47daa",
   "metadata": {
    "id": "86b47daa"
   },
   "source": [
    "# UNet\n",
    "Source: https://amaarora.github.io/2020/09/13/unet.html\n",
    "\n",
    "![image](https://hackmd.io/_uploads/B1gnWeHdT.png)\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c9d4caf7",
   "metadata": {
    "id": "c9d4caf7"
   },
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "import pandas as pd\n",
    "import tensorflow as tf\n",
    "from tensorflow.keras import Input, Model, Sequential, layers\n",
    "# import tensorflow_addons as tfa  # optional: only needed for InstanceNormalization"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "22e481c6",
   "metadata": {
    "id": "22e481c6"
   },
   "outputs": [],
   "source": [
    "import warnings\n",
    "warnings.filterwarnings(\"ignore\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "07244fe9",
   "metadata": {
    "id": "07244fe9"
   },
   "outputs": [],
   "source": [
    "BATCH_SIZE = 32\n",
    "NUM_LABELS = 1\n",
    "WIDTH = 512\n",
    "HEIGHT = 512"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c8527c9c",
   "metadata": {
    "id": "c8527c9c"
   },
   "source": [
    "## ConvBlock\n",
    "- Optionally adds Instance Normalization after each convolution (commented out in the code below, since it requires `tensorflow_addons`).\n",
    "\n",
    "![image](https://hackmd.io/_uploads/BJkpZgSOa.png)\n",
    "\n",
    "> The figure above shows the feature maps of a whole batch: 6 input images, with 6 input channels and 6 output channels (looking along the C axis you see channels; looking along the N axis you see images). Instance Norm normalizes each (image, channel) slice independently over its spatial (H, W) positions."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "b2d05400",
   "metadata": {
    "id": "b2d05400"
   },
   "outputs": [],
   "source": [
    "class convBlock(layers.Layer):\n",
    "    def __init__(self, out_ch, padding='same', kernel_size=3):\n",
    "        super().__init__()\n",
    "        # Two convolutions, each followed by ReLU\n",
    "        self.conv_1 = layers.Conv2D(out_ch, (kernel_size, kernel_size),\n",
    "                                    strides=(1, 1), padding=padding)\n",
    "        self.relu = layers.Activation('relu')\n",
    "        self.conv_2 = layers.Conv2D(out_ch, (kernel_size, kernel_size),\n",
    "                                    strides=(1, 1), padding=padding)\n",
    "        # self.INorm = tfa.layers.InstanceNormalization(axis=3, center=True, scale=True)\n",
    "\n",
    "    def call(self, inputs, training=None):\n",
    "        x = self.conv_1(inputs)\n",
    "        # x = self.INorm(x)\n",
    "        x = self.relu(x)\n",
    "        x = self.conv_2(x)\n",
    "        # x = self.INorm(x)\n",
    "        x = self.relu(x)\n",
    "        return x"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "164f79e7",
   "metadata": {
    "id": "164f79e7"
   },
   "outputs": [],
   "source": [
    "block = convBlock(64)\n",
    "inputs = np.zeros((1, HEIGHT, WIDTH, 3), dtype=np.float32)\n",
    "block(inputs).shape  # -> TensorShape([1, 512, 512, 64])"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "eba40565",
   "metadata": {
    "id": "eba40565"
   },
   "source": [
    "## Encoder (Downsampling Path)\n",
    "Encodes the image into feature maps of progressively lower resolution. Each stage applies a `convBlock` (which keeps the resolution, since `padding='same'`) followed by 2x2 max pooling, which halves the height and width. The output of each `convBlock`, taken before pooling, is kept as a skip connection for the decoder."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "b1c5ccef",
   "metadata": {
    "id": "b1c5ccef"
   },
   "outputs": [],
   "source": [
    "class Encoder(layers.Layer):\n",
    "    def __init__(self, chs=(32, 64, 128, 256, 512), padding='same'):\n",
    "        super().__init__()\n",
    "        # One convBlock per resolution level\n",
    "        self.FPN_enc_ftrs = [convBlock(ch, padding=padding) for ch in chs]\n",
    "        self.pool = layers.MaxPooling2D(pool_size=(2, 2),\n",
    "                                        strides=(2, 2), padding=padding)\n",
    "\n",
    "    def call(self, x, training=None):\n",
    "        features = []\n",
    "        for block in self.FPN_enc_ftrs:\n",
    "            x = block(x)\n",
    "            features.append(x)  # skip connection, taken before pooling\n",
    "            x = self.pool(x)\n",
    "        return features"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "12333849",
   "metadata": {
    "id": "12333849"
   },
   "outputs": [],
   "source": [
    "encoder = Encoder()\n",
    "inputs = np.zeros((1, HEIGHT, WIDTH, 3), dtype=np.float32)\n",
    "features = encoder(inputs)\n",
    "# -> (1, 512, 512, 32), (1, 256, 256, 64), (1, 128, 128, 128), (1, 64, 64, 256), (1, 32, 32, 512)\n",
    "for f in features:\n",
    "    print(f.shape)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "af545e6a",
   "metadata": {
    "id": "af545e6a"
   },
   "source": [
    "## Decoder (Upsampling Path)\n",
    "Decodes the encoded features back into an image-sized output. The resolution is enlarged step by step until it matches the input resolution (classically with transposed convolution; this notebook uses upsampling followed by convolution, described below).\n",
    "- The resolution has to be restored because segmentation is a pixel-wise prediction: once the output matches the input size, every pixel location can be assigned a class.\n",
    "- Class information is usually laid out along the channels of the output feature map: one channel per class.\n",
    "- Many U-Net implementations take a 572x572 input but produce only a 388x388 output. This happens because they use unpadded (`'valid'`) convolutions: every 3x3 convolution trims 2 pixels from the height and width, so the skip connections must be center-cropped before concatenation. The output then covers only the central region of the input, so the mask is center-cropped to 388x388 before computing the loss."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e4b45ef7",
   "metadata": {
    "id": "e4b45ef7"
   },
   "source": [
    "### Transposed Conv and UpsampleConv\n",
    "\n",
    "#### Transposed Conv\n",
    "- Transposed convolution works as illustrated below: each value of the input feature map acts as a scalar that multiplies the whole kernel, and the scaled kernels are written into the larger output, summed where they overlap.\n",
    "\n",
    "![image](https://hackmd.io/_uploads/B1I0ZgHOa.png)\n",
    "\n",
    "- It can cause checkerboard artifacts, especially when the kernel size is not divisible by the stride, so that the kernel footprints overlap unevenly.\n",
    "\n",
    "#### Alternative: UpSampling (Unpooling) + Convolution\n",
    "- First upsample the feature map (Upsample / Unpooling)\n",
    "- Then apply a convolution (`padding='same'`)\n",
    "\n",
    "![Checkerboard artifacts](https://hackmd.io/_uploads/HJqiUxBup.png)\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "417e3e19",
   "metadata": {
    "id": "417e3e19"
   },
   "outputs": [],
   "source": [
    "# Conv2DTranspose with kernel_size=2, strides=2, padding='valid' upsamples 28x28 to 56x56:\n",
    "# output size = (in - 1) * stride + kernel = 27 * 2 + 2 = 56\n",
    "\n",
    "x = np.zeros((1, 28, 28, 3), dtype=np.float32)\n",
    "x = layers.Conv2DTranspose(30, kernel_size=(2, 2),\n",
    "                           strides=(2, 2), padding='valid')(x)\n",
    "x.shape  # -> TensorShape([1, 56, 56, 30])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "e517210a",
   "metadata": {
    "id": "e517210a"
   },
   "outputs": [],
   "source": [
    "class UpSampleConvs(layers.Layer):\n",
    "    def __init__(self, out_ch, padding='same'):\n",
    "        super().__init__()\n",
    "        self.conv = layers.Conv2D(out_ch, (3, 3),\n",
    "                                  strides=(1, 1), padding=padding)\n",
    "        self.relu = layers.Activation('relu')\n",
    "        self.upSample = layers.UpSampling2D(size=2)\n",
    "        # self.INorm = tfa.layers.InstanceNormalization(axis=3,\n",
    "        #                                              center=True,\n",
    "        #                                              scale=True)\n",
    "\n",
    "    def call(self, x):\n",
    "        x = self.upSample(x)  # nearest-neighbor by default: (H, W) -> (2H, 2W)\n",
    "        x = self.conv(x)\n",
    "        # x = self.INorm(x)\n",
    "        x = self.relu(x)\n",
    "        return x"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "722610e9",
   "metadata": {
    "id": "722610e9"
   },
   "outputs": [],
   "source": [
    "x = np.zeros((1, 28, 28, 3), dtype=np.float32)\n",
    "x = UpSampleConvs(30)(x)\n",
    "print(x.shape)  # -> (1, 56, 56, 30)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a5aa087c",
   "metadata": {
    "id": "a5aa087c"
   },
   "source": [
    "### Decoder (upsampling) module\n",
    "At each level the decoder upsamples, concatenates the matching encoder feature map (skip connection) along the channel axis, and applies a `convBlock`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "f2c44fc7",
   "metadata": {
    "id": "f2c44fc7"
   },
   "outputs": [],
   "source": [
    "class Decoder(layers.Layer):\n",
    "    def __init__(self, chs=(256, 128, 64, 32), padding='same'):\n",
    "        super().__init__()\n",
    "\n",
    "        self.chs = chs\n",
    "        self.padding = padding\n",
    "        # Upsample, then convolve\n",
    "        self.upconvs = [UpSampleConvs(ch, padding=padding) for ch in chs]\n",
    "        self.FPN_dec_ftrs = [convBlock(ch, padding=padding) for ch in chs]\n",
    "\n",
    "    def call(self, x, encoder_features):\n",
    "        for i in range(len(self.chs)):\n",
    "            x = self.upconvs[i](x)\n",
    "            enc_ftrs = encoder_features[i]\n",
    "            if self.padding == 'valid':\n",
    "                # Unpadded convolutions shrink x, so center-crop the skip connection to match\n",
    "                enc_ftrs = self.crop(enc_ftrs, x)\n",
    "            x = tf.concat([x, enc_ftrs], axis=-1)\n",
    "            x = self.FPN_dec_ftrs[i](x)\n",
    "        return x\n",
    "\n",
    "    def crop(self, enc_ftrs, x):\n",
    "        _, H, W, _ = x.shape\n",
    "        enc_ftrs = layers.CenterCrop(H, W)(enc_ftrs)\n",
    "        return enc_ftrs"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "85cea334",
   "metadata": {
    "id": "85cea334"
   },
   "outputs": [],
   "source": [
    "decoder = Decoder()\n",
    "# Deepest encoder feature map (bottleneck): (1, HEIGHT // 16, WIDTH // 16, 512)\n",
    "x = np.zeros((1, HEIGHT // 16, WIDTH // 16, 512), dtype=np.float32)\n",
    "# Skip connections from the encoder above, deepest first, excluding the bottleneck\n",
    "print(decoder(x, features[::-1][1:]).shape)  # -> (1, 512, 512, 32)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "9427e6b3",
   "metadata": {
    "id": "9427e6b3"
   },
   "source": [
    "## Building the UNet\n",
    "Combine the encoder and decoder to form the UNet.\n",
    "- If the output layer uses softmax for multi-class prediction, the number of output channels must be the number of classes + 1 (num_classes + background)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "62ebae1c",
   "metadata": {
    "id": "62ebae1c"
   },
   "outputs": [],
   "source": [
    "class UNet(Model):\n",
    "    def __init__(self, enc_chs=(64, 128, 256, 512, 1024),\n",
    "                 dec_chs=(512, 256, 128, 64),\n",
    "                 num_class=1, padding='same',\n",
    "                 retain_dim=None, activation=None):\n",
    "        super().__init__()\n",
    "        self.encoder = Encoder(enc_chs, padding=padding)\n",
    "        self.decoder = Decoder(dec_chs, padding=padding)\n",
    "        # 1x1 convolution: one output channel per class\n",
    "        self.head = layers.Conv2D(num_class, (1, 1),\n",
    "                                  strides=(1, 1), padding=padding)\n",
    "        self.retain_dim = retain_dim  # optional (height, width) to resize the output to\n",
    "        # activation: None, a name such as 'sigmoid' or 'softmax', or a callable\n",
    "        self.activation = layers.Activation(activation) if activation is not None else None\n",
    "\n",
    "    def call(self, inputs):\n",
    "        enc_ftrs = self.encoder(inputs)\n",
    "        # Feed the feature maps of all scales to the decoder: the deepest one is its input,\n",
    "        # and the rest (deepest first) are concatenated in the decoder as skip connections\n",
    "        outputs = self.decoder(enc_ftrs[::-1][0], enc_ftrs[::-1][1:])\n",
    "        outputs = self.head(outputs)\n",
    "\n",
    "        if self.retain_dim is not None:\n",
    "            outputs = tf.image.resize(outputs,\n",
    "                                      self.retain_dim,\n",
    "                                      method='nearest')\n",
    "\n",
    "        if self.activation is not None:\n",
    "            outputs = self.activation(outputs)\n",
    "\n",
    "        return outputs"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "e6e19c1d",
   "metadata": {
    "id": "e6e19c1d"
   },
   "outputs": [],
   "source": [
    "# tf.image.resize expects the target size as (height, width)\n",
    "unet = UNet(num_class=2, padding='same', retain_dim=(HEIGHT, WIDTH))\n",
    "x = np.zeros((1, HEIGHT, WIDTH, 3), dtype=np.float32)\n",
    "y_pred = unet(x)\n",
    "print(y_pred.shape)  # -> (1, 512, 512, 2)"
   ]
  }
 ],
 "metadata": {
  "accelerator": "GPU",
  "colab": {
   "provenance": []
  },
  "gpuClass": "standard",
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.12"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}