{
  "cells": [
    {
      "cell_type": "markdown",
      "id": "86b47daa",
      "metadata": {
        "id": "86b47daa"
      },
      "source": [
        "# Unet\n",
        "source: https://amaarora.github.io/2020/09/13/unet.html\n",
        "\n",
        "![image](https://hackmd.io/_uploads/B1gnWeHdT.png)\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "id": "c9d4caf7",
      "metadata": {
        "id": "c9d4caf7"
      },
      "outputs": [],
      "source": [
        "import numpy as np\n",
        "import pandas as pd\n",
        "import tensorflow as tf\n",
        "from tensorflow.keras import Input, Model, Sequential, layers\n",
        "# import tensorflow_addons as tfa"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "id": "22e481c6",
      "metadata": {
        "id": "22e481c6"
      },
      "outputs": [],
      "source": [
        "import warnings\n",
        "warnings.filterwarnings(\"ignore\")"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "id": "07244fe9",
      "metadata": {
        "id": "07244fe9"
      },
      "outputs": [],
      "source": [
        "BATCH_SIZE = 32\n",
        "NUM_LABELS = 1\n",
        "WIDTH = 512\n",
        "HEIGHT = 512"
      ]
    },
    {
      "cell_type": "markdown",
      "id": "c8527c9c",
      "metadata": {
        "id": "c8527c9c"
      },
      "source": [
        "## ConvBlock\n",
        "- 加入 Instance Norm.\n",
        "![image](https://hackmd.io/_uploads/BJkpZgSOa.png)\n",
        "\n",
        "> 上圖為一整個 batch 的 feature-map。輸入 6 張圖片，輸入 6 chs, 輸出也是 6 chs (C 方向看進去是 channel, N 方向看進去是圖片)"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "id": "b2d05400",
      "metadata": {
        "id": "b2d05400"
      },
      "outputs": [],
      "source": [
        "class convBlock(layers.Layer):\n",
        "    def __init__(self, out_ch, padding='same', kernel_size=3):\n",
        "        super().__init__()\n",
        "        kernel_size = kernel_size\n",
        "\n",
        "        self.conv_1 = layers.Conv2D(out_ch, (kernel_size, kernel_size),\n",
        "                                    strides=(1,1), padding=padding)\n",
        "        self.relu  = layers.Activation('relu')\n",
        "        self.conv_2 = layers.Conv2D(out_ch, (kernel_size, kernel_size),\n",
        "                                    strides=(1,1), padding=padding)\n",
        "        # self.INorm = tfa.layers.InstanceNormalization(axis=3, center=True, scale=True)\n",
        "\n",
        "    def call(self, input, training = None):\n",
        "        x = self.conv_1(input)\n",
        "        # x = self.INorm(x)\n",
        "        x = self.relu(x)\n",
        "        x = self.conv_2(x)\n",
        "        # x = self.INorm(x)\n",
        "        x = self.relu(x)\n",
        "        return x"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "id": "164f79e7",
      "metadata": {
        "id": "164f79e7"
      },
      "outputs": [],
      "source": [
        "block = convBlock(64)\n",
        "inputs = np.zeros((1, HEIGHT, WIDTH, 3), dtype=np.float32)\n",
        "block(inputs).shape"
      ]
    },
    {
      "cell_type": "markdown",
      "id": "eba40565",
      "metadata": {
        "id": "eba40565"
      },
      "source": [
        "## Encoder (DownStream)\n",
        "將影像進行編碼，過程中解析度會縮小 (maxpooling、convolution)"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "id": "b1c5ccef",
      "metadata": {
        "id": "b1c5ccef"
      },
      "outputs": [],
      "source": [
        "class Encoder(layers.Layer):\n",
        "    def __init__(self, chs=(32, 64, 128, 256, 512), padding='same'):\n",
        "        super().__init__()\n",
        "        self.FPN_enc_ftrs = [convBlock(chs[i]) for i in range(len(chs))]\n",
        "        self.pool = layers.MaxPooling2D(pool_size=(2, 2),\n",
        "                                        strides=(2, 2), padding=padding)\n",
        "\n",
        "    def call(self, x, training=None):\n",
        "        features = []\n",
        "        for block in self.FPN_enc_ftrs:\n",
        "            x = block(x)\n",
        "            features.append(x)\n",
        "            x = self.pool(x)\n",
        "        return features"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "id": "12333849",
      "metadata": {
        "id": "12333849"
      },
      "outputs": [],
      "source": [
        "encoder = Encoder()\n",
        "inputs = np.zeros((1, HEIGHT, WIDTH, 3), dtype=np.float32)\n",
        "features = encoder(inputs)\n",
        "for f in features:\n",
        "    print(f.shape)"
      ]
    },
    {
      "cell_type": "markdown",
      "id": "af545e6a",
      "metadata": {
        "id": "af545e6a"
      },
      "source": [
        "## Decoder (UpStream)\n",
        "將編碼還原成影像，過程中解析度會放大直到回復成輸入影像解析度 (transposed Convolution)。\n",
        "- 將編碼還原成影像是因為影像分割是 pixel-wise 的精度進行預測，解析度被還原後，就可以知道指定 pixel 位置所對應的類別\n",
        "- 類別資訊通常用 feature-map 的 channels(chs) 去劃分，一個 channel 代表一個 class\n",
        "- 有許多 UNet 模型架構會有輸入 576x576，但輸出只有 388x388 的情況，是因為他們沒有對卷積過程做 padding，導致解析度自然下降。最後只要把 mask resize 到 388x388 就能繼續計算 loss。"
      ]
    },
    {
      "cell_type": "markdown",
      "id": "e4b45ef7",
      "metadata": {
        "id": "e4b45ef7"
      },
      "source": [
        "### Transposed Conv and UpsampleConv\n",
        "\n",
        "Transposed Conv\n",
        "- 透過上面的操作做轉置卷積，feature-map 上的數值會作為常數與 kernel 相乘\n",
        "![image](https://hackmd.io/_uploads/B1I0ZgHOa.png)\n",
        "- 會導致 Gridding Effect (棋盤格效應)\n",
        "#### 替代方案 UpSampling(Unpooling)+Convolution\n",
        "- 先做上採樣 (Upsample/ Unpooling)\n",
        "- 然後作卷積 (padding = same)\n",
        "\n",
        "![棋盤格效應](https://hackmd.io/_uploads/HJqiUxBup.png)\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "id": "417e3e19",
      "metadata": {
        "id": "417e3e19"
      },
      "outputs": [],
      "source": [
        "# ConvTranspose2d 透過設定 k=2, s=2, output_padding=0 可以讓影像從 28x28 變成 56x56\n",
        "\n",
        "x = np.zeros((1, 28, 28, 3), dtype=np.float32)\n",
        "x = layers.Conv2DTranspose(30, kernel_size=(2, 2),\n",
        "                           strides=(2, 2), padding='valid')(x)\n",
        "x.shape"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "id": "e517210a",
      "metadata": {
        "id": "e517210a"
      },
      "outputs": [],
      "source": [
        "class UpSampleConvs(layers.Layer):\n",
        "    def __init__(self, out_ch, padding='same'):\n",
        "        super().__init__()\n",
        "        self.conv = layers.Conv2D(out_ch, (3, 3),\n",
        "                                  strides=(1, 1), padding=padding)\n",
        "        self.relu = layers.Activation('relu')\n",
        "        self.upSample = layers.UpSampling2D(size=2)\n",
        "#         self.INorm = tfa.layers.InstanceNormalization(axis=3,\n",
        "#                                                       center=True,\n",
        "#                                                       scale=True)\n",
        "\n",
        "    def call(self, x):\n",
        "        x = self.upSample(x)\n",
        "        x = self.conv(x)\n",
        "        # x = self.INorm(x)\n",
        "        x = self.relu(x)\n",
        "        return x"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "id": "722610e9",
      "metadata": {
        "id": "722610e9"
      },
      "outputs": [],
      "source": [
        "x = np.zeros((1, 28, 28, 3), dtype=np.float32)\n",
        "x = UpSampleConvs(30)(x)\n",
        "print(x.shape)"
      ]
    },
    {
      "cell_type": "markdown",
      "id": "a5aa087c",
      "metadata": {
        "id": "a5aa087c"
      },
      "source": [
        "### decoder (上採樣) module"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "id": "f2c44fc7",
      "metadata": {
        "id": "f2c44fc7"
      },
      "outputs": [],
      "source": [
        "class Decoder(layers.Layer):\n",
        "    def __init__(self, chs=(256, 128, 64, 32), padding='same'):\n",
        "        super().__init__()\n",
        "\n",
        "        self.chs = chs\n",
        "        self.padding = padding\n",
        "        # 上採樣後卷積\n",
        "        self.upconvs = [UpSampleConvs(chs[i], padding=padding)\n",
        "                        for i in range(len(chs))]\n",
        "        self.FPN_dec_ftrs = [convBlock(chs[i], padding=padding)\n",
        "                             for i in range(len(chs))]\n",
        "\n",
        "    def call(self, x, encoder_features):\n",
        "        for i in range(len(self.chs)):\n",
        "            enc_ftrs = encoder_features[i]\n",
        "            x = self.upconvs[i](x)\n",
        "\n",
        "            # enc_ftrs = self.crop(encoder_features[i], x)\n",
        "            x = layers.Concatenate(axis=-1)([x, enc_ftrs])\n",
        "            x = self.FPN_dec_ftrs[i](x)\n",
        "        return x\n",
        "\n",
        "    def crop(self, enc_ftrs, x):\n",
        "        _, H, W, _ = x.shape\n",
        "        enc_ftrs = layers.CenterCrop(H, W)(enc_ftrs)\n",
        "        return enc_ftrs"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "id": "85cea334",
      "metadata": {
        "id": "85cea334"
      },
      "outputs": [],
      "source": [
        "decoder = Decoder()\n",
        "decoder\n",
        "x = np.zeros((1, HEIGHT//16, WIDTH//16, 512), dtype=np.float32)\n",
        "print(decoder(x, features[::-1][1:]).shape)"
      ]
    },
    {
      "cell_type": "markdown",
      "id": "9427e6b3",
      "metadata": {
        "id": "9427e6b3"
      },
      "source": [
        "## Unet 構建\n",
        "結合 encoder 和 decoder 組成 Unet。\n",
        "- 在輸出層如果用 softmax 做多元分類問題預測的話，類別數量要 +1 (num_classes+background)"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "id": "62ebae1c",
      "metadata": {
        "id": "62ebae1c"
      },
      "outputs": [],
      "source": [
        "class UNet(Model):\n",
        "    def __init__(self, enc_chs=(64, 128, 256, 512, 1024),\n",
        "                 dec_chs=(512, 256, 128, 64),\n",
        "                 num_class=1, padding='same',\n",
        "                 retain_dim=None, activation=None):\n",
        "        super().__init__()\n",
        "        self.encoder = Encoder(enc_chs, padding=padding)\n",
        "        self.decoder = Decoder(dec_chs, padding=padding)\n",
        "        self.head = layers.Conv2D(num_class, (1, 1),\n",
        "                                  strides=(1, 1), padding=padding)\n",
        "        self.retain_dim = retain_dim\n",
        "        self.activation = activation\n",
        "\n",
        "    def call(self, inputs):\n",
        "        enc_ftrs = self.encoder(inputs)\n",
        "        # 把不同尺度的所有 featuremap 都輸入 decoder，我們在 decoder 需要做 featuremap 的拼接\n",
        "        outputs = self.decoder(enc_ftrs[::-1][0], enc_ftrs[::-1][1:])\n",
        "        outputs = self.head(outputs)\n",
        "\n",
        "        if self.retain_dim:\n",
        "            outputs = tf.image.resize(outputs,\n",
        "                                      self.retain_dim,\n",
        "                                      method='nearest')\n",
        "\n",
        "        if self.activation:\n",
        "            outputs = self.activation(outputs)\n",
        "\n",
        "        return outputs"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "id": "e6e19c1d",
      "metadata": {
        "id": "e6e19c1d"
      },
      "outputs": [],
      "source": [
        "unet = UNet(num_class=2, padding='same', retain_dim=(WIDTH, HEIGHT))\n",
        "x = np.zeros((1, WIDTH, HEIGHT, 3), dtype=np.float32)\n",
        "y_pred = unet(x)\n",
        "print(y_pred.shape)"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "id": "34b28b39",
      "metadata": {
        "id": "34b28b39"
      },
      "outputs": [],
      "source": []
    }
  ],
  "metadata": {
    "accelerator": "GPU",
    "colab": {
      "provenance": []
    },
    "gpuClass": "standard",
    "kernelspec": {
      "display_name": "Python 3 (ipykernel)",
      "language": "python",
      "name": "python3"
    },
    "language_info": {
      "codemirror_mode": {
        "name": "ipython",
        "version": 3
      },
      "file_extension": ".py",
      "mimetype": "text/x-python",
      "name": "python",
      "nbconvert_exporter": "python",
      "pygments_lexer": "ipython3",
      "version": "3.7.12"
    }
  },
  "nbformat": 4,
  "nbformat_minor": 5
}