{ "cells": [ { "cell_type": "markdown", "metadata": { "id": "maiBHYDsAycF" }, "source": [ "# StyleGAN" ] }, { "cell_type": "markdown", "metadata": { "id": "OhSQeY2tAycU" }, "source": [ "### Chapter outline\n", "* [StyleGAN](#StyleGAN)\n", "* [StyleGAN in Anime dataset (by Gwern)](#StyleGAN-in-Anime-dataset-(by-Gwern))" ] }, { "cell_type": "markdown", "metadata": { "id": "XKX-EhQAAycX" }, "source": [ "This chapter demos StyleGAN, whose weights were released in early 2019. The tutorial uses the weights and code cloned from https://github.com/NVlabs/stylegan, with some modifications. Because the model was trained with TF 1.x, the teaching assistants adjusted some version-specific details here. If you clone the GitHub repository to your own machine, remember to use TF 1.x; alternatively, you can copy this course material to your machine and run it with TF 2.0." ] }, { "cell_type": "markdown", "metadata": { "id": "xcgM4usGAyca" }, "source": [ "In the StyleGAN generator architecture, the earlier layers can be understood as sketching the outline, while the later layers paint the fine details.\n", "\n", "StyleGAN inherits its discriminator from ProgressiveGAN, which is basically also built on the PatchGAN idea." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "YM7J-tpdAycj" }, "outputs": [], "source": [ "# Download and unzip the course data\n", "!wget -q https://github.com/TA-aiacademy/course_3.0/releases/download/v2.5_gan/GAN_part4.zip\n", "!unzip -q GAN_part4.zip" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "-U9UUylSAycd" }, "outputs": [], "source": [ "import os\n", "import pickle\n", "import numpy as np\n", "import PIL.Image\n", "import dnnlib\n", "import dnnlib.tflib as tflib\n", "import matplotlib.pyplot as plt\n", "\n", "import imageio\n", "import glob\n", "from IPython.display import display, Image\n", "import cv2" ] }, { "cell_type": "markdown", "metadata": { "id": "cDGJ7MauAych" }, "source": [ "### Load the weights for generating high-resolution face images" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "E_r2M3LDAycl" }, "outputs": [], "source": [ "url = 'cache/2019stylegan-ffhq-1024x1024_mod.pkl'\n", "\n", "tflib.init_tf()\n", "with open(url, 'rb') as f:\n", "    _G, _D, Gs = pickle.load(f)" ] }, { 
"cell_type": "markdown", "metadata": { "id": "J9A71hbmAycn" }, "source": [ "### 產生隨機的圖片" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "nuR445fJAycq" }, "outputs": [], "source": [ "# 隨機sample一組潛在向量(latent vector)來產生圖片\n", "rnd = np.random.RandomState(420)\n", "latents = rnd.randn(1, Gs.input_shape[1])" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "IEwYLMnUAyct" }, "outputs": [], "source": [ "# Generate image\n", "fmt = dict(func=tflib.convert_images_to_uint8, nchw_to_nhwc=True)\n", "images = Gs.run(latents, None, truncation_psi=0.7, randomize_noise=True, output_transform=fmt)\n", "\n", "plt.figure(figsize=(10,10))\n", "plt.imshow(images[-1])\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": { "id": "lYjOmnuaAycw" }, "source": [ "### 試著一次改變vector中的一個element來看看有什麼變化吧" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "Tfx7gYpuAycx" }, "outputs": [], "source": [ "# 先固定一個值都是 1 的向量\n", "latents = np.ones((1, Gs.input_shape[1]))" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "rjYp09h2Aycz" }, "outputs": [], "source": [ "# 產生圖片\n", "fmt = dict(func=tflib.convert_images_to_uint8, nchw_to_nhwc=True)\n", "images = Gs.run(latents, None, truncation_psi=0.7, randomize_noise=True, output_transform=fmt)\n", "\n", "plt.figure(figsize=(10, 10))\n", "plt.imshow(images[-1])\n", "plt.show()" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true, "jupyter": { "outputs_hidden": true }, "id": "wiFINIRjAyc0" }, "outputs": [], "source": [ "save_path = './exp_img/lat_18' # 向量總共有 512 維,如果想要改變第18維就修改成 lat_18\n", "\n", "if not os.path.exists(save_path):\n", " os.makedirs(save_path)\n", "\n", "ind = int(save_path.split('_')[-1])\n", "\n", "for i in np.arange(-15, 16, 0.5): # 每次改變 0.5 的值看看\n", " latents[0][ind] = i\n", " fmt = dict(func=tflib.convert_images_to_uint8, nchw_to_nhwc=True)\n", " images = Gs.run(latents, None, truncation_psi=0.7, 
randomize_noise=True, output_transform=fmt)\n", "\n", "    plt.figure(figsize=(5, 5))\n", "    plt.imshow(images[-1])\n", "\n", "    plt.savefig(os.path.join(save_path, 'image_{:03f}.png'.format(i)))\n", "\n", "    plt.show()" ] }, { "cell_type": "markdown", "metadata": { "id": "6WxsIcqsAyc2" }, "source": [ "The GIF below makes it easy to see how changing different elements affects the output." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "JtaoxlzUAyc3" }, "outputs": [], "source": [ "# Build the GIF with imageio\n", "anim_file = save_path + '/anim.gif'\n", "\n", "with imageio.get_writer(anim_file, mode='I') as writer:\n", "\n", "    filenames = glob.glob(save_path + '/image*.png')\n", "#     filenames = sorted(filenames)\n", "    # Sort by modification time so the frames follow the order in which they were generated\n", "    filenames.sort(key=lambda x: os.path.getmtime(x))\n", "\n", "    last = -1\n", "    for i, filename in enumerate(filenames):\n", "        # Keep only a subset of frames, spaced further apart as i grows\n", "        frame = 2*(i**0.5)\n", "        if round(frame) > round(last):\n", "            last = frame\n", "        else:\n", "            continue\n", "        image = imageio.imread(filename)\n", "        writer.append_data(image)\n", "    # Append the last image once more after the loop\n", "    image = imageio.imread(filename)\n", "    writer.append_data(image)\n", "\n", "display(Image(filename=anim_file))" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "aoGntZyCAyc6" }, "outputs": [], "source": [ "# change the first element in latent vector\n", "display(Image(filename='./exp_img/lat_0/anim.gif'))" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "FN9oCJlUAyc8" }, "outputs": [], "source": [ "# change the 256th element in latent vector\n", "display(Image(filename='./exp_img/lat_255/anim.gif'))" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "qAEVAvegAyc9" }, "outputs": [], "source": [ "# change the last element in latent vector\n", "display(Image(filename='./exp_img/lat_511/anim.gif'))" ] }, 
{ "cell_type": "markdown", "metadata": { "id": "gE7_06o9Ayc-" }, "source": [ "# Style mixing" ] }, { "cell_type": "markdown", "metadata": { "id": "-rhc2716Ayc_" }, "source": [ "StyleGAN can do more than generate a realistic image at random like an ordinary GAN: thanks to its layer-by-layer structure, it can also perform style transfer at different levels of detail." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "osQWl_BTAydA" }, "outputs": [], "source": [ "def draw_style_mixing_figure(png, Gs, w, h, src_seeds, dst_seeds, style_ranges):\n", "    print(png)\n", "    # Pass a list to np.stack; recent NumPy versions reject a bare generator\n", "    src_latents = np.stack([np.random.RandomState(seed).randn(Gs.input_shape[1]) for seed in src_seeds])\n", "    dst_latents = np.stack([np.random.RandomState(seed).randn(Gs.input_shape[1]) for seed in dst_seeds])\n", "    src_dlatents = Gs.components.mapping.run(src_latents, None) # [seed, layer, component]\n", "    dst_dlatents = Gs.components.mapping.run(dst_latents, None) # [seed, layer, component]\n", "    src_images = Gs.components.synthesis.run(src_dlatents, randomize_noise=False, **synthesis_kwargs)\n", "    dst_images = Gs.components.synthesis.run(dst_dlatents, randomize_noise=False, **synthesis_kwargs)\n", "\n", "    canvas = PIL.Image.new('RGB', (w * (len(src_seeds) + 1), h * (len(dst_seeds) + 1)), 'white')\n", "    for col, src_image in enumerate(list(src_images)):\n", "        canvas.paste(PIL.Image.fromarray(src_image, 'RGB'), ((col + 1) * w, 0))\n", "    for row, dst_image in enumerate(list(dst_images)):\n", "        canvas.paste(PIL.Image.fromarray(dst_image, 'RGB'), (0, (row + 1) * h))\n", "        # Copy the destination dlatents, then overwrite the selected layers with the source styles\n", "        row_dlatents = np.stack([dst_dlatents[row]] * len(src_seeds))\n", "        row_dlatents[:, style_ranges[row]] = src_dlatents[:, style_ranges[row]]\n", "        row_images = Gs.components.synthesis.run(row_dlatents, randomize_noise=False, **synthesis_kwargs)\n", "        for col, image in enumerate(list(row_images)):\n", "            canvas.paste(PIL.Image.fromarray(image, 'RGB'), ((col + 1) * w, (row + 1) * h))\n", "    canvas.save(png)\n", "\n", "synthesis_kwargs = dict(output_transform=dict(func=tflib.convert_images_to_uint8, nchw_to_nhwc=True),\n", "                        minibatch_size=8)\n", "\n", "result_dir = 'results'\n", "if not os.path.exists(result_dir):\n", "    os.makedirs(result_dir)\n", "\n", "draw_style_mixing_figure(os.path.join(result_dir, 'style-mixing-human8.png'), Gs, w=1024, 
h=1024,\n", "                         src_seeds=[639, 701, 687, 615, 2268],\n", "                         dst_seeds=[888, 829, 1898, 1733, 1614, 845, 1450, 2266],\n", "                         style_ranges=[range(0, 4)]*2+[range(4, 8)]*2+[range(8, 12)]*2+[range(12, 16)]*2)" ] }, { "cell_type": "markdown", "metadata": { "id": "shWs4hY2AydB" }, "source": [ "### Mixing results\n", "In the figure below, the first row shows the randomly generated source images and the first column shows the randomly generated destination images. Replacing the intermediate latent vectors of some layers of a destination image with the corresponding layers of a source image changes its style. The example works in units of two rows: each unit replaces four layers at a time, namely layers 0-3, 4-7, 8-11, and 12-15. Replacing the earliest layers causes large changes, switching the overall face shape and pose to the other style, whereas later layers may change only the facial features, and the last few layers change only the overall color tone and fine details." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "Wrm0vfDrAydC" }, "outputs": [], "source": [ "mix_img = cv2.imread('results/style-mixing-human8.png')\n", "plt.figure(figsize=(15, 25))\n", "plt.imshow(cv2.cvtColor(mix_img, cv2.COLOR_BGR2RGB))\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": { "id": "gHrep_GNAydD" }, "source": [ "# StyleGAN in Anime dataset (by Gwern)" ] }, { "cell_type": "markdown", "metadata": { "id": "oJSb5nsHAydE" }, "source": [ "After Nvidia released such a powerful model, many experts tried it out. Gwern scraped a large collection of anime character images, preprocessed them, and trained the model on them. Below we use the pre-trained weights he released; interested students can experiment with them too. Link: https://www.gwern.net/Faces#" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "hwclZ5LZAydF" }, "outputs": [], "source": [ "url = 'cache/2019-04-30-stylegan-danbooru2018-portraits-02095-066083_mod.pkl'\n", "\n", "tflib.init_tf()\n", "with open(url, 'rb') as f:\n", "    _G, _D, Gs = pickle.load(f)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "aX2nHNJtAydF" }, "outputs": [], "source": [ "# Create an all-zero vector\n", "latents = np.zeros((1, Gs.input_shape[1]))" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "EMkf7ntYAydG" }, "outputs": [], "source": [ "# Generate image.\n", "fmt = dict(func=tflib.convert_images_to_uint8, nchw_to_nhwc=True)\n", "images = Gs.run(latents, None, truncation_psi=0.5, randomize_noise=True, output_transform=fmt)\n", "\n", 
"plt.figure(figsize=(10,10))\n", "plt.imshow(images[-1])\n", "plt.show()" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "BPRg9FT4AydI" }, "outputs": [], "source": [ "save_path = './exp_img_anim/lat_255' # 向量總共有 512 維,如果想要改變第255維就修改成 lat_255\n", "\n", "plt.ioff() # 用這個 method 就能不要把圖 plot 出來\n", "\n", "if not os.path.exists(save_path):\n", " os.makedirs(save_path)\n", "\n", "\n", "ind = int(save_path.split('_')[-1])\n", "\n", "for i in np.arange(-0.004,0.0041,0.0001):\n", " latents[0][ind] = i\n", " fmt = dict(func=tflib.convert_images_to_uint8, nchw_to_nhwc=True)\n", " images = Gs.run(latents, None, truncation_psi=0.7, randomize_noise=False, output_transform=fmt)\n", "\n", " plt.figure(figsize=(5, 5))\n", " plt.imshow(images[-1])\n", "\n", " plt.savefig(os.path.join(save_path, 'image_{:03f}.png'.format(i)))\n", "\n", "# plt.show()\n", " plt.close()" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "cC1vTOubAydJ" }, "outputs": [], "source": [ "# 使用imageio製作gif圖\n", "anim_file = save_path + '/anim.gif'\n", "\n", "with imageio.get_writer(anim_file, mode='I') as writer:\n", "\n", " filenames = glob.glob(save_path + '/image*.png')\n", " filenames.sort(key=lambda x: os.path.getmtime(x))\n", "\n", " last = -1\n", " for i, filename in enumerate(filenames):\n", " frame = 4*(i**0.5)\n", " if round(frame) > round(last):\n", " last = frame\n", " else:\n", " continue\n", " image = imageio.imread(filename)\n", " writer.append_data(image)\n", " image = imageio.imread(filename)\n", " writer.append_data(image)\n", "\n", "display(Image(filename=anim_file))" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "9cdClmeAAydK" }, "outputs": [], "source": [ "display(Image(filename='./exp_img_anim/lat_0/anim.gif'))" ] }, { "cell_type": "markdown", "metadata": { "id": "S8AXAr0GAydL" }, "source": [ "# Style mixing" ] }, { "cell_type": "markdown", "metadata": { "id": "MZoKJrWZAydL" }, "source": [ "這部分與上面的人臉類似,就不在贅述" ] }, 
{ "cell_type": "code", "execution_count": null, "metadata": { "id": "lL-uigScAydM" }, "outputs": [], "source": [ "# sample a vector from Normal distribution\n", "rnd = np.random.RandomState(0)\n", "latents = rnd.randn(1, Gs.input_shape[1])" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "1Bf7QBvhAydO" }, "outputs": [], "source": [ "def draw_style_mixing_figure(png, Gs, w, h, src_seeds, dst_seeds, style_ranges):\n", "    print(png)\n", "    # Pass a list to np.stack; recent NumPy versions reject a bare generator\n", "    src_latents = np.stack([np.random.RandomState(seed).randn(Gs.input_shape[1]) for seed in src_seeds])\n", "    dst_latents = np.stack([np.random.RandomState(seed).randn(Gs.input_shape[1]) for seed in dst_seeds])\n", "    src_dlatents = Gs.components.mapping.run(src_latents, None) # [seed, layer, component]\n", "    dst_dlatents = Gs.components.mapping.run(dst_latents, None) # [seed, layer, component]\n", "    src_images = Gs.components.synthesis.run(src_dlatents, randomize_noise=False, **synthesis_kwargs)\n", "    dst_images = Gs.components.synthesis.run(dst_dlatents, randomize_noise=False, **synthesis_kwargs)\n", "\n", "    canvas = PIL.Image.new('RGB', (w * (len(src_seeds) + 1), h * (len(dst_seeds) + 1)), 'white')\n", "    for col, src_image in enumerate(list(src_images)):\n", "        canvas.paste(PIL.Image.fromarray(src_image, 'RGB'), ((col + 1) * w, 0))\n", "    for row, dst_image in enumerate(list(dst_images)):\n", "        canvas.paste(PIL.Image.fromarray(dst_image, 'RGB'), (0, (row + 1) * h))\n", "        # Copy the destination dlatents, then overwrite the selected layers with the source styles\n", "        row_dlatents = np.stack([dst_dlatents[row]] * len(src_seeds))\n", "        row_dlatents[:, style_ranges[row]] = src_dlatents[:, style_ranges[row]]\n", "        row_images = Gs.components.synthesis.run(row_dlatents, randomize_noise=False, **synthesis_kwargs)\n", "        for col, image in enumerate(list(row_images)):\n", "            canvas.paste(PIL.Image.fromarray(image, 'RGB'), ((col + 1) * w, (row + 1) * h))\n", "    canvas.save(png)\n", "\n", "synthesis_kwargs = dict(output_transform=dict(func=tflib.convert_images_to_uint8, nchw_to_nhwc=True),\n", "                        minibatch_size=8)\n", "\n", 
"draw_style_mixing_figure(os.path.join(result_dir, 'style-mixing-anim_8.png'), Gs, w=512, h=512,\n", " src_seeds=[639, 701, 687, 615, 2268],\n", " dst_seeds=[888, 829, 1898, 1733, 1614, 845, 1450, 2266],\n", " style_ranges=[range(0, 4)]*2+[range(4, 8)]*2+[range(8, 12)]*2+[range(12, 16)]*2)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "PBIhpD5NAydP" }, "outputs": [], "source": [ "mix_img = cv2.imread('results/style-mixing-anim_8.png')\n", "plt.figure(figsize=(16, 24))\n", "plt.imshow(cv2.cvtColor(mix_img, cv2.COLOR_BGR2RGB))\n", "plt.show()" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "1paYfOWIAydQ" }, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.12" }, "colab": { "provenance": [], "gpuType": "T4", "include_colab_link": true }, "accelerator": "GPU" }, "nbformat": 4, "nbformat_minor": 0 }