{
 "cells": [
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "c96ce503",
   "metadata": {
    "id": "9d8e0fd8"
   },
   "source": [
    "# Naive Methods and Metrics of Forcasting"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "68f7c42e",
   "metadata": {
    "id": "455bccc3"
   },
   "source": [
    "這邊我們先給各位看一下使用numpy手刻完成一些最簡單的forcasting\n",
    "\n",
    "numpy手刻只是為了讓同學了解這些算法在使用python實現的方式，後面我們會附上一些簡單使用的套件，以後在使用時直接call套件的class或者funciton就可以簡單實現演算法"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "05d78867",
   "metadata": {
    "id": "ee521e0c"
   },
   "source": [
    "課程包含以下內容:\n",
    "- Data Split\n",
    "- Naive Forcasting\n",
    "    - K-Step Ahead\n",
    "    - Seasonal K-Step Ahead\n",
    "- Metrics for Forcasting"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "c5c5a0a2",
   "metadata": {
    "id": "da34f244"
   },
   "source": [
    "#### **開始前請先安裝或import基本套件**\n",
    "#### **若使用Jupyter Notebook開啟請轉成tree view方便顯示plotly出來的圖**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "96338ae3",
   "metadata": {
    "id": "e3f9e30a",
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "! pip install --user plotly"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "5bb2d3ce",
   "metadata": {
    "id": "3c4f6482"
   },
   "outputs": [],
   "source": [
    "%matplotlib notebook\n",
    "\n",
    "import numpy as np\n",
    "import matplotlib.pyplot as plt\n",
    "\n",
    "from plotly import express as px"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "77e8dbf9",
   "metadata": {
    "id": "25e0e629"
   },
   "source": [
    "**另外我們也先準備一個畫圖的function，我們不會放重點在這邊但後面會用它來看一些time series處理的過程**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "0b1b4f0f",
   "metadata": {
    "id": "357c230d"
   },
   "outputs": [],
   "source": [
    "def plot_series(time, series, start=0, end=None, labels=None, title=None):\n",
    "    #  Visualizes time series data\n",
    "    # Args:\n",
    "    #  time (array of int) - 時間點, 長度為T\n",
    "    #  series (list of array of int) - 時間點對應的資料列表，列表內時間序列數量為D，\n",
    "    #                                  每筆資料長度為T，若非為列表則轉為列表\n",
    "    #  start (int) - 開始的資料序(第幾筆)\n",
    "    #  end (int) -   結束繪製的資料序(第幾筆)\n",
    "    #  labels (list of strings)- 對於多時間序列或多維度的標註\n",
    "    #  title (string)- 圖片標題\n",
    "\n",
    "    # 若資料只有一筆，則轉為list\n",
    "    if type(series) != list:\n",
    "        series = [series]\n",
    "\n",
    "    if not end:\n",
    "        end = len(series[0])\n",
    "\n",
    "    if labels:\n",
    "        # 設立dictionary, 讓plotly畫訊號線時可以標註label\n",
    "        dictionary = {\"time\": time}\n",
    "        for idx, l in enumerate(labels):\n",
    "            # 截斷資料，保留想看的部分，並分段紀錄於dictionary中\n",
    "            dictionary.update({l: series[idx][start:end]})\n",
    "        # 畫訊號線\n",
    "        fig = px.line(dictionary,\n",
    "                      x=\"time\",\n",
    "                      y=list(dictionary.keys())[1:],\n",
    "                      width=1000,\n",
    "                      height=400,\n",
    "                      title=title)\n",
    "    else:\n",
    "        # 畫訊號線\n",
    "        fig = px.line(x=time, y=series, width=1000, height=400, title=title)\n",
    "    fig.show()"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "b2f17492",
   "metadata": {
    "id": "68a9cb51"
   },
   "source": [
    "這邊也附上我們的toy data產生器"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "79c8a77c",
   "metadata": {
    "id": "d7ba655a"
   },
   "outputs": [],
   "source": [
    "def trend(time, slope=0):\n",
    "    # 產生合成水平直線資料，其長度與時間等長，直線趨勢與設定slope相同\n",
    "    # Args:\n",
    "    #  time (array of int) - 時間點, 長度為T\n",
    "    #  slope (float) - 設定資料的傾斜程度與正負\n",
    "    # Returns:\n",
    "    #  series (array of float) -  產出slope 與設定相同的一條線\n",
    "\n",
    "    series = slope * time\n",
    "\n",
    "    return series\n",
    "\n",
    "\n",
    "def seasonal_pattern(season_time, pattern_type='triangle'):\n",
    "    # 產生某個特定pattern，\n",
    "    # Args:\n",
    "    #  season_time (array of float) - 周期內的時間點, 長度為T\n",
    "    #  pattern_type (str) -  這邊提供triangle與cosine\n",
    "    # Returns:\n",
    "    #  data_pattern (array of float) -  根據自訂函式產出特定的pattern\n",
    "\n",
    "    # 用特定function生成pattern\n",
    "    if pattern_type == 'triangle':\n",
    "        data_pattern = np.where(season_time < 0.5,\n",
    "                                season_time*2,\n",
    "                                2-season_time*2)\n",
    "    if pattern_type == 'cosine':\n",
    "        data_pattern = np.cos(season_time*np.pi*2)\n",
    "\n",
    "    return data_pattern\n",
    "\n",
    "\n",
    "def seasonality(time, period, amplitude=1, phase=30, pattern_type='triangle'):\n",
    "    # Repeats the same pattern at each period\n",
    "    # Args:\n",
    "    #   time (array of int) - 時間點, 長度為T\n",
    "    #   period (int) - 週期長度，必小於T\n",
    "    #   amplitude (float) - 序列幅度大小\n",
    "    #   phase (int) - 相位，為遞移量，正的向左(提前)、負的向右(延後)\n",
    "    #   pattern_type (str) -  這邊提供triangle與cosine\n",
    "    # Returns:\n",
    "    #   data_pattern (array of float) - 有指定周期、振幅、相位、pattern後的time series\n",
    "\n",
    "    # 將時間依週期重置為0\n",
    "    season_time = ((time + phase) % period) / period\n",
    "\n",
    "    # 產生週期性訊號並乘上幅度\n",
    "    data_pattern = amplitude * seasonal_pattern(season_time, pattern_type)\n",
    "\n",
    "    return data_pattern\n",
    "\n",
    "\n",
    "def noise(time, noise_level=1, seed=None):\n",
    "    # 合成雜訊，這邊用高斯雜訊，機率密度為常態分布\n",
    "    # Args:\n",
    "    #   time (array of int) - 時間點, 長度為T\n",
    "    #   noise_level (float) - 雜訊大小\n",
    "    #   seed (int) - 同樣的seed可以重現同樣的雜訊\n",
    "    # Returns:\n",
    "    #   noise (array of float) - 雜訊時間序列\n",
    "\n",
    "    # 做一個基於某個seed的雜訊生成器\n",
    "    rnd = np.random.RandomState(seed)\n",
    "\n",
    "    # 生與time同長度的雜訊，並且乘上雜訊大小 (不乘的話，標準差是1)\n",
    "    noise = rnd.randn(len(time)) * noise_level\n",
    "\n",
    "    return noise\n",
    "\n",
    "\n",
    "def toy_generation(time=np.arange(4 * 365),\n",
    "                   bias=500.,\n",
    "                   slope=0.1,\n",
    "                   period=180,\n",
    "                   amplitude=40.,\n",
    "                   phase=30,\n",
    "                   pattern_type='triangle',\n",
    "                   noise_level=5.,\n",
    "                   seed=2022):\n",
    "    signal_series = bias\\\n",
    "                  + trend(time, slope)\\\n",
    "                  + seasonality(time,\n",
    "                                period,\n",
    "                                amplitude,\n",
    "                                phase,\n",
    "                                pattern_type)\n",
    "    noise_series = noise(time, noise_level, seed)\n",
    "\n",
    "    series = signal_series+noise_series\n",
    "    return series"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "7af4a318",
   "metadata": {
    "id": "f2039fb2"
   },
   "outputs": [],
   "source": [
    "# 先合成資料\n",
    "time = np.arange(4 * 365)  # 定義時間點\n",
    "series_sample = toy_generation(time)  # 這就是我們合成出來的資料\n",
    "\n",
    "# 畫\n",
    "plot_series(time, series_sample)"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "fd71a5b7",
   "metadata": {
    "id": "c7aba289"
   },
   "source": [
    "## Data Split"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "cac05d0b",
   "metadata": {
    "id": "17730024"
   },
   "source": [
    "我們做機器學習一定要切訓練與測試集，但時間序列是會要求使用過去資料預測未來資料，所以切的時後須帶有序列性並且testing資料需要取training資料的後面的資料。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "eed6d07f",
   "metadata": {
    "id": "c342eb86"
   },
   "outputs": [],
   "source": [
    "def split(x, train_size):\n",
    "    # 最簡單直接取前後，並且時間點也記得要切，我們直接立個function\n",
    "    return x[..., :train_size], x[..., train_size:]\n",
    "\n",
    "\n",
    "time_train, time_test = split(time, 365*3)\n",
    "series_train, series_test = split(series_sample, 365*3)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "8fc15240",
   "metadata": {
    "id": "887df120"
   },
   "outputs": [],
   "source": [
    "# 畫一下\n",
    "plot_series(time_train, series_train, title=\"Training\")\n",
    "plot_series(time_test, series_test, title=\"Testing\")"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "a5ded3ca",
   "metadata": {
    "id": "dc645a08"
   },
   "source": [
    "## K-step Ahead"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "40236792",
   "metadata": {
    "id": "e25ce59a"
   },
   "source": [
    "這邊我們用最天真的方式來預測看看我們手上的資料，\n",
    "\n",
    "K-step ahead即是每次利用上K個時間資料直接預測下個時間點資料。\n",
    "\n",
    "這個前提是我們已經拿到K個時間點前的資料了，例如我們用今天的車流量去預測明天的車流量。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "89c912a7",
   "metadata": {
    "id": "0ef5b443"
   },
   "outputs": [],
   "source": [
    "# 使用前面時間點資料預測下時間點\n",
    "K = 1\n",
    "forcast = series_train[:-K]\n",
    "ground_truth_for_view = series_train[K:]\n",
    "time_for_view = time_train[K:]\n",
    "\n",
    "plot_series(time_for_view,\n",
    "            [forcast, ground_truth_for_view],\n",
    "            labels=['prediction', 'ground truth'])"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "c66bb897",
   "metadata": {
    "id": "1819fd10"
   },
   "source": [
    "可以看到巨觀來講會蠻準的，但是把顯示圖片zoom-in一下會看到實際上會有跟noise大小差不多的差距。"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "cd4fe9bf",
   "metadata": {
    "id": "abe4f900"
   },
   "source": [
    "## Seasonal K-step Ahead"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "a6596fe3",
   "metadata": {
    "id": "3895bf54"
   },
   "source": [
    "這個超參數K很吃資料，若不是對資料很清楚，會產生非常大的偏移。因為K-step ahead可看做對資料做時間平移。\n",
    "\n",
    "但也可以利用這個平移的特性，因為我們已經知道周期了，所以或許還可以使用一個周期來預測資料。\n",
    "\n",
    "但因為我們的case裡面有trend，所以直接apply週期上會不準"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "544c6a7d",
   "metadata": {
    "id": "a48d1d6f"
   },
   "outputs": [],
   "source": [
    "K = 5\n",
    "P = 180\n",
    "forcast = series_train[:-K-P]\n",
    "ground_truth_for_view = series_train[K+P:]\n",
    "time_for_view = time_train[K+P:]\n",
    "\n",
    "plot_series(time_for_view,\n",
    "            [forcast, ground_truth_for_view],\n",
    "            labels=['prediction', 'ground truth'])"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "61f27111",
   "metadata": {
    "id": "0ebac4e7"
   },
   "source": [
    "我們完成一個predictor可以把它包起來包成一個function，以供後續使用\n",
    "\n",
    "很多time series forcasting method都會讓資料喪失資料點，會喪失K個"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "af6d9ad9",
   "metadata": {
    "id": "92ad026f"
   },
   "outputs": [],
   "source": [
    "def k_step_ahead(data, k):\n",
    "    # 產生k-step ahead預測\n",
    "    # Args:\n",
    "    #  data (array of float) - 輸入資料\n",
    "    # Returns:\n",
    "    #  forcast (array of float) -  k-step ahead預測結果\n",
    "\n",
    "    forcast = data[:-k]\n",
    "    return forcast"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "f403fc89",
   "metadata": {
    "id": "0db6e732",
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "# 試用\n",
    "K = 3\n",
    "forcast = k_step_ahead(series_train, K)\n",
    "ground_truth_for_view = series_train[K:]\n",
    "time_for_view = time_train[K:]\n",
    "\n",
    "plot_series(time_for_view, \n",
    "            [forcast, ground_truth_for_view],\n",
    "            labels=['prediction', 'ground truth'])"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "b7d33b7f",
   "metadata": {
    "id": "358971a7"
   },
   "source": [
    "## Metrics for Forcasting"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "30c573ea",
   "metadata": {
    "id": "5dfb7931"
   },
   "source": [
    "當有了預測方式後，我們可以用一些指標衡量這些預測，通常是藉由error來計算的\n",
    "<img src=\"https://hackmd.io/_uploads/rJ0CGvbY3.png\" width=400 align=\"right\">\n",
    "\n",
    "\n",
    "**一些常見做法:**\n",
    "- Mean Absolute Error (MAE)\n",
    "- Mean Square Error (MSE)\n",
    "- R square Score (R2)\n",
    "\n",
    "下面我們自己刻一下"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "0f913681",
   "metadata": {
    "id": "3e1ec34a"
   },
   "source": [
    "### Home Brew Metric Function"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "62886e33",
   "metadata": {
    "id": "e126474b"
   },
   "source": [
    "<img src=\"https://hackmd.io/_uploads/SyIxQvZYh.png\" width=200 align=\"left\">"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c54efa49",
   "metadata": {
    "id": "091dbe35"
   },
   "outputs": [],
   "source": [
    "def MAE(pred, gt):\n",
    "    # 計算Mean Absolute Error\n",
    "    # Args:\n",
    "    #  pred (array of float) - 預測資料\n",
    "    #  gt (array of float) - 答案資料\n",
    "    # Returns:\n",
    "    #  計算結果 (float)\n",
    "    return abs(pred-gt).mean()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "d75a3734",
   "metadata": {
    "id": "6646e8dc"
   },
   "outputs": [],
   "source": [
    "MAE(forcast, series_train[K:])"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "09531b12",
   "metadata": {
    "id": "a78748fa"
   },
   "source": [
    "<img src=\"https://hackmd.io/_uploads/SyabXvbFh.png\" width=200 align=\"left\">"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "76c05839",
   "metadata": {
    "id": "cc543f6b"
   },
   "outputs": [],
   "source": [
    "def MSE(pred, gt):\n",
    "    # 計算Mean Square Error\n",
    "    # Args:\n",
    "    #  pred (array of float) - 預測資料\n",
    "    #  gt (array of float) - 答案資料\n",
    "    # Returns:\n",
    "    #  計算結果 (float)\n",
    "    return pow(pred-gt, 2).mean()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "766bd6bd",
   "metadata": {
    "id": "a767b2ca"
   },
   "outputs": [],
   "source": [
    "MSE(forcast, series_train[K:])"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "eb4b4e26",
   "metadata": {
    "id": "8d678eed"
   },
   "source": [
    "<img src=\"https://hackmd.io/_uploads/rJiEXvWYn.png\" width=400 align=\"left\">"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c0b98f37",
   "metadata": {
    "id": "da2f8e1a"
   },
   "outputs": [],
   "source": [
    "def R2(pred, gt):\n",
    "    # 計算R square score\n",
    "    # Args:\n",
    "    #  pred (array of float) - 預測資料\n",
    "    #  gt (array of float) - 答案資料\n",
    "    # Returns:\n",
    "    #  計算結果 (float)\n",
    "    return 1-pow(pred-gt, 2).sum()/pow(gt-gt.mean(), 2).sum()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "fc8ea503",
   "metadata": {
    "id": "f34a4d75"
   },
   "outputs": [],
   "source": [
    "R2(forcast, series_train[K:])\n",
    "\n",
    "# 輸出值會在 [-inf~1]，若等於0代表error與原本訊號的"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "3dfe3659",
   "metadata": {
    "id": "0e5fced8"
   },
   "source": [
    "### Call Function from Module"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "35a6039f",
   "metadata": {
    "id": "bcc37e2a"
   },
   "source": [
    "而在tensorflow有直接使用的套件，都在```tensorflow.keras.metrics```裡面。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "6e2474b7",
   "metadata": {
    "id": "09bb778a"
   },
   "outputs": [],
   "source": [
    "import tensorflow.keras.metrics as metrics"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "82afe640",
   "metadata": {
    "id": "cb76e4db"
   },
   "outputs": [],
   "source": [
    "# MAE\n",
    "print(metrics.mean_absolute_error(forcast, series_train[K:]))\n",
    "# MSE\n",
    "print(metrics.mean_squared_error(forcast, series_train[K:]))"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "0d1e0ac1",
   "metadata": {
    "id": "15f813ea"
   },
   "source": [
    "比較特別的是它們會用tf.Tensor的方式跑，也會儲存為tf.Tensor。\n",
    "\n",
    "如果自己設計training+validation機制，用tensorflow的這幾個算法可以讓Metric計算都在GPU裡面執行"
   ]
  }
 ],
 "metadata": {
  "colab": {
   "provenance": []
  },
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.12"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}