{
 "cells": [
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "1ad71161",
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "import pandas as pd\n",
    "import random"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "8685cbb3",
   "metadata": {},
   "source": [
    "Read the dataset for this quiz from the following URL: https://github.com/TA-aiacademy/course_3.0/releases/download/Python/housing.csv"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "08f684a7",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "id": "52974cba",
   "metadata": {},
   "source": [
    "First, display the summary information for the whole dataset.  \n",
    "Hint: info"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "ff234199",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "id": "e37beff5",
   "metadata": {},
   "source": [
    "The output above shows that longitude, latitude, total_bedrooms, and ocean_proximity contain missing values.  \n",
    "Next, print the first 8 rows of the dataset."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "1f3717f1",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "id": "54437801",
   "metadata": {},
   "source": [
    "Next, check the size (shape) of the dataset to see how many rows and columns it has."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "df45fe90",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "id": "519ba1d8",
   "metadata": {},
   "source": [
    "觀察了資料後，第一步，我們要先處理缺失值。  \n",
    "首先先找出含有 na 的全部資料。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "e470ea0e",
   "metadata": {},
   "outputs": [],
   "source": []
  },
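  {
   "cell_type": "markdown",
   "id": "26445227",
   "metadata": {},
   "source": [
    "The result above shows that 860 rows in total contain missing values.  \n",
    "We can also inspect only the rows where one specific column is missing. Next, select the rows where ocean_proximity is missing and store them in a separate variable."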
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "debbc5b0",
   "metadata": {},
   "outputs": [],
   "source": []
  },
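  {
   "cell_type": "markdown",
   "id": "7a2603c8",
   "metadata": {},
   "source": [
    "Now that we have inspected the rows with missing values, we can start handling them.  \n",
    "First, ocean_proximity is a categorical column. List all of its categories and the number of rows in each."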
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c34dddb0",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "id": "c911bbaf",
   "metadata": {},
   "source": [
    "Fill the missing values in ocean_proximity with the most frequent category."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "0743cbf4",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "id": "3a4f77f7",
   "metadata": {},
   "source": [
    "The remaining columns with missing values (longitude, latitude, and total_bedrooms) are numeric.  \n",
    "Next, drop every row that still contains a missing value."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "2df634a1",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "id": "463d2bb2",
   "metadata": {},
   "source": [
    "Finally, display the dataset's summary information again to confirm that no missing values remain."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "4a74b13d",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "id": "3a7ee7e9",
   "metadata": {},
   "source": [
    "Use describe to display the summary statistics of each column."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "ac479a56",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "id": "86b431f4",
   "metadata": {},
   "source": [
    "Sort the data by median_house_value in descending order."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "377c4d1e",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "id": "e5854f0a",
   "metadata": {},
   "source": [
    "Plot median_house_value as a histogram (hist), a kernel density estimate plot (kde), and a box plot (box)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "b6eb6bd7",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "80fa9e2c",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "03d634bf",
   "metadata": {},
   "outputs": [],
   "source": []
  },
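  {
   "cell_type": "markdown",
   "id": "1d08c958",
   "metadata": {},
   "source": [
    "Assign each row a level based on its median_house_value:  \n",
    "values at or below the 25th percentile are 'L';  \n",
    "values between the 25th and 75th percentiles are 'M';  \n",
    "values at or above the 75th percentile are 'H'. Store the result in a 'level' column."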
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "6dc5e147",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "id": "ef2d3912",
   "metadata": {},
   "source": [
    "Display the summary statistics of the ocean_proximity and level columns."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "3a66ae42",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "id": "cc7c2456",
   "metadata": {},
   "source": [
    "Use pivot_table with the level categories as rows and the ocean_proximity categories as columns,  \n",
    "and compute the mean median_house_value for each group."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "69faee81",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "id": "b37fa3b2",
   "metadata": {},
   "source": [
    "The pivot table above contains NaN in the ISLAND category. List all rows where ocean_proximity is ISLAND to find out why."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "ea244739",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "id": "9993ccef",
   "metadata": {},
   "source": [
    "Use groupby to compute the mean median_house_value for each category of the level column."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "6efd50f8",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "id": "0e7fafae",
   "metadata": {},
   "source": [
    "Finally, use a left join to merge the mean median_house_value of each level category into the dataset."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "99251339",
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.12"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}