{ "cells": [ { "cell_type": "code", "execution_count": null, "id": "1ad71161", "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "import pandas as pd\n", "import random" ] }, { "cell_type": "markdown", "id": "8685cbb3", "metadata": {}, "source": [ "Read the dataset for this quiz from the following URL: https://github.com/TA-aiacademy/course_3.0/releases/download/Python/housing.csv" ] }, { "cell_type": "code", "execution_count": null, "id": "08f684a7", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "id": "52974cba", "metadata": {}, "source": [ "First, inspect the overall information about the dataset. \n", "Hint: `info`" ] }, { "cell_type": "code", "execution_count": null, "id": "ff234199", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "id": "e37beff5", "metadata": {}, "source": [ "The output above shows that longitude, latitude, total_bedrooms, and ocean_proximity contain missing values. \n", "Next, print the first 8 rows of the dataset." ] }, { "cell_type": "code", "execution_count": null, "id": "1f3717f1", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "id": "54437801", "metadata": {}, "source": [ "Then check the size (shape) of the dataset to find out how many rows and columns it has." ] }, { "cell_type": "code", "execution_count": null, "id": "df45fe90", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "id": "519ba1d8", "metadata": {}, "source": [ "Now that we have looked at the data, the first step is to handle the missing values. \n", "Start by finding all rows that contain at least one NA value." ] }, { "cell_type": "code", "execution_count": null, "id": "e470ea0e", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "id": "26445227", "metadata": {}, "source": [ "The output above shows that 860 rows contain at least one missing value. \n", "We can also look only at the rows where one particular column is missing. Next, select the rows whose ocean_proximity is missing and store them in a separate variable." ] }, { "cell_type": "code", "execution_count": null, "id": "debbc5b0", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "id": "7a2603c8", "metadata": {}, "source": [ "Having examined the rows with missing values, we can start handling them. \n", "ocean_proximity is a categorical column: list all of its categories and the number of rows in each." ] }, { "cell_type": "code", "execution_count": null, "id": "c34dddb0", "metadata": {}, "outputs":
[], "source": [] }, { "cell_type": "markdown", "id": "c911bbaf", "metadata": {}, "source": [ "Fill the missing values in ocean_proximity with the most frequent category." ] }, { "cell_type": "code", "execution_count": null, "id": "0743cbf4", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "id": "3a4f77f7", "metadata": {}, "source": [ "The remaining columns with missing values (longitude, latitude, and total_bedrooms) are numeric. \n", "Next, drop all rows that still contain missing values." ] }, { "cell_type": "code", "execution_count": null, "id": "2df634a1", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "id": "463d2bb2", "metadata": {}, "source": [ "Finally, display the dataset information again to confirm that no missing values remain." ] }, { "cell_type": "code", "execution_count": null, "id": "4a74b13d", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "id": "3a7ee7e9", "metadata": {}, "source": [ "Use describe to show summary statistics for each column." ] }, { "cell_type": "code", "execution_count": null, "id": "ac479a56", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "id": "86b431f4", "metadata": {}, "source": [ "Sort the data by median_house_value in descending order." ] }, { "cell_type": "code", "execution_count": null, "id": "377c4d1e", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "id": "e5854f0a", "metadata": {}, "source": [ "Plot a histogram (hist), a kernel density estimate (kde), and a box plot (box) of median_house_value." ] }, { "cell_type": "code", "execution_count": null, "id": "b6eb6bd7", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "80fa9e2c", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "03d634bf", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "id": "1d08c958", "metadata": {}, "source": [ "Based on median_house_value, \n", "label values at or below the 25th percentile as 'L'; \n", "values strictly between the 25th and 75th percentiles as 'M'; \n", "values at or above the 75th percentile as 'H', and store the labels in a 'level' column." ] }, { "cell_type": "code", "execution_count": null, "id": "6dc5e147", "metadata": {}, "outputs": [], "source": [] }, { "cell_type":
"markdown", "id": "ef2d3912", "metadata": {}, "source": [ "Show the statistics (category counts) for ocean_proximity and level." ] }, { "cell_type": "code", "execution_count": null, "id": "3a66ae42", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "id": "cc7c2456", "metadata": {}, "source": [ "Use pivot_table with the categories of level as rows and the categories of ocean_proximity as columns, \n", "and compute the mean median_house_value for each group." ] }, { "cell_type": "code", "execution_count": null, "id": "69faee81", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "id": "b37fa3b2", "metadata": {}, "source": [ "The pivot table above contains NaN in the ISLAND column. List all rows whose ocean_proximity is ISLAND to find out why." ] }, { "cell_type": "code", "execution_count": null, "id": "ea244739", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "id": "9993ccef", "metadata": {}, "source": [ "Use groupby to compute the mean median_house_value for each category of level." ] }, { "cell_type": "code", "execution_count": null, "id": "6efd50f8", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "id": "0e7fafae", "metadata": {}, "source": [ "Finally, use a left join on level to merge the mean median_house_value of each level category back into the dataset." ] }, { "cell_type": "code", "execution_count": null, "id": "99251339", "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.12" } }, "nbformat": 4, "nbformat_minor": 5 }