{ "cells": [ { "cell_type": "markdown", "metadata": { "id": "ggFUZOmm3V8H" }, "source": [ "## $\\Large{Pandas\\; Exercises}$" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "6KNEKLNu3V8H" }, "outputs": [], "source": [ "import pandas as pd\n", "import numpy as np" ] }, { "cell_type": "markdown", "metadata": { "id": "oTwSt8wsthv7" }, "source": [ "## Sample Data" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "rac-V0Zzthv7" }, "outputs": [], "source": [ "data = {'animal': ['cat', 'cat', 'snake', 'dog', 'dog', 'cat', 'snake', 'cat', 'dog', 'dog'],\n", " 'age': [2.5, 3, 0.5, np.nan, 5, 2, 4.5, np.nan, 7, 3],\n", " 'visits': [1, 3, 2, 3, 2, 3, 1, 1, 2, 1],\n", " 'priority': ['yes', 'yes', 'no', 'yes', 'no', 'no', 'no', 'yes', 'no', 'no']}" ] }, { "cell_type": "markdown", "metadata": { "id": "vpvu3q3Tthv8" }, "source": [ "## Exercise 1\n", "Create a DataFrame from the dictionary `data` above and name it `df`." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "q13U4OLZthv8" }, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": { "id": "hMG5gNlmthv8" }, "source": [ "## Exercise 2\n", "Use `describe` to show summary statistics for `df`." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "M0xW_f1Othv9" }, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": { "id": "V-tlQJKbthv9" }, "source": [ "## Exercise 3\n", "Select the `animal` and `priority` columns from `df`." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "gadybog6thv9" }, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": { "id": "gErXk1Ybthv9" }, "source": [ "## Exercise 4\n", "Use `.loc` to select the rows of `df` with index 3, 4, and 8 and the columns `animal` and `age`.\n", "\n", "Sample output\n", "" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "QpvdTIlBthv-" }, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": { "id": "lVHOq1Dwthv-" }, "source": [ "## Exercise 5\n", "Select all rows of `df` whose `age` value is not missing.\n", "\n", "Sample output\n", "" ] }, { "cell_type": "code",
"execution_count": null, "metadata": { "id": "i6wQUEOothv-" }, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": { "id": "fcsHpO2Rthv-" }, "source": [ "## Exercise 6\n", "承上,在剔除掉age的遺漏資料後將df資料以age欄位由小到大排序" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "a04ZN9Jxthv_" }, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": { "id": "SFMiQXa3thv_" }, "source": [ "## Exercise 7\n", "找出df資料中age的最大值" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "A4fEjIYmthv_" }, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": { "id": "4d45AtFJthv_" }, "source": [ "## Exercise 8\n", "使用groupby方法依據priority欄位分組並計算visits的平均數\n", "\n", "範例輸出\n", "" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "8a5d0qaXthv_" }, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": { "id": "jkbSaGPmthv_" }, "source": [ "## Exercise 9\n", "繪製df資料中animal欄位的長條圖" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "f1yi6iUFthv_" }, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": { "id": "rJx7yq89thwA" }, "source": [ "## Exercise 10\n", "繪製df資料中age欄位的機率密度函數圖" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "1sCHxph_thwA" }, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": { "id": "ECE333NPthwA" }, "source": [ "---\n", "## 範例資料 2\n", "資料路徑: 'https://github.com/TA-aiacademy/course_3.0/releases/download/Python/airline.csv'" ] }, { "cell_type": "markdown", "metadata": { "id": "JuT8pRt-thwA" }, "source": [ "## Exercise 11\n", "\n", "將下列的csv檔讀取為pandas dataFrame型態並且命名為df。檔案包含 header且第一列是index" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "WGDNBGrfthwA" }, "outputs": [], "source": [ "csv_path = 'https://github.com/TA-aiacademy/course_3.0/releases/download/Python/airline.csv'" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": 
"yweE_yGGthwA" }, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": { "id": "xGABsDSlthwA" }, "source": [ "## Exercise 12\n", "承上,從df中隨機選取10筆資料印出" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "0afKE3rxthwB" }, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": { "id": "DtL3Ty_3thwB" }, "source": [ "## Exercise 13\n", "\n", "移除df中數值完全重複的列(rows),並同樣存回df變數中。" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "3Ti0f8aUthwB" }, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": { "id": "c7qxmXU0thwB" }, "source": [ "## Exercise 14\n", "\n", "印出df中每個欄位的遺漏值數量,將含有遺漏值的列(row)移除並同樣存回df變數中。" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "h1W0MtRothwB" }, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": { "id": "esS_1VhpthwB" }, "source": [ "## Excerise 15\n", "\n", "將df中第89行(index=199)的「src_airport」欄位取代為 `SFO`" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "cs4Jt5lUthwB" }, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": { "id": "BqHy0Yq7thwB" }, "source": [ "## Exercise 16\n", " \n", "複製一個新的資料表df2,並回傳一個和原本一模一樣的新資料表。\n", "\n", "註:你可以試著對df作一些修改,若df2也同樣被改動代表你沒有成功複製一個新的資料表" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "qkL3M6xethwC" }, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": { "id": "8CND-I1othwC" }, "source": [ "## Exercise 17\n", "重新編號df資料的索引使其變成連續(0, 1, 2,....),並將舊索引存成index欄位,將執行後的資料表命名為df_new" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "7WZl6bwfthwC" }, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": { "id": "OvonmBFxthwC" }, "source": [ "## Exercise 18\n", "\n", "將df資料轉換成Numpy array的物件類型,並且命名為df_np" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "Xq-xuRYAthwC" }, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": { "id": 
"Mg6ZFLI_thwC" }, "source": [ "### 請參考 [100-pandas-puzzles](https://github.com/ajcr/100-pandas-puzzles/blob/master/100-pandas-puzzles.ipynb) 做更多 pandas 的資料操作練習\n", "\n", "以下也有許多其他資源可供大家練習Pandas或做參考。\n", "* [10 minutes to pandas](http://pandas.pydata.org/pandas-docs/stable/10min.html)\n", "* [pandas basics](http://pandas.pydata.org/pandas-docs/stable/basics.html)\n", "* [tutorials](http://pandas.pydata.org/pandas-docs/stable/tutorials.html)\n", "* [cookbook and idioms](http://pandas.pydata.org/pandas-docs/version/0.17.0/cookbook.html#cookbook)\n", "* [Guilherme Samora's pandas exercises](https://github.com/guipsamora/pandas_exercises)" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.3" }, "colab": { "provenance": [] } }, "nbformat": 4, "nbformat_minor": 0 }