{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Loading Dataset & Quick Overview" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "import pandas as pd\n", "import matplotlib.pyplot as plt\n", "\n", "%matplotlib inline\n", "plt.rcParams['figure.figsize'] = (15, 15)\n", "# Load the data. sklearn.datasets.load_boston was removed in scikit-learn 1.2,\n", "# so read the original CMU file, where each record spans two physical lines.\n", "data_url = \"http://lib.stat.cmu.edu/datasets/boston\"\n", "raw_df = pd.read_csv(data_url, sep=r\"\\s+\", skiprows=22, header=None)\n", "data = np.hstack([raw_df.values[::2, :], raw_df.values[1::2, :2]])  # 13 features\n", "target = raw_df.values[1::2, 2]  # MEDV, the house price (label)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "print(data.shape)    # data ==> Features, (506, 13)\n", "print(target.shape)  # target ==> Label, (506,)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "df = pd.DataFrame(data, columns=[\"CRIM\", \"ZN\", \"INDUS\", \"CHAS\", \"NOX\", \"RM\", \"AGE\", \"DIS\", \"RAD\", \"TAX\", \"PTRATIO\", \"B\", \"LSTAT\"])  # the raw file has no DESCR attribute\n", "df.assign(MEDV=target).describe()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Plot Features vs. Y\n", "Can you explain how each feature relates to the house price?"
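] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A minimal plotting sketch, not part of the original assignment: one scatter plot of each feature against the target. It assumes `data` and `target` from the loading cell, and that the 13 columns follow the CMU file's order (the same order as the feature descriptions below). `feature_names` is a name introduced here for illustration. RM and LSTAT are good panels to start with." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import matplotlib.pyplot as plt\n", "\n", "# Hypothetical helper list: assumed column order of `data` from the loading cell\n", "feature_names = [\"CRIM\", \"ZN\", \"INDUS\", \"CHAS\", \"NOX\", \"RM\", \"AGE\",\n", "                 \"DIS\", \"RAD\", \"TAX\", \"PTRATIO\", \"B\", \"LSTAT\"]\n", "\n", "# 13 features on a 4x4 grid; the 3 leftover panels are hidden\n", "fig, axes = plt.subplots(4, 4, figsize=(15, 15))\n", "for i, ax in enumerate(axes.flat):\n", "    if i < len(feature_names):\n", "        ax.scatter(data[:, i], target, s=8, alpha=0.5)\n", "        ax.set_xlabel(feature_names[i])\n", "        ax.set_ylabel(\"MEDV ($1000s)\")\n", "    else:\n", "        ax.axis(\"off\")\n", "fig.tight_layout()\n", "plt.show()"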
] }, { "cell_type": "code", "execution_count": null, "metadata": { "scrolled": true }, "outputs": [], "source": [ "\"\"\"\n", "- CRIM per capita crime rate by town\n", "- ZN proportion of residential land zoned for lots over 25,000 sq.ft.\n", "- INDUS proportion of non-retail business acres per town\n", "- CHAS Charles River dummy variable (= 1 if tract bounds river; 0 otherwise)\n", "- NOX nitric oxides concentration (parts per 10 million)\n", "- RM average number of rooms per dwelling\n", "- AGE proportion of owner-occupied units built prior to 1940\n", "- DIS weighted distances to five Boston employment centres\n", "- RAD index of accessibility to radial highways\n", "- TAX full-value property-tax rate per $10,000\n", "- PTRATIO pupil-teacher ratio by town\n", "- B 1000(Bk - 0.63)^2 where Bk is the proportion of blacks by town\n", "- LSTAT % lower status of the population\n", "- MEDV (target Y) median value of owner-occupied homes in $1000's\n", "\"\"\"\n", "pass" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Predicting Home Prices: SVR\n", "Without generating any new features, can you get SVR's performance close to that of linear regression?"
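] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A hedged sketch for the question above, not the intended solution: with no new features, the main levers for SVR are usually input scaling and the `C`, `epsilon` and `gamma` hyperparameters. The default RBF kernel depends on Euclidean distances between samples, so features measured in the hundreds (e.g. TAX) can swamp features below 1 (e.g. NOX). This cell only inspects per-feature spreads and shows that `StandardScaler` puts every column on unit variance. It assumes `data` from the loading cell; `feature_names` is an illustrative name. In the actual model, fit the scaler on the training split only (for example inside a `Pipeline`) to avoid leaking test information." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import pandas as pd\n", "from sklearn.preprocessing import StandardScaler\n", "\n", "# Hypothetical helper list, assumed to match the column order of `data`\n", "feature_names = [\"CRIM\", \"ZN\", \"INDUS\", \"CHAS\", \"NOX\", \"RM\", \"AGE\",\n", "                 \"DIS\", \"RAD\", \"TAX\", \"PTRATIO\", \"B\", \"LSTAT\"]\n", "\n", "# Raw scales: one row per feature, sorted by standard deviation\n", "summary = pd.DataFrame(data, columns=feature_names).agg([\"min\", \"max\", \"std\"]).T\n", "print(summary.sort_values(\"std\", ascending=False))\n", "\n", "# After standardization every column has (population) standard deviation 1\n", "scaled = StandardScaler().fit_transform(data)\n", "print(scaled.std(axis=0).round(3))"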
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Import the scikit-learn models you need, then fit a linear model on the training set.\n", "from sklearn.model_selection import train_test_split\n", "\n", "X, y = data, target  # features and labels from the loading cell\n", "X_train, X_test, y_train, y_test = train_test_split(\n", "    X, y, test_size=0.25, random_state=42, shuffle=True\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Make Predictions with SVR" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Evaluation\n", "Evaluate the results with root mean squared error (RMSE) or mean absolute error (MAE), and see whether SVR can do better than\n", "linear regression." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.2" }, "toc": { "base_numbering": 1, "nav_menu": {}, "number_sections": true, "sideBar": true, "skip_h1_title": false, "title_cell": "Table of Contents", "title_sidebar": "Contents", "toc_cell": false, "toc_position": {}, "toc_section_display": true, "toc_window_display": false } }, "nbformat": 4, "nbformat_minor": 4 }