Best Python code snippet using autotest_python
ML0101EN-Proj-Loan-answer-py-v1.py
Source:ML0101EN-Proj-Loan-answer-py-v1.py
# -*- coding: utf-8 -*-
# ---
# jupyter:
#   jupytext:
#     cell_markers: '{{{,}}}'
#     text_representation:
#       extension: .py
#       format_name: light
#       format_version: '1.5'
#     jupytext_version: 1.4.2
#   kernelspec:
#     display_name: Python 3
#     language: python
#     name: python3
# ---
# {{{ [markdown] button=false new_sheet=false run_control={"read_only": false}
# <a href="https://www.bigdatauniversity.com"><img src="https://ibm.box.com/shared/static/cw2c7r3o20w9zn8gkecaeyjhgw3xdgbj.png" width="400" align="center"></a>
#
# <h1 align=center><font size=5>Classification with Python</font></h1>
# }}}
# {{{ [markdown] button=false new_sheet=false run_control={"read_only": false}
# In this notebook we practice all the classification algorithms that we learned in this course.
#
# We load a dataset using the Pandas library, apply the following algorithms, and find the best one for this specific dataset by accuracy evaluation methods.
#
# Let's first load the required libraries:
# }}}
# {{{ button=false new_sheet=false run_control={"read_only": false} jupyter={"outputs_hidden": true}
import itertools
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.ticker import NullFormatter
import pandas as pd
import matplotlib.ticker as ticker
from sklearn import preprocessing
# %matplotlib inline
# }}}
# {{{ [markdown] button=false new_sheet=false run_control={"read_only": false}
# ### About dataset
# }}}
# {{{ [markdown] button=false new_sheet=false run_control={"read_only": false}
# This dataset is about past loans. The __Loan_train.csv__ data set includes details of 346 customers whose loans are already paid off or defaulted. It includes the following fields:
#
# | Field          | Description                                                                            |
# |----------------|----------------------------------------------------------------------------------------|
# | Loan_status    | Whether a loan is paid off or in collection                                            |
# | Principal      | Basic principal loan amount at origination                                             |
# | Terms          | Origination terms, which can be weekly (7 days), biweekly, or monthly payoff schedules |
# | Effective_date | When the loan was originated and took effect                                           |
# | Due_date       | Since it's a one-time payoff schedule, each loan has one single due date               |
# | Age            | Age of applicant                                                                       |
# | Education      | Education of applicant                                                                 |
# | Gender         | The gender of applicant                                                                |
# }}}
# {{{ [markdown] button=false new_sheet=false run_control={"read_only": false}
# Let's download the dataset:
# }}}
# {{{ button=false new_sheet=false run_control={"read_only": false} jupyter={"outputs_hidden": true}
# !wget -O loan_train.csv https://s3-api.us-geo.objectstorage.softlayer.net/cf-courses-data/CognitiveClass/ML0101ENv3/labs/loan_train.csv
# }}}
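# As a sketch-level alternative: if wget is unavailable (e.g. on Windows),
# pandas can read the CSV directly from the same URL, making the shell step
# above optional:
# df = pd.read_csv("https://s3-api.us-geo.objectstorage.softlayer.net/cf-courses-data/CognitiveClass/ML0101ENv3/labs/loan_train.csv")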
run_control={"read_only": false}84# Letâs see how many of each class is in our data set 85# }}}86# {{{ button=false new_sheet=false run_control={"read_only": false} jupyter={"outputs_hidden": true}87df['loan_status'].value_counts()88# }}}89# {{{ [markdown] button=false new_sheet=false run_control={"read_only": false}90# 260 people have paid off the loan on time while 86 have gone into collection 91#92# }}}93# Lets plot some columns to underestand data better:94# notice: installing seaborn might takes a few minutes95# !conda install -c anaconda seaborn -y96# {{{97import seaborn as sns98bins = np.linspace(df.Principal.min(), df.Principal.max(), 10)99g = sns.FacetGrid(df, col="Gender", hue="loan_status", palette="Set1", col_wrap=2)100g.map(plt.hist, 'Principal', bins=bins, ec="k")101g.axes[-1].legend()102plt.show()103# }}}104# {{{ button=false new_sheet=false run_control={"read_only": false} jupyter={"outputs_hidden": true}105bins=np.linspace(df.age.min(), df.age.max(), 10)106g = sns.FacetGrid(df, col="Gender", hue="loan_status", palette="Set1", col_wrap=2)107g.map(plt.hist, 'age', bins=bins, ec="k")108g.axes[-1].legend()109plt.show()110# }}}111# {{{ [markdown] button=false new_sheet=false run_control={"read_only": false}112# # Pre-processing: Feature selection/extraction113# }}}114# {{{ [markdown] button=false new_sheet=false run_control={"read_only": false}115# ### Lets look at the day of the week people get the loan 116# }}}117# {{{ button=false new_sheet=false run_control={"read_only": false} jupyter={"outputs_hidden": true}118df['dayofweek'] = df['effective_date'].dt.dayofweek119bins=np.linspace(df.dayofweek.min(), df.dayofweek.max(), 10)120g = sns.FacetGrid(df, col="Gender", hue="loan_status", palette="Set1", col_wrap=2)121g.map(plt.hist, 'dayofweek', bins=bins, ec="k")122g.axes[-1].legend()123plt.show()124# }}}125# {{{ [markdown] button=false new_sheet=false run_control={"read_only": false}126# We see that people who get the loan at the end of the week dont pay it off, so lets use Feature binarization to set a threshold values less then day 4 127# }}}128# {{{ button=false new_sheet=false run_control={"read_only": false} jupyter={"outputs_hidden": true}129df['weekend']= df['dayofweek'].apply(lambda x: 1 if (x>3) else 0)130df.head()131# }}}132# {{{ [markdown] button=false new_sheet=false run_control={"read_only": false}133# ## Convert Categorical features to numerical values134# }}}135# {{{ [markdown] button=false new_sheet=false run_control={"read_only": false}136# Lets look at gender:137# }}}138# {{{ button=false new_sheet=false run_control={"read_only": false} jupyter={"outputs_hidden": true}139df.groupby(['Gender'])['loan_status'].value_counts(normalize=True)140# }}}141# {{{ [markdown] button=false new_sheet=false run_control={"read_only": false}142# 86 % of female pay there loans while only 73 % of males pay there loan143#144# }}}145# {{{ [markdown] button=false new_sheet=false run_control={"read_only": false}146# Lets convert male to 0 and female to 1:147#148# }}}149# {{{ button=false new_sheet=false run_control={"read_only": false} jupyter={"outputs_hidden": true}150df['Gender'].replace(to_replace=['male','female'], value=[0,1],inplace=True)151df.head()152# }}}153# {{{ [markdown] button=false new_sheet=false run_control={"read_only": false}154# ## One Hot Encoding 155# #### How about education?156# }}}157# {{{ button=false new_sheet=false run_control={"read_only": false} jupyter={"outputs_hidden": true}158df.groupby(['education'])['loan_status'].value_counts(normalize=True)159# 
# {{{ [markdown] button=false new_sheet=false run_control={"read_only": false}
# ## One Hot Encoding
# #### How about education?
# }}}
# {{{ button=false new_sheet=false run_control={"read_only": false} jupyter={"outputs_hidden": true}
df.groupby(['education'])['loan_status'].value_counts(normalize=True)
# }}}
# {{{ [markdown] button=false new_sheet=false run_control={"read_only": false}
# #### Features before One Hot Encoding
# }}}
# {{{ button=false new_sheet=false run_control={"read_only": false} jupyter={"outputs_hidden": true}
df[['Principal','terms','age','Gender','education']].head()
# }}}
# {{{ [markdown] button=false new_sheet=false run_control={"read_only": false}
# #### Use the one hot encoding technique to convert categorical variables to binary variables and append them to the feature DataFrame
# }}}
# {{{ button=false new_sheet=false run_control={"read_only": false}
Feature = df[['Principal','terms','age','Gender','weekend']]
Feature = pd.concat([Feature, pd.get_dummies(df['education'])], axis=1)
Feature.drop(['Master or Above'], axis=1, inplace=True)
Feature.head()
# }}}
# {{{ [markdown] button=false new_sheet=false run_control={"read_only": false}
# ### Feature selection
# }}}
# {{{ [markdown] button=false new_sheet=false run_control={"read_only": false}
# Let's define the feature set, X:
# }}}
# {{{ button=false new_sheet=false run_control={"read_only": false} jupyter={"outputs_hidden": true}
X = Feature
X[0:5]
# }}}
# {{{ [markdown] button=false new_sheet=false run_control={"read_only": false}
# What are our labels?
# }}}
# {{{ button=false new_sheet=false run_control={"read_only": false} jupyter={"outputs_hidden": true}
y = df['loan_status'].values
y[0:5]
# }}}
# {{{ [markdown] button=false new_sheet=false run_control={"read_only": false}
# ## Normalize Data
# }}}
# {{{ [markdown] button=false new_sheet=false run_control={"read_only": false}
# Data standardization gives the data zero mean and unit variance (technically it should be done after the train/test split):
# }}}
# {{{ button=false new_sheet=false run_control={"read_only": false} jupyter={"outputs_hidden": true}
X = preprocessing.StandardScaler().fit(X).transform(X)
X[0:5]
# }}}
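# To act on the caveat above, a leakage-free sketch would fit the scaler on the
# training split only and reuse it for the test split (names here illustrative):
# from sklearn.model_selection import train_test_split
# X_tr, X_te, y_tr, y_te = train_test_split(Feature, y, test_size=0.2, random_state=4)
# scaler = preprocessing.StandardScaler().fit(X_tr)   # statistics from train only
# X_tr, X_te = scaler.transform(X_tr), scaler.transform(X_te)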
# {{{ [markdown] button=false new_sheet=false run_control={"read_only": false}
# # Classification
# }}}
# {{{ [markdown] button=false new_sheet=false run_control={"read_only": false}
# Now it is your turn: use the training set to build an accurate model, then use the test set to report the accuracy of the model.
# You should use the following algorithms:
# - K Nearest Neighbor (KNN)
# - Decision Tree
# - Support Vector Machine
# - Logistic Regression
#
#
#
# __Notice:__
# - You can go back and change the pre-processing, feature selection, feature extraction, and so on, to make a better model.
# - You should use either the scikit-learn, SciPy or NumPy libraries for developing the classification algorithms.
# - You should include the code of the algorithm in the following cells.
# }}}
# # K Nearest Neighbor (KNN)
# Notice: You should find the best k to build the model with the best accuracy.
# **Warning:** You should not use loan_test.csv for finding the best k; however, you can split your loan_train.csv into train and test sets to find the best __k__.
# We split X into train and test sets to find the best k
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=4)
print('Train set:', X_train.shape, y_train.shape)
print('Test set:', X_test.shape, y_test.shape)
# Modeling
from sklearn.neighbors import KNeighborsClassifier
k = 3
# Train Model and Predict
kNN_model = KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train)
kNN_model
# just for a sanity check
yhat = kNN_model.predict(X_test)
yhat[0:5]
# Best k
Ks = 15
mean_acc = np.zeros((Ks - 1))
std_acc = np.zeros((Ks - 1))
for n in range(1, Ks):
    # Train Model and Predict
    kNN_model = KNeighborsClassifier(n_neighbors=n).fit(X_train, y_train)
    yhat = kNN_model.predict(X_test)
    mean_acc[n - 1] = np.mean(yhat == y_test)
    std_acc[n - 1] = np.std(yhat == y_test) / np.sqrt(yhat.shape[0])
mean_acc
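# The best k can be read off programmatically rather than by eye; this is
# presumably where the k=7 used below came from:
best_k = int(mean_acc.argmax()) + 1   # +1 because the scan starts at k=1
print("The best accuracy was", mean_acc.max(), "with k =", best_k)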
# Building the model again, using k=7
k = 7
# Train Model and Predict
kNN_model = KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train)
kNN_model
# # Decision Tree
from sklearn.tree import DecisionTreeClassifier
DT_model = DecisionTreeClassifier(criterion="entropy", max_depth=4)
DT_model.fit(X_train, y_train)
DT_model
yhat = DT_model.predict(X_test)
yhat
# # Support Vector Machine
from sklearn import svm
SVM_model = svm.SVC()
SVM_model.fit(X_train, y_train)
yhat = SVM_model.predict(X_test)
yhat
# # Logistic Regression
from sklearn.linear_model import LogisticRegression
LR_model = LogisticRegression(C=0.01).fit(X_train, y_train)
LR_model
yhat = LR_model.predict(X_test)
yhat
# # Model Evaluation using Test set
# note: jaccard_similarity_score was deprecated in scikit-learn 0.21 and removed
# in 0.23; see the jaccard_score sketch after this listing for current versions
from sklearn.metrics import jaccard_similarity_score
from sklearn.metrics import f1_score
from sklearn.metrics import log_loss
# First, download and load the test set:
# !wget -O loan_test.csv https://s3-api.us-geo.objectstorage.softlayer.net/cf-courses-data/CognitiveClass/ML0101ENv3/labs/loan_test.csv
# {{{ [markdown] button=false new_sheet=false run_control={"read_only": false}
# ### Load Test set for evaluation
# }}}
# {{{ button=false new_sheet=false run_control={"read_only": false} jupyter={"outputs_hidden": true}
test_df = pd.read_csv('loan_test.csv')
test_df.head()
# }}}
## Preprocessing
test_df['due_date'] = pd.to_datetime(test_df['due_date'])
test_df['effective_date'] = pd.to_datetime(test_df['effective_date'])
test_df['dayofweek'] = test_df['effective_date'].dt.dayofweek
test_df['weekend'] = test_df['dayofweek'].apply(lambda x: 1 if (x > 3) else 0)
test_df['Gender'].replace(to_replace=['male','female'], value=[0,1], inplace=True)
test_Feature = test_df[['Principal','terms','age','Gender','weekend']]
test_Feature = pd.concat([test_Feature, pd.get_dummies(test_df['education'])], axis=1)
test_Feature.drop(['Master or Above'], axis=1, inplace=True)
test_X = preprocessing.StandardScaler().fit(test_Feature).transform(test_Feature)
test_X[0:5]
test_y = test_df['loan_status'].values
test_y[0:5]
knn_yhat = kNN_model.predict(test_X)
print("KNN Jaccard index: %.2f" % jaccard_similarity_score(test_y, knn_yhat))
print("KNN F1-score: %.2f" % f1_score(test_y, knn_yhat, average='weighted'))
DT_yhat = DT_model.predict(test_X)
print("DT Jaccard index: %.2f" % jaccard_similarity_score(test_y, DT_yhat))
print("DT F1-score: %.2f" % f1_score(test_y, DT_yhat, average='weighted'))
SVM_yhat = SVM_model.predict(test_X)
print("SVM Jaccard index: %.2f" % jaccard_similarity_score(test_y, SVM_yhat))
print("SVM F1-score: %.2f" % f1_score(test_y, SVM_yhat, average='weighted'))
LR_yhat = LR_model.predict(test_X)
LR_yhat_prob = LR_model.predict_proba(test_X)
print("LR Jaccard index: %.2f" % jaccard_similarity_score(test_y, LR_yhat))
print("LR F1-score: %.2f" % f1_score(test_y, LR_yhat, average='weighted'))
print("LR LogLoss: %.2f" % log_loss(test_y, LR_yhat_prob))
# # Report
# You should be able to report the accuracy of the built models using different evaluation metrics:
# | Algorithm          | Jaccard | F1-score | LogLoss |
# |--------------------|---------|----------|---------|
# | KNN                | 0.67    | 0.63     | NA      |
# | Decision Tree      | 0.72    | 0.74     | NA      |
# | SVM                | 0.80    | 0.76     | NA      |
# | LogisticRegression | 0.74    | 0.66     | 0.57    |
# {{{ [markdown] button=false new_sheet=false run_control={"read_only": false}
# ## Want to learn more?
#
# IBM SPSS Modeler is a comprehensive analytics platform that has many machine learning algorithms. It has been designed to bring predictive intelligence to decisions made by individuals, by groups, by systems, and by your enterprise as a whole. A free trial is available through this course, available here: [SPSS Modeler](http://cocl.us/ML0101EN-SPSSModeler).
#
# Also, you can use Watson Studio to run these notebooks faster with bigger datasets. Watson Studio is IBM's leading cloud solution for data scientists, built by data scientists. With Jupyter notebooks, RStudio, Apache Spark and popular libraries pre-packaged in the cloud, Watson Studio enables data scientists to collaborate on their projects without having to install anything. Join the fast-growing community of Watson Studio users today with a free account at [Watson Studio](https://cocl.us/ML0101EN_DSX).
#
#
# <hr>
# Copyright © 2018 [Cognitive Class](https://cocl.us/DX0108EN_CC). This notebook and its source code are released under the terms of the [MIT License](https://bigdatauniversity.com/mit-license/).
# }}}
# {{{ [markdown] button=false new_sheet=false run_control={"read_only": false}
# ### Thanks for completing this lesson!
#
# Notebook created by: <a href="https://ca.linkedin.com/in/saeedaghabozorgi">Saeed Aghabozorgi</a>...
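As noted in the listing, jaccard_similarity_score was deprecated in scikit-learn 0.21 and removed in 0.23. A minimal sketch of the equivalent evaluation on a current scikit-learn, assuming the label strings are 'PAIDOFF'/'COLLECTION' as in this dataset (the old function behaved like plain accuracy, while the true Jaccard index needs an explicit positive label):

from sklearn.metrics import accuracy_score, jaccard_score

# reproduces the old jaccard_similarity_score output (it computed accuracy)
print("KNN accuracy: %.2f" % accuracy_score(test_y, knn_yhat))
# proper Jaccard index, treating 'PAIDOFF' as the positive class (an assumption)
print("KNN Jaccard score: %.2f" % jaccard_score(test_y, knn_yhat, pos_label='PAIDOFF'))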
Config_Run_Control.py
Source:Config_Run_Control.py
# -*- coding: utf-8 -*-
"""
Created on Tue Nov  5 01:07:39 2019

@author: seongpar
"""
# import datetime as dt
# import Lib_Corp_Model as Corp
import Lib_BSCR_Calc as BSCR_calc
import Config_BSCR as BSCR_Config
import pandas as pd


def update_runControl(run_control):

    if run_control._run_control_ver == '2018Q4_Base':
        ### Hard-coded inputs from I_Control
        run_control.time0_LTIC = 97633492
        run_control.surplus_life_0 = 1498468068.62392
        run_control.surplus_PC_0 = 1541220172.25324
        run_control.I_SFSLiqSurplus = 1809687680.14178
        run_control.GAAP_Reserve_method = 'Roll-forward'  #### 'Product_Level' or 'Roll-forward'

        ### Update assumptions as needed #####
        # Load ModCo Asset Projection
        BSCR_mapping = run_control.modco_BSCR_mapping
        BSCR_charge = BSCR_Config.BSCR_Asset_Risk_Charge_v1

        # raw string so the Windows path backslashes are taken literally
        run_control.asset_proj_modco = pd.read_csv(r"L:\Global Profitability Standards and ALM\Legacy Portfolio\SAM RE\Fortitude-Re Asset Model Team\___Ad-Hoc___\Asset Category projection\cm_input_asset_category_proj_annual_20200515.csv")
        # run_control.asset_proj_modco = pd.read_csv("### Please fill file path here ###")
        run_control.asset_proj_modco.columns = ['val_date', 'proj_time', 'rowNo', 'LOB', 'asset_class', 'MV', 'BV', 'Dur', 'run_id']
        # run_control.asset_proj_modco = Corp.get_asset_category_proj(run_control._val_date, 'alm', freq=run_control._freq)
        run_control.asset_proj_modco['MV_Dur'] = run_control.asset_proj_modco['MV'] * run_control.asset_proj_modco['Dur']
        run_control.asset_proj_modco['FI_Alts'] = run_control.asset_proj_modco.apply(lambda x: 'Alts' if x['asset_class'] == 'Alts' else 'FI', axis=1)
        run_control.asset_proj_modco['risk_charge_factor'] = run_control.asset_proj_modco.apply(
            lambda x: BSCR_calc.proj_BSCR_asset_risk_charge(BSCR_mapping[x['asset_class']], BMA_asset_risk_charge=BSCR_charge), axis=1)

        run_control.asset_proj_modco['asset_risk_charge'] = run_control.asset_proj_modco['MV'] * run_control.asset_proj_modco['risk_charge_factor']

        run_control.asset_proj_modco_agg = run_control.asset_proj_modco.groupby(['val_date', 'rowNo', 'proj_time', 'FI_Alts']).sum().reset_index()
        run_control.asset_proj_modco_agg['Dur'] = run_control.asset_proj_modco_agg['MV_Dur'] / run_control.asset_proj_modco_agg['MV']
        run_control.asset_proj_modco_agg['risk_charge_factor'] = run_control.asset_proj_modco_agg['asset_risk_charge'] / run_control.asset_proj_modco_agg['MV']
        run_control.asset_proj_modco_agg.fillna(0, inplace=True)
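        # caution (an assumption about newer environments): pandas >= 2.0 also
        # "sums" leftover string columns in the groupby above by concatenation;
        # .sum(numeric_only=True) may be needed to keep the aggregate numeric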
        # Dividend Schedule
        run_control.proj_schedule[1]['dividend_schedule'] = 'Y'
        run_control.proj_schedule[1]['dividend_schedule_amt'] = 500000000
        run_control.proj_schedule[2]['dividend_schedule'] = 'Y'
        run_control.proj_schedule[2]['dividend_schedule_amt'] = 1000000000
        run_control.proj_schedule[3]['dividend_schedule'] = 'Y'
        run_control.proj_schedule[3]['dividend_schedule_amt'] = 1000000000

        # LOC SFS Limit
        SFS_limit = [
            0.25, 0.248285375, 0.252029898, 0.285955276, 0.38330347,
            0.441096663, 0.460419183, 0.470052356, 0.462474131, 0.448079715,
            0.42905067, 0.401161493, 0.374517694, 0.354974144, 0.312988995,
            0.298355834, 0.301492895, 0.294050736, 0.284305953, 0.274386361,
            0.264800284, 0.257449286, 0.253203818, 0.248658944, 0.240894683,
            0.235788524, 0.224783331, 0.218340687, 0.214482419, 0.209553337,
            0.208785641, 0.211190424, 0.210920573, 0.216687659, 0.222172075,
            0.227580018, 0.232088173, 0.236832925, 0.241751243, 0.245726161,
            0.248599255, 0.248058163, 0.246790226, 0.24740993, 0.247595613,
            0.247582842, 0.244536254, 0.254065247, 0.246678196, 0.242226328,
            0.25, 0.242866482, 0.244004135, 0.301949013, 0.244575476,
            0.244081956, 0.243021021, 0.250817216, 0.252697673,
            0.200005319, 0.200005319, 0.200005319, 0.200005319, 0.200005319,
            0.200005319, 0.200005319, 0.200005319, 0.200005319, 0.200005319,
            0.200005319, 0.200005319, 0.200005319, 0.200005319, 0.200005319,
            0.200006653, 0.200011781, 0.2, 0.2, 0.2, 0.2]

        for i in range(min(len(run_control._dates), len(SFS_limit))):
            run_control.proj_schedule[i]['LOC_SFS_Limit'] = SFS_limit[i]

        # Initial and Ultimate Spread
        run_control.initial_spread = {
            'Term': [1, 2, 3, 5, 7, 10, 20, 30],
            'AAA':  [0.0016, 0.0028, 0.0035, 0.0037, 0.0042, 0.0054, 0.0069, 0.0084],
            'AA':   [0.0019, 0.0034, 0.0042, 0.0048, 0.0055, 0.007, 0.0085, 0.0099],
            'A':    [0.0023, 0.0042, 0.0052, 0.0062, 0.0095, 0.0109, 0.0105, 0.0127],
            'BBB':  [0.0044, 0.0082, 0.0092, 0.0114, 0.0156, 0.0162, 0.0161, 0.0183],
            'BB':   [0.012, 0.0162, 0.0193, 0.0234, 0.0267, 0.0306, 0.0337, 0.0368],
            'B':    [0.0183, 0.0237, 0.0277, 0.0331, 0.0373, 0.0421, 0.046, 0.0499],
            'CCC':  [0.0546, 0.0579, 0.0602, 0.0631, 0.0655, 0.0684, 0.0705, 0.0726]
        }

        run_control.ultimate_spread = {
            'Term': [1, 2, 3, 5, 7, 10, 20, 30],
            'AAA':  [0.0049, 0.0053, 0.0058, 0.0065, 0.0066, 0.0062, 0.0065, 0.008],
            'AA':   [0.0053, 0.0059, 0.0066, 0.0076, 0.0081, 0.0081, 0.0091, 0.0107],
            'A':    [0.0073, 0.0081, 0.009, 0.0106, 0.0114, 0.0117, 0.0134, 0.0154],
            'BBB':  [0.0123, 0.0136, 0.0149, 0.0173, 0.0187, 0.0196, 0.0219, 0.0238],
            'BB':   [0.0295, 0.0311, 0.0328, 0.0357, 0.0375, 0.0386, 0.0407, 0.0421],
            'B':    [0.0492, 0.0569, 0.059, 0.0626, 0.0647, 0.0661, 0.068, 0.0681],
            'CCC':  [0.1221, 0.1255, 0.1269, 0.1288, 0.1287, 0.1263, 0.1196, 0.1129]...
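The aggregation in update_runControl derives a market-value-weighted duration per group (sum of MV * Dur divided by sum of MV). A self-contained toy sketch of that pattern, with made-up numbers:

import pandas as pd

# three holdings in two buckets, mirroring the FI_Alts grouping above
assets = pd.DataFrame({
    'FI_Alts': ['FI', 'FI', 'Alts'],
    'MV':      [100.0, 300.0, 50.0],
    'Dur':     [5.0, 10.0, 0.0],
})
assets['MV_Dur'] = assets['MV'] * assets['Dur']
agg = assets.groupby('FI_Alts')[['MV', 'MV_Dur']].sum().reset_index()
agg['Dur'] = agg['MV_Dur'] / agg['MV']   # MV-weighted duration: FI -> 8.75
print(agg)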