How to use the outliers method in pytest-benchmark

Best Python code snippet using pytest-benchmark

outliers.py

Source: outliers.py (GitHub)

import pandas as pd  # importing libraries
import numpy as np
from matplotlib import pyplot as plt
import seaborn as sns
from feature_engine.outliers import Winsorizer

df = pd.read_csv("bostondata.csv")  # importing dataset
df

# checking outliers for crim feature ****************************************

sns.boxplot(df.crim)  # using box plot outliers detected

Q1 = df.crim.quantile(0.25)  # calculating lower quartile
Q1
Q3 = df.crim.quantile(0.75)  # calculating upper quartile
Q3
IQR = Q3 - Q1  # calculating IQR
IQR
upper_limit = df.crim.quantile(0.75) + 1.5 * IQR  # calculating upper limit
upper_limit

lower_limit = df.crim.quantile(0.25) - 1.5 * IQR  # calculating lower limit
lower_limit
# detecting outliers in terms of true

outliers_df = np.where((df.crim > upper_limit), True, np.where(df.crim < lower_limit, True, False))
outliers_df
df_trimmed = df.loc[~(outliers_df), ]  # trimming the outliers
df_trimmed

df.shape, df_trimmed.shape  # shape of original dataframe and trimmed dataframe
sns.boxplot(df_trimmed.crim)  # checking outliers in trimmed dataframe

# replacing outliers to their upper limit and lower limit respectively

df['df_replaced'] = pd.DataFrame(np.where(df['crim'] > upper_limit, upper_limit, np.where(df['crim'] < lower_limit, lower_limit, df['crim'])))

plt.boxplot(df.df_replaced)  # checking outliers in replaced dataframe

# replacing outliers using winsorization

# initialising winsorizer with iqr as capping method fold up to 1.5
winsor = Winsorizer(capping_method='iqr', tail='both', fold=1.5, variables=['crim'])

df_t = winsor.fit_transform(df[['crim']])  # transforming for replacing outliers
sns.boxplot(df_t.crim)  # checking for outliers


# checking outliers for zn feature ******************************************

sns.boxplot(df.zn)  # checking outliers for zn

Q1 = df.zn.quantile(0.25)  # calculating lower quartile
Q1
Q3 = df.zn.quantile(0.75)  # calculating upper quartile
Q3
IQR = Q3 - Q1  # calculating IQR
IQR
upper_limit = df.zn.quantile(0.75) + 1.5 * IQR  # calculating upper limit
upper_limit

lower_limit = df.zn.quantile(0.25) - 1.5 * IQR  # calculating lower limit
lower_limit
# detecting outliers in terms of true

outliers_df = np.where((df.zn > upper_limit), True, np.where(df.zn < lower_limit, True, False))
outliers_df
df_trimmed = df.loc[~(outliers_df), ]  # trimming the outliers
df_trimmed

df.shape, df_trimmed.shape  # shape of original dataframe and trimmed dataframe
sns.boxplot(df_trimmed.zn)  # checking outliers in trimmed dataframe

# replacing outliers to their upper limit and lower limit respectively

df['df_replaced'] = pd.DataFrame(np.where(df['zn'] > upper_limit, upper_limit, np.where(df['zn'] < lower_limit, lower_limit, df['zn'])))

plt.boxplot(df.df_replaced)  # checking outliers in replaced dataframe

# replacing outliers using winsorization

# initialising winsorizer with iqr as capping method fold up to 1.5
winsor = Winsorizer(capping_method='iqr', tail='both', fold=1.5, variables=['zn'])

df_t = winsor.fit_transform(df[['zn']])  # transforming for replacing outliers
sns.boxplot(df_t.zn)  # checking for outliers


# checking outliers for indus feature ***************************************

sns.boxplot(df.indus)  # no outlier detected


# checking outlier for chas feature *****************************************

sns.boxplot(df.chas)  # checking for outlier

Q1 = df.chas.quantile(0.25)  # calculating lower quartile
Q1
Q3 = df.chas.quantile(0.75)  # calculating upper quartile
Q3
IQR = Q3 - Q1  # calculating IQR
IQR
upper_limit = df.chas.quantile(0.75) + 1.5 * IQR  # calculating upper limit
upper_limit

lower_limit = df.chas.quantile(0.25) - 1.5 * IQR  # calculating lower limit
lower_limit
# detecting outliers in terms of true

outliers_df = np.where((df.chas > upper_limit), True, np.where(df.chas < lower_limit, True, False))
outliers_df
df_trimmed = df.loc[~(outliers_df), ]  # trimming the outliers
df_trimmed

df.shape, df_trimmed.shape  # shape of original dataframe and trimmed dataframe
sns.boxplot(df_trimmed.chas)  # checking outliers in trimmed dataframe

# replacing outliers to their upper limit and lower limit respectively

df['df_replaced'] = pd.DataFrame(np.where(df['chas'] > upper_limit, upper_limit, np.where(df['chas'] < lower_limit, lower_limit, df['chas'])))

plt.boxplot(df.df_replaced)  # checking outliers in replaced dataframe

# replacing outliers using winsorization

# initialising winsorizer with iqr as capping method fold up to 1.5
winsor = Winsorizer(capping_method='iqr', tail='both', fold=1.5, variables=['chas'])

df_t = winsor.fit_transform(df[['chas']])  # transforming for replacing outliers
sns.boxplot(df_t.chas)  # checking for outliers

# checking outliers for nox feature *****************************************

sns.boxplot(df.nox)  # no outlier detected


# checking outliers for rm feature ******************************************

sns.boxplot(df.rm)  # outliers detected

Q1 = df.rm.quantile(0.25)  # calculating lower quartile
Q1
Q3 = df.rm.quantile(0.75)  # calculating upper quartile
Q3
IQR = Q3 - Q1  # calculating IQR
IQR
upper_limit = df.rm.quantile(0.75) + 1.5 * IQR  # calculating upper limit
upper_limit

lower_limit = df.rm.quantile(0.25) - 1.5 * IQR  # calculating lower limit
lower_limit
# detecting outliers in terms of true

outliers_df = np.where((df.rm > upper_limit), True, np.where(df.rm < lower_limit, True, False))
outliers_df
df_trimmed = df.loc[~(outliers_df), ]  # trimming the outliers
df_trimmed

df.shape, df_trimmed.shape  # shape of original dataframe and trimmed dataframe
sns.boxplot(df_trimmed.rm)  # checking outliers in trimmed dataframe

# replacing outliers to their upper limit and lower limit respectively

df['df_replaced'] = pd.DataFrame(np.where(df['rm'] > upper_limit, upper_limit, np.where(df['rm'] < lower_limit, lower_limit, df['rm'])))

plt.boxplot(df.df_replaced)  # checking outliers in replaced dataframe

# replacing outliers using winsorization

# initialising winsorizer with iqr as capping method fold up to 1.5
winsor = Winsorizer(capping_method='iqr', tail='both', fold=1.5, variables=['rm'])

df_t = winsor.fit_transform(df[['rm']])  # transforming for replacing outliers
sns.boxplot(df_t.rm)  # checking for outliers


# checking outliers for age feature *****************************************

sns.boxplot(df.age)  # no outlier detected


# checking outlier for dis feature ******************************************

sns.boxplot(df.dis)  # outliers detected

Q1 = df.dis.quantile(0.25)  # calculating lower quartile
Q1
Q3 = df.dis.quantile(0.75)  # calculating upper quartile
Q3
IQR = Q3 - Q1  # calculating IQR
IQR
upper_limit = df.dis.quantile(0.75) + 1.5 * IQR  # calculating upper limit
upper_limit

lower_limit = df.dis.quantile(0.25) - 1.5 * IQR  # calculating lower limit
lower_limit
# detecting outliers in terms of true

outliers_df = np.where((df.dis > upper_limit), True, np.where(df.dis < lower_limit, True, False))
outliers_df
df_trimmed = df.loc[~(outliers_df), ]  # trimming the outliers
df_trimmed

df.shape, df_trimmed.shape  # shape of original dataframe and trimmed dataframe
sns.boxplot(df_trimmed.dis)  # checking outliers in trimmed dataframe

# replacing outliers to their upper limit and lower limit respectively

df['df_replaced'] = pd.DataFrame(np.where(df['dis'] > upper_limit, upper_limit, np.where(df['dis'] < lower_limit, lower_limit, df['dis'])))

plt.boxplot(df.df_replaced)  # checking outliers in replaced dataframe

# replacing outliers using winsorization

# initialising winsorizer with iqr as capping method fold up to 1.5
winsor = Winsorizer(capping_method='iqr', tail='both', fold=1.5, variables=['dis'])

df_t = winsor.fit_transform(df[['dis']])  # transforming for replacing outliers
sns.boxplot(df_t.dis)  # checking for outliers


# checking outliers for rad feature *****************************************

sns.boxplot(df.rad)  # no outlier detected

# checking outliers for tax feature *****************************************

sns.boxplot(df.tax)  # no outliers detected

# checking outliers for ptratio feature *************************************

sns.boxplot(df.ptratio)  # outliers detected

Q1 = df.ptratio.quantile(0.25)  # calculating lower quartile
Q1
Q3 = df.ptratio.quantile(0.75)  # calculating upper quartile
Q3
IQR = Q3 - Q1  # calculating IQR
IQR
upper_limit = df.ptratio.quantile(0.75) + 1.5 * IQR  # calculating upper limit
upper_limit

lower_limit = df.ptratio.quantile(0.25) - 1.5 * IQR  # calculating lower limit
lower_limit
# detecting outliers in terms of true

outliers_df = np.where((df.ptratio > upper_limit), True, np.where(df.ptratio < lower_limit, True, False))
outliers_df
df_trimmed = df.loc[~(outliers_df), ]  # trimming the outliers
df_trimmed

df.shape, df_trimmed.shape  # shape of original dataframe and trimmed dataframe
sns.boxplot(df_trimmed.ptratio)  # checking outliers in trimmed dataframe

# replacing outliers to their upper limit and lower limit respectively

df['df_replaced'] = pd.DataFrame(np.where(df['ptratio'] > upper_limit, upper_limit, np.where(df['ptratio'] < lower_limit, lower_limit, df['ptratio'])))

plt.boxplot(df.df_replaced)  # checking outliers in replaced dataframe

# replacing outliers using winsorization

# initialising winsorizer with iqr as capping method fold up to 1.5
winsor = Winsorizer(capping_method='iqr', tail='both', fold=1.5, variables=['ptratio'])

df_t = winsor.fit_transform(df[['ptratio']])  # transforming for replacing outliers
sns.boxplot(df_t.ptratio)  # checking for outliers


# checking outliers for black feature ***************************************

sns.boxplot(df.black)  # outliers detected

Q1 = df.black.quantile(0.25)  # calculating lower quartile
Q1
Q3 = df.black.quantile(0.75)  # calculating upper quartile
Q3
IQR = Q3 - Q1  # calculating IQR
IQR
upper_limit = df.black.quantile(0.75) + 1.5 * IQR  # calculating upper limit
upper_limit

lower_limit = df.black.quantile(0.25) - 1.5 * IQR  # calculating lower limit
lower_limit
# detecting outliers in terms of true

outliers_df = np.where((df.black > upper_limit), True, np.where(df.black < lower_limit, True, False))
outliers_df
df_trimmed = df.loc[~(outliers_df), ]  # trimming the outliers
df_trimmed

df.shape, df_trimmed.shape  # shape of original dataframe and trimmed dataframe
sns.boxplot(df_trimmed.black)  # checking outliers in trimmed dataframe

# replacing outliers to their upper limit and lower limit respectively

df['df_replaced'] = pd.DataFrame(np.where(df['black'] > upper_limit, upper_limit, np.where(df['black'] < lower_limit, lower_limit, df['black'])))

plt.boxplot(df.df_replaced)  # checking outliers in replaced dataframe

# replacing outliers using winsorization

# initialising winsorizer with iqr as capping method fold up to 1.5
winsor = Winsorizer(capping_method='iqr', tail='both', fold=1.5, variables=['black'])

df_t = winsor.fit_transform(df[['black']])  # transforming for replacing outliers
sns.boxplot(df_t.black)  # checking for outliers


# checking outliers for lstat feature ***************************************

sns.boxplot(df.lstat)  # outliers detected

Q1 = df.lstat.quantile(0.25)  # calculating lower quartile
Q1
Q3 = df.lstat.quantile(0.75)  # calculating upper quartile
Q3
IQR = Q3 - Q1  # calculating IQR
IQR
upper_limit = df.lstat.quantile(0.75) + 1.5 * IQR  # calculating upper limit
upper_limit

lower_limit = df.lstat.quantile(0.25) - 1.5 * IQR  # calculating lower limit
lower_limit
# detecting outliers in terms of true

outliers_df = np.where((df.lstat > upper_limit), True, np.where(df.lstat < lower_limit, True, False))
outliers_df
df_trimmed = df.loc[~(outliers_df), ]  # trimming the outliers
df_trimmed

df.shape, df_trimmed.shape  # shape of original dataframe and trimmed dataframe
sns.boxplot(df_trimmed.lstat)  # checking outliers in trimmed dataframe

# replacing outliers to their upper limit and lower limit respectively

df['df_replaced'] = pd.DataFrame(np.where(df['lstat'] > upper_limit, upper_limit, np.where(df['lstat'] < lower_limit, lower_limit, df['lstat'])))

plt.boxplot(df.df_replaced)  # checking outliers in replaced dataframe

# replacing outliers using winsorization

# initialising winsorizer with iqr as capping method fold up to 1.5
winsor = Winsorizer(capping_method='iqr', tail='both', fold=1.5, variables=['lstat'])

df_t = winsor.fit_transform(df[['lstat']])  # transforming for replacing outliers
sns.boxplot(df_t.lstat)  # checking for outliers


# checking outliers for medv feature ****************************************

sns.boxplot(df.medv)

Q1 = df.medv.quantile(0.25)  # calculating lower quartile
Q1
Q3 = df.medv.quantile(0.75)  # calculating upper quartile
Q3
IQR = Q3 - Q1  # calculating IQR
IQR
upper_limit = df.medv.quantile(0.75) + 1.5 * IQR  # calculating upper limit
upper_limit

lower_limit = df.medv.quantile(0.25) - 1.5 * IQR  # calculating lower limit
lower_limit
# detecting outliers in terms of true

outliers_df = np.where((df.medv > upper_limit), True, np.where(df.medv < lower_limit, True, False))
outliers_df
df_trimmed = df.loc[~(outliers_df), ]  # trimming the outliers
df_trimmed

df.shape, df_trimmed.shape  # shape of original dataframe and trimmed dataframe
sns.boxplot(df_trimmed.medv)  # checking outliers in trimmed dataframe

# replacing outliers to their upper limit and lower limit respectively

df['df_replaced'] = pd.DataFrame(np.where(df['medv'] > upper_limit, upper_limit, np.where(df['medv'] < lower_limit, lower_limit, df['medv'])))

plt.boxplot(df.df_replaced)  # checking outliers in replaced dataframe

# replacing outliers using winsorization

# initialising winsorizer with iqr as capping method fold up to 1.5
winsor = Winsorizer(capping_method='iqr', tail='both', fold=1.5, variables=['medv'])

df_t = winsor.fit_transform(df[['medv']])  # transforming for replacing outliers
sns.boxplot(df_t.medv)  # checking for outliers

...
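
The per-feature steps in outliers.py (compute Q1, Q3 and the IQR, derive the limits, then trim or cap) repeat unchanged for every column. A minimal sketch of how they could be folded into reusable helpers is shown here. The function names `iqr_limits`, `trim_outliers` and `cap_outliers` and the small `demo` frame are hypothetical and not part of the original script; the sketch uses synthetic values in place of bostondata.csv and pandas' `Series.clip` in place of the nested `np.where` calls.

```python
import pandas as pd


def iqr_limits(series, fold=1.5):
    """Return the (lower, upper) limits Q1 - fold*IQR and Q3 + fold*IQR."""
    q1 = series.quantile(0.25)
    q3 = series.quantile(0.75)
    iqr = q3 - q1
    return q1 - fold * iqr, q3 + fold * iqr


def trim_outliers(df, column, fold=1.5):
    """Drop rows whose value in `column` lies strictly outside the limits."""
    lower, upper = iqr_limits(df[column], fold)
    is_outlier = (df[column] < lower) | (df[column] > upper)
    return df.loc[~is_outlier]


def cap_outliers(series, fold=1.5):
    """Replace values beyond the limits with the nearest limit."""
    lower, upper = iqr_limits(series, fold)
    return series.clip(lower=lower, upper=upper)


# Synthetic stand-in for a single column (hypothetical values, not Boston data).
demo = pd.DataFrame({"crim": [1.0, 2.0, 3.0, 4.0, 100.0]})

lower, upper = iqr_limits(demo["crim"])
trimmed = trim_outliers(demo, "crim")
capped = cap_outliers(demo["crim"])

print(lower, upper)     # limits for the demo column
print(trimmed.shape)    # rows left after trimming
print(capped.tolist())  # values after capping
```

With helpers like these, each column would need one call per step instead of a copied block, and the same limits feed both trimming and capping.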

load_data.py

Source: load_data.py (GitHub)

...
from sklearn.preprocessing import StandardScaler
import scipy.stats as st
from collections import Counter
from sklearn.decomposition import PCA

def feature_outliers(dataframe, col, param=1.5):
    Q1 = np.percentile(dataframe[col], 25)
    Q3 = np.percentile(dataframe[col], 75)
    tukey_window = param * (Q3 - Q1)
    less_than_Q1 = dataframe[col] < Q1 - tukey_window
    greater_than_Q3 = dataframe[col] > Q3 + tukey_window
    tukey_mask = (less_than_Q1 | greater_than_Q3)
    return dataframe[tukey_mask]

def multiple_outliers(dataframe, count=2):
    raw_outliers = []
    for col in dataframe:
        outlier_df = feature_outliers(dataframe, col)
        raw_outliers += list(outlier_df.index)
    outlier_count = Counter(raw_outliers)
    outliers = [k for k, v in outlier_count.items() if v >= count]
    return outliers

customers = pd.read_csv('Wholesale_customers_data.csv')
customers.Region = customers.Region.astype('category')
customers.Channel = customers.Channel.astype('category')
customer_features = customers.select_dtypes([int])

scaler = StandardScaler()
customer_sc = scaler.fit_transform(customer_features)
customer_sc_df = pd.DataFrame(customer_sc, columns=customer_features.columns)

customer_log_df = np.log(1 + customer_features)
scaler.fit(customer_log_df)
customer_log_sc = scaler.transform(customer_log_df)
customer_log_sc_df = pd.DataFrame(customer_log_sc, columns=customer_features.columns)

customer_box_cox_df = pd.DataFrame()
for col in customer_features.columns:
    box_cox_trans = st.boxcox(customer_features[col])[0]
    customer_box_cox_df[col] = pd.Series(box_cox_trans)
del box_cox_trans
scaler.fit(customer_box_cox_df)
customer_box_cox_sc = scaler.transform(customer_box_cox_df)
customer_box_cox_sc_df = pd.DataFrame(customer_box_cox_sc, columns=customer_features.columns)

customer_features_outliers_removed = customer_features.drop(multiple_outliers(customer_features))
customer_sc_df_outliers_removed = customer_sc_df.drop(multiple_outliers(customer_sc_df))
customer_log_sc_df_outliers_removed = customer_log_sc_df.drop(multiple_outliers(customer_log_sc_df))
customer_box_cox_sc_df_outliers_removed = customer_box_cox_sc_df.drop(multiple_outliers(customer_box_cox_sc_df))

channel_original_outliers_removed = np.array(customers.Channel.loc[customer_features_outliers_removed.index].values) - 1
channel_scaled_outliers_removed = np.array(customers.Channel.loc[customer_sc_df_outliers_removed.index].values) - 1
channel_log_outliers_removed = np.array(customers.Channel.loc[customer_log_sc_df_outliers_removed.index].values) - 1
channel_box_cox_outliers_removed = np.array(customers.Channel.loc[customer_box_cox_sc_df_outliers_removed.index].values) - 1

customer_features_pca_2 = PCA(2).fit_transform(customer_features)
customer_sc_pca_2 = PCA(2).fit_transform(customer_sc_df)
customer_log_sc_pca_2 = PCA(2).fit_transform(customer_log_sc_df)
customer_box_cox_sc_pca_2 = PCA(2).fit_transform(customer_box_cox_sc_df)

customer_features_outliers_removed_pca_2 = PCA(2).fit_transform(customer_features_outliers_removed)
customer_sc_outliers_removed_pca_2 = PCA(2).fit_transform(customer_sc_df_outliers_removed)
customer_log_sc_outliers_removed_pca_2 = PCA(2).fit_transform(customer_log_sc_df_outliers_removed)
customer_box_cox_sc_outliers_removed_pca_2 = PCA(2).fit_transform(customer_box_cox_sc_df_outliers_removed)

customer_features_pca_3 = PCA(3).fit_transform(customer_features)
customer_sc_pca_3 = PCA(3).fit_transform(customer_sc_df)
...
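
To see what `feature_outliers` and `multiple_outliers` from load_data.py return, it may help to run them on a tiny frame. The sketch below copies the two functions as they appear in the listing and applies them to a made-up `toy` DataFrame; the column names and values are hypothetical and are not taken from Wholesale_customers_data.csv.

```python
from collections import Counter

import numpy as np
import pandas as pd


def feature_outliers(dataframe, col, param=1.5):
    # Rows whose value in `col` falls outside the Tukey fences.
    Q1 = np.percentile(dataframe[col], 25)
    Q3 = np.percentile(dataframe[col], 75)
    tukey_window = param * (Q3 - Q1)
    less_than_Q1 = dataframe[col] < Q1 - tukey_window
    greater_than_Q3 = dataframe[col] > Q3 + tukey_window
    return dataframe[less_than_Q1 | greater_than_Q3]


def multiple_outliers(dataframe, count=2):
    # Index labels flagged as outliers in at least `count` columns.
    raw_outliers = []
    for col in dataframe:
        raw_outliers += list(feature_outliers(dataframe, col).index)
    outlier_count = Counter(raw_outliers)
    return [k for k, v in outlier_count.items() if v >= count]


# Hypothetical data: row 5 is extreme in two columns, row 4 in only one.
toy = pd.DataFrame({
    "fresh": [10, 12, 11, 13, 12, 500],
    "milk": [5, 6, 5, 7, 6, 300],
    "grocery": [8, 9, 8, 9, 400, 9],
})

flagged_twice = multiple_outliers(toy)          # default count=2
flagged_once = multiple_outliers(toy, count=1)  # any single-column outlier
cleaned = toy.drop(flagged_twice)

print(flagged_twice, flagged_once, cleaned.shape)
```

With the default `count=2`, a row is dropped only when it is an outlier in two or more columns, so row 4 (extreme in a single column) stays in `cleaned`.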
