Skip to content

Sklearn

Sklearn is built for general machine learning.

import sklearn
pip3 install sklearn
python3 -m pip install --upgrade sklearn

Training and test set

from sklearn.model_selection import train_test_split ||| |---|---| X_train, X_test, Y_train, Y_test =train_test_split(X,Y,test_size=.2)| Split arrays or matrices to training and testing sets, |

Preprocessing Data

from sklearn import preprocessing

Standardization

Normalize each column individually to unit norm. ||| |---|---| st_scaler = preprocessing.StandardScaler().fit(X_train)| Calculate the mean (m) and std (s) from X_train columns stand_X = st_scaler.transform(X)| Standardize X using m and s

Normalization

Normalize row individually to unit norm.

nr_scaler = preprocessing.Normalizer().fit(X_train) Calculate the mean (m) and std (s) from X_train rows
normal_X = nr_scaler.transform(X) Normalize X using m and s

Binarization

Binarize the data for a given threshold.

bn = preprocessing.Binarizer(threshold=0.0).fit(X) Define binarization for a given threshold
bin_X = bn.transform(X) Run binarization on X

Encoding

Relabel the elements of array to integer number ||| |---|---| le = preprocessing.LabelEncoder()| Assign the label coding function to le y = le.fit_transform(y)| Run transformation on y

Modeling

Building model

from sklearn.linear_model import LinearRegression ||| |---|---| lr = LinearRegression(normalize=True)| Create linear model| |from sklearn.svm import SVC
svc = SVC(kernel='linear')| Create support vector machine for classification| |from sklearn import neighbors
knn = neighbors.KNeighborsClassifier(n_neighbors=4)| Create Knn | |from sklearn.cluster import KMeans
k_means = KMeans(n_clusters=4, random_state=0)| Create KMeans |

Fitting model to Data

lr.fit(X_train, Y_train) Fit linear model
svc.fit(X_train, Y_train) fit SVC
knn.fit(X_train, Y_train) fit knn
k_means.fit(X_train) Fit KMeans

Predicting

Y_pred = lr.predict(X_test) Predict using linear
Y_pred = svc.predict(X_test) Predict using SVC
Y_pred = knn.predict_proba(X_test) Predict using Knn
Y_pred = k_means.predict(X_test) Predict using KMeans

Evaluation

Linear model

from sklearn.metrics import mean_absolute_error, mean_squared_error,r2_score ||| |---|---| mean_absolute_error(Y_test, Y_pred)| Calculate MAE mean_squared_error(Y_test, Y_pred)| Calculate MSE r2_score(Y_test, Y_pred) | Calculate R-square

Classification

from sklearn.metrics import accuracy_score,classification_report,confusion_matrix ||| |---|---| accuracy_score(Y_test, Y_pred)|Calculate accuracy| classification_report(Y_test, Y_pred)| Calculate different classification metrics| confusion_matrix(Y_test, Y_pred)| Calculate the confusion matrix

Clustering

from sklearn.metrics import adjusted_rand_score,homogeneity_score,v_measure_score ||| |---|---| adjusted_rand_score(Y_true, Y_pred)| Calculate adjusted_rand_score homogeneity_score(Y_true, Y_pred)| Calculate homogeneity_score v_measure_score(Y_true, Y_pred)| Calculate v_measure_score

Crosse-validation

from sklearn.cross_validation import cross_val_score ||| |---|---| |cross_val_score(lr, X_train, Y_train, cv=2)| Cross-validation on linear model |cross_val_score(knn, X_train, Y_train, cv=4)|Cross-validation on Knn

Tuning Model

from sklearn.grid_search import GridSearchCV ||| |---|---| params = {"n_neighbors": np.arange(1,3),"metric": ["euclidean", "cityblock"]}| Define parameters grid = GridSearchCV(estimator=knn,param_grid=params)| Define estimator grid.fit(X_train, y_train)| Fit to training set print(grid.best_score_)| print the best score print(grid.best_estimator_.n_neighbors)| Print the estimate of parameters

Randomized parameter optimization

from sklearn.grid_search import RandomizedSearchCV ||| |---|---| params = {"n_neighbors": range(1,5),"weights": ["uniform", "distance"]}| Define parameters rand_search = RandomizedSearchCV(estimator=knn, param_distributions=params, cv=4, n_iter=8,random_state=5)| Define estimator rand_search.fit(X_train, y_train)| Fit to training set print(rand_search.best_score_)| print the best score print(grid.best_estimator_.n_neighbors)| Print the estimate of parameters