Evaluation Method
Ways to evaluate supervised learning
- The evaluation method of supervised learning varies depending on whether it is a classification problem or a regression problem.
Classification problems | Regression problems |
---|---|
Confusion matrix | Mean squared error |
Accuracy | Coefficient of determination |
Precision | |
Recall | |
F1-score | |
Area under the curve (AUC) | |
- Even if the evaluation metrics are chosen appropriately, a good model cannot be obtained if it overfits the training data, so this must be kept in mind.
How to evaluate classification problems
from sklearn.datasets import load_breast_cancer
data = load_breast_cancer()
X = data.data
y = 1 - data.target   # flip the labels so that malignant tumors become the positive class (1)
X = X[:, :10]         # use only the first 10 features
from sklearn.linear_model import LogisticRegression
model_lor = LogisticRegression(solver = 'lbfgs')
model_lor.fit(X, y)
y_pred = model_lor.predict(X)
Confusion matrix
from sklearn.metrics import confusion_matrix
cm = confusion_matrix(y, y_pred)
print(cm)
[[337 20]
[ 30 182]]
Actual \ Predicted | 0 | 1 |
---|---|---|
0 | TN | FP |
1 | FN | TP |
- TN (true negative): actually negative data correctly predicted as negative
- FP (false positive): actually negative data incorrectly predicted as positive
- FN (false negative): actually positive data incorrectly predicted as negative
- TP (true positive): actually positive data correctly predicted as positive
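These four values can be read directly out of the cm computed above; a minimal sketch (NumPy's ravel flattens the 2×2 matrix row by row into TN, FP, FN, TP):
tn, fp, fn, tp = cm.ravel()
print(tn, fp, fn, tp)
337 20 30 182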
- accuracy
- precision
- recall
- F1-score
accuracy
The proportion of all predictions that were correct: accuracy = (TP + TN) / (TP + TN + FP + FN)
from sklearn.metrics import accuracy_score
print(accuracy_score(y, y_pred))
0.9121265377855887
precision
Of the samples predicted as positive, the proportion that are actually positive: precision = TP / (TP + FP)
from sklearn.metrics import precision_score
print(precision_score(y, y_pred))
0.900990099009901
recall
Of the actually positive samples, the proportion correctly predicted as positive: recall = TP / (TP + FN)
from sklearn.metrics import recall_score
print(recall_score(y, y_pred))
0.8584905660377359
F1-score
The harmonic mean of precision and recall, reflecting both: F1 = 2 × precision × recall / (precision + recall)
from sklearn.metrics import f1_score
print(f1_score(y, y_pred))
0.8792270531400966
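As a sanity check, the same four metrics can be computed from the confusion-matrix cells unpacked earlier with cm.ravel(); this simply writes out the formulas above:
print((tp + tn) / (tp + tn + fp + fn))   # accuracy
print(tp / (tp + fp))                    # precision
print(tp / (tp + fn))                    # recall
p, r = tp / (tp + fp), tp / (tp + fn)
print(2 * p * r / (p + r))               # F1-score
The printed values match the accuracy_score, precision_score, recall_score and f1_score results above.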
prediction probability
predict_proba computes, for each sample, the probability of being classified as 0 and the probability of being classified as 1.
model_lor.predict_proba(X)
array([[7.67439178e-03, 9.92325608e-01],
[2.03127465e-02, 9.79687254e-01],
[2.34630052e-03, 9.97653699e-01],
...,
[2.35208050e-02, 9.76479195e-01],
[8.13260214e-06, 9.99991867e-01],
[9.99572309e-01, 4.27690516e-04]])
import numpy as np
# classify a sample as positive (1) when the predicted probability of class 1 exceeds 0.1,
# instead of the default threshold of 0.5
y_pred2 = (model_lor.predict_proba(X)[:, 1] > 0.1).astype(np.int64)
print(confusion_matrix(y, y_pred2))
[[267 90]
[ 6 206]]
print(accuracy_score(y, y_pred2))
print(recall_score(y, y_pred2))
0.8312829525483304
0.9716981132075472
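Lowering the threshold from 0.5 to 0.1 raises recall but costs precision; a quick check (precision_score was imported above, and from the confusion matrix [[267 90], [6 206]] the value should fall to about 206 / 296 ≈ 0.70, down from 0.90 at the default threshold):
print(precision_score(y, y_pred2))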
ROC (Receiver Operating Characteristic) curve and AUC (area under the curve)
- The false positive rate (FP / (FP + TN)) is plotted on the horizontal axis and the true positive rate (TP / (TP + FN)) on the vertical axis.
- The curve shows graphically how the relationship between false positives and true positives changes as the threshold for treating a predicted probability as positive is gradually lowered from 1.
- The area under the curve is at most 1; a value close to 1 indicates good predictive performance, while a value around 0.5 means the model predicts little better than random.
- Using the AUC is recommended when dealing with imbalanced data.
from sklearn.metrics import roc_curve
probas = model_lor.predict_proba(X)
fpr, tpr, thresholds = roc_curve(y, probas[:, 1])
%matplotlib inline
import matplotlib.pyplot as plt
plt.style.use('fivethirtyeight')
fig, ax = plt.subplots()
fig.set_size_inches(4.8, 5)
ax.step(fpr, tpr, 'gray')
ax.fill_between(fpr, tpr, 0, color = 'skyblue', alpha = 0.8)
ax.set_xlabel('False Positive Rate')
ax.set_ylabel('True Positive Rate')
ax.set_facecolor('xkcd:white')
plt.show()
from sklearn.metrics import roc_auc_score
roc_auc_score(y, probas[:, 1])
0.9741557000158554
How to evaluate regression problems
import pandas as pd
import numpy as np
from sklearn.datasets import load_boston
data = load_boston()
X = data.data[:, [5, ]]   # use only column 5 (RM, the average number of rooms per dwelling)
y = data.target
- Note: load_boston is deprecated in scikit-learn 1.0 and removed in 1.2 because the Boston housing prices dataset has an ethical problem; the scikit-learn maintainers recommend the California housing dataset (fetch_california_housing) or the Ames housing dataset (fetch_openml(name="house_prices", as_frame=True)) instead. If the original data is still needed, the deprecation warning shows how to fetch it from the original source:
import pandas as pd
import numpy as np
data_url = "http://lib.stat.cmu.edu/datasets/boston"
raw_df = pd.read_csv(data_url, sep="\s+", skiprows=22, header=None)
data = np.hstack([raw_df.values[::2, :], raw_df.values[1::2, :2]])
target = raw_df.values[1::2, 2]
from sklearn.linear_model import LinearRegression
model_lir = LinearRegression()
model_lir.fit(X, y)
y_pred = model_lir.predict(X)
print(model_lir.coef_)
print(model_lir.intercept_)
[9.10210898]
-34.670620776438554
%matplotlib inline
import matplotlib.pyplot as plt
fig, ax = plt.subplots()
ax.scatter(X, y, color = 'pink', marker = 's', label = 'data set')
ax.plot(X, y_pred, color = 'blue', label = 'regression curve')
ax.legend()
plt.show()
mean squared error
- The mean squared error is the average of the squared differences between the actual values and the predicted values: MSE = (1/n) Σ (y_i − ŷ_i)².
- The smaller the mean squared error, the more accurate the predictions.
from sklearn.metrics import mean_squared_error
print(mean_squared_error(y, y_pred))
43.60055177116956
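The same value can be computed directly from the definition above as a quick check (y and y_pred are the target values and the linear-regression predictions):
print(np.mean((y - y_pred) ** 2))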
coefficient of determination
- It usually appears as a value between 0.0 and 1.0.
- If the error between the predicted value and the actual data is too large, it may be expressed as a negative value.
- The closer it is to 1.0, the more accurately the model represents the data.
from sklearn.metrics import r2_score
print(r2_score(y, y_pred))
0.48352545599133423
difference between mean squared error and coefficient of determination
- If the variance of the dependent variable is large, the mean squared error also becomes large.
- The coefficient of determination is expressed as a value between 0.0 and 1.0 that does not depend on the variance of the dependent variable (see the quick check below).
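This scale independence follows from the relationship R² = 1 − MSE / Var(y): dividing the mean squared error by the variance of the dependent variable removes the dependence on its scale. A minimal check, reusing y and y_pred from the linear regression above:
print(1 - mean_squared_error(y, y_pred) / np.var(y))   # matches r2_score(y, y_pred)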
Comparison with other algorithms
difference between SVC (Support Vector Classification) and SVR (Support Vector Regression)
- SVC tries to keep as few data points as possible inside the margin.
- SVR tries to keep as many data points as possible inside the margin.
SVR Sample Code
from sklearn.svm import SVR
model_svr_linear = SVR(C = 0.01, kernel = 'linear')
model_svr_linear.fit(X, y)
y_svr_pred = model_svr_linear.predict(X)
%matplotlib inline
import matplotlib.pyplot as plt
fig, ax = plt.subplots()
ax.scatter(X, y, color = 'pink', marker = 's', label = 'data set')
ax.plot(X, y_pred, color = 'blue', label = 'regression curve')
ax.plot(X, y_svr_pred, color = 'red', label = 'SVR')
ax.legend()
plt.show()
Setting hyperparameters
- Learning parameters are updated by the machine learning algorithm itself, whereas hyperparameters must be set by the user before training.
- If appropriate hyperparameters are not set, the model's performance suffers.
print(mean_squared_error(y, y_svr_pred))
print(r2_score(y, y_svr_pred))
print(model_svr_linear.coef_)
print(model_svr_linear.intercept_)
72.14197118147209
0.14543531775956597
[[1.64398]]
[11.13520958]
model_svr_rbf = SVR(C=1.0, kernel = 'rbf', gamma = 'auto')
model_svr_rbf.fit(X, y)
y_svr_pred = model_svr_rbf.predict(X)
print(mean_squared_error(y, y_svr_pred))
print(r2_score(y, y_svr_pred))
36.42126375260171
0.5685684051071418
Over-fitting
- Overfitting is the phenomenon in which a model gives good predictions on the training data but poor predictions on test data that was not used for training.
- Model performance on unknown data is called generalization performance.
- Even if the mean squared error is small on the training data, an overfitted model has low generalization performance.
# simple non-random split: the first 400 samples for training, the remaining samples for testing
train_X, test_X = X[:400], X[400:]
train_y, test_y = y[:400], y[400:]
model_svr_rbf_1 = SVR(C = 1.0, kernel = 'rbf', gamma = 'auto')
model_svr_rbf_1.fit(train_X, train_y)
test_y_pred = model_svr_rbf_1.predict(test_X)
print(mean_squared_error(test_y, test_y_pred))
print(r2_score(test_y, test_y_pred))
69.16928620453004
-1.4478345530124388
Ways to prevent over-fitting
1. Split the data into training data and test data
from sklearn.datasets import load_breast_cancer
data = load_breast_cancer()
X = data.data
y = data.target
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.3)
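train_test_split shuffles the data randomly, so the split (and the scores below) changes on every run; passing a fixed random_state is a common way to make the split reproducible. A minor variation, not part of the original code:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.3, random_state = 0)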
Training a model with SVC and checking its accuracy on the training data and the test data
from sklearn.svm import SVC
model_svc = SVC(gamma = 'auto')
model_svc.fit(X_train, y_train)
y_train_pred = model_svc.predict(X_train)
y_test_pred = model_svc.predict(X_test)
from sklearn.metrics import accuracy_score
print(accuracy_score(y_train, y_train_pred))
print(accuracy_score(y_test, y_test_pred))
1.0
0.5906432748538012
Training a model with RandomForestClassifier and checking its accuracy on the training data and the test data
from sklearn.ensemble import RandomForestClassifier
model_rfc = RandomForestClassifier(n_estimators=10)
model_rfc.fit(X_train, y_train)
y_train_pred = model_rfc.predict(X_train)
y_test_pred = model_rfc.predict(X_test)
from sklearn.metrics import accuracy_score
print(accuracy_score(y_train, y_train_pred))
print(accuracy_score(y_test, y_test_pred))
0.9899497487437185
0.9473684210526315
- Checking the accuracy on the test data helps avoid using an overfitted model.
Cross Validation
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import KFold
cv = KFold(5, shuffle=True)   # 5-fold cross-validation with shuffled folds
model_rfc_1 = RandomForestClassifier(n_estimators=10)
print(cross_val_score(model_rfc_1, X, y, cv=cv, scoring='accuracy'))
print(cross_val_score(model_rfc_1, X, y, cv=cv, scoring='f1'))
[0.97368421 0.97368421 0.99122807 0.96491228 0.92920354]
[0.95172414 0.96341463 0.96350365 0.93959732 0.94214876]
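The five fold scores are usually summarized by their mean (and spread); a minimal sketch reusing model_rfc_1 and cv from above (the exact numbers vary between runs because the folds are shuffled):
scores = cross_val_score(model_rfc_1, X, y, cv=cv, scoring='accuracy')
print(scores.mean(), scores.std())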
Searching for hyperparameters
Grid search
from sklearn.datasets import load_breast_cancer
data = load_breast_cancer()
X = data.data
y = 1 - data.target
X = X[:, :10]
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.model_selection import KFold
cv = KFold(5, shuffle = True)
param_grid = {'max_depth' : [5, 10, 15], 'n_estimators' : [10, 20, 30]}
model_rfc_2 = RandomForestClassifier()
grid_search = GridSearchCV(model_rfc_2, param_grid, cv=cv, scoring = 'accuracy')
grid_search.fit(X, y)
GridSearchCV(cv=KFold(n_splits=5, random_state=None, shuffle=True),
estimator=RandomForestClassifier(),
param_grid={'max_depth': [5, 10, 15],
'n_estimators': [10, 20, 30]},
scoring='accuracy')
print(grid_search.best_score_)
print(grid_search.best_params_)
0.9420276354603322
{'max_depth': 15, 'n_estimators': 10}
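Because GridSearchCV refits the best parameter combination on the whole data by default (refit=True), the tuned model can be used directly; a minimal usage sketch (predicting on the training data X here is for illustration only):
from sklearn.metrics import accuracy_score
best_model = grid_search.best_estimator_
print(accuracy_score(y, best_model.predict(X)))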
References
- Shinya Akiba, Asei Sugiyama, and Manabu Terada, 머신러닝 도감: 그림으로 공부하는 머신러닝 알고리즘 17 (Machine Learning Illustrated Guide: 17 Machine Learning Algorithms Studied with Pictures), translated by Jung-min Lee, 2019.