Study Notes (ML_Model Evaluation, ROC and AUC)
📌 How do we evaluate a model?
🔻 Model evaluation
- Previously, binary classification results were turned into 0 or 1 using a fixed threshold of 0.5. Now let's vary that threshold and see how the results change.
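Here is a minimal sketch of that idea. It assumes we already have predicted probabilities for the positive class. The `probas` list is made-up toy data, `apply_threshold` is a hypothetical helper, and using `>=` for the comparison is also an assumption. Lowering the threshold labels more samples as positive, and raising it labels fewer.

```python
# Hypothetical predicted probabilities for the positive class
# (in practice these would come from something like model.predict_proba(X)[:, 1])
probas = [0.1, 0.4, 0.35, 0.8, 0.65, 0.9]


def apply_threshold(probas, threshold=0.5):
    """Label a sample 1 if its probability is >= threshold, otherwise 0."""
    return [1 if p >= threshold else 0 for p in probas]


# Compare how many samples are labeled positive at different thresholds
for t in (0.3, 0.5, 0.7):
    labels = apply_threshold(probas, t)
    print(f"threshold={t}: {labels} -> {sum(labels)} predicted positive")
```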
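🔻 Accuracy
- The proportion of all samples that were predicted correctly
🔻 Precision (TP / (TP + FP))
- Of all the samples predicted as positive, the proportion that are actually positive
🔻 Recall (TP / (TP + FN))
- Of all the samples that are actually positive, the proportion predicted as positive (also called TPR, the true positive rate)
🔻 Fall-out (FP / (FP + TN))
- Of all the samples that are actually negative, the proportion wrongly predicted as positive (also called FPR, the false positive rate)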
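The four definitions above can be checked by hand with a small pure-Python sketch. `confusion_counts`, `basic_metrics`, and the toy labels are hypothetical names and data made up for illustration. The code assumes both classes appear in the labels and predictions so that no denominator is zero.

```python
def confusion_counts(y_true, y_pred):
    """Count TP, FP, FN, TN for binary 0/1 labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def basic_metrics(y_true, y_pred):
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),  # correct / all
        "precision": tp / (tp + fp),                  # among predicted positives
        "recall": tp / (tp + fn),                     # TPR: among actual positives
        "fall_out": fp / (fp + tn),                   # FPR: among actual negatives
    }


# Toy data: 4 actual positives, 6 actual negatives
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
print(basic_metrics(y_true, y_pred))
```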
📌 F1-Score
🔻 A metric that combines Recall and Precision (their harmonic mean: 2 × Precision × Recall / (Precision + Recall))
- It is high only when Recall and Precision are both high, without being skewed toward either one.
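A short sketch of the F1 formula on made-up precision/recall values. `f1` is a hypothetical helper, not a library function. Both (0.8, 0.8) and (1.0, 0.6) have the same arithmetic mean of 0.8, but the lopsided pair scores lower because F1 is a harmonic mean.

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are 0)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


print(f1(0.8, 0.8))   # balanced: stays at 0.8
print(f1(1.0, 0.6))   # same arithmetic mean, but lopsided: drops to 0.75
print(f1(0.95, 0.1))  # one very low value drags F1 down sharply
```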
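📌 ROC and AUC
🔻 ROC: a curve plotting Recall (TPR) against Fall-out (FPR) as the classification threshold varies
- The worse the classification performance, the closer the curve lies to the diagonal line (the curve of a random classifier)
- On the ROC curve, for the same Fall-out (FPR), choose the model with the higher Recall (TPR)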
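To see where the curve comes from, here is a sketch that sweeps the threshold over every distinct score and records one (FPR, TPR) point per threshold. This is roughly the idea behind scikit-learn's `roc_curve`, though the real function differs in details (for example, it can drop intermediate points). `roc_points` and the toy data are hypothetical.

```python
def roc_points(y_true, scores):
    """Return (FPR, TPR) points, from the strictest threshold to the loosest."""
    pos = sum(1 for t in y_true if t == 1)
    neg = len(y_true) - pos
    points = [(0.0, 0.0)]  # threshold above every score: nothing predicted positive
    for thr in sorted(set(scores), reverse=True):
        # Everything scoring >= thr is predicted positive
        tp = sum(1 for t, s in zip(y_true, scores) if s >= thr and t == 1)
        fp = sum(1 for t, s in zip(y_true, scores) if s >= thr and t == 0)
        points.append((fp / neg, tp / pos))
    return points


y_true = [0, 0, 1, 1, 0, 1]
scores = [0.1, 0.4, 0.35, 0.8, 0.65, 0.9]
for fpr, tpr in roc_points(y_true, scores):
    print(f"FPR={fpr:.2f}  TPR={tpr:.2f}")
```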
🔻 AUC
- The area under the ROC curve
- Generally, the closer it is to 1, the better (0.5 corresponds to random guessing)
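A sketch of two equivalent ways to compute AUC on toy data: the trapezoidal area under the ROC points, and the fraction of (positive, negative) pairs in which the positive sample gets the higher score (ties count as half). The `points` list is what sweeping thresholds over the same toy labels and scores produces; the helper names are hypothetical.

```python
def auc_trapezoid(points):
    """Area under a piecewise-linear curve given (x, y) points sorted by x."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


def auc_pairwise(y_true, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical) and its ROC points
y_true = [0, 0, 1, 1, 0, 1]
scores = [0.1, 0.4, 0.35, 0.8, 0.65, 0.9]
points = [(0, 0), (0, 1/3), (0, 2/3), (1/3, 2/3), (2/3, 2/3), (2/3, 1), (1, 1)]

print(auc_trapezoid(points), auc_pairwise(y_true, scores))  # both 7/9
```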
📌 Drawing an ROC curve
🔻 Loading the data
```python
import pandas as pd

red_wine = pd.read_csv('../data/winequality-red.csv', sep=';')
white_wine = pd.read_csv('../data/winequality-white.csv', sep=';')

# Label each wine by color (red = 1, white = 0)
red_wine['color'] = 1.
white_wine['color'] = 0.

# Combine red_wine and white_wine into one DataFrame
wine = pd.concat([red_wine, white_wine])

# Turn quality into a binary label: taste = 1 if quality > 5, else 0
wine['taste'] = [1. if grade > 5 else 0. for grade in wine['quality']]

# Split into features (X) and labels (y); quality is dropped because taste is derived from it
X = wine.drop(['taste', 'quality'], axis=1)
y = wine['taste']
```
🔻 Training a decision tree and predicting
```python
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Split the data into training and test sets (80% / 20%)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=13)

# Decision tree
wine_tree = DecisionTreeClassifier(max_depth=2, random_state=13)
wine_tree.fit(X_train, y_train)

# Predict on both sets
y_pred_tr = wine_tree.predict(X_train)
y_pred_test = wine_tree.predict(X_test)

print('Train Acc : ', accuracy_score(y_train, y_pred_tr))
print('Test Acc : ', accuracy_score(y_test, y_pred_test))
```
Output:
```
Train Acc : 0.7294593034442948
Test Acc : 0.7161538461538461
```
```python
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, roc_auc_score, roc_curve)

print('Accuracy : ', accuracy_score(y_test, y_pred_test))
print('Precision : ', precision_score(y_test, y_pred_test))
# y_pred_test holds hard 0/1 labels, so this AUC reflects a single threshold;
# passing predict_proba scores instead gives the AUC of the full ROC curve
print('AUC Score : ', roc_auc_score(y_test, y_pred_test))
print('F1 Score : ', f1_score(y_test, y_pred_test))
```
Output:
```
Accuracy : 0.7161538461538461
Precision : 0.8026666666666666
AUC Score : 0.7105988470875331
F1 Score : 0.7654164017800381
```
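One caveat about the AUC Score above: `y_pred_test` contains hard 0/1 predictions, so the ROC "curve" has only one interior point (FPR, TPR) between (0, 0) and (1, 1). The trapezoid area then simplifies to (1 + TPR - FPR) / 2, which is the average of TPR and TNR. A minimal sketch with toy labels; `auc_from_hard_labels` is a hypothetical helper, and it assumes both classes are present.

```python
def auc_from_hard_labels(y_true, y_pred):
    """AUC of the ROC path (0,0) -> (FPR, TPR) -> (1,1) produced by 0/1 predictions."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    tpr = tp / (tp + fn)
    fpr = fp / (fp + tn)
    # Two trapezoids through the single interior point add up to (1 + TPR - FPR) / 2
    return (1 + tpr - fpr) / 2


# Toy data: TPR = 3/4, FPR = 1/6
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
print(auc_from_hard_labels(y_true, y_pred))  # 19/24
```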
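🔻 Drawing the curve
```python
import matplotlib.pyplot as plt
# Jupyter-only magic: render plots inline in the notebook
%matplotlib inline

# Probability of the positive class (taste = 1) for each test sample
pred_proba = wine_tree.predict_proba(X_test)[:, 1]
fpr, tpr, thresholds = roc_curve(y_test, pred_proba)

plt.figure(figsize=(10, 8))
plt.plot([0, 1], [0, 1], linestyle='--', label='Random guess')  # diagonal reference line
plt.plot(fpr, tpr, label='Decision tree (max_depth=2)')
plt.xlabel('FPR (Fall-out)')
plt.ylabel('TPR (Recall)')
plt.legend()
plt.grid()
plt.show()
```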
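Once `fpr`, `tpr`, and `thresholds` are available, one common (but not the only) way to pick a threshold is Youden's J statistic, J = TPR - FPR. This note does not itself use it; it is shown here as a swapped-in technique. The sketch runs on hypothetical `roc_curve`-style arrays, and `best_threshold_youden` is a made-up helper name.

```python
def best_threshold_youden(fpr, tpr, thresholds):
    """Return (threshold, J) for the point that maximizes J = TPR - FPR."""
    best_i = max(range(len(thresholds)), key=lambda i: tpr[i] - fpr[i])
    return thresholds[best_i], tpr[best_i] - fpr[best_i]


# Hypothetical roc_curve-style output; the first threshold means "predict nothing positive"
fpr = [0.0, 0.0, 1/3, 2/3, 1.0]
tpr = [0.0, 2/3, 2/3, 1.0, 1.0]
thresholds = [float("inf"), 0.8, 0.65, 0.35, 0.1]

print(best_threshold_youden(fpr, tpr, thresholds))
```

Because the helper only indexes its inputs, it should also accept the NumPy arrays returned by the real `roc_curve` call above.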