
์Šคํ„ฐ๋””๋…ธํŠธ (ML_๋ชจ๋ธ ํ‰๊ฐ€, ROC์™€ AUC)

KloudHyun 2023. 9. 25. 21:07

๐Ÿ“Œ๋ชจ๋ธ ํ‰๊ฐ€๋Š” ์–ด๋–ป๊ฒŒ ํ•˜๋Š” ๊ฑธ๊นŒ?

์ถœ์ฒ˜ : ์ œ๋กœ๋ฒ ์ด์Šค ๋ฐ์ดํ„ฐ ์Šค์ฟจ
์ถœ์ฒ˜ : ์ œ๋กœ๋ฒ ์ด์Šค ๋ฐ์ดํ„ฐ ์Šค์ฟจ

๐Ÿ”ป Model evaluation

- ๊ธฐ์กด์—๋Š” 0.5๋ฅผ ๊ธฐ์ค€์œผ๋กœ 0, 1๋กœ ๊ฒฐ๊ณผ๋ฅผ ๋ฐ˜์˜ํ•˜์˜€์œผ๋‚˜ (if ์ด์ง„๋ถ„๋ฅ˜), ์ด์ œ๋Š” ๊ฐ€๋ณ€์„ฑ์„ ๊ฐ€์ง€๊ณ  ๊ฒฐ๊ณผ๋ฅผ ๋ฐ˜์˜ํ•ด๋ณด์ž

์ถœ์ฒ˜ : ์ œ๋กœ๋ฒ ์ด์Šค ๋ฐ์ดํ„ฐ ์Šค์ฟจ

๐Ÿ”ป Accuracy

- ์ „์ฒด ๋ฐ์ดํ„ฐ ์ค‘ ๋งž๊ฒŒ ์˜ˆ์ธกํ•œ ๊ฒƒ์˜ ๋น„์œจ

๐Ÿ”ป Precision (TP / (TP + FP))

- ์–‘์„ฑ์ด๋ผ๊ณ  ์˜ˆ์ธกํ•œ ๊ฒƒ ์ค‘์—์„œ ์‹ค์ œ ์–‘์„ฑ์˜ ๋น„์œจ

๐Ÿ”ป RECALL (TP / (TP+FN))

- ์ฐธ์ธ ๋ฐ์ดํ„ฐ๋“ค ์ค‘์—์„œ ์ฐธ์ด๋ผ๊ณ  ์˜ˆ์ธกํ•œ ๊ฒƒ

๐Ÿ”ป FALL-OUT (FP / (FP+TN))

- ์‹ค์ œ ์–‘์„ฑ์ด ์•„๋‹Œ๋ฐ, ์–‘์„ฑ์ด๋ผ๊ณ  ์ž˜๋ชป ์˜ˆ์ธกํ•œ ๊ฒฝ์šฐ

๐Ÿ“ŒF1 - Score

๐Ÿ”ป Recall๊ณผ Precision์„ ๊ฒฐํ•ฉํ•œ ์ง€ํ‘œ

- Recall๊ณผ Precision์ด ์–ด๋А ํ•œ์ชฝ์œผ๋กœ ์น˜์šฐ์น˜์ง€ ์•Š๊ณ , ๋‘˜ ๋‹ค ๋†’์€ ๊ฐ’์„ ๊ฐ€์งˆ ์ˆ˜๋ก ๋†’์€ ๊ฐ’์„ ๊ฐ€์ง„๋‹ค.

์ถœ์ฒ˜ : ์ œ๋กœ๋ฒ ์ด์Šค ๋ฐ์ดํ„ฐ ์Šค์ฟจ

๐Ÿ“ŒROC์™€ AUC

๐Ÿ”ป ROC - Recall๊ณผ Precision์„ ๊ฒฐํ•ฉํ•œ ์ง€ํ‘œ

- ๋ถ„๋ฅ˜ ์„ฑ๋Šฅ์ด ๋‚˜์˜๋ฉด ๋จธ์‹ ๋Ÿฌ๋‹ ๋ชจ๋ธ์˜ ์„ฑ๋Šฅ์ด ์ง์„ ์— ๊ฐ€๊น๋‹ค

- ROC ๊ณก์„ ์—์„œ Fall Out (FPR) ๊ฐ’์ด ๊ฐ™์„ ๋•, Recall (TPR) ๊ฐ’์ด ๋‚ฎ์€ ๊ฒƒ์„ ์„ ํƒํ•œ๋‹ค

์ถœ์ฒ˜ : ์ œ๋กœ๋ฒ ์ด์Šค ๋ฐ์ดํ„ฐ ์Šค์ฟจ
์ถœ์ฒ˜ : ์ œ๋กœ๋ฒ ์ด์Šค ๋ฐ์ดํ„ฐ ์Šค์ฟจ

๐Ÿ”ป AUC

- ROC ๊ณก์„  ์•„๋ž˜์˜ ๋ฉด์ 

- ์ผ๋ฐ˜์ ์œผ๋กœ 1์— ์ˆ˜๋ ดํ•  ์ˆ˜๋ก ์ข‹์€ ์ˆ˜์น˜์ด๋‹ค

์ถœ์ฒ˜ : ์ œ๋กœ๋ฒ ์ด์Šค ๋ฐ์ดํ„ฐ ์Šค์ฟจ

๐Ÿ“ŒROC Curve ๊ทธ๋ ค๋ณด๊ธฐ

๐Ÿ”ป ๋จธ์‹ ๋Ÿฌ๋‹ ๋ฐ์ดํ„ฐ ๊ฐ€์ ธ์˜ค๊ธฐ

import pandas as pd
# load the red and white wine quality datasets
red_wine = pd.read_csv('../data/winequality-red.csv', sep=';')
white_wine = pd.read_csv('../data/winequality-white.csv', sep=';')

# label the wines by color (red = 1, white = 0)
red_wine['color'] = 1.
white_wine['color'] = 0.

# concatenate red_wine and white_wine
wine = pd.concat([red_wine, white_wine])

# binarize wine quality into a taste label (1 if quality > 5, else 0)
wine['taste'] = [1. if grade > 5 else 0. for grade in wine['quality']]

# split into features and target
X = wine.drop(['taste', 'quality'], axis=1)
y = wine['taste']

๐Ÿ”ป ๊ฒฐ์ • ํŠธ๋ฆฌ ์ง„ํ–‰ ๋ฐ ์˜ˆ์ธก

from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# ๋ฐ์ดํ„ฐ๋ฅผ ํ›ˆ๋ จ์šฉ ๋ฐ์ดํ„ฐ์™€ ํ…Œ์ŠคํŠธ ๋ฐ์ดํ„ฐ๋กœ ๋‚˜๋ˆ„๊ธฐ
X_train, X_test, y_train, y_test =train_test_split(X, y, test_size=0.2, random_state=13)

# ๊ฒฐ์ •ํŠธ๋ฆฌ
wine_tree =DecisionTreeClassifier(max_depth=2, random_state=13)
wine_tree.fit(X_train, y_train)

# ์˜ˆ์ธกํ•˜๊ธฐ
y_pred_tr = wine_tree.predict(X_train)
y_pred_test = wine_tree.predict(X_test)

print('Train Acc : ', accuracy_score(y_train, y_pred_tr))
print('test Acc : ', accuracy_score(y_test, y_pred_test))
>>>>
Train Acc :  0.7294593034442948
test Acc :  0.7161538461538461
from sklearn.metrics import (accuracy_score, precision_score, recall_score, f1_score, roc_auc_score, roc_curve)

print('Accuracy : ', accuracy_score(y_test, y_pred_test))
print('Precision : ', precision_score(y_test, y_pred_test))
print('AUC Score : ', roc_auc_score(y_test, y_pred_test))
print('F1 Score : ', f1_score(y_test, y_pred_test))
>>>>
Accuracy :  0.7161538461538461
Precision :  0.8026666666666666
AUC Score :  0.7105988470875331
F1 Score :  0.7654164017800381
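One aside (not from the original note): the AUC above was computed from the hard 0/1 predictions. ROC AUC is more commonly computed from the predicted probabilities, which is also what the curve below is drawn from. A minimal sketch reusing wine_tree, X_test and y_test from above (proba_test is just an illustrative name):

# AUC from predicted probabilities rather than hard 0/1 predictions
proba_test = wine_tree.predict_proba(X_test)[:, 1]
print('AUC Score (from probabilities) : ', roc_auc_score(y_test, proba_test))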

๐Ÿ”ป ๊ณก์„  ๊ทธ๋ฆฌ๊ธฐ

import matplotlib.pyplot as plt
%matplotlib inline

# probability of the positive class for each test sample
pred_proba = wine_tree.predict_proba(X_test)[:, 1]
fpr, tpr, thresholds = roc_curve(y_test, pred_proba)

plt.figure(figsize=(10, 8))
plt.plot([0, 1], [0, 1])   # diagonal reference line (random guessing)
plt.plot(fpr, tpr)         # ROC curve of the decision tree
plt.grid()
plt.show()
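As optional polish (not part of the original note), the same plot can be labeled so the axes and curves are self-explanatory; this sketch reuses fpr and tpr from the cell above.

plt.figure(figsize=(10, 8))
plt.plot([0, 1], [0, 1], linestyle='--', label='random guess')
plt.plot(fpr, tpr, label='DecisionTreeClassifier')
plt.xlabel('FPR (Fall-out)')
plt.ylabel('TPR (Recall)')
plt.legend()
plt.grid()
plt.show()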