Study Notes (Logistic Regression)
Logistic Regression
- Linear Regression -> used for regression
- Logistic regression is intended for use as a classifier
- Classification has to predict 0 or 1, but if Linear Regression is applied as-is, the predicted value can be smaller than 0 or larger than 1
- The fix is to modify the function so that the prediction always lies between 0 and 1 (using the sigmoid)
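The point in the bullets above can be checked numerically. The following is a minimal sketch, assuming a hypothetical one-feature linear model `z = w * x + b`; the values of `w` and `b` are made up for illustration and do not come from any fitted model.

```python
import math

def sigmoid(z):
    """Logistic function: maps any real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical linear model z = w * x + b (w and b are made-up values).
w, b = 0.8, -0.5

xs = [-3.0, 0.0, 1.0, 4.0]
linear_outputs = [w * x + b for x in xs]                # can fall outside [0, 1]
sigmoid_outputs = [sigmoid(z) for z in linear_outputs]  # always inside (0, 1)

for x, z, p in zip(xs, linear_outputs, sigmoid_outputs):
    print(f"x={x:5.1f}  linear={z:6.2f}  sigmoid={p:.3f}")
```

The raw linear outputs go below 0 and above 1, while the sigmoid-transformed values stay between 0 and 1 and can be read as probabilities.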
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

# Sigmoid g(z) = 1 / (1 + e^(-z)) evaluated for z in [-10, 10)
z = np.arange(-10, 10, 0.01)
g = 1 / (1 + np.exp(-z))

# Quick look at the curve
plt.plot(z, g)
plt.grid()
plt.show()

# Same curve, with the axes moved into the middle of the plot
plt.figure(figsize=(12, 8))
ax = plt.gca()
ax.plot(z, g)
ax.spines['left'].set_position('zero')
ax.spines['bottom'].set_position('center')
ax.spines['right'].set_color('none')
ax.spines['top'].set_color('none')
plt.show()
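Decision Boundary
- Decision boundary: the threshold on the sigmoid output (commonly 0.5) that separates a prediction of 0 from a prediction of 1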
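To make the idea of a decision boundary concrete, here is a minimal sketch. It assumes the common threshold of 0.5 and a hypothetical one-feature model `z = w * x + b` with made-up `w` and `b`. Because the sigmoid equals 0.5 exactly at `z = 0`, the boundary in feature space is the point where `w * x + b = 0`.

```python
import math

def sigmoid(z):
    """Logistic function: maps any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_label(z, threshold=0.5):
    """Return 1 if the predicted probability reaches the threshold, else 0."""
    return 1 if sigmoid(z) >= threshold else 0

# Hypothetical one-feature model z = w * x + b (w and b are made-up values).
w, b = 2.0, -3.0
boundary_x = -b / w  # the x where z = 0, i.e. where sigmoid(z) = 0.5

for x in [0.0, 1.0, 1.5, 2.0, 3.0]:
    z = w * x + b
    print(f"x={x:.1f}  p={sigmoid(z):.3f}  label={predict_label(z)}")
```

Inputs on one side of `boundary_x` are labeled 0 and inputs on the other side are labeled 1. Choosing a different threshold moves the boundary.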
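Cost Function
- Previously, the cost function was the MSE
- For linear regression the MSE is a quadratic expression, so it plots as a clean quadratic curve
- With logistic regression, the sigmoid makes the MSE and its derivative complicated
- So the cost function needs to be redefined for logistic regression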
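import numpy as np

# h is the predicted probability; stay strictly inside (0, 1) to avoid log(0)
h = np.arange(0.01, 1, 0.01)
C0 = -np.log(1 - h)  # cost when the true label is y = 0
C1 = -np.log(h)      # cost when the true label is y = 1

plt.figure(figsize=(12, 8))
plt.plot(h, C0, label='y=0')
plt.plot(h, C1, label='y=1')
plt.legend()
plt.show()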
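The two curves plotted above are usually combined into a single cost, the mean binary cross-entropy: `-(y * log(h) + (1 - y) * log(1 - h))` averaged over the samples. The sketch below is a plain-Python illustration of that formula. The function name `log_loss` and the sample values are chosen here for illustration, and the predictions are assumed to lie strictly between 0 and 1.

```python
import math

def log_loss(y_true, h_pred):
    """Mean binary cross-entropy: -mean(y*log(h) + (1-y)*log(1-h)).

    Assumes every prediction h is strictly between 0 and 1.
    """
    total = 0.0
    for y, h in zip(y_true, h_pred):
        total += -(y * math.log(h) + (1 - y) * math.log(1 - h))
    return total / len(y_true)

# Made-up labels and predicted probabilities for illustration.
y_true = [1, 0, 1, 0]
mostly_right = [0.9, 0.1, 0.8, 0.2]
mostly_wrong = [0.1, 0.9, 0.2, 0.8]

loss_right = log_loss(y_true, mostly_right)
loss_wrong = log_loss(y_true, mostly_wrong)
print("loss when predictions agree with labels   :", loss_right)
print("loss when predictions contradict the labels:", loss_wrong)
```

For `y = 1` only the `-log(h)` term is active and for `y = 0` only the `-log(1 - h)` term is, so confident wrong predictions are penalized heavily, which matches the shape of the two plotted curves.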
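Logistic Regression Test
import pandas as pd

wine = pd.read_csv('../data/wine.csv', index_col=0)
# Binary target: 1 if quality is above 5, otherwise 0
wine['taste'] = [1. if grade > 5 else 0. for grade in wine['quality']]
# Drop 'quality' as well, since 'taste' is derived directly from it
X = wine.drop(['taste', 'quality'], axis=1)
y = wine['taste']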
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=13)
# solver: which optimization algorithm to use
lr = LogisticRegression(solver='liblinear', random_state=13)
lr.fit(X_train, y_train)
y_pred_tr = lr.predict(X_train)
y_pred_test = lr.predict(X_test)
print('Train Accuracy : ', accuracy_score(y_train, y_pred_tr))
print('Test Accuracy : ', accuracy_score(y_test, y_pred_test))
>>>>>
Train Accuracy : 0.7425437752549547
Test Accuracy : 0.7438461538461538
Using a Pipeline
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Scale the features first, then fit the same classifier as before
estimators = [('scaler', StandardScaler()),
              ('clf', LogisticRegression(solver='liblinear', random_state=13))]
pipe = Pipeline(estimators)
pipe.fit(X_train, y_train)
y_pred_tr = pipe.predict(X_train)
y_pred_test = pipe.predict(X_test)
print('Train Accuracy : ', accuracy_score(y_train, y_pred_tr))
print('Test Accuracy : ', accuracy_score(y_test, y_pred_test))
>>>>
Train Accuracy : 0.7444679622859341
Test Accuracy : 0.7469230769230769
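The only difference from the previous run is the `StandardScaler` step, so it may help to see what that step does to a feature. The sketch below is a plain-Python illustration of standardization (subtract the mean, divide by the population standard deviation, which is what `StandardScaler` uses). The helper name `standardize` and the two feature columns are made up for illustration and are not taken from the wine data.

```python
import statistics

def standardize(column):
    """Z-score a list of numbers: subtract the mean, divide by the population std."""
    mean = statistics.fmean(column)
    std = statistics.pstdev(column)  # population std (ddof = 0)
    return [(v - mean) / std for v in column]

# Made-up feature columns on very different scales (not from wine.csv).
feature_a = [4.0, 6.0, 8.0, 10.0]
feature_b = [0.2, 0.4, 0.6, 0.8]

scaled_a = standardize(feature_a)
scaled_b = standardize(feature_b)
print(scaled_a)
print(scaled_b)
```

After scaling, both columns have mean 0 and standard deviation 1, so no feature dominates merely because of its units. Inside the `Pipeline`, the scaler's mean and standard deviation are learned from the training data during `fit` and reused when `predict` is called on the test data.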
Comparing with a Decision Tree
from sklearn.tree import DecisionTreeClassifier
wine_tree = DecisionTreeClassifier(max_depth=2, random_state=13)
wine_tree.fit(X_train, y_train)
models = {'logistic regression' : pipe, 'decision tree' : wine_tree}
Comparison Using ROC Curves
from sklearn.metrics import roc_curve

plt.figure(figsize=(10, 8))
# Diagonal reference line: the ROC curve of a random guess
plt.plot([0, 1], [0, 1], label='random_guess')
for model_name, model in models.items():
    # Predicted probability of the positive class (taste == 1)
    pred = model.predict_proba(X_test)[:, 1]
    fpr, tpr, thresholds = roc_curve(y_test, pred)
    plt.plot(fpr, tpr, label=model_name)
plt.legend()
plt.grid()
plt.show()
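The plot compares the models visually. A common single-number summary is the area under the ROC curve (AUC), for which scikit-learn provides `roc_auc_score` in `sklearn.metrics`. To show what is being computed, the sketch below builds the ROC points by hand and integrates them with the trapezoid rule. The helper names `roc_points` and `auc_trapezoid` and the toy labels and scores are made up for illustration. This is not scikit-learn's implementation, and it assumes both classes are present in `y_true`.

```python
def roc_points(y_true, scores):
    """Return (fpr, tpr) lists obtained by sweeping the threshold over the scores."""
    pos = sum(y_true)
    neg = len(y_true) - pos
    fpr, tpr = [0.0], [0.0]  # start at (0, 0): threshold above every score
    # Try each distinct score as a threshold, from highest to lowest.
    for thr in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(y_true, scores) if s >= thr and y == 1)
        fp = sum(1 for y, s in zip(y_true, scores) if s >= thr and y == 0)
        tpr.append(tp / pos)
        fpr.append(fp / neg)
    return fpr, tpr

def auc_trapezoid(fpr, tpr):
    """Area under the piecewise-linear curve through the (fpr, tpr) points."""
    area = 0.0
    for i in range(1, len(fpr)):
        area += (fpr[i] - fpr[i - 1]) * (tpr[i] + tpr[i - 1]) / 2
    return area

# Toy labels and predicted probabilities (made-up values).
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]

fpr, tpr = roc_points(y_true, scores)
auc = auc_trapezoid(fpr, tpr)
print("fpr:", fpr)
print("tpr:", tpr)
print("AUC:", auc)
```

An AUC of 0.5 corresponds to the diagonal random-guess line, and values closer to 1 mean the curve bends further toward the top-left corner.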