
Study Note (Logistic Regression)

KloudHyun 2023. 9. 30. 19:10

๐Ÿ“Œ Logistic Regression

- Linear Regression -> regression (predicts a continuous value)

- Logistic Regression -> intended for use as a classifier

- Classification has to predict 0 or 1, but applying Linear Regression as-is can produce predictions below 0 or above 1

- So the function is modified so that the prediction always lies between 0 and 1 (using the sigmoid; see the formula below)

์ถœ์ฒ˜ : ์ œ๋กœ๋ฒ ์ด์Šค ๋ฐ์ดํ„ฐ ์Šค์ฟจ
์ถœ์ฒ˜ : ์ œ๋กœ๋ฒ ์ด์Šค ๋ฐ์ดํ„ฐ ์Šค์ฟจ

 

import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

# sigmoid: g(z) = 1 / (1 + e^(-z)) squashes any real z into (0, 1)
z = np.arange(-10, 10, 0.01)
g = 1 / (1 + np.exp(-z))

plt.plot(z, g)
plt.grid()
plt.show()

# same curve, with the spines moved so the 0.5 crossing at z = 0 is visible
plt.figure(figsize=(12, 8))
ax = plt.gca()

ax.plot(z, g)
ax.spines['left'].set_position('zero')      # y-axis through x = 0
ax.spines['bottom'].set_position('center')  # x-axis through the vertical center (g = 0.5)
ax.spines['right'].set_color('none')        # hide the unused frame lines
ax.spines['top'].set_color('none')

plt.show()


๐Ÿ“Œ Decision Boundary

- Decision boundary: the line (or surface) where h_theta(x) = 0.5, i.e. where the linear score theta^T x = 0; the model predicts class 1 on one side and class 0 on the other (numeric sketch below)
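
A minimal numeric sketch of this idea, with made-up weights (theta here is hypothetical, for illustration only): the predicted class flips exactly where the linear score crosses 0.

import numpy as np

theta = np.array([-3.0, 1.0])   # hypothetical weights: [intercept, slope]
x = np.array([1.0, 4.0])        # [bias term, feature value]

z = theta @ x                   # linear score theta^T x = 1.0
prob = 1 / (1 + np.exp(-z))     # sigmoid output, ~0.73
pred = int(prob >= 0.5)         # class 1 iff z >= 0, i.e. feature >= 3

print(z, prob, pred)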

(Figure: decision boundary example, source: Zerobase Data School)

๐Ÿ“Œ Cost Function

- Previously, the cost function was MSE
- MSE on a linear model is quadratic, so it plots as a clean (convex) parabola
- With logistic regression, plugging the sigmoid into MSE makes the derivative come out complicated, and the loss is no longer convex
- So the cost function needs to be redefined for logistic regression (see the log-loss formula below)
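
The standard replacement is the log loss (binary cross-entropy); textbook notation, not from the original notes:

$$J(\theta) = -\frac{1}{m}\sum_{i=1}^{m}\left[\, y^{(i)}\log h_\theta(x^{(i)}) + \left(1 - y^{(i)}\right)\log\left(1 - h_\theta(x^{(i)})\right)\right]$$

Each term is convex in theta, and the two branches plotted below show how confident wrong predictions are punished.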


import numpy as np

h = np.arange(0.01, 1, 0.01)   # predicted probability, avoiding log(0) at the ends
C0 = -np.log(1 - h)            # cost when the true label is y = 0
C1 = -np.log(h)                # cost when the true label is y = 1

plt.figure(figsize=(12, 8))
plt.plot(h, C0, label='y=0')
plt.plot(h, C1, label='y=1')
plt.legend()
plt.show()

๐Ÿ“Œ Testing Logistic Regression

import pandas as pd

wine = pd.read_csv('../data/wine.csv', index_col=0)

# binarize the target: quality above 5 -> 1 (tasty), otherwise 0
wine['taste'] = [1. if grade > 5 else 0. for grade in wine['quality']]

X = wine.drop(['taste', 'quality'], axis=1)  # drop 'quality' too, or it leaks the label
y = wine['taste']
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=13)

# solver: which optimization algorithm to use ('liblinear' suits smaller datasets)
lr = LogisticRegression(solver='liblinear', random_state=13)
lr.fit(X_train, y_train)

y_pred_tr = lr.predict(X_train)
y_pred_test = lr.predict(X_test)

print('Train Accuracy : ', accuracy_score(y_train, y_pred_tr))
print('Test Accuracy : ', accuracy_score(y_test, y_pred_test))
>>>>>
Train Accuracy :  0.7425437752549547
Test Accuracy :  0.7438461538461538
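
As a quick sanity check (an addition, not in the original notes), the fitted model's weights show which features push a wine toward 'tasty'; coef_ and intercept_ are standard scikit-learn attributes:

# sign and magnitude of each feature's weight in the fitted model
coef_summary = pd.Series(lr.coef_[0], index=X.columns).sort_values()
print(coef_summary)
print('intercept :', lr.intercept_[0])

With unscaled inputs these magnitudes are not directly comparable across features, which is one motivation for the scaling Pipeline below.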

๐Ÿ”ป Let's use a Pipeline

from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# scale the features first, then fit the classifier, all in a single object
estimators = [('scaler', StandardScaler()),
              ('clf', LogisticRegression(solver='liblinear', random_state=13))]
pipe = Pipeline(estimators)

pipe.fit(X_train, y_train)

y_pred_tr = pipe.predict(X_train)
y_pred_test = pipe.predict(X_test)

print('Train Accuracy : ', accuracy_score(y_train, y_pred_tr))
print('Test Accuracy : ', accuracy_score(y_test, y_pred_test))
>>>>>
Train Accuracy :  0.7444679622859341
Test Accuracy :  0.7469230769230769
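
With scaling in front, the coefficients become comparable across features; the fitted classifier can be pulled back out of the pipeline via the standard named_steps attribute (this snippet is an addition):

# the fitted LogisticRegression inside the pipeline
scaled_lr = pipe.named_steps['clf']
print(pd.Series(scaled_lr.coef_[0], index=X.columns).sort_values())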

๐Ÿ”ป Comparing with a Decision Tree

from sklearn.tree import DecisionTreeClassifier

wine_tree = DecisionTreeClassifier(max_depth=2, random_state=13)
wine_tree.fit(X_train, y_train)

# both models side by side, keyed by display name for the plots below
models = {'logistic regression': pipe, 'decision tree': wine_tree}
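
Before drawing ROC curves, a quick accuracy comparison on the same test set (a small addition; accuracy_score is already imported above):

for model_name, model in models.items():
    print(model_name, ':', accuracy_score(y_test, model.predict(X_test)))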

๐Ÿ“Œ Comparing with ROC Curves

from sklearn.metrics import roc_curve

plt.figure(figsize=(10, 8))
plt.plot([0, 1], [0, 1], label='random_guess')   # diagonal = no-skill baseline

for model_name, model in models.items():
    pred = model.predict_proba(X_test)[:, 1]     # probability of the positive class
    fpr, tpr, thresholds = roc_curve(y_test, pred)
    plt.plot(fpr, tpr, label=model_name)

plt.legend()
plt.grid()
plt.show()
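
Each ROC curve can be summarized in a single number with the area under the curve; roc_auc_score is the standard scikit-learn helper (this snippet is an addition, not from the original notes):

from sklearn.metrics import roc_auc_score

for model_name, model in models.items():
    pred = model.predict_proba(X_test)[:, 1]
    print(model_name, 'AUC :', roc_auc_score(y_test, pred))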