
Study Note (Logistic Regression)

KloudHyun 2023. 9. 30. 19:10

๐Ÿ“Œ Logistic Regression

- Linear Regression -> regression (predicts a continuous value)

- Logistic Regression -> intended for use as a classifier

- Classification has to predict 0 or 1, but applying Linear Regression as-is can produce predictions below 0 or above 1

- So the function is modified so that the prediction always lies between 0 and 1 (using the sigmoid; see the formula below)

์ถœ์ฒ˜ : ์ œ๋กœ๋ฒ ์ด์Šค ๋ฐ์ดํ„ฐ ์Šค์ฟจ
์ถœ์ฒ˜ : ์ œ๋กœ๋ฒ ์ด์Šค ๋ฐ์ดํ„ฐ ์Šค์ฟจ

 

import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

# sigmoid: g(z) = 1 / (1 + e^(-z)) squashes any real z into (0, 1)
z = np.arange(-10, 10, 0.01)
g = 1 / (1 + np.exp(-z))

plt.plot(z, g)
plt.grid()
plt.show()

# same curve, with the spines moved so the 0.5 crossing at z = 0 is visible
plt.figure(figsize=(12, 8))
ax = plt.gca()

ax.plot(z, g)
ax.spines['left'].set_position('zero')      # y-axis through x = 0
ax.spines['bottom'].set_position('center')  # x-axis through the vertical center (g = 0.5)
ax.spines['right'].set_color('none')        # hide the unused frame lines
ax.spines['top'].set_color('none')

plt.show()


๐Ÿ“Œ Decision Boundary

- Decision boundary: the line (or surface) where h_theta(x) = 0.5, i.e. where the linear score theta^T x = 0; the model predicts class 1 on one side and class 0 on the other (numeric sketch below)
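
A minimal numeric sketch of this idea, with made-up weights (theta here is hypothetical, for illustration only): the predicted class flips exactly where the linear score crosses 0.

import numpy as np

theta = np.array([-3.0, 1.0])   # hypothetical weights: [intercept, slope]
x = np.array([1.0, 4.0])        # [bias term, feature value]

z = theta @ x                   # linear score theta^T x = 1.0
prob = 1 / (1 + np.exp(-z))     # sigmoid output, ~0.73
pred = int(prob >= 0.5)         # class 1 iff z >= 0, i.e. feature >= 3

print(z, prob, pred)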

(Figure: decision boundary example, source: Zerobase Data School)

๐Ÿ“Œ Cost Function

- Previously, the cost function was MSE
- MSE on a linear model is quadratic, so it plots as a clean (convex) parabola
- With logistic regression, plugging the sigmoid into MSE makes the derivative come out complicated, and the loss is no longer convex
- So the cost function needs to be redefined for logistic regression (see the log-loss formula below)
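
The standard replacement is the log loss (binary cross-entropy); textbook notation, not from the original notes:

$$J(\theta) = -\frac{1}{m}\sum_{i=1}^{m}\left[\, y^{(i)}\log h_\theta(x^{(i)}) + \left(1 - y^{(i)}\right)\log\left(1 - h_\theta(x^{(i)})\right)\right]$$

Each term is convex in theta, and the two branches plotted below show how confident wrong predictions are punished.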


import numpy as np

h = np.arange(0.01, 1, 0.01)   # predicted probability, avoiding log(0) at the ends
C0 = -np.log(1 - h)            # cost when the true label is y = 0
C1 = -np.log(h)                # cost when the true label is y = 1

plt.figure(figsize=(12, 8))
plt.plot(h, C0, label='y=0')
plt.plot(h, C1, label='y=1')
plt.legend()
plt.show()

๐Ÿ“Œ Testing Logistic Regression

import pandas as pd

wine = pd.read_csv('../data/wine.csv', index_col=0)

# binarize the target: quality above 5 -> 1 (tasty), otherwise 0
wine['taste'] = [1. if grade > 5 else 0. for grade in wine['quality']]

X = wine.drop(['taste', 'quality'], axis=1)  # drop 'quality' too, or it leaks the label
y = wine['taste']
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=13)

# solver: which optimization algorithm to use ('liblinear' suits smaller datasets)
lr = LogisticRegression(solver='liblinear', random_state=13)
lr.fit(X_train, y_train)

y_pred_tr = lr.predict(X_train)
y_pred_test = lr.predict(X_test)

print('Train Accuracy : ', accuracy_score(y_train, y_pred_tr))
print('Test Accuracy : ', accuracy_score(y_test, y_pred_test))
>>>>>
Train Accuracy :  0.7425437752549547
Test Accuracy :  0.7438461538461538
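
As a quick sanity check (an addition, not in the original notes), the fitted model's weights show which features push a wine toward 'tasty'; coef_ and intercept_ are standard scikit-learn attributes:

# sign and magnitude of each feature's weight in the fitted model
coef_summary = pd.Series(lr.coef_[0], index=X.columns).sort_values()
print(coef_summary)
print('intercept :', lr.intercept_[0])

With unscaled inputs these magnitudes are not directly comparable across features, which is one motivation for the scaling Pipeline below.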

๐Ÿ”ป Let's use a Pipeline

from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# scale the features first, then fit the classifier, all in a single object
estimators = [('scaler', StandardScaler()),
              ('clf', LogisticRegression(solver='liblinear', random_state=13))]
pipe = Pipeline(estimators)

pipe.fit(X_train, y_train)

y_pred_tr = pipe.predict(X_train)
y_pred_test = pipe.predict(X_test)

print('Train Accuracy : ', accuracy_score(y_train, y_pred_tr))
print('Test Accuracy : ', accuracy_score(y_test, y_pred_test))
>>>>>
Train Accuracy :  0.7444679622859341
Test Accuracy :  0.7469230769230769
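
With scaling in front, the coefficients become comparable across features; the fitted classifier can be pulled back out of the pipeline via the standard named_steps attribute (this snippet is an addition):

# the fitted LogisticRegression inside the pipeline
scaled_lr = pipe.named_steps['clf']
print(pd.Series(scaled_lr.coef_[0], index=X.columns).sort_values())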

๐Ÿ”ป Comparing with a Decision Tree

from sklearn.tree import DecisionTreeClassifier

wine_tree = DecisionTreeClassifier(max_depth=2, random_state=13)
wine_tree.fit(X_train, y_train)

# both models side by side, keyed by display name for the plots below
models = {'logistic regression': pipe, 'decision tree': wine_tree}
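
Before drawing ROC curves, a quick accuracy comparison on the same test set (a small addition; accuracy_score is already imported above):

for model_name, model in models.items():
    print(model_name, ':', accuracy_score(y_test, model.predict(X_test)))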

๐Ÿ“Œ Comparing with ROC Curves

from sklearn.metrics import roc_curve

plt.figure(figsize=(10, 8))
plt.plot([0, 1], [0, 1], label='random_guess')   # diagonal = no-skill baseline

for model_name, model in models.items():
    pred = model.predict_proba(X_test)[:, 1]     # probability of the positive class
    fpr, tpr, thresholds = roc_curve(y_test, pred)
    plt.plot(fpr, tpr, label=model_name)

plt.legend()
plt.grid()
plt.show()
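
Each ROC curve can be summarized in a single number with the area under the curve; roc_auc_score is the standard scikit-learn helper (this snippet is an addition, not from the original notes):

from sklearn.metrics import roc_auc_score

for model_name, model in models.items():
    pred = model.predict_proba(X_test)[:, 1]
    print(model_name, 'AUC :', roc_auc_score(y_test, pred))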