
Study Note (Regression)

KloudHyun 2023. 9. 28. 18:13

📌 Types of learning

🔻Supervised learning includes classification and regression

Source: Zerobase Data School

🔻Types of unsupervised learning

Source: Zerobase Data School

📌 Let's look at linear regression

Source: Zerobase Data School

🔻OLS

- Summary of the basic concepts of Ordinary Least Squares (OLS)

Source: Zerobase Data School
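
For a single feature, the least-squares line has a closed form: the slope is the sum of cross-deviations divided by the sum of squared deviations of x, and the intercept makes the line pass through the point of means. The sketch below is not from the course material. It uses plain NumPy on the same five-point toy data as the practice section, and the names `s_xy`, `s_xx`, `slope`, and `intercept` are illustrative.

```python
import numpy as np

# Toy data (same values as the practice example in this note)
x = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([1, 3, 4, 6, 5], dtype=float)

x_mean, y_mean = x.mean(), y.mean()

# Sum of cross-deviations and sum of squared deviations of x
s_xy = np.sum((x - x_mean) * (y - y_mean))
s_xx = np.sum((x - x_mean) ** 2)

# Closed-form OLS estimates for y = slope * x + intercept
slope = s_xy / s_xx
intercept = y_mean - slope * x_mean

print(f"slope={slope:.3f}, intercept={intercept:.3f}")
```

On this data the result should be a slope of about 1.1 and an intercept of about 0.5, which is what a library fit of `y ~ x` is expected to return.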

🔻OLS practice (easy to do with code!)

import pandas as pd

data = {'x':[1,2,3,4,5], 'y':[1,3,4,6,5]}
df = pd.DataFrame(data)

df

import statsmodels.formula.api as smf
lm_model = smf.ols(formula='y ~ x', data=df).fit() # y=ax+b <- ('y ~ x')
lm_model.params
>>>>
Intercept    0.5
x            1.1
dtype: float64
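
As a cross-check on the parameters above, a degree-1 `np.polyfit` on the same data should give the same line, and the coefficients can then be used to predict `y` for a new `x`. This is a sketch using NumPy only. The value `x_new = 6` is arbitrary and chosen just for illustration.

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([1, 3, 4, 6, 5], dtype=float)

# A degree-1 polynomial fit returns the coefficients as [slope, intercept]
slope, intercept = np.polyfit(x, y, deg=1)

# Predict y for a new x using y = slope * x + intercept
x_new = 6.0  # arbitrary illustrative value, not from the original note
y_new = slope * x_new + intercept

print(f"slope={slope:.3f}, intercept={intercept:.3f}, prediction={y_new:.3f}")
```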

🔻Let's draw the regression line with lmplot

import matplotlib.pyplot as plt
import seaborn as sns
# lmplot is a figure-level function that creates its own figure, so set the size via height/aspect
sns.lmplot(x='x', y='y', data=df, height=7, aspect=10/7)
plt.xlim([0, 5])
plt.show()

🔻Residual evaluation

- Residuals should follow a normal distribution with a mean of 0

- Evaluating the residuals means checking that their mean is 0 and that they are approximately normally distributed.

# Check each residual using the OLS model fitted earlier
resid = lm_model.resid
resid
>>>>
0   -0.6
1    0.3
2    0.2
3    1.1
4   -1.0
dtype: float64
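
The two conditions above can be checked numerically. The sketch below recomputes the residuals from the fitted line (intercept 0.5, slope 1.1) and applies the Shapiro-Wilk test from SciPy. Using `scipy.stats.shapiro` is an assumption of this sketch, not something the course material prescribes. With only five points the test has very little power, so the p-value is illustrative at best.

```python
import numpy as np
from scipy import stats

x = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([1, 3, 4, 6, 5], dtype=float)

# Residuals of the fitted line y_hat = 0.5 + 1.1 * x
resid = y - (0.5 + 1.1 * x)

# With an intercept in the model, OLS residuals average to zero up to rounding error
resid_mean = resid.mean()

# Shapiro-Wilk test: the null hypothesis is that the sample comes from a normal distribution
stat, p_value = stats.shapiro(resid)

print(f"mean of residuals: {resid_mean:.2e}")
print(f"Shapiro-Wilk statistic={stat:.3f}, p-value={p_value:.3f}")
```

A large p-value only means normality is not rejected. It does not prove the residuals are normal.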

🔻Coefficient of determination (R-Squared)

Source: Zerobase Data School

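import numpy as np
mu = np.mean(df['y'])  # mean of the observed y
y = df['y']
y_hat = lm_model.predict()  # fitted values
# R-squared = explained sum of squares / total sum of squares
np.sum((y_hat - mu)**2) / np.sum((y - mu)**2)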
>>>>
0.8175675675675673
lm_model.rsquared
>>>>
0.8175675675675673
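
Because this model includes an intercept, the total sum of squares splits into an explained part and a residual part (SST = SSR + SSE), so R-squared can also be written as 1 - SSE/SST. A minimal NumPy sketch, assuming the fitted line y_hat = 0.5 + 1.1x from above. The names `sst`, `ssr`, and `sse` are illustrative.

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([1, 3, 4, 6, 5], dtype=float)
y_hat = 0.5 + 1.1 * x  # fitted values from the OLS line

mu = y.mean()
sst = np.sum((y - mu) ** 2)      # total sum of squares
ssr = np.sum((y_hat - mu) ** 2)  # explained (regression) sum of squares
sse = np.sum((y - y_hat) ** 2)   # residual sum of squares

r2_explained = ssr / sst
r2_residual = 1 - sse / sst

print(r2_explained, r2_residual)
```

Both expressions should agree with `lm_model.rsquared` up to floating-point rounding.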

🔻Check the distribution of the residuals

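# distplot is deprecated in recent seaborn releases; histplot with kde=True draws a similar plot
sns.histplot(resid, kde=True, color='black')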