Study_note(zb_data)/Deep Learning

์Šคํ„ฐ๋”” ๋…ธํŠธ (tensorflow, LeNET)

KloudHyun 2023. 11. 1. 18:01

๐Ÿ“Œ Classify Wearing Mask

โžก๏ธ Data Source

https://www.kaggle.com/datasets/ashishjangra27/face-mask-12k-images-dataset

โžก๏ธ Find file path using  keyword ls

→ ls ๋ช…๋ น์–ด๋กœ ํ˜„์žฌ ๊ฒฝ๋กœ์— ์กด์žฌํ•˜๋Š” ํŒŒ์ผ์„ ์•Œ ์ˆ˜ ์žˆ๋‹ค.

ls
>>>
2023-11-01  ์˜คํ›„ 01:37    <DIR>          .
2023-10-30  ์˜คํ›„ 04:04    <DIR>          ..
2023-10-31  ์˜คํ›„ 09:59    <DIR>          .ipynb_checkpoints
2023-10-31  ์˜คํ›„ 09:57         3,172,365 1. Beginning of Deeplearning.ipynb
2023-10-31  ์˜คํ›„ 09:56         1,298,479 2. Deep Learning from scratch.ipynb
2023-11-01  ์˜คํ›„ 01:37           167,739 3. Dive to cnn.ipynb
2023-11-01  ์˜คํ›„ 01:59    <DIR>          data
2023-10-31  ์˜คํ›„ 02:57        37,918,144 MNIST_CNN_model.h5

โžก๏ธ Dealwith zipfiles

→ zipfile ๋ช…๋ น์–ด๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ ์•Œ์ง‘ ํŒŒ์ผ์˜ ์••์ถ•์„ ํ’€ ์ˆ˜ ์žˆ๋‹ค.

import zipfile

content_zip = zipfile.ZipFile("./data/archive.zip")
content_zip.extractall("./data")

content_zip.close()

โžก๏ธ Organizing Data files

→ listdir ๋ช…๋ น์–ด๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ, ํ•ด๋‹น ๊ฒฝ๋กœ์˜ ํด๋” ํŒŒ์ผ์„ ํ™•์ธํ•  ์ˆ˜ ์žˆ๋‹ค.

import os
import glob

path = "./data/Face Mask Dataset/"
os.listdir(path)
>>>>
['Test', 'Train', 'Validation']
path = "./data/Face Mask Dataset/"
os.listdir(path+"/"+'Train')
>>>>
['WithMask', 'WithoutMask']
path = "./data/Face Mask Dataset/"
dataset = {
	"image_path" : [], 
	"mask_status" : [], 
	"where" : []
}
for where in os.listdir(path):
    for status in os.listdir(path+"/"+where):
        for image in glob.glob(path+where+"/"+status+"/"+"*.png"):
            dataset["image_path"].append(image)
            dataset["mask_status"].append(status)
            dataset["where"].append(where)
import pandas as pd

dataset = pd.DataFrame(dataset) 
dataset.head()

๐Ÿ”ปConfirm is data ratio

→ DataFrame์˜ value_counts ๋ช…๋ น์–ด๋ฅผ ์‚ฌ์šฉํ•˜๊ณ  column์„ ๋„ฃ์–ด์ฃผ๋ฉด ํ•ด๋‹น ์ปฌ๋Ÿผ์˜ value_counts๋ฅผ ์•Œ ์ˆ˜ ์žˆ๋‹ค.

dataset.value_counts("mask_status")
>>>>
mask_status
WithoutMask    5909
WithMask       5883
print("With Mask:", dataset.value_counts("mask_status")[0])
print("Without Mask:", dataset.value_counts("mask_status")[1])

sns.countplot(x=dataset['mask_status'])
plt.show()
>>>>
With Mask: 5909
Without Mask: 5883

๐Ÿ”ปCheck the image randomly

import cv2
plt.figure(figsize=(15, 10))

for i in range(9):
	# dataset์—์„œ random์œผ๋กœ ์ด๋ฏธ์ง€ ๊ฐ€์ ธ์˜ค๊ธฐ
	random = np.random.randint(1, len(dataset))
    
	# subplot์„ ํ™œ์šฉํ•˜์—ฌ 3x3์˜ ํ˜•ํƒœ๋กœ ๊ทธ๋ฆฌ๊ธฐ
	plt.subplot(3, 3, i + 1)
    
	# cv2 ๋ชจ๋“ˆ์„ ์ด์šฉํ•˜์—ฌ image๋ฅผ ๋กœ๋“œ, random์œผ๋กœ ์ถ”์ถœํ•œ dataset์˜ index๋ฅผ ํ™œ์šฉ, location ๋ฉ”์„œ๋“œ๋ฅผ ํ™œ์šฉ
	plt.imshow(cv2.imread(dataset.loc[random, "image_path"]))
	
	# ๊ฐ image์— ๋Œ€ํ•œ title, ์—ฌ๊ธฐ์„œ๋Š” ๋งˆ์Šคํฌ์˜ ์ƒํƒœ๋ฅผ ํ‘œํ˜„
	plt.title(dataset.loc[random, "mask_status"], size=15)
    
	# x์ถ• ๊ฐ’, y์ถ• ๊ฐ’์„ ์—†์•ค๋‹ค (๋นˆ ๋ฆฌ์ŠคํŠธ๋ฅผ ํ™œ์šฉ)
	plt.xticks([])
	plt.yticks([])
    
plt.show()

๐Ÿ”ปArrange Data

→ train data๋งŒ ์‚ฌ์šฉํ•  ๊ฒƒ์ด๊ธฐ ๋•Œ๋ฌธ์—, train_df๋ฅผ ๋‹ค์‹œ ๋งŒ๋“ ๋‹ค.

train_df = dataset[dataset["where"]=="Train"]
test_df = dataset[dataset["where"]=="Test"]
valid_df = dataset[dataset["where"]=="Validation"]
train_df = train_df.reset_index().drop("index", axis=1)
train_df.head()

๐Ÿ”ปCheck Data ratio

→ ๊ฐ๊ฐ ๋ฐ์ดํ„ฐ ์…‹์„ ํ™•์ธ ํ•ด๋ณด๋‹ˆ, ๊ฑฐ์˜ ๋™์ผํ•œ ๋น„์œจ์„ ๋‚˜ํƒ€๋‚ด๊ณ  ์žˆ๋‹ค.

plt.figure(figsize=(15, 5))
plt.subplot(1, 3, 1)
sns.countplot(x = train_df['mask_status'])
plt.title("Training Dataset", size=10)

plt.figure(figsize=(15, 5))
plt.subplot(1, 3, 2)
sns.countplot(x = test_df['mask_status'])
plt.title("Test Dataset", size=10)

plt.figure(figsize=(15, 5))
plt.subplot(1, 3, 3)
sns.countplot(x = valid_df['mask_status'])
plt.title("Validation Dataset", size=10)

plt.show()

๐Ÿ”ปPreprocessing image data

→ image๋ฅผ grayscale๋กœ ์ฝ์€ ๋‹ค์Œ, resizing

→ ์ด ๋ฐ์ดํ„ฐ๋ฅผ ๋ฆฌ์ŠคํŠธ์— append

from tqdm.notebook import tqdm
data = []
image_size = 150

for i in range(len(train_df)):
    # image๋ฅผ grayscale๋กœ ์ฝ๋Š”๋‹ค
    img_array = cv2.imread(train_df['image_path'][i], cv2.IMREAD_GRAYSCALE)
    
    # image๋ฅผ resize
    new_image_array = cv2.resize(img_array, (image_size, image_size))
    
    if train_df["mask_status"][i] == 'WithMask":
        data.append([new_image_array, 1])
    else:
        data.append([new_image_array, 0])

 

data ์…‹ ์˜ˆ์‹œ, list์•ˆ์— resizingํ•œ ์ด๋ฏธ์ง€ ๋ฐ์ดํ„ฐ์™€, labelingํ•œ 1์ด ๋“ค์–ด๊ฐ€ ์žˆ๋‹ค.

โžก๏ธ The reason why dataset shuffle

→ ๋ฐ์ดํ„ฐ๊ฐ€ ์ˆœ์ฐจ์ ์œผ๋กœ ์ž…๋ ฅ์ด ๋˜์—ˆ๊ธฐ ๋•Œ๋ฌธ์—, ํ•œ ๋ฒˆ ์„ž๋Š” ๊ฒƒ์ด ์ข‹๋‹ค.

np.random.shuffle(data)

 

๐Ÿ”ปCheck resizing image

fig, ax = plt.subplots(2, 3, figsize=(10, 6))

for row in range(2):
    for col in range(3):
        image_index = row*100+col
        
        ax[row, col].axis("off")
        ax[row, col].imshow(data[image_index][0], cmap="grey")
        
        if data[image_index][1] == 0:
            ax[row, col].set_title("Without Mask")
        else:
            ax[row, col].set_title("With Mask")

๐Ÿ”ปData Setting (Arrange)

→ Pixel Data๋ฅผ X, Labeling Data๋ฅผ y์— append

X = [] ; y= []

for image in data:
    X.append(image[0])
    y.append(image[1])

X=np.array(X)
y=np.array(y)
from sklearn.model_selection import train_test_split
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state = 13)

๐Ÿ”ปModeling

from tensorflow.keras import Sequential, models
from tensorflow.keras.layers import Flatten, Dense, Conv2D, MaxPool2D

model = models.Sequential([
    # Conv2D, 32์ฑ„๋„, ์ปค๋„ ์‚ฌ์ด์ฆˆ๋Š” 5x5 (3x3๊ณผ ํ•จ๊ป˜ ๋งŽ์ด ์“ฐ์ž„)
    # strides ์˜ต์…˜ 1x1 (ํ•œ ์นธ์”ฉ ํ›‘๋Š”๋‹ค), padding
    layers.Conv2D(32, kernel_size=(5, 5), strides=(1, 1), padding="same", activation="relu", input_shape=(150, 150, 1)),
    
    # ๋ณดํ†ต MaxPooling size์™€ strides size๋ฅผ ๋™์ผํ•˜๊ฒŒ ๊ฐ€์ ธ๊ฐ€๋Š” ๊ฒฝ์šฐ๊ฐ€ ๋งŽ๋‹ค.
    layers.MaxPooling2D(pool_size = (2, 2), strides=(2, 2)),
    
    # MaxPooling์„ ๋งŒ๋‚˜์„œ ํฌ๊ธฐ๊ฐ€ ์ค„์—ˆ์œผ๋‹ˆ, ๋” ๋งŽ์€ ํŠน์„ฑ์„ ์ฐพ๊ธฐ ์œ„ํ•ด์„œ Channel์„ ๋Š˜๋ฆฐ๋‹ค.
    layers.Conv2D(64, (2, 2), activation="relu", padding="same"),
    
    # ๋™์ผ
    layers.MaxPooling2D(pool_size = (2, 2), strides=(2, 2)),
    
    # ๊ณผ์ ํ•ฉ ๋ฐฉ์ง€๋ฅผ ์œ„ํ•ด Dropout์„ 25% ๋น„์œจ๋กœ ์„ค์ •
    layers.Dropout(0.25),
    
    layers.Flatten(),
    layers.Dense(1000, activation='relu'),
    
    # ๋งˆ์ง€๋ง‰ ์ถœ๋ ฅ๋‹จ ์„ค์ •
    layers.Dense(1, activation="sigmoid")
])
model.compile(
    optimizer="adam", loss=tf.keras.losses.BinaryCrossentropy(),
    metrics =["accuracy"]
)

๐Ÿ”ปFit

# X_train = X_train.reshape(len(X_train), X_train.shape[1], X_train.shape[2], 1)
X_train = X_train.reshape(-1, 150, 150, 1)
X_val = X_val.reshape(-1, 150, 150, 1)

hist = model.fit(X_train, y_train, epochs = 4, batch_size = 32)

๐Ÿ”ปEvaluate

→ ์„ฑ๋Šฅ ํ‰๊ฐ€

model.evaluate(X_val, y_val)
>>>>
63/63 [==============================] - 3s 54ms/step - loss: 0.1177 - accuracy: 0.9740
[0.11772098392248154, 0.9739999771118164]

๐Ÿ”ปPredict

prediction = (model.predict(X_val) > 0.5).astype("int32")

print(classification_report(y_val, prediction))
print(confusion_matrix(y_val, prediction))
>>>>
63/63 [==============================] - 3s 47ms/step
              precision    recall  f1-score   support

           0       0.98      0.97      0.97       982
           1       0.97      0.98      0.97      1018

    accuracy                           0.97      2000
   macro avg       0.97      0.97      0.97      2000
weighted avg       0.97      0.97      0.97      2000

[[950  32]
 [ 20 998]]

๐Ÿ”ป Showing Wrong number Image

→ ์˜ˆ์ธก์ด ํ‹€๋ฆฐ ๋ฐ์ดํ„ฐ ํ™•์ธ

→ ์ œ๋Œ€๋กœ ์–ผ๊ตด ์•ˆ๋ฉด์ด ์•ˆ๋‚˜์˜จ ์‚ฌ์ง„๋„ ์žˆ๊ณ ... ์ด์ƒํ•œ(?) ๋งˆ์Šคํฌ๋ฅผ ์“ฐ๊ณ  ์žˆ๋Š” ์‚ฌ์ง„๋„ ์žˆ๋Š” ๋“ฏ ํ•˜๋‹ค.

wrong_result = []

for n in range(0, len(y_val)):

# ์˜ˆ์ธกํ•œ ๊ฐ’ ์ค‘์—์„œ, ์‹ค์ œ ๊ฐ’๊ณผ ๋‹ค๋ฅธ ๊ฒƒ์˜ ์ด๋ฏธ์ง€ ๋ฒˆํ˜ธ๋ฅผ ํ™•์ธ
    if prediction[n] != y_val[n]:
        wrong_result.append(n)
        
len(wrong_result)
>>>>
52
import random

samples = random.choices(population = wrong_result, k=6)

plt.figure(figsize=(14, 12))
for idx, n in enumerate(sample):
    plt.subplot(2, 3, idx + 1)
    plt.imshow(X_val[n].reshape(150, 150), interpolation='nearest')
    plt.title(prediction[n])
    plt.axis("off")
plt.show()