Study Note (Understanding Selenium)

📌 Understanding Selenium

Source: Zerobase Data School

🚩 selenium links

https://www.selenium.dev/documentation/

 

Official documentation of the Selenium Browser Automation Project.

 

https://selenium-python.readthedocs.io/index.html

 

Selenium with Python: unofficial documentation of the Selenium Python bindings.


🚩 selenium import

!pip install selenium # install selenium (notebook syntax)

from selenium import webdriver # import the Selenium WebDriver module

driver = webdriver.Chrome() # choose the browser to drive (Chrome)
driver.get('https://www.naver.com') # open the URL

🚩 selenium commands

  • quit()
  • maximize_window(), minimize_window(), set_window_size(width, height)
  • refresh(), back(), forward()
driver.quit() # close the browser and end the session (call this last)
# maximize the window
driver.maximize_window()

# minimize the window
driver.minimize_window()

# set the window size (width, height)
driver.set_window_size(600, 600)
# reload the current page
driver.refresh()

# go back
driver.back()

# go forward
driver.forward()
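
Because quit() ends the whole session, a browser process can be left running if an earlier command raises an error first. The sketch below shows one possible way to guard against that. The names browser_session and visit are hypothetical helpers written for this note, not part of Selenium, and make_driver is assumed to be any callable that returns a driver, such as webdriver.Chrome.

```python
from contextlib import contextmanager


@contextmanager
def browser_session(make_driver):
    """Yield a driver and always call quit() on it afterwards.

    make_driver is any zero-argument callable that returns a WebDriver,
    for example webdriver.Chrome (hypothetical helper, not a Selenium API).
    """
    driver = make_driver()
    try:
        yield driver
    finally:
        # quit() closes every window and ends the session,
        # even if the body of the with-block raised an exception.
        driver.quit()


def visit(driver, url, width=600, height=600):
    """Open url in a window of the given size and return the page title."""
    driver.set_window_size(width, height)
    driver.get(url)
    return driver.title


# Possible usage, assuming Selenium and a Chrome driver are installed:
#
#   from selenium import webdriver
#   with browser_session(webdriver.Chrome) as driver:
#       print(visit(driver, 'https://www.naver.com'))
```

Passing the factory instead of a ready-made driver keeps the creation and the cleanup in one place.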

🚩 selenium: find_element commands

  • By.CSS_SELECTOR
    • You can copy the selector of the element you want from the browser's developer tools.
  • click()
# click an element
from selenium.webdriver.common.by import By

first_content = driver.find_element(By.CSS_SELECTOR, '#content > div.cover-masonry > div > ul > li:nth-child(2)')
first_content.click()

-

  • switch_to.window(driver.window_handles[2])
    • Switches to the tab at index 2 of window_handles (indexing starts at 0, so this is the third handle in the list).
  • close()
    • Closes only the tab that currently has focus (quit() ends the whole browser session).
# open a new tab (by running JavaScript)
driver.execute_script('window.open("http://www.naver.com")')

# switch tabs
driver.switch_to.window(driver.window_handles[2])

# unlike quit(), close() closes only the tab that currently has focus
driver.close()
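
A hard-coded index such as window_handles[2] only works when exactly that many tabs are open. As a rough sketch, the new tab can instead be found by comparing the handle list before and after window.open(). The helper names open_in_new_tab and close_tab_and_return are made up for this note, and the driver is assumed to be an ordinary Selenium WebDriver.

```python
def open_in_new_tab(driver, url):
    """Open url in a new tab, switch to it, and return its window handle."""
    before = set(driver.window_handles)
    # window.open() is JavaScript; arguments[0] receives the url passed below.
    driver.execute_script('window.open(arguments[0]);', url)
    new_handles = [h for h in driver.window_handles if h not in before]
    if not new_handles:
        raise RuntimeError('no new tab was opened')
    driver.switch_to.window(new_handles[0])
    return new_handles[0]


def close_tab_and_return(driver, return_to):
    """Close the focused tab, then move focus back to the handle return_to."""
    driver.close()
    # After close() the driver has no focused window until we switch again.
    driver.switch_to.window(return_to)


# Possible usage:
#
#   original = driver.current_window_handle
#   open_in_new_tab(driver, 'http://www.naver.com')
#   close_tab_and_return(driver, original)
```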

🚩 selenium: scroll

  • Find the maximum scrollable height
# scrollable height of the page (maximum length)
# executed as JavaScript code
driver.execute_script('return document.body.scrollHeight')

# scroll to the bottom of the page
driver.execute_script('window.scrollTo(0, document.body.scrollHeight);')

# scroll back to the top of the page
driver.execute_script('window.scrollTo(0,0);')
  • Save a screenshot of the visible screen
# save a screenshot of the currently visible screen
driver.save_screenshot('./last_height.png')
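
On pages that load more content as you scroll, one jump to scrollHeight may not reach the real bottom. The sketch below repeats the two scripts above until the height stops growing. scroll_to_bottom is a hypothetical helper, and the pause and max_rounds defaults are assumptions, not values from the course.

```python
import time


def scroll_to_bottom(driver, pause=1.0, max_rounds=20):
    """Scroll down until document.body.scrollHeight stops growing.

    Returns the final scroll height. pause and max_rounds are
    illustrative defaults, not values taken from the notes above.
    """
    last_height = driver.execute_script('return document.body.scrollHeight')
    for _ in range(max_rounds):
        driver.execute_script('window.scrollTo(0, document.body.scrollHeight);')
        time.sleep(pause)  # give lazily loaded content time to arrive
        new_height = driver.execute_script('return document.body.scrollHeight')
        if new_height == last_height:
            break  # nothing new was loaded, so this is the real bottom
        last_height = new_height
    return last_height
```

The max_rounds cap keeps the loop from running forever on a feed that never ends.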

🚩 ActionChains

  • Scroll to a specific tag
# scroll to a specific tag
from selenium.webdriver import ActionChains

some_tag = driver.find_element(By.CSS_SELECTOR, '#content > div:nth-child(2) > div > ul > li:nth-child(6)')
action = ActionChains(driver) # bind the action chain to the driver we are using
action.move_to_element(some_tag).perform()
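
move_to_element() moves the mouse pointer to the element, and the scroll happens along the way. If only the scroll is wanted, a different technique is the JavaScript scrollIntoView() call, run through execute_script as in the scroll section. This is a minimal sketch of that alternative, not the method from the course, and scroll_into_view is a made-up helper name.

```python
def scroll_into_view(driver, element):
    """Scroll the page so that element sits in the middle of the viewport."""
    # arguments[0] is the WebElement passed as the second argument; Selenium
    # hands it to the page script as the matching DOM node.
    driver.execute_script(
        'arguments[0].scrollIntoView({block: "center"});', element
    )
    return element


# Possible usage with the element located above:
#
#   scroll_into_view(driver, some_tag)
```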

🚩 By.CSS_SELECTOR

from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
driver.get('https://www.naver.com')
keyword = driver.find_element(By.CSS_SELECTOR, '#query') # locate the Naver search box by CSS selector
keyword.clear() # clear the search box first
keyword.send_keys('파이썬') # type the search term ('파이썬' is Korean for "Python")

search_btn = driver.find_element(By.CSS_SELECTOR, '#search-btn') # locate the search button by CSS selector
search_btn.click() # click it
  • XPATH (cannot be used with BeautifulSoup)
'//' : search at any depth (at the start of a path: anywhere in the document)
'/' : direct child tag
'*' : any tag (wildcard)

//*[@id="main_pack"]/section[3]/div/div[2]/panel-list/div/ul/li[3]/div/div/a
You can copy this path from the browser's developer tools.
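
To see how '//', '/', '*', and the [n] index combine, the notation can be tried without a browser. The sketch below uses Python's standard xml.etree.ElementTree, which supports only a small subset of XPath and needs a leading '.', on a made-up page. With Selenium the same kind of path would be passed to driver.find_element(By.XPATH, ...).

```python
import xml.etree.ElementTree as ET

# A tiny, made-up page; real pages are rarely well-formed XML.
PAGE = """
<html>
  <body>
    <div id="main_pack">
      <section><a href="/one">first</a></section>
      <section><a href="/two">second</a></section>
    </div>
  </body>
</html>
"""

root = ET.fromstring(PAGE)

# './/*[@id="main_pack"]': any tag, at any depth, whose id is main_pack
pack = root.find('.//*[@id="main_pack"]')

# '/section[2]/a': the second <section> child (XPath counts from 1),
# then its direct <a> child
link = root.find('.//*[@id="main_pack"]/section[2]/a')

print(pack.tag)                     # div
print(link.text, link.get('href'))  # second /two
```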

🚩 By.CSS_SELECTOR (dynamic page)

  • On a dynamic page, ActionChains can be used.

# 1. select the magnifier (search) button
from selenium.webdriver import ActionChains

search_tag = driver.find_element(By.CSS_SELECTOR, '.search')
action = ActionChains(driver)
action.click(search_tag)
action.perform()


# 2. type the search term
driver.find_element(By.CSS_SELECTOR, '#header > div.search > input[type=text]').send_keys('파이썬')

# 3. click the search button
driver.find_element(By.CSS_SELECTOR, '#header > div.search.on > button').click()
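
On a dynamic page the input and the button may not exist, or may not be clickable, at the moment find_element runs. One common way to handle this is Selenium's explicit wait (WebDriverWait with expected_conditions). The sketch below reuses the three selectors from the steps above. The function name search_on_dynamic_page and the 10-second timeout are assumptions made for illustration.

```python
def search_on_dynamic_page(driver, keyword, timeout=10):
    """Click the magnifier, wait for the input to appear, then search.

    Selectors are copied from the snippet above; timeout is an assumed value.
    """
    # Imported inside the function so the helper can be defined
    # even on a machine where Selenium is not installed.
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    wait = WebDriverWait(driver, timeout)

    # 1. wait until the magnifier button can be clicked, then click it
    wait.until(EC.element_to_be_clickable((By.CSS_SELECTOR, '.search'))).click()

    # 2. wait until the text input is visible, then type the keyword
    box = wait.until(EC.visibility_of_element_located(
        (By.CSS_SELECTOR, '#header > div.search > input[type=text]')))
    box.send_keys(keyword)

    # 3. this selector only matches once div.search also has the class "on"
    wait.until(EC.element_to_be_clickable(
        (By.CSS_SELECTOR, '#header > div.search.on > button'))).click()
```

If a condition is not met within the timeout, wait.until() raises a TimeoutException.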