Posted 2022-06-01 | Updated 2022-06-13 | minkuen | chatbot / kakao_chatbot | 3 minutes read (About 442 words)

Crawling data for chatbot

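Crawling - top 5 popular stocks

Goal: collect the data the chatbot needs.
The crawling code is written first and will later be converted into a skill.

Preparation

URL to crawl: https://finance.naver.com/
Right-click the information to extract → Inspect
Right-click the highlighted HTML → Copy → Copy selector

Crawling code

Write the code using the URL and the copied selector.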

import requests
from bs4 import BeautifulSoup
import pandas as pd

url = 'https://finance.naver.com/'

# Download the page and stop early on an HTTP error
response = requests.get(url)
response.raise_for_status()
html = response.text
soup = BeautifulSoup(html, 'html.parser')

# Table body of the popular-stocks box (selector copied from the browser)
tbody = soup.select_one('#container > div.aside > div > div.aside_area.aside_popular > table > tbody')
trs = tbody.select('tr')
datas = []
for tr in trs:
    name = tr.select_one('th > a').get_text()
    current_price = tr.select_one('td').get_text()
    change_direction = tr['class'][0]  # first CSS class of the row, e.g. "up"
    change_price = tr.select_one('td > span').get_text()
    datas.append([name, current_price, change_direction, change_price])

# Columns: stock name, current price, direction, change from the previous day
# The index is sized from the data so a missing row does not raise an error
df = pd.DataFrame(datas, columns=['종목명', '현재가', '등락', '전일대비'], index=range(1, len(datas) + 1))
df = str(df)  # plain text, for later use in the chatbot
print(df)

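Result

The top 5 popular stocks are printed as a table with the columns 종목명 (stock name), 현재가 (current price), 등락 (direction), and 전일대비 (change from the previous day).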
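Since the crawled table ends up as plain text, the same message can also be built without pandas. The following is only a minimal sketch: `format_rows` is a hypothetical helper, not part of the original code, and the sample rows are made-up values standing in for the crawled `[name, current_price, change_direction, change_price]` lists.

```python
def format_rows(rows):
    """Turn [name, price, direction, change] rows into numbered text lines."""
    lines = []
    for rank, (name, price, direction, change) in enumerate(rows, start=1):
        # strip() removes surrounding whitespace that get_text() can keep
        lines.append(f"{rank}. {name.strip()} {price.strip()} {direction} {change.strip()}")
    return "\n".join(lines)


# Made-up sample rows, only for illustration (not real market data)
sample_rows = [
    ["Stock A", " 10,000 ", "up", "150"],
    ["Stock B", "5,500", "down", " 20"],
]
message = format_rows(sample_rows)
print(message)
```

A plain string like this may be easier to place into a chatbot response than the fixed-width layout of a printed DataFrame, though which one reads better depends on the chat client.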

Crawling code -2-

A small change to the previous code: the direction is shown as ▲ or ▼ instead of the raw class name.

import requests
from bs4 import BeautifulSoup
import pandas as pd

url = 'https://finance.naver.com/'

response = requests.get(url)
response.raise_for_status()
html = response.text
soup = BeautifulSoup(html, 'html.parser')
tbody = soup.select_one('#container > div.aside > div > div.aside_area.aside_popular > table > tbody')

trs = tbody.select('tr')
datas = []
for tr in trs:
    name = tr.select_one('a').get_text()
    current_price = tr.select_one('td').get_text()
    # Store the symbol as a plain string (a list would print as ['▲'])
    if tr['class'][0] == "up":
        change_direction = "▲"
    else:
        change_direction = "▼"
    change_price = tr.select_one('span').get_text()
    datas.append([name, current_price, change_direction, change_price])

# Columns: stock name, current price, direction, change from the previous day
df = pd.DataFrame(datas, columns=['종목명', '현재가', '등락', '전일대비'], index=range(1, len(datas) + 1))
df = str(df)
print(df)

Result

The table is printed again, now with ▲ or ▼ in the 등락 (direction) column.
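The loop above marks every row whose first class is not "up" as ▼, which would also cover a stock whose price did not change. One possible refinement is a lookup table with a neutral fallback. This is a sketch under assumptions: only the "up" class appears in the code above, while the "down" key and the "-" fallback are guesses about the page, and `direction_symbol` is a hypothetical helper.

```python
# "up" comes from the crawling code; "down" is an assumed class name
DIRECTION_SYMBOLS = {"up": "▲", "down": "▼"}


def direction_symbol(css_classes, default="-"):
    """Return the symbol for the first known class, or a neutral default."""
    for css_class in css_classes:
        if css_class in DIRECTION_SYMBOLS:
            return DIRECTION_SYMBOLS[css_class]
    return default


print(direction_symbol(["up"]))
print(direction_symbol(["down"]))
print(direction_symbol(["something_else"]))
```

Inside the loop this would be called as `direction_symbol(tr['class'])`, since BeautifulSoup returns the class attribute as a list of strings.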


Posted 2022-05-30 | Updated 2022-06-12 | minkuen | chatbot / kakao_chatbot | 5 minutes read (About 689 words)

Crawling start for chatbot data

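Crawling basics

The crawled data will be delivered to the chatbot through a Kakao skill.
Before that, this post practices the basics of crawling.

Fetching site data - how to use requests

1. Install the requests module

Run this in VS Code, after activating the virtual environment.

pip install requests

2. Requesting a URL - get

status_code holds the HTTP response code.
text holds the HTML source of the page.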

import requests

response = requests.get('https://www.naver.com/')

print(response.status_code)
print(response.text)

status_code prints 200 when the request succeeds.
text prints the HTML source of the page.
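To make the response code easier to read, the number can be looked up in the standard library's `http.HTTPStatus` enum. This is a small sketch that runs without any network access; `describe_status` and `is_success` are hypothetical helper names, not part of requests.

```python
from http import HTTPStatus


def describe_status(code):
    """Return the code with its standard reason phrase, e.g. '200 OK'."""
    try:
        status = HTTPStatus(code)
    except ValueError:
        # Codes that are not in the enum have no known phrase
        return f"{code} (unknown status)"
    return f"{code} {status.phrase}"


def is_success(code):
    """2xx codes mean the request succeeded."""
    return 200 <= code < 300


print(describe_status(200))
print(describe_status(404))
print(is_success(200), is_success(404))
```

With requests, the value to pass in would be `response.status_code`.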

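Extracting site data - how to use BeautifulSoup

0. Why BeautifulSoup is needed

The data returned by response.text is HTML in plain-text form.
Picking a specific HTML tag out of plain text is tedious to do by hand.
BeautifulSoup is the tool that makes this easy.
It parses the HTML into a soup object, from which elements are easy to extract.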
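To illustrate the bookkeeping that BeautifulSoup hides, here is a rough sketch that collects the text of every `<p>` tag using only the standard library's `html.parser`. The class name `ParagraphCollector` and the sample HTML string are made up for this example.

```python
from html.parser import HTMLParser


class ParagraphCollector(HTMLParser):
    """Collect the text inside every <p> element."""

    def __init__(self):
        super().__init__()
        self.in_p = False      # are we currently inside a <p>?
        self.paragraphs = []   # text of each <p> seen so far

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self.in_p = True
            self.paragraphs.append("")

    def handle_endtag(self, tag):
        if tag == "p":
            self.in_p = False

    def handle_data(self, data):
        if self.in_p:
            self.paragraphs[-1] += data


html = "<div><p>first</p><span>skip</span><p>second</p></div>"
collector = ParagraphCollector()
collector.feed(html)
print(collector.paragraphs)
```

With BeautifulSoup, roughly the same result comes from `[p.get_text() for p in soup.find_all("p")]`, without tracking the parser state by hand.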

1. Install beautifulsoup

pip install beautifulsoup4

2. How to use beautifulsoup

Use the URL of the site to extract information from.
When the response code is 200, take the HTML and convert it into a soup object.

import requests
from bs4 import BeautifulSoup

url = 'https://kin.naver.com/search/list.nhn?query=%ED%8C%8C%EC%9D%B4%EC%8D%AC'

response = requests.get(url)

if response.status_code == 200:
    html = response.text 
    soup = BeautifulSoup(html, 'html.parser')
    print(soup)

else :
    print(response.status_code)
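The `query` value in the URL above is the percent-encoded form of the Korean search term 파이썬 ("Python"). The sketch below shows how that string can be produced with the standard library; it only builds the URL and sends nothing.

```python
from urllib.parse import quote, unquote, urlencode

base_url = "https://kin.naver.com/search/list.nhn"
keyword = "파이썬"  # Korean for "Python"

# Percent-encode the UTF-8 bytes of the keyword
encoded = quote(keyword)

# urlencode builds the whole "key=value" query string
url = f"{base_url}?{urlencode({'query': keyword})}"

print(encoded)
print(url)
print(unquote(encoded))  # decodes back to the original keyword
```

requests can also do this encoding itself when the term is passed through the `params` argument, as in `requests.get(base_url, params={'query': keyword})`.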

Web crawling example

Right-click the information to extract → Inspect
- The HTML of the selected element is highlighted in the developer tools.

Right-click that HTML → Copy → Copy selector

Paste the copied selector into the code.
- Example of a copied selector: #s_content > div.section > ul > li:nth-child(1) > dl > dt > a

import requests
from bs4 import BeautifulSoup

url = 'https://kin.naver.com/search/list.nhn?query=%ED%8C%8C%EC%9D%B4%EC%8D%AC'

response = requests.get(url)

if response.status_code == 200:
    html = response.text 
    soup = BeautifulSoup(html, 'html.parser')
    title = soup.select_one('#s_content > div.section > ul > li:nth-child(1) > dl > dt > a')
    print(title)

else :
    print(response.status_code)

The element selected at the start is printed, HTML tags included.
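A copied selector is easier to read when it is split into its steps. In CSS, `>` means "direct child of", `#` marks an id, `.` marks a class, and `:nth-child(1)` picks the first child. The following sketch just splits the example selector string and prints one step per line; it does not query any page.

```python
selector = "#s_content > div.section > ul > li:nth-child(1) > dl > dt > a"

# Each ">" separates a parent from its direct child
steps = [step.strip() for step in selector.split(">")]

for depth, step in enumerate(steps):
    print("  " * depth + step)
```

Read from top to bottom, the output is the path from the element with id `s_content` down to the link that holds the title.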

Printing only the text

To get only the text, use the get_text() method.

import requests
from bs4 import BeautifulSoup

url = 'https://kin.naver.com/search/list.nhn?query=%ED%8C%8C%EC%9D%B4%EC%8D%AC'

response = requests.get(url)

if response.status_code == 200:
    html = response.text 
    soup = BeautifulSoup(html, 'html.parser')
    title = soup.select_one('#s_content > div.section > ul > li:nth-child(1) > dl > dt > a')
    print(title.get_text())  # use get_text() to print only the text

else :
    print(response.status_code)

Result

Only the text of the selected element is printed, without the HTML tags.

Reference
- https://wikidocs.net/85739
- https://github.com/AHNDUHONG/Kakaotalk_Chatbot_Finance/blob/main/bash/top5_search.py

Posted 2022-05-03 | Updated 2022-05-06 | minkuen | setting / crawling | 3 minutes read (About 509 words)

Crawling_practice

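This post is a first attempt at web crawling.
The first step is to create a virtual environment in PyCharm.

Create a crawling folder on the desktop

→ Right-click it and open it with PyCharm

→ File → Settings

→ Project: crawling → Python Interpreter

→ Gear icon → Add

Install the required packages.

→ Git Bash terminal

→ pip install beautifulsoup4

→ pip install numpy pandas matplotlib seaborn

→ pip install requests

Run a search in the browser

→ Search: 확진자수 (number of confirmed cases)

→ Right-click → Inspect

In the developer tools, the code that contains the wanted information can be selected.

Create index.html.

Right-click the crawling folder → New → HTML File

Enter the following and open the index file.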

<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <title>test</title>
</head>
<body>
    <h1>aaaaaaaa</h1>
    <h2>dddd</h2>
    <div class="chapter01">
        <p>Don't Crawl here </p>
    </div>
    <div class="chapter02">
        <p>Just Crawling here</p>
    </div>
</body>
</html>

Opening the index file displays the page as written in index.html.

Now write the code in a separate file.
- For now, write it in main.py.

from bs4 import BeautifulSoup

# Step 1: convert index.html into a BeautifulSoup object,
# so that the object's methods (find, find_all) can be used

# Parse the HTML file; the with block closes the file afterwards
with open("index.html", encoding='UTF-8') as f:
    soup = BeautifulSoup(f, "html.parser")
# print(type(soup))

# print(soup.find("div", class_="chapter02"))
# print(soup.find("p"))
results = soup.find_all("p")
print(results[1])  # the second <p> element

Run the script to crawl index.html:

→ python main.py

This prints the second <p> element: <p>Just Crawling here</p>

Tip

Reformat code: Ctrl + Alt + L

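Python

Tools for collecting data from the web

BeautifulSoup: the most common collection tool (selects data via CSS)
Scrapy: selects data via CSS and XPath (JavaScript rendering needs an add-on)
Selenium: selects data via CSS and XPath, and runs JavaScript in a real browser

→ Selenium needs a browser and a matching WebDriver, plus other setup (Java is only needed for the standalone Selenium server)

Three building blocks of a website, plus one

HTML, CSS, JavaScript, and Ajax (asynchronous processing)

How a website receives requests

GET / POST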
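
As a rough illustration of the difference, the sketch below builds one GET and one POST request with the standard library's `urllib.request.Request`, without sending either. The host example.com is a placeholder, and the `query` parameter is made up for this example.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Placeholder parameters; nothing is sent over the network here
params = urlencode({"query": "python"})

# GET: the parameters travel in the URL itself
get_request = Request(f"https://example.com/search?{params}")

# POST: the parameters travel in the request body as bytes
post_request = Request("https://example.com/search", data=params.encode("utf-8"))

print(get_request.get_method(), get_request.full_url)
print(post_request.get_method(), post_request.data)
```

The crawling code in these posts only uses GET, which is what `requests.get` sends.
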
Reference
- Beautiful Soup Documentation — Beautiful Soup 4.9.0 documentation (crummy.com)

Crawling practice

Posted 2022-05-02 | Updated 2022-05-05 | minkuen | setting / crawling | 3 minutes read (About 509 words)

Crawling_setting

