Scraping/Crawling Websites with Python Scrapy


C/H 2018. 6. 28. 08:30

Installing Scrapy and Creating a Project

# install with pip
pip install scrapy
...
Building wheels for collected packages: Twisted, PyDispatcher, zope.interface, pycparser

# create a new project
scrapy startproject projectname

If project creation fails on Ubuntu after installing with pip

# install from the Ubuntu package repository
# (on newer Ubuntu releases the package is named python3-scrapy)
sudo apt install python-scrapy
scrapy startproject projectname

Project Structure

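projectname/scrapy.cfg: project configuration file; it points to the settings module and holds deploy settings
projectname/projectname/items.py: defines the structure in which scraped results are stored
projectname/projectname/pipelines.py: post-processes scraped data to fit the items and defines what happens to the processed data
projectname/projectname/settings.py: project settings file
projectname/projectname/spiders/: the spider files that do the actual crawling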
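As an illustration of what goes into pipelines.py, here is a minimal sketch of an item pipeline. The class name `TitleCleanupPipeline`, the output file `titles.jl`, and the `title` field are assumptions made for this example; only the `open_spider`, `close_spider`, and `process_item` hooks are Scrapy's own.

```python
# pipelines.py (sketch): tidy scraped titles and append them to a JSON Lines file.
import json


class TitleCleanupPipeline:
    """Hypothetical pipeline: trims titles and writes each item as one JSON line."""

    def __init__(self, path="titles.jl"):
        self.path = path  # output file name is an assumption for this sketch
        self.file = None

    def open_spider(self, spider):
        # Called once by Scrapy when the spider starts.
        self.file = open(self.path, "w", encoding="utf-8")

    def close_spider(self, spider):
        # Called once by Scrapy when the spider finishes.
        if self.file is not None:
            self.file.close()

    def process_item(self, item, spider):
        # Collapse runs of whitespace; a missing title becomes an empty string.
        item["title"] = " ".join((item.get("title") or "").split())
        if self.file is not None:
            self.file.write(json.dumps(dict(item), ensure_ascii=False) + "\n")
        return item  # returning the item hands it to the next pipeline
```

To switch it on, settings.py would need an entry such as `ITEM_PIPELINES = {"projectname.pipelines.TitleCleanupPipeline": 300}`, where the number sets the order in which pipelines run.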

Example

# myspider.py
import scrapy


class BlogSpider(scrapy.Spider):
    name = 'blogspider'
    start_urls = ['https://blog.scrapinghub.com']

    def parse(self, response):
        # yield one item per post title on the current page
        for title in response.css('h2.entry-title'):
            yield {'title': title.css('a ::text').extract_first()}

        # follow the link to the older posts and parse that page the same way
        for next_page in response.css('div.prev-post > a'):
            yield response.follow(next_page, self.parse)

Running

scrapy runspider myspider.py
...
2018-06-26 10:14:50 [scrapy.core.scraper] DEBUG: Scraped from <200 https://blog.scrapinghub.com/page/5/>
{'title': u'How Web Scraping is Revealing Lobbying and Corruption in Peru'}
...
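To keep the scraped items instead of only seeing them in the log, the run can be given an output file, for example `scrapy runspider myspider.py -o titles.jl`, which writes one JSON object per line. The following is a minimal sketch for reading such a file back, assuming that file name and a `title` key as in the spider above; `load_titles` is a hypothetical helper and the sample data is made up.

```python
import io
import json


def load_titles(lines):
    """Return the non-empty 'title' values from an iterable of JSON Lines."""
    titles = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        item = json.loads(line)
        title = item.get("title")
        if title:
            titles.append(title)
    return titles


# Stand-in for open("titles.jl", encoding="utf-8"); the sample data is made up.
sample = io.StringIO(
    '{"title": "First post"}\n{"title": null}\n{"title": "Second post"}\n'
)
titles = load_titles(sample)
print(titles)
```

Items whose title could not be extracted come out as `null` in the file, so the helper skips them.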
