Crawl a website with Scrapy
Introduction

In this article, we are going to see how to scrape information from a website, in particular from all pages that share a common URL pattern. We will do this with Scrapy, a very powerful, yet simple, scraping and web-crawling framework. For example, you might be interested in scraping information about each article of a blog and storing that information in a database. To achieve this, we will implement a simple spider with Scrapy that crawls the blog and stores the extracted data in a MongoDB database.

We will assume that you have a working MongoDB server and that you have installed the pymongo and scrapy Python packages, both installable with pip. If you have never toyed around with Scrapy, you should first read this short tutorial.

First step, identify the URL pattern(s)

In……
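As a rough preview of what we will build, here is a minimal sketch of a CrawlSpider that follows only links matching a URL pattern and stores each scraped article in MongoDB through a small item pipeline. The blog URL, the URL pattern, and the CSS selectors are hypothetical placeholders; adapt them to the site you are actually crawling.

import scrapy
from scrapy.crawler import CrawlerProcess
from scrapy.linkextractors import LinkExtractor
from scrapy.spiders import CrawlSpider, Rule
from pymongo import MongoClient


class BlogSpider(CrawlSpider):
    name = "blog"
    # Hypothetical start page; replace with the blog you want to crawl.
    start_urls = ["https://example-blog.com/"]
    rules = (
        # Follow only links whose URLs match the article pattern,
        # e.g. https://example-blog.com/2024/05/some-post/
        Rule(LinkExtractor(allow=r"/\d{4}/\d{2}/[\w-]+/$"),
             callback="parse_article", follow=True),
    )

    def parse_article(self, response):
        # Extract a few fields; the selectors depend on the site's markup.
        yield {
            "url": response.url,
            "title": response.css("h1::text").get(),
            "body": " ".join(response.css("article p::text").getall()),
        }


class MongoPipeline:
    """Item pipeline that writes every scraped item to MongoDB."""

    def open_spider(self, spider):
        # Assumes a MongoDB server running locally on the default port.
        self.client = MongoClient("mongodb://localhost:27017")
        self.collection = self.client["blog"]["articles"]

    def close_spider(self, spider):
        self.client.close()

    def process_item(self, item, spider):
        self.collection.insert_one(dict(item))
        return item


if __name__ == "__main__":
    # Register the pipeline and run the spider from a plain script.
    process = CrawlerProcess(settings={
        "ITEM_PIPELINES": {"__main__.MongoPipeline": 300},
    })
    process.crawl(BlogSpider)
    process.start()

Using a pipeline keeps the database logic out of the spider itself, so the same spider can later write to a different store by swapping the pipeline in the settings.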