Most blog posts about web scraping explain how to crawl a specific website or set of pages, but I believe it makes more sense to help people learn how to analyze a website and choose the right approach to get the job done well.
That is why I built ScrapingClub. My goal is to break down a complex web scraping mission, such as crawling a bunch of websites, into small tasks so people can learn to solve them step by step. What is more, if they have trouble with an exercise, they can ask for help with specific details instead of "I have trouble crawling the website, please help!".
Anyone who wants to learn web scraping, test their web scraping skills, or just do it for fun might find this project useful.
You will see many product detail pages and list pages on ScrapingClub. For example, two product detail pages might look the same but deliver their data in different ways. You should figure out how each page works and write a spider to extract the data.
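To make that idea concrete, here is a minimal sketch that assumes two hypothetical pages rendering the same product: one puts the fields in the HTML markup, the other ships them as a JSON string inside a script tag. The markup, class names, and the `product` variable are invented for illustration and are not taken from ScrapingClub. Only the Python standard library is used.

```python
import json
import re
from html.parser import HTMLParser

# Hypothetical page A: the fields live directly in the markup.
PAGE_A = """
<div class="product">
  <h3 class="title">Blue Dress</h3>
  <span class="price">$24.99</span>
</div>
"""

# Hypothetical page B: same visible result, but the data is a JSON string.
PAGE_B = """
<div class="product"></div>
<script>var product = {"title": "Blue Dress", "price": "$24.99"};</script>
"""


class ClassTextParser(HTMLParser):
    """Collect the text of elements whose class attribute is in `wanted`."""

    def __init__(self, wanted):
        super().__init__()
        self.wanted = set(wanted)
        self.current = None  # class of the element we are currently inside
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        cls = dict(attrs).get("class")
        if cls in self.wanted:
            self.current = cls

    def handle_data(self, data):
        # Store the first non-blank text that follows a wanted start tag.
        if self.current and data.strip():
            self.fields[self.current] = data.strip()
            self.current = None


def extract_from_markup(html):
    """Page A style: read the fields out of the HTML elements."""
    parser = ClassTextParser(["title", "price"])
    parser.feed(html)
    return parser.fields


def extract_from_json(html):
    """Page B style: find the embedded JSON string and load it."""
    match = re.search(r"var product = (\{.*?\});", html, re.S)
    return json.loads(match.group(1))


print(extract_from_markup(PAGE_A))
print(extract_from_json(PAGE_B))
```

Both functions return the same dictionary, which is the point: the pages look identical in a browser, yet each needs a different extraction strategy. In a real spider you would more likely use XPath or CSS selectors from a scraping library instead of this hand-written parser.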
A short description at the top of each exercise page helps you understand what needs to be done, and the tips and links can help you learn web scraping better.
Later I will create a project on GitHub to host the solution code, and I will also write more detailed articles. Please feel free to send me a message to let me know your thoughts.
You can subscribe to stay updated.
Web scraping using XPath or CSS expressions
Load JSON string and extract data
Crawl products and also handle pagination
Inspect Ajax requests and mimic them
Learn to inspect the fields of an HTTP request
Learn to scrape infinite scrolling pages
Make your spider work with cookies
Scrape data behind a login form
Learn to scrape data behind a captcha
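
As one example of how such an exercise might be approached, the pagination task can be sketched as a loop that extracts the items on a page and then follows the "next" link until there is none. The in-memory `FAKE_SITE`, its URLs, and the `item` and `next` class names are hypothetical stand-ins so the sketch runs without a network. They do not describe the real ScrapingClub pages.

```python
import re

# Hypothetical site held in memory so the sketch runs without a network.
FAKE_SITE = {
    "/list/?page=1": (
        '<a class="item">A</a><a class="item">B</a>'
        '<a class="next" href="/list/?page=2">Next</a>'
    ),
    "/list/?page=2": '<a class="item">C</a>',
}


def fetch(url):
    """Stand-in for an HTTP GET; returns the HTML of a fake page."""
    return FAKE_SITE[url]


def crawl(start_url):
    """Collect items from every page, following "next" links."""
    items, url, seen = [], start_url, set()
    while url and url not in seen:  # `seen` guards against link loops
        seen.add(url)
        html = fetch(url)
        items += re.findall(r'<a class="item">(.*?)</a>', html)
        next_link = re.search(r'<a class="next" href="(.*?)">', html)
        url = next_link.group(1) if next_link else None
    return items


print(crawl("/list/?page=1"))
```

Regular expressions keep the sketch dependency-free. A real spider would normally replace `fetch` with an HTTP client and use XPath or CSS selectors to locate the items and the next-page link.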