Why I created this project

Most blog posts about web scraping show how to crawl one specific website or page, but I think it makes more sense to help people learn how to analyze a website and choose the right approach for the job.

That is why I built ScrapingClub. My goal is to break down a complex web scraping job, such as crawling a group of websites, into small tasks so people can learn to solve them step by step. What is more, if they have trouble with an exercise, they can ask for help with specific details instead of "I have trouble crawling the website, please help!".

Who might need this project

Anyone who wants to learn web scraping, test their web scraping skills, or just do it for fun might need this project.

How it works

You will see many product detail pages and list pages on ScrapingClub. For example, two product detail pages might look the same but deliver their data in different ways. You should figure out how each page works and write a spider to extract the data.

A short description at the top of each exercise page explains what needs to be done, and the tips and links can help you learn web scraping better.

You can use it to learn web scraping with any relevant technology and any language, such as Python, JavaScript, or PHP. For example, you can write a PHP script that crawls the list page and product pages to solve the "Recursively Scraping pages" exercise.
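As a rough illustration of a spider that crawls a list page and its product pages, here is a minimal Python sketch using only the standard library. The CSS class names `product` and `next`, the use of the first `<h1>` as the product title, and the `max_pages` limit are all assumptions made for this example; the real exercise pages may be structured differently, so inspect them first and adjust.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen


class LinkParser(HTMLParser):
    """Collect product links and the next-page link from a list page."""

    def __init__(self):
        super().__init__()
        self.product_links = []
        self.next_link = None

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attrs = dict(attrs)
        href = attrs.get("href")
        classes = (attrs.get("class") or "").split()
        if not href:
            return
        if "product" in classes:  # hypothetical class name for product links
            self.product_links.append(href)
        elif "next" in classes:  # hypothetical class name for the pagination link
            self.next_link = href


class TitleParser(HTMLParser):
    """Collect the text of the first non-empty <h1> on a product page."""

    def __init__(self):
        super().__init__()
        self._in_h1 = False
        self.title = None

    def handle_starttag(self, tag, attrs):
        if tag == "h1" and self.title is None:
            self._in_h1 = True

    def handle_data(self, data):
        if self._in_h1:
            self.title = (self.title or "") + data

    def handle_endtag(self, tag):
        if tag == "h1":
            self._in_h1 = False


def fetch(url):
    """Download a page and return its body as text."""
    with urlopen(url) as response:
        return response.read().decode("utf-8", errors="replace")


def crawl(start_url, fetch=fetch, max_pages=50):
    """Follow pagination from start_url and visit every product link found."""
    items = []
    seen = set()
    url = start_url
    while url and url not in seen and len(seen) < max_pages:
        seen.add(url)
        listing = LinkParser()
        listing.feed(fetch(url))
        for href in listing.product_links:
            product_url = urljoin(url, href)
            detail = TitleParser()
            detail.feed(fetch(product_url))
            items.append({"url": product_url, "title": (detail.title or "").strip()})
        # Stop when the list page has no next-page link.
        url = urljoin(url, listing.next_link) if listing.next_link else None
    return items
```

Passing `fetch` as a parameter lets you try the crawl logic against saved HTML before pointing it at a live site. The `seen` set and `max_pages` guard keep the loop from running forever if pagination links form a cycle.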

What if you have trouble completing the exercise?

Later I will create a project on GitHub to host the solution code, and I will also write articles that go into more detail. Please feel free to send me a message to let me know your thoughts.

How to stay updated?

You can subscribe to stay updated.

Exercise List

Basic Info Scraping

Web scraping using XPath or CSS expressions


Analyze JSON

Load a JSON string and extract data


Recursively Scraping pages

Crawl products and also handle pagination


Mimicking Ajax requests

Inspect Ajax requests and mimic them


Inspect HTTP request

Learn to inspect the fields of an HTTP request


Scraping Infinite Scrolling Pages (Ajax)

Learn to scrape infinite scrolling pages


Find gold in cookie

Make your spider work with cookies


Login form

Scrape data behind a login form


Solve Captcha

Learn to scrape data behind a captcha


Decode minified javascript

Learn how to analyze minified or compressed JavaScript
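
To give a feel for the "Analyze JSON" exercise in the list above, here is a small Python sketch of one common case: product data embedded as a JSON object inside an inline script. The variable name `obj` and the sample fields are made up for illustration, and the real exercise page may embed its data differently.

```python
import json
import re


def extract_embedded_json(html, var_name):
    """Return the JSON value assigned to var_name in the page source, or None."""
    # Find "<var_name> = " and let the JSON decoder read the value that follows.
    match = re.search(r"\b" + re.escape(var_name) + r"\s*=\s*", html)
    if match is None:
        return None
    value, _end = json.JSONDecoder().raw_decode(html, match.end())
    return value


# Hypothetical page source, used only for this example.
sample_html = '<script>var obj = {"title": "Jersey Dress", "price": "$19.99"};</script>'
product = extract_embedded_json(sample_html, "obj")
print(product["title"], product["price"])
```

Using `raw_decode` rather than a regular expression for the object itself means nested objects and braces inside strings are handled by the JSON parser. It raises `json.JSONDecodeError` if the text after the assignment is not valid JSON, which is a useful signal that the page needs a different approach.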