Why I created this project

Most blog posts about web scraping explain how to crawl a specific website or set of pages, but I believe it makes more sense to help people learn how to analyze a website and choose the right way to get the job done well.

That is why I built ScrapingClub. My goal is to break a complex web scraping mission, such as crawling a bunch of websites, down into small tasks so people can learn how to solve them step by step. What is more, if they have trouble solving an exercise, they can ask for help with a specific, detailed question instead of "I have trouble crawling the website, please help!".

Who might need this project

Anyone who wants to learn web scraping, test their web scraping skills, or just scrape for fun might need this project.

How it works

You will see many product detail pages and list pages on ScrapingClub. For example, two product detail pages might look the same but deliver their data in different ways. You should figure out which way each page uses and write a spider to extract the data.
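To make the "same look, different data" idea concrete, here is a minimal sketch in Python using only the standard library. The two HTML snippets, the class names, and the `product-data` script id are made up for illustration and are not the real ScrapingClub markup. The first page keeps the data in the markup, the second ships it as JSON inside a script tag, so each needs a different extraction approach.

```python
import json
import re
from html.parser import HTMLParser

# Hypothetical page A: the product data sits directly in the markup.
PAGE_A = """
<div class="product">
  <h3 class="title">Example Shirt</h3>
  <span class="price">$12.99</span>
</div>
"""

# Hypothetical page B: looks the same in a browser, but the data
# arrives as JSON inside a script tag and is rendered by JavaScript.
PAGE_B = """
<div class="product"></div>
<script id="product-data" type="application/json">
{"title": "Example Shirt", "price": "$12.99"}
</script>
"""


class ClassTextParser(HTMLParser):
    """Collect the first text found inside elements with a wanted class."""

    def __init__(self, wanted):
        super().__init__()
        self.wanted = set(wanted)
        self.current = None  # class name of the element being read, if any
        self.found = {}

    def handle_starttag(self, tag, attrs):
        cls = dict(attrs).get("class")
        if cls in self.wanted:
            self.current = cls

    def handle_data(self, data):
        if self.current and data.strip():
            self.found[self.current] = data.strip()
            self.current = None


def parse_markup(html):
    """Extract title and price from elements in the HTML itself."""
    parser = ClassTextParser(["title", "price"])
    parser.feed(html)
    return parser.found


def parse_embedded_json(html):
    """Extract the same fields from a JSON blob embedded in a script tag."""
    match = re.search(
        r'<script id="product-data"[^>]*>(.*?)</script>', html, re.DOTALL
    )
    return json.loads(match.group(1)) if match else {}


print(parse_markup(PAGE_A))
print(parse_embedded_json(PAGE_B))
```

Both functions return the same dictionary even though the pages store the data differently. In a real spider you would probably reach for an XPath or CSS selector library instead of a hand-written parser.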

A short description at the top of each exercise page helps you understand what needs to be done, and the tips and links help you learn web scraping better.

You can use it to learn web scraping with any relevant technology and any language, such as Python, JavaScript, or PHP. For example, you can build a web spider in PHP that crawls the list pages and product pages to solve the "Recursively Scraping pages" exercise.
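As a rough sketch of what such a spider does, here is a Python version (the same idea carries over to PHP or JavaScript). Everything in it is an assumption made for illustration: the `example.com` URLs, the `product` and `next` link classes, and the in-memory `FAKE_SITE` dictionary, which stands in for real HTTP responses so the example runs offline.

```python
import re
from collections import deque
from urllib.parse import urljoin

# Hypothetical in-memory site standing in for real HTTP responses.
FAKE_SITE = {
    "https://example.com/list/?page=1": (
        '<a class="product" href="/item/1/">A</a>'
        '<a class="product" href="/item/2/">B</a>'
        '<a class="next" href="?page=2">Next</a>'
    ),
    "https://example.com/list/?page=2": (
        '<a class="product" href="/item/3/">C</a>'
    ),
    "https://example.com/item/1/": "<h3>Product A</h3>",
    "https://example.com/item/2/": "<h3>Product B</h3>",
    "https://example.com/item/3/": "<h3>Product C</h3>",
}

LINK_RE = re.compile(r'<a class="(?:product|next)" href="([^"]+)"')
TITLE_RE = re.compile(r"<h3>(.*?)</h3>")


def fetch(url):
    """Return the HTML for a URL; swap in a real HTTP client here."""
    return FAKE_SITE[url]


def crawl(start_url, fetch=fetch):
    """Follow product links and pagination links, visiting each URL once."""
    queue = deque([start_url])
    seen = {start_url}
    products = []
    while queue:
        url = queue.popleft()
        html = fetch(url)
        # Product detail pages carry a title; list pages do not.
        title = TITLE_RE.search(html)
        if title:
            products.append(title.group(1))
        # Queue every product link and the "next page" link, if present.
        for href in LINK_RE.findall(html):
            link = urljoin(url, href)
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return products


print(crawl("https://example.com/list/?page=1"))
```

The `seen` set keeps the spider from fetching a page twice, and `urljoin` resolves relative links such as `?page=2` against the page they were found on.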

What if you have trouble completing the exercise?

Later I will create a project on GitHub to host the solution code, and I will also write articles that go into more detail. Please feel free to send me a message to let me know your thoughts.

How to stay updated?

You can subscribe using the form in the footer to stay updated.

Exercise List

Basic Info Scraping

Web scraping using XPath or CSS expression

Analyze JSON

Load JSON string and extract data

Recursively Scraping pages

Crawl products and handle pagination as well

Mimicking Ajax requests

Inspect Ajax requests and mimic them

Inspect HTTP request

Learn to inspect the fields of an HTTP request

Scraping Infinite Scrolling Pages (Ajax)

Learn to scrape infinite scrolling pages

Find gold in cookie

Make your spider work with cookies

Login form

Scrape data behind login form

Solve Captcha

Learn to scrape data behind a captcha

Decode minified javascript

Learn how to analyze minified or compressed JavaScript
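To give a feel for one of the exercises listed above, here is a minimal sketch of mimicking an Ajax request in Python with the standard library. The endpoint URL, the `page` parameter, and the shape of the JSON response are all assumptions made for illustration. On a real page you would copy them from the request shown in your browser's network panel.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request


def build_ajax_request(base_url, page):
    """Build a request that looks like the one the page's JavaScript sends."""
    query = urlencode({"page": page})
    return Request(
        f"{base_url}?{query}",
        headers={
            # Many sites use this header to tell Ajax calls from page loads.
            "X-Requested-With": "XMLHttpRequest",
            "Accept": "application/json",
            "User-Agent": "Mozilla/5.0 (compatible; example-spider)",
        },
    )


def parse_products(body):
    """Pull (title, price) pairs out of a JSON response body."""
    payload = json.loads(body)
    return [(item["title"], item["price"]) for item in payload["products"]]


# Hypothetical endpoint and response body, used here instead of a live call.
request = build_ajax_request("https://example.com/api/products/", page=2)
sample_body = '{"products": [{"title": "Example Shirt", "price": "$12.99"}]}'

print(request.full_url)
print(parse_products(sample_body))
```

To send the request for real, pass it to `urllib.request.urlopen` and feed the decoded response body to `parse_products`.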

Crafted with ❤️ by MichaelYin