Using Sitemaps in Web Scraping Robots

March 25th, 2019 | Web Scraping

When we need to discover and crawl every item of interest on a website, we often spider through its categories and handle pagination or infinite scroll along the way. However, there is often a simpler and more straightforward approach: using sitemaps. Sitemap-based robots are easier to maintain than a mix of category drilling, pagination handling, and imitation of dynamic content loading.

After all, sitemaps are designed for robots to find all resources on a particular domain.

Example of a sitemap:
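As a rough illustration, the sketch below embeds a small, hypothetical sitemap in the standard sitemaps.org XML format and pulls the page URLs out of it with Python's standard library. The `example.com` URLs, the `SAMPLE_SITEMAP` constant, and the `extract_urls` helper are assumptions made for this example, not taken from any real site.

```python
import xml.etree.ElementTree as ET

# Hypothetical sitemap used only for illustration; the URLs are made up.
SAMPLE_SITEMAP = """<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://www.example.com/products/1</loc>
    <lastmod>2019-03-01</lastmod>
  </url>
  <url>
    <loc>https://www.example.com/products/2</loc>
  </url>
</urlset>
"""

# Sitemap elements live in this XML namespace, so lookups must be qualified.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def extract_urls(sitemap_xml):
    """Return the text of every <loc> found inside a <url> entry."""
    # Parse from bytes so the encoding declaration in the XML is honoured.
    root = ET.fromstring(sitemap_xml.encode("utf-8"))
    return [
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text
    ]


if __name__ == "__main__":
    for url in extract_urls(SAMPLE_SITEMAP):
        print(url)
```

Each `<url>` entry carries a required `<loc>` with the page address, plus optional hints such as `<lastmod>`. A robot could queue the extracted addresses directly instead of walking categories and pages.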

Finding Sitemaps