Smart CSS Selectors

Our web scraping business requires that we develop scraper robots quickly and efficiently. We can offer competitive pricing only if we are efficient at creating a robot for each source. The old saying “time is money” means a lot here, and we always look for ways to do things better and faster.

In the scraper development process, everyone uses either XPath expressions or CSS selectors to parse the DOM for data to extract or links to crawl. One can inspect DOM elements (in Google Chrome, for example) for classes, IDs, or other attributes, then solve a small or big puzzle to write a selector. This requires knowing the powerful CSS selector syntax, some detective work inside the DOM, and some trial and error.
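To make the manual approach concrete, here is a minimal sketch of selecting data and links with hand-written selectors. The page fragment, class names, and URLs are hypothetical, made up for illustration only. It uses Python's standard-library `xml.etree.ElementTree`, which supports only a limited XPath subset and requires well-formed markup; a real scraper would more likely use a dedicated HTML parser. The equivalent CSS selectors are shown in comments.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed page fragment used only for illustration.
PAGE = """
<html>
  <body>
    <div id="listing">
      <div class="item"><a href="/p/1">Widget</a><span class="price">9.99</span></div>
      <div class="item"><a href="/p/2">Gadget</a><span class="price">19.50</span></div>
    </div>
    <div id="footer"><a href="/about">About</a></div>
  </body>
</html>
""".strip()

root = ET.fromstring(PAGE)

# Data to extract.
# Equivalent CSS selector: div#listing > div.item > span.price
PRICE_XPATH = ".//div[@id='listing']/div[@class='item']/span[@class='price']"
prices = [el.text for el in root.findall(PRICE_XPATH)]

# Links to crawl through (the footer link is deliberately excluded).
# Equivalent CSS selector: div#listing > div.item > a
LINK_XPATH = ".//div[@id='listing']/div[@class='item']/a"
links = [a.get("href") for a in root.findall(LINK_XPATH)]

print(prices)
print(links)
```

Both selectors had to be worked out by reading the markup: the `listing` ID narrows the match so that the footer link is not captured.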

There is a nifty tool that automates this task and saves time: the Chrome extension SelectorGadget. With it, anyone can create CSS selectors or XPath expressions by pointing and clicking. We click on a piece of data to mark it as required; if the capture is not accurate, additional clicks train the selector to capture exactly what we need. In the background, SelectorGadget uses a matching algorithm to generate the CSS selector or XPath expression and instantly displays how many data items will be captured.
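The click-to-train idea can be illustrated with a toy sketch. This is not SelectorGadget's actual algorithm, which is not described here; it is a deliberately simplified stand-in that assumes selectors are only of the form `tag` or `tag.class`. The page fragment and the helper names (`candidates`, `matches`, `train`, `to_css`) are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical page fragment used only for illustration.
PAGE = """
<ul>
  <li><span class="name">Widget</span><span class="price">9.99</span></li>
  <li><span class="name">Gadget</span><span class="price sale">19.50</span></li>
  <li><span class="name">Gizmo</span><span class="price">4.25</span></li>
</ul>
""".strip()


def candidates(el):
    """Yield simple (tag, class) selectors for an element, most general first."""
    yield (el.tag, None)
    for cls in el.get("class", "").split():
        yield (el.tag, cls)


def matches(el, selector):
    """Return True if the element satisfies a (tag, class) selector."""
    tag, cls = selector
    return el.tag == tag and (cls is None or cls in el.get("class", "").split())


def train(root, wanted, rejected):
    """Pick the first candidate that matches every wanted element and no
    rejected one, and count how many elements on the page it would capture."""
    for sel in candidates(wanted[0]):
        if all(matches(e, sel) for e in wanted) and not any(matches(e, sel) for e in rejected):
            hits = [e for e in root.iter() if matches(e, sel)]
            return sel, len(hits)
    return None, 0


def to_css(selector):
    """Render a (tag, class) selector as CSS text."""
    tag, cls = selector
    return tag if cls is None else f"{tag}.{cls}"


root = ET.fromstring(PAGE)
spans = list(root.iter("span"))
name_el, price_el = spans[0], spans[1]

# First click: mark one price as wanted. The selector is still too broad.
first_sel, first_count = train(root, [price_el], [])
print(to_css(first_sel), first_count)

# Second click: reject a product name. The selector narrows to prices only.
second_sel, second_count = train(root, [price_el], [name_el])
print(to_css(second_sel), second_count)
```

In this sketch the first click yields `span`, which captures all six spans; rejecting a name narrows it to `span.price`, which captures the three prices. The count returned alongside the selector mirrors the item count the tool displays.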

Here is a video demonstration of how to use SelectorGadget:

We use this tool a lot and find that it works great almost every time. There are a few points that a SelectorGadget user should know:

  • In some cases it cannot calculate a selector, but these are rare and typically occur on pages that rely extensively on CSS-styled divs and spans to display everything.
  • It cannot handle more than 4096 data items.
  • The tool is also available on GitHub under the MIT license.
  • The extension has embedded Google Analytics tracking code, so its author can potentially track which websites you use it on.
October 13th, 2014 | Uncategorized