We get a lot of requests to scrape data from Yelp. They arrive daily, sometimes several times a day. At the same time, we have not seen a good business case for a commercial Yelp scraping project.
So we have decided to release a simple example Yelp robot that anyone can run in Chrome on their own computer, tune to their own requirements, and use to collect data. With this robot you can save business contact information such as addresses, postal codes, telephone numbers and website addresses. The robot is placed in the Demo space of our Web Robots portal for anyone to use: just sign up, find the robot and run it.
How to use it:
- Sign in to our portal here.
- Download our scraping extension from here.
- Find the robot named Yelp_us_demo in the dropdown.
- Change the start URL to the first page of your search results. For example: http://www.yelp.com/search?find_desc=Restaurants&find_loc=Arlington,+VA,+USA
- Click Run.
- Let the robot finish its job, then download the data from the portal.
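If you want to build the start URL for a different search instead of editing the example by hand, the sketch below assembles one from a search term and a location. It assumes Yelp still accepts the `find_desc` and `find_loc` query parameters seen in the example URL above; `buildYelpSearchUrl` is a hypothetical helper, not part of the robot or the portal.

```javascript
// Hypothetical helper: builds a search URL in the same shape as the example
// start URL. Assumes Yelp still uses the find_desc / find_loc parameters.
function buildYelpSearchUrl(what, where) {
  const url = new URL("http://www.yelp.com/search");
  // searchParams.set() handles encoding (spaces become "+", commas "%2C").
  url.searchParams.set("find_desc", what);
  url.searchParams.set("find_loc", where);
  return url.toString();
}

console.log(buildYelpSearchUrl("Restaurants", "Arlington, VA, USA"));
```

The printed URL encodes the commas as `%2C`, which a server decodes to the same value as the literal commas in the example URL.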
Some things to consider:
This robot is placed in our Demo space, so it is accessible to anyone: anyone can modify and run it, and anyone can download the collected data. The robot's code may be edited by someone else, but you can always restore it from the sample code below. Yelp limits the number of search results, so do not expect to scrape more results than you would normally see in a search.
If you want to create your own version of this robot, here is its full code:
// Starting URL above must be the first page of search results.
// Example: http://www.yelp.com/search?find_desc=Restaurants&find_loc=Arlington,+VA,+USA
steps.start = function () {
  var rows = [];
  // listings
  $(".biz-listing-large").each(function (i, v) {
    if ($("h3 a", v).length > 0) {
      var row = {};
      row.company = $(".biz-name", v).text().trim();
      row.reviews = $(".review-count", v).text().trim();
      row.companyLink = $(".biz-name", v)[0].href;
      row.location = $(".secondary-attributes address", v).text().trim();
      row.phone = $(".biz-phone", v).text().trim();
      rows.push(row);
    }
  });
  emit("yelp", rows);
  // paging
  if ($(".next").length === 1) {
    next($(".next")[0].href, "start");
  }
  done();
};
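To see how the step walks through result pages without a browser, the sketch below simulates the robot offline. The real `emit`, `next` and `done` functions are provided by the Web Robots extension; the stand-ins here are assumptions that only mimic what the robot code appears to rely on: `emit` stores rows, `next` queues a URL for a named step, and `done` ends the current step. The `fakePages` data and the `run` loop are hypothetical placeholders, not Yelp pages or the extension's actual scheduler.

```javascript
// Hypothetical stand-ins for the extension-provided functions (offline only).
const emitted = [];
const queue = [];
function emit(name, rows) {
  // Store each row tagged with the name it was emitted under.
  emitted.push(...rows.map((r) => ({ table: name, ...r })));
}
function next(url, step) {
  // Queue a URL to be processed later by the named step.
  queue.push({ url, step });
}
function done() {
  // Nothing to clean up in this simulation.
}

// Placeholder "pages": each has listings and an optional next-page URL.
const fakePages = {
  "page-1": { listings: [{ company: "Cafe A", phone: "111" }, { company: "Cafe B", phone: "222" }], next: "page-2" },
  "page-2": { listings: [{ company: "Cafe C", phone: "333" }], next: null },
};
let current; // page currently being processed

// Same shape as the robot: collect rows, emit them, queue the next page.
const steps = {};
steps.start = function () {
  const rows = current.listings.map((l) => ({ company: l.company, phone: l.phone }));
  emit("yelp", rows);
  if (current.next) {
    next(current.next, "start");
  }
  done();
};

// Process queued URLs until no page links to a next page.
function run(startUrl) {
  queue.push({ url: startUrl, step: "start" });
  while (queue.length > 0) {
    const job = queue.shift();
    current = fakePages[job.url];
    steps[job.step]();
  }
}

run("page-1");
console.log(emitted.length); // 3
```

The loop stops on the page without a next link, which mirrors why the real robot ends once Yelp stops showing a `.next` button.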