1. Set up tools
Install the Chrome browser. Download it from here.
Install the Web Robots Chrome extension. Get it here.
Sign up on our portal: portal.webrobots.io
2. Create a new robot on the portal
Click ‘Robots’ in the top menu.
Click ‘New robot’ at the bottom of the robots list.
Choose any name and start URL.
In this example we’ll use demo1 as the name and http://careers.stackoverflow.com/jobs?sort=p as the start URL.
3. Hello world robot
Open the Web Robots extension and select your new robot from the drop-down.
Enter this code:
steps.start = function() {
    dbg('Hello, world!');
    done();
};
Click the “Run” button.
Wait until the status message says the robot has finished:
Run finished in: 1.075: demo_2015-01-09T09_25_44_727ZDone: 2 Que: 0 EmitQue: 0
See the [Log] tab to view log messages.
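To picture what happens when you click Run: the extension calls the function you assigned to steps.start, and done() tells the scheduler the step is complete. Below is our own simplified model of that flow in plain Node.js, with stand-in dbg() and done() functions; it is an illustration only, not the actual Web Robots scheduler.

```javascript
// Simplified mental model of the step runner (illustration only --
// the real Web Robots runtime also loads pages, queues URLs, etc.).
var steps = {};
var finished = false;

function dbg(msg) { console.log('[dbg] ' + msg); } // stand-in for dbg()
function done() { finished = true; }               // stand-in for done()

steps.start = function() {
    dbg('Hello, world!');
    done();
};

// The runner invokes the step and observes that done() was called.
steps.start();
console.log('finished: ' + finished);
```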
4. Save data to server
Modify the code to:
steps.start = function() {
    dbg('Hello, world!');
    var rows = [];
    var row = {};
    row.price = 1;
    row.address = '100 main street';
    row.city = 'New York';
    row.rooms = 5;
    row.area = 50;
    rows.push(row);
    emit('listings', rows);
    done();
};
Run the robot and check the [Output] tab. It shows the data we just published, with a source_url field added automatically:
[
  {
    "price": 1,
    "address": "100 main street",
    "city": "New York",
    "rooms": 5,
    "area": 50,
    "source_url": "http://careers.stackoverflow.com/jobs?sort=p"
  }
]
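The [Output] tab is essentially the emitted array serialized as JSON. You can reproduce the shape in plain Node.js; note that source_url is appended by the platform on upload, so in this standalone sketch we set it by hand for illustration.

```javascript
// Reproduce the Output-tab data shape outside the extension
// (illustration only). source_url is normally added by the platform;
// here it is filled in manually.
var row = {
    price: 1,
    address: '100 main street',
    city: 'New York',
    rooms: 5,
    area: 50,
    source_url: 'http://careers.stackoverflow.com/jobs?sort=p'
};

// emit('listings', rows) sends an array of such objects.
console.log(JSON.stringify([row], null, 2));
```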
5. Extract some data from the page
Modify the robot code as shown below. This robot extracts every job posting from the page. Make sure the robot’s start URL is set to http://careers.stackoverflow.com/jobs?sort=p for this part.
steps.start = function() {
    var rows = [];
    $('.-job').each(function (i, v) {
        var row = {};
        row.company = $('.-company span:first', v).text().trim();
        row.location = $(v).find('.-company .fc-black-500').text().split('- \n').pop().trim();
        row.position = $(v).find('.-title h2').text().trim();
        row.tags = $(v).find('div.-tags').text().trim();
        rows.push(row);
    });
    emit('jobs', rows);
    done();
};
After running the robot, check that the data appeared on the portal: find your robot, open it, and verify there is a new run with your data.
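The least obvious line above is the location cleanup: the company cell's text contains both the company name and the location, separated by '- ' and a newline, so the chain splits on that separator, takes the last piece, and trims the whitespace. A quick plain-JavaScript check (the sample string below is our own guess at the cell's text content, not taken from the real markup):

```javascript
// Hypothetical text content of a '.-company .fc-black-500' cell;
// the real page text may differ -- this only demonstrates the chain.
var raw = 'Acme Corp - \n  New York, NY ';

// split on the separator, keep the last piece, strip whitespace
var location = raw.split('- \n').pop().trim();
console.log(location); // → New York, NY
```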
6. Iterate over multiple pages
Add code that follows the ‘Next’ link on each page:
steps.start = function() {
    var rows = [];
    $('.-job').each(function (i, v) {
        var row = {};
        row.company = $('.-company span:first', v).text().trim();
        row.location = $(v).find('.-company .fc-black-500').text().split('- \n').pop().trim();
        row.position = $(v).find('.-title h2').text().trim();
        row.tags = $(v).find('div.-tags').text().trim();
        rows.push(row);
    });
    if ($('.test-pagination-next:first').length == 1) {
        next($('.test-pagination-next')[0].href, 'start');
    }
    emit('jobs', rows);
    done();
};
Refresh the portal page and you will see a new run with data extracted from all pages.
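One way to picture how next() drives a multi-page run: each call enqueues a URL, and the scheduler keeps invoking the named step on each queued page until the queue is empty. The sketch below is our own simplified model with fake pages and stand-in next()/emit() functions; the real Web Robots scheduler works on live pages, but the control flow is the idea being shown.

```javascript
// Simplified model of the crawl loop (illustration only).
// Fake "pages": each has some job rows and maybe a next-page URL.
var pages = {
    'page1': { jobs: ['dev', 'qa'], next: 'page2' },
    'page2': { jobs: ['ops'],       next: null }
};

var queue = ['page1'];   // seeded with the start URL
var output = [];

function next(url, step) { queue.push(url); }            // stand-in: enqueue a URL
function emit(name, rows) { output = output.concat(rows); } // stand-in: collect rows

function start(page) {
    var rows = page.jobs.slice();            // "extract" rows from this page
    if (page.next) next(page.next, 'start'); // follow the 'Next' link if present
    emit('jobs', rows);
}

// The scheduler drains the queue, running the step on each page.
while (queue.length > 0) {
    start(pages[queue.shift()]);
}
console.log(JSON.stringify(output)); // → ["dev","qa","ops"]
```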