Shallow and Deep Scan Robot Pairing
Shallow Robot Setup
The robot’s config should have the key deepScanRobot, which specifies the name of the paired deep scan robot.
The config may also have the optional key deepScanAjax (default false). When it is set to true, steps are fed into the deep robot without opening the URL; instead, the URL is passed to the step as a parameter. This is useful when you do not want to open each product page during the deep crawl: the deep robot loads the page itself with fetch() or $.get() instead.
The robot should emit the mandatory fields id and url. id is a unique identifier for the row (product SKU, etc.); url is the page the paired deep robot will process.
Example config:
{ "deepScanRobot": "aruodas_deep", "deepScanAjax": true }
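The shallow-config rules above can be expressed as a small check. This is a minimal sketch that assumes the config is available as a plain object. normalizeShallowConfig is a hypothetical helper for local checks, not a platform function. Rejecting non-boolean values of deepScanAjax is a choice made by this sketch.

```javascript
// Hypothetical helper for local checks, not part of the robot platform.
// Mirrors the documented rules for a shallow robot's config object.
function normalizeShallowConfig(config) {
    // deepScanRobot is required: the name of the paired deep scan robot
    if (typeof config.deepScanRobot !== 'string' || config.deepScanRobot === '') {
        throw new Error('deepScanRobot must name the paired deep scan robot');
    }
    // deepScanAjax is optional and defaults to false when absent
    var ajax = config.deepScanAjax === undefined ? false : config.deepScanAjax;
    // Assumption of this sketch: anything other than a boolean is an error
    if (typeof ajax !== 'boolean') {
        throw new Error('deepScanAjax must be true or false');
    }
    return { deepScanRobot: config.deepScanRobot, deepScanAjax: ajax };
}
```

For the example config above it returns { deepScanRobot: 'aruodas_deep', deepScanAjax: true }. Without the deepScanAjax key, deepScanAjax comes back as false.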
Example robot code:
steps.start = function () {
    next('http://www.aruodas.lt/namai/kaune/', 'browseList');
    done();
};
// Browse through paging, visiting each position
steps.browseList = function () {
    // Paging: if the last pager link is "»", queue the next list page
    if ($('a.page-bt:nth-last-child(1)').text() === '»') {
        next($('.page-bt:nth-last-child(1)')[0].href, 'browseList');
    }
    var properties = [];
    $('.list-row .list-adress').each(function (i, v) {
        var href = $('a', v).prop('href');
        properties.push({
            id : href.split('-').pop().replace('/', '').trim(), // last "-" segment of the URL, without the trailing "/"
            url: href
        });
    });
    // Emit the rows (id, url) collected on this page
    if (properties.length) emit('data', properties);
    done();
};
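The id expression in browseList takes the last "-"-separated segment of the listing URL and strips the trailing slash. The sketch below moves that expression, and the mandatory-field rule, into plain functions that can be checked outside the robot. listingRow and checkRows are hypothetical names. The uniqueness check follows the description of id above. The sample URLs in any test are made up to match the assumed ".../...-<number>/" shape.

```javascript
// Hypothetical helper: the same id logic as browseList, applied to an href string.
// replace('/', '') removes only the first "/", so this assumes URLs end in "-<id>/".
function listingRow(href) {
    return {
        id: href.split('-').pop().replace('/', '').trim(),
        url: href
    };
}

// Hypothetical check of the documented contract: every emitted row needs
// a non-empty id and url, and ids must be unique.
function checkRows(rows) {
    var seen = Object.create(null); // no prototype keys to collide with ids
    rows.forEach(function (row) {
        if (!row.id || !row.url) {
            throw new Error('row is missing id or url: ' + JSON.stringify(row));
        }
        if (seen[row.id]) {
            throw new Error('duplicate id: ' + row.id);
        }
        seen[row.id] = true;
    });
    return rows;
}
```

checkRows returns the rows unchanged when they are valid, so it can wrap the array just before emit('data', ...) while testing.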
Deep Robot Setup
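The robot’s config should have the keys serverQueue and deepScan, both set to true.
The robot’s start step should be present but do nothing except call done().
The robot should have a step called deep_scan, which processes the URLs emitted by the shallow robot. The step receives the emitted row as its passData argument, so passData.id and passData.url are available.
The robot should emit the mandatory field id.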
Example config:
{ "serverQueue": true, "deepScan": true }
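The two halves of the pairing can be checked together. This is a minimal sketch that assumes robot configs can be looked up by robot name in a plain object. deepRobots is a hypothetical map, and isDeepScanConfig and checkPairing are illustrative names, not platform functions. The check requires the exact value true, so a string such as "true" fails.

```javascript
// Hypothetical check (not a platform API): a deep robot's config must set
// both serverQueue and deepScan to true.
function isDeepScanConfig(config) {
    return config.serverQueue === true && config.deepScan === true;
}

// Hypothetical pairing check: the shallow config names the deep robot, and
// that robot's config is a valid deep scan config.
// Returns null when the pair looks consistent, otherwise a message.
function checkPairing(shallowConfig, deepRobots) {
    var name = shallowConfig.deepScanRobot;
    if (!Object.prototype.hasOwnProperty.call(deepRobots, name)) {
        return 'no robot named ' + name;
    }
    if (!isDeepScanConfig(deepRobots[name])) {
        return name + ' must set serverQueue and deepScan to true';
    }
    return null;
}
```

With the two example configs, checkPairing({ deepScanRobot: 'aruodas_deep' }, { aruodas_deep: { serverQueue: true, deepScan: true } }) returns null.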
Example robot code:
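steps.start = function () {
    // Nothing to do here: the work arrives through deep_scan
    done();
};
// Processes a row emitted by the shallow robot; passData carries its fields (id, url)
steps.deep_scan = function (passData) {
    var oneObject = {
        id        : $('.advert-controls').attr('data-id'), // id read from the opened page
        id_passed : passData.id                            // id emitted by the shallow robot
    };
    emit('objects', [oneObject]);
    done();
};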
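The deep robot does no navigation of its own in start; its work arrives through deep_scan. The sketch below imitates that flow locally, so that a step's emit/done behaviour can be exercised with plain rows. It is a mock, not a description of how the platform queues work. It assumes the following: start runs first; deep_scan is called once per shallow row, with the row as passData, one row at a time; and emit and done are global functions. runDeepScan is a hypothetical name.

```javascript
// Hypothetical local harness (not the platform): runs steps.start, then feeds
// each shallow row to steps.deep_scan as passData, one at a time, and collects
// everything passed to emit(). emit and done are installed as globals because
// robot code calls them as free functions.
function runDeepScan(steps, rows) {
    var emitted = {};
    globalThis.emit = function (name, objects) {
        emitted[name] = (emitted[name] || []).concat(objects);
    };
    var chain = new Promise(function (resolve) {
        globalThis.done = resolve; // start() is expected to just call done()
        steps.start();
    });
    return rows.reduce(function (prev, row) {
        return prev.then(function () {
            return new Promise(function (resolve) {
                globalThis.done = resolve; // the step's done() finishes this row
                steps.deep_scan(row);
            });
        });
    }, chain).then(function () {
        return emitted; // e.g. { objects: [...] }
    });
}
```

To run the jQuery-based step through this harness, you would also need a DOM and jQuery (for example, a browser page). A fake step that calls emit() and done() is enough to check the wiring. A step that never calls done() leaves the returned promise pending.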
Example deep_scan step when the shallow robot’s config sets deepScanAjax to true:
steps.deep_scan = function (passData) {
    // The page is not opened; load it by URL and search the response instead
    $.get(passData.url).then(function (resp) {
        var oneObject = {
            id         : $('.advert-controls', resp).attr('data-id'),
            id_passed  : passData.id,
            url_passed : passData.url
        };
        emit('objects', [oneObject]);
        done(); // only reached if the request succeeds
    });
};
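The setup notes name fetch() as an alternative to $.get(). Below is a sketch of the same step built on fetch() and DOMParser. Unlike the $.get() version above, it also calls done() when the request or parsing fails.

It assumes:
- fetch, DOMParser, emit and done are all available on the deep robot's page;
- the request is allowed from that page;
- logging a failed row and moving on is acceptable.

deepScanWithFetch is a hypothetical name.

```javascript
// Sketch of a deep_scan step using fetch() instead of $.get().
// In the robot: steps.deep_scan = deepScanWithFetch;
function deepScanWithFetch(passData) {
    fetch(passData.url)
        .then(function (resp) {
            if (!resp.ok) throw new Error('HTTP ' + resp.status + ' for ' + passData.url);
            return resp.text();
        })
        .then(function (html) {
            var doc = new DOMParser().parseFromString(html, 'text/html');
            var controls = doc.querySelector('.advert-controls');
            // id is mandatory, so skip the row rather than emit it without one
            if (!controls) throw new Error('no .advert-controls on ' + passData.url);
            emit('objects', [{
                id: controls.getAttribute('data-id'),
                id_passed: passData.id,
                url_passed: passData.url
            }]);
        })
        .catch(function (err) {
            // Assumption: logging and moving on is acceptable for a failed row
            console.error(err);
        })
        .then(function () {
            done(); // called once, whether the row succeeded or failed
        });
}
```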