Shallow and Deep Scan Robot Pairing

Shallow Robot Setup

The robot's config must contain the key deepScanRobot, which names the paired deep scan robot.
The config may also contain the optional key deepScanAjax (default: false). When it is set to true, steps are fed into the deep robot without opening a URL; the URL is passed in as a parameter instead. This is useful when opening every product page during the deep crawl is not desired and the page is loaded with fetch() or $.get() instead.
The robot must emit the mandatory fields id and url, where id is a unique identifier for the row (a product SKU, for example).

Example config:

{
 "deepScanRobot": "aruodas_deep",
 "deepScanAjax": true
}

Example robot code:

steps.start = function () {
  next('http://www.aruodas.lt/namai/kaune/', 'browseList');
  done();
};

// Browse through the paging, collecting every listing on each page
steps.browseList = function () {
  // Paging: follow the last pager link while it is the '»' (next page) link
  var nextPage = $('a.page-bt:nth-last-child(1)');
  if (nextPage.text() === '»') {
    next(nextPage[0].href, 'browseList');
  }

  var properties = [];

  $('.list-row .list-adress').each(function (i, v) {
    var href = $('a', v).prop('href');
    properties.push({
      // id is the text after the last '-' in the listing URL
      id: href.split('-').pop().replace('/', '').trim(),
      url: href
    });
  });

  if (properties.length) emit('data', properties);
  done();
};
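
The id derivation and the mandatory-field rule can be tried outside the robot. The following is a minimal sketch, not platform code: idFromUrl and cleanRows are hypothetical helper names, the URL is made up, and dropping rows with a repeated id is an assumption based on the requirement that id be unique per row.

```javascript
// Hypothetical helpers for illustration; they are not part of the platform API.

// Takes the text after the last '-' in the URL and removes the trailing slash.
// Assumes listing URLs end in "-<id>/", as in the example robot above.
function idFromUrl(href) {
  return href.split('-').pop().replace('/', '').trim();
}

// Keeps only rows that carry both mandatory fields (id, url)
// and whose id has not been seen earlier in the same batch.
function cleanRows(rows) {
  var seen = Object.create(null);
  return rows.filter(function (row) {
    if (!row.id || !row.url || seen[row.id]) return false;
    seen[row.id] = true;
    return true;
  });
}

// Made-up URL, used only to show the shape of the data.
var url = 'http://www.example.com/namai-kaune-1-123456/';
var rows = cleanRows([
  { id: idFromUrl(url), url: url },
  { id: idFromUrl(url), url: url }, // duplicate id, dropped
  { id: '', url: url }              // missing id, dropped
]);
console.log(rows.length); // prints 1
```

In a robot, a filter like cleanRows could be applied to properties just before emit('data', properties).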

Deep Robot Setup

The robot's config must have the keys serverQueue and deepScan, both set to true.
The robot's start step must be present but should do nothing (just call done()).
The robot must have a step called deep_scan, which processes the URLs received from the shallow robot.
The robot must emit the mandatory field id.

Example config:

{
  "serverQueue": true,
  "deepScan": true
}
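
The two configs have to agree with each other, so a quick check before running the pair can catch typos. This is a rough sketch under stated assumptions: checkPairing is a hypothetical helper (the platform is not known to provide one), and the deep robot's name is passed in by hand.

```javascript
// Hypothetical helper, not part of the platform: lists config problems
// for a shallow/deep robot pair. Configs are plain objects parsed from JSON.
function checkPairing(shallowConfig, deepRobotName, deepConfig) {
  var problems = [];

  // The shallow robot must name its paired deep robot.
  if (shallowConfig.deepScanRobot !== deepRobotName) {
    problems.push('shallow config: deepScanRobot should be "' + deepRobotName + '"');
  }

  // deepScanAjax is optional, but when present it should be a boolean.
  if ('deepScanAjax' in shallowConfig && typeof shallowConfig.deepScanAjax !== 'boolean') {
    problems.push('shallow config: deepScanAjax should be true or false');
  }

  // The deep robot needs both flags set to true.
  ['serverQueue', 'deepScan'].forEach(function (key) {
    if (deepConfig[key] !== true) {
      problems.push('deep config: ' + key + ' should be true');
    }
  });

  return problems;
}

var shallow = JSON.parse('{"deepScanRobot": "aruodas_deep", "deepScanAjax": true}');
var deep = JSON.parse('{"serverQueue": true, "deepScan": true}');
console.log(checkPairing(shallow, 'aruodas_deep', deep).length); // prints 0
```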

Example robot code:

steps.start = function() {
  done();
};

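// passData carries the values handed over from the shallow robot (passData.id here)
steps.deep_scan = function(passData) {
  var oneObject = {
    id : $('.advert-controls').attr('data-id'), // id read from the opened page
    id_passed : passData.id                     // id received from the shallow robot
  };
  emit('objects', [oneObject]);

  done();
};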

Example deep_scan step when deepScanAjax is true:

steps.deep_scan = function(passData) {
  $.get(passData.url).then(function(resp) {

    var oneObject = {
      id : $('.advert-controls', resp).attr('data-id'),
      id_passed : passData.id,
      url_passed : passData.url
    };

    emit('objects', [oneObject]);
    done();
  });
};
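
The same step can also be written with fetch() instead of $.get(). The sketch below is one possible shape, not the platform's own implementation: makeFetchDeepScan and its env argument are hypothetical names that bundle the globals used in the examples above ($, emit, done) together with fetch, so the step can be exercised outside the platform. Skipping a failed request instead of retrying it is also an assumption; the point is that done() is called in every case so the step does not hang.

```javascript
// Sketch only: builds a deep_scan step that loads the page with fetch().
// `env` is a hypothetical bundle: { fetch, $, emit, done }.
function makeFetchDeepScan(env) {
  return function (passData) {
    return env.fetch(passData.url)
      .then(function (resp) {
        // fetch() resolves on HTTP errors too, so check the status explicitly.
        if (!resp.ok) throw new Error('HTTP ' + resp.status);
        return resp.text();
      })
      .then(function (html) {
        env.emit('objects', [{
          id: env.$('.advert-controls', html).attr('data-id'),
          id_passed: passData.id,
          url_passed: passData.url
        }]);
      })
      .catch(function (err) {
        // Assumption: a failed request is logged and skipped, not retried.
        console.log('deep_scan failed for ' + passData.url + ': ' + err.message);
      })
      .then(function () {
        env.done(); // always finish the step, even after an error
      });
  };
}

// Assumed wiring inside a robot (fetch is bound so it keeps its window context):
// steps.deep_scan = makeFetchDeepScan({ fetch: fetch.bind(window), $: $, emit: emit, done: done });
```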