Most Common Robot Writing Mistakes
We compiled a list of the most frequent mistakes in robot development, based on our own experience and on what we encounter when supporting our customers. It should help robot developers avoid coding pitfalls and steer towards best practices.
Multiple done() statements
Sometimes developers leave multiple done() calls inside a step. This causes unexpected robot behavior because done() signals the controlling mechanism that the step is finished and that it can start the next step from the queue.
Bad
    steps.start = function() {
        $("a.navlink").each(function(i,v) {
            next(v.href, "drillMenu");
            done(); // fires once per link
        });
        done(); // fires again after the loop
    };
Solution – ensure there is always just one done() that can fire inside a step.
Good
    steps.start = function() {
        $("a.navlink").each(function(i,v) {
            next(v.href, "drillMenu");
        });
        done(); // the only done() in this step
    };
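To illustrate why extra done() calls are harmful, here is a minimal sketch of a step queue. The runStep function and the queue are hypothetical stand-ins written for this example, not the actual Web Robots controlling mechanism; the only assumption taken from the text is that each done() call lets the controller start the next queued step.

```javascript
// Hypothetical stand-in for the robot runtime; NOT the real Web Robots implementation.
function runStep(step, queue) {
  var started = [];
  // Assumption: every done() call tells the controller to start the next queued step.
  function done() {
    if (queue.length > 0) {
      started.push(queue.shift());
    }
  }
  step(done);
  return started;
}

// Bad: done() fires once per link and once more after the loop.
function badStep(done) {
  ["link1", "link2"].forEach(function () { done(); });
  done();
}

// Good: done() fires exactly once, after the loop.
function goodStep(done) {
  ["link1", "link2"].forEach(function () { /* queue follow-up work here */ });
  done();
}

var badResult = runStep(badStep, ["s1", "s2", "s3", "s4"]);
var goodResult = runStep(goodStep, ["s1", "s2", "s3", "s4"]);
console.log(badResult.length);  // 3 steps started by a single step
console.log(goodResult.length); // 1
```

In this model the bad step releases three queued steps at once, which is one plausible way the "unexpected behavior" could show up.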
done() does not wait for asynchronous code to finish
A step should have done() placed so it is executed only when all work within a step is finished. Sometimes done() is placed where it executes before asynchronous code can finish, although visually it looks like done() is at the end of a step. Typical scenarios where this mistake occurs are when steps.waitFor or jQuery Ajax are present.
Bad
    steps.one = function() {
        steps.waitFor("#product_price").then(function() {
            var item = { price : $("#product_price").text() };
            emit("Products", [item]);
        });
        done(); // runs immediately, before the waitFor callback
    };

    steps.two = function() {
        $.get("http://example.com/price").done(function(resp) {
            var item = { price : $("#product_price", resp).text() };
            emit("Products", [item]);
        });
        done(); // runs immediately, before the Ajax response arrives
    };
Solution – place done() where it executes only after all necessary work within the step has completed.
Good
    steps.one = function() {
        steps.waitFor("#product_price").then(function() {
            var item = { price : $("#product_price").text() };
            emit("Products", [item]);
            done(); // runs after the element has appeared and data is emitted
        });
    };

    steps.two = function() {
        $.get("http://example.com/price").done(function(resp) {
            var item = { price : $("#product_price", resp).text() };
            emit("Products", [item]);
            done(); // runs after the response is processed
        });
    };
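The same rule applies when a step starts several asynchronous requests: done() should fire only after the last one returns. The sketch below shows one way to do that with a simple counter. The emit and done stubs, the fakeGet and flushPending helpers, and the numbered URLs are all hypothetical, added only so the example can run outside a robot.

```javascript
// Hypothetical stubs so the sketch runs outside a robot; the real runtime provides emit() and done().
var emitted = [];
var doneCalls = 0;
function emit(table, rows) { emitted.push({ table: table, rows: rows }); }
function done() { doneCalls += 1; }

// Fake asynchronous fetch: callbacks are parked until flushPending() runs them.
var pendingCallbacks = [];
function fakeGet(url, callback) {
  pendingCallbacks.push(function () { callback({ url: url, price: "9.99" }); });
}
function flushPending() {
  while (pendingCallbacks.length > 0) { pendingCallbacks.shift()(); }
}

var steps = {};
steps.prices = function () {
  var urls = ["http://example.com/price/1", "http://example.com/price/2"];
  var remaining = urls.length;
  var items = [];
  urls.forEach(function (url) {
    fakeGet(url, function (resp) {
      items.push({ price: resp.price });
      remaining -= 1;
      // Only the last callback to return finishes the step.
      if (remaining === 0) {
        emit("Products", items);
        done();
      }
    });
  });
  // No done() here: the requests above have not returned yet.
};

steps.prices();
console.log(doneCalls); // 0, nothing has returned yet
flushPending();
console.log(doneCalls); // 1
```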
Too many emits within a step
Generating many emits from a single step can hurt the robot’s performance and put unnecessary load on the Web Robots backend.
Bad
    steps.scrapeProducts = function() {
        $(".product").each(function(i,v) {
            var product = {
                name : $(".title", v).text(),
                price : $(".price", v).text()
            };
            emit("Products", [product]); // one emit per product
        });
        done();
    };
Solution – declare an array where you will accumulate all data rows. Then emit the whole array just before finishing the step with done().
Good
    steps.scrapeProducts = function() {
        var products = [];
        $(".product").each(function(i,v) {
            var product = {
                name : $(".title", v).text(),
                price : $(".price", v).text()
            };
            products.push(product); // accumulate instead of emitting
        });
        emit("Products", products); // single emit for the whole step
        done();
    };
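The effect of batching can be checked with a small counting stub. In this sketch emit is a hypothetical replacement that only counts calls and rows, and the scraped array is made-up sample data standing in for the ".product" elements.

```javascript
// Hypothetical stub: counts emit() calls instead of sending rows to a backend.
var emitCalls = 0;
var rowsSent = 0;
function emit(table, rows) {
  emitCalls += 1;
  rowsSent += rows.length;
}

// Made-up sample data standing in for the scraped ".product" elements.
var scraped = [
  { title: "Lamp", price: "19.00" },
  { title: "Desk", price: "120.00" },
  { title: "Chair", price: "45.50" }
];

// Build every row first, then hand the whole array to emit() once.
var products = scraped.map(function (p) {
  return { name: p.title, price: p.price };
});
emit("Products", products);

console.log(emitCalls, rowsSent); // 1 3
```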
Code is placed outside any given step
Code placed outside a step is executed on every single page that the robot opens. So instructions for retries, robots.txt, or skipping visited links that sit outside a step are re-executed on every single page load.
Bad
    setRetries(25000,10,500);
    setSettings({ skipVisited : true, respectRobotsTxt : true});
    console.log("This code will execute on any given step");

    steps.start = function() {
        // start robot work, etc.
        done();
    };
Solution – place settings instructions in a step that is executed only once during the robot run, usually the start step.
Good
    steps.start = function() {
        setRetries(25000,10,500);
        setSettings({ skipVisited : true, respectRobotsTxt : true});
        // start robot work, etc.
        done();
    };
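The difference can be made visible with a rough simulation. It assumes, as described above, that the whole robot script is evaluated again on each page load before that page's step runs. The simulateRun function, the counting setRetries stub, and the step named page are hypothetical and exist only for this sketch.

```javascript
// Hypothetical stub that only counts how often the setting is applied.
var setRetriesCalls = 0;
function setRetries() { setRetriesCalls += 1; }

// Assumption from the text: the whole script is evaluated again on each page load,
// and then the step scheduled for that page runs.
function simulateRun(script, stepPerPage) {
  setRetriesCalls = 0;
  stepPerPage.forEach(function (stepName) {
    var steps = {};
    script(steps);
    steps[stepName]();
  });
  return setRetriesCalls;
}

function badScript(steps) {
  setRetries(25000, 10, 500); // outside any step
  steps.start = function () {};
  steps.page = function () {};
}

function goodScript(steps) {
  steps.start = function () { setRetries(25000, 10, 500); };
  steps.page = function () {};
}

var pages = ["start", "page", "page", "page"];
var badCount = simulateRun(badScript, pages);
var goodCount = simulateRun(goodScript, pages);
console.log(badCount, goodCount); // 4 1
```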
Placing a fork() in multiple steps
A robot can fork into separate forked robots only once. Trying to fork repeatedly will cause unexpected behavior.
Bad
    steps.start = function() {
        $("a.main-menu").each(function(i,v) {
            fork(v.href, "submenu"); // first forking place
        });
        done();
    };

    steps.submenu = function() {
        $("a.sub-menu").each(function(i,v) {
            fork(v.href, "scrapeProducts"); // second forking place
        });
        done();
    };
Solution – ensure that there is only one forking place in the robot’s execution.
Good
    steps.start = function() {
        $("a.main-menu").each(function(i,v) {
            next(v.href, "submenu"); // plain next(), no fork here
        });
        done();
    };

    steps.submenu = function() {
        $("a.sub-menu").each(function(i,v) {
            fork(v.href, "scrapeProducts"); // the only forking place
        });
        done();
    };
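A quick way to catch this mistake before running a robot is to scan the step sources for fork() calls. The findForkingSteps helper below is hypothetical and not part of the robot API. It is a crude text check based on Function.prototype.toString, so it would also match fork( inside a comment or string, and it cannot tell how often a step actually runs.

```javascript
// Hypothetical helper, not part of the robot API: lists steps whose source calls fork().
function findForkingSteps(steps) {
  return Object.keys(steps).filter(function (name) {
    return /\bfork\s*\(/.test(steps[name].toString());
  });
}

// The steps are only inspected as text here, never executed,
// so $, next, fork and done do not need to exist in this sketch.
var steps = {};
steps.start = function () {
  $("a.main-menu").each(function (i, v) { next(v.href, "submenu"); });
  done();
};
steps.submenu = function () {
  $("a.sub-menu").each(function (i, v) { fork(v.href, "scrapeProducts"); });
  done();
};

var forking = findForkingSteps(steps);
console.log(forking); // [ 'submenu' ]
if (forking.length > 1) {
  console.warn("fork() appears in more than one step: " + forking.join(", "));
}
```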