Most Common Robot Writing Mistakes

We compiled a list of the mistakes we see most often in robot code. It is based on our own experience and on the issues we encounter while supporting our customers. The list should help robot developers avoid common coding pitfalls and follow best practices.

Multiple done() statements

Sometimes developers leave multiple done() calls inside a step. This causes unexpected robot behavior because done() signals to the controlling mechanism that the step is finished and that the next step can be started from the queue.

Bad

[javascript highlight="4"] steps.start = function() {
$("a.navlink").each(function(i,v) {
next(v.href, "drillMenu");
done();
});
done();
};[/javascript]

Solution – ensure there is always just one done() that can fire inside a step.

Good

[javascript] steps.start = function() {
$("a.navlink").each(function(i,v) {
next(v.href, "drillMenu");
});
done();
};[/javascript]
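
One way to see the difference is to run both versions of the step against a stubbed runtime and count how often done() fires. The sketch below is an illustration only: `links`, `callCounts`, `countDoneCalls`, and the stub `next()` and `done()` functions are hypothetical stand-ins for the page and the Web Robots runtime, and a plain array loop replaces the jQuery selector.

```javascript
// Hypothetical stand-ins for the robot runtime. The real next() and done()
// are provided by Web Robots and are not reimplemented here.
var callCounts = { next: 0, done: 0 };
function next(url, stepName) { callCounts.next += 1; }
function done() { callCounts.done += 1; }

// Hypothetical list of link URLs, standing in for $("a.navlink").
var links = ["/menu/a", "/menu/b", "/menu/c"];

function badStart() {
  links.forEach(function (href) {
    next(href, "drillMenu");
    done(); // fires once per link
  });
  done();   // and once more at the end
}

function goodStart() {
  links.forEach(function (href) {
    next(href, "drillMenu");
  });
  done();   // fires exactly once
}

// Reset the counters, run a step, and report how many times done() fired.
function countDoneCalls(step) {
  callCounts.next = 0;
  callCounts.done = 0;
  step();
  return callCounts.done;
}

var badCount = countDoneCalls(badStart);   // one call per link plus the final call
var goodCount = countDoneCalls(goodStart); // a single call
console.log("bad:", badCount, "good:", goodCount);
```

With three links, the bad version signals completion four times while the good version signals it once, which is the property the solution asks for.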

done() does not wait for asynchronous code to finish

A step should have done() placed so it is executed only when all work within a step is finished. Sometimes done() is placed where it executes before asynchronous code can finish, although visually it looks like done() is at the end of a step. Typical scenarios where this mistake occurs are when steps.waitFor or jQuery Ajax are present.

Bad

[javascript highlight="8,18"] steps.one = function() {
steps.waitFor("#product_price").then(function() {
var item = {
price : $("#product_price").text()
}
emit("Products", [item]);
});
done();
};

steps.two = function() {
$.get("http://example.com/price").done(function(resp) {
var item = {
price : $("#product_price", resp).text()
}
emit("Products", [item]);
});
done();
};[/javascript]

Solution – place done() where it executes only after all necessary work within the step is completed.

Good

[javascript] steps.one = function() {
steps.waitFor("#product_price").then(function() {
var item = {
price : $("#product_price").text()
}
emit("Products", [item]);
done();
});
};

steps.two = function() {
$.get("http://example.com/price").done(function(resp) {
var item = {
price : $("#product_price", resp).text()
}
emit("Products", [item]);
done();
});
};
[/javascript]
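
When a step starts several asynchronous operations, the same rule applies: done() should run only after the last callback has finished. The following is a minimal sketch of one way to do that with a counter of outstanding callbacks. It assumes nothing about the real runtime: `fetchPrice`, `queuedCallbacks`, `scrapePrices`, and the stub `emit()` and `done()` are hypothetical helpers that only simulate asynchronous behavior so the example can run on its own.

```javascript
// Hypothetical stand-ins so the sketch runs without the robot runtime.
var log = [];
var queuedCallbacks = []; // callbacks that would fire "later" on a real page
function emit(table, rows) { log.push("emit " + table + " (" + rows.length + " rows)"); }
function done() { log.push("done"); }

// Hypothetical asynchronous fetch: it only queues the callback and returns.
function fetchPrice(url, callback) {
  queuedCallbacks.push(function () { callback("price from " + url); });
}

function scrapePrices(urls) {
  var rows = [];
  var pending = urls.length;
  if (pending === 0) { done(); return; } // nothing to wait for
  urls.forEach(function (url) {
    fetchPrice(url, function (price) {
      rows.push({ url: url, price: price });
      pending -= 1;
      if (pending === 0) { // the last outstanding callback has finished
        emit("Products", rows);
        done();
      }
    });
  });
}

scrapePrices(["/price/1", "/price/2"]);
var logBeforeCallbacks = log.slice(); // still empty: no callback has run yet
queuedCallbacks.forEach(function (run) { run(); }); // simulate responses arriving
console.log(log);
```

The counter keeps done() inside the asynchronous callbacks, so it cannot run before the data has been emitted, and it still fires only once.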

Too many emits within a step

Generating many emits from a single step can hurt the robot’s performance and put unnecessary load on the Web Robots backend.

Bad

[javascript highlight="7"] steps.scrapeProducts = function() {
$(".product").each(function(i,v) {
var product = {
name : $(".title", v).text(),
price : $(".price", v).text()
}
emit("Products", [product]);
});
done();
}[/javascript]

Solution – declare an array where you will accumulate all data rows. Then emit the whole array just before finishing the step with done().

Good

[javascript] steps.scrapeProducts = function() {
var products = [];
$(".product").each(function(i,v) {
var product = {
name : $(".title", v).text(),
price : $(".price", v).text()
}
products.push(product);
});
emit("Products", products);
done();
}[/javascript]
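
The accumulate-then-emit pattern can also be written with a map over the scraped items. This is a small sketch under stated assumptions: the `scraped` array and the recording `emit()` stub are hypothetical and exist only so the example runs outside a robot.

```javascript
// Hypothetical stand-in for the runtime's emit(); it only records each call.
var emitCalls = [];
function emit(table, rows) { emitCalls.push({ table: table, rows: rows }); }

// Hypothetical scraped values, standing in for the $(".product") elements.
var scraped = [
  { title: "Kettle", price: "19.99" },
  { title: "Toaster", price: "24.50" },
  { title: "Blender", price: "39.00" }
];

// Build every row first, then hand the whole array to emit() once.
var products = scraped.map(function (p) {
  return { name: p.title, price: p.price };
});
emit("Products", products);

console.log(emitCalls.length, emitCalls[0].rows.length);
```

In a robot, jQuery's `$(".product").map(function(i, v) { ... }).get()` builds the same kind of array directly from the matched elements.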

Code is placed outside any given step

Any code placed outside a step is executed on every single page that the robot opens. So instructions for retries, robots.txt handling, or skipping visited links that sit outside a step are re-applied on every page load.

Bad

[javascript highlight="1,2"] setRetries(25000,10,500);
setSettings({ skipVisited : true, respectRobotsTxt : true});
console.log("This code will execute on any given step");

steps.start = function() {
// start robot work, etc.
done();
}[/javascript]

Solution – place settings instructions in a step that is executed only once during the robot run, usually the start step.

Good

[javascript] steps.start = function() {
setRetries(25000,10,500);
setSettings({ skipVisited : true, respectRobotsTxt : true});
// start robot work, etc.
done();
}[/javascript]

Placing a fork() in multiple steps

A robot can fork into separate forked robots only once. Trying to fork repeatedly will cause unexpected behavior.

Bad

[javascript highlight="3,10"] steps.start = function() {
$("a.main-menu").each(function(i,v) {
fork(v.href, "submenu");
});
done();
}

steps.submenu = function() {
$("a.sub-menu").each(function(i,v) {
fork(v.href, "scrapeProducts");
});
done();
}[/javascript]

Solution – ensure that fork() is called from only one step in the robot’s execution.

Good

[javascript] steps.start = function() {
$("a.main-menu").each(function(i,v) {
next(v.href, "submenu");
});
done();
}

steps.submenu = function() {
$("a.sub-menu").each(function(i,v) {
fork(v.href, "scrapeProducts");
});
done();
}[/javascript]