Most Common Robot Writing Mistakes

Most Common Robot Writing Mistakes 2017-01-09T16:20:30+00:00

We compiled a list of the most frequent mistakes in robot writing, based on our own experience and on what we encounter when supporting our customers. It should help robot developers avoid coding pitfalls and steer towards best practices.

Multiple done() statements

Sometimes developers leave multiple done() calls inside a step. This causes unexpected robot behavior because done() signals to the controlling mechanism that the step is finished and that the next step can be started from the queue.

Bad

steps.start = function() {
    $("a.navlink").each(function(i,v) {
        next(v.href, "drillMenu");
        done();
    });
    done();
};

Solution – ensure there is always just one done() that can fire inside a step.

Good

steps.start = function() {
    $("a.navlink").each(function(i,v) {
        next(v.href, "drillMenu");
    });
    done();
};
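
The same mistake can hide in conditional branches, where an early exit and the end of the step each call done(). The following is a minimal sketch that runs in plain Node.js rather than in the robot runtime: done() and next() are replaced by hypothetical stubs, the links array stands in for the jQuery selection, and countDone is an illustrative helper, none of which are part of the Web Robots API.

```javascript
// Hypothetical stand-ins for the robot runtime, so the sketch runs in plain Node.js.
var doneCalls = 0;
var queued = [];
function done() { doneCalls += 1; }
function next(url, step) { queued.push({ url: url, step: step }); }

var steps = {};

// Bad: done() fires once per link and once more at the end.
steps.bad = function (links) {
    links.forEach(function (href) {
        next(href, "drillMenu");
        done();
    });
    done();
};

// Good: every path through the step reaches exactly one done().
steps.good = function (links) {
    if (links.length === 0) {
        done();   // nothing to queue, finish the step
        return;   // return so the done() below cannot also fire
    }
    links.forEach(function (href) {
        next(href, "drillMenu");
    });
    done();
};

// Illustrative helper: run a step and report how many times done() fired.
function countDone(step, links) {
    doneCalls = 0;
    step(links);
    return doneCalls;
}

var links = ["http://example.com/a", "http://example.com/b"];
console.log(countDone(steps.bad, links));  // 3
console.log(countDone(steps.good, links)); // 1
console.log(countDone(steps.good, []));    // 1
```

The early return in the good step is what keeps the two done() calls mutually exclusive.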

done() does not wait for asynchronous code to finish

A step should have done() placed so that it is executed only when all work within the step is finished. Sometimes done() is placed where it executes before asynchronous code can finish, even though visually it looks like done() is at the end of the step. This mistake typically occurs when a step uses steps.waitFor or jQuery Ajax.

Bad

steps.one = function() {
    steps.waitFor("#product_price").then(function() {
        var item = {
            price : $("#product_price").text()
        }
        emit("Products", [item]);
    });
    done();
};

steps.two = function() {
    $.get("http://example.com/price").done(function(resp) {
        var item = {
            price : $("#product_price", resp).text()
        }
        emit("Products", [item]);
    });
    done();
};

Solution – place done() where it executes only after all necessary work within the step has completed.

Good

steps.one = function() {
    steps.waitFor("#product_price").then(function() {
        var item = {
            price : $("#product_price").text()
        }
        emit("Products", [item]);
        done();
    });
};

steps.two = function() {
    $.get("http://example.com/price").done(function(resp) {
        var item = {
            price : $("#product_price", resp).text()
        }
        emit("Products", [item]);
        done();
    });
};
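
When a step starts several asynchronous calls, done() has to wait for the last of them. The sketch below shows one way to do that with a simple counter. It is an illustration under assumptions, not the platform's prescribed method: fakeGet and flushResponses are hypothetical helpers that imitate delayed responses so the example runs in plain Node.js, and done() and emit() are stubs that only record that they were called.

```javascript
// Hypothetical stand-ins for the robot runtime and for an asynchronous request.
var log = [];
var pending = [];                          // callbacks waiting for their "response"
function done() { log.push("done"); }
function emit(table, rows) { log.push("emit " + table + " x" + rows.length); }
function fakeGet(url, callback) {          // imitates $.get(url).done(callback)
    pending.push(function () { callback({ url: url, price: "9.99" }); });
}
function flushResponses() {                // simulate all responses arriving later
    while (pending.length > 0) { pending.shift()(); }
}

var steps = {};

// Finish the step only after the last of several asynchronous calls returns.
steps.prices = function (urls) {
    var items = [];
    var remaining = urls.length;
    if (remaining === 0) { done(); return; }
    urls.forEach(function (url) {
        fakeGet(url, function (resp) {
            items.push({ price: resp.price });
            remaining -= 1;
            if (remaining === 0) {         // last response: all work is finished
                emit("Products", items);
                done();
            }
        });
    });
    // No done() here: the requests above have not completed yet.
};

steps.prices(["http://example.com/price/1", "http://example.com/price/2"]);
console.log(log.length);       // 0, nothing has finished yet
flushResponses();
console.log(log.join(", "));   // emit Products x2, done
```

With real jQuery requests the same counting logic would sit inside each request's callback; the counter guarantees that done() fires exactly once, after the final response.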

Too many emits within a step

Calling emit() many times from a single step can degrade the robot’s performance and put unnecessary load on the Web Robots backend.

Bad

steps.scrapeProducts = function() {
    $(".product").each(function(i,v) {
        var product = {
            name : $(".title", v).text(),
            price : $(".price", v).text()
        }
        emit("Products", [product]);
    });
    done();
}

Solution – declare an array where you will accumulate all data rows. Then emit the whole array just before finishing the step with done().

Good

steps.scrapeProducts = function() {
    var products = [];
    $(".product").each(function(i,v) {
        var product = {
            name : $(".title", v).text(),
            price : $(".price", v).text()
        }
        products.push(product);
    });
    emit("Products", products);
    done();
}
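
To make the difference concrete, the sketch below counts emit() calls for both approaches. It runs outside the robot runtime: emit() and done() are hypothetical stubs, and the scraped array is made-up sample data standing in for the $(".product") selection.

```javascript
// Hypothetical stubs: emit() only counts calls and rows, done() does nothing.
var emitCalls = 0;
var rowsSent = 0;
function emit(table, rows) { emitCalls += 1; rowsSent += rows.length; }
function done() {}

// Made-up sample rows standing in for elements matched by $(".product").
var scraped = [
    { name: "Product A", price: "1.00" },
    { name: "Product B", price: "2.00" },
    { name: "Product C", price: "3.00" }
];

// One emit() per row.
function emitPerRow(rows) {
    rows.forEach(function (row) {
        emit("Products", [row]);
    });
    done();
}

// Accumulate rows, then emit once before done().
function emitBatched(rows) {
    var products = [];
    rows.forEach(function (row) {
        products.push(row);
    });
    emit("Products", products);
    done();
}

// Illustrative helper: run one variant and report calls made and rows sent.
function measure(variant) {
    emitCalls = 0;
    rowsSent = 0;
    variant(scraped);
    return { calls: emitCalls, rows: rowsSent };
}

var perRow = measure(emitPerRow);
var batched = measure(emitBatched);
console.log(perRow.calls, perRow.rows);    // 3 3
console.log(batched.calls, batched.rows);  // 1 3
```

Both variants deliver the same rows; only the number of emit() calls changes, and in the per-row variant it grows with the number of products on the page.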

Code is placed outside any given step

Code placed outside a step is executed on every single page that the robot opens. So instructions regarding retries, robots.txt, or skipping visited links that are placed outside a step are re-executed on every page load.

Bad

setRetries(25000,10,500);
setSettings({ skipVisited : true, respectRobotsTxt : true});
console.log("This code will execute on any given step");

steps.start = function() {
    // start robot work, etc.
    done();
}

Solution – place settings instructions in a step which will be executed only once during the robot run, usually the start step.

Good

steps.start = function() {
    setRetries(25000,10,500);
    setSettings({ skipVisited : true, respectRobotsTxt : true});
    // start robot work, etc.
    done();
}
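
The effect can be illustrated with a small simulation. The sketch below assumes a simplified model of the behavior described above, in which the whole robot script is evaluated on each page load before that page's step runs. loadPage, badRobot, goodRobot, and settingsCallsFor are hypothetical names invented for this illustration, and setSettings() and done() are stubs, so none of this reflects how the platform is actually implemented.

```javascript
// Hypothetical stubs: setSettings() only counts how often it is called.
var settingsCalls = 0;
function setSettings(options) { settingsCalls += 1; }
function done() {}

// Assumed model: on every page load the whole script is evaluated,
// then the step assigned to that page is run.
function loadPage(robotScript, stepName) {
    var steps = {};
    robotScript(steps);
    steps[stepName]();
}

// Settings placed outside any step.
function badRobot(steps) {
    setSettings({ skipVisited: true, respectRobotsTxt: true });
    steps.start = function () { done(); };
    steps.scrapeProducts = function () { done(); };
}

// Settings placed inside the start step.
function goodRobot(steps) {
    steps.start = function () {
        setSettings({ skipVisited: true, respectRobotsTxt: true });
        done();
    };
    steps.scrapeProducts = function () { done(); };
}

// Illustrative run: one start page followed by two product pages.
function settingsCallsFor(robotScript) {
    settingsCalls = 0;
    loadPage(robotScript, "start");
    loadPage(robotScript, "scrapeProducts");
    loadPage(robotScript, "scrapeProducts");
    return settingsCalls;
}

console.log(settingsCallsFor(badRobot));   // 3
console.log(settingsCallsFor(goodRobot));  // 1
```

Under this model, top-level settings are applied once per page, while settings inside the start step are applied once per run.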

Placing a fork() in multiple steps

A robot can fork into separate forked robots only once. Trying to fork repeatedly will cause unexpected behavior.

Bad

steps.start = function() {
    $("a.main-menu").each(function(i,v) {
        fork(v.href, "submenu"); 
    });
    done();
}

steps.submenu = function() {
    $("a.sub-menu").each(function(i,v) {
        fork(v.href, "scrapeProducts"); 
    });
    done();
}

Solution – ensure that there is only one forking place in the robot’s execution.

Good

steps.start = function() {
    $("a.main-menu").each(function(i,v) {
        next(v.href, "submenu"); 
    });
    done();
}

steps.submenu = function() {
    $("a.sub-menu").each(function(i,v) {
        fork(v.href, "scrapeProducts"); 
    });
    done();
}