changelog – Web Scraping Service

21 Jun, 2017

New IDE Extension Release

By nicerobot|2019-03-04T15:26:12+02:00June 21st, 2017|Web Scraping|2 Comments

Today we are releasing an update to our main extension – Web Robots Scraper IDE. This release has a version number 2017.6.20 and has several improvements in UI, proxy settings control, handling hash symbols in URLs.

Version 2017.6.20 RELEASE NOTES

UI: Robot run statistics is displayed in the same place and no longer “jumping”
UI: when robot finishes it’s status is a direct link to robot run list on portal. Run link is a direct link to data preview and download on portal.
setProxy() functionality has been expanded. See documentation for details.
Bugfix: fixed a bug where subsequent steps with URLs having identical address before # symbol were not loading correctly (Example: http://foobar.com#a and after that go to http://foobar.com#b).
Other internal engine improvements […]

24 Feb, 2017

Scraping Extension Update – version 2017.2.23

By nicerobot|2019-03-04T15:35:16+02:00February 24th, 2017|Web Scraping|0 Comments

Recently we rolled out an updated version of our main web scraping extension which contains several important updates and new features. This update allows our users to develop and debug robots even faster than before. So what exactly is new?

jQuery has been upgraded from version 1.10.2 to 2.2.4
done() now can take a milliseconds delay parameter. For example done(1000); will delay step finish by 1 second.
New tab Selectors which allows testing selectors inline and generates robot code. Selectors are immediately tested on browser’s active tab so developer can see if they work correctly. Copy code button copies Javascript code to clipboard which can be pasted directly into robot’s step.

[…]

29 Sep, 2016

Announcement: Robot Naming Change

By nicerobot|2019-03-04T15:53:34+02:00September 29th, 2016|Web Scraping|0 Comments

Recently we started enforcing that robot names can have only alphanumeric, underscore (_) and dash (-) characters and must be at least 3 characters long. The reason for this move is that robot names are used in generating run_id and later file names. Some nontypical characters in robot names were causing problems when processing files using various ETL tools, storing in file systems. All existing robots were modified by replacing non-compliant characters with underscores.

-Web Robots Team

3 Dec, 2015

New Features

By nicerobot|2019-03-04T16:16:49+02:00December 3rd, 2015|Web Scraping|0 Comments

We are happy to announce some new features in our robot writing framework. These features are:

Fork() – split robot into many parallel robots and run them simultaneously. This feature shortens long scraping jobs by parallelising them. Cloud autoscaling handles necessary instance capacity so our customers can run 100s of instances on-demand.
skipVisited – allows robot to intelligently skip steps to links that were already visited. Avoid data duplication, save robot running time.
respectRobotsTxt – crawl target sources with compliance to their robots.txt file.

These features are explained in detail and examples added to our framework documentation page.