nicerobot

About nicerobot

This author has not yet filled in any details.
So far nicerobot has created 21 blog entries.

PostgreSQL as a Service options comparison and benchmark

Background

PostgresSQL is great. But administering it can suck up a lot of time and for small teams using SaaS service is great value. We use and love Amazon RDS.  Until recently it was the only reasonable choice in the market. But in 2017 new options ore on the verge of becoming available.  Both Google and Azure clouds announced support and Amazon is also launching their Aurora service with PostgreSQL compatibility.

We did a quick comparison of those options.

TLDR:  Google and Aurora are a bit faster for the same money but it’s no free lunch. Azure tests are not yet done.

The Test

Use case we care about is “mid size” database and queries […]

By | 2017-06-27T14:38:39+00:00 June 27th, 2017|Uncategorized|0 Comments

New IDE Extension Release

Today we are releasing an update to our main extension – Web Robots Scraper IDE. This release has a version number 2017.6.20 and has several improvements in UI, proxy settings control, handling hash symbols in URLs.

Version 2017.6.20 RELEASE NOTES

  • UI: Robot run statistics is displayed in the same place and no longer “jumping”
  • UI: when robot finishes it’s status is a direct link to robot run list on portal. Run link is a direct link to data preview and download on portal.
  • setProxy() functionality has been expanded. See documentation for details.
  • Bugfix: fixed a bug where subsequent steps with URLs having identical address before # symbol were not loading correctly (Example: http://foobar.com#a and after that go to http://foobar.com#b).
  • Other internal engine improvements […]
By | 2017-06-27T14:14:12+00:00 June 21st, 2017|Uncategorized|0 Comments

Email And Social Media Links Crawling From Websites

At Web Robots we often get inquiries on projects to crawl social media links and emails from specific list of small websites. Such data is sought after by growth hackers and sales people for lead generation purposes. In this blog post we show an example robot which does exactly that and anyone can run such web scraping project using Web Robots Chrome extension on their own computer.

To start you will need account on Web Robots portal, Chrome extension and thats it. We placed a robot called leads_crawler in our portal’s Demo space so anyone can use it. In case robot’s code is changed below is complete source code for this robot. You […]

By | 2017-03-14T13:20:59+00:00 March 2nd, 2017|Uncategorized|0 Comments

Scraping Extension Update – version 2017.2.23

Recently we rolled out an updated version of our main web scraping extension which contains several important updates and new features. This update allows our users to develop and debug robots even faster than before. So what exactly is new?

  1. jQuery has been upgraded from version 1.10.2 to 2.2.4
  2. done() now can take a milliseconds delay parameter. For example done(1000); will delay step finish by 1 second.
  3. New tab Selectors which allows testing selectors inline and generates robot code. Selectors are immediately tested on browser’s active tab so developer can see if they work correctly. Copy code button copies Javascript code to clipboard which can be pasted directly into robot’s step.
  4. […]

By | 2017-06-27T14:18:45+00:00 February 24th, 2017|Uncategorized|0 Comments

Writing Better Data Collection Robots

At Web Robots we have a fanatical customer support. Large part of this is doing technical support for robot developers. For this we maintain live chatrooms, often do screenshares, joint code writing sessions with each of our customers’ development teams. This helps us solve most of the problems out customers encounter in minutes, difficult ones in several hours. This is not an exaggeration.

Writing web scraping robot

Based on the accumulated experience we were able to identify a list of the most common mistakes that robot writers can make. We published a list with specific code examples that illustrate the mistakes. Then each mistake has an explanation and a code example with solution. This list is highly recommended read for new to […]

By | 2017-01-09T16:20:30+00:00 November 10th, 2016|Uncategorized|0 Comments

Migrating PostgreSQL Databases From AWS RDS To Standalone

Intro

AWS RDS is very convenient and takes care of almost all DBA tasks. It just works as long as you stay inside AWS. But if you want to have a local copy of your database or need to move data to another host it can be tricky.

TL;DR – For our solution skip to part last section

What doesn’t work

AWS daily backups

AWS RDS by default creates daily backups of your data. First thought would be get such backup and restore it locally. But those are not regular Postgres backups. They probably are VM image copies, but no way to know, as you cannot copy or see them.  The only option is to restore them to another RDS instance.

Postgresql replication

AWS uses replication to maintain […]

By | 2017-01-09T16:20:30+00:00 October 14th, 2016|Uncategorized|0 Comments

Announcement: Robot Naming Change

Recently we started enforcing that robot names can have only alphanumeric, underscore (_) and dash (-) characters and must be at least 3 characters long. The reason for this move is that robot names are used in generating run_id and later file names. Some nontypical characters in robot names were causing problems when processing files using various ETL tools, storing in file systems. All existing robots were modified by replacing non-compliant characters with underscores.

-Web Robots Team

By | 2016-09-29T11:45:47+00:00 September 29th, 2016|Uncategorized|0 Comments

Scrape Instagram Followers

Our platform is often used by growth hackers for lead generation in social media networks. One such use case is building a list of Instagram followers from interestingprofiles. Today we placed one such robot into our portal‘s demo space for anyone to use. Robot is only 30 lines of Javascript code and works quite fast. We tested it with IBM’s Instagram which has 78k followers and it took only 14 minutes to scrape them.

instagram_robot

How to use this robot:

  1. Login to Web Robots portal on Chrome browser.
  2. Make sure you have Web Robots Chrome extension to run the robot.
  3. Open robot instagram_followers in our extension.
  4. Make sure you are logged in on […]
By | 2017-03-27T16:25:50+00:00 June 8th, 2016|Uncategorized|20 Comments

Scraping Yelp Data

We get a lot of requests to scrape data from Yelp. These requests come in on a daily basis, sometimes several times a day. At the same time we have not seen a good business case for a commercial project with scraping Yelp.

We have decided to release a simple example Yelp robot which anyone can run on Chrome inside your computer, tune to your own requirements and collect some data. With this robot you can save business contact information like address, postal code, telephone numbers, website addresses etc.  Robot is placed in our Demo space on Web Robots portal for anyone to use, just sign up, find the robot and use it.

Screen Shot 2016-03-01 at 3.22.41 PM

By | 2017-01-09T16:20:31+00:00 March 1st, 2016|Uncategorized|0 Comments

New Dataset – UK LPA Search

We are excited to announce UK LPA Search – it is a search engine for all UK’s local planning authorities. Until now there was no possibility to search LPA databases from one place. One had to find each LPA’s website and search inside it. Considering there are few hundred of them – this would not be an easy task for a human. Our robots have no problems indexing all databases and providing them as a single dataset.

A bonus point – we geocoded all requests and display them on a map. Therefore anyone can see what building permits are being issues around them. Example: Map of building permits in London […]

By | 2017-06-27T14:20:18+00:00 January 27th, 2016|Uncategorized|0 Comments