Analysis of different web scraping approaches, best practices and challenges. Overview of Web Robots scraper features.
Recently we started enforcing that robot names can have only alphanumeric, underscore (_) and dash (-) characters and must be at least 3 characters long. The reason for this move is that robot names are used in generating run_id and later file names. Some nontypical characters in robot names were causing problems when processing files using various ETL tools, storing in file systems. All existing robots were modified by replacing non-compliant characters with underscores.
-Web Robots Team