<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>nicerobot &#8211; Web Scraping Service</title>
	<atom:link href="https://webrobots.io/author/nicerobot/feed/" rel="self" type="application/rss+xml" />
	<link>https://webrobots.io</link>
	<description>We do web scraping service better!</description>
	<lastBuildDate>Wed, 15 Feb 2023 08:39:14 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>
	hourly	</sy:updatePeriod>
	<sy:updateFrequency>
	1	</sy:updateFrequency>
	<generator>https://wordpress.org/?v=6.5.8</generator>
	<item>
		<title>New Functions Added</title>
		<link>https://webrobots.io/new-functions-added/</link>
					<comments>https://webrobots.io/new-functions-added/#respond</comments>
		
		<dc:creator><![CDATA[nicerobot]]></dc:creator>
		<pubDate>Wed, 15 Feb 2023 08:39:14 +0000</pubDate>
				<category><![CDATA[Web Scraping]]></category>
		<guid isPermaLink="false">https://webrobots.io/?p=6294</guid>

					<description><![CDATA[Web Robots scraping framework documentation has been updated with new functions: blockImages() - changes browser settings to block image downloading. This function is useful when bandwidth is a concern, and it sometimes results in faster crawling. allowImages() - reverses the browser settings changes made by blockImages(). closeSocket() - closes all idle socket connections in the browser. [...]]]></description>
										<content:encoded><![CDATA[<p><a href="https://webrobots.io/werobots-documentation/">Web Robots scraping framework documentation</a> has been updated with new functions:</p>
<ul>
<li><strong>blockImages()</strong> &#8211; changes browser settings to block image downloading. This function is useful when bandwidth is a concern, and it sometimes results in faster crawling.</li>
<li><strong>allowImages()</strong> &#8211; reverses the browser settings changes made by blockImages().</li>
<li><strong>closeSocket()</strong> &#8211; closes all idle socket connections in the browser.</li>
</ul>
<p>Web Robots has been using these functions on our internal platform for over 6 months, and they have proved to be a great help in some scenarios. Now they are available in our public extension.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://webrobots.io/new-functions-added/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
		<item>
		<title>Instant Data Scraper v1.0.7 released</title>
		<link>https://webrobots.io/instant-data-scraper-v1-0-7-released/</link>
					<comments>https://webrobots.io/instant-data-scraper-v1-0-7-released/#respond</comments>
		
		<dc:creator><![CDATA[nicerobot]]></dc:creator>
		<pubDate>Thu, 26 May 2022 12:40:34 +0000</pubDate>
				<category><![CDATA[Uncategorized]]></category>
		<guid isPermaLink="false">https://webrobots.io/?p=6235</guid>

					<description><![CDATA[Today we are releasing an update to our Instant Data Scraper. Version 1.0.7 has the following improvements: Performance improvement for websites with a large HTML structure (Google Maps, for example). Improved "Next" page button behaviour. Migrated to manifest version 3, as manifest version 2 will be phased out by Google soon. We are also working [...]]]></description>
										<content:encoded><![CDATA[<div class="fusion-fullwidth fullwidth-box fusion-builder-row-1 nonhundred-percent-fullwidth non-hundred-percent-height-scrolling"  style='background-color: #ffffff;background-position: center center;background-repeat: no-repeat;padding-top:0px;padding-right:0px;padding-bottom:0px;padding-left:0px;border-top-width:0px;border-bottom-width:0px;border-color:#eae9e9;border-top-style:solid;border-bottom-style:solid;'><div class="fusion-builder-row fusion-row "><div  class="fusion-layout-column fusion_builder_column fusion_builder_column_1_1 fusion-builder-column-0 fusion-one-full fusion-column-first fusion-column-last 1_1"  style='margin-top:0px;margin-bottom:20px;'><div class="fusion-column-wrapper" style="padding: 0px 0px 0px 0px;background-position:left top;background-repeat:no-repeat;-webkit-background-size:cover;-moz-background-size:cover;-o-background-size:cover;background-size:cover;"   data-bg-url=""><div class="fusion-text"><p>Today we are releasing an update to our <a href="https://chrome.google.com/webstore/detail/instant-data-scraper/ofaokhiedipichpaobibbnahnkdoiiah">Instant Data Scraper</a>. Version 1.0.7 has the following improvements:</p>
<ul>
<li>
<div dir="auto">Performance improvement for websites with a large HTML structure (Google Maps, for example).</div>
</li>
<li>
<div dir="auto">Improved &#8220;Next&#8221; page button behaviour.</div>
</li>
<li>
<div dir="auto">Migrated to manifest version 3, as manifest version 2 will be phased out by Google soon.</div>
</li>
</ul>
<p>We are also working on some other features for Instant Data; they will be released in the near future!</p>
</div><div class="fusion-clearfix"></div></div></div></div></div><style type="text/css">.fusion-fullwidth.fusion-builder-row-1 a:not(.fusion-button):not(.fusion-builder-module-control):not(.fusion-social-network-icon):not(.fb-icon-element):not(.fusion-countdown-link):not(.fusion-rollover-link):not(.fusion-rollover-gallery):not(.fusion-button-bar):not(.add_to_cart_button):not(.show_details_button):not(.product_type_external):not(.fusion-quick-view):not(.fusion-rollover-title-link):not(.fusion-breadcrumb-link) , .fusion-fullwidth.fusion-builder-row-1 a:not(.fusion-button):not(.fusion-builder-module-control):not(.fusion-social-network-icon):not(.fb-icon-element):not(.fusion-countdown-link):not(.fusion-rollover-link):not(.fusion-rollover-gallery):not(.fusion-button-bar):not(.add_to_cart_button):not(.show_details_button):not(.product_type_external):not(.fusion-quick-view):not(.fusion-rollover-title-link):not(.fusion-breadcrumb-link):before, .fusion-fullwidth.fusion-builder-row-1 a:not(.fusion-button):not(.fusion-builder-module-control):not(.fusion-social-network-icon):not(.fb-icon-element):not(.fusion-countdown-link):not(.fusion-rollover-link):not(.fusion-rollover-gallery):not(.fusion-button-bar):not(.add_to_cart_button):not(.show_details_button):not(.product_type_external):not(.fusion-quick-view):not(.fusion-rollover-title-link):not(.fusion-breadcrumb-link):after {color: #03a9f4;}.fusion-fullwidth.fusion-builder-row-1 a:not(.fusion-button):not(.fusion-builder-module-control):not(.fusion-social-network-icon):not(.fb-icon-element):not(.fusion-countdown-link):not(.fusion-rollover-link):not(.fusion-rollover-gallery):not(.fusion-button-bar):not(.add_to_cart_button):not(.show_details_button):not(.product_type_external):not(.fusion-quick-view):not(.fusion-rollover-title-link):not(.fusion-breadcrumb-link):hover, .fusion-fullwidth.fusion-builder-row-1 
a:not(.fusion-button):not(.fusion-builder-module-control):not(.fusion-social-network-icon):not(.fb-icon-element):not(.fusion-countdown-link):not(.fusion-rollover-link):not(.fusion-rollover-gallery):not(.fusion-button-bar):not(.add_to_cart_button):not(.show_details_button):not(.product_type_external):not(.fusion-quick-view):not(.fusion-rollover-title-link):not(.fusion-breadcrumb-link):hover:before, .fusion-fullwidth.fusion-builder-row-1 a:not(.fusion-button):not(.fusion-builder-module-control):not(.fusion-social-network-icon):not(.fb-icon-element):not(.fusion-countdown-link):not(.fusion-rollover-link):not(.fusion-rollover-gallery):not(.fusion-button-bar):not(.add_to_cart_button):not(.show_details_button):not(.product_type_external):not(.fusion-quick-view):not(.fusion-rollover-title-link):not(.fusion-breadcrumb-link):hover:after {color: #0074a2;}.fusion-fullwidth.fusion-builder-row-1 .pagination a.inactive:hover, .fusion-fullwidth.fusion-builder-row-1 .fusion-filters .fusion-filter.fusion-active a {border-color: #0074a2;}.fusion-fullwidth.fusion-builder-row-1 .pagination .current {border-color: #0074a2; background-color: #0074a2;}.fusion-fullwidth.fusion-builder-row-1 .fusion-filters .fusion-filter.fusion-active a, .fusion-fullwidth.fusion-builder-row-1 .fusion-date-and-formats .fusion-format-box, .fusion-fullwidth.fusion-builder-row-1 .fusion-popover, .fusion-fullwidth.fusion-builder-row-1 .tooltip-shortcode {color: #0074a2;}#main .fusion-fullwidth.fusion-builder-row-1 .post .blog-shortcode-post-title a:hover {color: #0074a2;}</style>
]]></content:encoded>
					
					<wfw:commentRss>https://webrobots.io/instant-data-scraper-v1-0-7-released/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
		<item>
		<title>Instant Data Scraper is now available on Microsoft Edge</title>
		<link>https://webrobots.io/instant-data-scraper-is-now-available-on-micorosoft-edge/</link>
					<comments>https://webrobots.io/instant-data-scraper-is-now-available-on-micorosoft-edge/#respond</comments>
		
		<dc:creator><![CDATA[nicerobot]]></dc:creator>
		<pubDate>Fri, 18 Dec 2020 06:45:10 +0000</pubDate>
				<category><![CDATA[Uncategorized]]></category>
		<guid isPermaLink="false">https://webrobots.io/?p=6166</guid>

					<description><![CDATA[We received an invitation from Microsoft to publish our Chrome extensions to the Microsoft Edge webstore. The Microsoft Edge browser is Chromium-based, so porting extensions to it should be easy. In fact, it was even easier than expected - Edge's developer dashboard accepted the exact same zip file that we use for the Chrome webstore. The extension just works [...]]]></description>
										<content:encoded><![CDATA[<div class="fusion-fullwidth fullwidth-box fusion-builder-row-2 nonhundred-percent-fullwidth non-hundred-percent-height-scrolling"  style='background-color: #ffffff;background-position: center center;background-repeat: no-repeat;padding-top:0px;padding-right:0px;padding-bottom:0px;padding-left:0px;border-top-width:0px;border-bottom-width:0px;border-color:#eae9e9;border-top-style:solid;border-bottom-style:solid;'><div class="fusion-builder-row fusion-row "><div  class="fusion-layout-column fusion_builder_column fusion_builder_column_1_1 fusion-builder-column-1 fusion-one-full fusion-column-first fusion-column-last 1_1"  style='margin-top:0px;margin-bottom:20px;'><div class="fusion-column-wrapper" style="padding: 0px 0px 0px 0px;background-position:left top;background-repeat:no-repeat;-webkit-background-size:cover;-moz-background-size:cover;-o-background-size:cover;background-size:cover;"   data-bg-url=""><div class="fusion-text"><p>We received an invitation from Microsoft to publish our Chrome extensions to the Microsoft Edge webstore. The Microsoft Edge browser is Chromium-based, so porting extensions to it should be easy. In fact, it was even easier than expected &#8211; Edge&#8217;s developer dashboard accepted the exact same zip file that we use for the Chrome webstore. The extension just works without any changes.</p>
<p>Anyone can download Instant Data for Microsoft Edge <a href="https://microsoftedge.microsoft.com/addons/detail/instant-data-scraper/onnjkaofddpgfmbcnbnfacjacjamelfa">here</a>.</p>
</div><div class="imageframe-align-center"><span style="-webkit-box-shadow: 3px 3px 7px rgba(0,0,0,0.3);box-shadow: 3px 3px 7px rgba(0,0,0,0.3);" class="fusion-imageframe imageframe-dropshadow imageframe-1 hover-type-none"><a class="fusion-no-lightbox" href="https://microsoftedge.microsoft.com/addons/detail/instant-data-scraper/onnjkaofddpgfmbcnbnfacjacjamelfa" target="_blank" aria-label="Instant Data on Microsoft Edge" rel="noopener noreferrer"><img fetchpriority="high" decoding="async" src="https://webrobots.io/wp-content/uploads/2020/12/Instant-Data-on-Microsoft-Edge.png" data-orig-src="https://webrobots.io/wp-content/uploads/2020/12/Instant-Data-on-Microsoft-Edge-1024x791.png" width="1024" height="791" alt="Instant Data on Microsoft Edge" class="lazyload img-responsive wp-image-6167" srcset="data:image/svg+xml,%3Csvg%20xmlns%3D%27http%3A%2F%2Fwww.w3.org%2F2000%2Fsvg%27%20width%3D%271350%27%20height%3D%271043%27%20viewBox%3D%270%200%201350%201043%27%3E%3Crect%20width%3D%271350%27%20height%3D%2731043%27%20fill-opacity%3D%220%22%2F%3E%3C%2Fsvg%3E" data-srcset="https://webrobots.io/wp-content/uploads/2020/12/Instant-Data-on-Microsoft-Edge-200x155.png 200w, https://webrobots.io/wp-content/uploads/2020/12/Instant-Data-on-Microsoft-Edge-400x309.png 400w, https://webrobots.io/wp-content/uploads/2020/12/Instant-Data-on-Microsoft-Edge-600x464.png 600w, https://webrobots.io/wp-content/uploads/2020/12/Instant-Data-on-Microsoft-Edge-800x618.png 800w, https://webrobots.io/wp-content/uploads/2020/12/Instant-Data-on-Microsoft-Edge-1200x927.png 1200w, https://webrobots.io/wp-content/uploads/2020/12/Instant-Data-on-Microsoft-Edge.png 1350w" data-sizes="auto" data-orig-sizes="(max-width: 800px) 100vw, 1024px" /></a></span></div><div class="fusion-clearfix"></div></div></div></div></div><style type="text/css">.fusion-fullwidth.fusion-builder-row-2 
a:not(.fusion-button):not(.fusion-builder-module-control):not(.fusion-social-network-icon):not(.fb-icon-element):not(.fusion-countdown-link):not(.fusion-rollover-link):not(.fusion-rollover-gallery):not(.fusion-button-bar):not(.add_to_cart_button):not(.show_details_button):not(.product_type_external):not(.fusion-quick-view):not(.fusion-rollover-title-link):not(.fusion-breadcrumb-link) , .fusion-fullwidth.fusion-builder-row-2 a:not(.fusion-button):not(.fusion-builder-module-control):not(.fusion-social-network-icon):not(.fb-icon-element):not(.fusion-countdown-link):not(.fusion-rollover-link):not(.fusion-rollover-gallery):not(.fusion-button-bar):not(.add_to_cart_button):not(.show_details_button):not(.product_type_external):not(.fusion-quick-view):not(.fusion-rollover-title-link):not(.fusion-breadcrumb-link):before, .fusion-fullwidth.fusion-builder-row-2 a:not(.fusion-button):not(.fusion-builder-module-control):not(.fusion-social-network-icon):not(.fb-icon-element):not(.fusion-countdown-link):not(.fusion-rollover-link):not(.fusion-rollover-gallery):not(.fusion-button-bar):not(.add_to_cart_button):not(.show_details_button):not(.product_type_external):not(.fusion-quick-view):not(.fusion-rollover-title-link):not(.fusion-breadcrumb-link):after {color: #03a9f4;}.fusion-fullwidth.fusion-builder-row-2 a:not(.fusion-button):not(.fusion-builder-module-control):not(.fusion-social-network-icon):not(.fb-icon-element):not(.fusion-countdown-link):not(.fusion-rollover-link):not(.fusion-rollover-gallery):not(.fusion-button-bar):not(.add_to_cart_button):not(.show_details_button):not(.product_type_external):not(.fusion-quick-view):not(.fusion-rollover-title-link):not(.fusion-breadcrumb-link):hover, .fusion-fullwidth.fusion-builder-row-2 
a:not(.fusion-button):not(.fusion-builder-module-control):not(.fusion-social-network-icon):not(.fb-icon-element):not(.fusion-countdown-link):not(.fusion-rollover-link):not(.fusion-rollover-gallery):not(.fusion-button-bar):not(.add_to_cart_button):not(.show_details_button):not(.product_type_external):not(.fusion-quick-view):not(.fusion-rollover-title-link):not(.fusion-breadcrumb-link):hover:before, .fusion-fullwidth.fusion-builder-row-2 a:not(.fusion-button):not(.fusion-builder-module-control):not(.fusion-social-network-icon):not(.fb-icon-element):not(.fusion-countdown-link):not(.fusion-rollover-link):not(.fusion-rollover-gallery):not(.fusion-button-bar):not(.add_to_cart_button):not(.show_details_button):not(.product_type_external):not(.fusion-quick-view):not(.fusion-rollover-title-link):not(.fusion-breadcrumb-link):hover:after {color: #0074a2;}.fusion-fullwidth.fusion-builder-row-2 .pagination a.inactive:hover, .fusion-fullwidth.fusion-builder-row-2 .fusion-filters .fusion-filter.fusion-active a {border-color: #0074a2;}.fusion-fullwidth.fusion-builder-row-2 .pagination .current {border-color: #0074a2; background-color: #0074a2;}.fusion-fullwidth.fusion-builder-row-2 .fusion-filters .fusion-filter.fusion-active a, .fusion-fullwidth.fusion-builder-row-2 .fusion-date-and-formats .fusion-format-box, .fusion-fullwidth.fusion-builder-row-2 .fusion-popover, .fusion-fullwidth.fusion-builder-row-2 .tooltip-shortcode {color: #0074a2;}#main .fusion-fullwidth.fusion-builder-row-2 .post .blog-shortcode-post-title a:hover {color: #0074a2;}</style>
]]></content:encoded>
					
					<wfw:commentRss>https://webrobots.io/instant-data-scraper-is-now-available-on-micorosoft-edge/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
		<item>
		<title>Our Chrome extension has been updated</title>
		<link>https://webrobots.io/our-chrome-extension-has-been-updated/</link>
					<comments>https://webrobots.io/our-chrome-extension-has-been-updated/#respond</comments>
		
		<dc:creator><![CDATA[nicerobot]]></dc:creator>
		<pubDate>Thu, 01 Oct 2020 12:24:45 +0000</pubDate>
				<category><![CDATA[Web Scraping]]></category>
		<category><![CDATA[web scraping service]]></category>
		<guid isPermaLink="false">https://webrobots.io/?p=6143</guid>

					<description><![CDATA[Our public developer extension (IDE) has been untouched since March 2019. It may look like Web Robots was stagnating, but we were actually working constantly on our internal systems such as the portal, cloud workers, and cloud orchestration. We also had several internal releases of the IDE for our staff. So the IDE published to the Chrome webstore [...]]]></description>
										<content:encoded><![CDATA[<div class="fusion-fullwidth fullwidth-box fusion-builder-row-3 nonhundred-percent-fullwidth non-hundred-percent-height-scrolling"  style='background-color: #ffffff;background-position: center center;background-repeat: no-repeat;padding-top:0px;padding-right:0px;padding-bottom:0px;padding-left:0px;border-top-width:0px;border-bottom-width:0px;border-color:#eae9e9;border-top-style:solid;border-bottom-style:solid;'><div class="fusion-builder-row fusion-row "><div  class="fusion-layout-column fusion_builder_column fusion_builder_column_1_1 fusion-builder-column-2 fusion-one-full fusion-column-first fusion-column-last 1_1"  style='margin-top:0px;margin-bottom:20px;'><div class="fusion-column-wrapper" style="padding: 0px 0px 0px 0px;background-position:left top;background-repeat:no-repeat;-webkit-background-size:cover;-moz-background-size:cover;-o-background-size:cover;background-size:cover;"   data-bg-url=""><div class="fusion-text"><p>Our public developer extension (IDE) has been untouched since March 2019. It may look like Web Robots was stagnating, but we were actually working constantly on our internal systems such as the portal, cloud workers, and cloud orchestration. We also had several internal releases of the IDE for our staff.</p>
<p>So the IDE published to the Chrome webstore is just the tip of the iceberg.</p>
<p>It is now September 2020, and the time has come to release the <a href="https://chrome.google.com/webstore/detail/web-robots-scraper/pmagfjeddlknbohojnepcplpgjlincak?hl=en">new version to the Chrome webstore</a>. We are glad that our extension passed the webstore&#8217;s permission audit on the first attempt: it requires access to quite a few Chrome APIs in order to work properly, and people at Google are getting ever stricter in their review process for extension permissions.</p>
<p>All changes and new features are listed in the <a href="https://webrobots.io/changelog/">changelog here</a>.</p>
<p><em>PS: our IDE extension works only for users who have an approved account on the Web Robots portal.</em></p>
</div><style type="text/css">.fusion-gallery-1 .fusion-gallery-image {border:0px solid #f6f6f6;}</style><div class="fusion-gallery fusion-gallery-container fusion-grid-3 fusion-columns-total-0 fusion-gallery-layout-grid fusion-gallery-1" style="margin:-5px;"><div style="padding:5px;" class="fusion-grid-column fusion-gallery-column fusion-gallery-column-3 hover-type-zoomin"><div class="fusion-gallery-image"><a href="https://webrobots.io/wp-content/uploads/2020/10/Screenshot-2020-10-01-at-15.18.32.png" data-title="Web scraping extension" title="Web scraping extension" data-caption="Robot editor and debugger" rel="noreferrer" data-rel="iLightbox[gallery_image_1]" class="fusion-lightbox" target="_self"><img loading="lazy" decoding="async" src="https://webrobots.io/wp-content/uploads/2020/10/Screenshot-2020-10-01-at-15.18.32.png" data-orig-src="https://webrobots.io/wp-content/uploads/2020/10/Screenshot-2020-10-01-at-15.18.32.png" width="716" height="564" alt="Web scraping extension" title="Web scraping extension" aria-label="Web scraping extension" class="lazyload img-responsive wp-image-6144" srcset="data:image/svg+xml,%3Csvg%20xmlns%3D%27http%3A%2F%2Fwww.w3.org%2F2000%2Fsvg%27%20width%3D%27716%27%20height%3D%27564%27%20viewBox%3D%270%200%20716%20564%27%3E%3Crect%20width%3D%27716%27%20height%3D%273564%27%20fill-opacity%3D%220%22%2F%3E%3C%2Fsvg%3E" data-srcset="https://webrobots.io/wp-content/uploads/2020/10/Screenshot-2020-10-01-at-15.18.32-200x158.png 200w, https://webrobots.io/wp-content/uploads/2020/10/Screenshot-2020-10-01-at-15.18.32-400x315.png 400w, https://webrobots.io/wp-content/uploads/2020/10/Screenshot-2020-10-01-at-15.18.32-600x473.png 600w, https://webrobots.io/wp-content/uploads/2020/10/Screenshot-2020-10-01-at-15.18.32.png 716w" data-sizes="auto" data-orig-sizes="(min-width: 2200px) 100vw, (min-width: 784px) 541px, (min-width: 712px) 784px, (min-width: 640px) 712px, " /></a></div></div><div style="padding:5px;" class="fusion-grid-column 
fusion-gallery-column fusion-gallery-column-3 hover-type-zoomin"><div class="fusion-gallery-image"><a href="https://webrobots.io/wp-content/uploads/2020/10/Screenshot-2020-10-01-at-15.18.46.png" data-title="Web scraping extension" title="Web scraping extension" data-caption="Reset Proxy and Allow images rescue buttons." rel="noreferrer" data-rel="iLightbox[gallery_image_1]" class="fusion-lightbox" target="_self"><img loading="lazy" decoding="async" src="https://webrobots.io/wp-content/uploads/2020/10/Screenshot-2020-10-01-at-15.18.46.png" data-orig-src="https://webrobots.io/wp-content/uploads/2020/10/Screenshot-2020-10-01-at-15.18.46.png" width="708" height="562" alt="Web scraping extension" title="Web scraping extension" aria-label="Web scraping extension" class="lazyload img-responsive wp-image-6145" srcset="data:image/svg+xml,%3Csvg%20xmlns%3D%27http%3A%2F%2Fwww.w3.org%2F2000%2Fsvg%27%20width%3D%27708%27%20height%3D%27562%27%20viewBox%3D%270%200%20708%20562%27%3E%3Crect%20width%3D%27708%27%20height%3D%273562%27%20fill-opacity%3D%220%22%2F%3E%3C%2Fsvg%3E" data-srcset="https://webrobots.io/wp-content/uploads/2020/10/Screenshot-2020-10-01-at-15.18.46-200x159.png 200w, https://webrobots.io/wp-content/uploads/2020/10/Screenshot-2020-10-01-at-15.18.46-400x318.png 400w, https://webrobots.io/wp-content/uploads/2020/10/Screenshot-2020-10-01-at-15.18.46-600x476.png 600w, https://webrobots.io/wp-content/uploads/2020/10/Screenshot-2020-10-01-at-15.18.46.png 708w" data-sizes="auto" data-orig-sizes="(min-width: 2200px) 100vw, (min-width: 784px) 541px, (min-width: 712px) 784px, (min-width: 640px) 712px, " /></a></div></div><div style="padding:5px;" class="fusion-grid-column fusion-gallery-column fusion-gallery-column-3 hover-type-zoomin"><div class="fusion-gallery-image"><a href="https://webrobots.io/wp-content/uploads/2020/10/Screenshot-2020-10-01-at-15.19.27.png" data-title="Web scraping extension" title="Web scraping extension" data-caption="Data preview pane - Excel style 
table and JSON preview." rel="noreferrer" data-rel="iLightbox[gallery_image_1]" class="fusion-lightbox" target="_self"><img loading="lazy" decoding="async" src="https://webrobots.io/wp-content/uploads/2020/10/Screenshot-2020-10-01-at-15.19.27.png" data-orig-src="https://webrobots.io/wp-content/uploads/2020/10/Screenshot-2020-10-01-at-15.19.27.png" width="748" height="586" alt="Web scraping extension" title="Web scraping extension" aria-label="Web scraping extension" class="lazyload img-responsive wp-image-6146" srcset="data:image/svg+xml,%3Csvg%20xmlns%3D%27http%3A%2F%2Fwww.w3.org%2F2000%2Fsvg%27%20width%3D%27748%27%20height%3D%27586%27%20viewBox%3D%270%200%20748%20586%27%3E%3Crect%20width%3D%27748%27%20height%3D%273586%27%20fill-opacity%3D%220%22%2F%3E%3C%2Fsvg%3E" data-srcset="https://webrobots.io/wp-content/uploads/2020/10/Screenshot-2020-10-01-at-15.19.27-200x157.png 200w, https://webrobots.io/wp-content/uploads/2020/10/Screenshot-2020-10-01-at-15.19.27-400x313.png 400w, https://webrobots.io/wp-content/uploads/2020/10/Screenshot-2020-10-01-at-15.19.27-600x470.png 600w, https://webrobots.io/wp-content/uploads/2020/10/Screenshot-2020-10-01-at-15.19.27.png 748w" data-sizes="auto" data-orig-sizes="(min-width: 2200px) 100vw, (min-width: 784px) 541px, (min-width: 712px) 784px, (min-width: 640px) 712px, " /></a></div></div><div class="clearfix"></div></div><div class="fusion-clearfix"></div></div></div></div></div><style type="text/css">.fusion-fullwidth.fusion-builder-row-3 a:not(.fusion-button):not(.fusion-builder-module-control):not(.fusion-social-network-icon):not(.fb-icon-element):not(.fusion-countdown-link):not(.fusion-rollover-link):not(.fusion-rollover-gallery):not(.fusion-button-bar):not(.add_to_cart_button):not(.show_details_button):not(.product_type_external):not(.fusion-quick-view):not(.fusion-rollover-title-link):not(.fusion-breadcrumb-link) , .fusion-fullwidth.fusion-builder-row-3 
a:not(.fusion-button):not(.fusion-builder-module-control):not(.fusion-social-network-icon):not(.fb-icon-element):not(.fusion-countdown-link):not(.fusion-rollover-link):not(.fusion-rollover-gallery):not(.fusion-button-bar):not(.add_to_cart_button):not(.show_details_button):not(.product_type_external):not(.fusion-quick-view):not(.fusion-rollover-title-link):not(.fusion-breadcrumb-link):before, .fusion-fullwidth.fusion-builder-row-3 a:not(.fusion-button):not(.fusion-builder-module-control):not(.fusion-social-network-icon):not(.fb-icon-element):not(.fusion-countdown-link):not(.fusion-rollover-link):not(.fusion-rollover-gallery):not(.fusion-button-bar):not(.add_to_cart_button):not(.show_details_button):not(.product_type_external):not(.fusion-quick-view):not(.fusion-rollover-title-link):not(.fusion-breadcrumb-link):after {color: #03a9f4;}.fusion-fullwidth.fusion-builder-row-3 a:not(.fusion-button):not(.fusion-builder-module-control):not(.fusion-social-network-icon):not(.fb-icon-element):not(.fusion-countdown-link):not(.fusion-rollover-link):not(.fusion-rollover-gallery):not(.fusion-button-bar):not(.add_to_cart_button):not(.show_details_button):not(.product_type_external):not(.fusion-quick-view):not(.fusion-rollover-title-link):not(.fusion-breadcrumb-link):hover, .fusion-fullwidth.fusion-builder-row-3 a:not(.fusion-button):not(.fusion-builder-module-control):not(.fusion-social-network-icon):not(.fb-icon-element):not(.fusion-countdown-link):not(.fusion-rollover-link):not(.fusion-rollover-gallery):not(.fusion-button-bar):not(.add_to_cart_button):not(.show_details_button):not(.product_type_external):not(.fusion-quick-view):not(.fusion-rollover-title-link):not(.fusion-breadcrumb-link):hover:before, .fusion-fullwidth.fusion-builder-row-3 
a:not(.fusion-button):not(.fusion-builder-module-control):not(.fusion-social-network-icon):not(.fb-icon-element):not(.fusion-countdown-link):not(.fusion-rollover-link):not(.fusion-rollover-gallery):not(.fusion-button-bar):not(.add_to_cart_button):not(.show_details_button):not(.product_type_external):not(.fusion-quick-view):not(.fusion-rollover-title-link):not(.fusion-breadcrumb-link):hover:after {color: #0074a2;}.fusion-fullwidth.fusion-builder-row-3 .pagination a.inactive:hover, .fusion-fullwidth.fusion-builder-row-3 .fusion-filters .fusion-filter.fusion-active a {border-color: #0074a2;}.fusion-fullwidth.fusion-builder-row-3 .pagination .current {border-color: #0074a2; background-color: #0074a2;}.fusion-fullwidth.fusion-builder-row-3 .fusion-filters .fusion-filter.fusion-active a, .fusion-fullwidth.fusion-builder-row-3 .fusion-date-and-formats .fusion-format-box, .fusion-fullwidth.fusion-builder-row-3 .fusion-popover, .fusion-fullwidth.fusion-builder-row-3 .tooltip-shortcode {color: #0074a2;}#main .fusion-fullwidth.fusion-builder-row-3 .post .blog-shortcode-post-title a:hover {color: #0074a2;}</style>
]]></content:encoded>
					
					<wfw:commentRss>https://webrobots.io/our-chrome-extension-has-been-updated/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
		<item>
		<title>Instant Data Users Group on Facebook</title>
		<link>https://webrobots.io/instant-data-users-group-on-facebook/</link>
					<comments>https://webrobots.io/instant-data-users-group-on-facebook/#respond</comments>
		
		<dc:creator><![CDATA[nicerobot]]></dc:creator>
		<pubDate>Tue, 28 Apr 2020 11:02:25 +0000</pubDate>
				<category><![CDATA[Datasets]]></category>
		<category><![CDATA[Web Scraping]]></category>
		<guid isPermaLink="false">https://webrobots.io/?p=6110</guid>

					<description><![CDATA[We have launched a Facebook group where Instant Data Scraper users will be able to find support for the extension, which currently has 65k users. This extension is wildly popular, but at the same time it is completely free, so Web Robots has limited capacity to answer users' questions. We hope that [...]]]></description>
										<content:encoded><![CDATA[<div class="fusion-fullwidth fullwidth-box fusion-builder-row-4 nonhundred-percent-fullwidth non-hundred-percent-height-scrolling"  style='background-color: #ffffff;background-position: center center;background-repeat: no-repeat;padding-top:0px;padding-right:0px;padding-bottom:0px;padding-left:0px;border-top-width:0px;border-bottom-width:0px;border-color:#eae9e9;border-top-style:solid;border-bottom-style:solid;'><div class="fusion-builder-row fusion-row "><div  class="fusion-layout-column fusion_builder_column fusion_builder_column_1_1 fusion-builder-column-3 fusion-one-full fusion-column-first fusion-column-last 1_1"  style='margin-top:0px;margin-bottom:20px;'><div class="fusion-column-wrapper" style="padding: 0px 0px 0px 0px;background-position:left top;background-repeat:no-repeat;-webkit-background-size:cover;-moz-background-size:cover;-o-background-size:cover;background-size:cover;"   data-bg-url=""><div class="fusion-text"><p>We have launched a <a href="https://www.facebook.com/groups/instantdata/">Facebook group</a> where <a href="https://chrome.google.com/webstore/detail/instant-data-scraper/ofaokhiedipichpaobibbnahnkdoiiah">Instant Data Scraper</a> users will be able to find support for the extension, which currently has 65k users. This extension is wildly popular, but at the same time it is completely free, so Web Robots has limited capacity to answer users&#8217; questions.</p>
<p>We hope that the new Facebook group will grow into a community where users can support each other.</p>
</div><div class="imageframe-align-center"><div class="fusion-image-frame-bottomshadow image-frame-shadow-2"><style>.fusion-image-frame-bottomshadow.image-frame-shadow-2{display:inline-block}.element-bottomshadow.imageframe-2:before, .element-bottomshadow.imageframe-2:after{-webkit-box-shadow: 0 17px 10px rgba(0,0,0,0.4);box-shadow: 0 17px 10px rgba(0,0,0,0.4);}</style><span class="fusion-imageframe imageframe-bottomshadow imageframe-2 element-bottomshadow hover-type-none"><a class="fusion-no-lightbox" href="https://www.facebook.com/groups/instantdata/" target="_blank" aria-label="Community Support Group" rel="noopener noreferrer"><img loading="lazy" decoding="async" src="https://webrobots.io/wp-content/uploads/2020/04/unnamed.png" data-orig-src="https://webrobots.io/wp-content/uploads/2020/04/unnamed.png" width="500" height="228" alt="Community Support Group" class="lazyload img-responsive wp-image-6111" srcset="data:image/svg+xml,%3Csvg%20xmlns%3D%27http%3A%2F%2Fwww.w3.org%2F2000%2Fsvg%27%20width%3D%27500%27%20height%3D%27228%27%20viewBox%3D%270%200%20500%20228%27%3E%3Crect%20width%3D%27500%27%20height%3D%273228%27%20fill-opacity%3D%220%22%2F%3E%3C%2Fsvg%3E" data-srcset="https://webrobots.io/wp-content/uploads/2020/04/unnamed-200x91.png 200w, https://webrobots.io/wp-content/uploads/2020/04/unnamed-400x182.png 400w, https://webrobots.io/wp-content/uploads/2020/04/unnamed.png 500w" data-sizes="auto" data-orig-sizes="(max-width: 800px) 100vw, 500px" /></a></span></div></div><div class="fusion-clearfix"></div></div></div></div></div><style type="text/css">.fusion-fullwidth.fusion-builder-row-4 
a:not(.fusion-button):not(.fusion-builder-module-control):not(.fusion-social-network-icon):not(.fb-icon-element):not(.fusion-countdown-link):not(.fusion-rollover-link):not(.fusion-rollover-gallery):not(.fusion-button-bar):not(.add_to_cart_button):not(.show_details_button):not(.product_type_external):not(.fusion-quick-view):not(.fusion-rollover-title-link):not(.fusion-breadcrumb-link) , .fusion-fullwidth.fusion-builder-row-4 a:not(.fusion-button):not(.fusion-builder-module-control):not(.fusion-social-network-icon):not(.fb-icon-element):not(.fusion-countdown-link):not(.fusion-rollover-link):not(.fusion-rollover-gallery):not(.fusion-button-bar):not(.add_to_cart_button):not(.show_details_button):not(.product_type_external):not(.fusion-quick-view):not(.fusion-rollover-title-link):not(.fusion-breadcrumb-link):before, .fusion-fullwidth.fusion-builder-row-4 a:not(.fusion-button):not(.fusion-builder-module-control):not(.fusion-social-network-icon):not(.fb-icon-element):not(.fusion-countdown-link):not(.fusion-rollover-link):not(.fusion-rollover-gallery):not(.fusion-button-bar):not(.add_to_cart_button):not(.show_details_button):not(.product_type_external):not(.fusion-quick-view):not(.fusion-rollover-title-link):not(.fusion-breadcrumb-link):after {color: #03a9f4;}.fusion-fullwidth.fusion-builder-row-4 a:not(.fusion-button):not(.fusion-builder-module-control):not(.fusion-social-network-icon):not(.fb-icon-element):not(.fusion-countdown-link):not(.fusion-rollover-link):not(.fusion-rollover-gallery):not(.fusion-button-bar):not(.add_to_cart_button):not(.show_details_button):not(.product_type_external):not(.fusion-quick-view):not(.fusion-rollover-title-link):not(.fusion-breadcrumb-link):hover, .fusion-fullwidth.fusion-builder-row-4 
a:not(.fusion-button):not(.fusion-builder-module-control):not(.fusion-social-network-icon):not(.fb-icon-element):not(.fusion-countdown-link):not(.fusion-rollover-link):not(.fusion-rollover-gallery):not(.fusion-button-bar):not(.add_to_cart_button):not(.show_details_button):not(.product_type_external):not(.fusion-quick-view):not(.fusion-rollover-title-link):not(.fusion-breadcrumb-link):hover:before, .fusion-fullwidth.fusion-builder-row-4 a:not(.fusion-button):not(.fusion-builder-module-control):not(.fusion-social-network-icon):not(.fb-icon-element):not(.fusion-countdown-link):not(.fusion-rollover-link):not(.fusion-rollover-gallery):not(.fusion-button-bar):not(.add_to_cart_button):not(.show_details_button):not(.product_type_external):not(.fusion-quick-view):not(.fusion-rollover-title-link):not(.fusion-breadcrumb-link):hover:after {color: #0074a2;}.fusion-fullwidth.fusion-builder-row-4 .pagination a.inactive:hover, .fusion-fullwidth.fusion-builder-row-4 .fusion-filters .fusion-filter.fusion-active a {border-color: #0074a2;}.fusion-fullwidth.fusion-builder-row-4 .pagination .current {border-color: #0074a2; background-color: #0074a2;}.fusion-fullwidth.fusion-builder-row-4 .fusion-filters .fusion-filter.fusion-active a, .fusion-fullwidth.fusion-builder-row-4 .fusion-date-and-formats .fusion-format-box, .fusion-fullwidth.fusion-builder-row-4 .fusion-popover, .fusion-fullwidth.fusion-builder-row-4 .tooltip-shortcode {color: #0074a2;}#main .fusion-fullwidth.fusion-builder-row-4 .post .blog-shortcode-post-title a:hover {color: #0074a2;}</style>
]]></content:encoded>
					
					<wfw:commentRss>https://webrobots.io/instant-data-users-group-on-facebook/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
		<item>
		<title>Instant Data Scraper Update</title>
		<link>https://webrobots.io/instant-data-scraper-update/</link>
					<comments>https://webrobots.io/instant-data-scraper-update/#comments</comments>
		
		<dc:creator><![CDATA[nicerobot]]></dc:creator>
		<pubDate>Tue, 17 Dec 2019 13:58:47 +0000</pubDate>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[dynamic website]]></category>
		<category><![CDATA[web scraping]]></category>
		<guid isPermaLink="false">https://webrobots.io/?p=6089</guid>

					<description><![CDATA[In October and November of this year we decided to survey Instant Data Scraper extension users to see where Web Robots team should focus for the next update. We already had some ideas from user emails that we received over last couple years, but we needed a more scientific proof to see which features [...]]]></description>
										<content:encoded><![CDATA[<div class="fusion-fullwidth fullwidth-box fusion-builder-row-5 nonhundred-percent-fullwidth non-hundred-percent-height-scrolling"  style='background-color: #ffffff;background-position: center center;background-repeat: no-repeat;padding-top:0px;padding-right:0px;padding-bottom:0px;padding-left:0px;border-top-width:0px;border-bottom-width:0px;border-color:#eae9e9;border-top-style:solid;border-bottom-style:solid;'><div class="fusion-builder-row fusion-row "><div  class="fusion-layout-column fusion_builder_column fusion_builder_column_1_1 fusion-builder-column-4 fusion-one-full fusion-column-first fusion-column-last 1_1"  style='margin-top:0px;margin-bottom:20px;'><div class="fusion-column-wrapper" style="padding: 0px 0px 0px 0px;background-position:left top;background-repeat:no-repeat;-webkit-background-size:cover;-moz-background-size:cover;-o-background-size:cover;background-size:cover;"   data-bg-url=""><div class="fusion-text"><p>In October and November of this year we decided to survey <a href="https://chrome.google.com/webstore/detail/instant-data-scraper/ofaokhiedipichpaobibbnahnkdoiiah"><strong>Instant Data Scraper</strong></a> extension users to see where the Web Robots team should focus for the next update. We already had some ideas from user emails received over the last couple of years, but we needed more rigorous evidence of which features would be most desired. The features under consideration include infinite scroll support, running jobs in the cloud, processing batches of URLs, proxy support, etc.</p>
<p>Before the end of the survey it became clear that infinite scroll support was by far the most desired feature, so we decided to release it as soon as possible. On December 11th we published version 0.2.0 to the Chrome Web Store. Enjoy it!</p>
<p>Other features will follow as well. We are happy to see that our web scraping tool has grown past 40k users and has excellent reviews!</p>
<div id="attachment_6090" style="width: 1034px" class="wp-caption aligncenter"><img loading="lazy" decoding="async" aria-describedby="caption-attachment-6090" class="lazyload size-large wp-image-6090" src="https://webrobots.io/wp-content/uploads/2019/12/Screenshot-2019-12-17-at-15.55.11-1024x149.png" data-orig-src="https://webrobots.io/wp-content/uploads/2019/12/Screenshot-2019-12-17-at-15.55.11-1024x149.png" alt="Instant Data Scraper Installs" width="1024" height="149" srcset="data:image/svg+xml,%3Csvg%20xmlns%3D%27http%3A%2F%2Fwww.w3.org%2F2000%2Fsvg%27%20width%3D%271024%27%20height%3D%27149%27%20viewBox%3D%270%200%201024%20149%27%3E%3Crect%20width%3D%271024%27%20height%3D%273149%27%20fill-opacity%3D%220%22%2F%3E%3C%2Fsvg%3E" data-srcset="https://webrobots.io/wp-content/uploads/2019/12/Screenshot-2019-12-17-at-15.55.11-200x29.png 200w, https://webrobots.io/wp-content/uploads/2019/12/Screenshot-2019-12-17-at-15.55.11-300x44.png 300w, https://webrobots.io/wp-content/uploads/2019/12/Screenshot-2019-12-17-at-15.55.11-400x58.png 400w, https://webrobots.io/wp-content/uploads/2019/12/Screenshot-2019-12-17-at-15.55.11-600x88.png 600w, https://webrobots.io/wp-content/uploads/2019/12/Screenshot-2019-12-17-at-15.55.11-768x112.png 768w, https://webrobots.io/wp-content/uploads/2019/12/Screenshot-2019-12-17-at-15.55.11-800x117.png 800w, https://webrobots.io/wp-content/uploads/2019/12/Screenshot-2019-12-17-at-15.55.11-1024x149.png 1024w, https://webrobots.io/wp-content/uploads/2019/12/Screenshot-2019-12-17-at-15.55.11-1200x175.png 1200w, https://webrobots.io/wp-content/uploads/2019/12/Screenshot-2019-12-17-at-15.55.11.png 1350w" data-sizes="auto" data-orig-sizes="(max-width: 1024px) 100vw, 1024px" /><p id="caption-attachment-6090" class="wp-caption-text">Installs per day over the lifetime of our extension.</p></div>
</div><div class="fusion-clearfix"></div></div></div></div></div>
]]></content:encoded>
					
					<wfw:commentRss>https://webrobots.io/instant-data-scraper-update/feed/</wfw:commentRss>
			<slash:comments>3</slash:comments>
		
		
			</item>
		<item>
		<title>Web Scraping vs Web Crawling</title>
		<link>https://webrobots.io/web-scraping-vs-web-crawling/</link>
					<comments>https://webrobots.io/web-scraping-vs-web-crawling/#comments</comments>
		
		<dc:creator><![CDATA[nicerobot]]></dc:creator>
		<pubDate>Mon, 06 May 2019 09:37:58 +0000</pubDate>
				<category><![CDATA[Web Scraping]]></category>
		<guid isPermaLink="false">https://webrobots.io/?p=6019</guid>

					<description><![CDATA[The internet is growing exponentially, and the amount of data available for extraction and analysis is growing along side it. It is no wonder then that many new and confusing terms are created and used every day, such as Data Science, Data mining, Data harvesting, Web scraping, Web crawling, etc. But what do they [...]]]></description>
										<content:encoded><![CDATA[<div class="fusion-fullwidth fullwidth-box fusion-builder-row-6 nonhundred-percent-fullwidth non-hundred-percent-height-scrolling"  style='background-color: #ffffff;background-position: center center;background-repeat: no-repeat;padding-top:0px;padding-right:0px;padding-bottom:0px;padding-left:0px;border-top-width:0px;border-bottom-width:0px;border-color:#eae9e9;border-top-style:solid;border-bottom-style:solid;'><div class="fusion-builder-row fusion-row "><div  class="fusion-layout-column fusion_builder_column fusion_builder_column_1_1 fusion-builder-column-5 fusion-one-full fusion-column-first fusion-column-last 1_1"  style='margin-top:0px;margin-bottom:20px;'><div class="fusion-column-wrapper" style="padding: 0px 0px 0px 0px;background-position:left top;background-repeat:no-repeat;-webkit-background-size:cover;-moz-background-size:cover;-o-background-size:cover;background-size:cover;"   data-bg-url=""><div class="fusion-text"><p>The internet is growing exponentially, and the amount of data available for extraction and analysis is growing alongside it. It is no wonder, then, that many new and confusing terms are created and used every day, such as Data Science, Data mining, Data harvesting, Web scraping, Web crawling, etc. But what do they mean? Is it important to understand the subtle differences, or is it all just fancy lingo? Let&#8217;s look at a couple of terms to try and answer these questions: <em><strong>Web Scraping</strong></em> and <em><strong>Web Crawling</strong></em>.</p>
<h2>Formal Answer</h2>
<p>Let&#8217;s start with the formal definitions:</p>
<p><strong>Web crawling</strong> &#8211; A process where a program or automated script browses the World Wide Web in a methodical, automated manner.<br />
<strong>Web scraping</strong> &#8211; extracting specific data from websites.</p>
<p>As you can see the terms have quite clear definitions, and some people suggest that it is crucial to understand the minute differences if you want to succeed in the industry. But is that true?</p>
<h2>Real World Answer</h2>
<p>We are a company that has been specializing in <strong>Web Scraping</strong> services for years. We talk to our present and prospective clients on a daily basis, sometimes several times a day. In these real-world conversations the terms Web Scraping and Web Crawling are often used interchangeably, without any precision. The reality is &#8211; there are websites out there that have valuable data that needs to be extracted in a structured format, and how you define the process is not important at all.</p>
<h2>What Do We Actually Do?</h2>
<p>Looking back at the projects we did during these years, a simple pattern emerges. The vast majority of our projects are about creating robots that do <strong>targeted web crawling</strong> (crawling not the entire internet, but only specific websites) and immediately do <strong>web scraping</strong> as the web page is retrieved. So both processes occur simultaneously in real time. Most often we discard almost the entire retrieved HTML document and save only the bits of information that our clients need. In some cases we save the entire HTML for traceability, or for further analysis. So the lines between <strong>web crawling</strong> and <strong>web scraping</strong> become somewhat blurred as the amount of data extracted varies.</p>
<p>In the end we found that the essential thing is clear communication about what needs to be done, rather than how to define it. However, this is just our opinion based on our experience, and depending on the project you are working on or the business model you implement, you might reach a different conclusion. In any case, we can all agree &#8211; Web Scraping at scale is cool!</p>
</div><div class="fusion-clearfix"></div></div></div></div></div>
]]></content:encoded>
					
					<wfw:commentRss>https://webrobots.io/web-scraping-vs-web-crawling/feed/</wfw:commentRss>
			<slash:comments>3</slash:comments>
		
		
			</item>
		<item>
		<title>Advanced AJAX Techniques for Web Scraping</title>
		<link>https://webrobots.io/advanced-ajax-techniques-for-web-scraping/</link>
					<comments>https://webrobots.io/advanced-ajax-techniques-for-web-scraping/#respond</comments>
		
		<dc:creator><![CDATA[nicerobot]]></dc:creator>
		<pubDate>Wed, 10 Apr 2019 07:32:21 +0000</pubDate>
				<category><![CDATA[Web Scraping]]></category>
		<guid isPermaLink="false">https://webrobots.io/?p=5974</guid>

					<description><![CDATA[Basic AJAX usage within Web Robots scraper Best and simplest way to perform AJAX calls with the scraper is to use JQuery $.ajax() or the simplified $.get(), $.post() and $.getJSON() methods. [javascript] // Standard JQuery AJAX call $.ajax({ url:'https://webrobots.io', method: 'GET' }).done( function(resp){ console.log(resp); }); // Simplified AJAX call $.get('https://webrobots.io').done( function(resp){ console.log(resp); }); [/javascript] [...]]]></description>
										<content:encoded><![CDATA[<div class="fusion-fullwidth fullwidth-box fusion-builder-row-7 nonhundred-percent-fullwidth non-hundred-percent-height-scrolling"  style='background-color: #ffffff;background-position: center center;background-repeat: no-repeat;padding-top:0px;padding-right:0px;padding-bottom:0px;padding-left:0px;border-top-width:0px;border-bottom-width:0px;border-color:#eae9e9;border-top-style:solid;border-bottom-style:solid;'><div class="fusion-builder-row fusion-row "><div  class="fusion-layout-column fusion_builder_column fusion_builder_column_1_1 fusion-builder-column-6 fusion-one-full fusion-column-first fusion-column-last 1_1"  style='margin-top:0px;margin-bottom:20px;'><div class="fusion-column-wrapper" style="padding: 0px 0px 0px 0px;background-position:left top;background-repeat:no-repeat;-webkit-background-size:cover;-moz-background-size:cover;-o-background-size:cover;background-size:cover;"   data-bg-url=""><div class="fusion-text"><h2><strong>Basic AJAX usage within Web Robots scraper</strong></h2>
<p>The best and simplest way to perform AJAX calls with the scraper is to use JQuery <a href="http://api.jquery.com/jquery.ajax/">$.ajax()</a> or the simplified <a href="https://api.jquery.com/jquery.get/">$.get()</a>, <a href="https://api.jquery.com/jquery.post/">$.post()</a> and <a href="https://api.jquery.com/jquery.getjson/">$.getJSON()</a> methods.</p>
<pre class="brush: jscript;">
// Standard JQuery AJAX call
$.ajax({
    url:'https://webrobots.io',
    method: 'GET'
}).done( function(resp){
    console.log(resp); 
});

// Simplified AJAX call
$.get('https://webrobots.io').done( function(resp){
   console.log(resp); 
});

</pre>
<p>Since AJAX is asynchronous, the step&#8217;s done() call should always be placed inside the AJAX callback function. Also, multiple AJAX calls shouldn&#8217;t be made inside a loop. Instead, create a separate step for the AJAX call and queue it up with next() inside the loop.</p>
<h3><strong>Example of incorrect and correct done() placement in AJAX:</strong></h3>
<h3><span style="color: #f03030;">INCORRECT</span></h3>
<pre class="brush: jscript; highlight: &#091;5&#093;;">
steps.start = function(){
    $.get('https://webrobots.io').done( function(resp){
        // some code
    });
    done(); 
}
</pre>
<h3><span style="color: #339966;">CORRECT</span></h3>
<pre class="brush: jscript; highlight: &#091;4&#093;;"> 
steps.start = function(){
    $.get('https://webrobots.io').done( function(resp){
        // some code;
        done(); 
    });
}
</pre>
<h3><strong>Example of incorrect and correct AJAX looping:</strong></h3>
<h3><span style="color: #f03030;">INCORRECT</span></h3>
<pre class="brush: jscript;"> 
steps.start = function(){
   for( let url of urls){
       $.get(url).done( function(resp){
           // some code 
       });
   }
   done(); 
}
</pre>
<h3><span style="color: #339966;">CORRECT</span></h3>
<pre class="brush: jscript;">
 
steps.start = function(){
    for( let url of urls){
        next('','getUrl',url);
    }
    done(); 
}

steps.getUrl = function(url){
    $.get(url).done( function(resp){
         // some code;
         done(); 
    }); 
} 
</pre>
<hr />
<h2></h2>
<h2><strong>AJAX timeout</strong></h2>
<p>One issue with AJAX requests inside a step function is that the step global retry timeout and the AJAX timeout are independent, and in certain scenarios this can cause problems.</p>
<p>Consider this example. A GET request is performed, and since it is asynchronous, step done() function is placed inside the GET done block. If the GET fails, we can either call a done() function inside the .fail() block and move along with our scraping, or omit the .fail() block and force a step retry after our preset retry timeout.</p>
<pre class="brush: jscript;">
steps.start = function(){

    $.get('https://webrobots.io').done( function(response){
        // some code;
        done();
    })
    //.fail(done);

}
</pre>
<p>This works fine when the server returns a failed response (e.g. status code 404) or fails to respond at all. However, depending on how the server is configured, it might return a valid response after a significant delay, sometimes longer than our locally set step retry timeout. This means that even though the step has already finished, the code inside the GET done block will run and trigger a done(). Depending on the specific code, this can make the robot unstable and cause unnecessary error logging. To avoid such a scenario, a local AJAX timeout should be set just below the step retry timeout (default is 60000 ms). In the example below, if no response is received from the server within 55000 ms, the AJAX call will time out and the code will proceed to run as normal.</p>
<pre class="brush: jscript;">
steps.start = function(){

    // default retry timer is 60000ms, AJAX timeout should be a few seconds lower.
    $.ajaxSetup({timeout:55000});
    $.get('https://webrobots.io').done( function(response){
        // some code;
        done();
    })
    //.fail(done);
}
</pre>
<hr />
<h2></h2>
<h2><strong>Multiple simultaneous AJAX calls using $.when()</strong></h2>
<p>Performing several simultaneous AJAX calls is a very efficient way to handle certain scraping situations. One such situation is a website that loads parts of its content as static HTML, and other parts dynamically through various APIs. Consider an example website that performs a separate AJAX call to get the post content, one to get the post image, and another one for post reviews. A simple approach could be to chain all three AJAX calls so that each one starts as soon as the previous one finishes. We will use <a href="https://jsonplaceholder.typicode.com/">jsonplaceholder.typicode.com</a> to construct our example:</p>
<pre class="brush: jscript;">
steps.start = function(){
    $.get('https://jsonplaceholder.typicode.com/posts/1').done( function(r1){
        console.log( r1 ); 
        $.get('https://jsonplaceholder.typicode.com/photos/1').done( function(r2){
            console.log( r2 ); 
            $.get('https://jsonplaceholder.typicode.com/comments/1').done( function(r3){
                 console.log( r3 ); 
                 done();
            });
        });
    });
};
</pre>
<p>The downside of this approach is that a new AJAX call cannot start until the previous one ends, wasting valuable time. The solution is to use the <a href="https://api.jquery.com/jquery.when/">JQuery.when()</a> method. It takes multiple Deferred objects as arguments, in this case the results of $.get() calls, and will resolve its master Deferred as soon as all the Deferreds resolve, or reject the master Deferred as soon as one of the Deferreds is rejected. The arguments passed to the doneCallbacks provide the resolved values for each of the Deferreds, and match the order in which the Deferreds were passed to $.when(). Our example remade with $.when() would look like this:</p>
<pre class="brush: jscript;">
steps.start = function(){

    let a1 = () =&gt; $.get('https://jsonplaceholder.typicode.com/posts/1');
    let a2 = () =&gt; $.get('https://jsonplaceholder.typicode.com/photos/1');
    let a3 = () =&gt; $.get('https://jsonplaceholder.typicode.com/comments/1');
    $.when( a1(), a2(), a3() ).then(function ( r1, r2, r3 ) {
        // r1, r2 and r3 are arguments resolved for the a1, a2 and a3 ajax requests, respectively.
        // Each argument is an array with the following structure: [ data, statusText, jqXHR ]
        console.log( r1 ); 
        console.log( r2 ); 
        console.log( r3 ); 
        done();
    });
}
</pre>
<p>This way all AJAX requests are started simultaneously and the code proceeds once all responses are resolved. Depending on how many simultaneous requests are made and the response times from the server, this method has the potential to significantly increase the speed of a robot.</p>
<hr />
<h2></h2>
<h2><strong>Dynamic number of simultaneous AJAX calls</strong></h2>
<p>During our web scraping journey, we came across a couple of instances where it is useful to be able to make multiple AJAX calls when the number of calls is not known in advance. One such example would be taking links from multiple sitemaps and distributing them evenly between forks. Unfortunately this is awkward with $.when() because each Deferred is passed as a separate argument and each response comes back as a separate callback argument that has to be named individually. We can solve this by using the ES6 <a href="https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Promise/all">Promise.all()</a> method, which returns a single Promise that resolves when all of the promises passed as an array have resolved or when the array contains no promises. It rejects with the reason of the first promise that rejects. Here is an example using the <a href="https://www.rottentomatoes.com">rottentomatoes.com</a> sitemap:</p>
<pre class="brush: jscript;">
steps.start = function(){
    $.get('https://www.rottentomatoes.com/sitemap.xml').done( function(response){
        // Collecting the URL of every sitemap listed in the sitemap index
        let sitemaps = $('loc', response).map((i, v) =&gt; $(v).text() ).get();
        // The step name passed to next() must match the step defined below
        next('','sitemaps',sitemaps);
        done();
    });
}

steps.sitemaps = function( sitemaps ){
    // Creating an array of promises, one per sitemap URL
    let promises = sitemaps.map( url =&gt; $.get(url) );

    // Waiting for all AJAX promises to resolve before executing further code
    Promise.all( promises ).then( function( responses ){
        for( let r of responses ){
            // logging the number of links in each sitemap
            console.log( $('loc', r).length );
        }
        done();
    });
}
</pre>
<p><span style="color: #ff0000;">IMPORTANT: </span> This method should only be used when absolutely necessary, because an excessive number of simultaneous requests can strain the target server or be identified as unwanted traffic and trigger blocking. Always use proper delays and follow the robots.txt rules of each website you scrape.</p>
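<p>One way to keep the number of simultaneous requests under control is to send them in small batches with a pause in between. The following is only a minimal sketch of that idea: <i>runInBatches</i> and <i>fakeRequest</i> are hypothetical names that are not part of the Web Robots framework, and the batch size and delay are arbitrary values chosen for illustration.</p>
<pre class="brush: jscript;">
// Hypothetical helper: runs tasks in batches of batchSize and waits
// delayMs after each batch. Each task is a function that returns a promise.
function runInBatches(tasks, batchSize, delayMs) {
    var results = [];
    var index = 0;

    function pause(ms) {
        return new Promise(function (resolve) { setTimeout(resolve, ms); });
    }

    function nextBatch() {
        var batch = tasks.slice(index, index + batchSize);
        if (batch.length === 0) {
            // Nothing left to start, hand back everything collected so far
            return Promise.resolve(results);
        }
        index += batchSize;
        // Start every task of this batch at once and wait for all of them
        return Promise.all(batch.map(function (task) { return task(); }))
            .then(function (responses) {
                results = results.concat(responses);
                return pause(delayMs);
            })
            .then(nextBatch);
    }

    return nextBatch();
}

// Stand-in for a real request so the sketch runs on its own
function fakeRequest(url) {
    return function () { return Promise.resolve('response for ' + url); };
}

var tasks = ['a.xml', 'b.xml', 'c.xml'].map(fakeRequest);
runInBatches(tasks, 2, 100).then(function (responses) {
    console.log(responses.length + ' responses received');
});
</pre>
<p>Inside a step, the tasks could be built as sitemaps.map(function (url) { return function () { return $.get(url); }; }), with done() called once the returned promise resolves.</p>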
<hr />
<h2><strong>Vanilla JS AJAX use cases</strong></h2>
<p>While JQuery.ajax() is handy, it has one disadvantage: by default it sets the <strong>x-requested-with : XMLHttpRequest</strong> header on same-origin requests, and in very rare cases this affects the content of the response that is sent by the server. To circumvent this, use the Vanilla JS <a href="https://developer.mozilla.org/en-US/docs/Web/API/XMLHttpRequest">XMLHttpRequest</a> object or the modern <a href="https://developer.mozilla.org/en-US/docs/Web/API/Fetch_API">fetch API</a>. Refer to their respective documentation pages for more information on how to use them. Here are a couple of simple examples.</p>
<h3><strong>Example using XMLHttpRequest: </strong></h3>
<pre class="brush: jscript;">
var xhttp = new XMLHttpRequest();
xhttp.onreadystatechange = function() {
   if (this.readyState == 4 &amp;&amp; this.status == 200) {
       console.log( this.responseText );
   }
};

xhttp.open('GET', 'cookies.php', true);
xhttp.send();
</pre>
<h3><strong>Example using fetch: </strong></h3>
<pre class="brush: jscript;">
fetch('https://webrobots.io/').then(function(response){
    return response.text();
}).then(function(text){
    console.log(text);
});
</pre>
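<p>Note that fetch() has no timeout option comparable to $.ajaxSetup({timeout: ...}), so the timeout advice from the AJAX timeout section has to be applied by hand. Below is a minimal sketch of one way to do it with Promise.race(); <i>withTimeout</i> is a hypothetical helper name, and the slow promise only stands in for a slow server. The underlying request is not cancelled here, its late response is simply ignored.</p>
<pre class="brush: jscript;">
// Hypothetical helper: rejects if the given promise does not settle within ms
function withTimeout(promise, ms) {
    var timer;
    var timeout = new Promise(function (resolve, reject) {
        timer = setTimeout(function () {
            reject(new Error('timed out after ' + ms + ' ms'));
        }, ms);
    });
    // Whichever promise settles first wins; the timer is cleared either way
    return Promise.race([promise, timeout]).finally(function () {
        clearTimeout(timer);
    });
}

// Stand-in for a slow server: resolves after 200 ms
var slow = new Promise(function (resolve) {
    setTimeout(function () { resolve('late response'); }, 200);
});

withTimeout(slow, 50).then(
    function (text) { console.log(text); },
    function (err) { console.log('request abandoned: ' + err.message); }
);
</pre>
<p>In a robot this could wrap a real request, for example withTimeout(fetch('https://webrobots.io/'), 55000), keeping the value a few seconds below the step retry timeout.</p>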
</div><div class="fusion-clearfix"></div></div></div></div></div>
]]></content:encoded>
					
					<wfw:commentRss>https://webrobots.io/advanced-ajax-techniques-for-web-scraping/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
		<item>
		<title>Using Sitemaps in Web Scraping Robots</title>
		<link>https://webrobots.io/using-sitemaps-in-web-scraping-robots/</link>
					<comments>https://webrobots.io/using-sitemaps-in-web-scraping-robots/#comments</comments>
		
		<dc:creator><![CDATA[nicerobot]]></dc:creator>
		<pubDate>Mon, 25 Mar 2019 09:41:15 +0000</pubDate>
				<category><![CDATA[Web Scraping]]></category>
		<category><![CDATA[sitemap]]></category>
		<category><![CDATA[web scraping]]></category>
		<guid isPermaLink="false">https://webrobots.io/?p=5876</guid>

					<description><![CDATA[We often use spidering through categories technique and pagination/infinite scroll when we need to discover and crawl all items of interest on a website. However there is a simpler and more straightforward approach for this -  just using sitemaps. Sitemap based robots are easier to maintain than a mix of category drilling, pagination and [...]]]></description>
										<content:encoded><![CDATA[<div class="fusion-fullwidth fullwidth-box fusion-builder-row-8 nonhundred-percent-fullwidth non-hundred-percent-height-scrolling"  style='background-color: #ffffff;background-position: center center;background-repeat: no-repeat;padding-top:0px;padding-right:0px;padding-bottom:0px;padding-left:0px;border-top-width:0px;border-bottom-width:0px;border-color:#eae9e9;border-top-style:solid;border-bottom-style:solid;'><div class="fusion-builder-row fusion-row "><div  class="fusion-layout-column fusion_builder_column fusion_builder_column_1_1 fusion-builder-column-7 fusion-one-full fusion-column-first fusion-column-last 1_1"  style='margin-top:0px;margin-bottom:20px;'><div class="fusion-column-wrapper" style="padding: 0px 0px 0px 0px;background-position:left top;background-repeat:no-repeat;-webkit-background-size:cover;-moz-background-size:cover;-o-background-size:cover;background-size:cover;"   data-bg-url=""><div class="fusion-text"><p><span style="font-weight: 400;">We often spider through categories and follow pagination or infinite scroll when we need to discover and crawl all items of interest on a website. However, there is a simpler and more straightforward approach &#8211; just using </span><a href="https://en.wikipedia.org/wiki/Sitemaps"><b>sitemaps</b></a><span style="font-weight: 400;">. Sitemap-based robots are easier to maintain than a mix of category drilling, pagination and imitation of dynamic content loading.</span></p>
<p><span style="font-weight: 400;">After all, sitemaps are designed for robots to find all resources on a particular domain.</span></p>
<p><b>Example of a sitemap:</b></p>
</div><span style="width:100%;max-width:600px;" class="fusion-imageframe imageframe-none imageframe-3 hover-type-none"><img loading="lazy" decoding="async" src="https://webrobots.io/wp-content/uploads/2019/03/Screenshot-2019-02-22-at-15.37.54.png" data-orig-src="https://webrobots.io/wp-content/uploads/2019/03/Screenshot-2019-02-22-at-15.37.54-1024x392.png" width="1024" height="392" alt="" title="sitemap-example" class="lazyload img-responsive wp-image-5877" srcset="data:image/svg+xml,%3Csvg%20xmlns%3D%27http%3A%2F%2Fwww.w3.org%2F2000%2Fsvg%27%20width%3D%271384%27%20height%3D%27530%27%20viewBox%3D%270%200%201384%20530%27%3E%3Crect%20width%3D%271384%27%20height%3D%273530%27%20fill-opacity%3D%220%22%2F%3E%3C%2Fsvg%3E" data-srcset="https://webrobots.io/wp-content/uploads/2019/03/Screenshot-2019-02-22-at-15.37.54-200x77.png 200w, https://webrobots.io/wp-content/uploads/2019/03/Screenshot-2019-02-22-at-15.37.54-400x153.png 400w, https://webrobots.io/wp-content/uploads/2019/03/Screenshot-2019-02-22-at-15.37.54-600x230.png 600w, https://webrobots.io/wp-content/uploads/2019/03/Screenshot-2019-02-22-at-15.37.54-800x306.png 800w, https://webrobots.io/wp-content/uploads/2019/03/Screenshot-2019-02-22-at-15.37.54-1200x460.png 1200w, https://webrobots.io/wp-content/uploads/2019/03/Screenshot-2019-02-22-at-15.37.54.png 1384w" data-sizes="auto" data-orig-sizes="(max-width: 800px) 100vw, 1024px" /></span><div class="fusion-text"><h1><span style="font-weight: 400;">Finding Sitemaps</span></h1>
<ul>
<li style="font-weight: 400;"><span style="font-weight: 400;">The fastest way to find a sitemap URL is to check the </span><i><span style="font-weight: 400;">robots.txt</span></i><span style="font-weight: 400;"> file. For example: </span><a href="https://www.rottentomatoes.com/robots.txt"><span style="font-weight: 400;">https://www.rottentomatoes.com/robots.txt</span></a></li>
<li style="font-weight: 400;"><span style="font-weight: 400;">We can also probe typical sitemap URLs like </span><i><span style="font-weight: 400;">domain.com/sitemap</span></i><span style="font-weight: 400;"> or </span><i><span style="font-weight: 400;">domain.com/sitemap.xml</span></i></li>
<li style="font-weight: 400;"><span style="font-weight: 400;">Sometimes just going to the homepage and searching for the keyword “sitemap” works</span></li>
<li style="font-weight: 400;"><span style="font-weight: 400;">If none of the above bears fruit, a Google search can help (example: &#8220;target.com sitemap&#8221;).</span></li>
</ul>
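<p>The <i>robots.txt</i> check from the list above can also be scripted. Here is a minimal sketch, assuming the file body has already been downloaded as text. <i>extractSitemapUrls</i> is a hypothetical helper name, and the sample file content with its example.com URLs is made up for illustration.</p>
<pre class="brush: jscript;">
// Hypothetical helper: returns the URL from every "Sitemap:" line
// found in the text of a robots.txt file.
function extractSitemapUrls(robotsTxt) {
    return robotsTxt
        .split(/\r?\n/)
        .map(function (line) { return line.trim(); })
        // The directive name is matched case-insensitively
        .filter(function (line) { return /^sitemap:/i.test(line); })
        .map(function (line) { return line.slice('sitemap:'.length).trim(); });
}

// Made-up robots.txt content for illustration
var sample = [
    'User-agent: *',
    'Disallow: /search',
    'Sitemap: https://example.com/sitemap.xml',
    'sitemap: https://example.com/sitemap_news.xml'
].join('\n');

console.log(extractSitemapUrls(sample));
</pre>
<p>In a robot, the text could come from a $.get() request to the site&#8217;s robots.txt URL, and each returned URL could then be queued with next().</p>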
<p><b>Example of domain/robots.txt:</b></p>
</div><span style="width:100%;max-width:600px;" class="fusion-imageframe imageframe-none imageframe-4 hover-type-none"><img loading="lazy" decoding="async" src="https://webrobots.io/wp-content/uploads/2019/03/sitemap-example-2-1.png" data-orig-src="https://webrobots.io/wp-content/uploads/2019/03/sitemap-example-2-1-1024x212.png" width="1024" height="212" alt="" title="sitemap-example-2" class="lazyload img-responsive wp-image-5885" srcset="data:image/svg+xml,%3Csvg%20xmlns%3D%27http%3A%2F%2Fwww.w3.org%2F2000%2Fsvg%27%20width%3D%271024%27%20height%3D%27212%27%20viewBox%3D%270%200%201024%20212%27%3E%3Crect%20width%3D%271024%27%20height%3D%273212%27%20fill-opacity%3D%220%22%2F%3E%3C%2Fsvg%3E" data-srcset="https://webrobots.io/wp-content/uploads/2019/03/sitemap-example-2-1-200x41.png 200w, https://webrobots.io/wp-content/uploads/2019/03/sitemap-example-2-1-400x83.png 400w, https://webrobots.io/wp-content/uploads/2019/03/sitemap-example-2-1-600x124.png 600w, https://webrobots.io/wp-content/uploads/2019/03/sitemap-example-2-1-800x166.png 800w, https://webrobots.io/wp-content/uploads/2019/03/sitemap-example-2-1.png 1024w" data-sizes="auto" data-orig-sizes="(max-width: 800px) 100vw, 1024px" /></span><div class="fusion-text"><h1><span style="font-weight: 400;">Working With Large Sitemaps</span></h1>
<p><span style="font-weight: 400;">Sitemaps often contain many thousands of records, and opening one directly can freeze the Chrome browser for several minutes while it renders the XML. Our best practice is to fetch the sitemap with a $.get request and process the response in code.</span></p>
<p><b>Example of getting a sitemap using an AJAX request and filtering URLs:</b></p>
<pre class="brush: jscript;">

$.get('https://www.rottentomatoes.com/sitemap_0.xml').then( function(response){
    $('url loc',response).each( function(i, v){
        var url = $(v).text();

        // filtering: we only need URLs that have no further path after film name
        // we can filter out URLs with longer URL paths than film page has

        if(url.split('/').length &lt; 6) next(url,'getFilmInfo');

    });
    done();
});

</pre>
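<p><span style="font-weight: 400;">To see why the length check in the filter above works, the sketch below applies the same split('/') count to a few made-up URLs. The example.com addresses and the /m/film-name layout are assumptions for illustration only. Note that a trailing slash adds an empty sixth part, so such a URL would be dropped by this filter.</span></p>

```javascript
// Count the "/"-separated parts of a URL the same way the filter above does.
function countUrlParts(url) {
    return url.split('/').length;
}

// Hypothetical sample URLs; the /m/film-name layout is an assumption.
var sampleUrls = [
    'https://www.example.com/m/some_film',          // 5 parts: kept
    'https://www.example.com/m/some_film/reviews',  // 6 parts: dropped
    'https://www.example.com/m/some_film/'          // 6 parts (trailing slash): dropped
];

// Same condition as the original filter, written as "6 > length".
var filmPages = sampleUrls.filter(function (url) {
    return 6 > countUrlParts(url);
});

console.log(filmPages);
```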
<h1><span style="font-weight: 400;">Downsides of Sitemap Approach</span></h1>
<ul>
<li style="font-weight: 400;"><span style="font-weight: 400;">A sitemap can be outdated (old URLs leading to 404 pages), and the site owner might not even notice that their sitemaps are incorrect. It is necessary to spot check that the URLs work.</span></li>
<li style="font-weight: 400;"><span style="font-weight: 400;">A sitemap might not list all the items shown in the normal website interface. Best practice is to spot check that items found on the website are present in the sitemap as well.</span></li>
<li style="font-weight: 400;"><span style="font-weight: 400;">Sitemaps do not allow filtering items by criteria. For example, if we need only electronics from a large eshop, we still have to crawl all products and do the filtering in the back-end.</span></li>
<li style="font-weight: 400;"><span style="font-weight: 400;">Sitemaps do not show how popular an item is &#8211; for example, we cannot infer whether a particular item is on the first page of its category or somewhere near the end.</span></li>
</ul>
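<p><span style="font-weight: 400;">The completeness spot check mentioned above can be sketched as a simple set comparison. This is an illustrative example only: findMissingFromSitemap is a hypothetical helper, and both URL lists are made up. It assumes we already have the URLs from the sitemap and a small sample of URLs collected by browsing the site.</span></p>

```javascript
// Hypothetical helper: returns the URLs seen while browsing the site
// that are absent from the sitemap.
function findMissingFromSitemap(sitemapUrls, observedUrls) {
    var inSitemap = new Set(sitemapUrls);
    return observedUrls.filter(function (url) {
        return !inSitemap.has(url);
    });
}

// Made-up data for illustration.
var fromSitemap = [
    'https://www.example.com/m/film_a',
    'https://www.example.com/m/film_b'
];
var seenOnSite = [
    'https://www.example.com/m/film_a',
    'https://www.example.com/m/film_c'
];

var missing = findMissingFromSitemap(fromSitemap, seenOnSite);
console.log(missing);
```

<p><span style="font-weight: 400;">A non-empty result suggests the sitemap is incomplete and should not be the only source of URLs.</span></p>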
</div><div class="fusion-clearfix"></div></div></div></div></div>
]]></content:encoded>
					
					<wfw:commentRss>https://webrobots.io/using-sitemaps-in-web-scraping-robots/feed/</wfw:commentRss>
			<slash:comments>1</slash:comments>
		
		
			</item>
		<item>
		<title>Scraping Dynamic Websites Using The wait() Function</title>
		<link>https://webrobots.io/scraping-dynamic-websites-using-the-wait-function/</link>
					<comments>https://webrobots.io/scraping-dynamic-websites-using-the-wait-function/#respond</comments>
		
		<dc:creator><![CDATA[nicerobot]]></dc:creator>
		<pubDate>Mon, 04 Mar 2019 07:00:48 +0000</pubDate>
				<category><![CDATA[Web Scraping]]></category>
		<category><![CDATA[dynamic website]]></category>
		<category><![CDATA[web scraping]]></category>
		<guid isPermaLink="false">https://webrobots.io/?p=5842</guid>

					<description><![CDATA[Dynamic websites are one of the biggest headaches of every developer who works with web scraping robots. Data extraction becomes complicated when it cannot be found in the initial HTML of the website. For example, walmart.com loads product data via AJAX call after the initial DOM is rendered. Therefore we must wait and then extract [...]]]></description>
										<content:encoded><![CDATA[<div class="fusion-fullwidth fullwidth-box fusion-builder-row-9 nonhundred-percent-fullwidth non-hundred-percent-height-scrolling"  style='background-color: #ffffff;background-position: center center;background-repeat: no-repeat;padding-top:0px;padding-right:0px;padding-bottom:0px;padding-left:0px;border-top-width:0px;border-bottom-width:0px;border-color:#eae9e9;border-top-style:solid;border-bottom-style:solid;'><div class="fusion-builder-row fusion-row "><div  class="fusion-layout-column fusion_builder_column fusion_builder_column_1_1 fusion-builder-column-8 fusion-one-full fusion-column-first fusion-column-last 1_1"  style='margin-top:0px;margin-bottom:20px;'><div class="fusion-column-wrapper" style="padding: 0px 0px 0px 0px;background-position:left top;background-repeat:no-repeat;-webkit-background-size:cover;-moz-background-size:cover;-o-background-size:cover;background-size:cover;"   data-bg-url=""><div class="fusion-text"><p>Dynamic websites are one of the biggest headaches for developers who work with web scraping robots. Data extraction becomes complicated when the data cannot be found in the initial HTML of the page. For example, walmart.com loads product data via an AJAX call after the initial DOM is rendered, so we must wait and then extract the data from the DOM.</p>
<p>For example, on Walmart’s product page https://grocery.walmart.com/ip/Viva-Paper-Towels-Choose-A-Sheet-1-Big-Roll/52291575 the product data appears under the selector $('div[class^=&quot;ProductPage__details&quot;]').</p>
<pre class="brush: jscript;">

steps.start = function() {

    console.log($('div[class^=&quot;ProductPage__details&quot;]').length);

    done();

};

</pre>
<p>The logged result is 0 because our code executes as soon as the DOM is ready, but before the element appears. There are several ways we can fix this.</p>
<h1>Simple waiting strategy &#8211; use setTimeout()</h1>
<p>We can use setTimeout() to specify the number of milliseconds to wait before executing a piece of code. This gives the browser some time to process dynamic data and insert it into the DOM. In this example we introduce a simple 3 second wait:</p>
<pre class="brush: jscript;">

steps.start = function() {

    setTimeout(function() {

        console.log($('div[class^=&quot;ProductPage__details&quot;]').length);

        done();

    }, 3000);

};

</pre>
<p>The logged result is 1, which indicates that we found the expected data in the DOM. However, this method has drawbacks: the code is delayed by the same amount of time regardless of how long the website actually takes to handle its dynamic requests. This means we waste time when the product data appears sooner and miss data when it loads more slowly.</p>
<p>Dynamic pages tend to load inconsistently, so the exact delay needed for each page load is impossible to know in advance. The maximum observed delay is usually chosen when using setTimeout(). If we wait for 3 seconds, the average time for data to appear is 1.5 seconds, and we have to process 50,000 products, then 20.83 hours are wasted per run. That is 625 hours per month if we run this robot every day!</p>
<h1>Better waiting strategy &#8211; use wait()</h1>
<p>The Web Robots system wait() function lets the user wait for a particular HTML element to load and then execute code right after the element appears.</p>
<p>wait(string or array selector[], int maxWaitTime)</p>
<p>Default maxWaitTime = 10000;</p>
<p>Usable callbacks: then, always, fail (similar to jQuery Deferred, https://api.jquery.com/jquery.deferred/)</p>
<p>Example:</p>
<pre class="brush: jscript;">

steps.start = function() {

    wait('div[class^=&quot;ProductPage__details&quot;]').then(function() {

        console.log($('div[class^=&quot;ProductPage__details&quot;]').length);

        done();

    });

};

</pre>
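<p>To make the cost of a fixed delay concrete, the wasted-time estimate from the setTimeout() discussion can be recomputed with a few lines of arithmetic. This sketch uses only the figures given in this article; the variable names are ours, and the monthly figure assumes one run per day over a 30-day month.</p>

```javascript
// Figures taken from the setTimeout() example in this article.
var fixedWaitSeconds = 3;       // fixed setTimeout() delay per page
var averageLoadSeconds = 1.5;   // average time for the data to appear
var productCount = 50000;       // pages processed per run
var runsPerMonth = 30;          // assumption: one run per day, 30-day month

// Time spent waiting after the data is already available.
var wastedSecondsPerPage = fixedWaitSeconds - averageLoadSeconds;
var wastedHoursPerRun = wastedSecondsPerPage * productCount / 3600;
var wastedHoursPerMonth = wastedSecondsPerPage * productCount * runsPerMonth / 3600;

console.log(wastedHoursPerRun.toFixed(2) + ' hours wasted per run');
console.log(Math.round(wastedHoursPerMonth) + ' hours wasted per month');
```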
<p>wait() can take multiple callbacks for the cases when an element appears, when it does not appear, or in either case:</p>
<ul>
<li>wait(selector, time_to_wait*).then(callback) &#8211; the callback function is executed as soon as the selector appears. If the selector doesn’t appear, the function is never executed.</li>
<li>wait(selector, time_to_wait*).always(callback) &#8211; the callback function is executed when the element appears or when time_to_wait is reached.</li>
<li>wait(selector, time_to_wait*).then(callback).fail(callback2) &#8211; callback is executed when the element appears. callback2 is executed if the element does not appear.</li>
<li>wait([selector1, selector2, …], time_to_wait*).then(callback) &#8211; the callback function is executed only when all of the selectors (selector1, selector2, …) have appeared on the page.</li>
</ul>
<p>*time_to_wait is an optional parameter that sets the number of milliseconds to wait for the specified selector. The default (if not specified in the call) is 10000 ms.</p>
<p>The wait() function makes scraping of dynamic pages much easier, more efficient and more reliable.</p>
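<p>The general idea behind this kind of waiting can be illustrated with a small polling helper. This is only a sketch under stated assumptions, not the Web Robots implementation of wait(): the name waitFor, the pollInterval parameter and the use of a native Promise are all ours. Instead of a selector it takes a condition function, so it can run outside a browser. A Promise's then() and catch() play roles similar to the then and fail callbacks described above.</p>

```javascript
// Hypothetical helper, NOT the Web Robots wait() implementation:
// polls `condition` until it returns true or `maxWaitTime` ms have passed.
function waitFor(condition, maxWaitTime, pollInterval) {
    maxWaitTime = maxWaitTime === undefined ? 10000 : maxWaitTime;
    pollInterval = pollInterval === undefined ? 100 : pollInterval;
    var startedAt = Date.now();

    return new Promise(function (resolve, reject) {
        function check() {
            if (condition()) {
                // Condition met: resolve with the elapsed time in ms.
                resolve(Date.now() - startedAt);
            } else if (Date.now() - startedAt >= maxWaitTime) {
                // Gave up: the condition never became true in time.
                reject(new Error('Timed out after ' + maxWaitTime + ' ms'));
            } else {
                setTimeout(check, pollInterval);
            }
        }
        check();
    });
}

// Usage: simulate an element that "appears" after roughly 300 ms.
var appearsAt = Date.now() + 300;
var demo = waitFor(function () { return Date.now() >= appearsAt; }, 2000, 50)
    .then(function (elapsed) {
        console.log('condition met after about ' + elapsed + ' ms');
        return true;
    })
    .catch(function (error) {
        console.log(error.message);
        return false;
    });
```

<p>With this approach the code continues as soon as the condition holds, and the maximum wait only applies when the data never arrives.</p>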
</div><div class="fusion-clearfix"></div></div></div></div></div>
]]></content:encoded>
					
					<wfw:commentRss>https://webrobots.io/scraping-dynamic-websites-using-the-wait-function/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
	</channel>
</rss>
