How to Safely Scrape Google Results?
-
I've built a couple of small tools that I use personally, maybe 2 or 3 times per day.
Both tools scrape the top 10 results from Google and provide more details about each domain (like the SEOMoz Keyword Difficulty Tool).
Google seems to have banned my IP address for automated searches. Can anyone tell me a safe way of scraping the Google results? Is there a suitable API for this?
How does SEOmoz do this on such a huge scale?
-
I doubt that the APIs have improved considerably since this blog post: http://www.seomoz.org/blog/the-nasty-problem-with-scraping-results-from-the-engines. Scraping Google is still a big issue, and still necessary for our daily SEO work.
Scraping safely can only work if you succeed in convincing Google that you're a "natural" user and not a scraping robot. How can you do that?
- Search with alternating IPs from different locations, using proxies in the countries you want to scrape from.
- Don't send too many requests at once from the same source.
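To make the "not too many requests at once" point concrete, here is a minimal Python sketch of a throttle that enforces a randomly jittered pause between consecutive requests from one source. The class name `Throttle` and the default gap values are my own assumptions for illustration; nobody outside Google knows what rate is actually tolerated, so treat the numbers as placeholders to tune.

```python
import random
import time


class Throttle:
    """Enforce a minimum, randomly jittered gap between consecutive requests.

    min_gap and jitter are in seconds. The defaults are illustrative guesses,
    not known-safe values.
    """

    def __init__(self, min_gap=20.0, jitter=10.0,
                 clock=time.monotonic, sleep=time.sleep):
        self.min_gap = min_gap
        self.jitter = jitter
        self._clock = clock      # injectable so the class can be tested
        self._sleep = sleep
        self._last = None        # time of the previous request, if any

    def wait(self):
        """Block until the gap since the previous call has passed.

        Returns the number of seconds actually slept.
        """
        now = self._clock()
        delay = 0.0
        if self._last is not None:
            # Randomise the gap so requests don't arrive on a fixed beat.
            required = self.min_gap + random.uniform(0.0, self.jitter)
            delay = max(0.0, required - (now - self._last))
            if delay > 0:
                self._sleep(delay)
        self._last = self._clock()
        return delay
```

Call `wait()` on a single shared `Throttle` instance right before each search. For a tool that runs 2 or 3 times per day this mostly just spaces out the individual queries within one run.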
Consider that, when requesting a URL, the browser sends various pieces of information to the server, including your operating system, browser version, referer, etc. Every one of these can and should be varied to effectively change your identity when executing a new search.
- Change browsers, browser versions, operating system information, etc.
- Take care when changing browser localization values (en-GB and en-US probably don't return the same results).
- Have a good network of proxy servers ready, so that each request can be sent with a different identity.
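As a rough illustration of the "different identities" idea, the Python sketch below pairs each proxy with a set of request headers and hands the pairs out round-robin. It only builds the identities and makes no network calls. The names `Identity`, `build_identities` and `rotate` are hypothetical, and the proxy URLs and User-Agent strings in the usage part are placeholders rather than real values. The localization value is deliberately held constant across identities, because changing it would change the results being compared.

```python
import itertools
from dataclasses import dataclass


@dataclass(frozen=True)
class Identity:
    proxy: str            # proxy URL to route the request through
    user_agent: str       # value for the User-Agent header
    accept_language: str  # e.g. "en-GB"; keep fixed per target market

    def headers(self):
        """Return the HTTP headers that make up this identity."""
        return {
            "User-Agent": self.user_agent,
            "Accept-Language": self.accept_language,
        }


def build_identities(proxies, user_agents, accept_language):
    """Pair each proxy with a user agent, reusing agents if there are fewer."""
    agents = itertools.cycle(user_agents)
    return [Identity(p, next(agents), accept_language) for p in proxies]


def rotate(identities):
    """Return an endless round-robin iterator over the identities."""
    return itertools.cycle(identities)


if __name__ == "__main__":
    # Placeholder values only: substitute your own proxies and agent strings.
    pool = build_identities(
        proxies=["http://proxy-1.example:8080", "http://proxy-2.example:8080"],
        user_agents=["ExampleBrowser/1.0 (placeholder)"],
        accept_language="en-GB",
    )
    identities = rotate(pool)
    for keyword in ["first keyword", "second keyword", "third keyword"]:
        ident = next(identities)
        print(keyword, "->", ident.proxy, ident.headers())
```

Each search then takes `next(identities)` and passes its `proxy` and `headers()` to whatever HTTP client the tool already uses.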