How to Safely Scrape Google Results?
-
I've built a couple of small tools that I use personally, maybe 2 or 3 times per day.
Both tools scrape the top 10 results from Google and provide more details about each domain (like the SEOmoz Keyword Difficulty Tool).
Google seems to have banned my IP address for automated searches. Can anyone tell me a safe way of scraping the Google results? Is there a suitable API for this?
How does SEOmoz do this on such a huge scale?
-
I doubt that the APIs have improved considerably since this blog post: http://www.seomoz.org/blog/the-nasty-problem-with-scraping-results-from-the-engines. Scraping Google is still a big issue, and still necessary for our daily SEO work.
Scraping safely can only work if you succeed in convincing Google that you're a "natural" user and not a scraping robot. How can you do that?
- Search with alternating IPs from different locations, using proxies in the countries you'd like to scrape from
- Don't send too many requests at once from the same source
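To illustrate the "not too many requests at once" point, here is a minimal sketch of a throttle that spaces out requests from one source by a random gap. The class name `Throttle` and the 20 to 60 second range are my own assumptions for illustration, not values Google publishes or anything I have tested against its limits.

```python
import random
import time


class Throttle:
    """Enforce a randomized minimum gap between requests from one source."""

    def __init__(self, min_gap=20.0, max_gap=60.0,
                 clock=time.monotonic, sleep=time.sleep):
        # min_gap / max_gap are in seconds and are illustrative guesses only.
        self.min_gap = min_gap
        self.max_gap = max_gap
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = 0.0

    def wait(self):
        """Sleep until the next request is allowed; return the seconds slept."""
        now = self._clock()
        delay = max(0.0, self._next_allowed - now)
        if delay:
            self._sleep(delay)
        # Pick a random gap so requests do not arrive on a fixed beat.
        gap = random.uniform(self.min_gap, self.max_gap)
        self._next_allowed = max(now, self._next_allowed) + gap
        return delay
```

You would call `throttle.wait()` right before each search. With one `Throttle` per proxy, each source keeps its own pace.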
Keep in mind that when requesting a URL, the browser sends several pieces of information to the server, such as your operating system, browser version, and referer. Every one of these can and should be varied so that each new search appears to come from a different identity.
- Change browsers, browser versions, operating system information, etc.
- Take care when changing browser localization values (en-GB and en-US probably don't return the same results)
- Have a good network of proxy servers ready, so that each request can go out with a different identity
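The identity rotation described above could be sketched roughly like this with Python's standard library. The user-agent strings, proxy addresses and function names are placeholders I made up for the example; you would swap in real browser strings and proxies you actually have access to. The locale is kept fixed on purpose, since changing it probably changes the results.

```python
import random
import urllib.parse
import urllib.request

# Placeholder pools (assumptions): replace with real values of your own.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) ExampleBrowser/1.0",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) ExampleBrowser/2.0",
    "Mozilla/5.0 (X11; Linux x86_64) ExampleBrowser/3.0",
]
PROXIES = ["http://proxy-uk.example:8080", "http://proxy-us.example:8080"]


def pick_identity(locale="en-GB", rng=random):
    """Choose a random browser string and proxy; keep the locale fixed."""
    return {
        "user_agent": rng.choice(USER_AGENTS),
        "proxy": rng.choice(PROXIES),
        "locale": locale,
    }


def build_request(query, identity):
    """Return (opener, request) for one search; nothing is sent here."""
    url = "https://www.google.com/search?" + urllib.parse.urlencode({"q": query})
    request = urllib.request.Request(url, headers={
        "User-Agent": identity["user_agent"],
        "Accept-Language": identity["locale"],
    })
    proxy = identity["proxy"]
    # Route both plain and TLS traffic through the chosen proxy.
    opener = urllib.request.build_opener(
        urllib.request.ProxyHandler({"http": proxy, "https": proxy})
    )
    return opener, request
```

To actually send the search you would call `opener.open(request, timeout=...)`, ideally after a pause between requests. This only covers the headers and the proxy; it does not make a script look like a real browser in every respect.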