A suggestion to help with Linkscape crawling and data processing
-
Since you guys are understandably struggling with crawling and processing the sheer number of URLs and links, I came up with this idea:
In a similar way to how SETI@Home works (is that still a thing? Google says yes: http://setiathome.ssl.berkeley.edu/), could SEOmoz use distributed computing amongst SEOmoz users to help with the data processing? Would people be happy to offer up their idle processor time and (optionally) their internet connections to get more accurate, broader data?
Are there enough users of the data to make distributed computing worthwhile?
Perhaps those who crunched the most data each month could receive moz points or a free month of Pro.
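To make the idea concrete, here is a rough sketch of how a SETI@Home-style scheme might hand out work and keep a tally for monthly rewards. It is only an illustration under assumptions: the `VolunteerPool` class, its methods and the batch size are hypothetical, not anything SEOmoz has built, and a real system would also need to verify the data volunteers send back.

```python
from collections import Counter, deque


class VolunteerPool:
    """Hypothetical SETI@Home-style coordinator (not an actual SEOmoz API).

    URLs are split into fixed-size work units that volunteers request, and
    each volunteer is credited with the number of URLs they process.
    """

    def __init__(self, urls, batch_size=100):
        # Pre-split the URL list into work units of at most batch_size URLs.
        self.queue = deque(
            urls[i:i + batch_size] for i in range(0, len(urls), batch_size)
        )
        self.credits = Counter()  # volunteer name -> URLs processed

    def next_unit(self):
        """Return the next batch of URLs, or None once all are handed out."""
        return self.queue.popleft() if self.queue else None

    def report(self, volunteer, unit):
        """Credit a volunteer for a finished unit (no result validation here)."""
        self.credits[volunteer] += len(unit)

    def leaderboard(self, n=3):
        """Top volunteers by URLs processed, e.g. for monthly rewards."""
        return self.credits.most_common(n)


urls = [f"https://example.com/page{i}" for i in range(5)]
pool = VolunteerPool(urls, batch_size=2)
for volunteer in ["alice", "bob", "alice"]:
    unit = pool.next_unit()
    if unit is not None:
        pool.report(volunteer, unit)
print(pool.leaderboard())  # [('alice', 3), ('bob', 2)]
```

Crediting by URLs processed rather than by units completed keeps the tally fair when the last unit is smaller than the others.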
I have submitted this as a suggestion here:
http://seomoz.zendesk.com/entries/20458998-crowd-source-linkscape-data-processing-and-crawling-in-a-similar-way-to-seti-home -
Sean - I share Rand's sentiments, thanks so much for the suggestion!
We have considered distributed crawling in the past (or even distributed rank checking, since the rankings would then come from that user's locale), but it brings a whole different set of challenges. For example, you have to handle all the edge cases: what if a user's computer isn't on or loses connectivity, what if we crawl too fast and the user gets blocked from a site, and how do we write all that data securely?
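As a rough illustration of how the first two edge cases are commonly handled, a coordinator can lease each URL to a volunteer for a limited time (reclaiming it if no result arrives) and enforce a minimum delay between requests to the same host. This is a hypothetical sketch, not SEOmoz code: `CrawlScheduler`, the lease length and the delay are assumptions, and it does not address writing the data securely.

```python
import time
from urllib.parse import urlsplit


class CrawlScheduler:
    """Hypothetical scheduler for volunteer crawling (illustrative only).

    - Leases: a URL handed to a volunteer is reclaimed if no result arrives
      before the lease expires (machine switched off, connection lost).
    - Politeness: each host gets at most one request per min_delay seconds,
      so a single volunteer is less likely to be blocked by a site.
    """

    def __init__(self, lease_seconds=600.0, min_delay=5.0, clock=time.monotonic):
        self.lease_seconds = lease_seconds
        self.min_delay = min_delay
        self.clock = clock
        self.leases = {}    # url -> (volunteer, lease expiry time)
        self.last_hit = {}  # host -> time the last request was handed out

    def assign(self, url, volunteer):
        """Lease url to volunteer unless it is already leased or its host is too busy."""
        now = self.clock()
        host = urlsplit(url).hostname
        lease = self.leases.get(url)
        if lease is not None and lease[1] > now:
            return False  # another volunteer still holds a live lease
        if now - self.last_hit.get(host, float("-inf")) < self.min_delay:
            return False  # too soon for this host; try again later
        self.leases[url] = (volunteer, now + self.lease_seconds)
        self.last_hit[host] = now
        return True

    def complete(self, url, volunteer):
        """Accept a result only from the volunteer holding a live lease."""
        lease = self.leases.get(url)
        if lease is None or lease[0] != volunteer or lease[1] <= self.clock():
            return False
        del self.leases[url]
        return True


fake_now = [0.0]
sched = CrawlScheduler(lease_seconds=60, min_delay=5, clock=lambda: fake_now[0])
print(sched.assign("https://example.com/a", "alice"))    # True
print(sched.assign("https://example.com/b", "bob"))      # False: same host within 5 s
fake_now[0] = 61.0                                       # alice's lease has expired
print(sched.assign("https://example.com/a", "bob"))      # True: reassigned to bob
print(sched.complete("https://example.com/a", "alice"))  # False: stale result
```

Passing the clock in as a parameter keeps the example deterministic; in practice the default `time.monotonic` would be used.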
Of course, all of these concerns can be overcome, but right now we feel we have a good handle on the problems, and it will be much faster for us to just fix what we have.
That said, all of us are so appreciative of the ideas and support, and we will have something really great soon!
-
Thanks a ton, Sean! We have considered distributed computing as a way to help crawl, index, process, etc. It's so flattering and humbling to hear that you'd be willing to help out, and that the community would, too.
For now, we believe we can get to the index size/quality/freshness using our hosted system, but the engineering team will certainly be encouraged to hear that folks in our community might contribute to this. Distributed systems present their own challenges, and we'd have to write that code from scratch, but if we find that we can't do what we want with our existing network, we might reach out.
BTW - I wanted to let folks know that the team here does feel very confident that come December/January, we're going to be producing indices that reach exceptional quality bars. The problems we face are largely known, and we now have the team and the solutions to tackle them, so we're pretty excited.
Related Questions
-
Website cannot be crawled
I have received the following message from Moz on a few of our websites now: "Our crawler was not able to access the robots.txt file on your site. This often occurs because of a server error from the robots.txt. Although this may have been caused by a temporary outage, we recommend making sure your robots.txt file is accessible and that your network and server are working correctly. Typically errors like this should be investigated and fixed by the site webmaster." I have spoken with our webmaster, and they have advised the following: "The robots.txt file is definitely there on all pages and Google is able to crawl these files. Moz, however, is having some difficulty finding the files when a particular redirect is in place. For example, the page currently redirects from threecounties.co.uk/ to https://www.threecounties.co.uk/, and when this happens, the Moz crawler cannot find the robots.txt on the first URL, which generates the reports you have been receiving. From what I understand, this is a flaw in the Moz software and not something that we could fix from our end. Going forward, something we could do is remove these rewrite rules to www., but these are useful redirects and removing them would likely have SEO implications." Has anyone else had this issue, and is there anything we can do to rectify it, or should we leave it as is?
Moz Pro | | threecounties0 -
New to Moz and wanted a bit of help with my report
Hi, I have used the Moz report to analyse one of my friend's sites, and I wanted to query a few warnings it highlighted and get people's thoughts on how important they are. The first is duplicate descriptions/titles. This is mainly down to the e-commerce pages. First, duplicate content: on some pages the description is identical and all that differs is the title and picture. Is this an issue? Duplicate pages: due to the way the website's folder structure/categories have been created, some pages are identical, but because the product comes under two categories there are two separate pages. Should we use the canonical tag on one of the pages? Also, regarding the canonical tag, they have put link rel="canonical" on every page pointing at itself, so it is not really being used in the way it is meant to be. Could something like this cause any harm? The final thing is internal linking back to the homepage. If, for example, the homepage is http://www.test.com, when linking back is it best to use the full URL rather than "index.html", even though they are the same page? Any help really appreciated. Dan
Moz Pro | | dannylancs0 -
Moz Data Issues?
Since the launch of Moz something or other has been wrong with my data. Is everyone having these issues? Or is it just me?
Moz Pro | | EcommerceSite0 -
Crawl Diagnostics: display problem in Excel
Hi Mozers, I've just finished watching the Crawl Diagnostics webinar, and when I try to export one of my campaigns to CSV format, it doesn't display properly in Microsoft Excel. All the column headers end up in column A, so I can't do anything with the data: I can't organize it... It's totally unreadable. What can I do? Thank you for your answers. Jonathan
Moz Pro | | JonathanLeplang0 -
When is the next Linkscape update?
Hi, can anyone on the SEOmoz team tell me when the next Linkscape index update is going to be? Rand mentioned a few weeks ago that the aim was to launch it before 02/29. Just wondering if anyone had some updated info? Thanks, Rob.
Moz Pro | | 87ROB0 -
Where is the keyword difficulty tool data sourced from?
I also use Market Samurai, and I've noticed what seem to be big discrepancies between its keyword data (which comes from Majestic SEO) and the Keyword Difficulty Tool's. To take just one example, I analyze the term "how to remove tea stains". In the Keyword Difficulty Tool, this returns the following: Root Domain Linking Root Domains: 2,233; Page Linking Root Domains: 4. When I use Market Samurai, however, the data returned is: RDD (domains linking to this domain): 19,911; RDP (domains linking to this page): 19. I thought these two metrics were the same in both tools, but I've written them out in case someone sees a difference. As I say, Market Samurai's data is sourced from Majestic SEO, a reputable SEO company, but I have no idea where the Keyword Difficulty Tool data comes from, nor why the differences are so pronounced. Are they indeed the same metrics in both cases, or am I missing something? Any insight would be much appreciated.
Moz Pro | | ZakGottlieb710 -
Sub-domain not crawled
One of our sites was recently redesigned. The home page is a landing page (www.labadieauto.com), and I moved the blog to this domain (labadieauto.com/blog/) and put a link in the bottom left of the home page. Since the change, the SEOmoz campaign overview has been showing only 1 page crawled. This is not set up as a sub-domain, so why isn't it showing in the crawl? Help!
Moz Pro | | LabadieAuto0 -
Is there any way to manually initiate a crawl through SEOMoz?
... or do you actually have to wait a week for the next scheduled crawl date on a particular campaign? We've just made a ton of changes to our site, and it would be helpful to know if they will generate any warnings or errors sooner rather than later. Thanks!
Moz Pro | | jadeinteractive1