Moz crawler finding my homepage multiple times
-
Hi and thank you in advance for your help!
I have a Moz Pro campaign running (I am a complete Moz novice, by the way) for one of my websites (balloonsutah.com). After crawling my site, the Moz crawler informed me that I have 3 pages with duplicate content. I am not sure exactly why this is happening, but the crawler indexed my homepage 3 times under different URLs.
-balloonsutah.com
-balloonsutah.com/
-balloonsutah.com/index.html
I checked my FTP server and I cannot figure out for the life of me why the crawler is finding anything other than the index.html file.
I suppose I need to do something with rel="canonical", but I am not terribly familiar with that either.
Any suggestions would be greatly appreciated!
Keenan
-
You're welcome!
-
Great answer! I appreciate the time you spent spelling everything out in detail. Thank you!
-
First things first: I checked all of the web addresses, and they all exist. It would help to know whether or not you are using a CMS for your web pages.
All 3 pages have different page authority; that is, one version ranks higher than the others. I did a quick check with the Moz toolbar, and it looks like index.html has the highest authority.
Note that each of the 3 versions you listed has 2 variants of its own: one with the www and one without. Judging from the Moz toolbar, it looks like you rank better for the one without the 'www'. Rel canonical is a good option, but in this case I would try a server-side 301 redirect first. Again, I am not sure how much access you have to the server; you might need to contact your web admin, hosting company, etc.
You can read more about redirects here --> http://moz.com/learn/seo/redirection. If you don't have access to the server, you can try rel canonical instead. Read more here --> http://moz.com/learn/seo/duplicate-content
Example: you have www.example.com/page1.htm, /page2.htm, and /page3.htm, all with exactly the same content. Let's say that page1.htm is your main version. You can add the following to the head section of page2.htm and page3.htm: <link rel="canonical" href="http://www.example.com/page1.htm" />
"This tag tells Bing and Google that the given page should be treated as though it were a copy of the URL www.example.com/page1.htm and that all of the links and content metrics the engines apply should actually be credited toward the provided URL."
I would recommend not deleting the other versions. Instead, use a 301 redirect or rel canonical, since they all have some page authority, with index.html (the non-www version) having the highest. That decision is yours to make, but it looks like that's the one you want as the main version anyway.
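To make the canonical tag a bit more concrete, here is a minimal Python sketch of how a crawler might read it from a page. The page markup, the example.com URLs, and the names `CanonicalFinder` and `find_canonical` are all made up for illustration; this is not how Moz or any search engine actually implements it.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Collects the href of the first <link rel="canonical"> tag seen."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.canonical is not None:
            return
        attr_map = dict(attrs)
        # rel may hold several space-separated tokens; compare case-insensitively.
        rel_tokens = (attr_map.get("rel") or "").lower().split()
        if "canonical" in rel_tokens:
            self.canonical = attr_map.get("href")


def find_canonical(html_text):
    """Return the canonical URL declared in html_text, or None if there is none."""
    finder = CanonicalFinder()
    finder.feed(html_text)
    return finder.canonical


# Hypothetical markup for page2.htm, pointing at page1.htm as the main version.
PAGE2_HTML = """
<html>
  <head>
    <title>Page 2</title>
    <link rel="canonical" href="http://www.example.com/page1.htm" />
  </head>
  <body>Same content as page 1.</body>
</html>
"""

print(find_canonical(PAGE2_HTML))
```

A page without the tag returns None here, which is roughly the situation the three homepage URLs are in right now: nothing tells the crawler which one is the main version.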
ALSO,
You can tell Google which version you prefer in GWT (Google Webmaster Tools). You can read more here.
https://support.google.com/webmasters/answer/44231?hl=en
"Once you tell us your preferred domain name, we use that information for all future crawls of your site and indexing refreshes. For instance, if you specify your preferred domain as http://www.example.com and we find a link to your site that is formatted as http://example.com, we follow that link as http://www.example.com instead. In addition, we'll take your preference into account when displaying the URLs. If you don't specify a preferred domain, we may treat the www and non-www versions of the domain as separate references to separate pages."
"Note: Once you've set your preferred domain, you may want to use a 301 redirect to redirect traffic from your non-preferred domain, so that other search engines and visitors know which version you prefer."
You cannot control the www and non-www versions of your website, but you can control whether duplicate pages are created, especially of your home page. I am guessing the duplicates were created by your CMS; index.html was probably created by you. Furthermore, I think .com/ and .com are one and the same thing. You probably had to choose when you set up your campaign in Moz: they likely asked for your domain's web address, and you probably entered something like "balloonsutah.com".
I'm not exactly sure why it showed you both .com and .com/, but it makes sense that it would show .com and /index.html, as they are two different pages. Even though the content is the same, they are still two different URLs.
I probably wouldn't worry too much about it, but I'll let one of the Moz members answer about .com and .com/. I would concern myself more with 301 redirects and rel canonicals.
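On the .com versus .com/ question, a small Python sketch may help. As written, the two URLs are different strings, but an HTTP client sends "/" as the request path when the URL has no path at all, so the server gets the same request for both. The helper name `request_path` is made up for this example, and it ignores query strings to keep things short.

```python
from urllib.parse import urlsplit


def request_path(url):
    """Return the path a client would put in the HTTP request line for url."""
    path = urlsplit(url).path
    # HTTP clients send "/" when the URL's path component is empty.
    return path or "/"


urls = [
    "http://balloonsutah.com",
    "http://balloonsutah.com/",
    "http://balloonsutah.com/index.html",
]
for url in urls:
    print(url, "->", request_path(url))
```

The first two URLs end up asking for the same path, while /index.html is a separate request, which is why that one is the real duplicate to deal with.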
Hope I helped.
-
Thank you for the help!
-
Hello Keenan-price,
Welcome to the Moz community!
Moz is reporting these duplicates correctly. Each of the listed URLs is seen as a unique URL and a unique page. This is a common problem when a website does not have the proper canonical tags and 301 redirects in place for these URLs.
You'll want to decide on how your website should be displayed (which URL you prefer) and implement the canonical tag and 301 redirects.
The 301 redirects could be done with your .htaccess file, depending on your site environment. The canonical tags would likewise depend on your site's environment (WordPress, custom development, etc.).
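Before writing any server rules, it can help to spell out the redirect decision itself. The Python sketch below is only an illustration of that logic, not an .htaccess file. It assumes the non-www host and the bare "/" homepage are the preferred versions, plain http, and index.html / index.htm as the default document names; swap those for whatever you decide on. The name `redirect_target` is made up for this example.

```python
# Assumptions for illustration only: change these to match your own choice.
PREFERRED_HOST = "balloonsutah.com"          # non-www version preferred
INDEX_FILES = ("index.html", "index.htm")    # default document names
SCHEME = "http"


def redirect_target(host, path):
    """Return the URL to 301-redirect to, or None if the request is already canonical."""
    host = host.lower()
    path = path or "/"

    # Fold the www and non-www hosts into the preferred one.
    new_host = host
    if host in (PREFERRED_HOST, "www." + PREFERRED_HOST):
        new_host = PREFERRED_HOST

    # Strip a trailing index file so /index.html becomes /.
    new_path = path
    for name in INDEX_FILES:
        if new_path.endswith("/" + name):
            new_path = new_path[: -len(name)]
            break

    if (new_host, new_path) == (host, path):
        return None
    return SCHEME + "://" + new_host + new_path


for host, path in [
    ("balloonsutah.com", "/"),
    ("balloonsutah.com", "/index.html"),
    ("www.balloonsutah.com", "/"),
    ("www.balloonsutah.com", "/index.html"),
]:
    print(host + path, "->", redirect_target(host, path))
```

Every variant either stays put or points at one single URL, which is the behavior the 301 rules on the server should reproduce.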
Also, make sure to go into your Google Webmaster Tools account and specify a single page as being the correct page, once you've decided on how you want the URL to be displayed.