Sitemap URLs not being indexed
-
Many of the sitemap URLs on one of our sites are not being indexed (at least 70% are not indexed).
The URLs in the sitemap are normal URLs without any strange characters attached, but after looking into it, it seems a lot of them get a "#." plus a character sequence appended once you actually visit the URL. We are not sure if the "AddThis" bookmark widget could be causing this, or if another script is doing it.
For example:
URL in the sitemap: http://example.com/example-category/0246
URL once you actually go to that link: http://example.com/example-category/0246#.VR5a
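The two URLs above differ only in the fragment (the part after the "#"). Browsers do not send the fragment to the server, so both forms request the same page. As a minimal sketch, using only Python's standard library and the example URLs from this post, you can confirm that stripping the fragment gives back the sitemap URL:

```python
from urllib.parse import urldefrag

# Example URLs taken from the post above.
sitemap_url = "http://example.com/example-category/0246"
browser_url = "http://example.com/example-category/0246#.VR5a"

# urldefrag splits a URL into (url_without_fragment, fragment).
stripped, fragment = urldefrag(browser_url)

print(stripped)                  # the URL with the fragment removed
print(fragment)                  # the part that was appended after '#'
print(stripped == sitemap_url)   # whether it matches the sitemap entry
```

If the comparison holds for your real URLs too, the appended code is purely a client-side addition rather than a different page.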
For further information, the XML file does not have any style information associated with it and is in its most basic form.
Has anyone had similar issues with their sitemap not being indexed properly? Could this be the cause of many of these URLs not being indexed?
Thanks all for your help.
-
Anders,
Thanks for the reply. I definitely agree a self-referencing canonical would be a good addition to these product pages, so I'm adding that to our to-do list in case indexing does not improve.
In terms of indexing pages: we have not restricted crawl frequency; we have it set to "allow Google to determine the optimal crawl rate". No other warnings were found in Search Console either.
Thanks for your help.
-
I agree - Google would probably ignore everything after the "#".
But have you tried adding a <link rel="canonical" href="http://example.com/page-url" /> to your pages to see if that updates it? Also, add the sitemap to your robots.txt if you have not already done so.
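To check whether a page already declares a canonical URL, a small parser is enough. The following is only a sketch using Python's standard library; the class and function names (CanonicalFinder, find_canonicals) and the sample HTML are made up for illustration, and in practice you would feed it the HTML of your own product pages:

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Collects the href of every <link rel="canonical"> tag in a page."""

    def __init__(self):
        super().__init__()
        self.canonicals = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr_map = dict(attrs)
        # rel may hold several space-separated values; compare case-insensitively.
        rel_values = (attr_map.get("rel") or "").lower().split()
        if "canonical" in rel_values and attr_map.get("href"):
            self.canonicals.append(attr_map["href"])


def find_canonicals(html):
    """Return a list of canonical URLs declared in the given HTML string."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonicals


# Hypothetical page source, using the tag quoted in the reply above.
sample_html = """
<html><head>
<link rel="canonical" href="http://example.com/page-url" />
</head><body></body></html>
"""

canonicals = find_canonicals(sample_html)
print(canonicals)
```

An empty list means the page declares no canonical; more than one entry means conflicting tags, which is worth fixing as well.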
Regarding indexing pages - have you restricted crawl frequency in Google Search Console, or is it set to be determined by Googlebot? Any other warnings or messages in Search Console?
Best regards,
Anders
-
Lesley,
Thanks for the confirmation on that one and the article. Since it doesn't seem like a lot of people on the site are using that address share function, I do not think it would do any harm to remove it.
At least we know the root cause of why it's doing it to the URLs. Now the real question is: could it be getting in the way of indexing those URLs? One would think not since, from what I've read, Google simply ignores what comes after the #.
Thoughts?
Appreciate the help.
-
Patrick,
We'd prefer to keep the actual URLs private, but I can provide further information to help the community dissect this:
- It's an e-commerce website, meaning many facets, filters, and possible duplicate-content angles
- Many of the static pages (/products main page, /contact, etc.) are indexed, but the individual product pages are mostly not being indexed through the sitemap
- While the number of URLs reported in Webmaster Tools under "Index" has also been going down steadily, that decline doesn't correspond with the gap between pages indexed and pages submitted in the sitemap
- We have checked robots.txt, and it is not blocking any important pages. (I also had them allow robots to crawl CSS and JS so Google could have full access.)
- The individual product pages all have the "AddThis" feature, meaning they all get a "#." plus a character sequence appended to their URLs. However, one would think this wouldn't be the cause of the lack of indexation?
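The robots.txt check in the list above can be repeated against the full list of sitemap URLs rather than by eye. This is a rough sketch with Python's standard-library robotparser; the robots.txt content and the URL list are invented for illustration, and the parser's matching is simpler than Googlebot's (for example, it does not handle wildcards the way Google does), so treat the result as a first pass only:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; replace with your site's actual file.
robots_txt = """\
User-agent: *
Disallow: /cart/
Disallow: /checkout/

Sitemap: http://example.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# Hypothetical URLs standing in for the entries of the real sitemap.
sitemap_urls = [
    "http://example.com/example-category/0246",
    "http://example.com/cart/view",
]

# Collect every sitemap URL that the rules would block for Googlebot.
blocked = [url for url in sitemap_urls if not parser.can_fetch("Googlebot", url)]

print(blocked)              # sitemap URLs that robots.txt disallows
print(parser.site_maps())   # Sitemap: lines declared in robots.txt (Python 3.8+)
```

Any product URL that shows up in the blocked list is submitted in the sitemap but disallowed for crawling, which would be worth ruling out before looking at AddThis.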
Thanks for your help.
-
Yes, AddThis is doing this to your URLs. I hate it; that is one reason why I do not use them.
Here is an article on how to remove them: http://support.addthis.com/customer/portal/articles/1013558-removing-all-hashtags-anchors-weird-codes-from-your-urls
-
Hi there
Could you provide your website's URL? It would help the community take a deeper look - thanks!
Good luck!