Blocking https from being crawled
-
I have an ecommerce site where https is being crawled for some pages. Wondering if the below solution will fix the issue
www.example.com will be my domain
In the nav there is a login page www.example.com/login which is redirecting to the https://www.example.com/login
If I just disallowed /login in the robots file wouldn't it not follow the redirect and index that stuff?
The redirect part is what I am questioning.
-
Correct once /login gets redirected to https://www.example.com/login all nav links etc are https
What I ended up doing was blocking /login in robots and now doing canonicals on https as well as nofollow the /login link that is in the nav that redirects
Willl see what happens now.
-
So, the "/login" page gets redirected to https: and then every link on that page goes secure and Google crawls them all? I think blocking the "/login" page is a perfectly good way to go here - cut the crawl path, and you'll cut most of the problem.
You could request removal of "/login" in Google Webmaster Tools, too. Sometimes, I find that Robots.txt isn't great at removing pages that are already indexed. I would definitely add the canonical as well, if it's feasible. Cutting the path may not cut the pages that have already been indexed with https:.
Sorry, I'd actually reverse that:
(1) Add the canonicals, and let Google sweep up the duplicates
(2) A few weeks later, block the "/login" page
Sounds counter-intuitive, but if you block the crawl path to the https: pages first, then Google won't crawl the canonical tags on those versions. Use canonical to clean up the index, and then block the page to prevent future problems.
-
Gotcha. Yea I commented above how I was going to add a canonical as well as a noindex in the meta but was curious how it handled the redirect that was happening.
thanks for your help
-
Yea I was going to nofollow the link in the nav and add a meta tag but was curious how the robots file would handle this since the url is a redirect.
Thanks for your input
-
The pages that are being crawled under https, are the same pages available under http as well ? If yes, can you just add a canonical tag on these pages to go to the http version. That should fix it. And if your login page is the entry point, your fix will help as well. But then as Rebekah said, what if somebody is linking to your https page. I would suggest you look into making a canonical tag on these pages to http if that makes sense and is doable.
-
You can disallow the https portion in robots.txt, but remember robots.txt isn't always a sure fire way of not getting an area of your site crawled. If you have other important content to crawl from the secured page, be careful you are not blocking robots from there.
If this is linked to other places on the web, and the link doesn't include no-follow, search engines may still crawl the page. Can you change the link in your navigation to no-follow as well? I would also add a meta noindex tag to the page itself, and a canonical tag to the https version.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
How to stop crawls for product review pages? Volusion site
Hi guys, I have a new Volusion website. the template we are using has its own product review page for EVERY product i sell (1500+) When a customer purchases a product a week later they receive a link back to review the product. This link sends them to my site, but its own individual page strictly for reviewing the product. (As oppose to a page like amazon, where you review the product on the same page as the actual listing.) **This is creating countless "duplicate content" and missing "title" errors. What is the most effective way to block a bot from crawling all these pages? Via robots txt.? a meta tag? ** Here's the catch, i do not have access to every individual review page, so i think it will need to be blocked by a robot txt file? What code will i need to implement? i need to do this on my admin side for the site? Do i also have to do something on the Google analytics side to tell google about the crawl block? Note: the individual URLs for these pages end with: *****.com/ReviewNew.asp?ProductCode=458VB Can i create a block for all url's that end with /ReviewNew.asp etc. etc.? Thanks! Pardon my ignorance. Learning slowly, loving MOZ community 😃 1354bdae458d2cfe44e0a705c4ec38dd
Technical SEO | | Jerrion0 -
Linking domains on the same C Block together
Hey, I have an online store selling dj equipment, sound & light products such as speakers, lasers, decks, pa systems, karaoke systems etc. I just bought a new domain but I registered it under a different name and address (my personal details). And I plan on hosting the website on a seperate server so it has no connection with my eCommerce store. The main purpose of the website will be to review the products I sell, write detailed how to guides for DJ's, party planners, mobile DJ's etc. There will be links on the current ecommerce website (which currently gets around anything from 500 to 1000 unique hits a day) going to the new blog website. But would I be better off keeping it on the same C Block even though they are going to be two very different websites and the blog may not always necessarily be about the products on my ecommerce website and may be products on say eBay, Amazon, etc. (In otherwords, it's going to be it's own website with an unbiased opinion, but the ecommerce site will be linking to it on certain products that are reviewed on there). Any help is appreciated 🙂
Technical SEO | | tomhall900 -
Mysterious drop in the Number of Pages Crawled
The # of crawled pages on my campaign dashboard has been 90 for months. Approximate a week ago it dropped down to 25 crawled pages, and many links went with it. I have checked with my web master, and he said no changes have been made which would cause this to happen. I am looking for suggestions on how I can go about trouble shooting this issue, and possible solutions. Thanks in advance!
Technical SEO | | GladdySEO0 -
Pagination/Crawl Errors
Hi, Ive only just joined SEO moz and after they crawled my site they came up with 3600 crawl errors mostly being duplicate content and duplicate urls. After researching this it soon became clear it was due to on page pagination and after speaking with Abe from SEO mozhe advised me to take action by getting our developers to implement rel=”next” & rel=”prev” to review. soon after our developers implemented this code ( I have no understanding of this what so ever) 90% of my keywords I had been ranking for in the top 10 have dropped out the top 50! Can anyone explain this or help me with this? Thanks Andy
Technical SEO | | beck3980 -
Googlebot take 5 times longer to crawl each page
Hello All From about mid September my GWMT has show that the average time to crawl a page on my site has shot up from an average of 130ms to an average of 700ms and peaks at 4000ms. I have checked my server error logs and found nothing there, I have checked with the hosting comapny and there are no issues with the server or other sites on the same server. Two weeks after this my ranking fell by about 950 places for most of my keywords etc.I am really just trying to eliminate this as a possible cause, of these ranking drops. Or was it the Pand/ EMD algo that has done it. Many Thanks Si
Technical SEO | | spes1230 -
Https indexed - though a no index no follow tag has been added
Hi, The https-pages of our booking section are being indexed by Google. We added But the pages are still being indexed. What can I do to exclude these URL's from the Google index? Thank you very much in advance! Kind regards, Dennis Overbeek ACSI Publishing | [email protected]
Technical SEO | | SEO_ACSI0 -
HTTPS attaching to home page
Hi!! Okay - weird tech question. Domain is http://hiphound.com. I have SSL attaching to checkout and my account pages. Tested and works well. Issue - I am able to reach the home page at https://hiphound.com AND http://hiphound.com. If I access the home page via HTTPS and click on a link (any link) then the site is redirected to HTTP again which is good. My concern is the home page displaying via HTTPS and HTTP. Is this is an issue that can be resolved or is it expected behavior I have to live with.? I am being told by DEV there is nothing they can do about it but want to understand why and if they are correct. Thoughts? Thank you!! Lynn
Technical SEO | | hiphound0