Crawling/indexing of near duplicate product pages
-
Hi,
Hope someone can help me out here. This is the current situation:
We sell stones/gravel/sand/pebbles etc. for gardens. I will take a type of pebbles and the corresponding pages/URL's to illustrate my question --> black beach pebbles.
- We have a 'top' product page for black beach pebbles on which you can find different types of quantities (differing from 20kg untill 1600 kg).
- There is not any search volume related to the different quantities
- The 'top' page does not link to the pages for the different quantities
- The content on the pages for the different quantities is not exactly the same (different price + slightly different content). But a lot of the content is the same.
Current situation:
- Most pages for the different quantities do not have internal links (about 95%)- But the sitemap does contain all of these pages.
- Because the sitemap contains all these URL's, google frequently crawls them (I checked the logfiles) and has indexed them.
Problems:
- Google spends its time crawling irrelevant pages --> our entire website is not that big, so these quantity URL's kind of double the total number of URL's.
- Having url's in the sitemap that do not have an internal link is a problem on its own
- All these pages are indexed so all sorts of gravel/pebbles have near duplicates.
My solution:
- remove these URL's from the sitemap --> that will probably stop Google from regularly crawling these pages
- Putting a canonical on the quantity pages pointing to the top-product page. --> that will hopefully remove the irrelevant (no search volume) near duplicates from the index
My questions:
- To be able to see the canonical, google will need to crawl these pages. Will google still do that after removing them from the sitemap?
- Do you agree that these pages are near duplicates and that it is best to remove them from the index?
- A few of these quantity pages do have intenral links (a few procent of them) because of a sale campaign. So there will be some (not much) internal links pointing to non-canonical pages. Would that be a problem?
Thanks a lot in advance for your help!
Best!
-
Hi Joseph, thanks for your reply, really helpful! 301 is not really an option, because these quantity URL's are sometimes used for promotions and need to be reachable. Therefore I guess canonicals are the second best solution.
We will implement the solution I described and see what will happen. Thanks again!
-
Hello there,
To answer your questions,
1. Google will still crawl your pages even if it's not from the sitemap unless you specify disallow from your robots.txt
2. If they are similar content with the main difference at "quantities" couldn't you consolidate them into one single page that lists all the quantities your company sell in and then 301 redirect the other pages to the consolidated one?
3. It doesn't seem like going to be causing any problem nor hurting your SEO performance, but you could always change these link to the canonical link.
Hope this helps,
Joseph Yap
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Google only indexing the top 2/3 of my page?
HI, I have a page that is about 5000 lines of code total. I was having difficulty figuring out why the addition of a lot of targeted, quality content to the bottom of the pages was not helping with rankings. Then, when fetching as Google, I noticed that only about 3300 lines were getting indexed for some reason. So naturally, that content wasn't going to have any effect if Google in not seeing it. Has anyone seen this before? Thoughts on what may be happening? I'm not seeing any errors begin thrown by the page....and I'm not aware of a limit of lines of code Google will crawl. Pages load under 5 seconds so loading speed shouldn't be the issue. Thanks, Kevin
Intermediate & Advanced SEO | | yandl1 -
Our Web Site Is candere.com. Its PA and back link status are different for https://www.candere.com, http://www.candere.com, https://candere.com, and http://candere.com. Recently, we have completely move from http to https.
How can we fix it, so that we may mot lose ranking and authority.
Intermediate & Advanced SEO | | Dhananjayukumar0 -
Will have /index in my url hurt?
I am trying to setup permalinks on a wordpress blog that is installed on iis. I can't update the web.config file so I have to make every page /index/pagetitle. as shown here-http://codex.wordpress.org/Using_Permalinks#PATHINFO:_.22Almost_Pretty.22 How much of a difference is there between no /index and having the /index in there?
Intermediate & Advanced SEO | | EcommerceSite0 -
What do you do with the page of a product that has been deleted?
As anyone know with an ecommerce website, products are constantly being added and removed. Once products are removed, the corresponding product pages are not reachable. Currently, I am redirecting to the Search page, if a product page is reached, whose corresponding product has been deleted. I am not sure if that is the correct, recommended technique from a SEO perspective. Should I try to show related products on the redirected page? Does anyone here know what is the best thing to do with this product page?
Intermediate & Advanced SEO | | amitramani0 -
How Long Does it Take for Rel Canonical to De-Index / Re-Index a Page?
Hi Mozzers, We have 2 e-commerce websites, Website A and Website B, sharing thousands of pages with duplicate product descriptions. Currently only the product pages on Website B are indexing, and we want Website A indexed instead. We added the rel canonical tag on each of Website B's product pages with a link towards the matching product on Page A. How long until Website B gets de-indexed and Website A gets indexed instead? Did we add the rel canonical tag correctly? Thanks!
Intermediate & Advanced SEO | | Travis-W0 -
Is it possible to get a list of pages indexed in Google?
Is there a tool that will give me a list of pages on my site that are indexed in Google?
Intermediate & Advanced SEO | | rise10 -
Sitemaps / Google Indexing / Submitted
We just submitted a new sitemap to google for our new rails app - http://www.thesquarefoot.com/sitemap.xml Which has over 1,400 pages, however Google is only seeing 114. About 1,200 are in the listings folder / 250 blog posts / and 15 landing pages. Any help would be appreciated! Aron sitemap.png
Intermediate & Advanced SEO | | TheSquareFoot0 -
Category Pages up - Product Pages down... what would help?
Hi I mentioned yesterday how one of our sites was losing rank on product pages. What steps do you take to improve the SERPS of product pages, in this case home/category/product is the tree. There isn't really any internal linking, except one link from the category page to each product, would setting up a host of internal links perhaps "similar products" linking them together be a place to start? How can I improve my ranking of these more deeply internal pages? Not just internal links?
Intermediate & Advanced SEO | | xoffie0