Crawling/indexing of near duplicate product pages
-
Hi,
Hope someone can help me out here. This is the current situation:
We sell stones, gravel, sand, pebbles, etc. for gardens. I will use one type of pebble and its corresponding pages/URLs to illustrate my question: black beach pebbles.
- We have a 'top' product page for black beach pebbles on which you can find the different quantities (ranging from 20 kg to 1600 kg).
- There is no search volume for the different quantities
- The 'top' page does not link to the pages for the different quantities
- The content on the pages for the different quantities is not exactly the same (different price + slightly different content). But a lot of the content is the same.
Current situation:
- Most pages for the different quantities (about 95%) do not have internal links. But the sitemap does contain all of these pages.
- Because the sitemap contains all these URLs, Google frequently crawls them (I checked the log files) and has indexed them.
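For anyone who wants to repeat the log file check, here is a minimal sketch. It assumes the server writes the common "combined" access log format, and the sample lines and paths are made up for illustration. It only looks at the user-agent string, which can be spoofed, so treat the counts as a rough indication rather than proof of real Googlebot visits.

```python
import re
from collections import Counter

# Assumed format: Apache/Nginx "combined" log, e.g.
# ip - - [date] "GET /path HTTP/1.1" status size "referer" "user-agent"
LOG_LINE = re.compile(
    r'"(?:GET|HEAD) (?P<path>\S+) HTTP/[\d.]+" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)

def googlebot_hits(lines):
    """Count requests per path whose user agent mentions Googlebot."""
    hits = Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        if match and "Googlebot" in match.group("agent"):
            hits[match.group("path")] += 1
    return hits

# Hypothetical sample lines; real paths will differ.
GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
SAMPLE_LOG = [
    f'66.249.66.1 - - [10/Oct/2023:13:55:36 +0000] "GET /black-beach-pebbles-20kg HTTP/1.1" 200 5120 "-" "{GOOGLEBOT_UA}"',
    f'66.249.66.1 - - [11/Oct/2023:09:12:01 +0000] "GET /black-beach-pebbles-20kg HTTP/1.1" 200 5120 "-" "{GOOGLEBOT_UA}"',
    '203.0.113.7 - - [11/Oct/2023:09:15:44 +0000] "GET /black-beach-pebbles HTTP/1.1" 200 8200 "-" "Mozilla/5.0 (Windows NT 10.0)"',
]

if __name__ == "__main__":
    for path, count in googlebot_hits(SAMPLE_LOG).most_common():
        print(path, count)
```

Running the same check again a few weeks after changing the sitemap would show whether the crawl frequency of the quantity URLs actually drops.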
Problems:
- Google spends its time crawling irrelevant pages --> our entire website is not that big, so these quantity URLs roughly double the total number of URLs.
- Having URLs in the sitemap that do not have an internal link is a problem in its own right
- All these pages are indexed, so every type of gravel/pebbles has near duplicates in the index.
My solution:
- Remove these URLs from the sitemap --> that will probably stop Google from regularly crawling these pages
- Put a canonical on the quantity pages pointing to the top product page --> that will hopefully remove the irrelevant (no search volume) near duplicates from the index
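The two steps above can be sketched roughly as follows. This is only an illustration under assumptions: the `example.com` URLs and the `CANONICAL_FOR` mapping are hypothetical, and in practice the sitemap and the canonical tag would normally be generated by the shop platform rather than by a standalone script.

```python
import xml.etree.ElementTree as ET
from html import escape

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
ET.register_namespace("", SITEMAP_NS)  # keep the output free of ns0: prefixes

# Hypothetical mapping: quantity URL -> 'top' product URL.
CANONICAL_FOR = {
    "https://example.com/black-beach-pebbles-20kg": "https://example.com/black-beach-pebbles",
    "https://example.com/black-beach-pebbles-1600kg": "https://example.com/black-beach-pebbles",
}

def prune_sitemap(xml_text, canonical_for):
    """Return the sitemap XML without the <url> entries for quantity pages."""
    root = ET.fromstring(xml_text)
    for url in list(root.findall(f"{{{SITEMAP_NS}}}url")):
        loc = url.findtext(f"{{{SITEMAP_NS}}}loc", default="").strip()
        if loc in canonical_for:
            root.remove(url)
    return ET.tostring(root, encoding="unicode")

def canonical_tag(page_url, canonical_for):
    """Build the <link rel="canonical"> tag for a page (self-referencing by default)."""
    target = canonical_for.get(page_url, page_url)
    return f'<link rel="canonical" href="{escape(target, quote=True)}">'

SAMPLE_SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/black-beach-pebbles</loc></url>
  <url><loc>https://example.com/black-beach-pebbles-20kg</loc></url>
  <url><loc>https://example.com/black-beach-pebbles-1600kg</loc></url>
</urlset>"""

if __name__ == "__main__":
    print(prune_sitemap(SAMPLE_SITEMAP, CANONICAL_FOR))
    print(canonical_tag("https://example.com/black-beach-pebbles-20kg", CANONICAL_FOR))
```

Keeping a single mapping for both steps means the sitemap and the canonical tags cannot drift apart: any URL that points its canonical elsewhere is automatically left out of the sitemap.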
My questions:
- To be able to see the canonical, Google will need to crawl these pages. Will Google still do that after they are removed from the sitemap?
- Do you agree that these pages are near duplicates and that it is best to remove them from the index?
- A few of these quantity pages (a few percent of them) do have internal links because of a sale campaign. So there will be some (not many) internal links pointing to non-canonical pages. Would that be a problem?
Thanks a lot in advance for your help!
Best!
-
Hi Joseph, thanks for your reply, really helpful! A 301 is not really an option, because these quantity URLs are sometimes used for promotions and need to stay reachable. Therefore I guess canonicals are the second-best solution.
We will implement the solution I described and see what will happen. Thanks again!
-
Hello there,
To answer your questions,
1. Google will still crawl your pages even if they are not in the sitemap, unless you disallow them in your robots.txt.
2. If the content is similar and the main difference is the quantity, couldn't you consolidate them into one single page that lists all the quantities your company sells, and then 301 redirect the other pages to the consolidated one?
3. It doesn't seem likely to cause any problems or hurt your SEO performance, but you could always change these links to point to the canonical URL.
Hope this helps,
Joseph Yap
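Since the canonical only works if Google can still fetch the quantity pages, it may be worth confirming that robots.txt does not block them. A minimal sketch using Python's standard `urllib.robotparser`, with made-up robots.txt contents and a hypothetical URL; note that this parser uses simple prefix matching and may not reproduce every detail of how Google interprets robots.txt (wildcards, for example).

```python
from urllib.robotparser import RobotFileParser

def googlebot_can_fetch(robots_txt, url):
    """Return True if the given robots.txt text lets Googlebot fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch("Googlebot", url)

# Hypothetical robots.txt variants for illustration.
OPEN_ROBOTS = """User-agent: *
Disallow: /checkout/
"""

BLOCKING_ROBOTS = """User-agent: *
Disallow: /black-beach-pebbles-
"""

QUANTITY_URL = "https://example.com/black-beach-pebbles-20kg"

if __name__ == "__main__":
    print(googlebot_can_fetch(OPEN_ROBOTS, QUANTITY_URL))
    print(googlebot_can_fetch(BLOCKING_ROBOTS, QUANTITY_URL))
```

With the second variant the quantity pages would be blocked from crawling, so Google could not read the canonical tag on them.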