[Advice] Dealing with an immense URl structure full of canonicals with Budget & Time constraint
-
Good day to you Mozers,
I have a website that sells a certain product online and, once bought, is specifically delivered to a point of sale where the client's car gets serviced.
This website has a shop, products and informational pages that are duplicated by the number of physical PoS. The organizational decision was that every PoS were supposed to have their own little site that could be managed and modified.
Examples are:
- Every PoS could have a different price on their product
- Some of them have services available and some may have fewer, but the content on these service page doesn't change.
I get over a million URls that are, supposedly, all treated with canonical tags to their respective main page. The reason I use "supposedly" is because verifying the logic they used behind canonicals is proving to be a headache, but I know and I've seen a lot of these pages using the tag.
i.e:
- https:mysite.com/shop/ <-- https:mysite.com/pointofsale-b/shop
- https:mysite.com/shop/productA <-- https:mysite.com/pointofsale-b/shop/productA
The problem is that I have over a million URl that are crawled, when really I may have less than a tenth of them that have organic trafic potential.
Question is:
For products, I know I should tell them to put the URl as close to the root as possible and dynamically change the price according to the PoS the end-user chooses. Or even redirect all shops to the main one and only use that one.I need a short term solution to test/show if it is worth investing in development and correct all these useless duplicate pages. Should I use Robots.txt and block off parts of the site I do not want Google to waste his time on?
I am worried about: Indexation, Accessibility and crawl budget being wasted.
Thank you in advance,
-
Hey Chris!
Thanks a lot for your time. I did send you a PM the day after your original post, I will send you another :).
Thanks a lot for your additionnal advice. You're right about managing client's expectations and its crucial. You're pointing out some valid points and I will have to ponder about how I approach this whole situation.
Charles,
-
Hey Charles,
No problem, I've been out of the office most of the past week so I'm trying to catch up on a few of these now, sorry! I don't recall seeing any PMs either.
I feel weird to recommend shaving 3/4 of their site on which they put a lot of money in.
That's perfectly normal and I'd have the same reservations. If you do decide to go ahead with it though (and I'm absolutely not looking to push you into a decision either way, just providing the info) you can highlight the fact that paying a lot of money for a website doesn't make it inherently good. If those extra pages are providing no unique value then they're just a hindrance to their long-term goal of earning a return from that site via organic traffic.
It's a conversation we have semi-regularly with new clients. They think that because they just spent $20k on a new site, making changes to it is silly and a waste of the money they invested in the first place. "Sure it's broken but it was expensive"... I don't think search engines or users really care how much it cost
in the eyes of the client, it may come off as bold.
It certainly is bold and don't be fooled, there is a reasonable chance their rankings will get worse before they get better. In some cases when we perform a cleanup like this we'll see a brief drop before a steady improvement.
This doesn't happen all the time by any means, in fact we did a smaller scale version of this last week for two new clients and both have already started moving ahead over the weekend without a drop in rankings prior. It's really just about managing expectations and pitching the long term benefit over the short term fear.
Just be very careful in the way you project-manage it - be meticulous with updating internal links and 301 any pages that have external links pointing to them as well. You want to end up with a clean, efficient and crawlable website that retains as much value as possible.
You understand many sets of eyes are directed at them and a lot is to gain.
Also a very valid concern!
I'm probably not telling you anything you don't already know anyhow so don't think I'm trying to lecture you on how to do your job, just sharing my knowledge and anecdotal evidence on similar things.
-
Hey Chris!
Thanks for that lenghty response. It is very much appreciated and so is your offer for help. Let me check with some people to see if I can share the company's name.
[EDIT] Sent you a private msgOne of the reason I want to test the waters is, to be real honest, I feel weird to recommend shaving 3/4 of their site on which they put a lot of money in. I guess it comes down to reassuring them that these changes will be positive, but in the eyes of the client, it may come off as bold.
Another thing is, it is an international business that have different teams for different country. For more than 20 countries, they are the only one to try and sell their product online. You understand many sets of eyes are directed at them and a lot is to gain.
-
Hi Charles,
That's a tough one! I definitely see the motivation to test the waters here first before you go spending time on it but it will likely take less time than you think and either way, the user experience will be significantly better once you're done so I'd expect that either way, your time/dev investment would likely be viable.
I suppose you could block certain sections via Robots and wait to measure the results but I'd be more inclined to throw on the gloves and get elbow deep!
You've already mentioned the issues the current structure causes so you are aware of them which is great. With those in mind, focus on the user experience. What is it they're looking for on your site? How would they expect to find it? Can they find the solution with as few clicks as practical?
Rand did a Whiteboard Friday recently on Cleaning up the Cruft which was a great overview of the broader areas you can often trim your site back down to size. For me anyway, the aim is to have as few pages on the site as practical. If a page(s), category, tag etc doesn't need to exist then just remove it!
It's hard to say or to give specific advice here without seeing your site but chances are if you were to sit down and physically map out your website you'd find a lot of redundancy that, once fixed, would cut your million pages down to a significantly more manageable number. A recent example of this for us was a client who had a bunch of redundant blog categories and tags as well as multiple versions of some URLs due to poor internal linking. We cut their total URL volume from over 300 to just 78 and that alone was enough to significantly improve their search visibility.
I'd be happy to take a closer look at this one if you're willing to share your URL, though I understand if you're not. Either way, the best place to start here will be reviewing your site structure and seeing if it truly makes sense.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Trailing Slashes on URLs
Hi we currently have a site on Wordpress which has two version of each URL trailing slash on URLs and one without it. Example: www.domain.com/page (preferred version - based on link data) www.domain.com/page**/** The non-slash version of the URL has most of the external links pointing to them, so we are going to pick that as the preferred version. However, currently, each version of every URL has rel canonical tag pointing to the non-preferred version. E.g. www.domain.com/page the rel canonical tag is: www.domain.com/page/ What would be the best way to clean up this setup? Cheers.
Intermediate & Advanced SEO | | cathywix0 -
Canonical or No-index
Just a quick question really. Say I have a Promotions page where I list all current promotions for a product, and update it regularly to reflect the latest offer codes etc. On top of that I have Offer announcement posts for specific promotions for that product, highlighting very briefly the promotion, but also linking back to the main product promotion page which has a the promotion duplicated. So main page is 1000+ words with half a dozen promotions, the small post might be 200 words, and quickly become irrelevant as it is a limited time news article. Now, I don't want the promotion page indexed (unless it has a larger news story attached to the promotion, but for this purpose presume it is doesn't). Initially the core essence of the post will be duplicated in the main Promotion page, but later as the offer expires it wouldn't be. Therefore would you Rel Canonical or just simply No-index?
Intermediate & Advanced SEO | | TheWebMastercom0 -
Rel=canonical
My website is built around a template, the hosting site say I can only add code into the body of the webpage not the header, will this be ok for rel=canonical If it is my next question is redundant but as there is only one place to put it which urls do I need to place in the code http://domain.com, www.domain.com or http://www.domain.com the /default.asp option for my website does not seem to exist, so I guess is not relevant thanks
Intermediate & Advanced SEO | | singingtelegramsuk0 -
Canonical OR redirect
Hi, i've a site about sport which cover matches. for each match i've a page. last week there was a match between: T1 v T2 so a page was created: www.domain.com/match/T1vT2 - Page1 this week T2 host T1, so there's a new page www.domain.com/match/T2vT1 - Page2 each page has a unique content with Authorship, but the URL, Title, Description, H1 look very similar cause the only difference is T2 word before T1. though Page2 is available for a few days, on site links & sitemap, for the search query "T2 T1 match" Page1 appears on the SERP (high location). of course i want Page2 to be on SERP for the above query cause it's the relevant match. i even don't see Page2 anywhere on the SERP and i think it wasn't indexed. Questions: 1. do you think google see both pages as duplicated though the content is different? 2. is there a difference when you search for T1 vs T2 OR T2 vs T1 ? 3. should i redirect 301 Page1 to Page2? consider that all content for Page1 and the Authorship G+ will be lost. 4. should i make rel=canonical on Page1 to Page2? 5. should i let google sort it out? i know it's a long one, thanks for your patience. Thanks, Assaf
Intermediate & Advanced SEO | | stassaf0 -
Google News URL Structure
Hi there folks I am looking for some guidance on Google News URLs. We are restructuring the site. A main traffic driver will be the traffic we get from Google News. Most large publishers use: www.site.com/news/12345/this-is-the-title/ Others use www.example.com/news/celebrity/12345/this-is-the-title/ etc. www.example.com/news/celebrity-news/12345/this-is-the-title/ www.example.com/celebrity-news/12345/this-is-the-title/ (Celebrity is a channel on Google News so should we try and follow that format?) www.example.com/news/celebrity-news/this-is-the-title/12345/ www.example.com/news/celebrity-news/this-is-the-title-12345/ (unique ID no at the end and part of the title URL) www.example.com/news/celebrity-news/celebrity-name/this-is-the-title-12345/ Others include the date. So as you can see there are so many combinations and there doesnt seem to be any unity across news sites for this format. Have you any advice on how to structure these URLs? Particularly if we want to been seen as an authority on the following topics: fashion, hair, beauty, and celebrity news - in particular "celebrity name" So should the celebrity news section be www.example.com/news/celebrity-news/celebrity-name/this-is-the-title-12345/ or what? This is for a completely new site build. Thanks Barry
Intermediate & Advanced SEO | | Deepti_C0 -
No index, follow vs. canonical url
We have a site that consists almost entirely as a directory of videos. Example here: http://realtree.tv/channels/realtreeoutdoorsclassics We're trying to figure out the best way to handle pagination and utility features such as sort for most recent, most viewed, etc. We've been reading countless articles on this topic, but so far have been unable to determine what might be considered the industry standard. Two solutions seem to stand out... Using the canonical url on all the sorted and paginated pages. However, after reading many blog posts, it seems that you should NEVER use the canonical url to solve the issue of paginated, and thus duplicated content because the search bots will never crawl past the first page leaving many results not in the index. (We are considering ruling this method out.) Another solution seems to be using the meta tag for noindex, follow so that a search engine like Google will crawl your directory pages but not add them to the index themselves. All links are followed so content is crawled and any passing link juice remains unchanged. However, I did see a few articles skeptical of this solution as well saying that there are always better alternatives, or that there is no verification that search engines obey this meta tag. This has placed some doubt in our minds. I was hoping to get some expert advice on these methods as it would pertain to our site. Thank you.
Intermediate & Advanced SEO | | grayloon0 -
Rel=Canonical URLs?
If I had two pages: PageA about Cats PageB about Dogs If PageA had a link rel=canonical to PageB, but the content is different, how would Google resolve this and what would users see if they searched "Cats" or "Dogs?" If PageA 301 redirected to PageB, (no content in PageA since it's 301 redirected), how would Google resolve this and what would users see if they searched "Cats" or "Dogs?"
Intermediate & Advanced SEO | | visionnexus0 -
URL Length or Exact Breadcrumb Navigation URL? What's More Important
Basically my question is as follows, what's better: www.romancingdiamonds.com/gemstone-rings/amethyst-rings/purple-amethyst-ring-14k-white-gold (this would fully match the breadcrumbs). or www.romancingdiamonds.com/amethyst-rings/purple-amethyst-ring-14k-white-gold (cutting out the first level folder to keep the url shorter and the important keywords are closer to the root domain). In this question http://www.seomoz.org/qa/discuss/37982/url-length-vs-url-keywords I was consulted to drop a folder in my url because it may be to long. That's why I'm hesitant to keep the bradcrumb structure the same. To the best of your knowldege do you think it's best to drop a folder in the URL to keep it shorter and sweeter, or to have a longer URL and have it match the breadcrumb structure? Please advise, Shawn
Intermediate & Advanced SEO | | Romancing0