Are there any negative side effects of having millions of URLs on your site?
-
After a site upgrade, we found that we have over 3.7 million URLs on our site. Many of these URLs are due to the facet options. Each facet combination yields a different URL. However, we need to do a deeper analysis into these URLs to see if this is the only reason why so many are returning.
Does anyone know if there are any negatives of having so many URLs crawled, other than the fact that Google only spends so much time crawling a site? Is the number of URLs something that should be concerning?
Any insight appreciated!
-
Agree with the points above with one exception. Yes, you have to find a way to deal with duplicate and quality content at scale. Yes, Robots.txt, nofollow links and index sitemaps are your friends. I would not use rel=canonical unless I had to. Better to get those extra pages de-indexed and then not let Google crawl the urls with the extra parameters to start with. Why waste Google's time in crawling pages that are just resorted versions of another? If you use the directives wisely you probably "only" have 200,000 pages worth crawling if you have that many sort parameters.
Good luck!
-
I'll echo Robert's concern about duplicate content. If those facet combinations are creating many pages with very similar content, that could be an issue for you.
If, let's say, there are 100 facet combinations that create essentially the same basic page content, then consider taking facet elements that do NOT substantially change the page content, and use rel=canonical to tell Google that those are all really the same page. For instance, let's say one of the facets is packaging size, and product X comes in boxes of 1, 10, 100, or 500 units. Let's say another facet is color, and it comes in blue, green, or red. Let's say the URLs for these look like this:
www.mysite.com/product.php?pid=12345&color=blue&pkgsize=1
www.mysite.com/product.php?pid=12345&color=green&pkgsize=10
www.mysite.com/product.php?pid=12345&color=red&pkgsize=100
You would want to set the rel=canonical on all of these to:
www.mysite.com/product.php?pid=12345
Be sure that your XML sitemap, your on-page meta robots, and your rel=canonicals are all in agreement. In other words, if a page has meta robots "noindex,follow", it should NOT show up in your XML sitemap. If the pages above have their rel=canonicals set as described, then your sitemap should contain www.mysite.com/product.php?pid=12345 and NONE of the three example URLs with the color and pkgsize parameters above.
-
There are several concerns to be addressed with this scenario:
- Organization
This is going to be very difficult to keep track of. If you are well-organized or the pages will not need much adjusting, this is probably okay.
- Duplicate Content
This is going to be a pain the behind. That being said, most site auditing tools will allow you to make adjustments as necessary.
- Broken Links
With a site of this size, broken links and 404's are going to be inevitable. This could lead to some negative SEO impacts and will have to be kept on top of.
- Hacking
This is a big reason why some sites have enormous numbers of URLs. This would likely be the biggest concern on my mind and worth looking in to. Going through that many pages will be impossible, so it might be worth taking a look at the link profile and determining where most of your links are coming from. If these are coming from spammy sites, you may have a problem there.
All this being said, the size of a website is normally not a cause for concern. Just make sure that your main pages (Home, Landing Pages) are properly handled and optimized and you shouldn't have too much trouble. I would add that unwieldy htaccess files (large ones) can result in slower loading times, which can impact your rankings with Google.
Let me know if there is anything specific concerning you and I will be happy to help. Congrats on the upgrade and hope it works out!
Rob
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Clean URL vs. Parameter URL and Using Canonical URL...That's a Mouthfull!
Hi Everyone, I a currently migrating a Magento site over to Shopify Plus and have a question about best practices for using the canonical URL. There is a competitor that I believe is not doing it the correct way, so I want to make sure my way is the better choice. With 'Vendor Pages' in Shopify, they show up looking like: https://www.campusprotein.com/collections/vendors?q=Cellucor. Not as clean. Problem is that Shopify also creates https://www.campusprotein.com/collections/cellucor. Same products, same page, just a different more clean URL. I am seeing both indexed in Google. What I want to do is basically create a canonical URL from the URL with the parameter that points to the clean URL. The two pages are very similar. The only difference is that the clean URL page has some additional content at the top of the page. I would say the two pages are 90% the same. Do you see any issue with that?
Technical SEO | | vetofunk0 -
Strange URL's for client's site
We just picked up a new client and I've been doing some digging around on their site. They have quite the wide variety of URL's that make for a rather confusing experience. One of the milder examples is their "About" page. Normally I would expect something along the lines of: www.website.com/about I see: www.website.com/default.asp?Page=About I'm typically a graphic designer and know basically nothing about code, but I just assume this has something funky to do with how their website was constructed. I'm assuming this isn't particularly SEO friendly, but it doesn't seem too bad. Until I got to another section of their site. It's a section that logically should look like: www.website.com/training/public-seminars It's: www.website.com/default.asp?Page=MT&Area=Seminars&Sub=MRM Now that's nonsensical to me! Normally if a client has terrible URL's, I'd say let's do some redirects, but I guess I'm a little intimidated by these. Do the URL's have to be structured like this for some reason? Am I missing some important area of coding here? However, the most bizarre example is a link back to their website from yellowpages.com. Where normally I would expect it to lead to their homepage, I get this bizarre-looking thing: http://website1-px.rtrk.com/?utm_source=ReachLocal&utm_medium=PPC&utm_campaign=AssetManagement&reference_id=15&publisher=yellowpages&placement=ypwebsitemip&action_target=listing_website And as you browse through the site, that strange domain stays. For example the About page is now: http://website1-px.rtrk.com/default.asp?Page=About I would try to google this but I have no idea where to even start! What is going on with these links? Will we be able to fix them to something presentable without breaking their website?
Technical SEO | | everestagency0 -
Which URL structure is better?
Quick question - Have a real estate site focused on "apartments", but apartments in not part of my company name. That being said, should which of the following URL structures should I use? http://website.com/city/neighborhood/property-name OR http://website.com/city-apartments/neighborhood/property-name
Technical SEO | | ChaseH0 -
Googlebot cannot access your site
Hello, I have a website http://www.fivestarstoneinc.com/ and earlier today I got an emil from webmaster tools saying "Googlebot cannot access your site" Wondering what the problem could be and how to fix it.
Technical SEO | | Rank-and-Grow0 -
How do I properly use the canonical tag to avoid negative effect from having identical content on 2 url’s?
To illustrate… I have same website uploaded at 2 locations (url’s). Only the domain extensions are different. www.myexample.com
Technical SEO | | swiftseo
www.myexample.org The benefit is that I may run some promos on one location and not the other to help in product surveys/testing. The website content is 98% identical and I understand this content duplication may cause SEO problems. The domain I wish to use for rankings etc is www.myexample.com 1) How do I go about avoiding seo problem? Do I need to place the canonical tag at www.myexample.org ie 2) Do I also place the exact same tag at the .com location or not necessary there? Is there an alternative or more effective option to resolving the problem?0 -
Will training videos available on the "members only" section of a site contribute to the sites ranking?
Hello, I got asked a question recently as to whether training videos on the deeper pages of a website (that you can only access if you are a member and log in) will help with the sites ranking. On the SEOMoz software these deeper pages have been crawled as far as I can tell with errors reported on pages from the "members only" section of the site, leading me to believe the members only pages and their content will contribute to the sites overall ranking profile. I have suggested uploading the informational videos on the main pages of the site for now, making them accessible to all visitors and putting them in a more obvious place to encourage more sharing and views, however I've also said I would check it out with some experts so any information will be greatly appreciated! Many thanks 🙂 Charlotte
Technical SEO | | CharlotteWaller0 -
Is there actual risk to having multiple URLs that frame in main url? Or is it just bad form and waste of money?
Client has many urls that just frame in the main site. It seems like a total waste of money, but if they are frames, is there an actual risk?
Technical SEO | | gravityseo0 -
Re-write of url
Hi, I would like your input on the following dilemma I am wanting to target the keyword "download xml". at the moment Google indexes us on page 2 and indexes the page www.ourdomain.com/download.aspx I would like to rewrite the url to be /download-xml-editor.aspx The current page is a pr5 and is our most trafficked and externally inked to page. My thoughts are quite mixed on how to do this. approach 1: re-write url of "download.aspx" and setup permanent 301 redirect of download.aspx to download-xml-editor.aspx approach 2: create a new page called download-xml-editor and 301 redirect that to the current stronger page which is download.aspx approach 3: create new page called download-xml-editor with unique content and try and get that page to rank over time, allowing it to build up links and not compromise the current page, then later 301 redirect How would you deal with this and what are your recommendations
Technical SEO | | LiquidTech0