Are there any negative side effects of having millions of URLs on your site?
-
After a site upgrade, we found that we have over 3.7 million URLs on our site. Many of these URLs are due to the facet options: each facet combination yields a different URL. However, we need to do a deeper analysis of these URLs to see whether facets are the only reason so many are being returned.
Does anyone know if there are any negatives of having so many URLs crawled, other than the fact that Google only spends so much time crawling a site? Is the number of URLs something that should be concerning?
Any insight appreciated!
-
Agree with the points above with one exception. Yes, you have to find a way to deal with duplicate and quality content at scale. Yes, robots.txt, nofollow links, and index sitemaps are your friends. I would not use rel=canonical unless I had to. Better to get those extra pages de-indexed and then not let Google crawl the URLs with the extra parameters to start with. Why waste Google's time crawling pages that are just re-sorted versions of another? If you use the directives wisely, you probably "only" have 200,000 pages worth crawling, if you have that many sort parameters.
Good luck!
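To put a rough number on how many pages are actually worth crawling, one option is to collapse a crawl export down to the URLs that remain once the sort parameters are ignored. The Python sketch below is only an illustration: it assumes the crawl is available as a plain list of URLs, and the parameter names `sort` and `order`, along with the function names, are hypothetical and would need to match your own site.

```python
from collections import Counter
from urllib.parse import urlsplit, parse_qsl, urlencode

# Hypothetical names of parameters that only re-sort an existing page.
IGNORED_PARAMS = {"sort", "order"}


def crawl_key(url):
    """Reduce a URL to host + path + the parameters that still matter."""
    parts = urlsplit(url)
    kept = sorted(
        (k, v) for k, v in parse_qsl(parts.query) if k not in IGNORED_PARAMS
    )
    query = "?" + urlencode(kept) if kept else ""
    return parts.netloc.lower() + parts.path + query


def summarize(urls):
    """Return (total URLs crawled, distinct pages once sort params are ignored)."""
    groups = Counter(crawl_key(u) for u in urls)
    return len(urls), len(groups)


# Made-up sample standing in for a real crawl export.
crawled = [
    "https://www.mysite.com/shoes?sort=price",
    "https://www.mysite.com/shoes?sort=name&order=desc",
    "https://www.mysite.com/shoes",
    "https://www.mysite.com/boots?sort=price",
]
total, distinct = summarize(crawled)
print(f"{total} URLs crawled, {distinct} distinct pages")
```

If the gap between the two counts is large, that suggests most of the crawl is being spent on re-sorted copies rather than on distinct pages.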
-
I'll echo Robert's concern about duplicate content. If those facet combinations are creating many pages with very similar content, that could be an issue for you.
If, let's say, there are 100 facet combinations that create essentially the same basic page content, then consider taking facet elements that do NOT substantially change the page content, and use rel=canonical to tell Google that those are all really the same page. For instance, let's say one of the facets is packaging size, and product X comes in boxes of 1, 10, 100, or 500 units. Let's say another facet is color, and it comes in blue, green, or red. Let's say the URLs for these look like this:
www.mysite.com/product.php?pid=12345&color=blue&pkgsize=1
www.mysite.com/product.php?pid=12345&color=green&pkgsize=10
www.mysite.com/product.php?pid=12345&color=red&pkgsize=100
You would want to set the rel=canonical on all of these to:
www.mysite.com/product.php?pid=12345
Be sure that your XML sitemap, your on-page meta robots, and your rel=canonicals are all in agreement. In other words, if a page has meta robots "noindex,follow", it should NOT show up in your XML sitemap. If the pages above have their rel=canonicals set as described, then your sitemap should contain www.mysite.com/product.php?pid=12345 and NONE of the three example URLs with the color and pkgsize parameters above.
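As a rough illustration of the rule above, here is a minimal Python sketch that strips the facet parameters to get the canonical URL and then flags sitemap entries that disagree with it. The parameter names `color` and `pkgsize` come from the example URLs; the `https://` scheme, the `FACET_PARAMS` set, and both function names are assumptions made for the illustration, not part of any particular platform.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Facets assumed NOT to change the page content substantially.
FACET_PARAMS = {"color", "pkgsize"}


def canonical_url(url):
    """Return the URL with the non-essential facet parameters removed."""
    parts = urlsplit(url)
    kept = [
        (k, v)
        for k, v in parse_qsl(parts.query, keep_blank_values=True)
        if k not in FACET_PARAMS
    ]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


def sitemap_conflicts(sitemap_urls):
    """Return sitemap entries that are not their own canonical URL."""
    return [u for u in sitemap_urls if canonical_url(u) != u]


faceted = "https://www.mysite.com/product.php?pid=12345&color=blue&pkgsize=1"
canonical = canonical_url(faceted)
print(canonical)

# A sitemap that lists the faceted URL is out of agreement with the canonical.
print(sitemap_conflicts([canonical, faceted]))
```

Running a check like this against the real sitemap before submitting it is one way to catch faceted URLs that slipped in.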
-
There are several concerns to be addressed with this scenario:
- Organization
This is going to be very difficult to keep track of. If you are well-organized or the pages will not need much adjusting, this is probably okay.
- Duplicate Content
This is going to be a pain in the behind. That being said, most site auditing tools will allow you to make adjustments as necessary.
- Broken Links
With a site of this size, broken links and 404s are going to be inevitable. This could lead to some negative SEO impacts, so you will have to stay on top of them.
- Hacking
This is a big reason why some sites have enormous numbers of URLs. This would likely be the biggest concern on my mind and is worth looking into. Going through that many pages will be impossible, so it might be worth taking a look at the link profile and determining where most of your links are coming from. If these are coming from spammy sites, you may have a problem there.
All this being said, the size of a website is normally not a cause for concern. Just make sure that your main pages (Home, Landing Pages) are properly handled and optimized and you shouldn't have too much trouble. I would add that large, unwieldy .htaccess files can result in slower loading times, which can impact your rankings with Google.
Let me know if there is anything specific concerning you and I will be happy to help. Congrats on the upgrade and hope it works out!
Rob