URL Parameter Handling In GWT to Treat Overindexation - how aggressive?
-
Hi,
My client recently launched a new site, and the number of indexed pages went from about 20K to about 80K, which is severe over-indexation.
I believe this was caused by parameter handling: some category pages now return 700 results for "site:domain.com/category1", and apart from the top result, they are all parameterized URLs.
My question is: how aggressive should I be in blocking these parameters in Google Webmaster Tools? Currently, everything is set to 'Let Googlebot decide'.
-
Hi! Did these answers take care of your question, or do you still have some questions?
-
Hey There
I would use a robots meta noindex on them (except for the top page, of course) and use rel="prev"/"next" to show they are paginated.
I would prefer that over using WMT. WMT crawl settings will stop the crawling, but they won't remove the URLs from the index. Plus, WMT only handles Google, not other engines like Bing. Not that Bing matters much, but it's always better to have a universal solution.
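Here is a minimal sketch of what that could look like when the page templates are generated in code. The function name `pagination_head_tags`, the `page` parameter name, and the URL pattern are all assumptions for illustration, not something taken from the site in question; adapt them to however the CMS builds its paginated URLs.

```python
from html import escape
from urllib.parse import urlencode


def pagination_head_tags(base_url, page, last_page, param="page"):
    """Return the <head> tags for one page of a paginated category listing.

    Hypothetical helper: assumes page 1 lives at the clean category URL
    and later pages are reached through a single query parameter.
    """
    def page_url(n):
        # Page 1 is the clean category URL; later pages carry the parameter.
        return base_url if n == 1 else f"{base_url}?{urlencode({param: n})}"

    tags = []
    if page > 1:
        # Every page except the top one is kept out of the index.
        tags.append('<meta name="robots" content="noindex">')
        tags.append(f'<link rel="prev" href="{escape(page_url(page - 1))}">')
    if page < last_page:
        tags.append(f'<link rel="next" href="{escape(page_url(page + 1))}">')
    return tags


for tag in pagination_head_tags("http://domain.com/category1", 2, 3):
    print(tag)
```

The top page gets no noindex tag, so it stays indexable, while every later page is marked noindex and still points to its neighbours.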
-Dan
-
Hello Search Guys,
Here is some food for thought taken from: http://www.quora.com/Does-Google-limit-the-number-of-pages-it-indexes-for-a-particular-site
Summary:
"Google says they crawl the web in 'roughly decreasing PageRank order' and thus, pages that have not achieved widespread link popularity, particularly on large, deep sites, may not be crawled or indexed."
"Indexation
There is no limit to the number of pages Google may index (meaning available to be served in search results) for a site. But just because your site is crawled doesn't mean it will be indexed.
Crawl
The ability, speed and depth for which Google crawls your site and retrieves pages can be dependent on a number of factors: PageRank, XML sitemaps, robots.txt, site architecture, status codes and speed."
"For a zero-backlink domain with 80.000+ pages, in conjunction with rel=canonical and an xml-sitemap (You do submit a sitemap, don't you?), after submitting the domain to Google for a crawl, a little less than 10k pages remained in index. A few crawls later this was reduced to a mere 250 (very good job on Google's side).
This leads me to believe the indexation cap for a newer site with low to zero pagerank/authority is around 10k."
Another interesting article: http://searchenginewatch.com/article/2062851/Google-Upping-101K-Page-Index-Limit
Hope this helps. The easy answer is to be as aggressive as possible: limit crawling to the pages you actually need, and remove the unneeded URLs so that only the needed ones remain.
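Before deciding which parameters to limit, it may help to see which ones account for most of the extra URLs. Below is a rough sketch, assuming you have a plain list of indexed or crawled URLs exported from somewhere; the function name `count_parameters` and the sample URLs are made up for illustration.

```python
from collections import Counter
from urllib.parse import parse_qsl, urlsplit


def count_parameters(urls):
    """Count how many URLs carry each query parameter name.

    Hypothetical audit helper: each parameter is counted once per URL,
    even if it appears several times in the same query string.
    """
    counts = Counter()
    for url in urls:
        query = urlsplit(url).query
        names = {name for name, _ in parse_qsl(query, keep_blank_values=True)}
        counts.update(names)
    return counts


# Made-up sample URLs; replace with your own export.
sample = [
    "http://domain.com/category1",
    "http://domain.com/category1?sort=price",
    "http://domain.com/category1?sort=name&page=2",
]
for name, total in count_parameters(sample).most_common():
    print(name, total)
```

Parameters that show up on the most URLs without changing the page content are the first candidates for restricting.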