For large sites, best practices for pages hidden behind internal search?
-
If a website has 1M+ pages, with most of them being hidden behind an internal search, what's the best way to get pages included in an engine's index?
Does a direct clickpath to those pages need to exist from the homepage or other major hub pages on the site?
Is submitting an XML sitemap enough?
-
Hello Vlevit,
You could do several things. I recommend giving Google your product feed, which should accomplish your goals. Another possible solution would be to make those search pages noindex,follow so they don't end up getting indexed, but Google can still use them for discovery.
Thanks for explaining the situation.
Below is more on submitting product feeds. It is for Google Product Search, but I would imagine the "link" field where you put the URL to your product detail page will help those pages get indexed in the standard results:
http://support.google.com/merchants/bin/answer.py?hl=en&answer=188494#USEverett
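To make the noindex,follow idea concrete, here is a minimal Python sketch of how a site might pick the robots directive per URL. It assumes internal search results live under a `/search` path, which is a hypothetical prefix and not something stated in this thread; the function names are also made up for illustration. The returned value could be placed in a robots meta tag, as shown, or sent as an `X-Robots-Tag` response header.

```python
from urllib.parse import urlsplit

# Hypothetical path prefix for internal search result pages; adjust to the site.
SEARCH_PATH_PREFIX = "/search"


def robots_directive(url: str) -> str:
    """Return the robots directive for a URL.

    Internal search result pages get "noindex, follow" so they stay out of
    the index while crawlers can still follow their links to product pages.
    Everything else is left indexable.
    """
    path = urlsplit(url).path
    if path == SEARCH_PATH_PREFIX or path.startswith(SEARCH_PATH_PREFIX + "/"):
        return "noindex, follow"
    return "index, follow"


def robots_meta_tag(url: str) -> str:
    """Render the directive as a robots meta tag for the page <head>."""
    return f'<meta name="robots" content="{robots_directive(url)}">'
```

This only covers the search pages themselves. The product pages they link to still need to be indexable and, ideally, reachable by other routes as well.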
-
Everett, thanks for your reply. I understand the problems with showing internal search pages. I'm not looking to have internal search results indexed, just the pages that the results link to. We're in eCommerce.
I was under the impression that there was a clever way to have the individual product pages indexed without establishing a direct click path, but best practices recommend otherwise.
Question answered. Thanks all for your help.
-
Hello Vlevit,
If you can be more specific, we may be able to be of more help. Google doesn't want you to show internal search result pages, but if this is a different type of situation there may be an exception. Are these search result pages, product pages, category pages, content pages? Is it an eCommerce site, a community, a content site?
Generally speaking, 1M+ pages with no links going into them and content that is either sparse/thin or partially/fully duplicated on other similar pages (like a search for widgets and a search for green widgets showing overlapping content) is exactly the type of thing that will get you in hot water, and it could affect even the rankings of your home page.
Do you feel like your question has been answered or would you like to be more specific about your site and goals?
Cheers,
Everett
-
This is what I was assuming, but was wondering if there was a clever way around creating direct click paths to those pages, while still maintaining their importance to the site. Thanks for the info.
-
Make sure they are part of the actual structure of your website, not just part of search. That means you have to have links pointing at them. You will also want to make sure that those pages have value.
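One way to check whether pages are "part of the actual structure" is to walk the internal link graph from the homepage and see which pages are never reached. The sketch below is only an illustration: it assumes you already have a crawl export as a mapping from each page to the pages it links to, and the function names and sample paths are hypothetical.

```python
from collections import deque


def click_depths(links: dict[str, list[str]], home: str) -> dict[str, int]:
    """Breadth-first search from the homepage.

    Returns the minimum number of clicks needed to reach each linked page.
    """
    depths = {home: 0}
    queue = deque([home])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in depths:
                depths[target] = depths[page] + 1
                queue.append(target)
    return depths


def orphan_pages(all_pages: list[str], links: dict[str, list[str]], home: str) -> list[str]:
    """Pages that exist on the site but have no click path from the homepage."""
    reachable = click_depths(links, home)
    return sorted(page for page in all_pages if page not in reachable)


# Hypothetical example: /p/2 can only be found through internal search.
links = {"/": ["/widgets"], "/widgets": ["/widgets/green"], "/widgets/green": ["/p/1"]}
print(orphan_pages(["/", "/widgets", "/p/1", "/p/2"], links, "/"))
```

Pages that show up as orphans here are the ones that would need links from category or hub pages.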
-
Hi vlevit,
The best practice is to have a direct click path from the index page, something like: index -> category (filter) -> subcategory (filter) -> page/product. In some cases XML sitemaps can also help with indexing.
But beware of overly large XML sitemaps: create more than one sitemap and group the URLs logically where possible.
A few very good resources can be found at the following links:
http://www.seomoz.org/ugc/solving-new-content-indexation-issues-for-large-b2b-websites
http://www.seomoz.org/qa/view/29009/sitemaps-management-for-big-sites-tens-of-millions-of-pages
I hope it helps,
Istvan
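The advice above about splitting large XML sitemaps can be sketched in Python roughly as follows. The sitemaps.org protocol caps a single sitemap file at 50,000 URLs, so a 1M+ page site needs several files plus a sitemap index that lists them. The file names (`sitemap-1.xml`, `sitemap-index.xml`), the function names and the base URL are assumptions made for this example, and the chunks here are cut by position rather than grouped by category.

```python
from xml.sax.saxutils import escape

MAX_URLS_PER_SITEMAP = 50_000  # per-file limit in the sitemaps.org protocol
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def chunk(urls, size=MAX_URLS_PER_SITEMAP):
    """Yield consecutive slices of at most `size` URLs."""
    for start in range(0, len(urls), size):
        yield urls[start:start + size]


def build_sitemap(urls):
    """Render one child sitemap (<urlset>) for a group of page URLs."""
    entries = "".join(f"  <url><loc>{escape(u)}</loc></url>\n" for u in urls)
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        f'<urlset xmlns="{SITEMAP_NS}">\n{entries}</urlset>\n'
    )


def build_sitemap_index(sitemap_urls):
    """Render the sitemap index that points at every child sitemap."""
    entries = "".join(
        f"  <sitemap><loc>{escape(u)}</loc></sitemap>\n" for u in sitemap_urls
    )
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        f'<sitemapindex xmlns="{SITEMAP_NS}">\n{entries}</sitemapindex>\n'
    )


def build_all(urls, base_url, size=MAX_URLS_PER_SITEMAP):
    """Return {filename: xml} for every child sitemap plus the index file."""
    files = {}
    for number, group in enumerate(chunk(urls, size), start=1):
        files[f"sitemap-{number}.xml"] = build_sitemap(group)
    child_locations = [f"{base_url}/{name}" for name in files]
    files["sitemap-index.xml"] = build_sitemap_index(child_locations)
    return files
```

Grouping by category instead of by position would mean calling `build_sitemap` once per category, which also makes it easier to see which sections of the site are not getting indexed.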