How does Google index pagination variables in Ajax snapshots? We're seeing random huge variables.
-
We're using the Google snapshot method to index dynamic Ajax content. Some of this content is from tables using pagination. The pagination is tracked with a var in the hash, something like:
#!home/?view_3_page=1
We're seeing all sorts of calls from Google now with huge numbers for these URL variables that we are not generating with our snapshots. Like this:
#!home/?view_3_page=10099089
These aren't trivial since each snapshot represents a server load, so we'd like these vars to only represent what's returned by the snapshots.
Is Google generating random numbers going fishing for content? If so, is this something we can control or minimize?
-
Thanks for the great replies all. Just to clarify, this is the page we're referencing:
http://www.knackhq.com/business-directory-user-demo/?escaped_fragment=
You can see the one pagination var "next" that points here:
http://www.knackhq.com/business-directory-user-demo/?escaped_fragment=home/?view_3_page=2
As you can see this is pretty simple. There's only one potential variable (the "prev" and "next" links) for introducing these huge numbers and that's pretty limited. We tested the Google URLs up and down the app and haven't seen anything that would send it fishing for larger numbers. But Google keeps hammering us with:
GET /business-directory-user-demo/?escaped_fragment=home/?view_3_page=1000251
For now we're trying to respond to those with 404s and hope they eventually die.
Unfortunately we can't avoid hashbangs.
-
This seems to do this only for parameters that it has decided "changes, re-orders, or narrows content." They may also crawl things that look like URLs in Javascript even when it's part of a function, but it doesn't seem like that's what's happening in this case.
Depending on the setup of the site, you can either manually configure the variable in WMT (don't do this if the parameter is material), write a clever robots.txt rule (e.g. to block anything after a number of digits after the parameter), or (the best solution) re-work the system to generate URLs that don't rely on parameters.
I'm not sure I understand why the server is rendering a page if the URL isn't supposed to exist. Depending on your server config, you may also be able to return a 404 and make a rule for which (valid) pages to render. From there you can just ignore the 404 errors until Google figures it out.
I think that's the best I can do without seeing the site.
-
I agree with Federico. I've seen Google go fishing with URL parameters (?param=xyz) and I've seen it with AJAX and hashbangs as well. How far they take this and when they choose to apply it doesn't seem to follow a consistent pattern . You can see some folks on StackExchange discussing this, too: http://webmasters.stackexchange.com/questions/25560/does-the-google-crawler-really-guess-url-patterns-and-index-pages-that-were-neve
-
Awesome, thanks for looking into it. We've gotten nowhere with any kind of answer.
-
Hi There
I'm an associate here at Moz, and have asked the other associates if they might know the answer, as this one's a little outside of my experience. Please follow up and let us know if you don't hear from anyone.
Thanks!
-Dan
-
We also noticed some weird crawls last year using random numbers at the end of the URL, checking in google webmaster tools we saw that most of those urls were reported as not found, checking from where the link came from google listed some of our URLs, but didn't had any link to those URLs google was trying to fetch. After 2 or 3 months those crawls stopped. We never knew from where Google got those URLs...
-
Hi Federico, thanks for the response.
Unfortunately this is an SEO solution for a third-party JavaScript product, so removing the hash isn't an option.
I'm still interested in knowing if this is a formal Google practice and if there's some way to control or mitigate this.
-
I think you are right. Google is fishing for content. I would find a solution to make those URL friendly by removing the hash and using some URL rewrite and pushState to paginate that content instead.
Here's a previous question that may help: http://moz.com/community/q/best-way-to-break-down-paginated-content
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
How long to re-index a page after being blocked
Morning all! I am doing some research at the moment and am trying to find out, just roughly, how long you have ever had to wait to have a page re-indexed by Google. For this purpose, say you had blocked a page via meta noindex or disallowed access by robots.txt, and then opened it back up. No right or wrong answers, just after a few numbers 🙂 Cheers, -Andy
Intermediate & Advanced SEO | | Andy.Drinkwater0 -
Client has moved to secured https webpages but non secured http pages are still being indexed in Google. Is this an issue
We are currently working with a client that relaunched their website two months ago to have hypertext transfer protocol secure pages (https) across their entire site architecture. The problem is that their non secure (http) pages are still accessible and being indexed in Google. Here are our concerns: 1. Are co-existing non secure and secure webpages (http and https) considered duplicate content?
Intermediate & Advanced SEO | | VanguardCommunications
2. If these pages are duplicate content should we use 301 redirects or rel canonicals?
3. If we go with rel canonicals, is it okay for a non secure page to have rel canonical to the secure version? Thanks for the advice.0 -
Do I need to re-index the page after editing URL?
Hi, I had to edit some of the URLs. But, google is still showing my old URL in search results for certain keywords, which ofc get 404. By crawling with ScremingFrog it gets me 301 'page not found' and still giving old URLs. Why is that? And do I need to re-index pages with new URLs? Is 'fetch as Google' enough to do that or any other advice? Thanks a lot, hope the topic will help to someone else too. Dusan
Intermediate & Advanced SEO | | Chemometec0 -
Google is indexing the wrong page
Hello, I have a site I am optimizing and I cant seem to get a particular listing onto the first page due to the fact google is indexing the wrong page. I have the following scenario. I have a client with multiple locations. To target the locations I set them up with URLs like this /<cityname>-wedding-planner.</cityname> The home page / is optimized for their port saint lucie location. the page /palm-city-wedding-planner is optimized for the palm city location. the page /stuart-wedding-planner is optimized for the stuart location. Google picks up the first two and indexes them properly, BUT the stuart location page doesnt get picked up at all, instead google lists / which is not optimized at all for stuart. How do I "let google know" to index the stuart landing page for the "stuart wedding planner" term? MOZ also shows the / page as being indexed for the stuart wedding planner term as well but I assume this is just a result of what its finding when it performs its searches.
Intermediate & Advanced SEO | | mediagiant0 -
Pages getting into Google Index, blocked by Robots.txt??
Hi all, So yesterday we set up to Remove URL's that got into the Google index that were not supposed to be there, due to faceted navigation... We searched for the URL's by using this in Google Search.
Intermediate & Advanced SEO | | bjs2010
site:www.sekretza.com inurl:price=
site:www.sekretza.com inurl:artists= So it brings up a list of "duplicate" pages, and they have the usual: "A description for this result is not available because of this site's robots.txt – learn more." So we removed them all, and google removed them all, every single one. This morning I do a check, and I find that more are creeping in - If i take one of the suspecting dupes to the Robots.txt tester, Google tells me it's Blocked. - and yet it's appearing in their index?? I'm confused as to why a path that is blocked is able to get into the index?? I'm thinking of lifting the Robots block so that Google can see that these pages also have a Meta NOINDEX,FOLLOW tag on - but surely that will waste my crawl budget on unnecessary pages? Any ideas? thanks.0 -
Why did this website disappear from Google's SERPs?
For the first several months this website, WEBSITE, ranked well in Google for several local search terms like, "Columbia MO spinal decompression" and "Columbia, MO car accident therapy." Recently the website has completely disappeared from Google's SEPRs. It does not even exist when I copy and paste full paragraphs into Google's search bar. The website still ranks fine in Bing and Yahoo, but something happened that caused it to be removed from Google. Beside for optimizing the meta data, adding headers, alt tags, and all of the typical on-page SEO stuff, we did create a guest post for a relevant, local blog. Here is the post: Guest Post. The post's content is 100% unique. I realize the post has way to many internal/external links, which we definitely did not recommend, but can anyone find a reason why this website was removed from Google's SERPs? And possibly how we should go about getting it back into Google's SERPs? Thanks in advance for any help.
Intermediate & Advanced SEO | | VentaMarketing0 -
Re-Direct Users But Don't Affect Googlebot
This is a fairly technical question... I have a site which has 4 subdomains, all targeting a specific language. The brand owners don't want German users to see the prices on the French sub domain and are forcing users into a re-direct to the relevant subddomain, based on their IP address. If a user comes from a different country, (ie the US) they are forced on the UK sub domain. The client is insistent on keeping control of who sees what (I know that's a debate in it's own right), but these re-directs we're implementing to make that happen, are really making it difficult to get all the subdomains indexed as I think googlebot is also getting re-directed and is failing to do it's job. Is there are a way of re-directing users, but not Googlebot?
Intermediate & Advanced SEO | | eventurerob0 -
How can I block unwanted urls being indexed on google?
Hi, I have to block unwanted urls (not that page) from being indexed on google. I have to block urls like example.com/entertainment not the exact page example.com/entertainment.aspx . Is there any other ways other than robot.txt? If i add this to robot.txt will that block my other url too? Or should I make a 301 redirection from example.com/entertainment to example.com/entertainment.aspx. Because some of the unwanted urls are linked from other sites. thanks in advance.
Intermediate & Advanced SEO | | VipinLouka780