How to stop Search Bot from crawling through a submit button
-
On our website http://www.thefutureminders.com/, we have three form fields that have three pull downs for Month, Day, and year. This is creating duplicate pages while indexing. How do we tell the search Bot to index the page but not crawl through the submit button?
Thanks
Naren
-
Hi Dan
What is happening is this - since we have all the months [12], all the dates [31] and years[1921 through 2011] in the form fields, the robot seems to be taking these incrementally and then using the submit button. After the submit button, user is presented with a registration page. While we do want the search to index the rest of the page and the crawl through the rest of the page links we do not want it to crawl through that submit button. I hope I am making sense.
Naren
-
The advantage of blocking a page from being indexed via a meta tag is it is less likely to have unexpected consequences. I've often seen in the past cases where an incorrectly modified robots.txt file leads to a site being blocked by accident.
-
Hi
To my knowledge, you don't stop it from crawling through the button (like a nofollowed link), rather you block the robot at the page it ends up on after clicking submit.
Say the user hits submit and it takes them to mydomain.com/confirm.html On that page you'll want to add;
....if you want it to NOT index the page but follow the links on it.
or
...if you want it to NOT index and NOT follow the links on that page.
Its advised that its better to do this with the meta tag than in robots.txt.
Hopefully I've understood the question correctly!
-Dan
-
Block the pages/folders you do not wish to be indexed with robots.txt file:
User-agent: * Disallow: /folder1/ Disallow: /folder2/
OR you can add canonical tags to the other pages which are creating duplicate content.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Errors In Search Console
Hi All, I am hoping someone might be able to help with this. Last week one of my sites dropped from mid first day to bottom of page 1. We had not been link building as such and it only seems to of affected a single search term and the ranking page (which happens to be the home page). When I was going through everything I went to search console and in crawl errors there are 2 errors that showed up as detected 3 days before the drop. These are: wp-admin/admin-ajax.php showing as response code 400 and also xmlrpc.php showing as response code 405 robots.txt is as follows: user-agent: * disallow: /wp-admin/ allow: /wp-admin/admin-ajax.php Any help with what is wrong here and how to fix it would be greatly appreciated. Many Thanks
Technical SEO | | DaleZon0 -
Indexed, but not shown in search result
Hi all We face this problem for www.residentiebosrand.be, which is well programmed, added to Google Search Console and indexed. Web pages are shown in Google for site:www.residentiebosrand.be. Website has been online for 7 weeks, but still no search results. Could you guys look at the update below? Thanks!
Technical SEO | | conversal0 -
New pages need to be crawled & indexed
Hi there, When you add pages to a site, do you need to re-generate an XML site map and re-submit to Google/Bing? I see the option in Google Webmaster Tools under the "fetch as Google tool" to submit individual pages for indexing, which I am doing right now. Thanks,
Technical SEO | | SSFCU
Sarah0 -
Why can no tool crawl this site?
I am trying to perform a crawl analysis on a client's website at https://www.bravosolution.com I have tried to crawl it with IIS for SEO, Sreaming Frog and Xenu and not one of them makes it further than the home page of the site. There is nothing I can see in the robots.txt that is blocking these agents. As far as I can see, Google is able to crawl the site although they have noticed a significant drop in organic traffic. Any advise would be very welcome Regards Danny
Technical SEO | | richdan0 -
My website pages are not crawled, what to do?
Hi all. I have made some changes on the website so i like to crawled them by the search engines Google especially. I have made these changes around 2 weeks ago. I have submitted my website on good bookmarking websites. Also i used a tool available in Google webmasters "Fetch as Google", Resubmitted a sitemap.xml. Still my pages are not crawled your opinion please. Thanks
Technical SEO | | lucidsoftech0 -
Ajax Crawling | Blocked URLs Spike
http://www.zando.co.za/women/shoes/ (for example) Hello, I'm concerned that WMT is reporting a large spike in blocked URLs - now reporting more blocked URLs than good URLs. Our product recommendations get generated via an Ajax call and these autogenerated, unique, URLs are rendered in the /recommendations/ folder which sits in the root of our site: http://www.zando.co.za/recommendations/ I can't see how I can prevent Google from calling the Ajax - I can only assume that's what's happening.This is what the code typically looks like:
Technical SEO | | RocketZando0 -
Is this tabbed implementation of SEO copy correct (i.e. good for getting indexed and in an ok spot in the html as viewed by search bots?
We are trying to switch to a tabbed version of our team/product pages at SeatGeek.com, but where all tabs (only 2 right now) are viewed as one document by the search engines. I am pretty sure we have this working for the most part, but would love some quick feedback from you all as I have never worked with this approach before and these pages are some of our most important. Resources: http://www.ericpender.com/blog/tabs-and-seo http://www.google.com/support/forum/p/Webmasters/thread?tid=03fdefb488a16343&hl=en http://searchengineland.com/is-hiding-content-with-display-none-legitimate-seo-13643 Sample in use: http://www.seomoz.org/article/search-ranking-factors **Old Version: ** http://screencast.com/t/BWn0OgZsXt http://seatgeek.com/boston-celtics-tickets/ New Version with tabs: http://screencast.com/t/VW6QzDaGt http://screencast.com/t/RPvYv8sT2 http://seatgeek.com/miami-heat-tickets/ Notes: Content not displayed stacked on browser when Javascript turned off, but it is in the source code. Content shows up in Google cache of new page in the text version. In our implementation the JS is currently forcing the event to end before the default behavior of adding #about in this case to the url string - this can be changed, should it be? Related to this, the developer made it so that typing http://seatgeek.com/miami-heat-tickets/#about directly into the browser does not go to the tab with copy, which I imagine could be considered spammy from a human review perspective (this wasn't intentional). This portion of the code is below the truncated view of the fetch as Googlebot, so we didn't have that resource. Are there any issues with hidden text / is this too far down in the html? Any/all feedback appreciated. I know our copy is old, we are in the process of updating it for this season.
Technical SEO | | chadburgess0 -
Do search engines still index/crawl private content?
If you have a membership site, which requires a payment to access specific content/images/videos, do search engines still use that content as a ranking/domain authority factor? Is it worth optimizing these "private" pages for SEO?
Technical SEO | | christinarule1