Allow only Rogerbot, not Googlebot or other unwanted access
-
I'm in the middle of site development and want to start crawling my site with Rogerbot, while keeping Googlebot and similar crawlers away from it.
Currently my site is protected by a login (the basic Joomla offline mode, user and password required), so I thought a good solution would be to remove that limitation and use .htaccess to password-protect the site for all users except Rogerbot.
Reading here and there, it seems this practice is not recommended because it could open security holes: any other user could find out which user agents are allowed and emulate them. OK, maybe you need to be a hacker/cracker, or an experienced developer, to get that info, but I was not able to find clear information on how to proceed in a secure way.
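For reference, the .htaccess idea described above could look roughly like the sketch below. It is only an illustration under several assumptions: Apache 2.4 with mod_setenvif and basic authentication available, a crawler User-Agent string that contains "rogerbot", a hypothetical environment variable name (`allow_rogerbot`), and a placeholder path to the password file. As noted above, the User-Agent header can be spoofed by anyone, so this keeps out well-behaved crawlers and casual visitors but is not real access control.

```apache
# Sketch only, assuming Apache 2.4 and that .htaccess overrides are permitted.
# "allow_rogerbot" is a hypothetical variable name chosen for this example.
SetEnvIfNoCase User-Agent "rogerbot" allow_rogerbot

AuthType Basic
AuthName "Development site"
# Placeholder path: point this at your own password file.
AuthUserFile /path/to/.htpasswd

# Let a request through if EITHER condition holds:
# the User-Agent matched above, or the visitor logged in with a valid account.
<RequireAny>
    Require env allow_rogerbot
    Require valid-user
</RequireAny>
```

With something like this in place, requests identifying as Rogerbot would skip the password prompt, while every other visitor (including Googlebot) would get a 401 until valid credentials are supplied.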
The other solution would be to keep using Joomla's access limitation for everyone except, again, Rogerbot. I'm still not sure how feasible that would be.
Mostly, my question is: how do you work on your site before you want it indexed by Google or similar, regardless of whether you use a CMS? Is there some other way to do it?
I would love to have my site ready and crawled before launching it, and avoid fixing issues afterwards... Thanks in advance.
-
Great, thanks.
With those 2 recommendations I have more than enough for the next crawler. Thank you both!
-
Hi, thanks for answering.
Well, it looks doable. I will try to do it on the next scheduled crawl, trying to minimize the exposed time.
However, your idea seems very compatible with my first approach. Maybe I could also allow Rogerbot through .htaccess while limiting others, and only for that day remove the Joomla user/password restriction and leave only the .htaccess limitation. (I know, maybe I'm a bit paranoid; I just want to be sure to minimize any collateral effect...)
*Maybe it could be a good feature for Moz to be able to access restricted sites...
-
Hi,
I ran into a similar issue while we were redesigning our site. This is what we did. We unblocked our site (we also had a user and password to keep Google from indexing it). We added the link to a Moz campaign. We were very careful not to share the URL of the development site or put it anywhere Google might find it quickly. Remember, Google finds links by following other links. We did not submit the development site to Google Webmaster Tools or Google Analytics. We watched and waited for the Moz report to come in. When it did, we blocked the site again.
Hope this helps
Carla