Robots.txt to disallow /index.php/ path
-
Hi SEOmoz,
I have a problem with my Joomla site (yeah - me too!). I get a large amount of /index.php/ urls despite using a program to handle these issues. The URLs cause indexation errors with google (404). Now, I fixed this issue once before, but the problem persist. So I thought, instead of wasting more time, couldnt I just disallow all paths containing /index.php/ ?.
I don't use that extension, but would it cause me any problems from an SEO perspective?
How do I disallow all index.php's? Is it a simple: Disallow: /index.php/
-
Hi Cyrus,
Thanks for your reply!
Unfortunately the problem is yet to be fixed, I hope that my disallow will work shortly.
It seems that most of the index.php links to each other internally (and from old /index.php/ pages that no longer exist), which is super weird. How google found them does not make any sense to me.
I don't beleive that external sources are linking to these pages either - I mean, how would they find these links anyway?.
-
Hi Mikkel,
Like Chris, I spidered your site and couldn't find any links to /index.php files, which probably indicates one of two things:
- You've fixed the problem - Yay!
- Or Google is finding those links from external sources
- Google found those links at one time in the past, and is still trying to crawl them.
In the Crawl Errors report in Google Webmaster Tools, if you click on the link of each 404, there's often a "linked from" source where you can see where Google discovered the broken link. This is really helpful in rooting out the cause.
Regardless, I'm going to go with #1 and optimistically believe that you were able to fix the problem.
-
If I spider your site I'm not seeing any /index.php urls. Does that mean you did get Joomla to cooperate with your rewriting?
Or was your problem that you'd previously had urls indexed with /index.php/ paths and you needed to remove them?
-
Hi Mikkel, I have checked your robots.txt, it looks perfect. If you redirect /index.php to home page that using httaccess file or by using any joomla plugin that would great for you. And its also a permanent solution.
-
Well, I tried the sensible solution and redirecting to the correct URL instead. However the SEF program is quite limited and keep on creating new URLs regardless of my modification. Im looking for a more permanent solution, and the disallow seems at bit simple as I'm not a super programmer.
By the way - thanks for quick replys, kudos to both of you!
-
Sure, the website in question is www.vauni.dk
I don't think that there is any inbound links to the index.php pages. They are not easily found.
-
Couldn't you rewrite those /index.php/ urls to remove the /index.php/?
Like this in .htaccess:
RewriteRule ^(.*)$ /index.php/$1 [L]
Only used Joomla once, but there must be a way to configure joomla to just use "/" instead of "/index.php/"?
Update:
Here's a solution to your /index.php/ issue:
http://www.eprcreations.com/remove-index-php-from-joomla-urls/
Once you've updated that, and have your urls working properly without the /index.php/, you could add this slight modification of the rewrite rule above so that all your old /index.php/ urls would be 301'd to your new ones:
RewriteRule ^(.*)$ /index.php/$1 [R=301,L]
Put it underneath the RewriteBase / line they describe in that post.
-
Hi Mikkel,
Do you inbound link pointing to you index.php pages ? If yes, then it might affect your seo. Disallow: /index.ph/ is perfect but after implementing it don't inter link those index.php pages. Can you share me your website URL so that I can show you with example. How to do it.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
How to stop robots.txt restricting access to sitemap?
I'm working on a site right now and having an issue with the robots.txt file restricting access to the sitemap - with no web dev to help, I'm wondering how I can fix the issue myself? The robots.txt page shows User-agent: * Disallow: / And then sitemap: with the correct sitemap link
Technical SEO | | Ad-Rank0 -
From: http://www. to https://
Hi all, I am changing my hosting for legal and SEO reasons from http://www to https:// . Now I hear different stories on the redirects: 1: should i try and change my backlinks? 2: internally all links will be 301 redirected at first. Than I want to (manually) change them. It;s within Wordpress so there should be a plugin for this. Tips? 3: Will it affect my rankings and for what period? What I now know that at first it will drop little but eventually you will rank higher than before. Thanks so much in advance! Tymen
Technical SEO | | Tymen1 -
No index
Screaming frog spider does index pages on our website like: wp-content/plugins/woocommerce/assets/js/frontend/jquery-ui-touch-punch.min.js?ver=2.3.9 wp-content/plugins/mailchimp-for-wp/assets/css/checkbox.min.css?ver=2.3.2 Is it a bad/good idea to set my parameters in Webmastertools and tell Google not to crawl pages that begin with wp/content? Thanks!
Technical SEO | | Happy-SEO1 -
Why are only a few of our pages being indexed
Recently rebuilt a site for an auctioneers, however it has a problem in that none of the lots and auctions are being indexed by Google on the new site, only the pages like About, FAQ, home, contact. Checking WMT shows that Google has crawled all the pages, and I've done a "Fetch as Google" on them and it loads up fine, so there's no crawling issues that is standing out. I've set the "URL Parameters" to no effect too. Also built a sitemap with all the lots in, pushed to Google which then crawled them all (massive spike in Crawl rate for a couple days), and still just indexing a handful of pages. Any clues to look into would be greatly appreciated. https://www.wilkinsons-auctioneers.co.uk/auctions/
Technical SEO | | Blue-shark0 -
Exclude root url in robots.txt ?
Hi, I have the following setup: www.example.com/nl
Technical SEO | | mikehenze
www.example.com/de
www.example.com/uk
etc
www.example.com is 301'ed to www.example.com/nl But now www.example.com is ranking instead of www.example.com/nl
Should is block www.example.com in robots.txt so only the subfolders are being ranked?
Or will i lose my ranking by doing this.0 -
Does http://my.dudamobile.com/ Effect SEO
Hi, Hope everyone is enjoying the new year! I was wondering if converting your desk top website to a mobile one, example via http://my.dudamobile.com/, has any negative effects on SEO. Did it effect your site? Do you recommend doing it? Does it effect links? When people link to your desk top URL does that authority carry to the mobile, or would it be better if they link to the mobile (m.website.com) URL? Is http://my.dudamobile.com/ a good choice? Any feedback, as always, is greatly appreciated! Thanks Jimmy
Technical SEO | | jimmy02250 -
Backlinks Indexing
Is there a way of indexing my backlinks?? I have a lot backlinks but Google can't find them
Technical SEO | | CodePlus0 -
Restricted by robots.txt does this cause problems?
I have restricted around 1,500 links which are links to retailers website and links that affiliate links accorsing to webmaster tools Is this the right approach as I thought it would affect the link juice? or should I take the no follow out of the restricted by robots.txt file
Technical SEO | | ocelot0