Why isn't our new site being indexed?
-
We built a new website for a client recently.
Site: https://www.woofadvisor.com/
It's been live for three weeks. Robots.txt isn't blocking Googlebot or anything.
Submitted a sitemap.xml through Webmasters but we still aren't being indexed.
Anyone have any ideas?
-
Hey Dirk,
No worries - I visited the question for the first time today and considered it unanswered, as the site is perfectly accessible in California. I like to confirm what Search Console says, as that is 'straight from the horse's mouth'.
Thanks for confirming that the IP redirect has changed, that is interesting. It is impossible for us to know when that happened - I would have expected things to get indexed quite fast once it changed.
With the extra info I'm happy to mark this as answered, but it would be good to hear from the OP.
Best,
-Tom
-
Hi Tom,
I am not questioning your knowledge - I re-ran the test on webpagetest.org and I see that the site is now accessible from a Californian IP (http://www.webpagetest.org/result/150911_6V_14J6/), which wasn't the case a few days ago (check the result at http://www.webpagetest.org/result/150907_G1_TE9/) - so there has been a change in the IP redirection. I also checked from Belgium - the site is now accessible from here as well.
I also notice that if I now do a site:woofadvisor.com search in Google I get 19 pages indexed, rather than the 2 I got a few days ago.
Apparently removing the IP redirection solved (or is solving) the indexation issue - but this question still remains marked as "unanswered".
rgds,
Dirk
-
I am in California right now, and can access the website just fine, which is why I didn't mark the question as answered - I don't think we have enough info yet. I think the 'fetch as googlebot' will help us resolve that.
You are correct that if there is no robots.txt then Google assumes the site is open, but my concern is that the developers on the team say that there IS a robots.txt file there and that it has some contents. I have, on at least two occasions, come across a team that was serving a robots.txt that was only accessible to search bots (once they were doing that 'for security', another time because they misunderstood how it worked). That is why I suggested checking Search Console to see what shows up for robots.txt.
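As a rough cross-check outside Search Console, one could request robots.txt under two different User-Agent headers and compare the status codes. The following is a minimal Python sketch, not a definitive test: the function names and the User-Agent strings are illustrative assumptions, and it only changes the request header, so it would not catch a server that identifies Googlebot by IP address.

```python
import urllib.error
import urllib.request

# Illustrative User-Agent strings (assumptions, not taken from the thread).
USER_AGENTS = {
    "browser": "Mozilla/5.0 (X11; Linux x86_64) Gecko/20100101 Firefox/40.0",
    "googlebot": "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
}


def fetch_status(url, user_agent, timeout=10):
    """Return the final HTTP status code the server sends for this User-Agent.

    urlopen follows redirects, so this is the status of the last response.
    """
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # urllib raises on 4xx/5xx; the status code is still what we want.
        return error.code


def differs_by_user_agent(statuses):
    """True if the status codes collected per User-Agent are not all equal."""
    return len(set(statuses.values())) > 1
```

To use it, call `fetch_status("https://www.woofadvisor.com/robots.txt", ua)` once for each entry in `USER_AGENTS`, collect the results in a dict keyed by name, and pass that dict to `differs_by_user_agent`. A result such as 404 for the browser and 200 for the Googlebot string would suggest the file is being served selectively.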
-
To be very honest - I am quite surprised that this question is still marked as "Unanswered".
The owners of the site decided to block access for all non-UK/Ireland addresses. The main Googlebot uses a Californian IP address to visit the site. Hence, the only page Googlebot can see is https://www.woofadvisor.com/holding-page.php, which has no links to the other parts of the site (this is confirmed by the webpagetest.org test with a Californian IP address).
As Google indicates, Googlebot can also use other IP addresses to crawl the site ("With geo-distributed crawling, Googlebot can now use IP addresses that appear to come from other countries, such as Australia.") - however, it is very likely that these bots do not crawl with the same frequency/depth as the main bot (the article clearly indicates: "Google might not crawl, index, or rank all of your locale-adaptive content. This is because the default IP addresses of the Googlebot crawler appear to be based in the USA.").
This can easily be solved by adding a link on /holding-page.php to the Irish/UK version, which contains the full content and is accessible from all IP addresses, so that Googlebot can follow it and index the full site (in other words, only put the IP detection on the homepage, not on the other pages).
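The homepage-only rule could look roughly like the Python sketch below. It assumes the server has already resolved the visitor's country code; the function name, the `ALLOWED_COUNTRIES` set, and the country lookup itself are hypothetical, since the site's actual redirect code is not shown anywhere in this thread.

```python
# Hypothetical names throughout; the thread does not show the site's real code.
ALLOWED_COUNTRIES = {"GB", "IE"}  # UK and Ireland, the countries the site targets
HOLDING_PAGE = "/holding-page.php"


def redirect_target(path, country_code):
    """Return the path to redirect to, or None to serve the request as-is.

    Geo-detection applies to the homepage only, so a crawler arriving from
    any country can still fetch every other page, either directly or by
    following a link placed on the holding page.
    """
    if path == "/" and country_code not in ALLOWED_COUNTRIES:
        return HOLDING_PAGE
    return None
```

With this rule a US request for `/` is still sent to the holding page, but a US request for `/travel-tips.php` is served normally, which is what lets a US-based crawler reach the rest of the site.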
The fact that the robots.txt gives a 404 is not relevant: if no robots.txt is found, Google assumes that the site can be indexed (check this link) - quote: "You only need a robots.txt file if your site includes content that you don't want Google or other search engines to index."
-
I'd be concerned about the 404ing robots.txt file.
You should check in Search Console:
- What does Search Console show in the robots.txt section?
- What happens if you fetch a page that is not indexed (e.g. https://www.woofadvisor.com/travel-tips.php) with the 'Fetch as Googlebot' tool?
I checked and do not see any obvious indicators of why the pages are not being indexed - we need more info.
-
I just did a quick check on your site with webpagetest.org using a California IP address (http://www.webpagetest.org/result/150907_G1_TE9/) - as you can see there, these IPs also go to the holding page, which is logically the only page that can be indexed, as it's the only one Googlebot can access.
rgds,
Dirk
-
Hi,
I can't access your site from Belgium - I guess you are redirecting your users based on IP address. If, like me, they are not located in your target country, they are 302 redirected to https://www.woofadvisor.com/holding-page.php, and that is the only page that is indexed.
Not sure which country you are actually targeting - but could it be that you're accidentally redirecting Googlebot as well?
Check also this article from Google on ip based targeting.
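To see the redirect itself rather than the page it leads to, one option is to make a single request without following redirects and inspect the status code and Location header. This is a minimal Python sketch with illustrative function names; note that it only shows what the server returns to the IP address it is run from, so it would need to be run from a US location to approximate what the main Googlebot sees.

```python
import http.client
from urllib.parse import urlsplit

HOLDING_PATH = "/holding-page.php"  # path of the holding page mentioned above


def first_response(url, timeout=10):
    """Request url once, without following redirects.

    Returns (status_code, location_header_or_None).
    """
    parts = urlsplit(url)
    connection_class = (
        http.client.HTTPSConnection if parts.scheme == "https"
        else http.client.HTTPConnection
    )
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    connection = connection_class(parts.netloc, timeout=timeout)
    try:
        connection.request("GET", target)
        response = connection.getresponse()
        return response.status, response.getheader("Location")
    finally:
        connection.close()


def is_holding_redirect(status, location):
    """True if the response is a redirect that points at the holding page."""
    if status not in (301, 302, 303, 307, 308) or not location:
        return False
    return urlsplit(location).path == HOLDING_PATH
```

Calling `first_response("https://www.woofadvisor.com/")` and passing the two returned values to `is_holding_redirect` tells you whether that particular location is being sent to the holding page.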
rgds
Dirk
-
Strangely, there are two pages indexed on Google Search.
The homepage and one other page.
-
"I noticed the robots.txt file returned a 404 and asked the developers to take a look and they said the content of it is fine."
Sometimes developers say this stuff. If you are getting a 404, demonstrate it to them.
-
I noticed the robots.txt file returned a 404 and asked the developers to take a look and they said the content of it is fine.
But yes, I'll double-check the WordPress settings now.
-
Your sitemap all looked good, but when I tried to view the robots.txt file in your root it returned a 404, so I was unable to determine whether there was an issue. Could any of the settings in your WordPress installation also be causing it to trip over?
Related Questions
-
How ask Google to de index scrapper sites?
While doing text Google searches for various keywords I have found two sites that have scrapped pages from my site which goes by an old URL of www.tpxcnex.com and a new URL of www.tpxonline.com www.folder.com is one of the sites and if you try to visit that site or any of the scrapped Google index listing, Chrome warns you not to. How can I ask Chrome to deindex www.folder.com or another scrapper site, or atleast deindex the URLs which have clearly scrapped my content?
Technical SEO | | DougHartline0 -
Redirecting a single page on a separate domain to a new site?
My client started a subdivision of their company, along with a new website. There was already an individual page about the new product/topic on the main site, but recognizing a growth area they wanted to devote an entire site to the product/topic. Can we/should we redirect that page on the old corporate/main site to the new domain, or just place a link or two? Thoughts?
Technical SEO | | VTDesignWorks0 -
New Page Showing Up On My Reports w/o Page Title, Words, etc - However, I didn't create it
I have a WordPress site and I was doing a crawl for errors and it is now showing up as of today that this page : https://thinkbiglearnsmart.com/event-registration/?event_id=551&name_of_event=HTML5 CSS3 is new and has no page title, words, etc. I am not even sure where this page or URL came from. I was messing with the robots.txt file to allow some /category/ posts that were being hidden, but I didn't re-allow anything with the above appendages. I just want to make sure that I didn't screw something up that is now going to impact my rankings - this was just a really odd message to come up as I didn't create this page recently - and that shouldnt even be a page accessible to the public. When I edit the page - it is using an Event Espresso (WordPress plugin) shortcode - and I don't want to noindex this page as it is all of my events. Sorry this post is confusing, any help or insight would be appreciated! I am also interested in hiring someone for some hourly consulting work on SEO type issues if anyone has any references. Thank you!
Technical SEO | | webbmason0 -
The use of tabs on productpages, do or don't?
Does google has any trouble reading content tabs? The content is not loaded by ajax and is already in the page source code. As i'm checking some big e-commerce websites or (amazon.com for example) they get rid of the tabs with content and put the different content below eachother. Is his better for SEO purpose? But what about user experience? For users it think it is easier to navigate by tabs then to have a long page to scroll. What do you guys think about this issue?
Technical SEO | | wilcoXXL0 -
301 Multiple Sites to Main Site
Over the past couple years I had 3 sites that sold basically the same products and content. I later realized this had no value to my customers or Google so I 301 redirected Site 2 and Site 3 to my main site (Site 1). Of course this pushed a lot of page rank over to Site 1 and the site has been ranking great. About a week ago I moved my main site to a new eCommerce platform which required me to 301 redirect all the url's to the new platform url's which I did for all the main site links (Site 1). During this time I decided it was probably better off if I DID NOT 301 redirect all the links from the other 2 sites as well. I just didn't see the need as I figured Google realized at this point those sites were gone and I started fearing Google would get me for Page Rank munipulation for 301 redirecting 2 whole sites to my main site. Now I am getting over 1,000 404 crawl errors in GWT as Google can no longer find the URL's for Site 2 and Site 3. Plus my rankings have dropped substantially over the past week, part of which I know is from switching platforms. Question, did I make a mistake not 301 redirecting the url's from the old sites (Site 2 and Site 3) to my new ecommerce url's at Site 1?
Technical SEO | | SLINC0 -
Will SEO Moz index our keywords if the site is ALL https?
We have a site coming into beta next week. Playing around with SEO Moz, I had trouble getting the keywords to rank at all. Was this because the site is entirely https? If yes, what else can SEO Moz NOT do if the site is all https? Thanks!
Technical SEO | | OTSEO0 -
Do index.php extensions count as duplicate content on Joomla sites?
When i run my error report, i see 2 duplicate pages, but both are the main domain and then the /index.php extension. how do i fix this? does it really count as duplicate content?
Technical SEO | | valetseo0 -
Pros & Cons of deindexing a site prior to launch of a new site on the same domain.
If you were launching a new website to completely replace an older existing site on the same domain, would there be any value in temporarily deindexing the old site prior to launching the new site? Both have roughly 3000 pages, will launch on the same domain but have a completely new url structure and much better optimized for the web. Many high ranking pages will be redirected with 301 to the corresponding new page. I believe the hypothesis is this would eliminate a mix of old & new pages from sharing space in the serps and the crawlers are more likely to index more of the new site initially. I don't believe this is a great strategy, on the other hand I see some merit to the arguments for it.
Technical SEO | | medtouch0