How did the Google crawler find and cache orphan pages and a directory?
-
I have a website, www.test.com
I made some changes to the live website and uploaded them to a "demo" directory (recently created) for client approval.
So my demo link is www.test.com/demo/
I am not doing any link building or any other activity that would pass a referral link to www.test.com/demo/
So how did the Google crawler find it and cache some pages, or the entire directory?
Thanks
-
Try putting the URL into Google and see if you find any pages linking to it.
I knew a company that created a test site that was a copy of a live site (made with a specific hosted CMS). Didn't exclude the test site in robots because "we all know we won't link to it so it'll be ok". Site got indexed, and it was because a person at the company was having problems with the implementation of the test site, went to the help forum (which person didn't think would be indexed) and posted the URL to the test site.
I found the above by just putting in the URL of the test site into Google, and I saw the post in the help desk. You might try the same to see if somehow there is a rogue link.
-
Is Google crawling our mail?
Is that even possible?
-
Yup, correct.
I was certain I'd replied to this already.
Anyway, have you ever noticed how the ads in Gmail are always relevant to the content of your emails? Google is totally reading them.
-
The <conspiracy-hat> side of things was him commenting that Google is sometimes accused of processing everything in Gmail, and could possibly have pulled your link to the demo directory from there. </conspiracy-hat>
-
Hi Barry,
Yes, we did use Gmail for reporting.
Does that make any sense?
-
<conspiracy-hat></conspiracy-hat>
Did either you or your client use Gmail when you sent him the demo link?
Regardless, Dan's advice to noindex the directory and block it from spiders is the way to go for future development work.
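A minimal sketch of the noindex part of that advice, assuming a static HTML site like the one in the question (the tag goes in the <head> of every page under /demo/):

```html
<!-- Keeps a compliant crawler from indexing the page or following its links,
     even if it discovers the URL somehow. -->
<meta name="robots" content="noindex, nofollow">
</head>
```

Note that unlike a robots.txt disallow, this tag only works if the crawler is allowed to fetch the page and see it.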
-
Hi JoelHit,
No, there is not a single referral link to the "demo" directory anywhere on the website, nor from any third-party websites.
I am aware of how Google's crawling and indexing systems work.
Thanks.
-
Hi Thetjo,
I know about that.
My question is: how does Google crawl it without any referral link?
Thanks.
-
Hi Dan,
No, I did not exclude the "demo" directory in robots.txt for any search engine.
I am not using WordPress; it's a simple static HTML website (no CMS of any kind).
-
Did this actually happen, or are we talking about a hypothetical situation? Could there be a link to the demo directory that you've overlooked? Has the /demo folder perhaps been used in the past, with old links still pointing to it?
As a meta-solution to this problem: prevent crawlers and nosy people from accessing the content by adding an .htpasswd login to the area used for client approval.
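A sketch of that .htpasswd approach, assuming an Apache server; the AuthUserFile path is a placeholder and should point wherever the htpasswd file actually lives, ideally outside the web root:

```apache
# .htaccess inside /demo/ — require a login before serving anything,
# which keeps out both crawlers and curious visitors.
AuthType Basic
AuthName "Client preview"
AuthUserFile /path/to/.htpasswd
Require valid-user
```

Crawlers that hit the directory get a 401 and never see the content, so nothing in it can be indexed.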
-
Did you block the /demo/ directory in your robots.txt file? That is step one in trying to ensure the pages don't get crawled. Also, are you using WordPress? If so, WordPress automatically pings search engines when you add a post, and if you use the common sitemap plugin, it submits the sitemap to Google automatically when it generates it, so that's another way Google could have found the directory.
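For that first step, the effect of a robots.txt rule can be sanity-checked with Python's standard-library parser, using the www.test.com/demo/ example from the question:

```python
from urllib import robotparser

# A robots.txt that blocks the /demo/ staging directory for all crawlers.
ROBOTS_TXT = """\
User-agent: *
Disallow: /demo/
"""

rp = robotparser.RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())

# The demo directory is blocked for all user agents...
print(rp.can_fetch("*", "http://www.test.com/demo/index.html"))  # False
# ...while the live site remains crawlable.
print(rp.can_fetch("*", "http://www.test.com/index.html"))       # True
```

Keep in mind that Disallow only stops compliant crawlers from fetching the pages; a URL that leaks via a link can still end up indexed without a snippet, which is why pairing this with noindex or HTTP auth is safer.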
Related Questions
-
Best way to link to 1000 city landing pages from the index page in a way that Google follows/crawls these links (without building country pages)?
Currently we have direct links to the top 100 country and city landing pages on our index page of the root domain.
Intermediate & Advanced SEO | | lcourse
I would like to add, on the index page for each country, a "more cities" link which then dynamically loads (without reloading the page and without redirecting to another page) a list of links to all cities in that country.
I do not want to dilute the "link juice" of my top 100 country and city landing pages on the index page.
I would still like Google to be able to crawl and follow these links to the cities that I load dynamically later. In this particular case, the typical site hierarchy of country pages with links to all cities is not an option. Any recommendations on how best to implement this?
Google Indexing & Caching Some Other Domain in Place of Mine - Lost Ranking - Sucuri.net Found Nothing
Again I am facing the same problem with another WordPress blog. Google has suddenly started to cache a different domain in place of mine, and to cache my domain in place of that domain. Here is an example page of my site which is wrongly cached on Google (the same thing is happening with many other pages as well): http://goo.gl/57uluq

The duplicate site (protestage.xyz) shows a full copy of my client's site. All of its pages now return 404, but in Google's cache they show my site's pages. A site:protestage.xyz query returns pages of my site only, but when we try to open any of them we get a 404 error.

My site has been scanned by Sucuri.net senior support for malware and there is none; they scanned all files, the database, etc., and found nothing. According to Sucuri.net senior support:

It's a known Google bug. Sometimes they incorrectly identify the original and the duplicate URLs, which results in messed-up rankings and query results. As you can see, the protestage.xyz site was hacked, not yours, and the hackers created "copies" of your pages on that hacked site. This is why they do it: the "copy" (doorway) redirects web searchers to a third-party site [http://www.unmaskparasites.com/security-report/?page=protestage.xyz](http://www.unmaskparasites.com/security-report/?page=protestage.xyz). It was not the only site they hacked, so they placed many links to that "copy" from other sites. As a result, Google decided that the copy might actually be the original, not the duplicate, and basically hijacked some of your pages in search results for some queries that don't include your site's domain. Nonetheless, your site still does quite well and outperforms the spammers.

For example, in this query: [https://www.google.com/search?q=](https://www.google.com/search?q=)%22We+offer+personalized+sweatshirts%22%2C+every+bride#q=%22GenF20+Plus+Review+Worth+Reading+If+You+are+Planning+to+Buy+It%22 But overall, I think both the Google bug and the spammy duplicates have a negative effect on your site. We see such hacks every now and then (on both sides: the hacked sites and the copied sites), and here's what you can do in this situation. It's not a hack of your site, so you should focus on preventing the copying of your pages:

1. Contact the protestage.xyz site, tell them that their site is hacked, and show them the hacked pages: [https://www.google.com/search?q=](https://www.google.com/search?q=)%22We+offer+personalized+sweatshirts%22%2C+every+bride#q=%22GenF20+Plus+Review+Worth+Reading+If+You+are+Planning+to+Buy+It%22 Hopefully they will clean their site up and your site will have unique content again. Here's their email: [email protected]

2. You might want to send one more complaint to their hosting provider (OVH.NET), [email protected], explaining that the site they host stole content from your site (show the evidence) and that you suspect the site is hacked.

3. Try blocking the IPs of the Aruba hosting on your site (real visitors don't use server IPs). This will prevent that site from copying your content, if they do it via a script on the same server. I currently see that site using this IP address: 149.202.120.102. I think it would be safe to block anything that begins with 149.202. This .htaccess snippet should help (you might want to test it):

Order Deny,Allow
Deny from 149.202.120.102

4. Use rel=canonical to tell Google that your pages are the original ones.

[https://support.google.com/webmasters/answer/139066?hl=en](https://support.google.com/webmasters/answer/139066?hl=en) It won't help much if the hackers keep copying your pages, because they usually replace your rel=canonical with theirs, so Google can't decide which one is real. But without rel=canonical, the hackers have more chances to hijack your search results, especially if they use rel=canonical and you don't.

I should admit that this process may be quite long. Google will not restore your previous rankings overnight, even if you manage to shut down the malicious copies of your pages on the hacked site. Their indexes will still carry some mixed signals (side effects of the black-hat SEO campaign), and it may take weeks before things normalize. The same is true of the opposite situation: the traffic wasn't lost right after the hackers created the duplicates on other sites; the effect builds up over time as Google collects more and more signals. Plus, they sometimes run scheduled spam/duplicate cleanups of their index. It's really hard to tell what the final straw was, since we don't have access to Google's internals. However, in practice, if you see significant changes in Google search results, it's usually not because of something you just did; in most cases, it's because of something Google has observed over some period of time.

Kindly help me: can we actually do anything to get the site indexed properly again? PS: this happened with this site earlier as well, and that time I had to change the domain to get rid of the problem after I could not find any solution for months, and now it has happened again. Looking forward to a possible solution. Ankit
Intermediate & Advanced SEO | | killthebillion
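The rel=canonical suggestion in the answer above (a self-referencing canonical on the original pages) looks like this; the URL here is a placeholder, not the real page:

```html
<!-- In the <head> of the original page, pointing at its own preferred URL -->
<link rel="canonical" href="http://www.example.com/some-page/">
```

As the answer notes, this only helps if the copied version doesn't carry its own canonical pointing elsewhere.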
Google WMT Missing Meta Titles on Ajax Pages... Weird!!
Hey Mozers, I was looking through Google Webmaster Tools under HTML Improvements. It looked like I had 2,200 pages missing meta titles, and I was about to lose it thinking HOW COULD THIS HAPPEN! Then I realized that the pages were Ajax pages, specifically a check-price pop-up, and I don't want those pages crawled by Google. To Google it looks like I have over 2k pages missing meta titles, and they are all check-price pop-ups. How would you suggest I block this? I thought about going the easy route and disallowing the subfolder in the robots.txt file, but I'm scared of that because we use Ajax for a bunch of calls. I'm also scared of putting <meta name="robots" content="noindex,nofollow"> in the head because it requires hard coding. I know I'm not the first to come across this issue. Any ideas?
Intermediate & Advanced SEO | | rpaiva
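One way to avoid hard-coding a meta tag on those pop-ups is to send the directive as an HTTP header instead. A sketch assuming Apache with mod_headers enabled, and assuming the pop-up URLs share a "checkprice" filename pattern (the pattern is a guess based on the question):

```apache
# .htaccess — Googlebot honors an X-Robots-Tag response header the same
# way as the equivalent robots meta tag, so no page markup needs to change.
<FilesMatch "checkprice">
    Header set X-Robots-Tag "noindex, nofollow"
</FilesMatch>
```

This also works for responses that have no HTML head at all, such as JSON returned by Ajax calls.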
Home Page or Internal Page
I have a website that deals with personalized jewelry, and our main keyword is "Name Necklace".
Intermediate & Advanced SEO | | Tiedemann_Anselm
Three months ago I added a new page: http://www.onecklace.com/name-necklaces/ Since then, Google has indexed only this page for my main keyword, and not our home page.
Because the page is new and we don't have many links to it, our rank is not very good. I'm considering removing this page (301 to the home page), because I think it would be better if Google indexed our home page for this keyword. I'm not sure if this is a good idea, but I know our home page has a lot of good links, so maybe our rank would be higher. Another thing: because Google indexes this internal page for the keyword, it looks as if our home page has no main keyword at all. By the way, before I added this page, Google indexed our main page for this keyword. Please advise...
Google + pages and SEO results...
Hi, can anyone give me insight into how people are getting away with naming their business after an SEO search term, creating a BS Google+ page, and then having that page rank high in the search results? I am speaking specifically about the results you get when you Google "Los Angeles DUI Lawyer". As you can see from my attached screenshot (I'm doing the search in Los Angeles), the FIRST listing is a Google+ business. Strangely, the phone number listed doesn't actually take you to a DUI attorney, but rather to some marketing group that never answers the phone. Can anyone give me insight into why Google even allows this? I just find it odd that Google cares so much about the user experience, yet the first result is something completely misleading. I know it sounds like I'm just jealous (which I am, a little), but I find it disheartening that we work so hard on SEO and someone takes the top spot with an obvious BS page.
Intermediate & Advanced SEO | | mrodriguez1440
Page Indexed but not Cached
A section of pages on my site is indexed (I know because they appear in SERPs if I copy and paste a sentence from the content). However, according to the text-only cached version of the pages, they are not being read by Google. Why are they indexed even though it seems like Google is not reading them... or is Google in fact reading this text even though it seems like it shouldn't be? Thanks for your assistance.
Intermediate & Advanced SEO | | theLotter
Meeting Google's needs 100% with dynamic pages
We have bought into a really powerful search; very exciting. We can define really detailed product-based 'landing pages' by creating a search that pulls in the required attributes, e.g. http://www.OURDOMAIN.com//search/index.php?sortprice=asc&followSearch=9673&q=red+coats+short-length. Pop that into a link ("Short Red Coats") on a previous page and, wonderful, that gives a page of short red coats in ascending price order: one happy consumer, taken straight to a page that meets their needs. Question 1: however, that makes Google unhappy, right? Question 2: can we meet Google's needs 100% with a "redirect permanent" in an .htaccess file, e.g. redirect permanent /short-red-coats/ http://www.OURDOMAIN.com//search/index.php?sortprice=asc&followSearch=9673&q=red+coats+short-length
Intermediate & Advanced SEO | | GeezerG
Many thanks
CB
301 - should I redirect entire domain or page for page?
Hi, we recently enabled a 301 on our domain from our old website to our new website. On the advice of fellow Mozzers, we copied the old site exactly to the new domain, then did the 301 so that the sites are identical. The question is: should we be doing the 301 as a whole-domain redirect, i.e. www.oldsite.com is now > www.newsite.com, or setting it individually for each page, i.e. www.oldsite.com/page1 is now www.newsite.com/page1, etc., for each page in our site? Remember that both the old and new sites (for now) are identical copies. Also, we set up the 301 about 5 days ago and have verified it's working, but we haven't seen a single change in rank for either the old site or the new one. Is this because Google likely hasn't re-indexed yet? Thanks, Anthony
Intermediate & Advanced SEO | | Grenadi
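For what it's worth, when the two sites share the same structure, a whole-domain redirect and a page-for-page redirect can be the same single rule. A sketch assuming Apache with mod_rewrite, using the hypothetical oldsite/newsite names from the question:

```apache
# .htaccess on www.oldsite.com — one rule 301s every path to the same
# path on the new domain, which is effectively page-for-page.
RewriteEngine On
RewriteRule ^(.*)$ http://www.newsite.com/$1 [R=301,L]
```

Each old URL then passes its signals to its direct counterpart rather than funneling everything to the new home page.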