Is it possible that Google may have erroneous indexing dates?
-
I am advising someone on a problem related to copied content. Both sites in question are self-hosted WordPress sites. The "good" site publishes a post. The "bad" site copies the post (without even removing all internal links to the "good" site) a few days later.
On both websites the publishing date of each post is clearly visible, and it is clear that the "bad" site publishes the posts days later. The content thief doesn't even bother to fake the publishing date.
The owner of the "good" site wants to have all the proof needed before acting against the content thief. So I suggested that he also check in Google the dates the various pages were indexed, using Search Tools -> Custom Range, so that the indexing date is displayed next to the search results.
For almost all of the copied pages the indexing dates also prove the "bad" site published the content days after the "good" site, but there are two exceptions: the first two posts that were copied.
First post:
On the "good" website it was published on 30 January 2013
On the "bad" website it was published on 26 February 2013
In Google search both show up indexed on 30 January 2013!
Second post:
On the "good" website it was published on 20 March 2013
On the "bad" website it was published on 10 May 2013
In Google search both show up indexed on 20 March 2013!
Is it possible that the date shown in Google search results is wrong?
I also asked for help on the Google Webmaster forums, but there the discussion shifted to "who copied the content" and "file a DMCA complaint". So I want to be sure my question is better understood here.
It is not about who published the content first or how to take down the copied content. I am just asking whether anybody else has noticed this strange thing with Google indexing dates.
How is it possible for Google search results to display, for the copied article, an indexing date that is earlier than the date the copy was published and exactly the same as the date the original article was published and indexed?
-
Thanks Doug. Really an eye-opener.
-
Thanks Doug for your response. It really cleared up the questions I had about that date Google shows next to the search results.
I was not able to find official details about it; all I could find were various sources referring to it as the indexing date of a page.
But I knew that here in the Moz community there are people who really know things, that's why I asked.
So that date is just Google's estimation of the publishing date, not the date Google indexed the content!
Thanks again for taking the time to answer me!
-
Hiya Sorina,
When you use the custom date range, Google isn't listing results based on the date they were indexed. Google is using an estimated publication date.
Google tries to estimate the publication date based on meta-data and other features of the page, such as dates in the content, title and URL. The date Google first indexed the page is just one of the things that Google can use to estimate the publication date.
I also suspect that dates in any sitemap.xml files will also be taken into consideration.
But, given that even Google can't guarantee that it'll crawl and index articles on the day they've been published, the crawl date may not be an accurate estimate.
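To make the idea of on-page date signals concrete, here is a minimal Python sketch that collects the kinds of hints mentioned above from a page: a published-time meta tag, `<time>` elements in the content, and a date embedded in the URL. How Google actually weighs such signals is not public, so the choice of signals and the names `DateSignalParser` and `collect_date_signals` are assumptions made purely for illustration.

```python
import re
from html.parser import HTMLParser


class DateSignalParser(HTMLParser):
    """Collects date-like hints from meta tags and <time> elements."""

    def __init__(self):
        super().__init__()
        self.signals = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        # Open Graph style meta tag that many WordPress themes/plugins emit.
        if tag == "meta" and attrs.get("property") == "article:published_time":
            self.signals.append(("meta", attrs.get("content") or ""))
        # Machine-readable date attached to a visible date in the content.
        elif tag == "time" and attrs.get("datetime"):
            self.signals.append(("time", attrs["datetime"]))


def collect_date_signals(url, html):
    """Return (source, value) pairs for every date hint found (illustrative only)."""
    parser = DateSignalParser()
    parser.feed(html)
    signals = list(parser.signals)
    # Date-based permalinks embed /YYYY/MM/DD/ in the path.
    match = re.search(r"/(\d{4})/(\d{2})/(\d{2})/", url)
    if match:
        signals.append(("url", "-".join(match.groups())))
    return signals
```

If a copied page keeps the original's date markup or quotes the original date in its text, hints like these would point at the original date rather than the day the copy went live, which would be consistent with both pages showing the same date.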
Also, if the scraped content is being re-published with intact internal links (are these the full URLs - do they resolve to your original website?) then it's pretty obvious where the content came from.
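One way to document this is to list every link in the copied page that still resolves to the original domain. The following is a rough Python sketch using only the standard library; `LinkCollector`, `links_to_domain` and the example domains are hypothetical names chosen for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Gathers the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def links_to_domain(page_url, html, original_domain):
    """Return absolute links in `html` whose host is `original_domain` or a subdomain."""
    collector = LinkCollector()
    collector.feed(html)
    matches = []
    for href in collector.hrefs:
        # Relative links resolve against the copying site, so they will not match.
        absolute = urljoin(page_url, href)
        host = urlparse(absolute).hostname or ""
        if host == original_domain or host.endswith("." + original_domain):
            matches.append(absolute)
    return matches
```

Calling `links_to_domain("http://bad.example/first-post/", copied_html, "good.example")` on the saved HTML of a copied post would return only the links that still point at the original site.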
Hope this helps answer your question.
-
Hi Sorina,
I can tell you that the index dates shown by Google are accurate. The same is not always true of the cache date: the date shown in the cache and the copy shown in the cache often don't match. Send me a private message with the actual URLs under discussion and I will be able to comment with more clarity.
Best,
Devanur Rafi
-
Thank you for your response Devanur Rafi, but the "good" site doesn't have problems getting indexed.
Actually, all posts on the "good" site are indexed the very same day they are published.
My question was more about the indexing date shown in Google search results.
How come, for a post from the "bad" site, Google is displaying an indexing date previous to the actual date the post was published on that site?!
And how come this date is exactly the same as the date Google says it indexed the post from the "good" site?
-
Hi Sorina,
This is a common thing, and it all depends on a site's crawlability (how easy it is for the bot to crawl) and the crawl frequency for that site. Google would have picked up that post first on the bad site and then on the good site. However, just because one or two posts were picked up late does not mean that the good site is not crawler friendly. It also depends on how far the resource is from the root. Let us take an example:
A page on a good site: abc.com/folder1/folder2/folder3/page.html
Now a bad site copies that page: xyz.com/page.html
In this case, Google might first pick up the copied page from the bad site, as it is just a click away from the root, which is not the case with the good site, where the page is nested deep inside multiple folders.
You can also give the Wayback Machine (archive.org) a try to find which website published the post first. Sometimes this might work out pretty well. You can also try to look at the cache dates of the posts on both sites in Google to get some info in this regard.
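For checking many URLs at once, the Wayback Machine lookup can be scripted. The Python sketch below assumes the Internet Archive's CDX search endpoint and its `url`, `output`, `fl` and `limit` parameters behave as publicly documented (results oldest first, with a header row in JSON output); please verify that against the current documentation before relying on it. The function names are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint of the Wayback Machine CDX search service.
CDX_ENDPOINT = "http://web.archive.org/cdx/search/cdx"


def build_cdx_query(page_url):
    """Build a query URL asking for the first recorded capture of page_url."""
    params = {
        "url": page_url,
        "output": "json",
        "fl": "timestamp,original",
        "limit": "1",
    }
    return CDX_ENDPOINT + "?" + urlencode(params)


def earliest_capture(rows):
    """Turn a parsed JSON response (header row + data rows) into a dict, or None."""
    if len(rows) < 2:
        return None
    header, first = rows[0], rows[1]
    return dict(zip(header, first))


def fetch_earliest_capture(page_url):
    """Query the archive over the network; returns None if nothing was captured."""
    with urlopen(build_cdx_query(page_url), timeout=30) as response:
        body = response.read().decode("utf-8")
    return earliest_capture(json.loads(body)) if body.strip() else None
```

Running `fetch_earliest_capture` for the original and the copied URL gives two timestamps to compare. Keep in mind that a capture date only shows when the archive first saw the page, so it is a "no later than" date rather than a publication date, and many pages are never archived at all.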
Hope this helps. I wish you good luck.
Best,
Devanur Rafi.