What steps can you take to ensure your content is indexed and attributed to your site before a scraper gets to it?
-
Hi,
A client's site has a significant amount of original content that has blatantly been copied and pasted onto various competitor and article sites.
I'm working with the client to rejig much of this content and to publish new content.
What steps would you recommend when the new, updated site is launched to ensure Google clearly attributes the content to the client's site first?
One thing I will be doing is submitting new XML and HTML sitemaps.
Thank you
-
There are no established "best practices" for the tag's usage at this point. On the one hand, it could technically be used on every page; on the other, it should arguably only be used where the content is an article, blog post, or other writing by an individual person.
-
Thanks Alan.
I guess there's no magic trick that will give you 100% attribution.
Regarding this tag, do you recommend I add it to EVERY page of the client's website, including the homepage? So even the usual about us/contact etc. pages?
Cheers
Hash
-
Google continually tries to find new ways to encourage solutions that help them understand intent, relevance, ownership and authority. It's why Schema.org finally hit this year. None of their previous attempts have been good enough, and each has served only a specific individual purpose.
So with Schema.org, the theory is that there's a new, unified framework that can grow and evolve, without Google having to come up with individual solutions for every need.
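For a flavour of what that unified framework looks like in practice, here's a minimal, hypothetical microdata snippet marking a page up as a schema.org Article (the names and values are placeholders, not from any real site):

<div itemscope itemtype="http://schema.org/Article">
  <h1 itemprop="name">Example article headline</h1>
  <span itemprop="author">Jane Doe</span>
  <div itemprop="articleBody">The article text goes here...</div>
</div>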
The "original source" concept was supposed to address the scraper issue, and there's been some value in that, though it's far from perfect. A good scraper script can find it, strip it out or replace the contents.
rel="author" is yet one more thing that can be used in the overall mix, though Schema.org takes authorship and publisher identity to a whole new, complex, and so far confused level :-).
Since Schema.org is most likely not going to be widely adopted until at least early next year, Google is encouraging use of the rel="author" tag as the primary method for assigning authorship at this point, and will continue to support it even as Schema.org rolls out.
So if you're looking at a best practices solution, yes, rel="author" is advisable. Until it's not.
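For reference, a minimal rel="author" setup looks something like this — both URLs below are placeholders:

<!-- In the body of each article, a link to the author's profile page: -->
<a href="http://www.example.com/about/jane-doe" rel="author">Jane Doe</a>

<!-- Or in the <head>, pointing at the author's Google profile: -->
<link rel="author" href="https://plus.google.com/000000000000000000000"/>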
-
Thanks Alan... I am surprised to learn about this "original source" information. There must not have been a lot of talk about it when it was released or I would have seen it.
Google recently started encouraging people to use the rel="author" attribute. I am going to use that on my site... now I am wondering if I should be using "original source" too.
Are you recommending rel="author"?
Also, reading that full post, there is a section added at the end recommending rel="canonical".
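That is, a single tag in the <head> of each page pointing at the URL you consider the original — placeholder URL in this example:

<link rel="canonical" href="http://www.example.com/original-article"/>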
-
Always have a sitemap.xml file that includes all the URLs you want indexed. Right after publishing, submit the sitemap.xml file (or files, if there are tens of thousands of pages) through Google Webmaster Tools and Bing Webmaster Tools. Also include the meta "original-source" tag in your page headers.
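A quick sketch of both pieces, with placeholder URLs throughout. The meta tag syntax below follows how Google described the news source tags when they announced them, but double-check the announcement before relying on it:

<!-- In the <head> of each page: -->
<meta name="original-source" content="http://www.example.com/original-article"/>

<!-- A minimal sitemap.xml: -->
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://www.example.com/original-article</loc>
    <lastmod>2011-09-01</lastmod>
  </url>
</urlset>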
Include a copyright line at the bottom of each page with the site or company name, and link it to the home page.
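For example (placeholder name and URL):

<p>Copyright © 2011 <a href="http://www.example.com/">Example Company</a>. All rights reserved.</p>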
This does not guarantee with 100% certainty that you'll get proper attribution; however, these are the best steps you can take in that regard.