Steps you can take to ensure your content is indexed and registered to your site before a scraper gets to it?
-
Hi,
A clients site has significant amounts of original content that has blatantly been copied and pasted in various other competitor and article sites.
I'm working with the client to rejig lots of this content and to publish new content.
What steps would you recommend to undertake when the new, updated site is launched to ensure Google clearly attributes the content to the clients site first?
One thing I will be doing is submitting a new xml + html sitemap.
Thankyou
-
There are no "best practices" established for the tags' usage at this point. On the one hand, it could technically be used for every page, and on the other, should only be used when it's an article, blog post, or other individual person's writing.
-
Thanks Alan.
Guess there's no magic trick that will give you 100% attribution.
Regarding this tag, do you recommend I add this to EVERY page of the clients website including the homepage? So even the usual about us/contact etc pages?
Cheers
Hash
-
Google continually tries to find new ways to encourage solutions for helping them understand intent, relevance, ownership and authority. It's why Schema.org finally hit this year. None of their previous attempts have been good enough, and each has served a specific individual purpose.
So with Schema, the theory is there's a new, unified framework that can grow and evolve, without having to come up with individual solutions.
The "original source" concept was supposed to address the scraper issue, and there's been some value in that, though it's far from perfect. A good scraper script can find it, strip it out or replace the contents.
rel="author" is yet one more thing that can be used in the overall mix, though Schema.org takes authorship and publisher identity to a whole new, complex, and so far confused level :-).
Since Schema.org is most likely not going to be widely adopted til at least early next year, Google's encouraging use of the rel="author" tag as the primary method for assigning authorship at this point, and will continue to support it even as Schema rolls out.
So if you're looking at a best practices solution, yes, rel="author" is advisable. Until it's not.
-
Thanks Alan... I am surprised to learn about this "original source" information. There must not have been a lot of talk about it when it was released or I would have seen it.
Google recently started encouraging people to use the rel="author" attribute. I am going to use that on my site... now I am wondering if I should be using "original source" too.
Are you recommending rel="author"?
Also, reading that full post there is a section added at the end recommending rel="canonical"
-
Always have a sitemap.xml file with all the URLs you want indexed included in it. Right after publishing, submit the sitemap.xml file (or files if there are tens of thousands of pages) through Google Webmaster Tools and Bing Webmaster Tools. Include the Meta "original-source" tag in your page headers.
Include a Copyright line at the bottom of each page with the site or company name, and have that link to the home page.
This does not guarantee with 100% certainty that you'll get proper attribution, however these are the best steps you can take in that regard.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Want to get site on same sub-folder but move CMS
Hi guys, We currently have a site which uses WordPress & attached to it is a Woocommerce for its store. We want to just move the store to shopify, but keep the blog on wordpress. We also want to keep the same URL structure for the store the same e.g. www.website.com/store (woocommerce) to www.website.com/store (shopify) Is this even possible? Can you point a domain to two separate servers (servers – one for wordpress, one on shopify) at the same time? This is SEO related as we want to preserve the authority as a sub-folder. See this: https://inbound.org/blog/the-sub-domain-vs-sub-directory-seo-debate-explained-in-one-flow-chart Cheers.
Intermediate & Advanced SEO | | cerednicenko0 -
Can Google read content that is hidden under a "Read More" area?
For example, when a person first lands on a given page, they see a collapsed paragraph but if they want to gather more information they press the "read more" and it expands to reveal the full paragraph. Does Google crawl the full paragraph or just the shortened version? In the same vein, what if you have a text box that contains three different tabs. For example, you're selling a product that has a text box with overview, instructions & ingredients tabs all housed under the same URL. Does Google crawl all three tabs? Thanks for your insight!
Intermediate & Advanced SEO | | jlo76130 -
My website has dropped in the rankings drastically. How can I get it back up the SERPs?
I manage a website that I took over 6 months ago - the site was sitting happily on page one of google so I haven't had to do much to keep it there - other than a few onsite improvements. However, last week the site dropped off the SERPs. The site is http://www.pro-techairconditioning.co.uk/content/home.html Could someone please suggest reasons for this and ways to solve the problem? Thanks
Intermediate & Advanced SEO | | SWD.Advertising0 -
Ranking sites in vertical markets with 90% scraped content
Hi, Hoping to get advice about ranking sites (a vertical market search engine/portal like a car site for example) that gets its content from scraping car sites. For various reasons (mostly scale eg cant get car dealers to push their listings to us) content was scraped. The startup has received great press, TV interviews, incubator programs etc, and has also secured very significant investment. I feel if this site was launched pre-panda it would be ranking much better. We have invested significantly in our tech, our search tools and site innovation place us easily as market leader in this space. Anyone with experience in ranking sites with legitimate reasons for using scraped content?
Intermediate & Advanced SEO | | edthomasnp0 -
Duplicate content issue - online retail site.
Hello Mozzers, just looked at a website and just about every product page (there are hundreds - yikes!) is duplicated like this at end of each url (see below). Surely this is a serious case of duplicate content? Any idea why a web developer would do this? Thanks in advance! Luke prod=company-081
Intermediate & Advanced SEO | | McTaggart
prod=company-081&cat=20 -
Impact of simplifying website and removing 80% of site's content
We're thinking of simplifying our website which has grown to a very large size by removing all the content which hardly ever gets visited. The plan is to remove this content / make changes over time in small chunks so that we can monitor the impact on SEO. My gut feeling is that this is okay if we make sure to redirect old pages and make sure that the pages we remove aren't getting any traffic. From my research online it seems that more content is not necessarily a good thing if that content is ineffective and that simplifying a site can improve conversions and usability. Could I get people's thoughts on this please? Are there are risks that we should look out for or any alternatives to this approach? At the moment I'm struggling to combine the needs of SEO with making the website more effective.
Intermediate & Advanced SEO | | RG_SEO0 -
WMT Index Status - Possible Duplicate Content
Hi everyone. A little background: I have a website that is 3 years old. For a period of 8 months I was in the top 5 for my main targeted keyword. I seemed to have survived the man eating panda but not so sure about the blood thirsty penguin. Anyway; my homepage, along with other important pages, have been wiped of the face of Google's planet. First I got rid of some links that may not have been helping and disavowed them. When this didn't work I decided to do a complete redesign of my site with better content, cleaner design, removed ads (only had 1) and incorporated social integration. This has had no effect at all. I filed a reconsideration request and was told that I have NOT had any manual spam penalties made against me, by the way I never received any warning messages in WMT. SO, what could be the problem? Maybe it's duplicate content? In WMT the Index Status indicates that there are 260 pages indexed. However; I have only 47 pages in my sitemap and when I do a site: search on Google it only retrieves 44 pages. So what are all these other pages? Before I uploaded the redesign I removed all the current pages from the index and cache using the remove URL tool in WMT. I should mention that I have a blog on Blogger that is linked to a subdomain on my hosting account i.e. http://blog.mydomain.co.uk. Are the blog posts counted as pages on my site or on Blogger's servers? Ahhhh this is too complicated lol Any help will be much appreciated! Many thanks, Mark.
Intermediate & Advanced SEO | | Nortski0 -
How can I remove duplicate content & titles from my site?
Without knowing I created multiple URLs to the same page destinations on my website. My ranking is poor and I need to fix this problem quickly. My web host doesn't understand the problem!!! How can I use canonical tags? Can somebody help, please.
Intermediate & Advanced SEO | | ZoeAlexander0