Help finding website content scraping
-
Hi,
I need a tool to help me review sites that are plagiarising / directly copying content from my site. But tools that I'm aware, such as Copyscape, appear to work with individual URLs and not a root domain. That's great if you have a particular post or page you want to check. But in this case, some sites are scraping 1000s of product pages. So I need to submit the root domain rather than an individual URL.
In some cases, other sites are being listed in SERPs above or even instead of our site for product search terms. But so far I have stumbled across this, rather than proactively researched offending sites.
So I want to insert my root domain & then for the tool to review all my internal site pages before providing information on other domains where an individual page has a certain amount of duplicated copy. Working in the same way as Moz crawls the site for internal duplicate pages - I need a list of duplicate content by domain & URL, externally that I can then contact the offending sites to request they remove the content and send to Google as evidence, if they don't.
Any help would be gratefully appreciated.
Terry
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
How to Reduce the spam score of the website?
Hello, My Website Spam Score is 60. how can I reduce the spam score of the website. plz give me suggestion
White Hat / Black Hat SEO | | jyhkgkkkkkhkjgj0 -
My Website is getting too many DMCA Hits
My Website has been getting too many DMCA Hits since last december then my rankings dropped i would like to know if getting a new domain would be advisable ... and would it be good to redirect my website that is getting DMCA hits to the new domain i want to get it is advisable to build links for it the new domain or would it pass link juice to it (it has some spammy links tho)
White Hat / Black Hat SEO | | emmycircle0 -
Is this considered duplicate content?
Hi Guys, We have a blog for our e-commerce store. We have a full-time in-house writer producing content. As part of our process, we do content briefs, and as part of the brief we analyze competing pieces of content existing on the web. Most of the time, the sources are large publications (i.e HGTV, elledecor, apartmenttherapy, Housebeautiful, NY Times, etc.). The analysis is basically a summary/breakdown of the article, and is sometimes 2-3 paragraphs long for longer pieces of content. The competing content analysis is used to create an outline of our article, and incorporates most important details/facts from competing pieces, but not all. Most of our articles run 1500-3000 words. Here are the questions: NOTE: the summaries are written by us, and not copied/pasted from other websites. Would it be considered duplicate content, or bad SEO practice, if we list sources/links we used at the bottom of our blog post, with the summary from our content brief? Could this be beneficial as far as SEO? If we do this, should be nofollow the links, or use regular dofollow links? For example: For your convenience, here are some articles we found helpful, along with brief summaries: <summary>I want to use as much of the content that we have spent time on. TIA</summary>
White Hat / Black Hat SEO | | kekepeche1 -
Do we get de-indexed for changing some content and tags frequently? What is the scope in 2017?
Hi all, We are making some changes in our website content at some paragraphs and tags with our main keywords. I'm just wondering if this is going to make us de indexed from Google? Because we recently dropped in rankings when we added some new content; so I am worried whether there are any chances it will turn more risky when we try to make anymore changes like changing the content. There are actually many reasons a website gets de indexed from Google but we don't employ any such black hat techniques. Our website got a reputation with thousands of direct traffic and organic search. However I am curious to know what are the chances of getting de indexed as per the new trends at Google? Thanks
White Hat / Black Hat SEO | | vtmoz0 -
Question regarding subdomains and duplicate content
Hey everyone, I have another question regarding duplicate content. We are planning on launching a new sector in our industry to satisfy a niche. Our main site works as a directory with listings with NAP. The new sector that we are launching will be taking all of the content on the main site and duplicating it on a subdomain for the new sector. We still want the subdomain to rank organically, but I'm having struggles between putting a rel=canonical back to main site, or doing a self-referencing canonical, but now I have duplicates. The other idea is to rewrite the content on each listing so that the menu items are still the same, but the listing description is different. Do you think this would be enough differentiating content that it won't be seen as a duplicate? Obviously make this to be part of the main site is the best option, but we can't do that unfortunately. Last question, what are the advantages or disadvantages of doing a subdomain?
White Hat / Black Hat SEO | | imjonny0 -
More sitemap issues: help
Hey Guys, Seems I'm having more sitemap issues -I just checked my WMT and find that for my com.au and com site - the com.au site is showing i only have 2 pages indexed and 72 Web Pages submitted. The .com I look under sitemaps and it doesn't show any results as to how many pages have been indexed instead it is giving me this error warning - "Your Sitemap appears to be an HTML page. Please use a supported sitemap format instead." All 3 sites are listed here: http://bit.ly/1KTbWg0 http://bit.ly/1AU0f5k http://bit.ly/1yhz96v Any advice would be much appreciate here! Thanks guys
White Hat / Black Hat SEO | | edward-may0 -
Secondary Domain Outranking Master Website
IEEE is a large professional association dedicated to serving engineers. The IEEE Web Presence is made up of flagship sites like IEEE.org, IEEEXplore, and IEEE Spectrum, mid-tier sites like Computer.org, and smaller sites like those dedicated to specific conferences. It is unclear exactly when this started - but searches in Google for [ieee] currently return ieeeusa.org before ieee.org. This is troublesome, as users are typically looking for IEEE.org with such a general query. ieeeusa.org is a site that has a much narrower focus - it is dedicated to public policy. IEEE.org is one of the strongest domains - I am thinking that this is a glitch of some sort. I am removing a stale sitemap that is referenced in robots.txt (though again, I'm not seeing any issues with other pages - its just two queries that are trouble: [ieee] and [about ieee]. And its noticeable in analytics 🙂 http://ieee.d.pr/hMg0/YhklCw7Z What do you think? 🙂
White Hat / Black Hat SEO | | thegrif3290 -
Moving content to a clean URL
Greetings My site was seriously punished in the recent penguin update. I foolishly got some bad out sourced spammy links built and I am now paying for it 😞 I am now thinking it best to start fresh on a new url, but I am wondering if I can use the content from the flagged site on the new url. Would this be flagged as duplicate content, even if i took the old site down? your help is greatly appreciated Silas
White Hat / Black Hat SEO | | Silasrose0