Skip to content
Advanced seo 74da3a3

How To Create an Excel Spreadsheet to Remove Link Spam Post Penguin

Ron Medlin

This YouMoz entry was submitted by one of our community members. The author’s views are entirely their own (excluding an unlikely case of hypnosis) and may not reflect the views of Moz.

Table of Contents

Ron Medlin

How To Create an Excel Spreadsheet to Remove Link Spam Post Penguin

This YouMoz entry was submitted by one of our community members. The author’s views are entirely their own (excluding an unlikely case of hypnosis) and may not reflect the views of Moz.

Love it or hate it, we are now living in a Post Penguin world. Google’s algorithm has taken a serious and some might say harsh step towards eliminating all link spam from their index. You might be asking, what does Google consider link spam? Well Google is pretty hush hush about exactly what constitutes link spam, but if you get that not so happy feeling in your stomach when you think about the links you’ve been building, then it could be time to take a looksy.

Glen Gabe at G-squared interactive wrote a great post outlining some of the unnatural linking practices that many sites hit by Penguin have in common. The big ones we’ve seen in the link profiles of the Penguin affected sites we’ve analyzed were:

  1. Obvious paid links from blog networks
  2. Sitewide links with exact anchor text either in the footer or blogroll
  3. Lots of links from article marketing sites (Usually having exact anchor text)
  4. Guest posts on questionable sites (a footprint of spun article distribution products like Unique Article Wizard, SEO Linkvine, etc.)
  5. Lots of de-indexed or low quality directory links

We have not seen many sites affected by comment spam, but we still wouldn’t suggest having them in your link profile.

There has been some debate on whether or not a site hit by Penguin should remove these questionable links since there could be a possibility that some of these links are helping your site, and the ROI on spam links removal could be nominal. This might be true and I would definitely suggest fixing any on-page issues before undertaking any link spam removal project as it can be a daunting task, but in a post-Penguin world, I subscribe to the idea that anything less than completely white hat will be taken away eventually. If this is the case then removing unnatural links from your profile is a no brainer.

Now that we’ve gotten that out of the way, let’s get in to why I wrote this article in the first place. After Penguin was released we had several businesses call us up freaking out that their rankings had dropped and they couldn’t find their websites anywhere in Google. We took a look at their backlink profiles and sure enough the unnatural tactics we listed above were showing up for each of these businesses. We told them what we thought the issues were, signed the contracts, and went to work to remove the bad links.

Before starting the projects, I wanted to create a comprehensive excel spreadsheet template that could be easily duplicated and handed off to an outsourcer for webmaster outreach in order to get the links removed. I think what came out is a pretty good start, if you have any additions please add them in the comments section. The tools you’ll need to create this spreadsheet are:

  1. Excel
  2. Google Webmaster Tools
  3. Majestic SEO
  4. Open Site Explorer
  5. NetPeak Checker (AWESOME Tool)

The first step is to gather all the backlinks you can access. Let’s start with the webmaster tools links. Login to your webmaster tools account, and click on the site in question. In the menu on the left expand the “Traffic” tab and click on “Links To Your Site.” Click on more under the first column, then “Download more sample links.”

Remove Link Spam

Open the excel spreadsheet and rename the current sheet “Webmaster Tools Links” and save the spreadsheet as “Client’s Name Backlink Removal Spreadsheet.”

Next let’s go over to Open Site Explorer and type in the URL. OSE will bring back a list of links. You want to show links from “only external” and to “pages on this root domain” then hit “Filter.”

Link Spam Removal

Click “Download CSV” and open the file, then right click on the sheet tab at the bottom of the sheet and select “move or copy” and move it over to the “Client’s Name Backlink Removal Spreadsheet.” Now rename this current sheet “Open Site Explorer Links.”

Sorry if this is really basic for some of you, but sometimes when reading posts like this I wish people would be as descriptive as possible so… 

On we go to Majestic SEO to get our final and probably biggest set of links. Type in the URL and hit explore, then “Create Report.” I like to do this for both the “Fresh Index” and the “Historic Index” to make sure I get all the links, but I’ll describe it once. Select the index you want and I use the drop down box and select whichever has the most backlinks then “Create Report.” Download the backlinks and and open the file. Again move the sheet over to “Client’s Name Backlink Removal Spreadsheet” and rename it “Fresh Index Links” and repeat this whole process for the Historic Index Links.

Now create another sheet in your Master Spreadsheet and name it “Root Linking Domains.” You’ll want to take all the links from the other sheets and copy them over to this sheet. After doing this I highlight the first two columns, insert a table, remove duplicate links from Column 1, and sort alphabetically.

Now I want to strip out the Root Domains for all the links in Column 1 and put them into Column 2, so in cell B2 I use the following formula:

=LEFT(A2,FIND("/",A2,8))

Copy all the Root Domains in Column 2 and move a couple of cells over, paste special, and select “Values.” Insert a table into the column you created and remove duplicates. Now we have all our links in Column 1, Root Domains with duplicates in Column 2, and Root Domains Without Duplicates in our new Column, rename each column accordingly.

Now we need to analyze all the root domains to determine which links are relevant, have authority, are obvious spam links, etc., so head over to NetPeak Checker, and download their free software. This is a great tool for analyzing links if you don’t know how to use APIs for excel. It can be a little buggy at times though, so be aware. Once you’ve installed and opened the software we want to choose the factors we most want to analyze:

  1. PR Main
  2. Status Code (Under Server)
  3. Index
  4. SEOmoz Page Authority
  5. SEOmoz Domain Authority

Hit save after selecting these because you may have to shut down the software if it acts up, and you don’t want to have to select these all again.

Click on the “Load” icon in the top left, copy around 300 of your root linking domains without duplicates, and paste them in the box. You can play around with this number, but I’ve found that it will usually safely analyze about 300 links at a time. Anymore and you might start getting a bunch of n/f for PR. This is a sign that the software is bugging out, so close it down, restart, and analyze fewer links.

Once the software is finished analyzing, export, and save the file. Open the file and move the sheet over to your Master Spreadsheet like we did before and rename it “Link Analysis.” Use NetPeak to analyze all the root linking domains. For the rest of the domains after you’ve analyzed and exported them you’ll need to copy and paste them to your new “Link Analysis” Sheet, so you have all the analysis on the same sheet.

So that’s it, now you’ve created a comprehensive spreadsheet with all your links and analysis of all their root domains. Click on the link below to download a copy of the spreadsheet template.

BACKLINK REMOVAL SPREADSHEET

All you have to do now is go through and decide which links you want to remove, get the contact info of the webmasters, email the webmasters with a list of the links you want removed, document all correspondence, and submit to Google for reconsideration with all your documentation. This is a simplified explanation, but Ryan Kent gave a great answer in this Q & A about the link removal process.

I would start with the obvious, anything that’s de-indexed (some might think this is a waste of time but we still do it), sites with DA less than 25, blog networks (HI PR homepage links with spun articles and exact anchor), sites with posts that make no sense and have exact anchor text, and whatever you would consider spam links.

But a major one to look out for is sitewide links with exact anchor text. These can create a large imbalance in your exact anchor text to branded anchor text ration, which seems to be a major factor in the Penguin Update. You can either change these sitewide links to branded terms like www.yoursite.com or delete them all together. Here examples of two sites that were able to recover from Penguin after dealing with these sitewide links.

Recovering from an Over Optimization Penalty - A True Story
How WPMU.org Recovered From The Penguin Update

I hope this spreadsheet helps you or you clients on your road to Penguin recovery. Please leave any comments or suggestions on how to improve the spreadsheet below.

Ron Medlin is the owner of SearchDog Marketing, an Atlanta based Inbound Marketing firm. If you need help removing link spam feel free to contact SearchDog anytime.

Back to Top

Make smarter decisions with Moz API

Start building and scale easily with affordable plans

Read Next

Build a Search Intent Dashboard to Unlock Better Opportunities

Build a Search Intent Dashboard to Unlock Better Opportunities

Aug 29, 2024
How to Optimize for Google's Featured Snippets [Updated for 2024]

How to Optimize for Google's Featured Snippets [Updated for 2024]

Aug 20, 2024
20 SEOs Share Their Key Takeaways From the Google API Leaks

20 SEOs Share Their Key Takeaways From the Google API Leaks

Jun 18, 2024

Comments

Please keep your comments TAGFEE by following the community etiquette

Comments are closed. Got a burning question? Head to our Q&A section to start a new conversation.