Any tools for scraping blogroll URLs from sites?
-
This question is entirely in the whitehat realm...
Let's say you've encountered a great blog - with a strong blogroll of 40 sites.
The 40-site blogroll is interesting to you for any number of reasons, from link building targets to simply subscribing in your feedreader. Right now, it's tedious to extract the URLs from the site. There are some "save all links" tools, but they are also messy.
Are there any good tools that will
a) allow you to grab the blogroll (only) of any site into a list of URLs (yeah, ok, it might not be perfect since some sites call it "sites I like" etc.)
b) same, but export as OPML so you can subscribe.
Thanks!
Scott
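
For anyone who would rather script this than wait for a tool, here is a minimal Python sketch of part (a) using only the standard library. It assumes the blogroll sits inside a container whose `id` or `class` contains the word "blogroll" (pass other markers such as "sites-i-like" for themes that name it differently) and that the markup is reasonably well formed; unclosed tags inside the container can throw off the nesting count. The names `BlogrollParser` and `extract_blogroll` are made up for this illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

# Tags that never have a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class BlogrollParser(HTMLParser):
    """Collects external links found inside a 'blogroll' container (hypothetical helper)."""

    def __init__(self, base_url, markers=("blogroll",)):
        super().__init__()
        self.base_url = base_url
        self.markers = markers
        self.depth = 0   # greater than 0 while inside a blogroll container
        self.urls = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if self.depth:
            if tag not in VOID_TAGS:
                self.depth += 1
            if tag == "a" and attrs.get("href"):
                url = urljoin(self.base_url, attrs["href"])
                parsed = urlparse(url)
                is_web = parsed.scheme in ("http", "https")
                is_external = parsed.netloc != urlparse(self.base_url).netloc
                if is_web and is_external and url not in self.urls:
                    self.urls.append(url)
            return
        # Outside a container: check whether this tag opens one.
        label = f"{attrs.get('id') or ''} {attrs.get('class') or ''}".lower()
        if tag not in VOID_TAGS and any(m in label for m in self.markers):
            self.depth = 1

    def handle_endtag(self, tag):
        if self.depth and tag not in VOID_TAGS:
            self.depth -= 1


def extract_blogroll(html, base_url, markers=("blogroll",)):
    """Return external URLs inside the blogroll container, in page order."""
    parser = BlogrollParser(base_url, markers)
    parser.feed(html)
    return parser.urls
```

To try it on a live page, you could fetch the HTML with `urllib.request.urlopen(page_url).read().decode("utf-8", errors="replace")` and pass the result, together with `page_url`, to `extract_blogroll`.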
-
Not at all. I guess my feeling here is that there is a sort of untapped social graph defined by blogrolls. If it were simple to harvest them upon visiting a blog (e.g. this blogger recommends...) one could do a stumble-on-steroids approach to a niche.
-
I thought you might be able to use the outbound link scraper to grab the outbound links on the page. Pop in the URLs of the pages you want to scrape and it will spit out a list of those domains and URLs. You can take those URLs and put them into the contact finder, and it will return the contact details for those sites. Combine the two spreadsheets for an epic list of blogs to contact for your outreach.
This is obviously for link building rather than subscribing. Sorry if I have misunderstood what you were trying to do.
-
Hi Keri,
That is a very cool tool, but it is overkill for this. It takes far too many steps to accomplish only part of the desired goal: grabbing all blogroll URLs (within the blogroll DIV tag) and exporting the list to a valid OPML file or URL list.
Thanks!
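
For the export half of that goal, here is a rough sketch of turning a list of blogroll entries into an OPML 2.0 document with Python's standard `xml.etree.ElementTree`. One caveat: a blogroll gives you site URLs, not feed URLs, and a feed reader needs the feed URL (`xmlUrl`) to subscribe. This sketch assumes you have already looked up the feed URL where possible and falls back to a plain `link` outline otherwise. The function name `to_opml` and the entry keys (`text`, `html_url`, `feed_url`) are invented for the example.

```python
import xml.etree.ElementTree as ET


def to_opml(entries, title="Blogroll"):
    """Build an OPML 2.0 string from dicts with 'text', 'html_url', optional 'feed_url'."""
    opml = ET.Element("opml", version="2.0")
    head = ET.SubElement(opml, "head")
    ET.SubElement(head, "title").text = title
    body = ET.SubElement(opml, "body")
    for entry in entries:
        if entry.get("feed_url"):
            # Subscribable outline: feed readers use xmlUrl.
            ET.SubElement(body, "outline", type="rss", text=entry["text"],
                          xmlUrl=entry["feed_url"], htmlUrl=entry["html_url"])
        else:
            # No known feed: keep the site as a plain link outline.
            ET.SubElement(body, "outline", type="link", text=entry["text"],
                          url=entry["html_url"])
    # Prepend the declaration by hand; save the result to disk as UTF-8.
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + ET.tostring(opml, encoding="unicode")
```

Writing the returned string to a file with `open("blogroll.opml", "w", encoding="utf-8")` should give something most feed readers can import, though only the entries with a feed URL will actually become subscriptions.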
-
Nothing I saw there would do this. It looks like it could manage to list all external links, and I suppose you could manually pick the blogroll out of it.
-
Hi there,
Well, Keri's response reminded me of this question and the fact that I found a tool for scraping these kinds of lists.
Here it is (with some other cool tools), have fun:
-
Hi Scott,
I'm going through older questions. Did you ever find a tool to do what you wanted to do here?
-
One thing to look at is Outwit Hub for Firefox. It might be able to help with that. It can scrape data from a page and do a lot with it. http://www.outwit.com/products/hub/. Don't know that it meets all of your needs, but I also haven't seen a response with anything better at the moment.
-
Hey Scott,
What a great question and (sigh) I don't have the answer. I am going to check back to find out what people come up with here. Surely there is someone who lurks in these parts who can throw something together?