Any tools for scraping blogroll URLs from sites?
-
This question is entirely in the whitehat realm...
Let's say you've encountered a great blog - with a strong blogroll of 40 sites.
The 40-site blogroll is interesting to you for any number of reasons, from link building targets to simply subscribing in your feedreader. Right now, it's tedious to extract the URLs from the site. There are some "save all links" tools, but they are also messy.
Are there any good tools that will:
a) allow you to grab the blogroll (only) of any site into a list of URLs (yeah, ok, it might not be perfect since some sites call it "sites I like" etc.)
b) same, but export as OPML so you can subscribe.
Thanks!
Scott
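
For part (b), once you have the list, writing the OPML is the easy half. Here is a minimal sketch using only the Python standard library. The function name `build_opml` and the example.com URLs are made up for illustration, and it assumes you already know each blog's feed URL; a blogroll normally links to the homepage, so feed discovery would be a separate step.

```python
import xml.etree.ElementTree as ET


def build_opml(entries, title="Blogroll"):
    """Build an OPML 2.0 string.

    entries: iterable of (name, html_url, feed_url) tuples, where
    feed_url may be None if the feed address is not known yet.
    """
    opml = ET.Element("opml", version="2.0")
    head = ET.SubElement(opml, "head")
    ET.SubElement(head, "title").text = title
    body = ET.SubElement(opml, "body")
    for name, html_url, feed_url in entries:
        attrs = {"text": name, "htmlUrl": html_url}
        if feed_url:
            # Feed readers subscribe via xmlUrl on type="rss" outlines.
            attrs.update(type="rss", xmlUrl=feed_url)
        else:
            # No feed known: keep the site as a plain link outline.
            attrs.update(type="link", url=html_url)
        ET.SubElement(body, "outline", attrs)
    return ET.tostring(opml, encoding="unicode")


# Hypothetical data, for illustration only.
sample = [
    ("Blog A", "http://a.example.com/", "http://a.example.com/feed"),
    ("Blog B", "http://b.example.com/", None),
]
opml_text = build_opml(sample)
```

Saving `opml_text` to a `.opml` file should give something most feed readers can import, though entries without a feed URL will only show up as links.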
-
Not at all. I guess my feeling here is that there is a sort of untapped social graph defined by blogrolls. If it were simple to harvest them upon visiting a blog (e.g. this blogger recommends...) one could do a stumble-on-steroids approach to a niche.
-
I thought you might be able to use the outbound link scraper to grab the outbound links on the page. Pop in the URLs of the pages you want to scrape and it will spit out a list of those domains and URLs. You can take those URLs and put them into the contact finder, and it will return the contact details for those sites. Combine the two spreadsheets for an epic list of blogs to contact for your outreach.
This is obviously for link building rather than subscribing. Sorry if I have misunderstood what you were trying to do.
-
Hi Keri,
That is a very cool tool, but it is overkill for this. It takes far too many steps to accomplish only part of the desired goal: grabbing all blogroll URLs (within the blogroll DIV tag) and exporting the list to a valid OPML file or URL list.
Thanks!
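
To make the "within the blogroll DIV tag" idea concrete, here is a rough sketch with Python's standard `html.parser`. It assumes the container's `id` or `class` contains the word "blogroll" (many themes use something else, so `marker` is a parameter) and that the markup is reasonably well nested. The names `BlogrollParser` and `extract_blogroll` are mine, not from any existing tool.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Tags with no closing tag; they must not affect the nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class BlogrollParser(HTMLParser):
    """Collect hrefs of <a> tags nested inside a blogroll container."""

    def __init__(self, base_url, marker="blogroll"):
        super().__init__()
        self.base_url = base_url
        self.marker = marker  # assumed substring of the container's id/class
        self.depth = 0        # > 0 while inside the blogroll container
        self.urls = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        attrs = dict(attrs)
        if self.depth:
            self.depth += 1
            href = attrs.get("href")
            if tag == "a" and href:
                url = urljoin(self.base_url, href)  # resolve relative links
                if url not in self.urls:
                    self.urls.append(url)
        else:
            ident = f"{attrs.get('id') or ''} {attrs.get('class') or ''}"
            if self.marker in ident.lower():
                self.depth = 1  # entered the container

    def handle_endtag(self, tag):
        if self.depth and tag not in VOID_TAGS:
            self.depth -= 1


def extract_blogroll(html, base_url, marker="blogroll"):
    parser = BlogrollParser(base_url, marker)
    parser.feed(html)
    return parser.urls


# Made-up page fragment, for illustration only.
page = (
    '<div id="nav"><a href="/about">About</a></div>'
    '<ul class="blogroll">'
    '<li><a href="http://a.example.com/">A</a></li>'
    '<li><a href="/b">B</a><br></li>'
    '</ul>'
    '<a href="http://c.example.com/">C</a>'
)
blogroll_urls = extract_blogroll(page, "http://blog.example.com/")
```

For a blog that labels the section "sites I like", you would pass whatever `id` or `class` fragment that theme uses as `marker`. Sloppy HTML with unclosed tags would throw off the depth counter, so treat this as a starting point only.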
-
Nothing I saw there would do this. It looks like it could manage to list all external links, and I suppose you could manually pick the blogroll out of it.
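
For what it's worth, the "list all external links, then pick by hand" fallback is only a few lines of standard-library Python. This is just a sketch: `external_links` is a hypothetical helper, and treating `www.` and the bare host as the same site is my own simplification.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Record the href of every <a> tag on the page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def _host(url):
    # Lowercased hostname with a leading "www." stripped.
    host = (urlparse(url).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host


def external_links(html, page_url):
    """Return unique http(s) links that point away from page_url's host."""
    collector = LinkCollector()
    collector.feed(html)
    own = _host(page_url)
    seen, result = set(), []
    for href in collector.hrefs:
        url = urljoin(page_url, href)
        if urlparse(url).scheme not in ("http", "https"):
            continue  # skip mailto:, javascript:, and similar
        if _host(url) == own or url in seen:
            continue  # skip internal links and duplicates
        seen.add(url)
        result.append(url)
    return result


# Made-up page fragment, for illustration only.
page = (
    '<a href="/about">About</a>'
    '<a href="http://www.blog.example.com/post">Post</a>'
    '<a href="mailto:me@blog.example.com">Mail</a>'
    '<a href="http://other.example.org/">Other</a>'
    '<a href="http://other.example.org/">Other again</a>'
)
outbound = external_links(page, "http://blog.example.com/")
```

You would still have to pick the blogroll entries out of `outbound` by eye, which is exactly the manual step described above.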
-
Hi there,
Well, Keri's response reminded me of this question and the fact that I found a tool for scraping these kinds of lists.
Here it is (with some other cool tools), have fun:
-
Hi Scott,
I'm going through older questions. Did you ever find a tool to do what you wanted to do here?
-
One thing to look at is Outwit Hub for Firefox. It might be able to help with that. It can scrape data from a page and do a lot with it. http://www.outwit.com/products/hub/. I don't know whether it meets all of your needs, but I also haven't seen a response with anything better so far.
-
Hey Scott,
What a great question and, sigh, I don't have the answer. I am going to check back to find out what people come up with here. Surely there is someone who lurks in these parts who can throw something together?