Any tools for scraping blogroll URLs from sites?
-
This question is entirely in the whitehat realm...
Let's say you've come across a great blog with a strong blogroll of 40 sites.
The 40-site blogroll is interesting to you for any number of reasons, from link-building targets to simply subscribing in your feed reader. Right now, extracting the URLs from the site is tedious. There are some "save all links" tools, but their output is messy too.
Are there any good tools that will
a) let you grab only the blogroll of any site as a list of URLs (yeah, OK, it might not be perfect, since some sites call it "sites I like" etc.), or
b) do the same, but export it as OPML so you can subscribe?
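Nobody posted code in this thread, but part (a) can be approximated with Python's standard library. This is a minimal sketch, assuming the blogroll sits in an element whose `id` or `class` contains a hint such as "blogroll"; the hint list, `extract_blogroll`, and `BlogrollParser` are hypothetical names, not an existing tool. It collects every link inside a matching element and resolves relative links against the page URL.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Elements that never get a closing tag, so they must not change nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class BlogrollParser(HTMLParser):
    """Collects <a href> values found inside any element whose id or
    class contains one of the given (hypothetical) hints."""

    def __init__(self, base_url, hints):
        super().__init__()
        self.base_url = base_url
        self.hints = [h.lower() for h in hints]
        self.depth = 0   # nesting depth inside a matched container; 0 = outside
        self.urls = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if self.depth == 0:
            label = f"{a.get('id') or ''} {a.get('class') or ''}".lower()
            if tag not in VOID_TAGS and any(h in label for h in self.hints):
                self.depth = 1          # entering a blogroll-like container
        elif tag not in VOID_TAGS:
            self.depth += 1             # entering a child of the container
        if self.depth and tag == "a" and a.get("href"):
            url = urljoin(self.base_url, a["href"])
            if url not in self.urls:    # keep first-seen order, drop repeats
                self.urls.append(url)

    def handle_endtag(self, tag):
        if self.depth and tag not in VOID_TAGS:
            self.depth -= 1


def extract_blogroll(html, base_url, hints=("blogroll", "blog-roll")):
    """Returns the absolute URLs linked from blogroll-like containers."""
    parser = BlogrollParser(base_url, hints)
    parser.feed(html)
    parser.close()
    return parser.urls
```

Sites that label the list "sites I like" only in a heading would need a different hint (or a check on heading text), and pages that leave `<li>` or `<p>` unclosed can throw off the depth count, so treat the output as a first pass.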
Thanks!
Scott
-
Not at all. I guess my feeling here is that there is a sort of untapped social graph defined by blogrolls. If it were simple to harvest them when visiting a blog (e.g., "this blogger recommends..."), one could take a stumble-on-steroids approach to a niche.
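The "untapped social graph" idea amounts to a breadth-first walk: read one blogroll, queue every site it lists, and repeat. This sketch assumes a hypothetical `get_blogroll(site)` callable (for example, one that fetches the page and runs a blogroll extractor on it) and caps the crawl with `max_sites`; counting how many crawled blogrolls list each site then gives a rough "most recommended in this niche" ranking.

```python
from collections import deque
from typing import Callable, Dict, List, Tuple


def crawl_blogroll_graph(start: str,
                         get_blogroll: Callable[[str], List[str]],
                         max_sites: int = 50) -> Dict[str, List[str]]:
    """Breadth-first walk of the "who recommends whom" graph.

    get_blogroll is a hypothetical callable that returns the blogroll URLs
    for one site. The result maps each visited site to the sites it lists;
    at most max_sites sites are visited.
    """
    graph: Dict[str, List[str]] = {}
    queue = deque([start])
    seen = {start}                 # sites already queued, to avoid revisits
    while queue and len(graph) < max_sites:
        site = queue.popleft()
        links = get_blogroll(site)
        graph[site] = links
        for link in links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return graph


def most_recommended(graph: Dict[str, List[str]]) -> List[Tuple[str, int]]:
    """Ranks sites by how many crawled blogrolls list them (in-degree)."""
    counts: Dict[str, int] = {}
    for links in graph.values():
        for link in set(links):    # count each blogroll at most once per site
            counts[link] = counts.get(link, 0) + 1
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
```

Passing the fetcher in as a function keeps the traversal itself testable offline, for instance with a small dict of made-up sites standing in for real blogrolls.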
-
I thought you might be able to use the outbound link scraper to grab the outbound links on the page. Pop in the URLs of the pages you want to scrape and it will spit out a list of those domains and URLs. You can then put those URLs into the contact finder, and it will return the contact details for those sites. Combine the two spreadsheets for an epic list of blogs to contact for your outreach.
This is obviously for link building rather than subscribing, so sorry if I have misunderstood what you were trying to do.
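For readers without access to the tool mentioned above, a rough local equivalent of an outbound link scraper is short. This is not that tool, just a sketch under stated assumptions: `outbound_links` (a hypothetical name) parses one page's HTML, resolves each `<a href>`, drops same-host and non-HTTP links, and returns the external URLs plus their unique domains. Fetching the page and finding contact details are left out.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collects every <a href> on a page, resolved to an absolute URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        href = dict(attrs).get("href")
        if tag == "a" and href:
            self.links.append(urljoin(self.base_url, href))


def outbound_links(html, page_url):
    """Returns (external URLs, external domains), in first-seen order."""
    own_host = urlparse(page_url).netloc.lower()
    collector = LinkCollector(page_url)
    collector.feed(html)
    collector.close()
    urls, domains = [], []
    for url in collector.links:
        parts = urlparse(url)
        host = parts.netloc.lower()
        if parts.scheme not in ("http", "https") or host == own_host:
            continue  # skip mailto:, javascript:, and same-site links
        if url not in urls:
            urls.append(url)
        if host not in domains:
            domains.append(host)
    return urls, domains
```

Note that `www.example.com` and `example.com` count as different hosts in this sketch.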
-
Hi Keri,
That is a very cool tool, but it is overkill for this. It takes far too many steps to accomplish only part of the goal: grabbing all blogroll URLs (within the blogroll DIV tag) and exporting the list to a valid OPML file or URL list.
Thanks!
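The OPML export can be sketched with `xml.etree.ElementTree`. One wrinkle: a blogroll usually links to each site's home page, while a feed reader subscribes to the feed URL (OPML's `xmlUrl`). This sketch therefore includes a hypothetical `find_feed_url` helper that reads a site's `<link rel="alternate">` feed tag, and a hypothetical `to_opml` that writes `type="rss"` outlines when a feed is known and plain `type="link"` outlines otherwise. All function names are illustrative assumptions.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser
from urllib.parse import urljoin

FEED_TYPES = {"application/rss+xml", "application/atom+xml"}


class FeedLinkFinder(HTMLParser):
    """Finds the first <link rel="alternate"> pointing at an RSS/Atom feed."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.feed_url = None

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        rel = (a.get("rel") or "").lower().split()
        if (self.feed_url is None and tag == "link" and "alternate" in rel
                and (a.get("type") or "").lower() in FEED_TYPES and a.get("href")):
            self.feed_url = urljoin(self.base_url, a["href"])


def find_feed_url(html, page_url):
    """Returns the page's advertised feed URL, or None if it has none."""
    finder = FeedLinkFinder(page_url)
    finder.feed(html)
    finder.close()
    return finder.feed_url


def to_opml(title, sites):
    """sites: list of (site_url, feed_url) pairs; feed_url may be None.

    Sites with a known feed become type="rss" outlines (xmlUrl is what a
    feed reader subscribes to); sites without one are kept as type="link"
    outlines so they are not silently dropped.
    """
    opml = ET.Element("opml", version="2.0")
    head = ET.SubElement(opml, "head")
    ET.SubElement(head, "title").text = title
    body = ET.SubElement(opml, "body")
    for site_url, feed_url in sites:
        if feed_url:
            ET.SubElement(body, "outline", type="rss", text=site_url,
                          xmlUrl=feed_url, htmlUrl=site_url)
        else:
            ET.SubElement(body, "outline", type="link", text=site_url,
                          url=site_url)
    return ET.tostring(opml, encoding="unicode")
```

To keep the result, write the returned string to a `.opml` file and import it into the feed reader.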
-
Nothing I saw there would do this. It looks like it could list all external links, and I suppose you could manually pick the blogroll out of that.
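Picking the blogroll out of a full external-link list by hand gets easier if the links are grouped by where they sit on the page. This sketch (hypothetical names again) files each external link under the nearest enclosing element that has an `id` or `class` and sorts the groups by size, on the assumption that a blogroll tends to show up as one large cluster in a sidebar.

```python
from collections import defaultdict
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

# Elements that never get a closing tag, so they are never pushed on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class ContainerGrouper(HTMLParser):
    """Groups external <a href> links by the nearest enclosing element that
    has an id or class (labelled like CSS: div#main, aside.widget)."""

    def __init__(self, page_url):
        super().__init__()
        self.page_url = page_url
        self.own_host = urlparse(page_url).netloc.lower()
        self.stack = []                    # (tag, label) for each open element
        self.groups = defaultdict(list)    # label -> external URLs

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if a.get("id"):
            label = f"{tag}#{a['id']}"
        elif a.get("class"):
            label = tag + "".join("." + c for c in a["class"].split())
        else:
            label = ""
        if tag == "a" and a.get("href"):
            url = urljoin(self.page_url, a["href"])
            parts = urlparse(url)
            if parts.scheme in ("http", "https") and parts.netloc.lower() != self.own_host:
                key = next((lbl for _, lbl in reversed(self.stack) if lbl), "(page)")
                self.groups[key].append(url)
        if tag not in VOID_TAGS:
            self.stack.append((tag, label))

    def handle_endtag(self, tag):
        # Pop back to the most recent matching open tag; ignore stray end tags.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                break


def external_link_clusters(html, page_url):
    """Returns [(container label, [external URLs])], largest cluster first."""
    grouper = ContainerGrouper(page_url)
    grouper.feed(html)
    grouper.close()
    return sorted(grouper.groups.items(), key=lambda kv: -len(kv[1]))
```

Popping back to the matching open tag, rather than popping one entry per end tag, keeps the stack roughly right when a page leaves `<li>` elements unclosed.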
-
Hi there,
Well, Keri's response reminded me of this question and of the fact that I found a tool for scraping these kinds of lists.
Here it is (along with some other cool tools); have fun:
-
Hi Scott,
I'm going through older questions. Did you ever find a tool to do what you wanted to do here?
-
One thing to look at is Outwit Hub for Firefox. It might be able to help with that, since it can scrape data from a page and do a lot with it: http://www.outwit.com/products/hub/. I don't know that it meets all of your needs, but I also haven't seen a response with anything better so far.
-
Hey Scott,
What a great question, and <sigh>I don't have the answer. I am going to come back to find out what people come up with here. Surely someone who lurks these parts can throw something together?</sigh>