Site Spider / Crawler / Scraper Software
-
Short of coding up your own web crawler, does anyone know of (or have experience with) a good piece of software that will run through all the pages on a single domain?
(And potentially on linked domains one hop away.)
This could be either server- or desktop-based.
Useful capabilities would include:
- Scraping (XPath parameters)
- Number of clicks from the homepage (site architecture)
- HTTP headers
- Multithreading
- Use of proxies
- Robots.txt compliance option
- CSV output
- Anything else you can think of...
Perhaps an opportunity for an additional SEOmoz tool here, since they do it already!
Cheers!
Note:
I've had a look at:
- Nutch: http://nutch.apache.org/
- Heritrix: https://webarchive.jira.com/wiki/display/Heritrix/Heritrix
- Scrapy: http://doc.scrapy.org/en/latest/intro/overview.html
- Mozenda (does scraping but doesn't appear to be extensible)
Any experience with, or preferences among, these or others?
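For anyone weighing the "code your own" option, the core of the wish list above (a single-domain crawl, number of clicks from the homepage, HTTP status and headers, a robots.txt compliance option, CSV output) can be sketched in Python with only the standard library. This is a minimal sketch under stated assumptions, not a substitute for the tools mentioned: `crawl`, `to_csv`, `urllib_fetch` and `LinkParser` are hypothetical names, fetching is single-threaded, and XPath scraping is left out (a library such as lxml would be one way to add it). Proxies could be configured inside the fetch function, for instance with urllib.request.ProxyHandler.

```python
import csv
import io
import urllib.error
import urllib.request
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.robotparser import RobotFileParser


class LinkParser(HTMLParser):
    """Collects href values from <a> tags (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start_url, fetch, robots_txt="", user_agent="*", max_pages=500):
    """Breadth-first crawl restricted to the start URL's host.

    fetch(url) -> (status, headers_dict, html_text) is supplied by the caller.
    Because the crawl is breadth-first, the depth recorded for each URL is the
    minimum number of clicks from start_url. Returns a list of
    (url, status, content_type, depth) rows.
    """
    robots = RobotFileParser()
    robots.parse(robots_txt.splitlines())  # empty string means no restrictions
    host = urlparse(start_url).netloc
    seen = {start_url: 0}  # url -> clicks from start_url
    queue = deque([start_url])
    rows = []
    while queue and len(rows) < max_pages:
        url = queue.popleft()
        depth = seen[url]
        if not robots.can_fetch(user_agent, url):
            continue  # disallowed by robots.txt: never fetched
        status, headers, html = fetch(url)
        rows.append((url, status, headers.get("Content-Type", ""), depth))
        if status != 200:
            continue  # do not follow links from error pages
        parser = LinkParser()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop #fragments before de-duplicating.
            link, _ = urldefrag(urljoin(url, href))
            if urlparse(link).netloc == host and link not in seen:
                seen[link] = depth + 1
                queue.append(link)
    return rows


def to_csv(rows):
    """Serialise crawl rows to CSV text with a header line."""
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow(["url", "status", "content_type", "clicks_from_home"])
    writer.writerows(rows)
    return out.getvalue()


def urllib_fetch(url):
    """Example fetch function using urllib (assumed user agent string)."""
    req = urllib.request.Request(url, headers={"User-Agent": "site-audit-sketch"})
    try:
        # urlopen follows redirects, so the final status is reported.
        with urllib.request.urlopen(req, timeout=10) as resp:
            body = resp.read().decode("utf-8", errors="replace")
            headers = {k.title(): v for k, v in resp.headers.items()}
            return resp.status, headers, body
    except urllib.error.HTTPError as err:
        return err.code, {k.title(): v for k, v in err.headers.items()}, ""
```

Usage, assuming https://example.com/ is the site being audited: `print(to_csv(crawl("https://example.com/", urllib_fetch)))`. Passing the robots.txt text in as a string, rather than fetching it inside `crawl`, keeps the crawler easy to test with a fake fetch function, and turning compliance off is just a matter of passing an empty string.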
-
Hey Alex,
Screaming Frog is hands down the best desktop crawling software and it has most of what you are looking for.
-Mike