How to identify orphan pages?
-
I've read that you can use Screaming Frog to identify orphan pages on your site, but I can't figure out how to do it. Can anyone help?
I know that Xenu Link Sleuth works but I'm on a Mac so that's not an option for me.
Or are there other ways to identify orphan pages?
-
DeepCrawl.co.uk is another great resource here. This tool gives a full list of URLs, including number of internal links to each page. Filter this list by "No. links in" = 0, and this will give you a good list of orphaned pages.
Cheers,
Mike | Fresh Egg Australia -
Hi Marie!
Sadly, I don't use Xenu anymore either. Most of the solutions to find orphaned pages are either hit-and-miss manual methods (search OSE, search your server files). Or you could use a method like Agents of Value describes here.
Couple of posts that may help:
1. Find Orphaned Pages From Your Sitemap.xml File with Excel and IIS Toolkit
Requires IIS toolkit, which unless your installing on an external machine, isn't mac friendly
Ian has some great tips here, including:
- Search the server log files for every unique URL loaded over a 6-month period. Compare that to all unique URLs found in a site crawl. People have a funny way of stumbling into pages you’ve accidentally blocked or orphaned. Chances are, blocked pages will show up in your log file, even if they’re blocked.
- Do a database export. If you’re using WordPress or another content management system, you can export a full list of every page/post on the site, as well as the URL generated. Then compare that to a site crawl.
- Run two crawls of your site using your favorite crawler. Do the first one with the default settings. Then do a second with the crawler set to ignore robots.txt and nofollow. If the second crawl has more URLs than the first, and you want 100% of your site indexed, then check your robots.txt and look for meta ROBOTS issues.
3. Supposedly, Webseo has an automated option to find orphaned files, but I haven't used it nor can I vouch for it:http://www.webseo.com/
Hope this helps! Let us know what works.
-
Well, because they are 'orphans', you probably can't find them using a spider tool! I'd recommend the following process to find your orphan pages:
1. get a list of all the pages created by your CMS
2. get the list of all the pages found by Screaming Frog
3. add the two url lists into Excel and find the URLs in your CMS that are not in the Screaming Frog list.
You could probably use an Excel trick like this one:
http://superuser.com/questions/289650/how-to-compare-two-columns-and-find-differences-in-excel
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Filter pages - Shopify
Hi there, /collections/living-room-furniture/black
Technical SEO | | williamhuynh
/collections/living-room-furniture/fabric Is that ok to make all the above filter/tag pages canonicalised with their main category /collections/living-room-furniture OR I keep them as it is, so /collections/living-room-furniture/black can rank for filter keywords, example: black living room furniture, /collections/living-room-furniture/fabric fabric living room furniture etc. Also, does it needs to be noindex, follow as well? Note - already removed the main category content from filter pages, updated meta tags as well. Please advice, thank you0 -
How come only 2 pages of my 16 page infographic are being crawled by Moz?
Our Infographic titled "What Is Coaching" was officially launched 5 weeks ago. http://whatiscoaching.erickson.edu/ We set up campaigns in Moz & Google Analytics to track its performance. Moz is reporting No organic traffic and is only crawling 2 of the 16 pages we created. (see first and third attachments) Google Analytics is seeing hundreds of some very strange random pages (see second attachment) Both campaigns are tracking the url above. We have no idea where we've gone wrong. Please help!! 16_pages_seen_in_wordpress.png how_google_analytics_sees_pages.png what_moz_sees.png
Technical SEO | | EricksonCoaching0 -
How to Break Up a Page with Too Many Links
My client has a live page with 100+ links subdivided into 10 categories that each have great potential keyword targeting opportunities. I'd like to improve this page and my intuition is to split it into 11 pages, one page with links to all the others and a bit of content about each. Here's an example of the potential IA: Dog Rescue Groups
Technical SEO | | elenarox
Golden Retriever Rescue - description
Poodle Rescue - description
Cocker Spaniel Rescue - description
Poodle Rescue - description
Labrador Retriever Rescue - description
etc. --------- Golden Retriever Rescue
Link 1 - description
Link 2 - description
Link 3 - description Is this a good idea and will I see a big traffic drop overall at first? Also, these are all internal links, not external.0 -
Cache pages in search results
My URL is: pure mobile . ca when searching on google for "puremobile note 2 defender" the search results are coming up with the incorrect title pages of my search results - for some reason all the search results are coming up with "unlocked cell phone" at the end of the title. but on the android and on my desktop - they show the correct title of my pages. we used to deal with unlocked cell phone ( over a year ago) - and all meta tags and title tags have been fully updated. how can i let google know to update these results.
Technical SEO | | puremobile0 -
Should I nofollow search results pages
I have a customer site where you can search for products they sell url format is: domainname/search/keywords/ keywords being what the user has searched for. This means the number of pages can be limitless as the client has over 7500 products. or should I simply rel canonical the search page or simply no follow it?
Technical SEO | | spiralsites0 -
Index page
To the SEO experts, this may well seem a silly question, so I apologies in advance as I try not to ask questions that I probably know the answer for already, but clarity is my goal I have numerous sites ,as standard practice, through the .htaccess I will always set up non www to www, and redirect the index page to www.mysite.com. All straight forward, have never questioned this practice, always been advised its the ebst practice to avoid duplicate content. Now, today, I was looking at a CMS service for a customer for their website, the website is already built and its a static website, so the CMS integration was going to mean a full rewrite of the website. Speaking to a friend on another forum, he told me about a service called simple CMS, had a look, looks perfect for the customer ... Went to set it up on the clients site and here is the problem. For the CMS software to work, it MUST access the index page, because my index page is redirected to www.mysite.com , it wont work as it cant find the index page (obviously) I questioned this with the software company, they inform me that it must access the index page, I have explained that it wont be able to and why (cause I have my index page redirected to avoid duplicate content) To my astonishment, the person there told me that duplicate content is a huge no no with Google (that's not the astonishing part) but its not relevant to the index and non index page of a website. This goes against everything I thought I knew ... The person also reassured me that they have worked within the SEO area for 10 years. As I am a subscriber to SEO MOZ and no one here has anything to gain but offering advice, is this true ? Will it not be an issue for duplicate content to show both a index page and non index page ?, will search engines not view this as duplicate content ? Or is this SEO expert talking bull, which I suspect, but cannot be sure. Any advice would be greatly appreciated, it would make my life a lot easier for the customer to use this CMS software, but I would do it at the risk of tarnishing the work they and I have done on their ranking status Many thanks in advance John
Technical SEO | | Johnny4B0 -
Does page speed affect what pages are in the index?
We have around 1.3m total pages, Google currently crawls on average 87k a day and our average page load is 1.7 seconds. Out of those 1.3m pages(1.2m being "spun up") google has only indexed around 368k and our SEO person is telling us that if we speed up the pages they will crawl the pages more and thus will index more of them. I personally don't believe this. At 87k pages a day Google has crawled our entire site in 2 weeks so they should have all of our pages in their DB by now and I think they are not index because they are poorly generated pages and it has nothing to do with the speed of the pages. Am I correct? Would speeding up the pages make Google crawl them faster and thus get more pages indexed?
Technical SEO | | upper2bits0