Crawlers crawl weird long urls
-
I started a crawl for the first time and got many errors, but the strange thing is that the crawler keeps picking up long, duplicated URLs that don't exist.
For example (to be clear):
There is a page: www.website.com/dogs/dog.html
but then the crawler continues with:
www.website.com/dogs/dog.html
www.website.com/dogs/dogs/dog.html
www.website.com/dogs/dogs/dogs/dog.html
www.website.com/dogs/dogs/dogs/dogs/dog.html
www.website.com/dogs/dogs/dogs/dogs/dogs/dog.html
What can I do about this? Screaming Frog gave me the same issue, so I know it's something with my website.
-
Answer from Screaming Frog!
The SEO Spider is crawling these URLs because of incorrect relative linking on the site, starting from the login URL.
It starts when the spider crawls the login page, http://www.website.com/login?returnurl=%2F, which leads to http://www.website.com/Home/ctl/SendPassword?returnurl=http:/www.website.com/ and then to this /home/ subdirectory URL: http://www.website.com/Home/ctl/page/dogs.aspx. That page links to http://www.website.com/Home/ctl/page/page/dogs.aspx, and so on. This is the path to the incorrect relative linking (attached for you).
To stop this, you can correct the incorrect relative linking or, more easily, simply exclude the login page.
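To see why a relative link produces ever-deeper URLs, here is a minimal Python sketch of how a crawler resolves an href against the page it is on. The page URL comes from the answer above; the two href values are assumptions for illustration, since the site's real markup isn't shown here.

```python
from urllib.parse import urljoin

# Page the crawler is currently on (URL taken from the answer above).
page = "http://www.website.com/Home/ctl/page/dogs.aspx"

# Hypothetical href values; the real markup on the site may differ.
relative_href = "page/dogs.aspx"                 # no leading slash
root_relative_href = "/Home/ctl/page/dogs.aspx"  # leading slash

def resolve(base, href):
    """Resolve an href against the current page, as a browser or crawler does."""
    return urljoin(base, href)

def crawl_chain(start, href, depth):
    """Follow the same href again and again, once per newly discovered page."""
    urls = [start]
    for _ in range(depth):
        urls.append(resolve(urls[-1], href))
    return urls

if __name__ == "__main__":
    # Each step adds one more "page/" segment to the path.
    for url in crawl_chain(page, relative_href, 3):
        print(url)
    # The root-relative form always resolves to the same URL.
    print(resolve(page, root_relative_href))
```

If the href starts with a slash (or is a full URL), it resolves to the same address no matter how deep the current page is, so the chain stops growing.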
-
Wow, big mistakes were made on Home.
Maybe because of the .aspx extension? All pages have SEO-friendly URLs.
Thanks Wesley and Paddy Displays
-
I see a link to http://www.odin-groep.nl/Home/ctl/OverOdin/OverOdin/HeutinkICT.aspx from http://www.odin-groep.nl/Home/ctl/OverOdin/ReindersICT.aspx.
It's the bottom left block which causes this link. This way you will get a big nesting effect.
-
OK found one problem
on this page
http://www.odin-groep.nl/Home/ctl/OverOdin/ReindersICT.aspx
you have a link to
http://www.odin-groep.nl/Home/ctl/OverOdin/OverOdin/LesscherIT.aspx
which I think should be
-
OK, I did a quick Screaming Frog crawl and I think I have an idea: you just have to follow the breadcrumbs.
You said in your example "In Links 9". You need to find out which pages those are and follow them back to the point of origin, as I think it's just one bad link that causes this nested link effect.
e.g.
http://www.odin-groep.nl/Home/ctl/OverOdin/OverOdin/OverOdin/OverOdin/HeutinkICT.aspx
is being linked from
http://www.odin-groep.nl/Home/ctl/OverOdin/OverOdin/OverOdin/StationtoStation.aspx (as well as others)
You just have to follow that trail till you find the source of the problem
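Following the trail by hand gets tedious, so here is a rough Python sketch of the same idea. It assumes you have exported the in-links per URL from your crawler into a dictionary; the function names and the `inlinks` data are hypothetical, and the middle entry of the example map is invented just to complete the chain.

```python
from urllib.parse import urlparse

def has_repeated_segment(url):
    """True if two consecutive path segments are identical, e.g. /OverOdin/OverOdin/."""
    segments = [s for s in urlparse(url).path.split("/") if s]
    return any(a == b for a, b in zip(segments, segments[1:]))

def trace_to_origin(url, inlinks):
    """Walk in-links backwards until reaching a page whose own URL is not nested."""
    trail = [url]
    seen = {url}
    while has_repeated_segment(trail[-1]):
        referrers = [r for r in inlinks.get(trail[-1], []) if r not in seen]
        if not referrers:
            break  # no further in-link data, so stop here
        # Heuristic: the shortest referrer is usually closest to the start of the chain.
        nxt = min(referrers, key=len)
        trail.append(nxt)
        seen.add(nxt)
    return trail

# Hypothetical in-link export (URL -> pages linking to it).
BASE = "http://www.odin-groep.nl/Home/ctl/"
inlinks = {
    BASE + "OverOdin/OverOdin/OverOdin/OverOdin/HeutinkICT.aspx":
        [BASE + "OverOdin/OverOdin/OverOdin/StationtoStation.aspx"],
    # Invented link, only here to complete the example chain.
    BASE + "OverOdin/OverOdin/OverOdin/StationtoStation.aspx":
        [BASE + "OverOdin/OverOdin/LesscherIT.aspx"],
    BASE + "OverOdin/OverOdin/LesscherIT.aspx":
        [BASE + "OverOdin/ReindersICT.aspx"],
}

if __name__ == "__main__":
    start = BASE + "OverOdin/OverOdin/OverOdin/OverOdin/HeutinkICT.aspx"
    for step in trace_to_origin(start, inlinks):
        print(step)
```

The last URL in the trail is the first page that is not itself nested, so that is the page whose source is worth checking for the bad link.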
-
Every link, except the homepage itself
-
I can't see any source:
The pages are like:
| URL | www.website.com/page/ |
| Status Code | 200 |
| Status | OK |
| Type | text/html; charset=utf-8 |
| Size | 55811 |
| Title | |
| Level | 10 |
| In Links | 9 |
| Out Links | 38 |
-
Which URL(s) is/are causing problems?
-
Please feel free to check: http://tinyurl.com/lox7le9
-
You don't necessarily have to remove the link. As long as you can verify that it directs to the right page.
But I'm curious to see what caused the problem.
-
I think Screaming Frog will tell you the page where it found the weird URL. Then you can check the source and find out what's producing that link.
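If you want to check the source without reading it by eye, something like this Python sketch could list the links that depend on the page's own directory. The HTML snippet and the names `LinkCollector` and `risky_links` are made up for illustration; paste in the real page source instead.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag in a page."""
    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)

def risky_links(page_url, html):
    """Return (href, resolved URL) pairs for hrefs that depend on the page's directory."""
    collector = LinkCollector()
    collector.feed(html)
    risky = []
    for href in collector.hrefs:
        parsed = urlparse(href)
        # Skip absolute URLs, root-relative paths, and fragment- or query-only links.
        if parsed.scheme or parsed.netloc or href.startswith(("/", "#", "?")):
            continue
        risky.append((href, urljoin(page_url, href)))
    return risky

# Hypothetical page source; only the page URL comes from this thread.
page_url = "http://www.odin-groep.nl/Home/ctl/OverOdin/ReindersICT.aspx"
html = (
    '<a href="/Home.aspx">Home</a>'
    '<a href="OverOdin/LesscherIT.aspx">Lesscher IT</a>'
    '<a href="http://www.odin-groep.nl/">Odin</a>'
)

if __name__ == "__main__":
    for href, resolved in risky_links(page_url, html):
        print(href, "->", resolved)
```

Any href it prints is resolved relative to the current page, so it is a candidate for the nesting effect described in this thread.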
-
That is a good one! It's true that I have that same kind of link pointing to the page itself. I will remove all links of that kind first and crawl again. I'll keep you posted!
-
Are you somehow linking to www.website.com/dogs/dog.html from the page itself? There could be something wrong with that link.
I made a small mistake not so long ago with a redirection plugin. I told it to go to domain.com. The plugin took the base and appended what I told it, so it went to domain.com/domain.com. Perhaps you made a similar mistake. Maybe you can send me the URL and I can take a look at it?
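The redirect mistake can be reproduced in a few lines of Python. This is only a sketch: it assumes the plugin resolved the target like a relative URL against the site base (how the plugin really worked is unknown), and the http:// scheme and function names are assumptions too.

```python
from urllib.parse import urljoin

# Assumed site base; the scheme is a guess, only "domain.com" is from the post.
base = "http://domain.com/"

def redirect_location(site_base, target):
    """Resolve a redirect target against the site base, like a relative URL."""
    return urljoin(site_base, target)

def with_scheme(target, scheme="http"):
    """Prefix a scheme so that a bare host name is treated as an absolute URL."""
    if "://" in target:
        return target
    return scheme + "://" + target

if __name__ == "__main__":
    # A bare "domain.com" is read as a relative path and appended to the base.
    print(redirect_location(base, "domain.com"))
    # With a scheme, the target is absolute and replaces the base.
    print(redirect_location(base, with_scheme("domain.com")))
```

Without a scheme, "domain.com" is just a path segment as far as URL resolution is concerned, which is how you end up at domain.com/domain.com.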