Crawlers crawl weird long urls
-
I did a crawl start for the first time and i get many errors, but the weird fact is that the crawler tracks duplicate long, not existing urls.
For example (to be clear):
there is a page: www.website.com/dogs/dog.html
but then it is continuing crawling:
www.website.com/dogs/dog.html
www.website.com/dogs/dogs/dog.html
www.website.com/dogs/dogs/dogs/dog.html
www.website.com/dogs/dogs/dogs/dogs/dog.html
www.website.com/dogs/dogs/dogs/dogs/dogs/dog.htmlwhat can I do about this? Screaming Frog gave me the same issue, so I know it's something with my website
-
Answer from Screaming Frog!
The reason the SEO spider is crawling these URLs, is due to incorrect relative linking on the site from the login URL.
It's actually when the spider crawls the login page, http://www.website.com/login?returnurl=%2F which then leads to this URL http://www.website.com/Home/ctl/SendPassword?returnurl=http:/www.website.com/ and then this /home/ sub directory URL http://www.website.com/Home/ctl/page/dogs.aspx which links to http://www.website.com/Home/ctl/page/page/dogs.aspx and so on and so forth. This is the path to the incorrect relative linking (attached for you).To stop this, you can correct the incorrect relative linking, or easier, simply exclude the login page.
-
Wow, Big mistakes are made one Home
maybe because of the .aspx. extension? alle pages have seo-friendly urls
Thanks Wesley and Paddy Displays
-
I see a link to http://www.odin-groep.nl/Home/ctl/OverOdin/OverOdin/HeutinkICT.aspx from http://www.odin-groep.nl/Home/ctl/OverOdin/ReindersICT.aspx.
It's the bottom left block which causes this link. This way you will get a big nesting effect.
-
OK found one problem
on this page
http://www.odin-groep.nl/Home/ctl/OverOdin/ReindersICT.aspx
you have a link to
http://www.odin-groep.nl/Home/ctl/OverOdin/OverOdin/LesscherIT.aspx
which i think should be
-
ok I did a quick screaming fog and I think I have an idea, you just have to follow the breadcrumbs
You said in you example "In Links 9", you need to find out what those pages are and follow it back to the point of origin As I think its just one bad link that cause this nested link effect.
eg
http://www.odin-groep.nl/Home/ctl/OverOdin/OverOdin/OverOdin/OverOdin/HeutinkICT.aspx
is being linked from
http://www.odin-groep.nl/Home/ctl/OverOdin/OverOdin/OverOdin/StationtoStation.aspx (as well as others)
You just have to follow that trail till you find the source of the problem
-
every link, except the hompage itself
-
I can't see any source:
The pages are like:
| URL | www.website.com/page/ |
| Status Code | 200 |
| Status | OK |
| Type | text/html; charset=utf-8 |
| Size | 55811 |
| Title | |
| Level | 10 |
| In Links | 9 |
| Out Links | 38 | -
Which URL(s) is/are causing problems?
-
please be free to check: http://tinyurl.com/lox7le9
-
You don't necessarily have to remove the link. As long as you can verify that it directs to the right page.
But curious to see what caused the problem
-
I think Screaming Frog will tell you the page it found the weird url, then you can check the source, and find out whats producing that link.
-
That is a good one! It's true that I have the same linking to the page itself. I will remove all that kind of links first and crawl again. I'll keep you in touch!
-
Are you somehow linking to www.website.com/dogs/dog.html from the page itself? There could be something wrong with that link.
I made a small mistake not so long ago with a redirection plugin. I told it to go to domain.com. This plugin was looking at the base + what i told it to. So it went to: domain.com/domain.com. Perhaps you made a similar mistake.Maybe you can send me the URL and i can take a look at it?
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Moz Crawl Test error
Moz crawl test show blank report for my website test - guitarcontrol.com. Why??? Please suggest.
Moz Pro | | zoe.wilson170 -
Why for all my campaigns it is always shows that the number of pages crawled as 1
Hi All, I am new to moz. Can anyone help to solve my problem. I am signed up for a pro account and taking a free trial. and I've created 3 campaigns, for everything, the number of pages crawled is shown as 1 (i.e there are only one page is crawled for a given url, it doesn't crawl my pages comes through that url, like pagination and etc.) Anyone please tell me, Is this is error due to my site or any activity in my campaign.
Moz Pro | | sandy7th0 -
Moz crawl only shows 2 pages, but we have more than 1000 pages.
Hi Guys Is there anyway we can test Moz crawler ? it showing only 2 pages crawls. We are running website on HTTPS ? Is HTTPS is issues for Moz ?
Moz Pro | | dotlineseo0 -
Crawl Report Re-direct Notice?
Just trying to understand if this is bad or not. The crawl report has picked up that my website is redirecting (301) from http://mysite.com to http://www.mysite.com - under Crawl Notices (blue section). Is this the wrong way to do it as we wanted the www domain version? Is that why SEOMoz has flagged it ?
Moz Pro | | Ubique0 -
Still Cant Crawl My Site
I've removed all blocks but two from our htaccess. They are for amazonaws.com to block amazon from crawling us. I did a fetch as google in our WM tools on our robots txt with success. SEOMoz crawler here hit's our site and gets a 403. I've looks in our blocked request logs and amazon is the only one in there. What is going on here?
Moz Pro | | martJ0 -
Truncate page URLs
We have some pages (for example a contact us form) for which the URL is modified by the CMS depending on the referring page (this helps to put the form submission in context for the sales reps who get the contact submission). The SEOmoz crawler considers each URL a new page -- and so numbers like in diagnostics are all inflated as the same page is listed multiple times (e.g. for too many links) Is there a setting to change what the crawler considers to be the same page? Here are two URLs for the same page that the reports treat as separate pages: http://www.spirent.com/About-Us/Contact_us.aspx?referurl=0F528F4D703D8BB3523738D6373AA8AD http://www.spirent.com/About-Us/Contact_us.aspx?referurl=10ACDA6055244E369395223437FDCF30 The page is actually: http://www.spirent.com/About-Us/Contact_us.aspx Thanks Ken
Moz Pro | | spirent.marcom0 -
Lots of site errors after last crawl....
Something interesting happened on the last update for my site on SEOmoz pro tools. For the last month or so the errors on my site were very low, then on the last update I had a huge spike in errors, warnings, and notices. I'm not sure if somehow I made a change to my site (without knowing it) and I caused all of these errors, or if it just took a few months to find all the errors on my site? My duplicate page content went from 0 to 45, my duplicate page titles went from 0 to 105, my 4xx (client error) went from 0 to 4, and my title missing or empty went from 0 to 3. On the warnings sections my missing meta description tag went form a hand full to 444. (most of these looking to be archive pages.) Down in the notices I have over 2000 that are blocked by meta robots, meta-robots nofollow, and Rel canonical. I didn't have any where near this many prior to the last update of my site. I just wanted to see what I need to do to clean this up, and figure out if I did something to cause all the errors. I'm assuming the red errors are the first things I need to clean up. Any help you guys can provide would be greatly appreciated. Also if you'd like me to post any additional information, please let me know and I'd be glad to.
Moz Pro | | NoahsDad0 -
Title Element too Long - Counts ampersand as &
I was wondering if Google counts the ampersand written as '&' as 5 characters or a single one? It seems that Google counts it as a single character and SEOmoz PRO counts it as 5 characters.
Moz Pro | | DPSSeomonkey0