Googlebot and other spiders are requesting odd links on our website. Trying to understand why, and what to do about it.
-
I recently began work on an existing WordPress website that was revamped about 3 months ago: https://thedoctorwithin.com. I'm a bit new to WordPress, so I thought I should reach out to some of the experts in the community.
Checking ‘Not found’ Crawl Errors in Google Search Console, I notice many irrelevant links that are not present in the website, nor in the database, as near as I can tell. When checking the reported source of these irrelevant links, I notice they’re all attributed to various pages in the site, as well as to pages that have never existed on the site.
For instance:
- https://thedoctorwithin.com/category/seminars/newsletters/page/7/newsletters/page/3/feedback-and-testimonials/ allegedly linked from:
- https://thedoctorwithin.com/category/seminars/newsletters/page/7/newsletters/page/3/ (doesn’t exist)
In other cases, these goofy URLs are even reported as linked from the sitemap, even though all the URLs in the sitemap are valid.
Currently, the site has a flat structure. Nearly all the content is merely URL/content/ without further breakdown (or subdirectories). Previous site versions had a more varied page organization, but what I'm seeing doesn't seem to reflect the current page organization, nor the previous page organization.
Had a similar issue, due to use of Divi's search feature. Ended up with some pretty deep non-existent links branching off of /search/, such as:
- https://thedoctorwithin.com/search/newsletters/page/2/feedback-and-testimonials/feedback-and-testimonials/online-continuing-education/consultations/ allegedly linked from:
- https://thedoctorwithin.com/search/newsletters/page/2/feedback-and-testimonials/feedback-and-testimonials/online-continuing-education/ (doesn't exist).
I blocked the /search/ branches via robots.txt. No real loss, since neither /search/ nor any of its subdirectories are valid.
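A block like this can be sanity-checked offline before waiting on the next crawl. The following is a minimal sketch using Python's standard `urllib.robotparser`; the robots.txt content shown is an assumption (the actual file isn't posted here), and `is_allowed` is just an illustrative helper name.

```python
from urllib.robotparser import RobotFileParser

# Assumed robots.txt content; the real file on the site may differ.
ROBOTS_TXT = """\
User-agent: *
Disallow: /search/
"""

def is_allowed(url: str, user_agent: str = "Googlebot") -> bool:
    """Return True if the assumed robots.txt lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)

# One of the deep /search/ URLs from the crawl report, and a URL outside /search/.
blocked = "https://thedoctorwithin.com/search/newsletters/page/2/feedback-and-testimonials/"
normal = "https://thedoctorwithin.com/feedback-and-testimonials/"

print(is_allowed(blocked))
print(is_allowed(normal))
```

Note that robots.txt only asks well-behaved spiders not to crawl those paths; it does not remove URLs that are already known to them.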
There are numerous pre-existing categories and tags on the site. The categories and tags aren't used as pages. I suspect Google (and other engines) might be creating arbitrary paths from these. Looking through the site’s 404 errors, I’m seeing the same behavior from Bing, Moz and other spiders as well.
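One possible explanation worth ruling out (an assumption on my part, not something confirmed for this site) is a relative link somewhere in the theme or content, i.e. an href without a leading slash. Any spider resolves such a link against the URL of the page it is currently on, so the path grows by one segment each time. A minimal sketch with Python's standard `urljoin`, using hypothetical href values:

```python
from urllib.parse import urljoin

# A page URL taken from the crawl report above.
page = "https://thedoctorwithin.com/category/seminars/newsletters/page/7/newsletters/page/3/"

# Hypothetical hrefs; the site's actual markup has not been checked.
relative_href = "feedback-and-testimonials/"        # no leading slash
root_relative_href = "/feedback-and-testimonials/"  # leading slash

# A relative href is appended to the current page's path.
stacked = urljoin(page, relative_href)
# A root-relative href always resolves from the site root.
clean = urljoin(page, root_relative_href)

print(stacked)
print(clean)
```

If that turns out to be the cause, viewing the page source and searching for href values that start with neither "/" nor "http" would be one way to find the offending links.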
I suppose I could use Search Console to remove URL/category/ and URL/tag/, and do the same for the other legitimate spiders / search engines. Perhaps it would be better to use Mod Rewrite to lead spiders to pages that actually do exist.
- Looking forward to suggestions about the best way to deal with these errant searches.
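Before writing any rewrite rules, the redirect idea could be prototyped against the 404 log. Here is a rough sketch of the logic only, in Python rather than mod_rewrite syntax: since the site is flat, a nested path whose last segment is a real page slug could be sent to that page with a 301. The slug list and the `redirect_target` helper are hypothetical.

```python
from typing import FrozenSet, Optional

# Hypothetical set of real page slugs; replace with the site's actual pages.
KNOWN_SLUGS: FrozenSet[str] = frozenset({
    "feedback-and-testimonials",
    "online-continuing-education",
    "consultations",
})

def redirect_target(path: str, known_slugs: FrozenSet[str] = KNOWN_SLUGS) -> Optional[str]:
    """Return the flat URL path to 301 to, or None to leave the request alone."""
    segments = [s for s in path.split("/") if s]
    # A single-segment path is already in flat form (or is a genuine 404).
    if len(segments) < 2:
        return None
    last = segments[-1]
    # Only redirect when the final segment names a page that really exists.
    return f"/{last}/" if last in known_slugs else None

print(redirect_target(
    "/search/newsletters/page/2/feedback-and-testimonials/"
    "feedback-and-testimonials/online-continuing-education/consultations/"
))
print(redirect_target("/consultations/"))
```

Paths that end in something other than a known slug (such as a page number) return None here and would still produce a 404, which seems like the right answer for URLs that never existed.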
- Also curious to learn about why these are occurring.
Thank you.
-
Thanks, Kevin.
Glad I'm not the only one.
Disabling tags and categories isn't an option, in my case. Guess I need to look at more of the potential upside. Seems tags and categories, if handled correctly, could provide a new way to engage visitors and search engines.
I've heard people refer to 'spidering budgets' or whatnot. Guess it's an entirely new topic of discussion... if limiting the spurious spider searching (from good spiders) means that said spiders will spend more time on the conventional pathways of a site.
-
Thanks, Vijay.
Did a lot of work fixing links in the database.
The issue was occurring even before implementation of WP Super Cache, and before the link fixing.
Being new-ish to WP, it seems strange that it's so willing to provide access via directories that don't really exist: categories, tags, even search, if using a theme-provided site search.
I'm getting better at .htaccess, so I'm able to handle a lot of the old incoming links fairly well. In the case of these weird 'in the mind of the spiders' links, I will try to address these as well.
Thanks for your advice about 404 and 301 plugins. Time to look around and see what other useful tools are out there.
-
I have the same issue. I have stopped using tags because of all the irrelevant links they cause. Looking forward to reading the comments on this thread.
KJr
-
Hi There,
Your website is built on WordPress, and it looks like there might be spurious entries in the DB, which might also not be getting deleted due to the WP Super Cache plugin. You may try to empty your cache and install 'all 404 redirect' and 301 management plugins.
I hope this helps.
Regards,
Vijay