Way to spider Wordpress site
-
I have an old WordPress site that I want to move to a new server and take off WordPress (it has been hacked too many times). I am trying to spider the site to get static, non-WordPress pages.
I am having trouble doing this. When I spider the site, it changes the URLs. For instance, if the URL is www.domain.com/page/, the URL I get out of the spider is /page/index.html, and those are not the URLs in the search engine indices. There are about 2000 pages on this site, so it is not feasible to set up 301 redirects for each one.
I tried using these spidering programs: WinHTTrack Website Copier and PageNest.
Does anyone know of another method of turning a WordPress site into a non-WordPress site?
-
Hi Dan
Hmm, that's a little strange. Two things:
- Is WordPress updated? Do you get the normal URLs when viewing the site in your browser?
- Have you tried Screaming Frog SEO Spider? It's free to crawl up to 500 pages. Although it won't save the actual HTML of the pages, it could perhaps solve the URL issue.
This blackhat world thread has a few options too.
-Dan
-
Hi Dan, I'm not very experienced in migrating a WordPress site to a non-WordPress one, but I understand that the issue you're having is that the spider is returning index.html files for URLs like domain/page/.
That's normal. Whichever spider you use, you'll always end up with an index.html file. Every directory has its own index.html, which is the default file the server shows for that directory unless you set up something different with rewrite rules.
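To make the directory-index point concrete, here is a minimal Python sketch of how a mirrored file path corresponds to the URL visitors would request. It assumes the new server is configured to serve index.html as the default document for a directory (a common default, but worth checking on your host); the function name `served_url` is made up for this illustration.

```python
from pathlib import PurePosixPath


def served_url(relative_path: str) -> str:
    """Return the URL path a mirrored file is reachable at.

    Assumption: the web server treats index.html as the directory index,
    so /page/index.html is also served when /page/ is requested.
    """
    path = PurePosixPath("/" + relative_path.lstrip("/"))
    if path.name == "index.html":
        parent = str(path.parent)
        # Directory URLs keep their trailing slash, matching the old WordPress URLs.
        return parent if parent.endswith("/") else parent + "/"
    return str(path)


if __name__ == "__main__":
    for mirrored in ("index.html", "page/index.html", "about/team/index.html"):
        print(mirrored, "->", served_url(mirrored))
```

In other words, the file the spider saved as page/index.html should still answer at /page/, which is the URL the search engines already have.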
If you request /page/, the server will return the index.html file in that directory, so the original URLs keep working. What you have to make sure of is that you set up a 301 redirect so that any index.html URL is redirected to its / version (with wildcards this is a one-line rule), and that all your internal links point to the / pages and not to the index.html versions. You can just find and replace the /index.html" string in the HTML code with /" (Dreamweaver or any HTML editor will do that in bulk).
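If you would rather script the bulk find-and-replace than use an editor, the following is a rough Python sketch of the same idea. It assumes the mirror is a folder of UTF-8 .html files and only touches href attributes; the names `strip_index_links` and `rewrite_mirror` are hypothetical, and you should run it on a copy of the mirror first.

```python
import re
from pathlib import Path

# Matches href="...index.html" with single or double quotes.
# An optional #fragment or ?query after index.html is left in place.
INDEX_LINK = re.compile(
    r'''(href=["'])([^"'#?]*/)?index\.html(?=["'#?])''',
    re.IGNORECASE,
)


def strip_index_links(html: str) -> str:
    """Rewrite links ending in index.html to point at the directory instead."""
    def repl(match: re.Match) -> str:
        # A bare "index.html" link refers to the current directory.
        prefix = match.group(2) or "./"
        return match.group(1) + prefix

    return INDEX_LINK.sub(repl, html)


def rewrite_mirror(root: str) -> int:
    """Apply strip_index_links to every .html file under root.

    Returns the number of files that were changed.
    """
    changed = 0
    for path in Path(root).rglob("*.html"):
        original = path.read_text(encoding="utf-8", errors="surrogateescape")
        updated = strip_index_links(original)
        if updated != original:
            path.write_text(updated, encoding="utf-8", errors="surrogateescape")
            changed += 1
    return changed
```

Calling `rewrite_mirror("path/to/mirror")` on a copy of the spidered folder would update the links in place. This only handles the internal links; the 301 redirect from the index.html URLs still has to be configured on the server itself.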
One comment on your idea: you may want to consider building a PHP-driven site instead, using includes for the header, footer, and nav/sidebar. Thinking ahead, if you ever want to change a portion of the page that repeats throughout a static site, you'll have to edit every page and upload them all again, which is a lot of work and leaves room for many human and machine errors.
Hope that helped you out!