Way to spider a WordPress site
-
I have an old WordPress site that I want to move to a new server and take off WordPress (too many hacks). I am trying to spider the site to get static, non-WordPress pages.
I am having trouble doing this. When I spider the site, it changes the URLs. For instance, if the URL is www.domain.com/page/, the URL I get out of the spider is /page/index.html, and those are not the URLs in the search engine indices. There are about 2000 pages on this site, so it is not feasible to set up 301 redirects.
I tried using these spidering programs: WinHTTrack Website Copier and PageNest.
Does anyone know of another method of turning a WordPress site into a non-WordPress site?
-
Hi Dan
Hmm, that's a little strange. Two things:
- Is WordPress updated? Do you get the normal URLs when viewing in your browser?
- Have you tried Screaming Frog SEO Spider? It's free to crawl up to 500 pages. Although it won't get the actual HTML of the pages, it could perhaps solve the URL issue.
This BlackHatWorld thread has a few options too.
-Dan
-
Hi Dan, I'm not very experienced in migrating a WordPress site to a non-WordPress one, but I understand that the issue you're having is that the spider is returning index.html files for URLs like domain/page/.
That's normal: whichever spider you use, you'll always get an index.html file. Every directory has its own index.html, which is the default file served unless you configure something different (for example with rewrite rules).
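To make that concrete, here is a minimal Python sketch of how a file in the spidered copy maps back to the URL a visitor would request. The function name `public_url` and the assumption that the server's default directory file is `index.html` are mine, not something the spidering tools provide.

```python
from pathlib import PurePosixPath


def public_url(mirror_path: str, index_name: str = "index.html") -> str:
    """Map a file path inside the spidered copy to the URL it is served at.

    Assumes the web server returns `index_name` when a directory is requested,
    which is the usual default but depends on the server configuration.
    """
    path = PurePosixPath("/" + mirror_path.lstrip("/"))
    if path.name == index_name:
        # A directory index is reachable at the directory URL itself.
        parent = str(path.parent)
        return parent if parent.endswith("/") else parent + "/"
    # Any other file keeps its own URL.
    return str(path)


if __name__ == "__main__":
    for example in ("page/index.html", "index.html", "about/team.html"):
        print(example, "->", public_url(example))
```

So under that assumption the file page/index.html is still reachable at /page/, which is the URL already in the search engine indices.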
If you request /page/, the server will serve the index.html file in that directory. What you have to make sure of is that you set up a 301 redirect so that any index.html URL is redirected to its / version (with wildcards it's a one-line rule) and that all your internal links point to the / pages and not to the index.html versions. You can just find and replace the /index.html" string in the HTML code with /" (Dreamweaver or any HTML editor will do that in bulk).
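If you'd rather script the bulk find and replace than use an editor, something like the following Python sketch could do it. The function name `strip_index_links` and the folder name in the usage line are hypothetical, and it assumes the spidered pages are UTF-8 (or close enough) and that links are written with double quotes. Run it on a backup copy first.

```python
import os


def strip_index_links(root: str,
                      needle: str = '/index.html"',
                      replacement: str = '/"') -> int:
    """Replace `needle` with `replacement` in every .html/.htm file under root.

    Returns the number of files that were modified.
    """
    changed = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.lower().endswith((".html", ".htm")):
                continue
            path = os.path.join(dirpath, name)
            # surrogateescape keeps any stray non-UTF-8 bytes intact;
            # newline="" preserves the original line endings.
            with open(path, encoding="utf-8", errors="surrogateescape",
                      newline="") as fh:
                text = fh.read()
            updated = text.replace(needle, replacement)
            if updated != text:
                with open(path, "w", encoding="utf-8",
                          errors="surrogateescape", newline="") as fh:
                    fh.write(updated)
                changed += 1
    return changed


if __name__ == "__main__":
    # "mirror" is a placeholder for the folder the spider wrote to.
    print(strip_index_links("mirror"), "files updated")
```

Note that this only fixes the internal links; the 301 redirect from the index.html URLs still has to be set up on the server.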
My only comment on your idea is that you may find it useful to build a PHP-driven site, using includes for the header, footer and nav/sidebar. Thinking ahead, if you want to change a portion of the page that repeats throughout the site, you'll otherwise have to change every page and upload them all, which is a huge job and leaves room for many human/machine errors.
Hope that helped you out!