Convert keyword rich PDFs to web pages (text & images)
-
SteriPEN is a portable water purifier that kills viruses, protozoa, e-coli, etc.
Because of the technical and safety requirements nature of the product, our website has much documentation of testing, organisms affected, and more. These are in pdf form and can often be found through google search (and through links on specific pages).
Because of the keyword-richness of these documents pertaining to microbes SteriPEN kills, etc. does it make sense to convert these pdf's into html text and images?
Then I was thinking perhaps writing a blog post AND generating key links on important landing pages to these documents (as html).
Removing pdfs may be harmful? Not a clue as to the cost/benefit.
-
Google can read PDFs, and returns them in search results, but some users might prefer to view an HTML version. Also, it looks like images in PDFs are not indexed, according to the 2nd post below.
Regarding duplicate content, Google says (2nd post below):
Q: Is it considered duplicate content if I have a copy of my pages in both HTML and PDF?
A: Whenever possible, we recommend serving a single copy of your content. If this isn’t possible, make sure you indicate your preferred version by, for example, including the preferred URL in your Sitemap or by specifying the canonical version in the HTML or in the HTTP headers of the PDF resource. For more tips, read our Help Center article about canonicalization.These will be of interest to you:
http://www.google.com/support/forum/p/Webmasters/thread?tid=4472512a5515686b&hl=en&fid=4472512a5515686b00047d6de91c24fa&hltp=2
http://googlewebmastercentral.blogspot.com/2011/09/pdfs-in-google-search-results.html
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Is it a problem to use a 301 redirect to a 404 error page, instead of serving directly a 404 page?
We are building URLs dynamically with apache rewrite.
Intermediate & Advanced SEO | | lcourse
When we detect that an URL is matching some valid patterns, we serve a script which then may detect that the combination of parameters in the URL does not exist. If this happens we produce a 301 redirect to another URL which serves a 404 error page, So my doubt is the following: Do I have to worry about not serving directly an 404, but redirecting (301) to a 404 page? Will this lead to the erroneous original URL staying longer in the google index than if I would serve directly a 404? Some context. It is a site with about 200.000 web pages and we have currently 90.000 404 errors reported in webmaster tools (even though only 600 detected last month).0 -
Should we show(to google) different city pages on our website which look like home page as one page or different? If yes then how?
On our website, we show events from different cities. We have made different URL's for each city like www.townscript.com/mumbai, www.townscript.com/delhi. But the page of all the cities looks similar, only the events change on those different city pages. Even our home URL www.townscript.com, shows the visitor the city which he visited last time on our website(initially we show everyone Mumbai, visitor needs to choose his city then) For every page visit, we save the last visited page of a particular IP address and next time when he visits our website www.townscript.com, we show him that city only which he visited last time. Now, we feel as the content of home page, and city pages is similar. Should we show these pages as one page i.e. Townscript.com to Google? Can we do that by rel="canonical" ? Please help me! As I think all of these pages are competing with each other.
Intermediate & Advanced SEO | | sanchitmalik0 -
Do image "lightbox" photo gallery links on a page count as links and dilute PageRank?
Hi everyone, On my site I have about 1,000 hotel listing pages, each which uses a lightbox photo gallery that displays 10-50 photos when you click on it. In the code, these photos are each surrounded with an "a href", as they rotate when you click on them. Going through my Moz analytics I see that these photos are being counted by Moz as internal links (they point to an image on the site), and Moz suggests that I reduce the number of links on these pages. I also just watched Matt Cutt's new video where he says to disregard the old "100 links max on a page" rule, yet also states that each link does divide your PageRank. Do you think that this applies to links in an image gallery? We could just switch to another viewer that doesn't use "a href" if we think this is really an issue. Is it worth the bother? Thanks.
Intermediate & Advanced SEO | | TomNYC0 -
Page loads fine for users but returns a 404 for Google & Moz
I have an e-commerce website that is built using Wordpress and the WP E-commerce plug-in, the products have always worked fine and the pages when you view them in a browser work fine and people can purchase the products with no problems. However in the Google merchant feed and in the Moz crawl diagnostics certain product pages are returning a 404 error message and I can't work out why, especially as the pages load fine in the browser. I had a look at the page headers and can see when the page does load the initial request does return a 404 error message, then every other request goes through and loads fine. Can anyone help me as to why this is happening? A link to the product I have been using to test is: http://earthkindoriginals.co.uk/organic-clothing/lounge-wear/organic-tunic-top/ Here is a part of the header dump that I did: http://earthkindoriginals.co.uk/organic-clothing/lounge-wear/organic-tunic-top/
Intermediate & Advanced SEO | | leapSEO
GET /organic-clothing/lounge-wear/organic-tunic-top/ HTTP/1.1
Host: earthkindoriginals.co.uk
User-Agent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:21.0) Gecko/20100101 Firefox/21.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,/;q=0.8
Accept-Language: en-gb,en;q=0.5
Accept-Encoding: gzip, deflate
Cookie: __utma=159840937.1804930013.1369831087.1373619597.1373622660.4; __utmz=159840937.1369831087.1.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none); wp-settings-1=imgsize%3Dmedium%26hidetb%3D1%26editor%3Dhtml%26urlbutton%3Dnone%26mfold%3Do%26align%3Dcenter%26ed_size%3D160%26libraryContent%3Dbrowse; wp-settings-time-1=1370438004; __utmb=159840937.3.10.1373622660; PHPSESSID=e6f3b379d54c1471a8c662bf52c24543; __utmc=159840937
Connection: keep-alive
HTTP/1.1 404 Not Found
Date: Fri, 12 Jul 2013 09:58:33 GMT
Server: Apache
X-Powered-By: PHP/5.2.17
X-Pingback: http://earthkindoriginals.co.uk/xmlrpc.php
Expires: Wed, 11 Jan 1984 05:00:00 GMT
Cache-Control: no-cache, must-revalidate, max-age=0
Pragma: no-cache
Vary: Accept-Encoding
Content-Encoding: gzip
Content-Length: 6653
Connection: close
Content-Type: text/html; charset=UTF-80 -
Does having multiple links to the same page influence the Link juice this page is able to pass
Say you have a page and it has 4 outgoing links to the same internal page. In the original Pagerank algo if these links were links to an page outside your own domain, this would mean that the linkjuice this page is able to pass would be devided by 4. The thing is i'm not sure if this is also the case when the outgoing link, is linking to a page on your own domain. I would say that outgoing links (whatever the destination) will use some of your link juice, so it would be better to have 1 outgoing link instead of 4 to the same destination, the the destination will profit more form that link. What are you're thoughts?
Intermediate & Advanced SEO | | TjeerdvZ0 -
Different TITLE for the same page appear for different keywords
Hi there Can anyone advice please on this funny/strange issue I have title on home page. When I type some of keywords the homepage appears in SERP with shortcut TITLE (just one keyword there). But when I type company name I have full TITLE. Could anybody advice please what can be a problem and how to fix it?
Intermediate & Advanced SEO | | fleetway0 -
What causes internal pages to have a page rank of 0 if the home page is PR 5?
The home page PageRank is 5 but every single internal page is PR 0. Things I know I need to address each page has 300 links (Menu problem). Each article has 2-3 duplicates caused from the CMS working on this now. Has anyone else had this problem before? What things should I look out for to fix this issue. All internal linking is follow there is no page rank sculpting happening on the pages.
Intermediate & Advanced SEO | | SEOBrent0 -
How can I change my website's content on specific pages without affecting ranking for specific keywords?
My client's website (www.nursevillage.com) content has not been touched for 4 years and we are currently ranking #1 for "per diem nursing". They do not want to make any changes to the site in fear that it might decrease our rankings. We want to try to use utilize that keyword ranking on specific pages (www.nursevillage.com/nv/content/careeroptions/perdiem.jsp ) ranking for "per diem nursing" and try redirecting traffic or placing some banners and links on that page to specific pages or other sites related to "per diem nursing" jobs so we can get nurses to apply to our new nursing jobs. Any advice on why "per diem nursing" is ranking so high for us and what we can change on the site without messing up our ranking would be greatly appreciated. Thanks
Intermediate & Advanced SEO | | ryanperea1000