Sitemap Help!
-
Hi Guys,
Quick question regarding sitemaps. I am currently working on a huge site that has masses of pages.
I am looking to create a site map. How would you guys do this? i have looked at some tools but it say it will only do up to 30,000 pages roughly. It is so large it would be impossible to do this myself....any suggestions?
Also, how do i find out how many pages my site actually has indexed and not indexed??
Thank You all
Wayne
-
The problem that I have with CMS side sitemap generators is that it often pulls content from pages that are existing and adds entries based off that information. If you have pages linked to that are no longer there, as is the case with dynamic content, then you'll be imposing 404's on yourself like crazy.
Just something to watch out for but it's probably your best solution.
-
Hi! With this file, you can create a Google-friendly sitemap for any given folder almost automatically. No limits on the number of files. Please note that the code is the courtesy of @frkandris who generously helped me out when I had a similair problem. I hope it will be as helpful to you as it was to me
- Copy / paste the code below into a text editor.
- Edit the beginning of the file: where you see seomoz.com, put your own domain name there
- Save the file as getsitemap.php and ftp it to the appropriate folder.
- Write the full URL in your browser: http://www.yourdomain.com/getsitemap.php
- The moment you do it, a sitemap.xml will be generated in your folder
- Refresh your ftp client and download the sitemap. Make further changes to it if you wish.
=== CODE STARTS HERE ===
define(DIRBASE, './');define(URLBASE, 'http://www.seomoz.com/'); $isoLastModifiedSite = "";$newLine = "\n";$indent = " ";if (!$rootUrl) $rootUrl = "http://www.seomoz.com"; $xmlHeader = "$newLine"; $urlsetOpen = "<urlset xmlns=""http://www.google.com/schemas/sitemap/0.84"" ="" <="" span="">xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.google.com/schemas/sitemap/0.84 http://www.google.com/schemas/sitemap/0.84/sitemap.xsd">$newLine";$urlsetValue = "";$urlsetClose = "</urlset>$newLine"; function makeUrlString ($urlString) { return htmlentities($urlString, ENT_QUOTES, 'UTF-8');} function makeIso8601TimeStamp ($dateTime) { if (!$dateTime) { $dateTime = date('Y-m-d H:i:s'); } if (is_numeric(substr($dateTime, 11, 1))) { $isoTS = substr($dateTime, 0, 10) ."T" .substr($dateTime, 11, ."+00:00"; } else { $isoTS = substr($dateTime, 0, 10); } return $isoTS;} function makeUrlTag ($url, $modifiedDateTime, $changeFrequency, $priority) { GLOBAL $newLine; GLOBAL $indent; GLOBAL $isoLastModifiedSite; $urlOpen = "$indent<url>$newLine";</url> $urlValue = ""; $urlClose = "$indent$newLine"; $locOpen = "$indent$indent<loc>";</loc> $locValue = ""; $locClose = "$newLine"; $lastmodOpen = "$indent$indent<lastmod>";</lastmod> $lastmodValue = ""; $lastmodClose = "$newLine"; $changefreqOpen = "$indent$indent<changefreq>";</changefreq> $changefreqValue = ""; $changefreqClose = "$newLine"; $priorityOpen = "$indent$indent<priority>";</priority> $priorityValue = ""; $priorityClose = "$newLine"; $urlTag = $urlOpen; $urlValue = $locOpen .makeUrlString("$url") .$locClose; if ($modifiedDateTime) { $urlValue .= $lastmodOpen .makeIso8601TimeStamp($modifiedDateTime) .$lastmodClose; if (!$isoLastModifiedSite) { // last modification of web site $isoLastModifiedSite = makeIso8601TimeStamp($modifiedDateTime); } } if ($changeFrequency) { $urlValue .= $changefreqOpen .$changeFrequency .$changefreqClose; } if ($priority) { $urlValue .= $priorityOpen .$priority .$priorityClose; } $urlTag .= $urlValue; $urlTag .= $urlClose; return $urlTag;} function rscandir($base='', &$data=array()) { $array = array_diff(scandir($base), array('.', '..')); # remove ' and .. from the array / foreach($array as $value) : / loop through the array at the level of the supplied $base / if (is_dir($base.$value)) : / if this is a directory / $data[] = $base.$value.'/'; / add it to the $data array / $data = rscandir($base.$value.'/', $data); / then make a recursive call with the current $value as the $base supplying the $data array to carry into the recursion / elseif (is_file($base.$value)) : / else if the current $value is a file / $data[] = $base.$value; / just add the current $value to the $data array */ endif; endforeach; return $data; // return the $data array } function kill_base($t) { return(URLBASE.substr($t, strlen(DIRBASE)));} $dir = rscandir(DIRBASE);$a = array_map("kill_base", $dir); foreach ($a as $key => $pageUrl) { $pageLastModified = date ("Y-m-d", filemtime($dir[$key])); $pageChangeFrequency = "monthly"; $pagePriority = 0.8; $urlsetValue .= makeUrlTag ($pageUrl, $pageLastModified, $pageChangeFrequency, $pagePriority); } $current = "$xmlHeader$urlsetOpen$urlsetValue$urlsetClose"; file_put_contents('sitemap.xml', $current); ?>
=== CODE ENDS HERE ===
-
HTML sitemaps are good for users; having 100,000 links on a page though, not so much.
If you can (and certainly with a site this large) if you can do video and image sitemaps you'll help Google get around your site.
-
Is there any way i can see pages that have not been indexed?
Not that I can tell and using site: isn't going to be feasible on a large site I guess.
Is it more beneficial to include various site maps or just the one?
Well, the max files size is 50,000 or 10MB uncompressed (you can gzip them), so if you've more than 50,000 URLs you'll have to.
-
Is there any way i can see pages that have not been indexed?
Is it more beneficial to include various site maps or just the one?
Thanks for your help!!
-
Thanks for your help
do you ffel it is important to have HTML + Video site maps as well? How does this make a differance?
-
How big we talking?
Probably best grabbing something server side if your CMS can't do it. Check out - http://code.google.com/p/sitemap-generators/wiki/SitemapGenerators - I know Google says they've not tested any (and neither have I) but they must have looked at them at some point.
Secondly you'll need to know how to submit multiple sitemap parts and how to break them up.
Looking at it Amazon seem to cap theirs at 50,000 and Ebay at 40,000, so I think you should be fine with numbers around there.
Here's how to set up multiple sitemaps in the same directory - http://googlewebmastercentral.blogspot.com/2006/10/multiple-sitemaps-in-same-directory.html
Once you've submitted your sitemaps Webmaster Tools will tell you how many URLs you've submitted vs. how many they've indexed.
-
Hey,
I'm assuming you mean XML sitemaps here: You can create a sitemap index file which essentially lists a number of sitemaps in one file (A sitemap of sitemap files if that makes sense). See http://www.google.com/support/webmasters/bin/answer.py?answer=71453
There are automatic sitemap generators out there - if you're site has categories with thousands of pages I'd split up them up and have a sitemap per category.
DD
-
To extract URLs, you can use Xenu Link Sleuth. Then you msut make a hiearchy of sitemaps so that all sitemaps are efficiently crawled by Google.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
We´re in trouble with our on site internal link optimization - please help
Dear Moz community, We have made a great mistake. Looking at keyword search volumes somehow Moz showed volumes for two keywords which only differentiate by an '**s (plural) as same. **Now we optimized our internal links (all links have the keyword) for the singular word. Now looking at other search volume estimations from competitors we see, that the plural has 5 times bigger volume. Our issue: If we change some of the category links now to another keyword, we will loose our ranking with the singular word. Correct? If we do not change any of our internal links, we will never rank in top 6 with the keyword. (currently singular is 6 and plural is 15) What would you reccomend?
On-Page Optimization | | advertisingcloud0 -
Neglected Blog SEO help
I am wedding & sports photographer in San Antonio. I've been busy with photographing sports (lots of them) & totally neglected SEO work for nearly 2 years. I know it's a broad question but what is the best way to get back on track? Should I create a strong landing page with keyword wedding photographer? Should i make separate websites - one for weddings, one for sports? Is PPC good way to jump start? Do I have website structure problems etc? If anyone can comment, that would be great. I am willing to hire SEO experts so if anyone knows anybody, please let me know My website is www.soobumimphotography.com Thank you in advance
On-Page Optimization | | soobumim0 -
Can someone help with Canonical?
I have a wordpress site that On-Page Grader is saying I don't have Canonical done correctly. Here is the comment. Appropriate Use of Rel Canonical If the canonical tag is pointing to a different URL, engines will not count this page as the reference resource and thus, it won't have an opportunity to rank. Make sure you're targeting the right page (if this isn't it, you can reset the target above) and then change the canonical tag to reference that URL. Recommendation: We check to make sure that IF you use canonical URL tags, it points to the right page. If the canonical tag points to a different URL, engines will not count this page as the reference resource and thus, it won't have an opportunity to rank. If you've not made this page the rel=canonical target, change the reference to this URL. NOTE: For pages not employing canonical URL tags, this factor does not apply. I have quite a few sites and have never had an issue with this. Can anyone help? I tried installing a plugin but that seems to have made it worse. This is the front page of the site btw.
On-Page Optimization | | jonnyholt1 -
Sitemap.xml parameters
Hello, I confused about the parameters when create sitemap.xml. I don'n know how to use these parameters efficiently. Some parameters are:
On-Page Optimization | | JohnHuynh
- **Page changing frequency: **what is a good value? Can I choose "None"?
- **Last modified date: **what is a good value? Can I choose "Don’t specify"?
- **Page priority: **what is a good value? Can I choose "Don’t specify"? Thanks,0 -
Is this sitemap valid?
I seem to be having a problem getting google to index all the pages on my site. Usually within a week they've been indexed. The sitemap url is http://www.local-sex-search.com/sitemap.xml (this is an adult dating site). Any help appreciated.
On-Page Optimization | | SamCUK0 -
New adsense account request rejected - need help
I'm moving my company to Australia, shutting down the US company. Google said I had to request a new Adsense account, so I did. They opened the account, I added the same ads, in the same places, and they have rejected my application. What do I do now? The other account has been open since 2004. They never said a word about this before. After two years of working on improvements, now I'm just about destroyed. I need some help, because I thought I knew what I was doing, but obviously not! As usual. their helpful response is no help at all. http://bit.ly/NPACk - there are no G ads on the front page http://bit.ly/V8ubB5 - this is a typical story http://bit.ly/UpTC2r - this is a typical press release As mentioned in our welcome email, we conduct a second review of your AdSense application once AdSense code is placed on your site(s). As a result of this review, we have disapproved your account for the following violation(s): Issues: - Site does not comply with Google policies --------------------- Further detail: Site does not comply with Google policies: We're unable to approve your AdSense application at this time for one of the reasons listed below or another reason listed in our program policies ([https://support.google.com/adsense/bin/topic.py?topic=1271507](https://support.google.com/adsense/bin/topic.py?topic=1271507)). We recommend that you review the information provided below and make the necessary changes to your site. 1\. You need to improve your site’s user experience To ensure a good experience for users and advertisers, publishers participating in the AdSense program are required to adhere to the Webmaster Quality guidelines ([http://www.google.com/support/webmasters/bin/answer.py?answer=35769](http://www.google.com/support/webmasters/bin/answer.py?answer=35769)). These guidelines provide many tips to help you to provide a positive experience for your users. You’ll also find more useful information in this AdSense blog post which highlights five user experience principles: [http://adsense.blogspot.com/2012/10/publisher-insights-part-1-5-principles.html](http://adsense.blogspot.com/2012/10/publisher-insights-part-1-5-principles.html). Applying these principles will help you to provide a great experience for users on your site. 2\. Your site is a chat site which is not compliant with our policy Publishers are encouraged to experiment with a variety of ad placements and ad formats. However, as stated in our program policies ([http://support.google.com/adsense/bin/answer.py?hl=en&answer=48182](http://support.google.com/adsense/bin/answer.py?hl=en&answer=48182)), AdSense publishers may not place ad code, search boxes or search results in chat programs. This includes, but is not limited to, instant messaging (IMs), chat sites and other pages that contains dynamic content. 3\. You need to remove all content that encourages violation of Google product policies Publishers may not provide the means to circumvent the policies of any Google products, such as by allowing users to download YouTube videos, or encourage the violation of Google AdSense policies. Moreover, publishers may not make use of Google brand features such as logos, screenshots, or other distinctive features without our express permission. For more information, please visit our Help Center ([http://support.google.com/adsense/bin/answer.py?hl=en&ctx=as2&answer=1348688&rd=1](http://support.google.com/adsense/bin/answer.py?hl=en&ctx=as2&answer=1348688&rd=1)). 4\. Your site is dedicated to the sale and distribution of term papers We’re happy to see our publishers’ sites full of useful and informative content, however, as stated in our program policies ( [https://www.google.com/adsense/support/as/bin/answer.py?hl=en&answer=105953](https://www.google.com/adsense/support/as/bin/answer.py?hl=en&answer=105953) ), the sale or distribution of term papers, or any other content that is illegal, promotes illegal activity, or infringes on the legal rights of others is not allowed. Please review the AdSense program policies ([http://support.google.com/adsense/bin/answer.py?hl=en&answer=48182](http://support.google.com/adsense/bin/answer.py?hl=en&answer=48182)) to ensure that your site meets all of the requirements for approval. As soon as you’ve made the necessary changes, we’ll be happy to take another look at your application.
On-Page Optimization | | loopyal0 -
Will a new domain name help rankings
If I purchase a domain name that links to my site with the new domain name being keyword specific....will that help boost rankings in Google? Reason I ask is that a particular website always ranks higher than ours because of their domain name (keyword specific). They are currently not even "open" and they still manage to rank high. I checked for links with the seomoz tools but did not see any high links etc.. Thanks!
On-Page Optimization | | teachcsg0 -
URL 404 errors after crawl? HELP!
I am getting Crawl errors. It shows multiple pages as. I know this is more of a technical question however, I cannot find the answer anywhere. I'm using wordpress www.mydomain.com/title-of-page/mydomain.com/contact WHAT IS THIS?!
On-Page Optimization | | ChristineWeinbrecht0