Sitemap

A Google Sitemap lets Google's Googlebot spider quickly determine what to index on your site. It is primarily a text file that lists the web addresses of the pages on your website, and the Sitemap protocol expects it to be available on your web server, typically as an XML document. In the file you provide information about the pages, videos, and other files on your site, and the relationships between them. Search engines like Google read this file to crawl your site more intelligently.

XML sitemaps are a great way to ensure your site is crawled and indexed properly. Learn how to take control and build your own!

When it comes to creating an XML sitemap, a car analogy works best. Sure, automatic is great. It’s convenient and affords you an extra hand to turn up that Adele song you love to sing along to terribly. But any driving enthusiast will tell you that a manual shift gives you a closer connection to the vehicle and to the road, and that’s exactly what we’re after – more connection. More control.

These days, there are many options for automating the creation of XML sitemaps, whether through a plugin or an online sitemap generator. Some are better than others (the Yoast plugin for WordPress does a pretty good job), but the machines haven’t replaced us just yet. Automation still does not measure up to a sitemap carefully constructed by hand. So roll up your sleeves and follow these steps to create and submit custom XML sitemaps that represent your site better than any plugin or tool can.

Step 1: Know What You’re Looking For

An XML sitemap is essentially just a list of the pages that make up your website. But the key thing to remember is that we are only concerned with pages that should be in Google’s index. You don’t want to put a login page or a post-purchase “thank you” page on your sitemap, for instance. Before you set out to gather up the URLs of the pages on a site, let’s ask a simple question:

“Is this a page that should be in Google’s index?”

If you’re a bit more versed in SEO, you can also ask:

“Does the page return a 200 status code?”

and

“Does the page self-canonicalize (that is, does its canonical tag point to its own URL)?”

Doing this exercise will give meaning to everything we encounter in Step 2.

Step 2: Collect Your Pages

Now that we know exactly what we’re looking for, let’s go find it! In the first part of this step, we’re going to gather up all of the website’s URLs. The easiest way to do this is with a crawler like Screaming Frog, which can quickly crawl the pages of your site and spit out a list of URLs.

Alternatively, you can simply follow each of the site’s main navigation options down to their deepest level (also known as a human crawl). This is actually the method I prefer. If the site isn’t too big, it’s a great way to learn about the navigational logic and user-friendliness of your site.

Let’s use Go Fish Digital’s site as an example. Before I toss it into a crawler, I’m going to browse it manually and gain some insights. My first takeaway, as is often the case, is from the main navigation.

On the far left, we have a logo and branding, which links to the home page. You guessed it – the home page URL is going in the sitemap.

On the right, we have About, Services, Blog, and Contact.

Right away, I’m going to begin grouping. The About and Contact pages are more general pages, like the home page, so I consider those three URLs as a “General” section of the site.

General pages

Next, we have Services and Blog.

Services has a drop-down menu – this is a perfect reason to group these pages together!

Service Pages

Then, the blog. I’ve only displayed 3 posts here, but there are a lot more blog posts on GFD’s site. This is where a crawler would come into play.

Blog posts

Would you look at that? We now have the site sectioned out nicely. With our URLs grouped together like this, we can make a beautifully-organized sitemap!

In the last part of this step, we’re going to take out any pages that don’t hold up to the question(s) we asked in Step 1. I did find a privacy policy page in the footer, and I’ve decided not to include it. It’s not a keyword-focused page that is going to perform well in search. Never forget that you can include or exclude whatever pages you want when creating a sitemap!

Step 3: Code Your URLs

If you’ve applied Step 2 carefully to your website’s pages, you now have a list of URLs that need to be formatted with the appropriate tags. XML is a lot like HTML – in fact, the “ML” in both stands for “markup language.”

For this step, you’ll need a text editor so you can create an XML file. I highly recommend Sublime Text. They offer a lifetime license key, and it will serve your SEO and text-editing future better than the finest hound.

a.) Let’s begin with an opening <urlset> tag:

<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">

b.) Next, add your first URL with the appropriate <url> and <loc> tags:

<url>
  <loc>https://gofishdigital.com</loc>
</url>

c.) When you’ve entered your last URL, simply close the <urlset> tag:

</urlset>

Now that you know the different tags, get your eyes used to looking at a simple XML sitemap. Here is what the finished product would look like:
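
A minimal sketch of such a file, assuming the home page from step b.) plus two hypothetical paths (/about/ and /contact/) standing in for the rest of the "General" group. The XML declaration on the first line is a conventional addition rather than one of the steps above:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <!-- General pages (the /about/ and /contact/ paths are illustrative guesses) -->
  <url>
    <loc>https://gofishdigital.com</loc>
  </url>
  <url>
    <loc>https://gofishdigital.com/about/</loc>
  </url>
  <url>
    <loc>https://gofishdigital.com/contact/</loc>
  </url>
</urlset>
```

Each additional page gets its own <url> block, and grouping related pages under a comment keeps a hand-built file easy to scan.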

Step 4: Validate Your Sitemap

*Please note that the validation method below does not seem to be working consistently anymore. I am seeing perfectly valid sitemaps that are validating in Google Search Console fail the validation test below. As of November 25, 2019, the best method for validating your XML sitemap is to submit it within the Google Search Console account for your specific website.

Now it’s time to run your sitemap through a validator to make sure all the syntax is correct. Go ahead and save your file and name it sitemap.xml. Then, visit https://validator.w3.org/#validate_by_upload and upload your XML file. Hopefully, you see a success message rather than a list of errors.

If there are any errors, the validator will quote the line that contains the error so you can go back into Sublime Text and easily locate it.

Step 5: Add It To The Root

Next, you’ll want to add your sitemap file (sitemap.xml) to the root folder of your site. This can be done locally, through FTP or (ideally) by a developer. Adding your sitemap file to the root folder means that it will be located at yoursite.com/sitemap.xml. This is true for a lot of sites! Try picking a couple of sites you regularly visit and typing “/sitemap.xml” after the TLD (the “.com,” “.net,” etc.).

ex: https://www.apple.com/sitemap.xml

Step 6: Add It To The Robots(.txt)

A robots.txt file is a simple text file with instructions for the crawler that is visiting your site. The file exists in the root folder, so you can probably guess where it’s located – yoursite.com/robots.txt. One of the lines you can add to your robots.txt file is the “Sitemap:” line. This will ensure that the crawler goes and checks out your perdy, custom XML sitemap. Here’s how the line would look, assuming your site is secure (HTTPS):

Sitemap: https://yoursite.com/sitemap.xml

Apple.com has a number of “Sitemap:” lines in their robots.txt file (https://www.apple.com/robots.txt).

Whether a robots.txt line that points to your sitemap is effective is somewhat debated, but the purpose of this guide is to be thorough, and it is still a best practice I see used by many top SEOs and successful websites.

Step 7: Submit Your Sitemap

We gathered, we grouped, we tagged, we validated, and we added to the root. Now we’ll discuss how to submit your sitemap to Google and Bing. Doing so can improve the indexation of your site! Please note that I’m assuming you have Google Search Console and Bing Webmaster Tools accounts set up.

How to submit a sitemap to Google

a.) Sign into your GSC account.

b.) Click Crawl > Sitemaps > Add/Test Sitemap

c.) Enter “/sitemap.xml” into the available field and submit your sitemap!

How to submit a sitemap to Bing

a.) Sign into your BWT account.

b.) Click Configure My Site > Sitemaps

c.) Enter the full URL of your sitemap and submit your sitemap!

Check in periodically (but not obsessively) to ensure your sitemap URLs are being crawled. It is NOT uncommon for only part of your sitemap to be crawled. In fact, we rarely see a sitemap crawled in its entirety. That’s asking a lot and the major search engines love to be coy.

(Bonus) Next-Level Sitemapping: Creating an Index

The whole point of a sitemap is to make the pages of your site as crawler-accessible as possible. To do this, we present them in a simple, organized list. If you want to take order to the next level, you’ll want to create a sitemap index.

A sitemap index is an XML file that refers to a number of individual XML sitemaps. For Go Fish Digital’s site, we could make an individual sitemap for each grouping we created in Step 2:

general_sitemap.xml

services_sitemap.xml

blog_sitemap.xml

We would add each of these files to the root folder of the site and point to them within a sitemap index, which uses its own XML tags:
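
As a sketch, assuming the three files above sit at the root of gofishdigital.com, the index could look like this. The <sitemapindex> and <sitemap> tags are the ones the Sitemap protocol defines for index files:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <!-- One <sitemap> entry per individual sitemap file (root location assumed) -->
  <sitemap>
    <loc>https://gofishdigital.com/general_sitemap.xml</loc>
  </sitemap>
  <sitemap>
    <loc>https://gofishdigital.com/services_sitemap.xml</loc>
  </sitemap>
  <sitemap>
    <loc>https://gofishdigital.com/blog_sitemap.xml</loc>
  </sitemap>
</sitemapindex>
```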

We would then name the sitemap index, validate, add it to the root folder, and submit it within the search engine consoles for Google and Bing – no need to submit each individual sitemap! The index will take care of everything. Additionally, you can add a “Sitemap:” line to your robots.txt file that points to the index, rather than pointing to each individual sitemap (looking at you, Apple).

A sitemap index with individual sitemaps represents the highest level of organization and is a superb way to present the indexable pages of your site to the major search engines.

Make Your Map(s)!

Whether you’re looking at your own site, a friend’s site, or a client’s site, you now have some great guidelines for creating a meaningful XML sitemap or sitemap index. So build your own custom sitemap and take charge of your SEO, learn more about your website, and cut the fat caused by automation.

Happy mapping!

Follow me on Twitter @briangormanGFD

This document describes the XML schema for the Sitemap protocol.

The Sitemap protocol format consists of XML tags. All data values in a Sitemap must be entity-escaped. The file itself must be UTF-8 encoded.

The Sitemap must:

  • Begin with an opening <urlset> tag and end with a closing </urlset> tag.
  • Specify the namespace (protocol standard) within the <urlset> tag.
  • Include a <url> entry for each URL, as a parent XML tag.
  • Include a <loc> child entry for each <url> parent tag.

All other tags are optional. Support for these optional tags may vary among search engines. Refer to each search engine's documentation for details.

Sample XML Sitemap

The following example shows a Sitemap that contains just one URL and uses all optional tags.
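
A sketch of such a file, using the placeholder host www.example.com and illustrative values for the optional tags, which are marked with comments:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <!-- required -->
    <loc>http://www.example.com/</loc>
    <!-- optional tags; the values are illustrative -->
    <lastmod>2005-01-01</lastmod>
    <changefreq>monthly</changefreq>
    <priority>0.8</priority>
  </url>
</urlset>
```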

Also see our example with multiple URLs.

XML tag definitions

The available XML tags are described below.

Attribute / Description
<urlset> (required)

Encapsulates the file and references the current protocol standard.

<url> (required)

Parent tag for each URL entry. The remaining tags are children of this tag.

<loc> (required)

URL of the page. This URL must begin with the protocol (such as http) and end with a trailing slash, if your web server requires it. This value must be less than 2,048 characters.

<lastmod> (optional)

The date of last modification of the file. This date should be in W3C Datetime format. This format allows you to omit the time portion, if desired, and use YYYY-MM-DD.

Note that this tag is separate from the If-Modified-Since (304) header the server can return, and search engines may use the information from both sources differently.

<changefreq> (optional)

How frequently the page is likely to change. This value provides general information to search engines and may not correlate exactly to how often they crawl the page. Valid values are:

  • always
  • hourly
  • daily
  • weekly
  • monthly
  • yearly
  • never

The value 'always' should be used to describe documents that change each time they are accessed. The value 'never' should be used to describe archived URLs.

Please note that the value of this tag is considered a hint and not a command. Even though search engine crawlers may consider this information when making decisions, they may crawl pages marked 'hourly' less frequently than that, and they may crawl pages marked 'yearly' more frequently than that. Crawlers may periodically crawl pages marked 'never' so that they can handle unexpected changes to those pages.

<priority> (optional)

The priority of this URL relative to other URLs on your site. Valid values range from 0.0 to 1.0. This value does not affect how your pages are compared to pages on other sites—it only lets the search engines know which pages you deem most important for the crawlers.

The default priority of a page is 0.5.

Please note that the priority you assign to a page is not likely to influence the position of your URLs in a search engine's result pages. Search engines may use this information when selecting between URLs on the same site, so you can use this tag to increase the likelihood that your most important pages are present in a search index.

Also, please note that assigning a high priority to all of the URLs on your site is not likely to help you. Since the priority is relative, it is only used to select between URLs on your site.

Entity escaping

Your Sitemap file must be UTF-8 encoded (you can generally do this when you save the file). As with all XML files, any data values (including URLs) must use entity escape codes for the characters listed in the table below.

Character       Symbol   Escape Code
Ampersand       &        &amp;
Single Quote    '        &apos;
Double Quote    "        &quot;
Greater Than    >        &gt;
Less Than       <        &lt;

In addition, all URLs (including the URL of your Sitemap) must be URL-escaped and encoded for readability by the web server on which they are located. However, if you are using any sort of script, tool, or log file to generate your URLs (anything except typing them in by hand), this is usually already done for you. Please check to make sure that your URLs follow the RFC-3986 standard for URIs, the RFC-3987 standard for IRIs, and the XML standard.

Below is an example of a URL that uses a non-ASCII character (ü), as well as a character that requires entity escaping (&):

Below is that same URL, ISO-8859-1 encoded (for hosting on a server that uses that encoding) and URL escaped:

Below is that same URL, UTF-8 encoded (for hosting on a server that uses that encoding) and URL escaped:

Below is that same URL, but also entity escaped:
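
The four stages can be illustrated with a hypothetical URL, http://www.example.com/ümlat.php&q=name, an assumption chosen because it contains both ü and &. The comments trace each stage, and the <loc> element holds the final form that belongs in the Sitemap:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <!-- Raw URL (illustrative):              http://www.example.com/ümlat.php&q=name -->
  <!-- ISO-8859-1 encoded and URL escaped:  http://www.example.com/%FCmlat.php&q=name -->
  <!-- UTF-8 encoded and URL escaped:       http://www.example.com/%C3%BCmlat.php&q=name -->
  <!-- UTF-8, URL escaped, entity escaped:  the value of <loc> below -->
  <url>
    <loc>http://www.example.com/%C3%BCmlat.php&amp;q=name</loc>
  </url>
</urlset>
```

The ü becomes %FC under ISO-8859-1 and %C3%BC under UTF-8 because the two encodings represent the character with different bytes; the & becomes &amp; only at the final, entity-escaping stage.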

Sample XML Sitemap

The following example shows a Sitemap in XML format. The Sitemap in the example contains a small number of URLs, each using a different set of optional parameters.
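
A sketch of such a Sitemap, assuming the placeholder host www.example.com and made-up catalog URLs. Each entry uses a different mix of the optional tags, and the ampersands in the query strings are entity escaped:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://www.example.com/</loc>
    <lastmod>2005-01-01</lastmod>
    <changefreq>monthly</changefreq>
    <priority>0.8</priority>
  </url>
  <url>
    <!-- only changefreq -->
    <loc>http://www.example.com/catalog?item=12&amp;desc=vacation_hawaii</loc>
    <changefreq>weekly</changefreq>
  </url>
  <url>
    <!-- date-only lastmod (YYYY-MM-DD) -->
    <loc>http://www.example.com/catalog?item=73&amp;desc=vacation_new_zealand</loc>
    <lastmod>2004-12-23</lastmod>
    <changefreq>weekly</changefreq>
  </url>
  <url>
    <!-- full W3C Datetime lastmod with time and offset -->
    <loc>http://www.example.com/catalog?item=74&amp;desc=vacation_newfoundland</loc>
    <lastmod>2004-12-23T18:00:15+00:00</lastmod>
    <priority>0.3</priority>
  </url>
</urlset>
```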

Using Sitemap index files (to group multiple sitemap files)

You can provide multiple Sitemap files, but each Sitemap file that you provide must have no more than 50,000 URLs and must be no larger than 50MB (52,428,800 bytes) uncompressed; earlier versions of the protocol set this limit at 10MB (10,485,760 bytes). If you would like, you may compress your Sitemap files using gzip to reduce your bandwidth requirement, but the file must still fit within the size limit once uncompressed. If you want to list more than 50,000 URLs, you must create multiple Sitemap files.

If you do provide multiple Sitemaps, you should then list each Sitemap file in a Sitemap index file. Sitemap index files may not list more than 50,000 Sitemaps (1,000 in earlier versions of the protocol) and are subject to the same size limit as Sitemap files. The XML format of a Sitemap index file is very similar to the XML format of a Sitemap file.

The Sitemap index file must:

  • Begin with an opening <sitemapindex> tag and end with a closing </sitemapindex> tag.
  • Include a <sitemap> entry for each Sitemap as a parent XML tag.
  • Include a <loc> child entry for each <sitemap> parent tag.

The optional <lastmod> tag is also available for Sitemap index files.

Note: A Sitemap index file can only specify Sitemaps that are found on the same site as the Sitemap index file. For example, http://www.yoursite.com/sitemap_index.xml can include Sitemaps on http://www.yoursite.com but not on http://www.example.com or http://yourhost.yoursite.com. As with Sitemaps, your Sitemap index file must be UTF-8 encoded.

Sample XML Sitemap Index

The following example shows a Sitemap index that lists two Sitemaps:
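
A sketch of such an index, assuming two gzip-compressed Sitemaps with the hypothetical names sitemap1.xml.gz and sitemap2.xml.gz on the placeholder host www.example.com:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>http://www.example.com/sitemap1.xml.gz</loc>
    <!-- optional: when this Sitemap file itself was last modified -->
    <lastmod>2004-10-01T18:23:17+00:00</lastmod>
  </sitemap>
  <sitemap>
    <loc>http://www.example.com/sitemap2.xml.gz</loc>
    <lastmod>2005-01-01</lastmod>
  </sitemap>
</sitemapindex>
```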

Note: Sitemap URLs, like all values in your XML files, must be entity escaped.

Sitemap Index XML Tag Definitions

Attribute / Description
<sitemapindex> (required)

Encapsulates information about all of the Sitemaps in the file.

<sitemap> (required)

Encapsulates information about an individual Sitemap.

<loc> (required)

Identifies the location of the Sitemap.

This location can be a Sitemap, an Atom file, RSS file or a simple text file.

<lastmod> (optional)

Identifies the time that the corresponding Sitemap file was modified. It does not correspond to the time that any of the pages listed in that Sitemap were changed. The value for the lastmod tag should be in W3C Datetime format.

By providing the last modification timestamp, you enable search engine crawlers to retrieve only a subset of the Sitemaps in the index; that is, a crawler may retrieve only the Sitemaps that were modified since a certain date. This incremental Sitemap fetching mechanism allows for the rapid discovery of new URLs on very large sites.

Sitemap file location

The location of a Sitemap file determines the set of URLs that can be included in that Sitemap. A Sitemap file located at http://example.com/catalog/sitemap.xml can include any URLs starting with http://example.com/catalog/ but cannot include URLs starting with http://example.com/images/.

If you have the permission to change http://example.org/path/sitemap.xml, it is assumed that you also have permission to provide information for URLs with the prefix http://example.org/path/. Examples of URLs considered valid in http://example.com/catalog/sitemap.xml include:

URLs not considered valid in http://example.com/catalog/sitemap.xml include:

Note that this means that all URLs listed in the Sitemap must use the same protocol (http, in this example) and reside on the same host as the Sitemap. For instance, if the Sitemap is located at http://www.example.com/sitemap.xml, it can't include URLs from http://subdomain.example.com.
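
As an illustration of these rules, here is a sketch of http://example.com/catalog/sitemap.xml. The specific paths (show?item=23 and so on) are assumptions made up for the example, and the closing comment lists URLs that would fall outside the file's scope:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Served from http://example.com/catalog/sitemap.xml -->
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <!-- In scope: same protocol, same host, path begins with /catalog/ -->
  <url>
    <loc>http://example.com/catalog/show?item=23</loc>
  </url>
  <url>
    <loc>http://example.com/catalog/show?item=233&amp;user=3453</loc>
  </url>
  <!-- Out of scope, so not listed here:
       http://example.com/images/show?item=23      (different directory)
       https://example.com/catalog/page1.php       (different protocol)
       http://subdomain.example.com/catalog/page1  (different host) -->
</urlset>
```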

URLs that are not considered valid are dropped from further consideration. It is strongly recommended that you place your Sitemap at the root directory of your web server. For example, if your web server is at example.com, then your Sitemap index file would be at http://example.com/sitemap.xml. In certain cases, you may need to produce different Sitemaps for different paths (e.g., if security permissions in your organization compartmentalize write access to different directories).

If you submit a Sitemap using a path with a port number, you must include that port number as part of the path in each URL listed in the Sitemap file. For instance, if your Sitemap is located at http://www.example.com:100/sitemap.xml, then each URL listed in the Sitemap must begin with http://www.example.com:100.

Validating your Sitemap

The following XML schemas define the elements and attributes that can appear in your Sitemap file. You can download this schema from the links below:

For Sitemaps: http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd
For Sitemap index files: http://www.sitemaps.org/schemas/sitemap/0.9/siteindex.xsd

There are a number of tools available to help you validate the structure of your Sitemap based on this schema. You can find a list of XML-related tools at each of the following locations:
http://www.w3.org/XML/Schema#Tools
http://www.xml.com/pub/a/2000/12/13/schematools.html

In order to validate your Sitemap or Sitemap index file against a schema, the XML file will need additional headers as shown below.

Sitemap:
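
A sketch of a Sitemap carrying those headers, assuming the placeholder host www.example.com. The xsi:schemaLocation attribute pairs the protocol namespace with the sitemap.xsd location listed above:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:schemaLocation="http://www.sitemaps.org/schemas/sitemap/0.9 http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd"
        xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://www.example.com/</loc>
  </url>
</urlset>
```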

Sitemap index file:
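
A sketch of a Sitemap index file carrying the equivalent headers, assuming the placeholder host www.example.com and a hypothetical file named sitemap1.xml.gz. Here xsi:schemaLocation points at siteindex.xsd instead:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
              xsi:schemaLocation="http://www.sitemaps.org/schemas/sitemap/0.9 http://www.sitemaps.org/schemas/sitemap/0.9/siteindex.xsd"
              xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>http://www.example.com/sitemap1.xml.gz</loc>
  </sitemap>
</sitemapindex>
```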

Extending the Sitemaps protocol

You can extend the Sitemaps protocol using your own namespace. Simply specify this namespace in the root element. For example:
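
A sketch of a root element that declares an extra namespace. The prefix example, the namespace URI http://www.example.com/schemas/example_schema, and the example:note element are all hypothetical names used only for illustration:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:example="http://www.example.com/schemas/example_schema">
  <url>
    <loc>http://www.example.com/</loc>
    <!-- hypothetical extension element living in the extra namespace -->
    <example:note>illustrative value</example:note>
  </url>
</urlset>
```

Search engines may ignore elements from namespaces they do not recognize, so an extension like this is best treated as supplementary data.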

Informing search engine crawlers

Once you have created the Sitemap file and placed it on your webserver, you need to inform the search engines that support this protocol of its location by submitting it to them via the search engine's submission interface or an HTTP request.

The search engines can then retrieve your Sitemap and make the URLs available to their crawlers.

Last Updated: 16 November 2006
