The sitemap file has become, over the years, a must-have in SEO. Frequently mentioned in SEO discussions, it’s closely tied to how search engines crawl and index the different URLs on a website.
It’s particularly helpful when you build an ecommerce site with many pages, because it makes it easier to discover your online store’s various URLs. I advise you not to neglect this element!
What is the sitemap file?
To better understand where this very standardized protocol comes from, let’s start with a little history and some key dates:
- 2005: Google creates the first version called Sitemaps 0.84.
- November 2006: Google, Yahoo, and Microsoft jointly announce support for Sitemaps 0.90. This is when the protocol was truly established, setting a common writing standard for all future sitemaps.
Official website: https://www.sitemaps.org/index.html
To return to its main function, a website’s sitemap is a file listing all the URLs that you want search engines to index. It should only contain pages that are relevant for search engines to visit.
The sitemap can be composed of different kinds of information:
- loc: URL of the page. This is the only mandatory field.
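- lastmod: date the page was last modified
- changefreq: how often the page is likely to change (always, hourly, daily, weekly, monthly, yearly, or never)
- priority: priority of the URL relative to the other URLs on your site, from 0.0 to 1.0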
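In XML, a sitemap that lists only the URL “https://www.site.fr/” consists of a <urlset> root element that declares the protocol’s namespace (http://www.sitemaps.org/schemas/sitemap/0.9). Inside it is a single <url> entry whose <loc> element holds that address.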
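The Python sketch below builds such a minimal file with the standard library’s xml.etree.ElementTree module. It also shows where the optional lastmod, changefreq, and priority fields go. The build_sitemap function name and the example values are illustrative assumptions, not part of the protocol.

```python
import xml.etree.ElementTree as ET

# Namespace required by the Sitemaps 0.90 protocol
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
OPTIONAL_FIELDS = ("lastmod", "changefreq", "priority")


def build_sitemap(entries):
    """Build an XML sitemap from dicts that have a 'loc' key plus optional fields."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for entry in entries:
        url = ET.SubElement(urlset, "url")
        # <loc> is mandatory; ElementTree escapes characters such as & for us
        ET.SubElement(url, "loc").text = entry["loc"]
        for field in OPTIONAL_FIELDS:
            if field in entry:
                ET.SubElement(url, field).text = str(entry[field])
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


if __name__ == "__main__":
    # Illustrative values only
    print(build_sitemap([
        {"loc": "https://www.site.fr/", "lastmod": "2024-01-15",
         "changefreq": "weekly", "priority": "1.0"},
    ]))
```

Only the loc key is required in each entry. Omit the other keys and the corresponding tags are simply left out of the file.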
Note that the XML format isn’t mandatory. You can, for instance, create a sitemap in plain-text format (.txt). However, you then can’t specify the optional elements mentioned above (lastmod, changefreq, and priority). A text sitemap contains only the list of URLs, one per line.
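Generating a text sitemap takes only a few lines. Here’s a minimal Python sketch, assuming your URLs are already in a list. The function names are hypothetical. The code writes one URL per line and nothing else, drops blank lines and duplicates, and saves the file as UTF-8, as the protocol expects.

```python
from pathlib import Path


def build_text_sitemap(urls):
    """Return a plain-text sitemap body: one URL per line, nothing else."""
    # Strip whitespace, skip empty entries, and drop duplicates while keeping order
    cleaned = dict.fromkeys(u.strip() for u in urls if u.strip())
    return "".join(url + "\n" for url in cleaned)


def write_text_sitemap(path, urls):
    # Text sitemaps must be UTF-8 encoded
    Path(path).write_text(build_text_sitemap(urls), encoding="utf-8")
```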
The purpose of the sitemap is to facilitate the crawling of your site. Crawling is the exploration of a website by a search engine robot. As it goes through the site, the robot discovers and analyzes each page’s source code to identify its different elements (HTML, images, internal and external links, etc.).
Once this file is declared to the search engines, they regularly analyze it to explore all the URLs it lists. Then, if everything goes as planned, they index these pages so that they can appear in the results when internet users run searches.
What is a sitemap index?
It’s quite possible for a website to have several sitemaps.
In this case, you can create a sitemap index file, which lists the different sitemaps in a single file. The advantage is that all the sitemaps are submitted at once.
All WiziShop stores have, by default, sitemap indexes. As an ecommerce player, we prefer to separate the different sections of an online store (static pages, categories, product pages, blog...) into several sitemaps, grouped in an index. I’ll describe this element in more detail at the end of the article.
As with a classic sitemap, the sitemap index must be declared in your webmaster tools.
Information to know about the sitemap
Now that you know the main characteristics and the functioning of a sitemap, here’s some interesting information to consider.
Ability to add it to the location of your choice
A sitemap file can have any name, and you can place it wherever you like.
The robots.txt file must sit at the root of the site under that exact name. The sitemap, by contrast, isn’t tied to a specific location or file name.
The only requirement is that it must be on the domain name in question.
So, if you want, you can choose a non-standard file name. This makes it harder for others, such as competitors, to find the list of URLs you want crawled and indexed.
That said, if you want to find your competitors’ XML sitemaps, you have several options:
- Add “/sitemap.xml” to the end of the domain in your browser.
- Consult the site’s /robots.txt file, where the sitemap is often declared on a “Sitemap:” line.
Since a sitemap can have any name, you may get a 404 error at /sitemap.xml. It’s still worth trying, because the vast majority of sites use this location.
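Both checks can be scripted. This Python sketch uses the standard library’s urllib.robotparser module, whose site_maps() method (Python 3.8+) returns the URLs declared on “Sitemap:” lines. If /sitemap.xml isn’t already among them, the code adds it as a fallback guess. The robots.txt content, the example.com URLs, and the helper names are illustrative assumptions. In practice, you would download the site’s robots.txt first.

```python
from urllib.robotparser import RobotFileParser


def sitemaps_from_robots(robots_txt):
    """Sitemap URLs declared on "Sitemap:" lines of a robots.txt body."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # site_maps() returns None when no Sitemap line is present
    return list(parser.site_maps() or [])


def candidate_sitemaps(site_root, robots_txt):
    """Declared sitemaps first, then the conventional /sitemap.xml guess."""
    candidates = sitemaps_from_robots(robots_txt)
    guess = site_root.rstrip("/") + "/sitemap.xml"
    if guess not in candidates:
        candidates.append(guess)
    return candidates


# Illustrative robots.txt body (hypothetical site)
EXAMPLE_ROBOTS_TXT = """User-agent: *
Disallow: /cart/

Sitemap: https://www.example.com/sitemap.xml
Sitemap: https://www.example.com/blog/sitemap.xml
"""
```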
Ability to have several sitemaps on the same site
As I stated above, websites can have several sitemap files, and they don’t necessarily have to be listed in an index.
You’re totally free to create several sitemaps in different folders of the website and to declare them, one by one, in the webmaster tools.
Some sites have, for instance, a sitemap for all static pages, another sitemap for the blog, etc.
For example, you could have one sitemap at the root of the site (https://www.example.com/sitemap.xml) and another for the blog (https://www.example.com/blog/sitemap.xml).
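The sitemaps.org protocol adds one nuance to this freedom: a sitemap’s location determines which URLs it may list. The listed URLs must use the same scheme and host as the sitemap file and sit in its folder or below it. A blog sitemap at https://www.example.com/blog/sitemap.xml should therefore only list URLs under /blog/. The protocol’s cross-submission mechanism via robots.txt is an exception that this sketch ignores. The Python helper below, a hypothetical name, checks the basic rule.

```python
from urllib.parse import urlsplit


def sitemap_can_list(sitemap_url, page_url):
    """Apply the sitemaps.org location rule (cross-submission not handled)."""
    sitemap, page = urlsplit(sitemap_url), urlsplit(page_url)
    # Same scheme and same host are required
    if (sitemap.scheme, sitemap.netloc.lower()) != (page.scheme, page.netloc.lower()):
        return False
    # Folder containing the sitemap, e.g. "/blog/" for "/blog/sitemap.xml"
    folder = sitemap.path[: sitemap.path.rfind("/") + 1]
    return (page.path or "/").startswith(folder)
```

A sitemap at the root can list any page of the host, which is one reason the root location is the most common choice.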
Limits of URLs in the sitemap
As I just mentioned, everyone is free to use an almost unlimited number of sitemap files.
Nevertheless, there are several limitations. Don't worry, though, as you’ll probably never reach them...
The limitations are as follows:
- A sitemap can contain up to 50,000 URLs.
- A sitemap index can contain 50,000 different sitemaps.
- In Google Search Console, each site can have up to 500 sitemap index files.
- Uncompressed, a sitemap file must not exceed 50 MB (52,428,800 bytes).
Put simply, a single sitemap index can point to as many as 2.5 billion URLs (50,000 sitemaps × 50,000 URLs each), and you can have several indexes. You’re a bit less worried now, aren’t you?
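The arithmetic behind these limits, and how a large catalog gets split into several files, can be sketched in Python. The constants mirror the figures listed above; the 500 index files are Google’s Search Console limit rather than a protocol rule. The helper names are hypothetical.

```python
MAX_URLS_PER_SITEMAP = 50_000
MAX_SITEMAPS_PER_INDEX = 50_000
MAX_INDEX_FILES = 500  # Google Search Console limit per site
MAX_UNCOMPRESSED_BYTES = 50 * 1024 * 1024  # 50 MB = 52,428,800 bytes

# Ceiling for one sitemap index (2.5 billion URLs), then for 500 of them
URLS_PER_INDEX = MAX_URLS_PER_SITEMAP * MAX_SITEMAPS_PER_INDEX
URLS_ALL_INDEXES = URLS_PER_INDEX * MAX_INDEX_FILES


def sitemaps_needed(url_count):
    # Integer ceiling division avoids float rounding on huge counts
    return (url_count + MAX_URLS_PER_SITEMAP - 1) // MAX_URLS_PER_SITEMAP


def chunk_urls(urls, size=MAX_URLS_PER_SITEMAP):
    """Split a URL list into consecutive batches, one per sitemap file."""
    return [urls[i:i + size] for i in range(0, len(urls), size)]


def fits_size_limit(sitemap_text):
    # The size limit applies to the uncompressed file, measured in bytes
    return len(sitemap_text.encode("utf-8")) <= MAX_UNCOMPRESSED_BYTES
```

For a catalog of 120,000 URLs, for example, sitemaps_needed(120_000) returns 3. A single index referencing those three files is then enough.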
Specify complete URLs
The protocol stipulates that the URLs in your sitemap must be absolute and not relative.
In other words, they must always start with “http” or “https.”
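A quick way to enforce this rule before generating the file is to check each URL’s scheme and host. This Python sketch relies on urllib.parse from the standard library. The helper names and the https://www.site.fr/ base URL are illustrative.

```python
from urllib.parse import urljoin, urlsplit


def is_absolute_http_url(url):
    """True only for fully qualified http(s) URLs such as https://www.site.fr/page."""
    parts = urlsplit(url)
    # urlsplit lowercases the scheme; an empty host means the URL is relative
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def make_absolute(base_url, href):
    # Resolve a relative link such as "/shoes" against the site's base URL
    return urljoin(base_url, href)
```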
Indicate only URLs with SEO interest
This is one of the most important points in this article, so I’m emphasizing it.
Since the purpose of the sitemap is to make it easier to crawl the pages of your website that you want explored and indexed, you should include only relevant URLs.
There’s no point listing pages without SEO value in your sitemap. Add only useful, indexable pages.
For instance, returning to ecommerce, avoid listing the URLs of your cart pages or filter pages in your sitemap. That’s why these pages, which are also set to noindex on WiziShop stores, are left out of the file.
These pages are essential to the proper functioning of your ecommerce site. However, they add no value in terms of indexing or SEO.
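If you generate your own sitemap, you can automate this selection. The Python sketch below assumes a hypothetical page record format: a dict with a url and a noindex flag. It also assumes hypothetical URL conventions for cart, checkout, account, and filter pages. Adapt the section names and query parameters to your own store.

```python
from urllib.parse import parse_qsl, urlsplit

# Hypothetical URL conventions: adapt them to your own store
EXCLUDED_SECTIONS = {"cart", "checkout", "account"}
FILTER_PARAMS = {"color", "size", "sort", "price"}


def belongs_in_sitemap(page):
    """page is a dict such as {"url": "...", "noindex": False} (hypothetical format)."""
    if page.get("noindex"):
        return False  # never list pages you've asked engines not to index
    parts = urlsplit(page["url"])
    # Compare the first path segment, so "/cart/..." is excluded but "/cartoons" isn't
    first_segment = parts.path.strip("/").split("/")[0]
    if first_segment in EXCLUDED_SECTIONS:
        return False
    # Filter (faceted navigation) URLs typically carry parameters such as ?color=red
    params = {key for key, _ in parse_qsl(parts.query, keep_blank_values=True)}
    return not (params & FILTER_PARAMS)
```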
Files supported in the sitemap
Some sitemap files can contain specific content such as images, videos, or news. In this case, the XML format is mandatory.
Dedicated XML sitemaps for media or images are rarely used on websites.
Generally speaking, images and videos are embedded in pages whose URLs are already listed in the sitemap.
On WiziShop, for example, the URLs of the images on the product page are directly entered in the sitemap dedicated to the products.
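Adding images to a regular sitemap relies on Google’s image sitemap extension. Each <url> entry can contain <image:image> elements, each with its own <image:loc>, declared under the http://www.google.com/schemas/sitemap-image/1.1 namespace. Here’s a minimal Python sketch using xml.etree.ElementTree. The input format and the product and image URLs are made up, and this is not WiziShop’s actual generator.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
IMAGE_NS = "http://www.google.com/schemas/sitemap-image/1.1"

# Serialize the sitemap namespace as the default one and the image one as "image:"
ET.register_namespace("", SITEMAP_NS)
ET.register_namespace("image", IMAGE_NS)


def tag(ns, name):
    # ElementTree's "{namespace}name" notation for qualified tags
    return f"{{{ns}}}{name}"


def build_product_sitemap(products):
    """products: list of (page_url, [image_url, ...]) pairs (hypothetical format)."""
    urlset = ET.Element(tag(SITEMAP_NS, "urlset"))
    for page_url, image_urls in products:
        url = ET.SubElement(urlset, tag(SITEMAP_NS, "url"))
        ET.SubElement(url, tag(SITEMAP_NS, "loc")).text = page_url
        # One <image:image> block per photo shown on the product page
        for image_url in image_urls:
            image = ET.SubElement(url, tag(IMAGE_NS, "image"))
            ET.SubElement(image, tag(IMAGE_NS, "loc")).text = image_url
    return ET.tostring(urlset, encoding="unicode")
```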
Obligation to create a sitemap
The sitemap isn’t mandatory when creating a website. However, it’s highly recommended!
If you have a brochure site with 5 to 10 pages, the sitemap isn’t a major concern. However, it quickly becomes indispensable if your site is large.
For ecommerce, for example, it’s a key element because the site is very dynamic: new categories and products are added frequently. The sitemap then makes it much easier for search engines to discover these new URLs.
Consideration by search engines
Today, the main search engines, Google and Bing, take the sitemap into account. You can easily submit it through their respective webmaster tools.
Yandex and Baidu also support this protocol.
Difference between sitemap and site map
Finally, the sitemap file is easily confused with the site map, so it’s worth distinguishing between the two.
The sitemap is intended exclusively for search engine bots. It isn’t part of the site’s navigation or visible page structure.
The site map, on the other hand, is designed for internet users. It’s a page, often present in the footer of sites, which lists many links to different pages.
Sitemap and SEO
The sitemap doesn’t directly improve the SEO of a website. It doesn’t work like the optimization of your title tag or your editorial content.
However, it contributes to SEO indirectly:
- It facilitates the exploration of the site’s URLs.
- It speeds up the indexing of newly published URLs.
- It allows for advanced SEO analysis.
- It helps to detect orphan pages.
- It helps Google understand which pages you want indexed.
- It helps during a website redesign that involves URL changes.
- It speeds up the deindexation of certain URLs.
Facilitate the crawling of the site’s various URLs
The main purpose of the sitemap is to list all the important pages of the site. It thus saves the crawlers time, helping them to quickly discover your relevant URLs.
By the way, the sitemap also allows you to check your indexing coverage in Google’s Search Console tool.
In this report, you can see quite easily if the search engine has encountered problems when crawling certain pages. The most frequently encountered errors are the following:
- Certain URLs in the sitemap can’t be crawled, typically because they’re blocked by the robots.txt file.
- Some URLs may be redirected or lead to an error page (a 301 redirect or a 404 error, for example).
- There may be URLs that Google refuses to index (because of thin content, duplicate content, etc.).
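You can catch the first kind of error before submitting. This Python sketch uses the standard library’s urllib.robotparser module to list the sitemap URLs that a robots.txt file blocks for a given user agent. Treat the result as an approximation: Python’s parser uses simple prefix matching and doesn’t implement Google’s * and $ wildcards, so it may disagree with Search Console. The rules and URLs are illustrative.

```python
from urllib.robotparser import RobotFileParser


def blocked_sitemap_urls(robots_txt, sitemap_urls, user_agent="Googlebot"):
    """Return the sitemap URLs that robots.txt forbids user_agent to crawl."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in sitemap_urls if not parser.can_fetch(user_agent, url)]


# Illustrative rules: the cart section is disallowed for every crawler
ROBOTS_TXT = """User-agent: *
Disallow: /cart/
"""
```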
Speed up the discovery of new URLs
When new pages are added to a website, it can take a while before they’re crawled and indexed on search engines.
By adding your latest URLs to your sitemap, you speed up this process.
Deindex many URLs more quickly
Although the primary purpose of the sitemap is to indicate the URLs you want Google to explore and index, it can also help get many URLs deindexed.
Create a dedicated sitemap that lists every page on your site carrying a noindex tag, and it becomes a mass deindexing tool. This saves time when you need to remove many pages from the index.
All you have to do is submit this sitemap in Search Console. This encourages Googlebot to recrawl the affected pages, so the search engine notices the noindex tag and realizes for itself that it no longer needs to index them.
Once deindexing is complete, you can delete this temporary sitemap.
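Before submitting such a sitemap, it’s worth checking that every listed page really carries the noindex directive. Otherwise, Google will recrawl it and simply keep it indexed. The Python sketch below detects a robots meta tag with the standard library’s html.parser module and builds a temporary text sitemap from pages you’ve already downloaded. The input format, a dict mapping each URL to its HTML, is a hypothetical assumption. A noindex sent through the X-Robots-Tag HTTP header isn’t checked here.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Flags pages containing <meta name="robots" content="...noindex...">."""

    def __init__(self):
        super().__init__()
        self.noindex = False

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # HTMLParser lowercases tag and attribute names; values may be None
        values = {name: (value or "") for name, value in attrs}
        if (values.get("name", "").lower() in ("robots", "googlebot")
                and "noindex" in values.get("content", "").lower()):
            self.noindex = True


def has_noindex(html):
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return parser.noindex


def build_deindex_sitemap(pages):
    """pages: dict mapping URL -> downloaded HTML (hypothetical input).
    Returns a temporary text sitemap with only the pages really set to noindex."""
    return "".join(url + "\n" for url, html in pages.items() if has_noindex(html))
```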
Discover orphan pages
When a website is very large and has a complex structure, it’s not uncommon for some of its pages to be orphaned.
In other words, they may be present in the sitemap but completely absent from the website’s structure. Without an internal link, the page recovers very little PageRank, if at all.
By cross-referencing the URLs in your sitemap with a crawl of your site, you can identify pages that receive no internal links.
If some pages in your sitemap never show up when you crawl your site, they’re orphan pages. If you find any, you should correct this anomaly, for example by linking to them from relevant pages.
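The comparison boils down to a set difference: the sitemap URLs minus the URLs that a link-following crawler reached. In this Python sketch, sitemap_locs reads the <loc> values from a sitemap with xml.etree.ElementTree. The list of crawled URLs is assumed to come from a crawler of your choice, exported as a simple list. The URLs are illustrative. In practice, you may need to normalize both lists first (trailing slashes, http vs. https) before comparing.

```python
import xml.etree.ElementTree as ET

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_locs(xml_text):
    """Extract every <loc> URL from a classic (non-index) XML sitemap."""
    root = ET.fromstring(xml_text)
    return [loc.text.strip() for loc in root.findall("sm:url/sm:loc", NS)]


def find_orphan_pages(sitemap_urls, crawled_urls):
    """Sitemap URLs that a link-following crawl of the site never reached."""
    reached = set(crawled_urls)
    return [url for url in sitemap_urls if url not in reached]
```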
How do you declare your sitemap?
When a sitemap file or sitemap index is created, it must be declared with webmaster tools such as Google Search Console and Bing Webmaster Tools.
How do you declare your sitemap in Google Search Console?
To declare these elements, simply go to the Google Search Console interface. It’s designed both for submitting sitemaps and for consulting all the associated statistics.
To send a sitemap, the steps are as follows:
- Select the right property for your website.
- Click on the Sitemaps tab.
- Enter the URL of your sitemap in the appropriate field.
- Click the “Submit” button.
As soon as your sitemap file is declared, it’ll be analyzed by Google.
How do you declare your sitemap in Bing Webmaster Tools?
Here, the process is very similar to that of Google Search Console. Once you’ve verified ownership of your site, simply follow these steps:
- Select the right site.
- Click on the Sitemaps tab.
- Click “Submit sitemap.”
- Indicate the complete URL of your sitemap in the appropriate field.
- Click the “Submit” button.
The sitemap on WiziShop stores
To finish this article, I’ll explain the sitemap index that all WiziShop stores have. It’s generated and updated automatically.
The sitemap index file looks very similar to a classic sitemap file. It contains:
- a “sitemapindex” tag, which is placed at the beginning and at the end of the file;
- a “sitemap” tag, which is a parent tag for each sitemap in the file; and
- a “loc” tag, which indicates the location of each sitemap.
Each section of your online store has its own sitemap:
- Product pages
- Category pages
- Information pages (homepage, legal information, terms and conditions, about us, etc.)
- Blog (blog homepage, blog categories)
- Blog articles
The sitemap index of WiziShop stores therefore contains at most five links, one for each of these sitemaps. If you don’t have a blog, the blog sitemap and the articles sitemap are, of course, not included.
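The same structure can be reproduced for any store. This Python sketch builds a sitemap index from a list of section sitemaps and leaves out the two blog sitemaps when there’s no blog. The file names (sitemap-products.xml and so on) and the example-store.com domain are hypothetical, not the actual URLs WiziShop uses.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
# Hypothetical section names, one sitemap file per section of the store
SECTIONS = ("products", "categories", "pages", "blog", "articles")
BLOG_SECTIONS = {"blog", "articles"}


def store_sitemap_urls(base_url, has_blog):
    """List the section sitemaps; without a blog, the two blog sitemaps are skipped."""
    sections = [s for s in SECTIONS if has_blog or s not in BLOG_SECTIONS]
    return [f"{base_url}/sitemap-{section}.xml" for section in sections]


def build_sitemap_index(sitemap_urls):
    """<sitemapindex> root, one <sitemap> parent per file, each with a <loc>."""
    index = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for url in sitemap_urls:
        entry = ET.SubElement(index, "sitemap")
        ET.SubElement(entry, "loc").text = url
    body = ET.tostring(index, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body
```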
Within the product sitemap, we’ve also added the images present on each product page. This makes it easier for search engines to find all the product photos and increases your chances of ranking well in the image results.
As you’ve probably gathered, the presence of a sitemap doesn’t guarantee the success of a website or its SEO. You must also pay attention to each page’s meta description, images, and much more. However, it’s a valuable tool for improving how different search engines crawl and index your pages.
Thanks to the sitemap, the crawlers have easier access to certain pages, which is very valuable within the framework of a dynamic ecommerce site.
Considering the growing number of sites on the internet and the ease of generating sitemaps today, you might as well take advantage!