Generate a sitemap and improve the exploration of your site by robots

How do you create a good sitemap? This article gives you all the tips.


What is a sitemap?

A sitemap is a file that gives search engines detailed information about a site's pages, how they relate to each other and what they contain (images, videos, etc.). It helps Google and other search engines crawl the site more efficiently. Each entry in the list points directly to a page of the site. A sitemap is recommended for most sites, although small websites with only a few pages can do without one. Large sites, on the other hand, have every reason to create one to help search engines crawl their pages. A sitemap is also recommended for sites with weak internal linking, where some pages are hard to reach because no other page links to them naturally. Finally, it is useful for new sites that receive few backlinks, since few backlinks give Google little incentive to crawl the entire site.
Be careful: although it is tempting to equate "sitemap" with "all pages indexed", a sitemap only has an indirect influence on indexing. It simply helps Google find the pages. Google remains the sole judge of their quality and of whether they are worth indexing. That said, by submitting an XML sitemap in Google Search Console, you tell Google which pages you consider to be quality pages. If they really are quality pages, you maximize your chances of seeing them appear in the SERPs.

What can a sitemap contain?

Before you start creating a sitemap, ask yourself which pages deserve to be included. Always start by considering the relevance of a URL: would it be a good result on Google? Does it meet a user need? If not, leave it out. Leaving a URL out of the sitemap does not, however, prevent it from being crawled or indexed. If you definitely want to keep it out of the search results, you need the noindex robots meta tag.

The internal URLs of the site

The primary purpose of a sitemap is simply to list the site's internal URLs. The advantage of the XML format is that it lets you add metadata that enriches this list.
In particular, you can add:

  • Time information, such as the date the URL was last modified.
  • The modification frequency.
  • The degree of importance of the URL relative to the site's other pages.

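Google has long been vague about the real impact of this metadata. Its current documentation states that it ignores <priority> and <changefreq> and uses <lastmod> only when the value is consistently accurate. Other search engines may still read these tags, so when in doubt, why deprive yourself of them, as long as the values are accurate?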
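As an illustration, here is a minimal sketch that builds a small XML sitemap with Python's standard xml.etree.ElementTree. It writes the optional metadata tags (<lastmod>, <changefreq>, <priority>) only when a value is supplied. The build_sitemap helper and the example.com URLs are hypothetical, and this is not a complete generator.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
# Serialize the sitemap namespace as the default xmlns (no "ns0:" prefixes)
ET.register_namespace("", SITEMAP_NS)

OPTIONAL_TAGS = ("lastmod", "changefreq", "priority")


def build_sitemap(entries):
    """Return UTF-8 sitemap bytes for a list of dicts (hypothetical helper).

    Each dict must have a "loc" key; the optional metadata keys are
    written only when present.
    """
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for entry in entries:
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = entry["loc"]
        for tag in OPTIONAL_TAGS:
            if tag in entry:
                ET.SubElement(url, f"{{{SITEMAP_NS}}}{tag}").text = str(entry[tag])
    # Python 3.8+: xml_declaration=True writes the <?xml ...?> header
    return ET.tostring(urlset, encoding="utf-8", xml_declaration=True)


xml_bytes = build_sitemap([
    {"loc": "https://www.example.com/", "lastmod": "2024-03-01",
     "changefreq": "weekly", "priority": 0.8},
    {"loc": "https://www.example.com/contact"},
])
print(xml_bytes.decode("utf-8"))
```

The second entry shows that a URL can be listed with <loc> alone. All metadata tags are optional.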


Images and videos

A dedicated image or media XML sitemap is not useful for most sites. Images and media usually appear on pages that are already listed in your sitemap, so they are crawled at the same time as those pages. Some sites are exceptions, such as portfolio sites (often the case for photographers or graphic designers). In these cases, a separate XML sitemap that distinguishes media and images from text content pages can be a wise choice.
To give Google useful information about your images, add image entries to the standard sitemap using Google's image extension. Each <url> entry can list the images shown on that page with <image:image> and <image:loc> tags. Google used to accept further details such as caption, title and geographic location, but it has deprecated those tags.
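Here is a minimal sketch of an image sitemap for a portfolio site. It assumes Google's image extension namespace (http://www.google.com/schemas/sitemap-image/1.1) and uses only the <image:image> and <image:loc> tags. The build_image_sitemap helper and the portfolio URLs are hypothetical.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
IMAGE_NS = "http://www.google.com/schemas/sitemap-image/1.1"
ET.register_namespace("", SITEMAP_NS)
ET.register_namespace("image", IMAGE_NS)  # serialized as xmlns:image="..."


def build_image_sitemap(pages):
    """pages maps each page URL to the image URLs displayed on that page."""
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for page_url, image_urls in pages.items():
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = page_url
        # One <image:image> block per image found on the page
        for image_url in image_urls:
            image = ET.SubElement(url, f"{{{IMAGE_NS}}}image")
            ET.SubElement(image, f"{{{IMAGE_NS}}}loc").text = image_url
    return ET.tostring(urlset, encoding="utf-8", xml_declaration=True)


xml_bytes = build_image_sitemap({
    "https://www.example.com/portfolio/weddings": [
        "https://www.example.com/img/wedding-1.jpg",
        "https://www.example.com/img/wedding-2.jpg",
    ],
})
print(xml_bytes.decode("utf-8"))
```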

How to create a sitemap?

There are three main ways to create a sitemap:

  • Manually, by writing an XML file. This method is generally not recommended unless you really know what you are doing! If you do, at least use an XML editor to create the file.
  • By developing a custom solution for your site. This is the most powerful method and can handle all of your site's specific needs, but it is also the most expensive in terms of resources. Watch the maximum number of URLs per sitemap file: you may need to split the sitemap into several files.
  • By using an automatic sitemap generation tool (easily found on Google). This method is tempting: very little work for a professional result! Its drawback is maintainability. Every time you change something on the site, you have to run the generator again.
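If you build your own generator, you can split the URL list into files of at most 50,000 URLs and list those files in a sitemap index (the <sitemapindex> element of the sitemaps.org protocol). The sketch below shows one way to do this. The file names (sitemap-1.xml, sitemap_index.xml) and helper functions are hypothetical, and the sketch does not check the 50 MB size limit.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS_PER_FILE = 50_000  # protocol limit per sitemap file
ET.register_namespace("", SITEMAP_NS)


def _xml(root_tag, child_tag, locs):
    """Build <root_tag><child_tag><loc>...</loc></child_tag>...</root_tag>."""
    root = ET.Element(f"{{{SITEMAP_NS}}}{root_tag}")
    for loc in locs:
        child = ET.SubElement(root, f"{{{SITEMAP_NS}}}{child_tag}")
        ET.SubElement(child, f"{{{SITEMAP_NS}}}loc").text = loc
    return ET.tostring(root, encoding="utf-8", xml_declaration=True)


def build_segmented(urls, base_url, size=MAX_URLS_PER_FILE):
    """Split urls into files of at most `size` URLs, plus a sitemap index.

    Returns {file name: XML bytes}; the file names are hypothetical.
    """
    files = {}
    part_locs = []
    for start in range(0, len(urls), size):
        name = f"sitemap-{start // size + 1}.xml"
        files[name] = _xml("urlset", "url", urls[start:start + size])
        part_locs.append(f"{base_url}/{name}")
    # The index lists the part files, not the page URLs themselves
    files["sitemap_index.xml"] = _xml("sitemapindex", "sitemap", part_locs)
    return files


files = build_segmented(
    [f"https://www.example.com/product/{i}" for i in range(120_000)],
    base_url="https://www.example.com",
)
print(sorted(files))
# → ['sitemap-1.xml', 'sitemap-2.xml', 'sitemap-3.xml', 'sitemap_index.xml']
```

In this sketch, 120,000 URLs produce three part files, and only the index file needs to be submitted to search engines.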

Rules to follow:

  • The XML file must be saved in UTF-8.
  • A sitemap can list at most 50,000 URLs, and the uncompressed XML file must not exceed 50 MB (52,428,800 bytes).
  • All URLs listed in an XML sitemap file must come from the same host, such as my-domain.com.
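These three rules are easy to check automatically. The sketch below is a hypothetical check_sitemap_rules helper that reports problems found in a sitemap file's raw bytes. It assumes a standard <urlset> file and compares hosts exactly, so my-domain.com and www.my-domain.com count as different hosts. It does not validate the XML against the schema.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS = 50_000
MAX_BYTES = 52_428_800  # 50 MB, uncompressed


def check_sitemap_rules(xml_bytes):
    """Return a list of rule violations for one sitemap file (empty if none)."""
    problems = []
    if len(xml_bytes) > MAX_BYTES:
        problems.append(f"file is {len(xml_bytes):,} bytes (limit {MAX_BYTES:,})")
    try:
        xml_bytes.decode("utf-8")
    except UnicodeDecodeError:
        problems.append("file is not valid UTF-8")
        return problems  # the URLs cannot be read reliably
    root = ET.fromstring(xml_bytes)
    locs = [el.text.strip() for el in root.iter(f"{{{SITEMAP_NS}}}loc") if el.text]
    if len(locs) > MAX_URLS:
        problems.append(f"{len(locs):,} URLs listed (limit {MAX_URLS:,})")
    # Host names are case-insensitive, so compare them in lowercase
    hosts = sorted({urlsplit(loc).netloc.lower() for loc in locs})
    if len(hosts) > 1:
        problems.append("URLs come from several hosts: " + ", ".join(hosts))
    return problems
```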

Your sitemap should be an XML file (sitemap.xml). If you use a standard CMS such as WordPress, Joomla, Magento, PrestaShop or Drupal, plugins can generate your sitemap very easily, and you can then submit it in Google Search Console.

Generate a sitemap on WordPress with Yoast SEO

Creating a sitemap with the Yoast SEO plugin is quick and intuitive:
- Download, install and activate the Yoast SEO plugin (Plugins > Add New).
- In the WordPress left menu, go to SEO > General and select the Features tab.
- Scroll down to XML sitemaps, enable the sitemap and save.
If you only want to generate a standard sitemap, you don't have to do anything else.
If you want to customize it:
- From the dashboard, click SEO > Search Appearance.
- Choose which content types appear in your sitemap by switching the corresponding toggle on or off.
You can also exclude specific posts or pages:
- Open the page in question and scroll down to the Yoast SEO box.
- Click the small gear icon to change the settings.
- In the "Allow search engines to display this page in their search results" menu, select "no" and save.

NB: think through the consequences of this setting before applying it, because it asks search engines not to index the page. In general, it is recommended to exclude only pages that are not meant to be indexed, such as legal notices.

Generate a sitemap on PrestaShop

Google sitemap (gsitemap) is the most widely used free module for generating a sitemap on PrestaShop. It can be downloaded from GitHub.

In the administration panel, go to Modules and services and search for gsitemap.

Once the module is installed, click "Configure". A form asks you to indicate how often your online store is updated on average. The module takes each modification into account when it refreshes the sitemap.
The checkboxes further down the form let you exclude certain pages from the sitemap. As with Yoast SEO on WordPress, check only the pages that have no SEO value: shopping cart, customer account, order history, legal information, etc.

Once you have selected your pages, click the "Generate sitemap" button. The sitemap takes a few minutes to finish generating.

Generate a sitemap on Drupal

Using the Views module is a fast, if slightly more complex, way to generate a sitemap on Drupal. Here is how:

  • Create a view containing the items you want in the sitemap, then in the pager (pagination) options select "Display all items".
  • Configure the XML output by going to Format > XML Data document. On the settings page that appears, enter:

    • urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" in the Root element name field;
    • url in the Top-level child element name field;
    • then apply the changes.

Be careful: remember to set the path in Page settings to sitemap.xml.

  • Under "Fields", add the Content: path field.
  • Enter loc in the Label field.
  • Under Rewrite results, check the "Use Absolute link" box.
  • Apply.
  • Then add the Content: updated date field.
  • In the Label field, enter lastmod, then select a custom date format and set it to Y-m-d.
  • Apply.

Generate a sitemap on Joomla

We recommend the OSMap Joomla Sitemap extension, which is the most popular and the easiest to use. It can be downloaded from the OSMap page.
- In your admin area, go to Extensions > Manage > Install and install OSMap with the Joomla installer.
- Go to Components > OSMap. You should see a page with two side menus, "Sitemaps" and "Extensions".
- Click "Default Site Map" and select all the elements you want search engines to find. Save.
- Click "Extensions" in the left menu, then "OSMap - Joomla Content".
- Allow search engines to find your Joomla sitemap.

Generate a sitemap on Magento

Go to the Magento administration panel and access the Catalog > Google Sitemap tab.

  • Click on the Add button.
  • Type sitemap.xml in the File name field. The Path field sets the server directory where the sitemap.xml file will be saved. The file is usually saved in the Magento root directory. In that case, enter only a slash ("/") in the field.
  • In the Store view field, select the store view for which you are configuring the sitemap.
  • Save.

Configure the sitemap in Google Search Console

Once you have generated your sitemap in your CMS, submit it in your Google Search Console account so that Google properly takes it into account.


Submitting the sitemap is quick:

  • Log in to Google Search Console.
  • Select the website concerned.
  • Expand the "Index" section, then click "Sitemaps".
  • In the text box that appears, enter the URL of your sitemap (for example sitemap_index.xml if you use Yoast SEO).
  • Click Submit.

And that's it! Of course, if your sitemap changes regularly, make sure the submitted file stays up to date and resubmit it in GSC when needed.

Our advice for optimizing its use

Make sitemaps by page type (categories, products, etc.)

Large sites that need sitemap files can create two kinds:

  • one listing the most recently created pages (to try to speed up their indexing);
  • one per page type (to measure the indexing rate of each type, for example product pages, categories, editorial articles, etc.).
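For the "recently created pages" sitemap, selecting the URLs could be as simple as the sketch below. It assumes you can export each page's URL together with its creation date. The 30-day window and the recent_pages helper are arbitrary, hypothetical choices.

```python
from datetime import date, timedelta


def recent_pages(pages, today, days=30):
    """Return the URLs created in the last `days` days, newest first.

    `pages` is a list of (url, creation_date) pairs, e.g. exported from
    the CMS (hypothetical input format).
    """
    cutoff = today - timedelta(days=days)
    recent = [(created, url) for url, created in pages if created >= cutoff]
    return [url for created, url in sorted(recent, reverse=True)]


pages = [
    ("https://www.example.com/blog/new-post", date(2024, 5, 20)),
    ("https://www.example.com/product/new-shoe", date(2024, 5, 1)),
    ("https://www.example.com/about", date(2021, 9, 14)),
]
print(recent_pages(pages, today=date(2024, 5, 25)))
```

The resulting list would then be written to its own sitemap file and regenerated as new pages are published.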

Make sitemaps by language and/or by country

If you have a multilingual site, it is a good idea to split your sitemap (or sitemaps) into several files, one per language. If you already have several sitemaps by page type, split each of them again by language.

Likewise, if your site targets several countries, split the sitemaps by country.

In both cases, the goal is to make it easier to analyze the indexing rate by page type, language and country.
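Here is a minimal sketch of this split. It assumes a hypothetical URL layout in which the first path segment is a language code (for example /fr/products/...) and the next segment is the page type. Real sites will need their own classification rule. Each group would then be written to its own sitemap file.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def group_urls(urls, languages=("en", "fr"), default_lang="en"):
    """Group URLs under one sitemap file name per (language, page type).

    Assumes the hypothetical layout /<lang>/<page-type>/...: the first
    segment counts as a language only if it is in `languages`, and the
    next segment is taken as the page type ("pages" if there is none).
    """
    groups = defaultdict(list)
    for url in urls:
        segments = [s for s in urlsplit(url).path.split("/") if s]
        if segments and segments[0] in languages:
            lang = segments.pop(0)
        else:
            lang = default_lang
        page_type = segments[0] if segments else "pages"
        groups[f"sitemap-{lang}-{page_type}.xml"].append(url)
    return dict(groups)


urls = [
    "https://www.example.com/en/products/a",
    "https://www.example.com/fr/products/b",
    "https://www.example.com/fr/blog/c",
    "https://www.example.com/fr/products/d",
    "https://www.example.com/",
]
for name, members in sorted(group_urls(urls).items()):
    print(name, len(members))
```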

Technical information:

  • XML sitemap: the <urlset> tag

The <urlset> tag is mandatory. It encloses the whole sitemap file and references the protocol standard in use.

  • XML sitemap: the <url> tag

The <url> tag is also mandatory. It is the parent tag of each listed URL.

  • XML sitemap: the <loc> tag

The <loc> tag is the last of the three mandatory tags. It contains the URL of the page. The URL must begin with the protocol (http:// or https://) and must not exceed 2,048 characters.

  • XML sitemap: the <lastmod> tag

The <lastmod> tag is optional. It gives the date on which the file or page was last modified. This date must use the W3C Datetime format. For simplicity, the YYYY-MM-DD form is generally used.

  • XML sitemap: the <changefreq> tag

The <changefreq> tag is also optional. It indicates how often the page is likely to change. This value gives search engines general information and is treated as a hint, not a command. Although search engine crawlers may take it into account, they do not necessarily apply it strictly.

Accepted values are: "always", "hourly", "daily", "weekly", "monthly", "yearly" and "never".

The value "always" should be used to describe documents that change with each access. The value "never" should be used to describe URLs that are considered archived.

  • XML sitemap: the <priority> tag

The <priority> tag is the last of the three optional tags. It indicates the priority of a page relative to the site's other pages. Accepted values range from 0.0 to 1.0. By default (without a <priority> tag), a page's priority is 0.5.

This value only tells search engines which pages you consider most important for crawlers.
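These rules are simple enough to check in code. The hypothetical check_url_entry helper below is a minimal sketch that validates the values of a single entry. It checks the protocol prefix and the 2,048-character limit for <loc>, a simplified subset of the W3C Datetime format for <lastmod>, the accepted <changefreq> values, and the 0.0 to 1.0 range for <priority>. For <lastmod>, it accepts full dates and date-times with a timezone, but not the year-only or year-month forms.

```python
import re

CHANGEFREQ_VALUES = {"always", "hourly", "daily", "weekly", "monthly", "yearly", "never"}
# Simplified W3C Datetime: YYYY-MM-DD, optionally followed by
# THH:MM[:SS[.s]] and a timezone designator (Z, +hh:mm or -hh:mm)
W3C_DATETIME = re.compile(
    r"\d{4}-\d{2}-\d{2}(T\d{2}:\d{2}(:\d{2}(\.\d+)?)?(Z|[+-]\d{2}:\d{2}))?"
)


def check_url_entry(loc, lastmod=None, changefreq=None, priority=None):
    """Return a list of problems found in one <url> entry (empty if none)."""
    errors = []
    if not loc.startswith(("http://", "https://")):
        errors.append("loc must start with http:// or https://")
    if len(loc) > 2048:
        errors.append("loc exceeds 2,048 characters")
    if lastmod is not None and not W3C_DATETIME.fullmatch(lastmod):
        errors.append("lastmod is not in W3C Datetime format")
    if changefreq is not None and changefreq not in CHANGEFREQ_VALUES:
        errors.append(f"changefreq {changefreq!r} is not an accepted value")
    if priority is not None:
        try:
            value = float(priority)
        except ValueError:
            value = float("nan")  # NaN fails the range check below
        if not 0.0 <= value <= 1.0:
            errors.append("priority must be between 0.0 and 1.0")
    return errors


print(check_url_entry("https://www.example.com/", "2024-03-01", "weekly", "0.8"))
print(check_url_entry("www.example.com/", changefreq="sometimes"))
```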

The XML Sitemap file: audit and fix problems

Checking the presence of sitemaps

  • In Google Search Console, go to "Sitemaps" in the menu, then review the list of sitemaps displayed.


  • The list shows each sitemap's name and type, the dates it was submitted and last read, its status and the number of URLs discovered.


If one or more sitemaps are listed:

Check the information:

  • If some of them show errors instead of "Success": check with your technical team what is wrong and fix it.
  • If the number of discovered URLs does not match what you expect: open the sitemap at the URL shown in the "Sitemap" column on the left and check whether the count is correct.
    Next, check that the URLs in the sitemap(s) are not de-indexed (robots noindex meta tag) or blocked by robots.txt. If they are, remove them from the sitemap.
    Finally, check that the URLs in the sitemap(s) are up to date. Since a sitemap is not required for indexing, an outdated one will not block indexing. However, do not let it fall behind for too long. Keep it as current as possible, because it helps your pages get indexed more easily and gives you better control over how your site is indexed.
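Part of this check can be scripted. The sketch below assumes the sitemap and robots.txt contents have already been downloaded. It counts the <loc> entries and lists those that Python's standard urllib.robotparser considers blocked for Googlebot. The helper names are hypothetical. That parser only does simple prefix matching and does not implement Google's * and $ wildcards. The sketch also does not detect noindex tags, which would require fetching each page, so treat it as a first pass.

```python
import xml.etree.ElementTree as ET
from urllib.robotparser import RobotFileParser

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def sitemap_locs(xml_bytes):
    """Return the URLs listed in the <loc> tags of a sitemap file."""
    root = ET.fromstring(xml_bytes)
    return [el.text.strip() for el in root.iter(f"{{{SITEMAP_NS}}}loc") if el.text]


def blocked_by_robots(locs, robots_txt, user_agent="Googlebot"):
    """Return the URLs that robots.txt disallows for the given user agent.

    urllib.robotparser only does prefix matching: wildcard rules such as
    Disallow: /*?sort= are not interpreted the way Google does.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [loc for loc in locs if not parser.can_fetch(user_agent, loc)]


sitemap = b"""<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://www.example.com/</loc></url>
  <url><loc>https://www.example.com/cart/checkout</loc></url>
</urlset>"""
robots = "User-agent: *\nDisallow: /cart/\n"

locs = sitemap_locs(sitemap)
print(len(locs), blocked_by_robots(locs, robots))
```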

If no sitemap has been submitted:

Check with your technical team whether a sitemap exists that has not been declared.
You can also check yourself by opening "mydomain.com/sitemap.xml" in your browser, but this test is not conclusive: the sitemap may have a different name, and there may be several.

If your technical team confirms that there is no sitemap and your site has more than 5,000 pages, we recommend adding one to tell Google which pages to crawl first and to get a clearer view of your indexing.

You can create several sitemaps: by language, by product category, etc.

If your site is smaller, you can still create one. The main benefit is a clear report of valid URLs that lets you track your indexing. It can also help your pages get discovered faster if you are creating pages and your site is still very new.

Common mistakes when generating the sitemap

Bad protocol used

Sometimes the sitemap does not use the site's actual URL format. For example, its URLs lack "www" or use "http" instead of "https".
If you have recently changed the URL format, you may have formatted the URLs incorrectly or forgotten to update the sitemap (especially if you are not using an in-house CMS).
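A quick way to spot this mistake is to compare every URL in the sitemap with the site's canonical scheme and host. In the sketch below, the wrong_format helper and its https / www.example.com defaults are hypothetical. Replace the defaults with your site's actual format.

```python
from urllib.parse import urlsplit


def wrong_format(locs, scheme="https", host="www.example.com"):
    """Return the sitemap URLs whose scheme or host differ from the canonical ones."""
    flagged = []
    for loc in locs:
        parts = urlsplit(loc)
        # urlsplit lowercases the scheme; hosts are case-insensitive too
        if parts.scheme != scheme or parts.netloc.lower() != host:
            flagged.append(loc)
    return flagged


print(wrong_format([
    "https://www.example.com/shoes",
    "http://www.example.com/bags",   # wrong protocol
    "https://example.com/hats",      # missing "www"
]))
```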

Bad implementation of the rewriting rules

When the sitemap does not work on an Apache or Nginx server, there is very likely an error in how the rewrite rules have been implemented.

Integration of forbidden URLs for indexing

Sometimes you unknowingly generate a sitemap that contains URLs blocked by robots.txt. Google usually reports this kind of error, which can also help you spot mistakes in your robots.txt file.

Unsegmented sitemap

A single sitemap file is limited to 50,000 URLs.
Large sites therefore sometimes have to split their sitemap into several files to stay under this limit. Segmentation also makes sitemaps easier to manage and to navigate. The key is to structure your sitemaps by page depth and by category, so you can diagnose problems more easily, especially if your site is multilingual.

Conclusion

Although a sitemap does not directly determine your site's success, taking the time to build a consistent one is necessary to help the various search engines crawl your pages. By giving them easier access to deep pages, you maximize your chances of seeing those pages indexed in the SERPs one day.

   Article written by Louis Chevant
