Submitting an XML sitemap can provide search engines with a full breakdown of every website page you want them to index. Therefore, it’s an essential step in your Search Engine Optimization (SEO) strategy. However, for the best possible results, you’ll need to use a sitemap validator to ensure that you’re not sending files with errors.
In this article, we’ll talk about what sitemap validators are and how they work. Then we’ll guide you through common errors that you might run into when using a sitemap validator and how to troubleshoot them. Let’s get to it!
What Is a Sitemap Validator?
A sitemap is a file that contains a list of every URL on your website that you want search engines to index. Sitemaps come in either XML or HTML format, with the former being the most popular option.
Technically, you don’t need to submit a sitemap of your website to Google or other search engines. These platforms use crawlers to navigate your site, identify every URL, and index those pages. However, creating a sitemap lets you tell search engines exactly which URLs you want them to index and leave out the ones you don’t (such as private or redundant content).
A sitemap validator is a tool that can process those XML or HTML files and make sure they contain no errors. By “errors,” we mean:
- Pages that search engines can’t crawl
- 404 errors
- 401 errors
- Too many URLs in the sitemap
- Non-canonical URLs
If your sitemap contains any of those errors, search engines might not be able to index every page that you list. Manually reading XML files to find issues can take a long time, and you also need to test URLs. Fortunately, sitemap validators enable you to skip all that work and start fixing any errors that they identify.
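To give you an idea of what a validator does first, here is a minimal Python sketch that reads an XML sitemap and pulls out every listed URL. It assumes the file follows the standard sitemaps.org format. The `extract_urls` name and the `EXAMPLE_SITEMAP` contents are made up for illustration, and real validators go on to test each URL as well.

```python
import xml.etree.ElementTree as ET

# Namespace used by sitemaps that follow the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def extract_urls(sitemap_xml: str) -> list[str]:
    """Return every <loc> value listed in a sitemap document."""
    # ET.fromstring raises xml.etree.ElementTree.ParseError on malformed XML,
    # which is the first thing a validator would report.
    root = ET.fromstring(sitemap_xml)
    return [
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text
    ]


# Hypothetical sitemap, used only for illustration.
EXAMPLE_SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/blog/</loc></url>
</urlset>"""

if __name__ == "__main__":
    for url in extract_urls(EXAMPLE_SITEMAP):
        print(url)
```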
How to Use a Sitemap Validator
Using a sitemap validator is simple. Depending on which tool you use, you might need to upload an XML file or provide a URL to your website’s sitemap. The latter option could apply if you use a tool such as XML Sitemap Validator.
Enter the URL for the sitemap that you want to check, and the tool will return a report including any errors that it finds.
If you get a clean report with no issues, search engines can index the URLs within the sitemap. You can safely submit the sitemap to Google, Bing, Yandex, or wherever you want without fear. However, if you run into errors, you’ll need to know how to fix them. That brings us to the next section.
5 Common Sitemap Errors and How to Fix Them
Unfortunately, not every sitemap validates cleanly. Below, we’ll cover some of the most common errors that sitemap validators find in the files you submit to them. Let’s start by discussing pages with crawling “issues.”
1. Pages With Crawling Issues
Crawling issues are among the most common problems that validators will return. This error means that the service couldn’t crawl one of the pages in your sitemap.
Generally, when the validator or search engine can’t crawl a page, it means one of the following scenarios:
- The page takes too long to load. If your website takes too long to load, the connection with the crawler will time out. That means some pages might not get indexed.
- Your website uses too many redirects. When redirects aren’t set up correctly, your website can end up in a redirection loop. That means search engines won’t be able to crawl it.
- The website is blocking search engines. You can configure WordPress to keep search engines out, for example with noindex tags (which prevent indexing) or robots.txt rules (which prevent crawling), so that your website doesn’t get indexed. Typically, you might do this while building your site or creating private pages.
- The page returns an error code other than 404 or 401. Sitemap validators report 404 and 401 errors separately. However, other HTTP error codes will result in a “crawling issue” warning.
The “crawling issues” error can be ambiguous. However, you can usually determine the exact problem by visiting the URL in question. If you see a specific error code or get sent through multiple redirects, you’ve found the cause.
If the page loads quickly and correctly, your website might be blocking search engines from crawling it. If it loads without errors but feels slow, we recommend testing your website’s loading times to see if there are performance issues.
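The checks above can be sketched as a small decision function. This is only an illustration of the reasoning, not how any particular validator works. The `CrawlResult` fields, the `diagnose` name, and both thresholds are our own assumptions.

```python
from dataclasses import dataclass


@dataclass
class CrawlResult:
    status: int      # final HTTP status code
    redirects: int   # number of redirects followed to reach the page
    seconds: float   # total time taken to load the page
    noindex: bool    # True if the page carries a noindex directive


# Assumed thresholds; real crawlers and validators choose their own.
MAX_REDIRECTS = 5
MAX_SECONDS = 10.0


def diagnose(result: CrawlResult) -> str:
    """Map one crawl attempt to the most likely cause of a crawling issue."""
    if result.status in (401, 404):
        return f"reported separately as a {result.status} error"
    if result.redirects > MAX_REDIRECTS:
        return "too many redirects"
    if result.seconds > MAX_SECONDS:
        return "page too slow"
    if result.status >= 400:
        return f"HTTP error {result.status}"
    if result.noindex:
        return "blocked from indexing"
    return "no crawling issue detected"


if __name__ == "__main__":
    print(diagnose(CrawlResult(status=503, redirects=0, seconds=0.4, noindex=False)))
```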
2. 404 Errors
404 errors in a sitemap are easy to solve. If a page no longer exists, you can remove that entry from the sitemap manually or set up a redirect for it. The best option for you will depend on whether that page is still getting traffic.
Website analytics from Google Search Console and other services will reveal if a 404 page is still receiving visitors. In that scenario, your best bet is to set up a redirect to the closest relevant page or post so that you don’t miss out on that traffic. As long as you use a single redirect, it won’t result in a sitemap validation error.
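If you keep your redirects in a simple lookup table, you can check the “single redirect” condition before you resubmit the sitemap. The sketch below assumes a hypothetical `redirects` dictionary that maps each old path to its target. The `count_hops` helper is ours, not part of any plugin or validator.

```python
def count_hops(url: str, redirects: dict[str, str]) -> int:
    """Count how many redirects are followed from url.

    Raises ValueError if the redirects form a loop.
    """
    seen = {url}
    hops = 0
    while url in redirects:
        url = redirects[url]
        if url in seen:
            raise ValueError(f"redirect loop at {url}")
        seen.add(url)
        hops += 1
    return hops


if __name__ == "__main__":
    # Hypothetical redirect table: /old-post points straight at /new-post.
    table = {"/old-post": "/new-post"}
    print(count_hops("/old-post", table))
```

A result of 1 means the old URL points straight at its replacement. Anything higher is a chain worth flattening.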
3. 401 Errors
A 401 “unauthorized” error in a sitemap means that crawlers can’t access a specific page because they don’t have the necessary permissions. This error usually pops up when you’re dealing with a page that requires users to log in.
The only solution to this error is removing pages that require authorization from the sitemap. Any page that only logged-in users can see shouldn’t be indexed. Otherwise, visitors who click on it in the Search Engine Results Pages (SERPs) will find themselves facing a 401 error.
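If you maintain your sitemap by hand, removing entries can be scripted. Here is a minimal sketch, assuming a standard sitemaps.org file. The `remove_urls` function and the `SAMPLE` sitemap are hypothetical. If an SEO plugin generates your sitemap, exclude the pages in the plugin’s settings instead, since it will overwrite manual changes.

```python
import xml.etree.ElementTree as ET

NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def remove_urls(sitemap_xml: str, unwanted: set[str]) -> str:
    """Return the sitemap with every <url> entry in `unwanted` removed."""
    # Keep the sitemap namespace as the default one when writing the result.
    ET.register_namespace("", NS)
    root = ET.fromstring(sitemap_xml)
    # Copy the list first so we can remove entries while looping.
    for entry in list(root.findall(f"{{{NS}}}url")):
        loc = entry.find(f"{{{NS}}}loc")
        if loc is not None and loc.text and loc.text.strip() in unwanted:
            root.remove(entry)
    return ET.tostring(root, encoding="unicode")


# Hypothetical sitemap with one login-only page.
SAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/account/</loc></url>
</urlset>"""

if __name__ == "__main__":
    print(remove_urls(SAMPLE, {"https://example.com/account/"}))
```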
4. Too Many URLs in the Sitemap
Search engines can crawl massive websites with thousands of pages. However, the sitemap protocol limits a single sitemap file to 50,000 URLs (and 50 MB uncompressed), so validators will start displaying errors once a file goes over that number.
If that’s your situation, then kudos for the effort. 50,000 pages is a lot. However, most websites with over 50,000 pages probably have multiple URLs from user-generated content. In that scenario, you want to prioritize the most important pages on your site while removing sitemap entries that users might not want to see in the SERPs.
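If trimming the list still leaves you over the limit, the sitemap protocol also lets you split your URLs across several sitemap files and reference them from a sitemap index file. Below is a rough sketch of that approach. The `split_urls` and `build_index` names, and the file URLs you pass in, are placeholders of our own.

```python
from xml.sax.saxutils import escape

# Per-file URL limit defined by the sitemaps.org protocol.
MAX_URLS_PER_SITEMAP = 50_000


def split_urls(urls: list[str], limit: int = MAX_URLS_PER_SITEMAP) -> list[list[str]]:
    """Break a long URL list into chunks that each fit in one sitemap file."""
    return [urls[i:i + limit] for i in range(0, len(urls), limit)]


def build_index(sitemap_urls: list[str]) -> str:
    """Build a sitemap index document that points at each sitemap file."""
    entries = "\n".join(
        f"  <sitemap><loc>{escape(url)}</loc></sitemap>" for url in sitemap_urls
    )
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'
        f"{entries}\n"
        "</sitemapindex>"
    )


if __name__ == "__main__":
    # Hypothetical site with 120,001 pages.
    pages = [f"https://example.com/page-{n}/" for n in range(120_001)]
    chunks = split_urls(pages)
    files = [f"https://example.com/sitemap-{i}.xml" for i in range(1, len(chunks) + 1)]
    print(build_index(files))
```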
5. Non-Canonical URLs in the Sitemap
Sometimes, search engines might get confused when they see multiple versions of a URL for the same page. For example, you might be able to access a simple blog page using any of the following URLs:
In practice, all those URLs can lead to the same page (if you redirect HTTP traffic to HTTPS). However, search engines might see those URLs as four different entries in a sitemap, leading to validation errors.
The simple way to solve this problem is by designating a canonical URL for your WordPress website. SEO plugins such as Yoast will assign canonical URLs for your site automatically. If you’re using an XML file generated by an SEO plugin, you shouldn’t run into the “non-canonical” error when using a sitemap validator.
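If you want to check a hand-built sitemap yourself, you can normalize each URL and flag the ones that differ from their canonical form. This sketch assumes that your canonical form is HTTPS without the “www” prefix, which is only one possible choice. The `canonicalize` and `find_non_canonical` helpers and the example.com URLs are hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def canonicalize(url: str) -> str:
    """Normalize a URL to the assumed canonical form: https, no 'www.'."""
    parts = urlsplit(url)
    # hostname is already lowercased; note that it drops any port number.
    host = parts.hostname or ""
    if host.startswith("www."):
        host = host[len("www."):]
    path = parts.path or "/"
    return urlunsplit(("https", host, path, parts.query, ""))


def find_non_canonical(urls: list[str]) -> list[str]:
    """Return the URLs that differ from their canonical form."""
    return [url for url in urls if url != canonicalize(url)]


if __name__ == "__main__":
    # Four hypothetical variants that all point at the same blog page.
    variants = [
        "http://example.com/blog/",
        "https://example.com/blog/",
        "http://www.example.com/blog/",
        "https://www.example.com/blog/",
    ]
    for url in find_non_canonical(variants):
        print(f"{url} -> {canonicalize(url)}")
```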
As your website grows, using a sitemap becomes more critical. Sitemaps let you tell search engines which pages they should index and which ones to ignore. Furthermore, using a sitemap validator will help you spot errors so that crawlers don’t run into issues while indexing your website.
Just to recap, the five most common errors that you might run into with a sitemap validator are:
- Pages with crawling issues: You’ll need to check your loading times and redirects, and visit the affected page to determine the exact problem.
- 404 errors: This error means you should delete the non-existent page from your sitemap or set up a redirect for it.
- 401 errors: Consider removing restricted pages from your sitemap.
- Too many URLs in the sitemap: You may need to be selective about the pages in your sitemap and remove less useful ones.
- Non-canonical URLs in the sitemap: We recommend setting up a canonical URL for specific pages.
Do you have any questions about using a sitemap validator? Let’s talk about them in the comments section below!