Sometimes, for one reason or another, there are site pages or files that you, as the website owner, do not want crawled or visited. Perhaps you have duplicate pages on your website and do not want both to appear in the search engines, or you do not want search engines to index certain areas of your website, or certain files such as images or PDFs. Whatever the reason, robots.txt files have you covered. They are composed of Disallow and Allow directives that tell robots which URLs on a website may be crawled. Robots.txt files are also extremely helpful if you want to tell search engines where your sitemap is located. However, you might be surprised to hear that this file, if not used correctly, could be the downfall of your website, as your site pages might not be listed by the search engines at all!
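
To make that concrete, here is a minimal sketch of what a robots.txt file at the root of a site might look like; the paths and the sitemap URL are placeholders, not recommendations:

    User-agent: *
    Disallow: /duplicate-page/
    Disallow: /images/
    Allow: /images/logo.png
    Sitemap: https://www.example.com/sitemap.xml

Each Disallow line blocks a path for the user agents named above it, Allow carves out an exception, and the Sitemap line points crawlers to your sitemap.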

Robots.txt Tools to Test Functionality

It is important to know which tools you can use to test that the file does what you intend. By now, I am sure you are curious to know whether such tools exist. Yes, they do, and I am glad to share them with you, along with how they work!

Google's Robots.txt Tester

If you are looking to block Google's web crawlers from specific URLs on your site, this tool will sort you out instantly. What is outstanding about it is that it verifies that a URL has actually been blocked properly, so you do not have to worry about accidentally blocking your whole website from crawling. The only limitation of the Robots.txt Tester is that it ONLY tests against Google user agents such as Googlebot.

Here’s how to use the tool:

  • Open the Tester tool for your site and scroll through the robots.txt code to locate any highlighted syntax warnings or logic errors.
  • Type the URL of the page you want to check into the text box at the bottom of the page.
  • Select the user agent you want to simulate from the dropdown list (Googlebot is the default).
  • Click the TEST button to test access.
  • The result will read either Accepted or Blocked.
  • Retest as necessary until the file behaves the way you want.
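
If you prefer to script the same kind of Accepted/Blocked check outside Search Console, Python's standard urllib.robotparser module offers a rough equivalent. This is only a sketch: the domain, paths, and user agent below are placeholder assumptions, and the module follows the basic robots.txt rules rather than every Google-specific extension such as wildcards.

    from urllib import robotparser

    # Fetch and parse the live robots.txt file (placeholder domain).
    parser = robotparser.RobotFileParser()
    parser.set_url("https://www.example.com/robots.txt")
    parser.read()

    # Check a couple of placeholder URLs the way the Tester does.
    for url in ["https://www.example.com/", "https://www.example.com/private/report.pdf"]:
        allowed = parser.can_fetch("Googlebot", url)
        print(url, "->", "Accepted" if allowed else "Blocked")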

Testing with Ryte’s software

Ryte's tool not only checks whether crawling of a given URL is allowed, but also helps you monitor your robots.txt to keep your website performing well by letting you analyze and optimize up to 100 URLs. You only need to create an account with them, and it is absolutely free.

How to Use:

  • Simply enter the URL of interest.
  • Select the user agent you want to test.
  • Click Start Test and evaluate the result.

Screaming Frog SEO Spider

If you have thousands of pages and are having difficulty identifying which ones are blocked and which are allowed to be crawled, then you definitely need this tool. Its strength is in how it detects whether a URL has been blocked by mistake, and we already know how blocking URLs by mistake can hurt site visibility in the search results. A rough do-it-yourself version of this kind of bulk check is sketched after the steps below.

How to Use:

  • Download the SEO Spider (the free lite version can crawl up to 500 URLs).
  • Open up the SEO Spider.
  • Type or paste the site you wish to crawl into the “Enter URL to Spider” box.
  • Hit “Start”.
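
This is not Screaming Frog itself, just a minimal do-it-yourself sketch of the same bulk idea using Python's urllib.robotparser: it reads one URL per line from a hypothetical urls.txt file and lists every URL that your robots.txt blocks for Googlebot. The domain and file name are assumptions you would replace with your own.

    from urllib import robotparser

    # Parse the site's robots.txt (placeholder domain).
    parser = robotparser.RobotFileParser()
    parser.set_url("https://www.example.com/robots.txt")
    parser.read()

    # urls.txt is a hypothetical file with one URL per line.
    with open("urls.txt") as handle:
        urls = [line.strip() for line in handle if line.strip()]

    # Collect everything Googlebot would be blocked from fetching.
    blocked = [url for url in urls if not parser.can_fetch("Googlebot", url)]
    print(f"{len(blocked)} of {len(urls)} URLs are blocked:")
    for url in blocked:
        print(" ", url)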

Robots.txt files are useful if you want search engines to ignore duplicate pages on your website, if you do not want search engines to index certain areas of your website (or even the whole website), or if you do not want them to index certain files. However, it is also possible to block a URL by mistake, especially if you do not understand how the files work, and that mistake can have a huge impact on the visibility of your website in the search results. These tools protect you, as the website owner, from such unintentional mistakes. Have you used these tools before? Let us know in the comments below!

Sources:

Screaming Frog SEO Spider
