
Generate and Submit Robots.txt to WP for Better SEO

To understand what a robots.txt file is, you first need to know how a search engine works.

A quick overview of how search engines work:

A search engine does not rank a website on its Search Engine Results Page (SERP) just because the site is live. When a new website goes online, web crawlers crawl it and report its pages, recent changes, and many other parameters back to the search engine.

During indexing, the search engine downloads the crawled web pages and stores them in its index.

When someone types a query into the search bar, the search engine ranks the relevant web pages from its index on the SERP. It’s a long process, and many algorithms work in the background.

What is robots.txt?

Your website may have many web pages.

But you may not want every page to be crawled and indexed by search engines. To stop crawlers from crawling specific pages or paths of your website, you need a robots.txt file, which controls how crawlers crawl your site. Keep in mind that robots.txt controls crawling, not indexing: a blocked URL can still show up in search results if other pages link to it.

In short, a robots.txt file is a list of rules that tells crawlers which paths or web pages they should not crawl.

It must be placed in the root directory (also known as “the main folder”) of the website. If it is in a sub-folder, crawlers will ignore it.
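To make the root-directory rule concrete, here is a small Python sketch that works out which robots.txt URL applies to any page. The function name robots_txt_url is only illustrative and example.com is a placeholder. Crawlers keep just the scheme and host of the page and always request /robots.txt there.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url: str) -> str:
    """Return the robots.txt URL that crawlers check for the host of page_url.

    Only the scheme and host (plus any port) are kept; the path is always
    /robots.txt, so a copy stored in a sub-folder is never requested.
    """
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_txt_url("https://example.com/blog/2020/my-post/?ref=home"))
# https://example.com/robots.txt
print(robots_txt_url("https://example.com/blog/robots.txt"))
# https://example.com/robots.txt  (the sub-folder copy is not the one used)
```

Because the scheme, host and port are all kept, a subdomain such as shop.example.com needs its own robots.txt file.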

Crawlers / Robots / Googlebot

Crawlers (also known as “robots”) are programs that scan websites. Well-behaved crawlers follow the instructions provided in a site’s robots.txt file.

Crawlers are used for many purposes, such as indexing, HTML code validation, link validation, and monitoring websites for new changes.

[Image: the different purposes of Googlebot]

Google uses many different web crawlers (16 at the time of writing) to crawl websites for different purposes. Googlebot is one of them.

How to find robots.txt on my site?

The robots.txt file is publicly available. You can view any website’s robots.txt file by visiting the URL sitename.com/robots.txt.

For example, here is Google’s own robots.txt file: https://www.google.com/robots.txt

If you are curious, the basic format of a robots.txt file looks like this:

User-agent: [user-agent name]
Disallow: [URL string not to be crawled]
 
User-agent: [user-agent name]
Allow: [URL string to be crawled]
 
 
Sitemap: [URL of your XML Sitemap]

 

 

You can add as many rules as you need to keep crawlers away from specific URLs, and you can also list your sitemap URLs in the robots.txt file.
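To see how a crawler reads rules like these, you can use Python’s standard urllib.robotparser module. It parses a file and answers “may this crawler fetch this URL?”. The sketch below uses a made-up file for example.com with two user-agent groups and two sitemaps. A crawler that has its own group follows only that group, so here Googlebot ignores the * group.

```python
from urllib.robotparser import RobotFileParser

# A sample robots.txt with two groups and two sitemaps (example.com is a placeholder).
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /private/

User-agent: *
Disallow: /wp-admin/
Disallow: /tmp/

Sitemap: https://example.com/sitemap_index.xml
Sitemap: https://example.com/post-sitemap.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Googlebot has its own group, so it only obeys that group's rules.
print(parser.can_fetch("Googlebot", "https://example.com/private/notes.html"))  # False
print(parser.can_fetch("Googlebot", "https://example.com/wp-admin/"))           # True
# Any other crawler falls back to the "*" group.
print(parser.can_fetch("Bingbot", "https://example.com/wp-admin/"))             # False
# Sitemap lines apply to the whole file, whichever group they follow.
print(parser.site_maps())
```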

Here is an example of a typical robots.txt file for a WordPress site:

User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php

 

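In the above example:

The first line (User-agent: *) means the rules apply to all crawlers.
The second line tells crawlers not to crawl the wp-admin folder.
The third line allows crawlers to crawl the admin-ajax.php file inside the wp-admin folder. Because this Allow rule is more specific (longer) than the Disallow rule, Google follows it for that file even though the rest of wp-admin is blocked.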
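Google and the robots.txt standard (RFC 9309) resolve a conflict like this by choosing the most specific rule: the matching rule with the longest path wins, and Allow wins a tie. Python’s standard urllib.robotparser works differently. It checks rules in file order and uses the first one that matches, so it would report admin-ajax.php as blocked here. The hand-written sketch below follows the longest-match rule and also handles the * and $ wildcards. The function names are illustrative. It looks at a single user-agent group and skips details such as percent-encoding.

```python
import re


def rule_to_regex(path: str) -> re.Pattern:
    """Turn a robots.txt path into a regex: '*' matches any characters,
    a trailing '$' means the URL path must end there."""
    anchored = path.endswith("$")
    if anchored:
        path = path[:-1]
    pattern = ".*".join(re.escape(part) for part in path.split("*"))
    return re.compile(pattern + ("$" if anchored else ""))


def is_allowed(rules: list[tuple[str, str]], url_path: str) -> bool:
    """Decide whether url_path may be crawled under one user-agent group.

    The longest matching rule wins; if an Allow and a Disallow rule are
    equally long, Allow wins. A path that matches no rule is allowed.
    """
    best_length = -1
    allowed = True
    for directive, path in rules:
        if not path:  # an empty value blocks nothing
            continue
        if rule_to_regex(path).match(url_path):  # match() anchors at the start
            is_allow = directive.lower() == "allow"
            if len(path) > best_length or (len(path) == best_length and is_allow):
                best_length = len(path)
                allowed = is_allow
    return allowed


# The rules from the WordPress example above.
WP_RULES = [("Disallow", "/wp-admin/"), ("Allow", "/wp-admin/admin-ajax.php")]
print(is_allowed(WP_RULES, "/wp-admin/options.php"))     # False
print(is_allowed(WP_RULES, "/wp-admin/admin-ajax.php"))  # True
print(is_allowed(WP_RULES, "/blog/hello-world/"))        # True
```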

Why is robots.txt important? Does my website need a robots.txt file?

Before looking at why robots.txt is important, you should know about crawl budget. What is crawl budget?

Crawl Budget

When a crawler visits your website, it does not crawl the entire site at once. Instead, it crawls in sessions.

In each session, the crawler crawls a certain number of pages, and when the session ends, it leaves.

In the next session, it comes back and resumes crawling.

Because of this, indexing a website can take a long time.

But when you disallow crawlers from crawling unnecessary pages, they can spend more time crawling your important content.

For example, you don’t need the WordPress admin pages, plugin files, or theme folders to be indexed. Disallowing crawlers from these paths can save a lot of crawl time.
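To make the crawl-budget argument concrete, here is a toy Python model. It assumes, purely for illustration, that a crawler fetches a fixed number of pages per session. Real crawlers do not work on a fixed quota, and the site URLs, plugin names and numbers below are made up. The model counts how many sessions are needed to reach every crawlable URL, with and without the WordPress admin, plugin and readme paths blocked.

```python
import math


def sessions_needed(urls, blocked_prefixes, pages_per_session):
    """Toy model: sessions needed to fetch every URL that is not blocked.

    Assumes a fixed number of pages per session, which real crawlers do not
    use; it only illustrates why blocking unneeded URLs frees up capacity.
    """
    crawlable = [u for u in urls if not u.startswith(tuple(blocked_prefixes))]
    return math.ceil(len(crawlable) / pages_per_session)


# A hypothetical small WordPress site: 5 content pages and 5 non-content URLs.
site = [
    "/", "/about/", "/blog/post-1/", "/blog/post-2/", "/contact/",
    "/wp-admin/", "/wp-admin/options.php",
    "/wp-content/plugins/plugin-a/readme.txt",
    "/wp-content/plugins/plugin-b/readme.txt",
    "/readme.html",
]
blocked = ["/wp-admin/", "/wp-content/plugins/", "/readme.html"]

print(sessions_needed(site, [], 3))       # 4 sessions for all 10 URLs
print(sessions_needed(site, blocked, 3))  # 2 sessions for the 5 content pages
```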

Now, back to the question: does my site need a robots.txt file?

If you are a beginner and your website has only a few pages, a robots.txt file will not affect your indexing much.

You should know that even if a website has no robots.txt file, it is still crawled and indexed by search engines.

As your website grows, though, there will be pages you don’t want crawled, and that is when a robots.txt file becomes useful.

 

How to create a robots.txt file

Writing robots.txt rules is simple, but creating a custom file still takes care. A small mistake can disrupt the indexing process, and some of your pages may stop ranking.
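One way to catch common mistakes before you upload the file is a small checker. The Python sketch below uses lint_robots_txt, a hypothetical helper that is not part of any plugin or Google tool. It flags a few problems: a line without a colon, an unknown or misspelled directive, a rule that appears before any User-agent line, a path that does not start with /, and Disallow: / under User-agent: *, which blocks the whole site. It is not a full validator.

```python
# Directive names this sketch recognises (compared in lowercase).
KNOWN_FIELDS = {"user-agent", "allow", "disallow", "sitemap", "crawl-delay"}


def lint_robots_txt(text: str) -> list[str]:
    """Flag a few common robots.txt mistakes. A sketch, not a full validator."""
    warnings = []
    agents = []        # user-agent names of the current group
    in_rules = False   # becomes True once the current group has a rule line
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding spaces
        if not line:
            continue
        if ":" not in line:
            warnings.append(f"line {number}: missing ':'")
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()  # directive names are not case-sensitive
        if field not in KNOWN_FIELDS:
            warnings.append(f"line {number}: unknown directive {field!r}")
        elif field == "user-agent":
            if in_rules:  # a User-agent line after rules starts a new group
                agents, in_rules = [], False
            agents.append(value)
        elif field in ("allow", "disallow"):
            in_rules = True
            if not agents:
                warnings.append(f"line {number}: {field} rule before any User-agent line")
            if value and not value.startswith(("/", "*")):
                warnings.append(f"line {number}: path {value!r} should start with '/'")
            if field == "disallow" and value == "/" and "*" in agents:
                warnings.append(f"line {number}: 'Disallow: /' blocks the whole site")
    return warnings


# A deliberately broken example file.
sample = """\
Disallow: /tmp/
User-agent: *
Dissallow: /wp-admin/
Disallow: wp-content/plugins/
Disallow: /
"""
for warning in lint_robots_txt(sample):
    print(warning)
```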

User-agent: [user-agent name]    # the crawler these rules apply to
Disallow: [URL string not to be crawled]    # a path or a specific URL

User-agent: [user-agent name]    # the crawler these rules apply to
Allow: [URL string to be crawled]    # a path or a specific URL that may be crawled

Sitemap: [URL of your XML Sitemap]

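Note: always remember that the file must be named robots.txt (all lowercase), and the paths in your rules are case-sensitive: Disallow: /Private/ does not block /private/.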
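You can see both halves of this rule with Python’s standard urllib.robotparser module. Directive names and the user-agent name are matched without regard to case, but the path /Private/ only blocks URLs that use exactly that capitalization. The file contents and example.com URLs below are made up for the demonstration.

```python
from urllib.robotparser import RobotFileParser

parser = RobotFileParser()
# Upper-case directive and user-agent names are still understood.
parser.parse("""\
user-agent: GOOGLEBOT
DISALLOW: /Private/
""".splitlines())

# Paths are case-sensitive: /Private/ does not cover /private/.
print(parser.can_fetch("Googlebot", "https://example.com/Private/page.html"))  # False
print(parser.can_fetch("Googlebot", "https://example.com/private/page.html"))  # True
```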

If you are a beginner and don’t want to hide any of your content pages from crawlers, I recommend disallowing only the WordPress admin files, the plugin files, and the WordPress readme file.
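As a sketch of that beginner setup, the Python snippet below assembles the file text from lists of paths. build_robots_txt is a hypothetical helper and example.com is a placeholder. /sitemap_index.xml is the sitemap index path Yoast SEO generates, so replace it with your own sitemap URL. If your theme or pages load CSS or JavaScript from /wp-content/plugins/, blocking that folder can stop Google from rendering your pages properly, so test the result before relying on it.

```python
def build_robots_txt(disallow, allow=(), sitemaps=(), user_agent="*"):
    """Assemble a one-group robots.txt; the parameter names are illustrative."""
    lines = [f"User-agent: {user_agent}"]
    lines += [f"Disallow: {path}" for path in disallow]
    lines += [f"Allow: {path}" for path in allow]
    if sitemaps:
        lines.append("")  # blank line before the file-wide Sitemap lines
        lines += [f"Sitemap: {url}" for url in sitemaps]
    return "\n".join(lines) + "\n"


robots = build_robots_txt(
    disallow=["/wp-admin/", "/wp-content/plugins/", "/readme.html"],
    allow=["/wp-admin/admin-ajax.php"],
    sitemaps=["https://example.com/sitemap_index.xml"],
)
print(robots)
```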

[Image: list of the different Googlebots used by Google]

Follow the robots.txt rules documented in the Google Help Center to avoid errors.

How to submit robots.txt file in WordPress?

There are several ways to add a robots.txt file to your site, but doing it from WordPress is easy and effective.

You are probably already using the Yoast SEO plugin for on-page SEO. If not, I recommend it.

The Yoast SEO plugin provides a robots.txt generator tool.

To use it, go to the Yoast SEO > Tools page in your WordPress dashboard.

[Image: how to submit robots.txt to WordPress]

Next, click the Create robots.txt file button.

You may now see the following default rules in the box:

User-agent: *
Disallow: /

 

This means that all crawlers are blocked from crawling your entire website. If you save these rules, crawlers will not crawl any page of your site.

So, delete these rules and add your own.

Then save the file by clicking the Save changes to robots.txt button.

How to test robots.txt file?

Your robots.txt file is now created, but the job doesn’t end there. You need to test the file to make sure crawlers can still reach every important page.

In the older version of Google Search Console, a tool called the robots.txt Tester was available under the Crawl menu.

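At the time of writing, it was available through this direct link: https://www.google.com/webmasters/tools/robots-testing-tool. Google has since retired this tester; newer versions of Search Console provide a robots.txt report instead.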

 

[Image: Google robots.txt testing tool]

Log in with the Google account linked to Search Console and select the property (your website).

Here, you can check any URL to see whether it is blocked by your robots.txt file.
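You can also test a draft offline with Python’s standard urllib.robotparser module before saving it. The sketch below lists which important pages a draft blocks for Googlebot. It checks the default Disallow: / rules first and then an edited file. blocked_pages and the example.com URLs are illustrative. One caveat: urllib.robotparser applies the first matching rule in file order rather than Google’s longest-match rule. If your file mixes Allow and Disallow rules on overlapping paths, confirm the result with Google’s own tools.

```python
from urllib.robotparser import RobotFileParser


def blocked_pages(robots_txt: str, urls: list[str], agent: str = "Googlebot") -> list[str]:
    """Return the URLs that robots_txt blocks for the given crawler."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if not parser.can_fetch(agent, url)]


important = [
    "https://example.com/",
    "https://example.com/blog/my-first-post/",
    "https://example.com/contact/",
]

shown_by_default = "User-agent: *\nDisallow: /\n"
edited = "User-agent: *\nDisallow: /wp-admin/\nDisallow: /readme.html\n"

print(blocked_pages(shown_by_default, important))  # all three URLs: "/" matches every path
print(blocked_pages(edited, important))            # []
```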
