Robots.txt Configuration in Magento 2. Nofollow and Noindex

The robots exclusion standard, also known as the robots.txt file, is important for your website or store when communicating with search engine crawlers. This standard defines how to inform bots which pages of your site should be excluded from scanning or, vice versa, opened for crawling. That's why the robots.txt file is significant for correct website indexation and overall search visibility.

By default, Magento 2 allows you to generate and configure a robots.txt file. You can either use the default indexation settings or specify custom instructions for different search engines.

To configure the robots.txt file, follow these steps:

1. Go to Content → Design → Configuration.

Magento 2 robots.txt

2. Open the Search Engine Robots section and set the Default Robots option to one of the values from the drop-down.

robots.txt Magento 2

3. Enter your custom instructions for the robots.txt file, if needed.

Magento robots

4. Hit Save Config to complete the operation.

We recommend using the following custom robots.txt for your Magento 2 store:

User-agent: *
Disallow: /*?
Disallow: /index.php/
Disallow: /catalog/product_compare/
Disallow: /catalog/category/view/
Disallow: /catalog/product/view/
Disallow: /wishlist/
Disallow: /admin/
Disallow: /catalogsearch/
Disallow: /checkout/
Disallow: /onestepcheckout/
Disallow: /customer/
Disallow: /review/product/
Disallow: /sendfriend/
Disallow: /enable-cookies/
Disallow: /LICENSE.txt
Disallow: /LICENSE.html
Disallow: /skin/
Disallow: /js/
Disallow: /directory/
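You can sanity-check the prefix-style rules above with Python's standard library. Note that `urllib.robotparser` matches plain path prefixes and does not implement Googlebot-style `*` wildcards, so rules such as `Disallow: /*?` are left out of this sketch; the example URLs are hypothetical:

```python
# Sanity check of prefix-style Disallow rules with the standard library.
# urllib.robotparser matches plain path prefixes only, so wildcard rules
# like "Disallow: /*?" are omitted here. Example URLs are hypothetical.
import urllib.robotparser

RULES = """\
User-agent: *
Disallow: /checkout/
Disallow: /customer/
Disallow: /catalogsearch/
Disallow: /wishlist/
"""

parser = urllib.robotparser.RobotFileParser()
parser.parse(RULES.splitlines())

# Checkout and account URLs fall under a disallowed prefix.
print(parser.can_fetch("*", "https://example.com/checkout/cart/"))          # False
print(parser.can_fetch("*", "https://example.com/customer/account/login/")) # False

# An ordinary category page matches no rule and stays crawlable.
print(parser.can_fetch("*", "https://example.com/women/tops.html"))         # True
```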

Let's consider each group of directives separately.

Preventing search engine robots from crawling customer account and checkout pages:
Disallow: /checkout/
Disallow: /onestepcheckout/
Disallow: /customer/
Disallow: /customer/account/
Disallow: /customer/account/login/

Blocking native catalog and search pages:
Disallow: /catalogsearch/
Disallow: /catalog/product_compare/
Disallow: /catalog/category/view/
Disallow: /catalog/product/view/

Sometimes webmasters also block pages with filter and sorting parameters:
Disallow: /*?dir*
Disallow: /*?dir=desc
Disallow: /*?dir=asc
Disallow: /*?limit=all
Disallow: /*?mode*
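Python's standard robots parser does not understand the `*` wildcard used in the rules above, so if you want to test such patterns locally, a small regex translation helps. This is a hypothetical helper for local testing, not part of any robots.txt library:

```python
import re

def wildcard_to_regex(pattern: str):
    """Translate a Googlebot-style robots.txt pattern to a regex.

    "*" matches any run of characters; a trailing "$" anchors the end.
    """
    body = re.escape(pattern).replace(r"\*", ".*")
    if body.endswith(r"\$"):
        body = body[:-2] + "$"
    return re.compile("^" + body)

rule = wildcard_to_regex("/*?dir=asc")
print(bool(rule.match("/women/tops.html?dir=asc")))  # True: sorted listing is blocked
print(bool(rule.match("/women/tops.html")))          # False: the plain page is not
```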

It is more reasonable to use the canonical tag on these pages instead.
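For reference, a canonical tag in the head of a filtered or sorted category page might look like this (the URL is a placeholder):

```html
<link rel="canonical" href="http://www.domain.com/women/tops.html" />
```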

Blocking CMS directories:
Disallow: /app/
Disallow: /bin/
Disallow: /dev/
Disallow: /lib/
Disallow: /phpserver/
Disallow: /pub/

These commands are not necessary. Search engines are smart enough to avoid including CMS files in their index.

Blocking duplicate content:
Disallow: /tag/
Disallow: /review/

Don't forget about domain and sitemap pointing (note that the Host directive is recognized only by some search engines, such as Yandex):
Host: (www.)domain.com
Sitemap: http://www.domain.com/sitemap_en.xml

Meta robots tags: NOINDEX, NOFOLLOW

After configuring the robots.txt file, you can turn your attention to the Nofollow and Noindex tags. These tags are used to distribute page weight and hide unnecessary parts of a page from crawlers.

Noindex hides a page, or part of its content, from indexation.
Nofollow is a link attribute that prohibits the transfer of page weight to an unverified source. In addition, you can use Nofollow for pages with a large number of external links.

To apply Nofollow or Noindex to your current configuration, you can either update the robots.txt file or use the meta name="robots" tag.

All possible combinations:

<meta name="robots" content="index, follow"/>
<meta name="robots" content="noindex, follow"/>
<meta name="robots" content="index, nofollow"/>
<meta name="robots" content="noindex, nofollow"/>

Magento 2 meta robots

Add the following lines to the robots.txt file in order to hide a specific page:

User-agent: *
Disallow: /myfile.html

Alternatively, you can prohibit indexation with this code:

<html>
<head>
<meta name="robots" content="noindex, follow"/>
<title>Site page title</title>
</head>
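To verify which robots directive a page actually serves, you can parse the markup with the standard library; a minimal sketch (the page string here is a stand-in for fetched HTML):

```python
from html.parser import HTMLParser

class RobotsMetaFinder(HTMLParser):
    """Collects the content of a <meta name="robots"> tag, if present."""
    def __init__(self):
        super().__init__()
        self.robots = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and attrs.get("name", "").lower() == "robots":
            self.robots = attrs.get("content")

page = ('<html><head><meta name="robots" content="noindex, follow"/>'
        '<title>Site page title</title></head></html>')
finder = RobotsMetaFinder()
finder.feed(page)
print(finder.robots)  # noindex, follow
```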

Important notice:

The Noindex and Nofollow tags have several advantages over blocking a page through robots.txt:

1. Robots.txt only prevents a page from being crawled during a scheduled website crawl. The page can still be discovered, and even indexed, through links from other websites.
2. If a page with a noindex, follow tag has inbound links, their link juice is still passed to other pages of the website through that page's internal links.

Using the instructions above, you will be able to manually configure the robots.txt file of your Magento 2 store, hide unnecessary parts of your site from crawlers, and distribute page weight.

You can simplify working with the robots.txt file by using a third-party Magento 2 SEO plugin:

SEO Suite Ultimate

The first Magento 2 SEO solution. It eliminates duplicate content issues, improves website indexation, and makes your store search engine and user friendly.

$299


Comments

Ronald Edelschaap | Web Whales

posted on Jun 2, 2017
I recommend against disallowing search engines access to the pub folder. Search engines like Google rate your website (amongst other things) on visual presentation and try to render the website.

All that stuff lives in the pub folder. Furthermore, Magento depends on JS a lot and renders content dynamically. When you block search engines from accessing the JS files, they cannot use those files to render your page and simulate a real browser.

Ads

posted on Apr 6, 2018
@Mageworx

Ronald makes a valid point, any response or update on this?

Also noticed a spelling mistake in this article: Disallow: /rewiew/
Should be "review"

Alex

posted on Apr 9, 2018
Thank you.
Disallow: /rewiew/ corrected

To Ronald:
If a server is configured correctly, all files in the pub directory can be accessed via URLs that do not contain /pub/.
If there is a server misconfiguration, a file in /pub/ can be accessed via two URLs, with and without /pub/ in the URL structure.

So, the directive
Disallow: /pub/
protects the store from content duplication issues in case of server misconfiguration.

Search engines should index only the URLs in the sitemap, not "all that stuff that lives in the pub folder".

Ads

posted on Apr 10, 2018
Hi @Alex

Thanks for the update and information, this helps for those like me not very technical with this.

I updated my robots.txt as per your suggestion but had a question about it. Will update my ticket instead and link this article to the information request.

Just a suggestion on the article:

"To configure the robots.txt file follow these steps:"

This applies to, I believe, PRE Magento 2.1, if I can recall correctly.

I think after Magento 2.1, the structure changed and to edit the robots.txt, users have to navigate to: Content --> Design --> Configuration --> Search Engine Robots

Not sure exactly which version the change happened though but I think it was 2.1.

Small change but can trip up people like me.

Thanks.

Alex

posted on Apr 11, 2018
Thank you for useful comments, guys.

We have made necessary corrections.
