But what if we remove the pages with the URL removal tool, and then people keep linking to those specific URLs?
oceanofweb 4 weeks ago
This one is very informative. I am learning a lot.
agapitoflores001 2 months ago
This was very helpful, but I have a question. If we use noindex, you made it sound like we need to allow Google to crawl the page by not blocking it via robots.txt. Otherwise, if we blocked it, Googlebot wouldn't read the noindex tag. Is that right?
RickettsFish 5 months ago in playlist General Tips 2
@RickettsFish Good question, seems like a catch-22 situation.
As I understand it, if you use robots.txt to block crawlers, your website won't get crawled. And it *may* not get indexed at all, unless other websites link to it. In that case, Google will show it in its index. So far, that's what Matt clearly explained in this video.
lemannequin 3 months ago
@RickettsFish
Then, this is my guess: if it's the case that many other websites link to your blocked website, Google may have to do some very basic crawling of your website, just to decide (via presence of "noindex" meta) if it should be listed or not on SERPs.
But then, I'm just guessing.
lemannequin 3 months ago
@RickettsFish:
Combining crawling with indexing / serving directives
Robots meta tags and X-Robots-Tag headers are discovered when a URL is crawled. If a page is disallowed from crawling through the robots.txt file, then any information about indexing or serving directives will not be found and will therefore be ignored. If indexing or serving directives must be followed, the URLs containing those directives cannot be disallowed from crawling.
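To make that concrete, here is a minimal sketch of the check, assuming Python's standard `urllib.robotparser` is a close enough stand-in for how a crawler reads robots.txt (it is not Googlebot's actual parser). The function name `noindex_is_visible`, the two sample robots.txt bodies, and the example.com URL are all made up for illustration.

```python
from urllib.robotparser import RobotFileParser


def noindex_is_visible(robots_txt: str, url: str, user_agent: str = "Googlebot") -> bool:
    """Return True if robots.txt lets the crawler fetch the URL.

    Hypothetical helper: a noindex meta tag or X-Robots-Tag header can
    only be honored if the page itself is allowed to be crawled.
    """
    parser = RobotFileParser()
    # Parse the rules from a string instead of fetching them over the network.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Illustrative robots.txt bodies (assumptions, not taken from a real site).
blocked = """User-agent: *
Disallow: /private/
"""

allowed = """User-agent: *
Disallow:
"""

url = "https://example.com/private/page.html"

# Blocked from crawling: any noindex on the page is never read.
print(noindex_is_visible(blocked, url))

# Crawlable: the noindex directive can be discovered and followed.
print(noindex_is_visible(allowed, url))
```

So if the goal is to get a page dropped from the index, the second setup (crawlable, with noindex on the page) is the one where the directive can actually be seen.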
lemannequin 3 months ago
Also, sites blocked by robots.txt don't have a cached version.
mutchy126 11 months ago
Thanks for this tip. Is it still available today?
chiangmaihotel 1 year ago
that tip is so good for me.. thanx matt!.. :)
mkarakas0690 2 years ago
I think it's better to keep it that way, Matt, since many bloggers can 'ferry' that anchor text, as you said, especially in your NISSAN example. Many small entrepreneurs (vendors) related to the industry are able to benefit from it. Like mine, Nissan Impul.
MichaelDadona 2 years ago
Just make sure you remove the rule from robots.txt first or Google will never see the noindex meta tag on the page.
kevinargh 2 years ago
Cool, that explains a lot, including what happened with those Google local listings that appeared to have been crawled that were robots.txt'd
allison30dc 2 years ago
This was actually pretty interesting. I didn't know that the meta tag "noindex" would actually totally dump it from the index. Very cool.
danielgayle 2 years ago
Wow, I never knew that! I thought robots.txt actually blocked Google from listing the site.
gbmodern 2 years ago
Very useful information. Thanks Matt.
BKPrecisionVideos 2 years ago 3