But what if we remove the URLs with the URL removal tool, and people keep linking to those specific URLs afterwards?
oceanofweb 3 weeks ago
This one is very informative. I am learning a lot.
agapitoflores001 2 months ago
@RickettsFish:
Combining crawling with indexing / serving directives
Robots meta tags and X-Robots-Tag headers are discovered when a URL is crawled. If a page is disallowed from crawling through the robots.txt file, then any information about indexing or serving directives will not be found and will therefore be ignored. If indexing or serving directives must be followed, the URLs containing those directives cannot be disallowed from crawling.
lemannequin 3 months ago
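To make the quoted rule concrete, here is a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt content, the example.com URLs, and the helper name `directives_can_be_seen` are all made up for illustration; this only models the "blocked from crawling means the directives are never read" logic and is not how Googlebot itself is implemented.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an example site (an assumption, not from the thread).
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def directives_can_be_seen(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the crawler may fetch the URL.

    Only a fetched page can expose a robots meta tag or an
    X-Robots-Tag header, so a disallowed URL hides both.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in (
        "https://example.com/private/page.html",
        "https://example.com/public/page.html",
    ):
        if directives_can_be_seen(ROBOTS_TXT, "Googlebot", url):
            print(url, "-> crawlable, a noindex on it can be read")
        else:
            print(url, "-> disallowed, a noindex on it is never seen")
```

So if a noindex has to be honored, the URL needs to fall in the "crawlable" branch.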
@RickettsFish
Then, this is my guess: if it's the case that many other websites link to your blocked website, Google may have to do some very basic crawling of your website, just to decide (via presence of "noindex" meta) if it should be listed or not on SERPs.
But then, I'm just guessing.
lemannequin 3 months ago
@RickettsFish Good question, seems like a catch-22 situation.
As I understand it, if you use robots.txt to block crawlers, your website won't get crawled. And it *may* not get indexed at all, unless other websites link to it. In that case, Google will show it in its index. So far, that's what Matt clearly explained in this video.
lemannequin 3 months ago
This was very helpful, but I have a question. If we use noindex, you made it sound like we need to allow Google to crawl the page by not blocking it via robots.txt. Otherwise, if we blocked it, Googlebot wouldn't read the noindex tag. Is that right?
RickettsFish 5 months ago in playlist General Tips 2
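For anyone unsure what "reading the noindex tag" involves, here is a rough sketch of where a crawler would have to look once it has fetched a page: the robots meta tag in the HTML and the X-Robots-Tag response header. The class and function names are hypothetical, and the header handling is simplified (it ignores user-agent prefixes such as `googlebot: noindex`). Neither place can be inspected if the fetch never happens.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() in ("robots", "googlebot"):
            for token in attr.get("content", "").split(","):
                self.directives.add(token.strip().lower())


def has_noindex(html: str, headers: dict) -> bool:
    """Return True if the fetched page asks not to be indexed."""
    parser = RobotsMetaParser()
    parser.feed(html)
    # Header names are case-insensitive, so compare in lower case.
    header_value = ""
    for name, value in headers.items():
        if name.lower() == "x-robots-tag":
            header_value = value
    header_tokens = {t.strip().lower() for t in header_value.split(",")}
    return "noindex" in parser.directives or "noindex" in header_tokens


if __name__ == "__main__":
    page = '<html><head><meta name="robots" content="noindex, nofollow"></head></html>'
    print(has_noindex(page, {}))
    print(has_noindex("<p>hello</p>", {"X-Robots-Tag": "noindex"}))
```

Both inputs to `has_noindex` come from the HTTP response, which is why the page must not be blocked in robots.txt for the directive to take effect.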
Also, sites blocked by robots.txt don't have a cached version.
mutchy126 11 months ago
Thanks for this tip. Is it still available today?
chiangmaihotel 1 year ago
That tip is so good for me. Thanks, Matt! :)
mkarakas0690 2 years ago
I think it's better to keep it that way, Matt. Many bloggers can 'ferry' that anchor text, as you said, especially in your NISSAN example. Many small entrepreneurs (vendors) related to the industry are able to benefit from it. Like mine, Nissan Impul.
MichaelDadona 2 years ago