how can I make google give me an iphone4s ? :((
TaviYamato 2 months ago
Nice Googlebot! This is very relevant to all users. Great!
agapitoflores001 3 months ago
Yeah, very helpful - thank you, Lynn! :) I'll be at SES NY, and the SEO haters run rampant; just focus on the positive, folks!
mr24bd 4 months ago
Thank you so much, Ali, for sharing such precious tutorials with us.
It may have taken you a long time and a lot of searching to learn these things, but you are guiding us from your experience very nicely.
I have seen all your videos and really enjoyed them; I will wait for the next one.
mr24bd 4 months ago
If the pages are generated by the search field of the website, then they are linked to somewhat.
RackNineInc 1 year ago
Lame!
binkilinux 1 year ago
"Googlebot is very broke and doesn't have a credit card" lmao, best saying ever!
purplefreak3 1 year ago
Not so new!
CoolNiak 1 year ago
Googlebot may be broke, but Google Inc certainly isn't. :)
MrJamesBond007 1 year ago
Another case... The user can search for something in your "search box". Find an interesting result, copy the URL and post on Twitter.
ZoracKy 1 year ago
You guys are good.
Itay2221 1 year ago
All major browsers, like FF, Safari, Chrome, etc., send every URL you visit to Google. So this whole video is wrong, as he didn't mention the main thing.
eurovlad 1 year ago
And what about The Google Toolbar acting as spyware submitting reports of websites visited for indexing purposes?
sydneymonis 1 year ago 2
Matt, which text do you submit? I have an e-commerce site, and the spider has crawled the XAMPP etc. folder.
takeallfree 1 year ago
How come when you do a search and get like 300,000 results, only the first 20-30 links are relevant and the rest have no connection to the search query? Hmmm?
tubester4567 1 year ago
@tubester4567 Simply because the search results are sorted by their relevance
cosminx2003 1 year ago
I've also had the experience that crawlers started visiting my non-linked pages after I had sent an *email with the direct URL*.
katamot 1 year ago
But I have seen many search results that had been generated by searching for words, not just by a simple drop-down.
kisvarosipari 1 year ago
What about the issue raised by @jimboot, where he says that he has pages indexed that are blocked by robots.txt?
NMITYou 1 year ago
@NMITYou If you add robots.txt after the listing has occurred, it may be a while before the pages drop out of the index. And many people have errors in their robots.txt
heenan73 1 year ago
That was longer than normal.
Zackary210 1 year ago
Excuse me, Matt, but did you just say that disallowing a directory in robots.txt prevents Google from crawling the content? Normally this does not help at all if there are other external links pointing to URLs in the directory. Only a "noindex" in the header is an absolute guarantee that Google will not index.
Can you please clarify this?
SEOLEX 1 year ago
@SEOLEX - Use robots.txt to prevent Google from crawling it. Note that in general, even if a URL is disallowed by robots.txt we may still index the page if we find its URL on another site. However, Google won't index the page if it's blocked in robots.txt and there's an active removal request for the page. Alternatively, you can use a noindex meta tag. When we see this tag on a page, Google will completely drop the page from our search results, even if other pages link to it.
mikegarde 1 year ago
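To make the crawl-versus-index distinction above concrete, here is a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt content, the `/private/` path, the example.com URLs, and the helper name `is_crawl_allowed` are all assumptions made up for illustration. It only shows whether a rule forbids fetching a URL; as noted above, a disallowed URL may still end up indexed if it is found on another site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; the /private/ directory is an assumption.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def is_crawl_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt rules let user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# A URL under the disallowed directory may not be crawled...
blocked = is_crawl_allowed(
    ROBOTS_TXT, "Googlebot", "https://example.com/private/report.html"
)
# ...while a URL outside it may be.
allowed = is_crawl_allowed(ROBOTS_TXT, "Googlebot", "https://example.com/index.html")

print("private page crawlable:", blocked)
print("home page crawlable:", allowed)
```

A check like this is also a quick way to catch mistakes in a robots.txt file before relying on it.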
@mikegarde I know. But to my ears Matt actually proposed robots.txt exclusion as a solution to the problem, and I find this a wrong answer. Even if excluded by robots.txt, the URLs without any obvious links to them will/might show up in the SERP. With a "noindex" they won't.
SEOLEX 1 year ago
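For anyone who wants to check whether a page actually carries the "noindex" directive discussed here, the following is a rough sketch with Python's standard `html.parser`. The class name `RobotsMetaParser`, the helper `has_noindex`, and the sample HTML are hypothetical and only for illustration. One caveat: a crawler has to be able to fetch the page to see the tag at all, so a noindex page should not also be blocked in robots.txt.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots"> or <meta name="googlebot">."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() in ("robots", "googlebot"):
            content = attr.get("content") or ""
            # Directives are comma-separated, e.g. "noindex, nofollow".
            for directive in content.split(","):
                if directive.strip():
                    self.directives.add(directive.strip().lower())


def has_noindex(html: str) -> bool:
    """Return True if the HTML carries a robots meta tag with noindex."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives


# Hypothetical page used only to exercise the helper.
SAMPLE_PAGE = '<html><head><meta name="robots" content="noindex, nofollow"></head></html>'
print("noindex present:", has_noindex(SAMPLE_PAGE))
```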
@SEOLEX If you really don't want content seen, why put it on the web? Robots.txt is not a perfect solution, but Google does respect it - many other spiders do not. Or you could use password protection?
heenan73 1 year ago
@heenan73 I don't know. Please understand that I am not the one asking the question in Matt's video. I just wondered why Matt said that robots.txt will prevent the content from being visible in Google, since that is not how robots.txt works ;-)
SEOLEX 1 year ago
@SEOLEX robots.txt does work as stated; it cannot prevent a site being picked up if it has other links; but the point remains, if you don't want it listed, don't FTP it - and don't get incoming links. Why is everything always Google's fault? YOU manage your content; Google obeys the rules .... you have to as well. Remember the original question was "pages that don't have any links" - in that context, Matt was 100% right - you seem to be rewriting the question.
heenan73 1 year ago
@heenan73 What you say makes no sense to me. I'm not the one asking, and not the one wishing to FTP content that should not be in the index... I'm only asking why Matt states that robots.txt can prevent it when in other videos he states the opposite. I think you've got me all wrong?
SEOLEX 1 year ago
@SEOLEX Read the question: "How is Google finding pages which don't have any links to them?" in THAT situation, robots.txt can work; robots.txt cannot work *if there are other links* - but it CAN work where there are none. Which is what he said.
heenan73 1 year ago
@SEOLEX You're right; however, I believe that when Matt mentioned robots.txt he was more or less referring to URLs Googlebot discovers by filling in forms itself, not URLs found on other sites.
mikegarde 1 year ago
The sitemap could have those pages added... so Googlebot knows about them.
chempranav 1 year ago
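As an illustration of the sitemap point, here is a small sketch that builds a sitemap listing a page with no inbound links, using Python's standard `xml.etree.ElementTree`. The example.com URLs and the helper name `build_sitemap` are assumptions; the namespace is the one defined by the sitemaps.org protocol. Any URL listed this way is discoverable by a crawler that reads the sitemap, whether or not anything links to it.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return a minimal sitemap XML string with one <url><loc> entry per URL."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
    return ET.tostring(urlset, encoding="unicode")


# Hypothetical URLs; the second page is assumed to have no links pointing to it.
sitemap_xml = build_sitemap(
    [
        "https://example.com/",
        "https://example.com/orphan-page.html",
    ]
)
print(sitemap_xml)
```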
This video is indeed quieter than the average.
subrealms 1 year ago
Googlebot is broke from giving all the pagerank to spammers :{
Matt we need you to fix this!
TechieGeek1 1 year ago 2
@TechieGeek1
agreed
anonymopt 1 year ago
Thumbs up to support broke Googlebot
rtsownage 1 year ago 15
Can you record these to be a little louder please?!
calebtheredwood 1 year ago
@calebtheredwood Turn up the volume.
Simurgh 1 year ago
Don't Google Toolbar and Chrome report visited URLs back to Google for crawling as well?
drwxrxrx 1 year ago 2
Can we get links to the form submission blog post & paper? Thanks!
mattdiehl12 1 year ago
Very good question, but I really wanted to know from Matt if there are any other ways for Googlebot to discover and index new pages besides link crawling. For example, does Googlebot also use data from Google Chrome browsers to discover new web pages?
Robbertbiz 1 year ago 16
@Robbertbiz Right! Or from the Google Toolbar.
Isdaron 1 year ago 2
Great video Matt
SPQ96 1 year ago