A lot of confusion revolves around SEO because no one understands how the Googlebot actually works. Hello and welcome to another episode of SEO Mythbusting. With me today is Suz Hinton from Microsoft. Suz, what do you do at work and what is your experience with front-end and SEO? Yeah, so right now I'm doing less front-end these days; I focus more on IoT. But before that you were a front-end developer? Yeah, I was a front-end developer for I think 12 or 13 years, and so I got to work on lots of different contexts of front-end development, different websites, things like that. Today I wanted to just address a bunch of stuff about Googlebot specifically, and doubts about Googlebot, because that was the side of things I was the most confused about at the time. So Googlebot is basically a program that we run that does three things. The first thing is it crawls, then it indexes, and then last but not least there's another thing that is not really Googlebot anymore, and that is the ranking bit. So we have to basically grab the content from the internet, then we have to figure out what this content is about, what is the stuff that we can put out to users looking for these things. And then last but not least is: which of the many things that we've picked for the index is the best thing for this particular query at this particular time, right? Got it, yeah. But the ranking bit, the last bit where we move things around, is informed by Googlebot but it's not part of Googlebot. Is that because there's this bit in the middle, the indexing? Like, Googlebot is responsible for the indexing and making sure that content is useful for the ranking engine to kind of... Absolutely, absolutely. You can imagine that in a library, someone has to figure out what the books are about and get the index bits into the catalog, the catalog being our index really. And then someone else is using that index to make informed decisions and going like, here, this book is what you're looking for. I'm really glad you used that analogy because I worked in a library for like four years. Oh, so you know much better than I do how that works. And I was that person. People would be like, I want Italian cookbooks, and I'm like, well, it's 641.5495, and you would just give it to them. So if I would come to you as a librarian and ask a very specific question, like, what is the best book on making apple pies really quickly, would you be able to figure that out from the index of... You probably have lots of cookbooks. We did, yeah, we had a lot. But given that I also put lots of books back on the shelf, I knew which ones were popular. I have no idea if we can link this back to Googlebot. Oh, it does link back. Yeah, it's pretty much: you have the index, which probably doesn't really change that much unless you add new books to the... New editions. Exactly, yeah. So you have this index, which Googlebot provides you with, but then we have the second part, the librarian, that basically, based on how the interactions with the index work, figures out which books to recommend to someone asking for them. So that's pretty much the exact same thing there. Someone figures out what goes into the catalog, and then someone uses it. I love this. This makes total sense to me. But I guess that's still not necessarily all the answers you need, right? Yeah, I just want to know, like, what does it actually do? How often does it crawl sites? 
What does it do when it gets there? What is it sort of... how is it generally behaving? Does it behave like a web browser? That's a really good question. Generally speaking, it behaves a little bit like a browser, at least part of it does. So the very first step, the crawling bit, is pretty much a browser coming to your page, either because we found a link somewhere, or you submitted a sitemap, or there's something else that basically fed that into our systems. You can use Search Console to give us a hint and ask for re-indexing, and that triggers a crawl as well. I've done that before. Oh, very good. I haven't re-asked for it to be done yet. And that is perfectly fine. Okay. But the problem then, obviously, is how often do you crawl things, how much do you have to crawl, and how much can the server bear, right? If you're on the backend side, you know that you have a bunch of load, and that might not always be the same. If it's Black Friday, then the load is probably higher than on any other day. So what Googlebot does is try to figure out, from what we have in the index already: is this something that looks like we need to check it more often? Does it probably change? Is it like a newspaper or something? Got it, yeah. Or is it something like a retail site that has offerings that change every couple of weeks, or that doesn't change at all because it's actually the site of a museum that changes very rarely, like for the exhibitions maybe, but a few bits and pieces don't change that much. So we try to segregate our index data into something that we call daily or fresh, and that gets crawled relatively frequently, and then it becomes less and less frequent as we discover it changes less. And if it's something that is super spammy or super broken, we might not crawl it as often. Or if you specifically tell us, oh no, do not index this, do not put this in the index, this is something that I don't want to show up in the search results, then we don't come back every day and check, right? So you might want to use the re-index feature if that changes: you might have had a page where you went like, no, this shouldn't be here, and then once it has to be there again, you want to make sure that we come back and index it again. So that's the browser bit. That's the crawler part. But then a whole slew of stuff happens in between us fetching the content from your server and the index having the data that is then being served and ranked. So the first thing is we have to make sure that we discover if you have any other resources on your page. The crawling cycle is very important. So what we do is, the moment we have some HTML from you, we check if we have any links in there, or images for that matter, or video, something that we want to crawl as well. And that feeds right back into the crawling mechanism. Now, if you have a gigantic retail site, let's say, just hypothetically speaking, we can't just crawl all the pages at once, both because of our resource constraints, but also because we don't want to overwhelm your servers. So we basically try to figure out how much strain we can put on your servers and how many resources we've got available as well. That's often called the crawl budget, but it's pretty tricky to determine. So one thing that we do is we crawl a little bit and then basically ramp it up, and when we start seeing errors, we ramp it down a little bit. So like, oops, sorry for that. 
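As a concrete illustration of the "do not put this in the index" instruction mentioned above: a page can opt out of indexing with a robots meta tag in its HTML or with the equivalent X-Robots-Tag response header. Here's a minimal Node.js sketch; the route name is made up for illustration, and any server framework works the same way.

import { createServer } from "node:http";

// Illustrative sketch (not Google code): keep a page out of the index either with
// <meta name="robots" content="noindex"> in the HTML, or with the equivalent
// X-Robots-Tag response header, which also works for non-HTML files such as PDFs.
const server = createServer((req, res) => {
  if (req.url === "/internal-draft") {
    // This page may still get crawled, but it is not put into the index.
    res.writeHead(200, { "Content-Type": "text/html", "X-Robots-Tag": "noindex" });
    res.end("<p>Kept out of search results.</p>");
    return;
  }
  res.writeHead(200, { "Content-Type": "text/html" });
  res.end("<p>A normal, indexable page.</p>");
});

server.listen(3000);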
So if your server serves us 500 errors, there are also certain tools in Search Console that allow you to say, hey, can you maybe chill out a little bit? But generally, we don't try to get all of it at once and then ramp down. We try to carefully ramp up, ramp down again, ramp up again, ramp down. It fluctuates a little bit. There's a lot more detail in there than I was even expecting. I guess I never considered that a Googlebot crawling event could put strain on somebody's website. That sounds like it's a lot more common than I thought it would be. It does happen, especially if we discover, say, a page that has lots of links to subpages; then all of these go into the crawling queue. And then these have links to... Let's say you have 30 different categories of stuff, and each of these has a few thousand products and then a few thousand pages of products. So we might go like, oh cool, crawl, crawl, crawl, crawl, and then we might crawl a few hundred thousand pages if we don't spread that out a little bit. So it's a weird balance. On one hand, if you add a new product, you want that to be surfaced in search as quickly as possible. On the other hand, you don't want us to take all the bandwidth that your server offers. Cloud computing makes that a little less scary, I guess. But I remember the days... I'm not sure if you remember the days where you had to call someone and they'd ask you to send or fax a form, and then two weeks later you'd get the confirmation letter that your server had been started. Yes, I remember the days when we would have to call and then basically pay $200 to have a human go down the aisles and push the physical reset button on the server. Those were tricky times. I remember those days. And then imagine you're basically renting five servers somewhere in a data center, and that takes a week, and then we come and scoop up all your bandwidth, and you're like, great, we are fine today because Google has its crawl day. That's not what we want to have. Yeah, these days it's more of a Hacker News kind of moment when you get a hit. Exactly. So I feel like you're much more considerate than... Yeah, we try to not overwhelm anyone, and we respect robots.txt, so that works within the crawl step as well. And once we have the content, we don't put strain on your infrastructure anymore, so that's fantastic. But with modern web apps being mostly JavaScript-driven, we then put that in a queue, and once we have the resources to render it, we actually use another headless browser kind of thing. We call that the web rendering service. Then there are other crawlers as well that might not have the capacity or the need to run JavaScript, like social media bots, for instance; they come and look for metadata. If that meta tag is coming in with JavaScript, you usually have a bad time and they're just like, sorry. Yeah, so that's always been a big myth. And I remember when single-page applications, or SPAs, really came into vogue, a lot of people were really concerned. There was a lot of FUD around: well, if crawlers in general don't execute JavaScript, then they're going to see a blank page, and how do you get around that? So contextually, within Googlebot, it sounds like Googlebot executes JavaScript, even if it does so at a later point. Yes, correct. So that's good. That's good. 
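The point about meta tags injected with JavaScript is worth making concrete: crawlers that never execute JavaScript (many social media bots, for example) only see the initial HTML your server sends, so the description and social metadata should already be in that HTML. A rough sketch, with made-up page data, of what such a server-rendered shell might look like:

// Sketch: render title, description, and social metadata into the initial HTML
// on the server, instead of injecting it with client-side JavaScript, so that
// crawlers which never run JavaScript still see it. The values are invented.
interface PageMeta {
  title: string;
  description: string;
  imageUrl: string;
}

function renderShell(page: PageMeta): string {
  return `<!doctype html>
<html>
  <head>
    <title>${page.title}</title>
    <meta name="description" content="${page.description}">
    <meta property="og:title" content="${page.title}">
    <meta property="og:description" content="${page.description}">
    <meta property="og:image" content="${page.imageUrl}">
  </head>
  <body>
    <div id="app"><!-- the single-page app still boots and hydrates here --></div>
    <script src="/app.js"></script>
  </body>
</html>`;
}

console.log(renderShell({
  title: "Apple pie, really quick",
  description: "A fast apple pie recipe.",
  imageUrl: "https://example.com/pie.jpg",
}));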
But is there anything that people need to be aware of, beyond just, oh well, it'll just run it and then it'll see exactly the same thing as a human with a phone or a desktop would see? There's a bunch of things that you need to be aware of. The most important thing is, again, as you said, it's deferred. It happens at a later point. So if you want us to crawl your stuff as quickly as possible, that also means we have to wait to find the links that JavaScript injects: we crawl, then we have to wait until JavaScript is executed, then we get the rendered HTML, and then we find the links. So the nice little short loop that finds these links relatively quickly right after crawling will not work, right? We will only see the links after we render, and this rendering can take a while, because the web is surprisingly big. Yeah, just a little bit. It was like 130 trillion documents in 2016. Oh, so there's way more now. There's way more than that. So robots.txt is very effective at sort of telling bots to do a certain thing. But in this scenario, how do you tell that it's Googlebot visiting your site? That's a good question. As we are basically using a browser in two steps, one being the crawling and one being the actual rendering, in both of these moments we do give you the user agent header, and it has literally the string Googlebot in it. Oh, that's so straightforward. Yes, and you can actually use that to help with your SPA performance as well. So if you can detect on the server side, oh, this is a Googlebot user agent requesting the page, you might consider sending us a pre-rendered static HTML version. And you can do the same thing for the others; all the other search engines and social media bots have a specific string saying that they are a robot. So you can then basically go like, oh, in this case I'm not giving you the real deal, the single-page app, I'm giving you this HTML that we pre-rendered for you. It's called dynamic rendering. We have docs on that as well. The one thing that still doesn't quite make sense to me is, does the Googlebot kind of have different contexts? Does it sometimes pretend that it's... I think of it as this little mythical creature that's pretending to do certain things. So does it pretend to be on a mobile and then a desktop? Are there different, I guess, user agents, even though it still says Googlebot? Do you differentiate between them? You're asking great questions, because yes, we have different user agents. So I'm not sure if you've heard about mobile-first indexing being rolled out and happening. I've heard that it's going to affect how you're ranked, potentially. I don't know if that's a rumor or not. Ah, those are two different things that get conflated so often. So mobile-first indexing is about us discovering your content using a mobile user agent and a mobile viewport. So we are using mobile user agents, and the user agent string says so: it says something about Android in the name, and then you're like, aha, so this is the mobile Googlebot. We have documentation on that; there's literally a help center article that lists all these things. So we try to index mobile content to make sure that we have something nice to serve for people who are on mobile. But we're not pretending to be random user agents or anything. We stick to the user agent strings that we have documented as well. 
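To sketch the dynamic rendering idea just described, serving crawlers a pre-rendered static page while regular visitors get the single-page app, here is a rough Node.js example. The bot list and the prerender lookup are placeholders, not an official list or API; in practice you would plug in a real prerendering service or cache.

import { createServer } from "node:http";

// Rough sketch of dynamic rendering: if the User-Agent header contains a known
// crawler marker (for Googlebot it literally contains the string "Googlebot"),
// serve pre-rendered static HTML; otherwise serve the normal SPA shell.
const BOT_MARKERS = ["Googlebot", "bingbot", "Twitterbot", "facebookexternalhit"];

function isKnownBot(userAgent: string): boolean {
  return BOT_MARKERS.some((marker) => userAgent.includes(marker));
}

function prerenderedHtml(path: string): string {
  // Placeholder: in a real setup this would come from a prerender service or cache.
  return `<!doctype html><title>Prerendered ${path}</title><p>Static snapshot of the page content.</p>`;
}

const spaShell = '<!doctype html><div id="app"></div><script src="/app.js"></script>';

const server = createServer((req, res) => {
  const userAgent = req.headers["user-agent"] ?? "";
  res.writeHead(200, { "Content-Type": "text/html" });
  res.end(isKnownBot(userAgent) ? prerenderedHtml(req.url ?? "/") : spaShell);
});

server.listen(3000);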
And that's mobile-first indexing, where we try to get your mobile content into the index rather than the desktop content. And then there's mobile readiness, or mobile friendliness. If your page is mobile-friendly, that means everything is within the viewport, you have large enough tap targets, and all these kinds of lovely things. And that is just a quality indicator. We call these signals. We have over 200 of them. That's a lot. So Googlebot collects all these signals and then stuffs them as metadata into the index. And then when we rank, we're like, okay, so this user is on mobile, so maybe this thing that has a really good mobile friendliness signal attached to it might be a better result than the thing where they have to pinch-zoom all the way out to be able to read anything and then can't actually deal with the different links because they're too close to each other. So that's one of the many. It's not the signal; it's one of the over 200 signals to deal with. I had no idea there were 200. That's making me... I know that you're not allowed to share what they all are, because there has to be a certain mystique around it, because of a lot of SEO abuse in the past. Yeah, unfortunately, that is a game that is still being played, and people are doing weird stuff to try to game us. And the interesting thing is, with the 200 signals, it's really hard to say which one gets you moving in the ranks, or what the weight of each signal is. And they keep moving and they keep changing. So I love when people are like, no, let's do this, and then, look, my rank changed. And it's like, yeah, for this one query, but you lost on all the other queries because you did really weird and funky stuff for that one. So just build good content for the users and then you'll be fine. I feel like that's less effort as well than constantly trying to... Yeah. But it's not an easy answer, right? Say you pay me to make you more successful on search engines, and I come to you and say, so who are your users, what do they need, and how could you express that so that they know it's what they need? That's a hard one, because that means I basically bring the ball back to you, and now you have to think about stuff and figure it out strategically. Whereas if I'm like, okay, I'm just going to get you links or do some funky tricks here and then you'll be ranking number one, that's an easier answer. It's the wrong answer, but it's the easier answer. It is a little bit, yeah. And people keep saying this or that is the most important metric ever, and I'm like, no, we have over 200; it's important, but it's not that important, and chill out, everybody. But this still happens. I'm so glad it's better now. I feel actually more at peace in general with SEO as well after speaking to you today. That's so nice. That's so good. So thank you so much for being with me here; it has been a great pleasure. Yeah, thanks for answering all of my weird and wonderful questions about the Googlebot. Perfect questions, perfect opportunity. Did we bust some myths? I feel like we did. Fantastic. I think that's worth a high five. Awesome. Thanks. Thanks. Join us again for the next episode of SEO Mythbusting, where Jamie Alberico and I will discuss whether JavaScript and SEO can be friends and how to get there.