Welcome everyone to the next episode of Search Off the Record, a podcast that we're trying out. Our plan is to talk a bit about what's happening at Google Search, how things work behind the scenes, and maybe have some fun along the way. My name is John Mueller. I am a search advocate on the Search Relations team here at Google in Switzerland. I'm joined here today by Martin and Gary, who are also on the Search Relations team. Good morning. What do you mean, good morning? Do you wish me a good morning, or mean that it is a good morning, whether I want it or not? Or that you feel good this morning, or that it is a good morning to be good on? Utro dobrym ne byvayet. Yes, yes. There's no such thing as a good morning. You said yes. What does that mean? What does that mean, John? You gave me a bunch of options with or in between, so the logical answer is yes, if one of those is true. That's the opposite of logic. This is the worst day of my life. So far. And it's 4 AM. So far, Gary. In Gary's time zone, that is, right? All right, Martin, do you want to take us off? Sure thing. Speaking of logic and fantastic things, do you remember when we were introducing the two waves of indexing like two years ago? Yes, yes. I was involved with a lot of that back then. We worked together, I think, with Tom Greenaway, who was also on the developer relations team at the time, to talk about rendering and indexing for the first time. And I think we introduced the two waves metaphor there. Thanks for that. That's great. I think I understand where that's coming from. And if I remember correctly, basically, I joined the team right before you did that presentation at I/O 2018. And I was convinced back then that, yeah, that's a fantastic way of explaining how things work in rendering and indexing and crawling, because it's such a complicated process with lots of things happening in parallel.
But I got to say, I kind of have to deal with a fallout from that metaphor because it invites misunderstandings. And people are basically relatively frequently asking, so how long does it take for the second wave to happen? Or how do I deal with my website being indexed before the second wave happened and these kind of questions? Which I think, given the metaphor, makes sense. But given the way that the process actually looks like, is not helpful for them or is not leading to the right results. And I really try to phase out this metaphor, but it keeps coming back at me. And yeah, the way that we are seeing it most cases, and basically that's nearly 100% of the cases, your website gets crawled, then it gets rendered, and then it gets indexed. There are certain situations where that isn't true, like when the rendering fails multiple times, or when we have other signals that we can pick up from the initial HTML and stuff. So it isn't necessarily that everything gets rendered, but pretty much practically every website gets rendered before it gets indexed. And yeah, so I wonder how long I'll have to, I don't know, swim through these two wave metaphors to get to safer land. This is gonna be fun times. So do you think we should not have introduced it like that? Or do you think we just need to be clearer in what is the current status? I think we should have introduced it like that, but explain that it's a simplification. It's like a mental model for people to look at and that it's not literally that. I think people took it quite literal, and that causes a few confusing moments, I think, where I would say like, if you look for a simple mental model, assume crawl render index, and then if you are seeing really weird things, then you might actually look into what's going on more specifically, or ask us more specific questions for your specific case, where that might not be the case. But I think the metaphor was okay, it's just people took it a little too literal. 
No, I mean, all of rendering and JavaScript is pretty complicated. So it's probably hard to find a middle ground that explains where the problems might be coming from and what they need to be doing. That is true. But I think you've been here a couple of years now and you see how when you say things once on our side, it sticks around for a really long time. So sometimes that's challenging. And things just keep changing and that's kind of fine, because in the end, most of the changes are implementation details that have not much impact, but then people kind of like latch onto them and be like, ooh, so is this a big thing? And I'm like, no, it's fine, it's okay, don't worry about it. And then also there's ranking involved. It's like, oh, my website didn't get indexed. And I'm like, actually it is indexed, it's just not ranking for anything. Yeah, fun times, it's always fun. And then people are like, ooh, is this because the website only gets out of the sandbox once the JavaScript has been processed? I'm like, no, that's not, yeah, it's tricky. Sandbox, yeah. Well, I guess since we talked about waves and sandbox, it's almost like a visit to the beach, you know, head to toe. Ooh, vacation theme. The sandbox is probably one of those topics that's similar to, I guess the two waves of indexing in the sense that people talk about it once. And sometimes when it comes to SEO, everything is so complicated that you try to look for simplifications and you cling to those simplifications for a really long time. And sandbox and honeymoon period are kind of two other simplifications that keep coming up over and over again. Is that something you've run into as well, Martin? Oh, yeah, so I recently did this tech SEO Reddit AMA where people were asking me questions. And not only did the two waves come up multiple times, I mean, it's vacation time. I think people are taking their mind to the beach, so that's that.
But also the sandbox and honeymoon periods came up and it struck me as weird because kinda one precludes the other. It's like, if my website is new and I get all this traffic, why does the traffic then drop off eventually? And then, yeah, is this the honeymoon period kind of situation? And then the next person is like, well, my website is new and I don't get any traffic or don't get any ranking in Google. Is this the sandbox? And I'm like, so what is it? Do we have a sandbox? Do we have a honeymoon period? What's going on? Yeah, I always find it interesting when someone on the team asks these kinds of questions as well because it's not that you can go into the internal Google documentation and look for the sandbox, does it exist? Because you probably won't find a lot of useful things there. But I think the general problem there is really more a practical thing and the names are a bit, I don't know, simplified, but the practical problem is really that if you have a new website, we generally don't know a lot about it if it's new because we can look at the content, we can kind of see what's written on there, but we don't really know how it's accepted within the whole web ecosystem. So from a practical point of view, our systems essentially make some guesses and we make some assumptions trying to figure out where should we position the site in the search results, how competitive is kind of the search results area in general for the queries that we think we might be showing the site for. And based on all of these different factors, we try to figure out where could we position this website until we know more about it? And that's essentially this period of time which people externally sometimes simplify into sandbox or honeymoon period where maybe we will position it in a way that is very optimistic and say, oh, wow, this looks really good. 
I guess it'll perform really well in search and we'll get lots of really good signals over time or we could look at that and say, well, it's a very competitive environment. There are other sites that have been working on this area for a really long time and they're really well accepted on the web. We might need to be a little bit more critical or I don't know, watch out a little bit more with regards to how we position the site. And that's something that is not like an algorithm that's built in that says like, we should hold the site back or we should show it even more visibly for a certain period of time. It's really just, we don't know how to show this site in search. So we have to make some guesses, we have to make some assumptions. And over time, as we figure out how we should be showing it, kind of when we have more signals to show and we use lots of signals in search, of course, then we'll be able to show it a little bit more reasonably. And that could mean that maybe we were showing it over optimistically in the beginning and as we learn more about how it's accepted, we have to kind of pull that back a little bit. It could also mean that we showed it a little bit too pessimistically in search almost and we need to show it a little bit more visibly. But these are all things that happen over time anyway. And in particular, when a site is new, that transition from not knowing a lot about the site to knowing more about the site is sometimes a bit jarring. It's a bit more visible compared to kind of the traditional changes that happen with the site over time. So that's kind of, I guess, where this sandbox and the honeymoon periods come from. Usually when people externally bring up one or the other, they focus on just that one aspect and telling them or showing them how other people are talking about exactly the other side or the exact opposite of that that makes it a little bit clearer to them that it's not just like one thing or the other. 
It's actually kind of a balance between the two things. So I don't know. I don't think it's a topic that will go away because there will always be new people who make websites on the web and they'll always run into this situation where they find someone saying like, oh, you have to write about this topics and you will rank well. And they do that. They follow those instructions and they don't rank well or they rank particularly well. They're really enthused about doing more on the web and the next website they do doesn't rank that well. So I suspect that'll continue to follow us around for a while. Maybe longer than the two waves of indexing. I don't know. I think it makes sense that these kind of things come up because it's not very easy to understand or debug this from the external point of view. I think we can look at what signals we have and then see, oh yeah, we don't know that much about the site and this is an area where we are more optimistic for new things coming in. But yeah, it's interesting that you can explain it like that and it makes perfect sense, yet people are falling back to the simplification. And I think that's also what happens with the two waves of indexing. So yeah, except that two waves of indexing is a much more niche topic, I guess. So that's helping. Also, if I remember correctly, the sandbox thing that's been around for like 20 years now. Wow, that's a long sandbox. Yeah, that's a very long sandbox. It's probably more like a beach in Brazil. Do you remember how that initially came up? Well, if I recall correctly, we used to have this batch-based indexing system, right? And then that did have this weird effect because how was it? I think we rebuilt the index every month, but only once every month. And then that meant that if you created a website at the wrong time, as in farthest away from when we built the index, then you had this long wait period. 
And during that time, you couldn't do much with that site because, well, we were not indexing things from your site. And then somehow that indexing sandbox, I guess, transformed into a ranking sandbox in people's minds as well, which was probably not that helpful. And then we just got to enjoy talking about it for like 20 years. Another thing that you probably enjoy talking about, Gary, now that we are talking about indexing already is probably cookies. Well, cookies is one thing, but another thing that I want to bring up is we recently published the second episode of SEO Mythbusting, and we talked with Alexis Sanders about crawl budget. Cookies? Actually, we talked about kimchi and crawl budget, but not cookies. I'll pick up the cookies today. I know you have another batch for me ready, which also feeds back into the batch building of the index probably, but there is still like a lot of chatter about crawl budget. I think that has been around forever as well, right? Can we just talk about cookies instead of crawl budget? No. Okay, here's the thing. You talk about crawl budget now and then we can talk about cookies. How about that as like a reward? Sounds awful. Let's do it. Okay, crawl budget. We published quite a bit about crawl budget, I think lately, more the past couple years. We've been pushing back on crawl budget historically, typically telling people that you don't have to care about it, and I stand my ground and I still say that most people don't have to care about it. We do think that there is a substantial segment of the ecosystem that has to care about it. That's why we publish more on that topic and talk a little bit more about it, but I still believe that, and I'm trying to reinforce this here, the vast majority of the people don't have to care about it. Everyone wants a number. Basically how big your site has to be when you have to care about crawl budget, but I don't think it works like that.
And I remember that when we were writing one of the help center documents, then Josh, our help center tech writer, was also asking this question. Okay, let's define this. Like how many pages do you think or how many URLs on the site you have to have to start caring about crawl budget? And we were working with the Googlebot team on the documentation and both the Googlebot team and us, the search relations or whatever we are called nowadays, were saying that, well, it's not quite like that. It's like you can do stupid stuff on your site and then Googlebot will start crawling like crazy or you can do other kinds of stupid stuff and then Googlebot will just stop crawling altogether. And eventually he did convince us to give a number, which I don't remember, but I think it's around a million, I would say, URLs on the site. And that's our baseline. So basically if you have fewer than a million URLs on your site, then you don't really have to care about crawl budget. So what is crawl budget? Is it like how much money you have to pay Google to get crawled? Oh, wow. You didn't. That was a big buzz. Thank you. You can't pay us money and get crawled. That's not how it works. It's never how it works. Crawl budget is essentially an external made up term which we tried to define as the number of URLs that Googlebot can and is supposed to crawl or instructed to crawl. The instructions come from indexing. Basically we have a system called crawl scheduling which tries to estimate which pages need to be re-crawled, for example, not re-crawled, but re-crawled. And Martin is making a face. You can't see this, but it's beautiful. He's almost face-palming, I think. I love that. And crawl scheduler also tries to estimate which sections of the site have to be discovered, essentially, where do we have to do discovery crawls? Discovery crawl means that we think that there are URLs in a particular section of a site that has undiscovered URLs or has new URLs, something like that.
So that's what crawl scheduler does. It instructs Googlebot to crawl more and also what to crawl. And then we have pure discovery crawl where Googlebot can just go crazy on the site and hop, essentially, from one URL to the other and push stuff to indexing. Then how much Googlebot can crawl? We try to be good citizens of the internet and we try not to crush servers, not crush. Crash. Crash servers. Yes, yes, that's better. And we do have enough crawl capacity to essentially crash parts of the internet, but we don't want to use that power. With great power comes great responsibility, as we learned from a very smart man. And we try to go slow as possible but still discover and crawl enough from sites. We don't want to harm sites with crawling. Nonetheless, sometimes it happens and then we have to back out. Basically, we look at signals from the site that tells Googlebot that we have to back out from the site, back off. Well, we internally call them back off signals. So for example, if the site starts sending us 429 or 50 whatever status codes or it slows down considerably, then we would back out and Googlebot starts crawling slower. And if the signals continue, then slower, slower, slower, slower, and eventually it can even stop completely because it may perceive that the site is too overwhelmed to be crawled. I don't remember this happening or seeing escalations about this, but it can happen. Yeah, so that's crawl budget. How much Googlebot can crawl and is willing to crawl or is instructed to crawl? I forgot what was the topic. Crawl budget. OK, I covered crawl budget, so now we can talk about cookies. So how can you tell if you're running into crawl budget issues? You have 1 million and five URLs on your website. And this is one more reason why we didn't choose a number or why we didn't want to choose a number because then, OK, so 1 million and five is too much or that's still fine. Well, I have no idea what would I look at. Probably URLs that were never crawled. 
That's a good indicator for how well-discovered, how well-crawled the site is. Of course, that also links to the structure of the site. Like, for example, if you have orphan pages, then that's very hard for us to crawl because we can't see them unless you tell us about them somehow. So I would look at pages that were never crawled. For this, you probably want to look at your server logs because that can give you the absolute truth. Then I would also look at the refresh rates. Like, if you see that certain parts of the site were not refreshed for a long period of time, say, months, and you did make changes to pages in that section, then you probably want to start thinking about crawl budget. So how can you actually influence crawl budget? Well, one thing is that you want to send us... Cookies. Yeah, not cookies. I tried that. It doesn't work. I have this site, spamiguy.com. And it's... Stop laughing, Martin. Or... Sorry. Are you? No. That's what I thought. So I have this site, spamiguy.com. And when I launched it, it's basically gibberish content with, I think, one link to mattcutts.com. And what I'm trying to do with that is observe how Googlebot, for example, behaves. Let's say I generate a bunch of URLs and then I see that Googlebot goes crazy with those URLs. Super excited and starts to crawl like crazy. And if I publish on another section some good stuff, something that I do think that should show up in the index, it's not gibberish, then basically Googlebot is wasting time on the gibberish side because it's auto-generated gibberish content that is linking infinitely to other pages. And it's spending time on that stuff instead of going to the good section of the site. And if you think about it, then that kind of makes sense. Like if you remove, if you chop, if you prune from your site stuff that is perhaps less useful for users in general, then Googlebot will have time to focus on higher quality pages that are actually good for users.
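The server-log check for never-crawled pages mentioned above could look roughly like the following. This is only a minimal sketch under stated assumptions: the log lines are made-up samples in the common "combined" access-log format, `KNOWN_URLS` is a hypothetical list of paths you expect to be crawled (for example, taken from your sitemap), and matching on the user-agent string is a simplification, since user agents can be spoofed.

```python
import re

# Hypothetical access-log lines in the "combined" format (illustrative data only).
SAMPLE_LOG = """\
66.249.66.1 - - [01/Jul/2020:10:00:00 +0000] "GET /products/a HTTP/1.1" 200 512 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
203.0.113.7 - - [01/Jul/2020:10:00:05 +0000] "GET /products/b HTTP/1.1" 200 498 "-" "Mozilla/5.0 (X11; Linux x86_64)"
66.249.66.1 - - [02/Jul/2020:11:30:00 +0000] "GET /blog/post-1 HTTP/1.1" 200 1024 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
"""

# Hypothetical set of paths you expect to be crawled (e.g. from your sitemap).
KNOWN_URLS = {"/products/a", "/products/b", "/blog/post-1", "/blog/orphan"}

# Captures the request path and the user-agent field of a combined-format line.
LINE_RE = re.compile(
    r'"(?:GET|HEAD) (?P<path>\S+) [^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def crawled_paths(log_text):
    """Return the set of paths requested by a user agent containing 'Googlebot'."""
    paths = set()
    for line in log_text.splitlines():
        match = LINE_RE.search(line)
        if match and "Googlebot" in match.group("agent"):
            paths.add(match.group("path"))
    return paths


def never_crawled(known_urls, log_text):
    """Return known paths that never appear in a Googlebot request, sorted."""
    return sorted(set(known_urls) - crawled_paths(log_text))


print(never_crawled(KNOWN_URLS, SAMPLE_LOG))
```

On the sample data this lists `/blog/orphan` and `/products/b`: the first never appears in the log at all, and the second was only requested by a regular browser. The same per-path grouping could be extended with timestamps to look at refresh rates for a section.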
Backout signals or backoff signals. If you send us backoff signals, then that will influence Googlebot crawl. So if your servers can handle it, then you want to make sure that you don't send us like 429 or 50x status codes and that your server responds snappy fast. Yeah, I guess that's it. But in general, I would, unless you need to, I wouldn't worry about the crawl budget. There are better things to worry about. Like if your site is still not mobile friendly, for example, then you could catch up with 2003 and have a mobile site already. Oh, 2013, sorry. So Martin, does crawl budget also play a role with JavaScript? A little bit, because if you are already crawl budget sensitive and as Gary said, that's not very many websites. Like most websites that talk about crawl budget issues turn out to not have crawl budget issues, but if you do, then depending on how you build your JavaScript, you might actually end up having more requests than if you do like a server-side rendered version or a static version of your website. So as an example, if you have a client-side rendered website, the website loads, it loads its JavaScript and then the JavaScript makes five API requests to fetch the actual content, then all of these five extra API requests do count against your crawl budget in a way and that can then easily scale. If like every page that you have on your website and you have 10 million of these pages, then there's like 10 API requests and that does add up quite quickly. But normally that should be fine. As Gary said, most websites have other lower hanging fruit than crawl budget to reap first. But yeah, if you are crawl budget sensitive, then the way that you build your JavaScript or architect your web application might have an impact. That was a very good point, the JavaScript and resources and stuff because every single URL that we crawl on the site will chip away from crawl budget.
So basically, if we have to crawl alternate versions of the site, let's say that you have 170 language variations of a page, then all those will chip away from your crawl budget. It's not like, oh, these are just a variation and we don't count it in the crawl budget. We do have to crawl them. And if we have to crawl them, then that means that it will chip away from your crawl budget. Basically, we will have less time on something else. Yeah, does that also include things like CSS files or images which usually don't change? Yes. But we do have caching in place. Oh, that's what I wanted to say. Oh, sorry. Damn it, Martin. Sorry. Now you go. Right, so let's say like you have one style sheet for all your pages and then we crawl one URL and then we make the request to that URL. That's one request in your crawl budget gone. Then we crawl the CSS file. Let's say like, I don't know, main.css or something like that. That's the second one. And then maybe there's like one image on this one page that we just crawled. That's the third request. Then we see the next page. That's the fourth request. But we would not request the CSS again because we already have it in the cache. So that would not count again against your crawl budget. And the cache is relatively aggressive. Aggressive cache. Okay, it kind of makes sense to use cache when you're talking about a crawl budget, right? Ha, ha, ha. You didn't. Oh my, ah, so funny. Okay, so sorry. No worries. It sounds like the two waves of indexing and the crawl budget are both kind of complicated topics where people try to simplify them into ways that are a little bit easier to understand. But when you really dig into them, there's a lot of "it depends" in there and it's not like this clear yes or no type answer that you can just easily give, right? There are rabbit holes.
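The request counting described above, where each fetched URL chips away from crawl budget but a cached stylesheet is not requested again, can be sketched as a small simulation. This is a toy model and not how Googlebot is implemented: the function name `count_fetches`, the example paths, and the assumption that every distinct URL is fetched exactly once and then served from cache are all illustrative.

```python
def count_fetches(pages):
    """Count fetches, assuming each distinct URL is fetched at most once (toy cache)."""
    cache = set()
    fetches = 0
    for page_url, resources in pages:
        # The page itself and each of its resources are candidate fetches.
        for url in [page_url, *resources]:
            if url not in cache:
                cache.add(url)
                fetches += 1
    return fetches


# Two pages sharing one stylesheet, as in the example from the conversation.
static_pages = [
    ("/page-1", ["/main.css", "/img/one.png"]),
    ("/page-2", ["/main.css"]),
]
print(count_fetches(static_pages))  # 4: page, CSS, image, next page; CSS is cached

# Hypothetical client-side rendered pages: one shared script plus
# five page-specific API requests each.
csr_pages = [
    (f"/item/{i}", ["/app.js"] + [f"/api/item/{i}/part{k}" for k in range(5)])
    for i in range(3)
]
print(count_fetches(csr_pages))  # 19: 3 pages * (1 + 5 API calls) + 1 shared script
```

The shared stylesheet or script costs one fetch in total, while page-specific API requests scale with the number of pages, which is why they add up quickly on very large sites.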
You can dig quite deep into each of them, but I think it makes it even more important to just at the surface level understand how your site is doing and what your site's doing and where to spend the time to look further. If your site has 10 pages, it's very unlikely that you will suffer crawl budget issues per se. If your site uses a regular JavaScript framework and you are not ranking well, then that's very unlikely to be primarily a JavaScript issue unless you see that we are not rendering your content. So you wanna very carefully find out what's causing the problem if it even is a problem because sometimes what looks like a problem really isn't a problem depending on the tools that you use. So if this website doesn't rank for the top 100 keywords that this tool thinks are important, it's like, yeah, but your website is in a completely different industry or niche, so why do you care? Yeah, I think it's interesting because there are always new people that jump into SEO and they start working on small sites sometimes in the beginning and they hear all of these complicated things and wonder what they should be watching out for, but it sounds like the two waves of indexing and crawl budget, if you're working on a smaller site you probably don't have to care about it at all and you can kind of grow into that over time. If you start working on different kinds of sites, more complicated JavaScript sites or really large websites. Yeah, cool, so. Start with the basics, always start with the basics and then build up from there. Cool, so when it comes to crawl budget, I heard you have to give Gary cookies and then you get more crawl budget. Is that right? Or like, how does that work? I haven't gotten more crawl budget but I have gotten fantastic cookies. I'm really looking forward to the next batch. Oh wait, he gives you cookies. You don't have to give him. Yeah, Gary had a ridiculous amount of oatmeal in his pantry apparently and is like, what do I do?
And I'm like, make cookies. And so I got two batches of cookies and they were delicious. And somehow I still have, I think, maybe one pound worth of oatmeal. Well, actually more, like one was two pounds. Yeah, actually one was two pounds of oatmeal. How much is that in real measurement units? It's like one kg. Ah, okay. 27.5 cups. Yeah, let's not talk about that. How many caterpillars is that? At least seven. I think that's right. Yeah, I have no idea how I ended up with so much oatmeal but it was occupying way too much space in my pantry in my cupboard. And then I was trying to figure out what to do with them and I just searched for something like what to do with lots of oatmeal. And then one of the first results was, well, make cookies, you dumb idiot. And then I actually have no idea how I ended up with because I'm making vegan cookies now. Yes. But I don't remember how I ended up with vegan cookies based on the oatmeal. I think it was like most of the recipes that I found that were using oatmeal for cookies were vegan. Wow. And also I could never in my whole life manage to make good vegan cookies. Like I can make vegan food but I could never make good vegan cookies. That's no longer true. Yeah, no longer true. But I always had problems with that. Like either the fat part of the cookie was not right or it had an acquired taste that was like, well, disgusting. Acquired taste. I don't know how to say it nicely. It was disgusting. So yeah, this is the first time I actually managed to create good vegan cookies and I'm very happy about it. That's not true. I'm just like okay about it. You're very happy about it. And it's also actually technically it's probably the second time because in the three batches that I got so far there were two different types of cookies. And the first type of cookie was great. That was the lemon cookies that you made. Oh, the lemon cookies were excellent. Those were really, really good.
I wish you would publish these recipes somewhere, Gary. I'm, yes. Spamiguy.com. No, no, no, I have a new domain name for the cooking site or not cooking site, recipe site. Cookie guy.com. Cookie guy. Quick, register that. Cookie overlay. Yeah, maybe I should have asked you. You have better ideas than I do, apparently. Yeah, I will not tell you the domain name. But yes, the cookie site is going to happen eventually. I actually didn't have time to work on it. Now I have two interns that we will maybe introduce externally as well. And they keep me very busy nowadays somehow. So cool. Which is good. What are your interns working on? They are working on the robots.txt parser. I don't want to spoil it yet, but they are doing some really interesting work on basically enabling others to build on top of the parser. Ooh. That sounds good. That sounds good. Yeah. Cool. Is this like the second wave of robots.txt processing? Okay, I'm out from here. So sorry, so sorry. Okay, maybe we should take a break here before things go even more downhill. Thank you two for joining in. It's been fun and entertaining for me at least, hopefully for those of you who are listening along as well. Stay tuned, I guess for our next episode, we'll have more of these over time. And thanks for listening in. Hope to see you next time. Well, hear you. Wait, you will hear us next time. At one point you will have to figure this out, John. I'm working on it. This is fine. Cool. Bye, everyone. Bye. Good day.