Welcome, everyone, to the next episode of the Search Off the Record podcast. Our plan is to talk a bit about what's happening at Google Search, how things work behind the scenes, and, who knows, maybe have some fun along the way. My name is John Mueller. I'm a Search Advocate on the Search Relations team here at Google in Switzerland. And I'm joined here today by Martin and Gary, who are also on the Search Relations team. How are you two doing? All right. Can't complain. Could be worse. I'm going to die. We all are, eventually. But maybe not during this episode, please. Or not anytime soon. I'll do my best. You're setting the bar quite low. All right, Gary. Before it's too late, can you kick us off? Fine. It feels like I'm the only one that has a topic, anyway, all the time. How dare you? How dare you? In the previous episode, we talked about indexing, and I thought that was pretty cool. And other people also thought it was pretty cool. So I am going to continue with that topic. But first, one thing that I forgot, perhaps on purpose, to talk about was rendering. And in my brain, rendering works in a very interesting way. Basically, we have John's workstation that's rendering all the documents that we are indexing. So you have John Mueller standing in front of his computer and visiting random websites. And then his Chrome browser is outputting the DOM of those web pages that he visited. And then that's sent to indexing. That's accurate. I got that right, right, Martin? Yeah. And you forgot a detail. There's like this entire pipeline where someone gets the request and then puts a little piece of paper onto a little slide that then goes to John Mueller. And then he basically has these pieces of paper lining up over his computer. And there's this entire music thing that goes on while all this is happening, and then some steam comes out somewhere. Oh, Gary. I feel like you're being sarcastic. No. Me? Sarcastic? Never. Of course. 
Is this why I always get these weird advertisements in my browser? Yes, exactly. That's why. Because you click on all the things. All right. Since you are being a jerk to me again, do you want to explain how rendering works? Fine. Just because you are so nice and you are the chief of happiness at Google Search, I'll happily explain to you how this works. And actually, this is interesting, because I know that last time you went into a little more detail about how the crawling part of things works as well. And there's a bunch of moving parts involved here. But eventually, once the crawling is done, we have the initial HTML. The initial HTML is then passed on. It's part of the Caffeine system, really, that then uses a microservice that is called the web rendering service. And what that thing does is basically orchestrate, or herd, if you will, a bunch of Chrome instances across our cloud. These Chrome instances get the HTML that comes from crawling and then basically do what your browser does when you visit a website: it uses the HTML to construct the DOM tree, it downloads the images, it downloads the JavaScript, the CSS, all these resources, and then executes the JavaScript just as your browser would do. And to do that, we use an evergreen Chrome. So we are always updating to the latest stable version of Chrome; usually a few weeks after a new stable version of Chrome comes out, we also update the web rendering service instances that we use for rendering. And then we have a bunch of systems that make sure that if these browsers crash, they retry. If rendering fails for some reason, we can also retry. If something else goes wrong, we have error handling in place. We make sure that we get your website once it completes rendering, which sounds simple but actually isn't, because when is a website done rendering? When the HTML is there? Not really, because the JavaScript can influence the content as well. 
But when is the JavaScript done? That's also a tricky one to answer. But we're doing our best to actually give you enough time and enough resources to complete rendering. And then we have the HTML after JavaScript has run. And then that is passed on to other parts of Caffeine and used for indexing. Does that make sense? Is that something that happens for pretty much every page? Yeah. I mean, that's a lot more than I could load in my browser myself. That is true. If you ever try to run Chrome with lots of tabs, you know that this is not an easy feat, but we are pulling this off with lots of magic and tears. Back in the day, when we could still visit each other and see what people were doing, and we had those desks that were all organized in an office, I remember that John's browser always looked crazy because he always had like 10 million tabs open all the time, and his workstation was always trying to take off and become the new Hubble telescope. So yeah, I'm still uncertain if what you said, Martin, is true, or if it's John actually doing the rendering. Although, now that he's working from home, perhaps that would be much harder. So maybe you are right. I'm helping him. My laptop's also taking off because I'm also rendering some of the pages. And also, to be fair, in John's defense, you also sometimes have like a bazillion tabs open, and there's even a photo that you posted on Twitter that shows that you have lots of tabs open. So come on. Yeah. I mean, I don't want to distract from the awesome work that the rendering people are doing, because if you have tried to open a lot of web pages fairly quickly, then you really see there is a lot of work that needs to be done to make all these pages load and to be able to look at their content. I don't know. I find that really fascinating. 
On the one hand, like the crawling and the indexing side, those numbers are really impressive, but it feels like something that I can't really put into perspective. But thinking about, I don't know, opening millions of pages in Chrome, I can kind of feel that pain. Hmm. And it is not easy, because some web pages are built so, how do I put this, interestingly, that they are a true challenge to open in a browser, and we somehow still pull it off. And the team's work is not just for our benefit: if you are using the Chrome DevTools Protocol with things like Puppeteer, then that is partially work that the web rendering service team has done in order to be able to use a mainline or mainstream Chrome to actually do this. They also worked on reducing the memory footprint and the CPU footprint so that we can run this at scale in the cloud. And anyone who has ever tried to automate rendering websites using a browser at scale knows that this is not a trivial task to accomplish. So, like, hat tip to them. Yeah, that's pretty cool. Can we install browser extensions on the headless Chrome? I mean, if you run your own headless Chrome on your own server, then you can do that. Yeah, but not on the web rendering service. No, not in rendering. No. Oh, that's a pity. Why? Do you want to bring back Flash? No, I wanted to inject cats into pages that I don't like. Also, a fun fact: you can't mine Bitcoin on our infrastructure. At least, I know that people tried, and we are aware of this. So, no. Really? Yeah. That was a problem. That was one of the challenges: people are like, ooh, let's add this malicious JavaScript onto people's websites. I mean, it's a fantastic money-grab scheme, right? If you can infect lots of websites and inject malware, then why don't you inject a JavaScript-based WebGL Bitcoin miner? Interesting. So, what happens next after rendering, Gary? Are you trying to steer the discussion away from money-making schemes? Yes. Yes. Yes. Okay. 
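The retry behavior described for the web rendering service can be sketched roughly like this. To be clear, this is an illustrative stand-in, not Google's actual pipeline: the function names, the backoff scheme, and the fake renderer are all made up for the example.

```python
import time

def render_with_retry(url, render_fn, max_attempts=3, backoff_seconds=1.0):
    """Try to render a page, retrying on failure with a simple backoff.

    render_fn stands in for "hand the fetched HTML to a Chrome instance,
    wait for the JavaScript to settle, and return the final DOM".
    """
    last_error = None
    for attempt in range(max_attempts):
        try:
            return render_fn(url)
        except RuntimeError as error:
            last_error = error
            # Wait a bit longer after each failed attempt.
            time.sleep(backoff_seconds * (attempt + 1))
    raise RuntimeError(
        f"rendering {url} failed after {max_attempts} attempts"
    ) from last_error

# Example: a fake renderer that crashes once, then succeeds on the retry.
attempts = {"count": 0}

def flaky_renderer(url):
    attempts["count"] += 1
    if attempts["count"] < 2:
        raise RuntimeError("browser crashed")
    return f"<html><body>rendered {url}</body></html>"

print(render_with_retry("https://example.com/", flaky_renderer, backoff_seconds=0.01))
```

The interesting part, as the episode notes, is not the retry loop itself but deciding when a page is "done" rendering, which this sketch sidesteps entirely.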
So, there was rendering, there was conversion that we talked about, and also collapsing, basically getting rid of soft error pages. Next, we start extracting more data from the pages. And one of the things that we care about quite deeply is structured data. And there's not that much to say there. Basically, what you see in the Rich Results Test is what we see, but we do extract all kinds of structured data. It's not just what the Rich Results Test validates, but pretty much any structured data that you put on the page. Now, this doesn't mean that you have to put every single kind of structured data on your page, but if you put something relevant, that will be extracted. Why? For entities. Basically, using structured data, we can infer some information about the entities that might appear on pages, and that's why we find it useful. And in this stage, I think that's pretty much it. Basically, this service just extracts the structured data and passes it on to the services that might want to make use of it. So, for example, the search results page might need, I don't know, recipe markup for presenting pages in the recipe carousel, let's say. And yeah, in this stage, there's not much more that happens. But the next step is actually quite interesting, because that's signal extraction. And I said the bad word, signals, and now everyone on Twitter will go nuts about it. And it looks like John is also going nuts about it. But I would not overthink this just yet, because there are lots of signals that are kind of not so exciting, I would say. In this stage of indexing, many of the signals are just determining which version of the content is, for example, canonical. Like, is it scraper.com or mysite.com that is the original publisher of a piece of content? And for that, we, for example, use different kinds of hashing algorithms. We hash the content, or the centerpiece of the page. 
Basically, the centerpiece is where the meaty part of the content is. We talked about this in a previous episode. And we hash that part of the content, and then we measure the differences between two versions of the potentially same content using these hashes. Why do we hash? Because it's easier to compare shorter strings than, for example, 2,000 words. Essentially, that's it. But we do have exciting signals here as well. For example, I would say the bad word: PageRank. What? I know. So you're adding the ranking of the page as a signal? Not that kind of page rank. So PageRank is one of our main algorithms. And it's hard to believe, but after 20 years, we are still using it. It was used in the original BackRub version of Google, basically the alpha version, I guess, or beta version of Google, in 1996 or '97. Are we still using the actual PageRank, or have we replaced it with something similar? Because I thought we were not using the original one anymore. The original one is still used. Oh. But obviously, it's been improved since then. And it's more robust, cheaper to run, et cetera. Basically, it's been improved, but it is still calculated. But it's not what was displayed. Basically, if I know what my PageRank was for my page, according to the Google Toolbar back in the day, that's not what we are using today. That's never what we've been using. That was a dumbed-down version of the PageRank algorithm. Ah, OK. So one thing, if you read the white paper about PageRank, one thing that will be very obvious is that it's not a number from 0 to 10, or from nothing to 10. It's actually an integer. So it goes from 0 to 65,000-something. And then on that scale, you land somewhere with your PageRank. And then that number is used as a signal somewhere down the line, somewhere downstream. But it does take a lot of resources to compute this thing. I have a friend who's running their own version of a PageRanker. 
Basically, he built a PageRanker based on the white paper that we published, and he's using it for weird things like predictions, like NFL predictions or NBA predictions and stuff like that. And it's surprisingly accurate, but it's very resource-heavy, because it can be used for any link graph or graph-based system, right? Or graph-based structure. You don't necessarily need web pages to rank. But it is very resource-heavy, because you always have to take the whole graph into account. And that takes up lots of resources. You need to store that in memory, and then you traverse every single node, so it can get really hairy, especially if the graph is big. But that's one of the signals that we calculate here, based on the links pointing to the page itself and also the quality of those links. Many low-quality links will increase your PageRank less than a few high-quality links will. So don't go buying links from link farms. So you're saying we should buy links from high-quality link farms? Okay, don't buy links. Good boy. I guess. But this is awkward. So let's just talk more about signals. What other signals do you think we have? Page speed? No. Train crossing signs. Wait, what? Page speed is not calculated here. Ah, okay, it's not calculated here. So signals that we are calculating here, is that what you're saying? Yeah, in indexing. Keyword density. What? Why would you even say that? To make it awkward. That's what we're here for. Haven't you realized that yet? I thought that you were my sunshine and happiness, but apparently not. Okay, content quality of some sort. No. What about main topic? Maybe we could figure that one out semantically. Yeah, maybe. That's perhaps not that bad of an idea. Okay, I will help: SafeSearch. Like, you want to know at this stage whether a page contains porn or not, because we don't necessarily want to surprise our users in our search results. 
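The PageRank idea described above, as in the original white paper, can be sketched as power iteration over a link graph. The tiny graph, the damping factor, and the dangling-page handling below are illustrative choices for the example; production systems are vastly more optimized, as the episode notes.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Compute PageRank by power iteration.

    links: dict mapping each page to the list of pages it links to.
    """
    pages = list(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page gets a base share from the random-jump term.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page, outlinks in links.items():
            if not outlinks:
                # Dangling page: spread its rank evenly over all pages.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
            else:
                # Each outlink gets an equal share of this page's rank.
                share = damping * rank[page] / len(outlinks)
                for target in outlinks:
                    new_rank[target] += share
        rank = new_rank
    return rank

# Tiny illustrative graph: "b" and "c" both link to "a", so "a" should
# accumulate the most rank.
graph = {"a": ["b"], "b": ["a"], "c": ["a"]}
ranks = pagerank(graph)
print(max(ranks, key=ranks.get))  # "a" on this graph
```

This also shows why it is resource-heavy, as mentioned above: every iteration touches the entire graph, which on the web means keeping an enormous link graph available in memory.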
So when they search for something like buttercup, for example, then you don't want to serve porn. You want to serve what they actually searched for, the literal term. Then another signal that we calculate here is whether the page is local, and to which country, or which metro area in certain countries. Which language as well? Yeah, language as well. Thank you for interrupting me. Sorry. But these are all important signals for serving highly relevant results for people. Because if I'm in Switzerland and I search for, I don't know, Käse, then I don't want German results. Well, I rarely want German results. But why would you search for cheese in Switzerland and not want Swiss cheeses? What? Oh wait, German cheese. Yeah, German cheese, no German cheese. Just Swiss cheese, right? Yes, that's correct. And then, because we know that those pages are from Switzerland, don't make me laugh, because I can't talk. Because if we know that those pages are Swiss, basically, relevant to Switzerland, then we can give them a tiny boost in ranking based on the signal that we extracted during indexing. As you said, language is also important for a similar reason. And yeah, I think those are the signals that I'm willing to talk about here. So based on these things so far, essentially, if you are a developer and you build a good web page, like good HTML, good JavaScript, good whatever, then you should be able to do well in search. Yeah, if only. What? So just one quick thing. Are these signals part of those hundreds of signals that we use? That's a good point. Some of them will end up being ranking signals. Some of them will not. So, for example, SafeSearch will definitely become one of those ranking signals. Country and language, both of them will, too. But most of the hashes that we use for canonicalization, for example, which we'll reserve for a future episode, those don't end up as ranking signals. Yeah. Okay, cool. If only people would just write, like, good stable HTML and... 
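The hash-based duplicate detection mentioned above can be sketched like this. Note the hedge: real deduplication systems use locality-sensitive hashes over the centerpiece content so that small edits still match, whereas this illustrative sketch uses a plain SHA-256 after light normalization, which only catches near-exact duplicates.

```python
import hashlib

def centerpiece_hash(text):
    """Hash the main content so two versions can be compared cheaply.

    Normalizes whitespace and case so trivial formatting differences
    don't change the digest. Comparing two short digests is far cheaper
    than comparing, say, 2,000 words of text directly.
    """
    normalized = " ".join(text.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

original = "Swiss cheese comes in many  varieties."
scraped  = "swiss cheese comes in many varieties."
reworded = "There are many varieties of Swiss cheese."

print(centerpiece_hash(original) == centerpiece_hash(scraped))   # True
print(centerpiece_hash(original) == centerpiece_hash(reworded))  # False
```

The second comparison is exactly where an exact hash falls short and where similarity-preserving hashes earn their keep in canonicalization.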
Yeah, everything would be better, and without JavaScript. Whoa, what? JavaScript is okay. Yeah. No, no, no, we talked about rendering. Like, we heard JavaScript is fine. I just have to refresh the page. But to be fair, there are developers who are basically using JavaScript as the go-to tool for everything. And then they can paint themselves into corners, and then they're like, oh, what's happening? And lots of them discount SEO as something that they should be aware of or should be careful about. And that's my daily challenge, basically: to bridge the gap between these two communities. And coming from a developer angle, I'll probably focus a little more on the developer side of things. But it is a two-way challenge, in the sense that sometimes a few people in SEO circles make very broad blanket statements, and that does not exclude us, to be fair. I know that when I am on the go and tweeting, I tend to say, like, oh, no worries, JavaScript will be fine. Which is a blanket statement that I'd rather not make, but every now and then, I'm human, I make mistakes. And our general perspective on Search is that it's our job to make the world's information accessible and useful. So it's our job in Search to be able to index and showcase good content and websites on the web to people searching for the thing that the websites are about. And so we'd rather have you worry about making good websites, and then us worry about the technical bits and pieces. But I think developers do their part in this as well, and SEOs working with developers take part in this entire equation. And if you say things like JavaScript is bad or JavaScript is great, then those are very simplistic views of the issue, or of the challenges that we are facing. And I'm trying to be as specific about this as possible, but even I fail every now and then by making these blanket statements. 
But I find it tricky to talk about these things, because on one hand, if I get too technical with SEOs, they sometimes oversimplify it, and then there's this game of telephone going on where I say something, and then an SEO hears part of that and talks to the developers, who hear only part of that, and then some weird miscommunication happens that is really hard to backtrack out of. But on the other hand, I wish that developers wouldn't discount things such as SEO, and not only SEO but also user experience, usability, accessibility, design, as like, yeah, whatever, this is simple stuff, because it really isn't. There are a bunch of things that you need to keep in mind, and they are not necessarily easy to get right. I don't know if you observe this with developers as well, especially when you're at a conference and you're like, hi, I'm talking about SEO. Like, everyone's like, oh, I'll go to another session, I guess. Yeah. But what kind of things do you find developers skipping over when it comes to SEO? Oh, there are so many different things. So one thing is using JavaScript where you don't have to. If you are building a very simple, mostly static website, let's say a blog that doesn't have that much interactivity, maybe comments, okay, fair, or maybe an online shop or something like that, they oftentimes go for the shiniest, newest technology, which usually is somewhat experimental. For instance, most of the client-side JavaScript frameworks of the early days were technically quite experimental ways of making the browser do stuff that it couldn't do at the time, when they should have just opted for something existing, like a server-side technology, instead of doing everything on the client side, because the server is a more controlled environment. We are now seeing the pendulum swing back. So all the bigger frameworks have server-side solutions, or are investigating how they can do things on the server side, which I think is a good thing. 
Thinking about the architecture of their websites, thinking about how the content should be presented, how you deal with internationalization, if that's a concern for you: these things seem to be afterthoughts oftentimes. Okay, so it sounds like it's more about adding complexity where there doesn't need to be complexity, and making things more brittle through that complexity. It's not that it won't work at all. Because it sometimes feels like, on the one hand, we're telling people, oh, JavaScript will just work fine for SEO. And on the other hand, we're saying, well, you should avoid using JavaScript unless you really need to. And I can imagine developers are kind of like, just tell me what to do. Yeah, that's what I mean. And I think there are scenarios where it absolutely makes sense to have JavaScript on the client side. If you are building a 3D design app or a product customizer or a catalog system with really fancy interactions or something like that, then I understand. If you build a web application to design things in the browser, I understand that you're using JavaScript on the client side, but it's not a go-to tool. If you're making a marketing website with a landing page and, like, five different pages that show different products, then why are you using JavaScript on the client side for that? If you are a JavaScript developer, feel free to use the technologies that make sense to you, but try to make it as robust as possible. And yes, if you build a complicated and amazing and fantastic web application that needs all the interactivity, we are here to help you, and it will work in Search unless you're doing something very fundamentally wrong. But that doesn't mean that you should always use JavaScript on the client side. I think that's a very important distinction there. And the other thing is also sometimes just very basic SEO concerns. If you have a link rel=canonical on every page pointing to the home page, then why are you doing that? 
What was the thinking here? How do you deal with routing? Do you use fragments for routing? That's not a great idea, for a variety of reasons. If you have different ways of having different URLs land on the same thing that then customizes the content, you should look at that. Don't just write a robots.txt because you think it's the trivial thing to do, and then accidentally block all your APIs from being crawled, so that the content doesn't show up. That's not a great thing either. Be careful with noindex tags, all these kinds of things that developers are just like, yeah, I know how canonicalization works, and then they don't. Okay. So basically you're saying developers should take SEOs more seriously. Yeah, okay. But also, SEOs should be aware of their responsibility to not cry wolf whenever there's something that they are not fully sure about. Make it a conversation. Have these conversations. Let your developers explain why they chose JavaScript. Hold them accountable for what they are doing, and also guide them when they are making decisions. Don't show up after they made all the decisions and ask them to redo everything from scratch. Then it's too late. Yeah. I would just blame them. No. That's a fantastic idea. The blame game. Lovely. No, it's like, you have to work together. Yes. I think there's always room for both sides, but it does sometimes come across, when I talk with developers, that they're like, oh, SEO is this voodoo magic thing that you don't really need, according to Google. But there are just so many things involved with SEO that, from my point of view, are critical to making a website accessible online, on the web, and especially in Search. So it's certainly not something that you can just ignore. And I think your comparison to accessibility, usability, design, that's very apt there. It's not like, oh, I will just make everything light gray on dark gray, and then I will be a designer. That's not really how it works. 
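The robots.txt pitfall described above is easy to check for yourself. This sketch uses Python's standard-library robots.txt parser; the domain, the API path, and the rules are made up for illustration. The point: blocking an `/api/` path also blocks any JSON your client-side code fetches from it, so the rendered content never makes it into the index.

```python
from urllib.robotparser import RobotFileParser

# An illustrative robots.txt that looks harmless but blocks the API
# endpoints a client-side app depends on for its content.
robots_txt = """
User-agent: *
Disallow: /api/
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# The page itself is crawlable...
print(parser.can_fetch("Googlebot", "https://example.com/products"))      # True
# ...but the API it fetches its product data from is not.
print(parser.can_fetch("Googlebot", "https://example.com/api/products"))  # False
```

Running a check like this against the URLs your rendered pages actually request is a cheap way to catch the "accidentally blocked all my APIs" mistake before it reaches production.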
But I found a lot of these things also came up in the recent unconference that you organized, Martin. Yeah. I thought that was really cool. Oh, thank you. What was your impression of the event, as a facilitator? Because you facilitated a session. Yeah, I thought it was really cool. So these were small groups, I think around 20 people. I took part in the one for mobile-first indexing and the one for talking about SEO, of course. And in both of these sessions, there was a really nice mix of people. So there were people who were fairly fresh, who were like, oh, I just started with SEO this year. And it's like, I have no idea what you all are complaining about, or, all these inside jokes, I don't get them. So that was really cool. From my point of view, I find it a really nice sign that there are more and more new people joining the SEO side as well, because that kind of confirms that maybe SEO is not a fad that is just going away. Maybe it will stick around, which is, I think, good for the community in general and the whole ecosystem. There were also people there who are working with really large sites. There were people there who mostly work with small businesses, and getting that mix of feedback was really interesting. Especially the ones working with small businesses were like, we don't have to get everything perfect, as long as we're a little bit better than our competitors. That's good enough. And it's like, that's a perfect approach. You kind of have to focus your work on things that matter, and you don't have to be absolutely perfect when it comes to SEO. But we also tried to balance our attendees a little bit across the world. So I hope that that also came through. Yeah. No, the different people from different locations, that was really cool. Time zones, of course, are tricky. I think we had one or two people from India in our sessions, and for them, it must have been the middle of the night. But otherwise, that was really nice. 
Also, a really nice mix of genders. It was fairly balanced. That was really useful. And I think it's something that sometimes goes missing with a lot of the online events, where you have the same people speaking over and over again, and you just hear their opinions, and it's like, I work for this really large company and work with these giant clients. And that's one part of SEO. But there are so many people who are working on kind of normal websites, normal businesses. And I found that really cool. So do you think we should be doing more of these? Definitely. So this was our pilot event, and we have been receiving overwhelmingly positive feedback. And I know that I broke a lot of hearts by sending lots of people emails that they didn't make it, they didn't get in. And I know that that is not optimal, but as you say, these discussions unfolded in smaller groups, and I doubt that they would have unfolded in larger groups. There are only so many people facilitating sessions. We got the conclusions from every facilitator, and that is great. So we'll be publishing a blog post about this. Or, by the time this episode comes out, maybe there is already a blog post; check out the Google Webmasters blog. And yeah, we think, with all the positive feedback and also with the time zone challenge, we will definitely run more localized events across different time zones, so that people can jump into the one that makes sense for their time zone. And by having more of these, and maybe even doing them more regularly, like once a quarter or at least twice a year or something, there will be more chances of getting a spot, even though we will probably have to continue to limit the number of people. Because whenever we are putting out an event, whatever kind of event it is, the participant lists fill up very, very quickly. Yeah. No, I think it's really cool. I mean, it's also kind of this big gap that we have, because we don't have physical events at the moment. 
It's like, it would be really nice to do some kind of real physical event type thing. Maybe we can start doing those early next year. Probably not, I guess. We'll see. We'll have to see. You're very optimistic there. Yes. Everything will get better. Like, after Christmas, it's just one uphill stretch. Oh, an uphill stretch in infections, or? No, no, no. It'll just get better. Everything will be better. Okay. I'm sure. Okay. Is that what you asked for from Santa, or? Yeah, exactly. How does that work? We should just all send a note to the North Pole saying, make everything awesome again and let us have real conferences. We'll also probably do another virtual event, where we don't have to limit the number of participants, later this year. So at least we have that. But yeah, physical events would be amazing. That would be great. I miss it. I mean, as many people as possible can also listen to our podcast. So it's not like we're limiting things there. That's true. Or they can watch all of our videos. It's like, you can binge-watch the Webmasters channel on YouTube if you really miss hearing us, seeing us. So that's an option. There are also no limits on the seats for the office hours. That's also a thing that we're doing, if you want to join those. Cool. All right. Fun times. Sounds like lots of stuff is lined up. Cool. Okay. So maybe we can take a break here. Thank you for all of the cool insights from rendering and indexing and talking with developers. I thought that was pretty cool. To everyone listening in, thank you for listening to the podcast. I hope you found it insightful and kind of fun. I certainly found it that. And feel free to like and subscribe and follow us for more information as things move along. Send us follow links. Follow links. Yeah. Yeah. Nofollow links. I only want nofollow links. Sponsored. But what about subscribe? Subscribe links. Subscribe. What? What are you doing? What? It's not a thing. Ah, okay. Great. Maybe we should make it a thing. 
Okay. See you all later. Bye. Bye-bye.