Welcome, everyone, to the next episode of Search Off the Record, a podcast that we're trying out. Our plan is to talk a bit about what's happening at Google Search, how things work behind the scenes, and maybe have some fun along the way. My name is John Mueller. I'm a search advocate on the Search Relations team here at Google in Switzerland. And I'm joined here today by Martin and Gary, who are also on the Search Relations team. We've done a number of episodes now, and I think we're kind of getting the hang of it, but it's still all kind of exciting and cool and awkward sometimes. But anyway, lots of cool stuff happening, and I hope you enjoy this episode. Martin, do you want to take us off? Yeah, sure. Sure, why not? I was on vacation recently, and while I was on vacation, we announced a cool new thing that we have been working on for quite a while, which is the Virtual Webmaster Unconference. And I was really, really surprised, because one of the things that I was worried about up front was the fact that I knew it would be a very different format, and I was like, no one's going to join that, but that's not what happened. It filled up really, really quickly. That was wild. I'm not sure how the event's going to go. John, what's going to happen if the event goes wrong and doesn't go the way that we expected it to go? Will this be OK? It'll be OK, Martin. Don't worry. I think, I mean, by the time this episode is live, we'll know how it went. But in general, it's good to try things out. And I think, especially when it comes to online events, people have tried various things already, and some people are kind of tired of the traditional things. So doing something different and taking a risk, trying something new out, I think that is fantastic.
I really, really hope that it goes well, because I kind of miss, as you say, like there's so many virtual events out there, and most of them are just like some speakers talking to you and you were watching them, and I'm like, yeah, this could be a YouTube video. This doesn't have to be like a live event. So I'm hoping that we get some nice outcomes from the discussions and panels and feedback sessions. Can we fire him if the event doesn't go well? No, no, no. Oh, bummer. Sorry, Gary, you don't get rid of me that easily. Also, if it goes well, I think we will definitely turn it into like a series. We can definitely run these multiple times and with different things. People have asked me like, oh, can you record it? Can we like listen in? And I'm like, no, I don't think that makes sense, but maybe we can try that with later events, just not in this pilot. Yeah, and I think one of the things we also notice is there's just such a high demand for any kind of event from our side. And I think the big takeaway we picked up there also is that we kind of need to do something more traditional along the way too. Definitely. Yeah, so I don't know exactly how we'll do that. I think timing-wise, we'll aim more towards like end of the year so that we have a bit of time to get everything organized, to make sure that we have everything set up properly for the speakers, that we have sessions that are interesting and setups that are a little bit, I hope, interactive and where we can get some feedback from people beforehand and maybe along the way as we go in there. Like what kind of sessions do you think we should do there, Martin? Do you have any ideas? I would love to have some sort of maybe Q&A panel. I think these are very, very useful. I think your site clinics are generally also super useful. I'm guessing that we have probably new things to communicate, so the classical front-centric sessions where we have a talk is probably also a good idea. 
What I'm wondering about is, so on one hand, I want to make sure that we are not limiting capacity because people don't like that. And I think the beauty of virtual events is that we don't have to limit capacity, but then we have to figure out how we can mix the interactive parts with the less interactive parts and with the non-interactive things, like the regular talks. I prefer them prerecorded, to be honest. I don't know, how do you feel about prerecording and screening with the live chat, like we do with the Webmaster Lightning talks, or do you prefer live presentations as in live, live? I kind of like both. I think for information-rich talks, doing something prerecorded makes it a little bit easier because then you have a chance to really fine-tune the message and to bring it across in a way that is easier to understand, where you can match your slides, maybe if you're doing slides as well. But the live-live talks, I think what's interesting there is just, it's a whole different energy because you have all of this kind of pressure and adrenaline from doing something live, which means it's not perfect, but it also, I think, brings across a little bit more the kind of human touch when it comes to content. So especially for Q&A, for maybe a small panel or something like that, doing something live, I think is pretty cool. Doing some of that in a chat format is also an option. I kind of like the video for live as well, but sometimes chat works just as well. And especially considering that not everyone speaks English as a first language, then sometimes having something written makes it a little bit easier to understand what is actually happening. Definitely. And you can also get transcriptions for the videos and stuff, so that makes prerecording a good option. And I think if it's recorded, it makes it a little bit easier for the different time zones, but anything interactive, we still have to figure out how we deal with the time zones there.
So if people from Asia are interested in joining in or want to listen in live, does that mean they have to get up in the middle of the night or could we do something that's spread out a little bit more? I don't know. Yeah, that's a good question. Maybe we could do like a two-day event, one in one time zone and then the other in the other time zone. But it's like similar content. I don't know. We'll have to figure this one out. This is gonna be interesting. Or we could just do it as a podcast. Then we don't have to worry about the video format at all. I approve of this idea. Of course, yeah. Because Gary, we want you on video, but you really, really don't want to get on video, huh? That is correct, sir. But your class on how search works was so good. I would love this content to be more widely available. And I know that you are okay with being on stage in person, just not on video, right? So we were actually talking about this with our producer, Anna, but I really want to do it properly. Not from a home recording, but in a studio. And we have, well, Anna has ideas about how to make it happen. But you will be on camera? But when we have a studio and I would be on camera, basically if it's worth it, then I can deal with being on camera, but like for events, for example, I just feel so awkward about it that I just don't want to do it. If someone tricks me into being on camera, like for example, there was this, air quotes, podcast the other week where I was invited to talk on a podcast, again, air quotes. And then when I joined the meeting, then they started recording the video feed. And I'm like, hang on, what's happening? And what do you mean, what's happening? It's like, I thought this is a podcast. It's like, well, it's actually a webcast. What does that mean? It's like, you are recording me? Yeah, could you also change your t-shirt because it's like you're in a home t-shirt? But don't you think you should have told me this before? Oh, well.
So yeah, I will not cancel on the spot because I don't know, I feel that it's unprofessional. But yeah, that was kind of awkward and not something that I would prefer to repeat. But if we have a studio and we have our producer and her ideas, then I'm very happy to do it for something like Life of a Query or How Search Works or whatever it's called externally. That is fantastic news. So, Gary, like since we're in a podcast format now, would you be interested in going through some of those details around like How Search Works, how indexing works here? What kind of question is that? Tell us a little more about indexing, Gary. Come on, let's do it. Man, I wish I could, but I really don't want to. You should do it. Tough luck, Buttercup. You have to. Let's do it. Well, it's kind of, well, fine. Okay. Let's talk about Caffeine. Actually, that's a good topic because there was some confusion about that on Twitter as well. So maybe I can shed a light or two on that topic. Actually, we could do a breakdown of Caffeine on these podcasts. Okay. Yeah, we should. Okay, let's do that. So yeah, we have Caffeine. That's our indexing system. Only externally it's called Caffeine. Internally it has some other name, but that doesn't really matter. And it does many, many things. And I think that's actually not very clear externally, that it does many things. For people it's just like, we have the crawler, which is Googlebot, and then that goes to something, something Google magic. Well, people know that it gets rendered, and then something, something Google magic, and then we have an index. Now, we can actually break down that Google magic, and people in general know that Google magic, or could figure it out if they wanted to, but that Google magic is essentially what Caffeine is doing. Basically ingesting, picking up whatever is produced by Googlebot, which is a protocol buffer. You can look it up on your favorite search engine, what a protocol buffer is.
And then that protocol buffer is picked up by Caffeine, and then we collect signals, blah, blah, blah. And then we add the information that Caffeine produced into our index. Now, what's happening inside Caffeine? Well, the very first step is that protocol buffer ingestion. Basically, it picks up the protocol buffer and starts processing it. The very first step after ingestion is conversion. What does that mean? Well, it's conversion. Basically, it converts, right? What to what? Do you have a problem with the word or? No, I'm just like, okay, we have a protocol buffer, which has all the information that it needs. What does it convert there? Does it convert the protocol buffer into a different format or? Well, that too, but first, we have to, for example, normalize the HTML. Because as you may have heard or noticed, the internet is generally broken, HTML-wise, but we still try to make sense of it. Now, if you have really broken HTML, then that's kind of hard. So we push all the HTML through an HTML lexer. Again, search for the name. You can figure out what that is. But basically, we normalize the HTML. And then it's much easier to process it. And then there comes the hot stepper, H1, H2, H3, H4, I know. All these header tags are also normalized. Through rendering, we try to understand the styling that was applied on the H tags. So we can determine the relative importance of the H tags compared to each other. Let's see what else we do there. Do we also convert things like PDFs or? Oh, yeah, yeah, yeah. So Google Search can index many formats, not just text/html. We can index PDFs. We can index spreadsheets. We can index Word document files. We can index, what else, Lotus files for some reason. Seriously? Yeah. Yeah, I don't know. Wow. Yeah. Can we index floppy disks? What's your obsession with floppy disks? I love them. They are such a beautiful medium. I wish they would come back. No, they are not. They are actually the opposite of beautiful, whatever it is. No.
It's like they get corrupted just by, like, literally you blow some air on them and they are corrupted. But they are iconic because they are the save icon. Oh, my God. Oh, not sorry about this. Okay, sorry. Yeah, so we convert these. We put out the HTML tokens. We normalize the HTML tree. Wait, wait, wait. So going back to PDF. So PDF is a binary format. It's not that easy to process. So for that, as far as I remember, we license a decoder from Adobe that we use to basically convert the PDF to HTML. And then from there on, we are just working with HTML. And this happens with all the other binary formats that we can index in web search. Of course, those are also normalized. So the HTML eventually will be very well formed. We then start looking at meta tags because there are a few meta tags that we deeply care about. For example, the meta name robots. Keywords. No, not the keywords. We don't care about the meta keywords at all. Like, at all. I'm sorry. I'm not sorry. I changed my mind. But the meta name robots, that's something that we deeply care about. And for example, if we find it with a value noindex, then we know that we have to stop processing that document or at least not put it in the index. What else do we do here? Something that confuses some people is that if there are HTML body related elements inside the head tag. Like iframe. That belongs in the body. Like iframe or div or p or span or whatever. Then the HTML lexer will close the head right before those tags and start the body from there on. And you can also look at this, like how it works in practice, on the W3C HTML validator, which will do something similar. But that's an important detail if you're injecting stuff in the head and you want stuff to be picked up from there. What else do we do here? I think that's pretty much it. One system that is kind of related to the converter is something that we call the Collapser, which is essentially the system that's doing error page handling.
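The head-closing behavior Gary describes can be illustrated with a small sketch. This is not Google's lexer; it is a rough approximation built on Python's standard `html.parser`, and the class name `HeadScanner`, its attributes, and the simplified `HEAD_ELEMENTS` set are all assumptions made for illustration. It shows why a robots meta tag placed after a body-level element in the head may no longer count as being in the head.

```python
from html.parser import HTMLParser

# Simplified set of elements that may appear inside <head> (assumption:
# close to the HTML standard's metadata content, not an exhaustive list).
HEAD_ELEMENTS = {"base", "link", "meta", "noscript", "script",
                 "style", "template", "title"}


class HeadScanner(HTMLParser):
    """Hypothetical helper: tracks where <head> effectively ends."""

    def __init__(self):
        super().__init__()
        self.in_head = False
        self.head_closed_by = None      # tag that implicitly ended the head
        self.robots_in_head = []        # robots directives seen inside head
        self.robots_outside_head = []   # robots directives seen after head ended

    def handle_starttag(self, tag, attrs):
        if tag == "head":
            self.in_head = True
            return
        if self.in_head and tag not in HEAD_ELEMENTS:
            # A body-level element (iframe, div, p, span, ...) ends the head.
            self.in_head = False
            self.head_closed_by = tag
        if tag == "meta":
            attr_map = dict(attrs)
            if (attr_map.get("name") or "").lower() == "robots":
                content = (attr_map.get("content") or "").lower()
                if self.in_head:
                    self.robots_in_head.append(content)
                else:
                    self.robots_outside_head.append(content)

    def handle_endtag(self, tag):
        if tag == "head":
            self.in_head = False


page = (
    "<html><head><title>Demo</title>"
    "<div>injected widget</div>"
    '<meta name="robots" content="noindex">'
    "</head><body><p>Hello</p></body></html>"
)
scanner = HeadScanner()
scanner.feed(page)
print(scanner.head_closed_by, scanner.robots_in_head, scanner.robots_outside_head)
```

In this sketch the injected `div` ends the head early, so the `noindex` directive that follows it is recorded as sitting outside the head.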
As we said previously, the internet is generally broken HTML-wise, also HTTP-wise for some web servers, but we still try to make sense of the internet. And one thing that is broken on the internet sometimes is error pages. For example, there might be a not-found page, a 404 page, that comes back with an HTTP 200 status code. It's actually quite common. And that's what we call soft 404s. Every single-page app in the world has probably at some point run into soft 404s. Oh, really? Oh, wait, you're saying we have to do this for JavaScript sites? Not just for JavaScript, okay? Other people screw this up in different ways, too. Are you saying that JavaScript sucks? No, I'm saying people don't know how to use it without breaking the internet sometimes. So JavaScript sucks according to Martin Splitt? No. How dare you, Gary? So you're saying JavaScript is an error page? Okay, continue. Why am I even working with you people? I just... So going back to error pages, which are generated by JavaScript... Gary! Okay, okay, fine. Not every error page is generated by JavaScript, but some are. Thank you. Okay, now that we have Martin's blessing: error pages, soft error pages, we don't want those in our index. So we try to detect when they happen, when they show up in our processing pipelines, and that's what this error page handling thing does. Basically, we have a very large corpus... Well, actually corpora of error pages, and then we try to match text against those. This can also lead to very funny bugs, I would say, where, for example, you are writing an article about error pages in general, and you can't for the life of you get it indexed. And that's sometimes because our error page handling systems mis-detect your article, based on the keywords that you use, as a soft error page. And basically, it prompts Caffeine to stop processing those pages. And of course, error page handling also works on other kinds of error pages, not just 404s.
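As a rough illustration of matching page text against known error wording, here is a minimal sketch. The real system is only described here as matching against large corpora of error pages; the phrase list, the function name `looks_like_soft_error`, and the threshold are invented for this example.

```python
# Hypothetical stand-in for a corpus of error-page wording.
ERROR_PHRASES = (
    "page not found",
    "404 not found",
    "this page does not exist",
    "service unavailable",
    "server is overloaded",
)


def looks_like_soft_error(status_code, text, phrases=ERROR_PHRASES, threshold=1):
    """Return True if a 200 response reads like an error page.

    Only responses with status 200 are considered, since a real 404 or 503
    is already a proper error and not a "soft" one.
    """
    if status_code != 200:
        return False
    lowered = text.lower()
    hits = sum(1 for phrase in phrases if phrase in lowered)
    return hits >= threshold


# A 200 response with error wording is flagged as a soft error.
print(looks_like_soft_error(200, "Sorry, page not found."))
# A genuine article about error pages trips the same check, which mirrors
# the mis-detection bug mentioned above.
print(looks_like_soft_error(200, "How to design a helpful 'page not found' screen"))
```

A purely keyword-based check like this cannot tell an error page from an article about error pages, which is the kind of false positive described in the conversation.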
Like, for example, if the server sends an overloaded message as an HTML page, but with a 200 status code, then we might be able to understand that. We have redirects that are not so obvious, and we can detect those as well. What else? We also try to detect login pages here. I'm not sure why that is useful, but we know about login pages. So with error pages, would we also count something like an e-commerce site that has a product out of stock? Would that be seen as a soft 404 page, or would we try to index that page anyway? I think generally that would depend on the words that you are using on the page. So should people try to hide that then, or is it? Well, I think this is a philosophical question. And I'm sure that there is someone on our team who's trying to figure this out. But in my opinion, you don't want out of stock pages in the index. So you would probably want to get rid of them. Again, I think it's a philosophical question because you can definitely provide some functionality on the page so people can, I don't know, subscribe for updates about that product or something like that. And then it might become more useful, maybe, to have those pages in the index. But if you just have an out of stock page, why would you want to do that? Okay. Questionable value there. Except, I have an idea. Except if you are trolling the users. Basically, you are advertising that you have this product, but when they end up on your site, then it's like, just kidding, out of stock. That doesn't seem very nice. Yeah. Okay. So I guess the answer is, it depends. It depends. Okay. It's one of your favorite words, I know. Or phrases. Yeah. It's the catch-all phrase, I think. Yeah. I think that's really tricky. Like the out of stock situation is something that people, when I talk about e-commerce, it always comes up. Like, how should we handle these? And it feels like on the one hand, you still want to be findable for some of these, especially if it's like a vintage item.
Like there is only one in the world and you no longer have it. But you have all of the documentation and all of that. Right. And you kind of want that to be findable. But if it's like something that everyone else has and you just don't have in stock at the moment, then for the user, it's more useful if they just go to some shop that does have it. Yeah. I think the "it depends" phrase is very applicable here and on many things in SEO. I know that people don't like it, but it's very useful and we should use it more. It's true. It's true, yes. It is just, most of the things depend on the context. Yeah. Oh my gosh. We should use it more. Okay. Fine. So one of the places where I kind of ran into some of these struggles with "it depends" myself is when I had a friend who got in touch with me because they heard I was at Google and doing some stuff on search. And of course they have a website and like could I just help them to make it number one kind of question, which of course I can't. I can't go there and give you that secret keyword meta tag that will make you rank number one because Gary just said my secret doesn't work, the keyword meta tag. Wait, the keyword meta tag is your secret? Your secret sucks. I mean, everyone knows about it. Oh man. I thought we just tell people not to use it because it's one of those things that makes the magic happen and we just don't want them to know about the magic. No, that's the PageRank meta tag. Oops. Oops. You mean meta rank, name rank, content cheese? I don't know. Well, anyway, so on the one hand I was putting together all of the general SEO information that we have that we can give people and tell them what to do, like our SEO starter guide, also some of the other third-party SEO starter guides that are out there. There's some, I think, really useful stuff, but there's like a ton of information out there and if you don't know anything about SEO it's really hard to get started.
So that's something where they also came to me like so how do I find someone to do SEO for my website? And we have a video on that. We have I think some help center pages on that. So that was kind of useful to point them there but it's still one of those situations where it's like well, I don't really feel comfortable just recommending someone that I know even if I know there's some really good people out there and it was more that I was trying to find ways to give them advice on what they should be looking for when it comes to an SEO and in the past when people asked me about this usually I told them to find someone local that you can work together with in person and kind of discuss the problems and questions that you have about your website. Nowadays like you don't really do a lot of business in person anymore because of all the coronavirus stuff but it's still something where being at least in the same time zone and being able to do some kind of a phone call video conference makes it a lot easier to discuss and it makes it a lot easier to kind of figure out what kind of needs you really have and what kind of things the SEO side could be helping with. That was really interesting to kind of point them at that. One of the things that I ended up doing more for my own sake was to crawl their website. So that's something that before I joined Google I wrote a site map generator that basically went off and crawled websites. Of course it did. Which was really fun and it was one of those things that I found extremely insightful because then you notice how broken the web is. Kind of like how you mentioned Gary. On the one hand, the content on the web is really broken. On the other hand, all of the URLs and crawling on the web is just so broken and it's sometimes a surprise that Google get through it all. 
But I picked up some third party website crawler tool and I played around with it and it's really cool to see how you can just crawl your website and see what the current status is and especially when you compare it to something like Search Console which is based on what we index it is really interesting to see that immediate feedback where you can just crawl your whole website in an hour and see what kind of comes out versus in Search Console where you wait a month and see what comes out. So I mean it's not to kind of like say Search Console is not useful in that regard but it's just, well, Search takes a long time to update everything across all of its indexes and kind of understanding those differences I thought was really useful. So I pointed them at a website crawler like that as well and said like if you're going to make big changes on your website here's one way to kind of test that before waiting to see how Google actually figures things out. So I thought that was pretty cool. It was fun. Yeah, I always find it awkward when people are asking recommendations about SEO and especially if they are close friends and you are like, well, I can't really tell you and then they are like, but you work on Search. So yeah, that's exactly the problem. And then you point them to the documentation and they are like, are you really just sending me to some documentation that you wrote for the noobs? You are a noob. Like what? And then when someone asks for an SEO recommendation like recommend an SEO company then gets even more awkward because I do know tons of good SEO companies that I could recommend but since I am working on Search plus we are the face of Search for SEOs and marketers and webmasters and whatnot how do you single out one SEO company and recommend that it's just so weird or it would be so weird? So yeah, that's also really weird. 
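The kind of site crawl John describes, where you walk your own pages and look at the status of each URL, can be sketched in a few lines. This is a minimal illustration rather than any particular crawler tool: `LinkExtractor`, `crawl`, and the `fetch` callable are hypothetical names, and the demo uses an in-memory fake site instead of real HTTP requests so that it runs without a network.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkExtractor(HTMLParser):
    """Collects href values from <a> tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def crawl(start_url, fetch, max_pages=100):
    """Breadth-first crawl of a single host.

    `fetch(url)` is supplied by the caller and must return a tuple of
    (status_code, html_text). Returns a dict mapping URL to status code.
    """
    host = urlparse(start_url).netloc
    queue = deque([start_url])
    statuses = {}
    while queue and len(statuses) < max_pages:
        url = queue.popleft()
        if url in statuses:
            continue
        status, html = fetch(url)
        statuses[url] = status
        if status != 200:
            continue  # do not follow links from error pages
        extractor = LinkExtractor()
        extractor.feed(html)
        for href in extractor.links:
            # Resolve relative links and drop #fragments.
            absolute, _fragment = urldefrag(urljoin(url, href))
            if urlparse(absolute).netloc == host and absolute not in statuses:
                queue.append(absolute)
    return statuses


# In-memory stand-in for a website (assumption: example.com is a placeholder).
FAKE_SITE = {
    "https://example.com/": (200, '<a href="/a">A</a> <a href="/missing">M</a>'),
    "https://example.com/a": (200, '<a href="/">Home</a> <a href="/a#top">Top</a>'),
}


def fake_fetch(url):
    return FAKE_SITE.get(url, (404, ""))


report = crawl("https://example.com/", fake_fetch)
for url, status in sorted(report.items()):
    print(status, url)
```

Passing the fetch function in as a parameter keeps the crawl logic testable; a real run would swap `fake_fetch` for a function that performs HTTP requests and respects the site's crawl rules.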
Yeah, it's an interesting struggle because we are so involved with the whole community and at the same time we really don't want to point at any particular people and say, well, we know them really well and you should go and do business with them. I don't think that would be fair. Another cool thing I found not while looking at their website but just on Twitter in general is this really cool presentation from Tobias Willmann. He's also from Zurich. So, woohoo, Swiss people. I hope he's Swiss. I don't know, actually for sure. Oh, Swiss local community. But he's in Switzerland. He did this really cool presentation and kind of some research on GIF SEO. Have you ever heard of that? I heard Jif, isn't that like a peanut butter brand or something in the U.S.? No, no, JIF. That's the image format. Like JIF. Animated images, yeah, JIF. Oh, you mean GIF? JIFs, yeah. No, GIF. I know that the inventor got it wrong how his own invention is called but yeah, it's definitely GIF. Okay. Am I called Gary or Jerry? You're called Gary. Jerry. I think I'm Jerry. Oh, so cool. Sometimes you're George. I made a terrible mistake. Now we know you're Jerry. Cool. Well, I mean, people never have problems pronouncing your name anyway, so. Story of my life. Anyway, so one of the cool things in that presentation was kind of a side note that actually some of these GIF search engines are really popular and apparently one of them is actually the second most popular search engine. What? Yeah, like I never realized. Let me repeat that. What? What? I mean, I don't know if it's really like the second most popular. Maybe Google is second most popular. I don't know. Joogle. Joogle. Joogle. No, wait. So what's up, Jerry? All right. Well, anyway, it's really obvious once you think about where these search engines go, namely on your phone, on your keyboard, when you're looking for an animated image, it goes to one of these GIF search engines, right? Yeah.
It's like you're looking for a meme, you're looking for something funny to add to a chat, or to Twitter, or whatever. You go there and you like search for something specific, kind of like a Martin reaction GIF or something. And you include that in your message. And like obviously people are just going off and searching for funny images and not always going to Google to search for something useful or informative. And John, to be fair, we both are meme material when it comes to that. I know that I have a few GIFs of myself that I use as reactions sometimes and people have seen them in slides and enjoyed them. And I think we need more of these. Izzy Smith has a bunch of like SEO memes and SEO funny images. And I think also GIFs, we should probably produce more of that. I'm pretty sure in our video material there's so much that you can use for reaction GIFs or GIFs. It's GIFs. We should support that. We should do more of that. We should do more reaction GIFs. Yeah, that's pretty cool. Or maybe we can do stickers like Googlebot stickers, like the virtual kind. Oh, that's good stuff. But how do you actually make these stickers? It's in his presentation. I'll send you a link. It's pretty cool. Okay, I need to check that out. Also the way to do SEO for these is hashtags, basically keywords. Wow. So meta keywords would work there. Meta keywords is how you rank your GIFs. That's okay. That's interesting. That worked out really nice for early search engines, web search engines. Well, I think, I mean, I don't know, maybe people search differently when they're looking for GIFs. But who knows, maybe all of that will change and it's more kind of a generational question. And at some point people will be like, what do you mean people searched for text? Wait, but you know what the difference is? There's no commercial intent. It's true. Not yet. And how would spam look like? I mean, if there's like advertisement in the GIF or GIF results, then you would just like not click on it. 
That's it. Well, I don't know. I'm sure there are pictures of spam that you can include in GIFs. I love spam. Hashtag spam. Love spam. Spam spam. Spam spam. Spam spam. Okay, wait, wait, wait, guys, guys. I think we should just cut it off here. This is getting out of hand. Spam. Spam beans and egg. Wait. Spam spam. Spam spam. No. Eggs and spam. Spam. Where's the mute button here? Anyway, thank you all for listening in to our episode of Search Off the Record. We'll be back with more useful and insightful information in one of the next episodes. Hope you subscribe and listen in when we come back. Thank you all. Spam. Bye-bye. Goseichou arigatou gozaimashita (thank you for listening). Whatever it goes at. Yes. I mean, Jerry.