OK. Hold on. Now. OK. Wow, that was recorded. OK, welcome, everyone, to today's Webmaster Central office hours hangout. We have a bunch of special guests today who have chosen to join us here in Zurich. I don't know, do you want to introduce yourselves?

OK, I can start. So I'm Ulrika Wieberg. I come from Stockholm, Sweden. I do technical SEO, digital marketing, and digital strategy, and I run my own agency called Unicorn. Cool. Interesting. My name is Peter Nikolov. I come here from Ruse, Bulgaria. I build stuff for my own company called Mobility Development. So, as the name shows, it's mostly for development, but lately we have been working on a lot of SEO things as well. Cool. Cool. Yeah, I mean, most of you probably know me by now. I'm Martin Splitt. I work with John here in the Zurich office, and as we have a few guests, I thought I'd just pop in again as well. So if you have JavaScript questions, fire away. Cool. And I'm John. I do these office hours hangouts from time to time. I don't know, since we have a bunch of people here in the room, do any of you want to get started with a question? Or? I think you should start with someone. Like, jump in, if that's OK? All right. Someone from the hangout. Go for it.

All right. Hey, hi, John. Hi, everybody. I have kind of a technical question. So we have a customer who bought a domain name which used to be hacked. So the site used to be hacked. We know that. And the issue is that there are millions of unique URLs that are indexed in Google at the moment from this old hacked website. Currently, the site is about 100 pages. It's just a regular site, nothing too fancy, with some regular URLs. But we are having issues on our server because Googlebot is crawling all these old URLs, and they get redirected, or they even give a 200 status code, or a 404. My question is, what is the best approach here to, in the first place, get all these URLs out of Google, and in the second phase, make it so that Googlebot doesn't overload our server with requests? Because over the past 14 days, I think, we had about 15 million unique requests from Googlebot trying to get these pages, but they are old hacked pages. So I'm wondering what the best approach is here.

I can help you, if possible. I had a similar situation, a year or two ago, with different customers. Probably the best way is to make an .htaccess rewrite so that the hacked URLs return 404 or 410 Gone (see the sketch below). This will stop the processing, for example PHP making calls to the database and stuff like that. This will help because Apache will answer that URL and return the status code directly to Googlebot. And this will help you, for example, when Googlebot comes along and hammers the CPU on your website. Another way, but you probably need some DevOps skills for it: you can get a virtual machine from somewhere, for example Google Cloud, and make some kind of redirector in front of the real site. If a URL is one of the infected ones, it returns 404 or 410 directly to Googlebot. So it's not easy, but if you have to do it, it can actually be done in a few hours of work.

I think that kind of makes sense. I mean, essentially what you want is for Googlebot to stop processing these URLs. And it primarily does that when it sees that these are always 404. So there will be a period of time where we try to crawl all of these like crazy.
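A minimal sketch of the .htaccess approach Peter describes, assuming the hacked URLs share recognizable path or query-string patterns; the patterns here are hypothetical and would need to be adapted to the actual hacked URLs:

```apache
# .htaccess (Apache): answer old hacked URLs with 410 Gone before PHP or the
# database is ever touched.
RewriteEngine On

# Hypothetical pattern: whole directories that only ever contained hacked pages.
RewriteRule ^(cheap-pills|casino-offers)/ - [G,L]

# Hypothetical pattern: hacked URLs recognizable by a query-string parameter.
RewriteCond %{QUERY_STRING} (^|&)spam_id= [NC]
RewriteRule ^ - [G,L]
```

The [G] flag makes Apache return 410 Gone; using [R=404] instead would return a 404.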
And then after we see that it really doesn't make sense to crawl these anymore, we stop crawling them, or drastically reduce the crawling. So some kind of way to bridge that time until that happens, I think, makes sense. And removing them from search is another thing. So I would first see if it's actually a real problem, in that, for normal queries, do these pages even show up? Because if they only show up for site: queries, where someone actually knows that there is hacked content on the site, then that's probably not so much of a problem, and they'll drop out over time. But if you really need them removed urgently, the URL removal tool is kind of what you would need to use. The tricky part there is you probably need to do that on a per-directory level, to say those hacked directories are things that should not be shown in search. And the removal tool doesn't remove them from the index. Despite the name, it just hides them in search. So we will still crawl. We need to see the 404s. And then we just won't show them in search.

OK, thanks. Just a small follow-up question. We are currently rate limiting all bots that are crawling the server, because our server was going down because of the requests. Do you think that's a good idea? Or do we have to turn off the rate limiting? I think that's fine. So what will happen in practice is Googlebot will crawl less, and we'll try to prioritize better. So we'll try to focus on the URLs that we think are actually important, which should be the normal content.

May I add to that? Sure. So I'd support that, John, but if you have a budget for it, my idea is to get a VPS, to scale that VPS up, and to handle everything there, so you can remove that crawling limit. So let Googlebot see everything and remove the URLs faster. And later, it can be a month or two or three later, you can shrink that VPS back down to normal, or even move back to shared hosting, if possible. I think that also makes sense. I mean, if you can do that, that cleans it up a little bit faster than if you slow things down. You can also slow things down in Search Console, if you really need to, with the crawl rate setting there.

OK, maybe one more follow-up question. Sure. Because, as I said, the current site is only a couple hundred URLs, would it be a good idea to use the URL removal tool and just temporarily hide the entire website, just using the slash at the end, and temporarily hide everything, and then start from a fresh index, so to speak? So de-index everything. It doesn't remove it from the index. It just hides it in the search results. So that wouldn't fix anything. OK.

Another way, just to extend that: if the site is very small, for example, I have a client with 10 pages that had a few million hacked pages added, so if the site is very small but is built on WordPress, or WooCommerce, or things like that, maybe it's a good idea to make a static copy of the website, put it on the server, disable all PHP processing, or Node.js, or whatever runs in the background, and let Googlebot access those static files. This will cause zero CPU usage on the hosting, and you can clean up much faster. But this is for small or medium sites, where you have, for example, a thousand pages, that's OK. If you have a million pages, that is virtually impossible at that scale. But for small and medium sites, it's OK.

OK, one last workaround question. Let's say Googlebot has flagged that specific website for a particular period of time as, OK, this is hacked. Is there any ranking issue with that?
I mean, we try to recognize when it's hacked content, and we'll flag that in search. But it kind of depends on what you're actually seeing there. If you're seeing that it's flagged as hacked, then you can go through the reconsideration flow, which kind of helps with that too. But it sounds like we just know that there used to be hacked content there and try to crawl as much as possible. But then the rankings will flow back to where they were, right? Well, if you're taking over an existing domain, then you would have a new website there. It wouldn't be ranking like before, right? It's different content, so you would have different signals and thus different rankings. No, I mean, if you were in the same industry, for instance, if you were in one specific industry and the site did not change. Well, if it literally did not change, then the content, because being in the same industry does not mean that the content is the same, right? If literally nothing changes when you take over that website, then sure, maybe, yeah. Even then, other websites can pop up, and ranking is volatile. Well, I'm not even going to comment on this. It's ranking, I know. OK, thanks, guys. Sure.

All right, any other live questions before I jump into the submitted ones? Hi, John. I have a question. OK. Hi, my name is Matt. I work for a well-known image and multimedia brand with hundreds of millions of images in our catalog. We also have tens of thousands of contributors to the site who add their own tags for those images. And the site automatically creates category pages from combinations of those tags, which has unfortunately made for some difficult pairings. In the last year, the site has become much better known for not-safe-for-work content and adult terminology because of these pairings. And it's ranking in the top three results with a pretty good click-through rate. My general question is, do a site's high rankings and relevancy in a category of terminology that's not safe for work preclude it from ranking well for other, more innocent or professional terms? It doesn't prevent it completely, but what could happen is that we would see it as something that would need to be under the safe search filter, which means if someone has safe search turned on, it might be that we wouldn't show it. So that's something where usually our recommendation is, if you have adult-oriented content and general content, then separate those out as cleanly as possible by directories or subdomains, so that we can clearly identify that this part of the site should be under safe search, and this part of the site doesn't need to be under safe search. And then it's a lot easier for us to make sure that we're showing it at the right time. Otherwise, if it's a mix of different things on one site, then it could happen that we think the whole site needs to be filtered with safe search. And then it's not so much a ranking issue. It's more that, well, it's not visible to people who have safe search activated. All right, thank you so much. Sure.

All right. Can I chime in? Sure. Am I allowed to ask ranking questions? Sure, we can try. If a site is about books and then expands to CDs, would it have equally good chances of ranking in the new vertical, given the same quality of content? And would the first category, say books, suffer from the expansion into new verticals? Not necessarily. I think that seems like a fine move. And that's kind of a natural evolution of a website anyway. Like, you grow, you have more different types of content.
That seems completely fine. Thank you. Sure.

All right, I'll jump into some of the submitted ones. And if you all have any comments or questions along the way, just feel free to jump in. The first one is about international sites. All of our English-market product and recipe pages are being canonicalized to their global English market equivalents. Regardless of hreflang and Search Console targeting for the local-market subfolders, Search Console shows "Duplicate, Google chose another URL as canonical," pointing to the English global version. We believe this is due to duplicate content. So we ran some tests between the global English and the English UK markets, and only a full copy change was effective. As there are limited ways to fully rewrite the same product and recipe content, what would you recommend to solve this? How widespread do you see this canonicalization effect on the web?

You've probably seen this in the forums a few times. Yes. That question gets asked many times. So one of the best solutions is to check and recheck things, because many times we have seen very specific, very small issues in the implementation, but they totally break your hreflang setup and cause strange, weird things like that. So even if you have checked this, check, recheck, and then check another time. That is because most of the time something is buried in a lot of code, a lot of pieces of things, and something is left over, like probably wrong canonicals, or redirects, or the hreflang for the Australian version redirecting to South Africa, or things like that. And if Googlebot is seeing this, Googlebot then starts to not trust the hreflang annotations. And we have seen weird things that can't be explained in any reasonably easy way. Yeah. Yeah, I've experienced that as well. Keeping stuff in order is the easy answer here. Yes. I think with hreflang it's really tricky because it feels like something easy to connect, but in practice it really gets complicated quickly. And, sorry to interrupt, one of the best answers for us is to try to put the hreflang versions on different domains, because if Googlebot sees a country-specific TLD, it knows that, for example, this is for German people, this is for Swiss people, this is for Australian people. It knows that. But if you have put everything into the same domain, that's where, most of the time, the weird things happen, where that's possible. I think the different domains help, especially for geotargeting, that makes sense.

One of the tricky things with this question, I think, is that when we see the same content, like you have English for the UK and English for Australia and it's really the same content, then we think these are duplicates and we try to help the webmaster by folding them together. And what happens then is that in the reporting in Search Console, we only show one version. And in the search results, if hreflang is set up properly, which, like you mentioned, is sometimes tricky, then we will show the different URL in the search results. So we will show the different versions, but we will report on just one version. And that makes it really kind of confusing sometimes. But from our point of view, that's expected and not necessarily a problem, like, you wouldn't rank lower. It's just very confusing in Search Console. Exactly. So, John, when we've tried this sort of thing, regardless of speed, backlinks, or correct hreflang, it doesn't seem to matter.
So when we see this as a business, if you were operating from a supply chain model and you were being folded up algorithmically by Googlebot, can you imagine the cost of that to a business, because Googlebot might not know what we're trying to achieve? So you're saying, okay, maybe try subdomains, but even with Search Console targeting, all the hreflang is designed to bring people to the right site. And plenty of sites around the web have English content for different locales. So how do you know when to roll up a site like that as duplicate content? Because if you're operating on a supply chain basis, that would affect the business. And if there's no way out of that other than fully rewriting content, what do you do? So it shouldn't affect the search results. In the search results, we would show the different language versions, like the UK and the Australian versions. It's just the reporting in Search Console. We would only report on one set of data, which would be one of those versions. So in practice, the traffic you get is still the same, the rankings are the same. It's just that the reporting in Search Console is focused on just one of those versions. So the reporting side is the confusing part. And at the moment, I don't really have an answer for how you can separate those things out properly from a reporting point of view. Because the indexing report will be on the canonical URL, which will be one of those versions, and the performance report is on the canonical URL. In the performance report, you can separate by country, which gives you kind of that information again, but it's still on the canonical URL, not on the individual country URLs that are actually shown in the search results.

Yeah, okay. It's just very, very hard to get out of that situation. Because I can't really say to the business, okay, you're going to fully rewrite all of the content across the site, and then it's the same situation, after all that work, how much of that even helps? Well, again, it's a report, isn't it? It's a reporting problem, not a performance problem. Right, so we're saying there shouldn't be traffic loss as a result of this duplication, this folding up. Exactly, yeah. Right, okay. Thanks. Sure. But, I mean, in English it's pretty common. In German it's really common. Oh, we have it in German all the time. So we have it. It's something where we constantly need your feedback, and people complaining about these kinds of things, so that we can bring it back to the team. It would be good to see something like that in Search Console as a clearer report that says, yes, we're folding these up because of this. You know what I mean? It's not too easy when you go there and just see a canonical report. Right, I know what that means, but how do I then explain to the business that this is going to happen because Google sees your content as duplicate? Yeah, yeah. And explaining the difference between the reporting side and what actually happens in Search is really tricky sometimes. Okay. People should just use different languages, but... Yeah. I was hoping there was a magic bullet. Okay.

Okay, another international question. Suppose I have a website, domain.com, which is set to the USA using the geotargeting tool. Can I start domain.com/india and set it to India? The short answer is yes. You can definitely do that. That's fine with a generic top-level domain. You can set subdirectories or subdomains to different countries however you want.
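For reference, the hreflang setup discussed above, with country subdirectories on a generic top-level domain, might look roughly like this; the example.com paths are hypothetical, and every listed version needs to carry the same full set of annotations, including a self-reference:

```html
<!-- On https://example.com/en-gb/product-x/ (and reciprocally on every version listed) -->
<link rel="alternate" hreflang="en-gb" href="https://example.com/en-gb/product-x/" />
<link rel="alternate" hreflang="en-au" href="https://example.com/en-au/product-x/" />
<link rel="alternate" hreflang="en-in" href="https://example.com/en-in/product-x/" />
<link rel="alternate" hreflang="x-default" href="https://example.com/product-x/" />
```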
So that's kind of an easy one. Nice. You don't have so many of those. No hand-waving needed.

Okay, now we have a complicated one. It seems as if Google still has major problems with subdomain leasing. So it goes into kind of the different things where basically things like product review or coupon sites are hosted within an existing domain. And it's like, why can't Google get its act together and fix this? I don't know. I'm happy to pass this feedback on to the team. I know they've been working on this for a while to try to improve it. In general, the other side that I see here is that with a lot of these more established sites, when you start adding more and more unrelated content to your site, you're kind of making it hard for search engines to understand what your site is about. So if you take a news website and you add a big subdomain or subdirectory for coupons or product reviews or whatever, then suddenly it becomes a lot harder for us to understand that this is actually a news website. And then for your core content, it's a lot harder for us to say, well, this is something that we should show prominently in ranking and kind of crawl quickly because it's news content, because we have this mix of different things in there. So on the one hand it's something that I think we need to improve on, to better understand the differences within a site, but it's also where sites are kind of playing with fire a little bit, in that it can result in us having trouble understanding their core business, which is something they probably want to avoid. So that's kind of my answer there. The question also went into, well, Bing said they will apply manual actions and penalties to these sites, and I have no idea what Bing is actually doing. I know they did a blog post about it. So it's hard for me to compare what other people are doing there.

Can I ask a follow-up question? So would you recommend, instead of having a subdomain with the coupons, creating a new website, a totally new domain for that instead? That would be better for everyone, right? I think that's usually the approach that we would take. And it's something where, if you want to host other content on your site for monetization reasons, then I would try to make it as clear as possible that it's really separate, and maybe even consider something as drastic as blocking it from crawling or indexing. So the content is there, you can refer to it if you want, but it's not going to get in the way of your normal search results. Yeah, okay, thank you. But I suspect people have different experiences with this, so I'm sure it will be tricky. But I know that the team is looking into this, and as we get feedback around these kinds of issues, that's something we will definitely review more. You would say that maybe a subdomain isn't really part of the main domain, and hence it would be seen as something different. It would not be seen as part of it, but it is. Yeah, I think the general idea. I love that. Yeah, I think the general idea is that people try to fold it into the main site so that they feel like, well, this is some big website, and a part of it is suddenly coupons, and maybe those coupons look really important too. And you're kind of playing tricks there, in that it's harder for us to separate these two, and that means it'll also be much harder for us to recognize that this is the really important content and this is kind of just used for monetization, not really what you want to be known for.
Like, if you're a big news website, you don't want to be known for your coupons, you want to be known for your news. So for a holiday special or whatever, like coupons, if it's a brand thing, just robots.txt that page (a sketch follows below). I mean, that's an option. That's kind of an extreme option if you were hoping for search traffic, because search traffic won't happen then. But if you want to host something on your site that you want to use for monetization, and you don't care for it to be indexed, then that's definitely an approach. Okay. But I mean, this is something where, if we get more feedback and hear more from people, then we'll try to find ways to make that a little bit better on our side as well, to really understand these differences. Yeah, but this creates another problem. For example, if I have a site for medical news, and suddenly some content for car repairs or things like that is stuffed into the main site, this creates that problem. Googlebot can't understand what the main content for that website is and why the two are intermingled, and it's tricky. I think it will continue to be tricky. So having specific feedback on issues that you're seeing around this is always useful. Sometimes it's very theoretical, in that, well, other sites are doing well like this, so should I do it as well? It's very hard to use that as an argument with the engineering team internally. But if you're really seeing issues where we're showing things in the wrong way for specific search results, that's always really useful to have.

Okay, something completely different. A few weeks ago, we set a large number of pages to 404. Over the last few weeks, the number of pages visible in Search Console has decreased continuously. Currently, this number is stagnating at quite a high number with a slightly increasing trend. That doesn't fit together. What could be the reason? Any guesses? For me, probably Googlebot has seen those 404s arising and suddenly started to crawl more and more, just to see what part of the content is staying and what pages, for example, have been removed. So this is probably tricky if you do removals at a large scale. That's why it's probably good to remove the content in small batches, just to check how it's going. But if you have removed, for example, half of the site suddenly, this can turn into a weird situation with the bot. There's also the sheer statistics of it: if you have, I don't know, a thousand pages and you have just removed 200 of them, then in the first crawl that we're going to do after the removal, we're probably going to hit more of these removed pages, so they're going to fall out of the index after a while. And then eventually we're hitting less and less of them, because we're maybe just crawling other parts of the site, depending on your crawl budget. And then, as you said, it depends: if we think that the structure of the site has fundamentally changed, then we have to crawl a lot more anyway, and then we might spend more time on pages that are no longer there; or rather, on pages that you haven't removed, we might spend our time there as well. And eventually, if we remove a lot in one place, then we have less likelihood of hitting the remaining bits as we crawl the entire site, and eventually they're going to drop off as well. And if you set them to 404, it doesn't matter that much. They might still show up, but eventually they're just going to drop off. There's not that much that you need to take care of.
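As an illustration of the "just robots.txt that page" option mentioned above, a minimal sketch; the /coupons/ path is hypothetical:

```
# robots.txt on the main site: keep the monetization section from being crawled
User-agent: *
Disallow: /coupons/
```

If the coupons live on their own subdomain instead, a robots.txt at the root of that subdomain with "Disallow: /" would do the same. Note that robots.txt only blocks crawling; to keep a page out of the index entirely, the noindex option John mentions requires the page to stay crawlable so the robots meta tag or X-Robots-Tag header can be seen.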
Oh, yeah. Can I come in with one? Sure. So what's the difference between a 404 and a 410, and which is the bad one? A 410 is like nuking that page. It's like throwing a nuclear bomb at it. When Googlebot sees a 404, it regularly returns to that page to see whether maybe... Exactly, it comes back. It's coming back. But if you return a 410, it crawls that page less and less, and in my tests, after the first crawl, that page is gone. It's totally gone. So if you want to remove a full section, is it better to put a 410 on the pages than a 404? In practice, the difference is relatively small. But basically what you're doing, as Peter explained, is giving us a different signal. 404 means not found, 410 means gone, which means "I made this go away" versus "I don't know what you're trying to do here." So you give us a clearer signal. But in practice, unfortunately, a bunch of people give us incorrect signals without wanting to. Like, they give us a 410 when what they meant was a 404, or the other way around. So it's not like we're literally just going to ban-hammer this, but it is a hint for us to not do that, and you might see better results with a 410, but it's not a guarantee. Okay. And there is another problem, which Martin can probably explain better. What if you return a 410 for some page, and after two months you change your mind and want to return content to the same URL? It's been hit with the hammer, it's banned, so to say, and it's harder for Googlebot to get it crawled again and later to get it ranking. So a 410 is very, very, very specific. I mean, you can always submit it for indexing again. So that's not a problem, but... Yes, but Googlebot doesn't crawl it over and over like a 404. It wouldn't come back by itself automatically, but yeah, you can always... It's not that we're marking it as "do not ever do this again," we're just not automatically crawling it much more. So you can always bring it back. It's just maybe more effort with a 410.

So my opinion is controversial, in that I think the difference is overrated. I mean, theoretically you're right, we remove it a little bit faster with a 410, but I think in practice it doesn't matter. Because we crawl pages on such a different schedule over time that if we have to crawl it twice, then it's still going to drop out anyway. It's not like you remove half of the site and that half of the site is gone tomorrow; it'll be removed over the next half year. And if you're talking about the next half year, then whether something is removed within two weeks or within three weeks doesn't really matter that much. And from that point of view, I think us kind of focusing on the difference between 410 and 404 is something where it's very easy for folks to get stuck on or hung up on the specifics. Theoretically, sure, it's better, but it's not going to break something. So I've sometimes seen SEOs go and complain, talking to developers, that they used a 404 instead of a 410, and it's like, our site is doomed, they're stupid, why don't they understand things? And in practice, it's easy to spend a lot of time and energy on this small difference, where in practice the difference is maybe, I don't know, a couple of weeks over the course of a couple of months. It hasn't really changed that much, but it's a bigger deal for some people. I use it more to sound clever, though. Yeah, I think there are a lot of these things around SEO where technically it's like, sure, that's the correct way to do it.
But in practice, there are so many other things that play a role beyond doing it technically correctly. If you can do it with the same amount of work, fine, go for it. But if there's significantly more work involved in doing it the technically correct way and the end result is the same, then you have to kind of make a choice there and say, well, I can't personally do it, I have to hire developers to make it a 410 instead of a 404, and it's a lot of work, and also a lot of time that you spend reviewing their work and making sure it's a 410, that you could be spending focused on other things within the website. So that's kind of tricky in that sense. I mean, SEO is, to some extent, very much based on all of these small things adding up. So I understand that too. Okay. Yeah, you have a question? No, thank you. Okay, cool. We have more submitted questions, or do we have more live questions? No live question. Okay, fine. I have more. I've got one potentially, if that's okay. Okay, go for it.

More JavaScript, I guess. What's the easiest way to look at a shadow DOM if you were to review a site? And, by the way, I saw your rendering talk at TechSEO Boost. That was really good. Thank you. I just need to know what's the best way to look at a shadow DOM and compare it to what Chrome sees, or something like that. Do you know what I mean? To see if there are any differences that could be happening. Yes, so there are two ways of doing that, of inspecting the shadow DOM. The easiest way is within the developer tools. If you go inspect, you see the element. So basically, shadow DOM is usually used with web components that have their own HTML tag. You find these because they have to have a dash in the name. So normal HTML tags, like, I don't know, header, footer, h1, p, whatever, do not have a dash in the name. But if it's something like content-accordion or content-carousel, then that's a very good indicator that this is actually a web component. And then you can right-click, Inspect. And within the inspector, you see a #shadow-root, and you can expand that. And pretty much whatever is in there is what the shadow DOM contains, which is also what you probably see in Chrome. As in, you see the site, you know what you see there. If you want to inspect it the way that Googlebot sees it, then you can use any of the testing tools, like the mobile-friendly test, the live test in Search Console, the rich results test. And in any of these, you see the rendered HTML. And within the rendered HTML, you see what we would be seeing. So we are flattening whatever is in the shadow DOM. So if you use techniques that actually copy content from the light DOM into the shadow DOM, you would see that, but you would also see the shadow DOM content. But you don't see where the shadow boundary would be. OK. That was very technical and very hard to explain without a visual, but does that make sense? Yeah. Is there some sort of, I guess, I know it's kind of hard to say, but if you had to quickly look at a page, is there a plugin you might recommend to help break that down more easily, to see the comparison, or is there nothing you'd recommend like that? So what exactly is the goal you're trying to accomplish with this? Just to see, sort of, simulate the kind of web rendering view of the page. So we have got the shadow DOM and the DOM, and you kind of want to say, well, we also need to be mindful of that shadow DOM. So I'm just asking, is there a way to see it? Normally, it doesn't matter.
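To make the terminology concrete, a minimal sketch of a web component that puts content into a shadow DOM; the tag name is made up, and this is just an illustration of the pattern Martin describes:

```html
<!-- The dash in the tag name is the hint that this is a custom element / web component. -->
<my-greeting></my-greeting>

<script>
  customElements.define('my-greeting', class extends HTMLElement {
    connectedCallback() {
      // Content added to the shadow root is what DevTools shows under #shadow-root
      // when you inspect the element, and what the testing tools show flattened
      // into the rendered HTML.
      const shadow = this.attachShadow({ mode: 'open' });
      shadow.innerHTML = '<p>Hello from the shadow DOM</p>';
    }
  });
</script>
```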
And if you want to see that, again, use the mobile-friendly test or the rich results test, because there you see the rendered DOM, including what was originally in a shadow DOM. So if you see content not showing up in the rendered view, then you can use Inspect in Chrome to see if there might be a shadow DOM involved. OK, cool, thanks. You're welcome. Good question, thank you. Cool, OK.

Now we have more controversial questions. Oh, no. If a page is marked as noindex, but also has a canonical to an indexable page, other than being a mixed and kind of pointless signal, could there be a risk of the noindex being transferred to the target of the canonical? That's a Dave question, isn't it? No, I mean, no is the answer. No is the answer, yeah, no is the answer. But yes, it's a Dave question. That's a Dave question. Good question, Dave. How did you know? It sounded like one. With the reverse logic, if a page has a canonical link pointing to a URL, oh no, it's cutting it off. Let me refresh the page. Yeah, it's standard, yeah. Yeah, I will find it. So if there's a canonical link pointing to a URL that is noindex, would that page be considered noindex too? Probably, probably. We would treat it kind of like a redirect to a noindex page, and we would drop that page. Nice. Can I just ask as well: I assume if you've got a 404 page, it doesn't really matter if there's any canonical or anything on it, because it would just be ignored. Google just doesn't really look past that 404 header. Yeah, yeah. When we see a 404, we ignore the whole content. Yeah, brilliant. Thank you. That's something I recently ran into: someone had a staging site that could get indexed, and the whole staging site was returning 404 while actually showing normal content to users. And if we were to start indexing that staging site instead of their existing site, then suddenly everything would disappear.

Does Google consume structured data better if it's coded as one graph versus separate snippets of code in JSON-LD? I don't know what a graph is, in this sense. I'm not sure about that. You've never heard that before? I guess that's a no. Do we have that person in the Hangout? Do you want to clarify the question if you are here? Paul? Maybe not. No. OK, fine. Maybe you can clarify for the next Hangout and then we can get to that.

I'd like to know why duplicated pages, "alternate page with proper canonical tag," are marked excluded in the coverage report. I don't know, off-hand. And at the same time, those pages still appear in the search results. Oh, OK. Like m-dot pages or AMP pages, does that mean they're excluded or are they not excluded? So what happens here, similar to the hreflang question at the beginning, is we will index the primary version of the page and use that for indexing. But we will swap out the URL for the alternate version when we show it in the search results. So it would be listed as excluded for indexing, because we index the primary version, but we would know to swap out the URL as appropriate in the search results. So it's confusing when you look at the search results, but we try to do the right thing by just indexing the main version. So yeah, I suspect that looks weird in Search Console.

In mobile usability, if the content is too small to read, but it isn't the main content of the page, for example terms and conditions, Google Search Console will throw errors. So will that impact the ranking of the page? I don't know what threshold we would use there.
So I imagine in Search Console we would flag that, because technically those pages are not mobile friendly. So they'd be flagged in the mobile usability report. And it might be something that we would not show as highly in the search results, because we think the pages themselves are actually not mobile friendly. So that's something where usually I'd recommend, if you have content on a page that you want to have on the page, then make it readable, or just don't put it on the page if you don't want it readable.

OK, I'm just going to continue here until someone stops me. OK. OK. Yes. OK. So I have this group of websites. They are all hosted on the same server. I mean, not server, but hosting company. And from time to time I get some of those security warnings saying that my content has been hacked. I tried going really, really deep into those. I tried looking at server logs and everything else. And I even tried using the mobile-friendly test to see exactly what Google was seeing on the page. But it's still the same. I copied the source code, diffed them, and besides some minor changes like variable names and stuff, they were all the same. So I'm thinking that maybe our anti-bot protection system is flagging Googlebot, or the Google security system, as a bot, and then it's showing a captcha page to it. So maybe the content really is different for the security program than for the actual Googlebot. But I can't debug that, I don't have anything. I don't have the source code that Google is seeing, so I can't tell what's happening there. What exactly is the message that you're seeing? Content injection. Content injection? OK. I don't know. Have you seen that? No. I have some ideas. OK. Have you audited the JavaScript that you are using, and the potential third-party content that you don't have control over on the page? There is content like third-party Criteo stuff, things like that. Third-party, excuse me, what was that? Third-party what? I'm missing the name in English. Advertising platforms? OK, ads. OK. Ad platforms, tracking, things like that. That happened to one of my Google News websites, and basically it was a plugin issue. Is it hosted on WordPress? Nope. No, OK. Most of the time it's basically some code that you need to look for and remove. I would if I knew. I mean, I did everything. I checked the source. I checked the DOM. Have you tried looking at the source in one of our testing tools? Yes, the mobile-friendly testing tool. Right, and there's nothing fishy in there either. Nope. So I think what could be happening, if you're pretty sure that it's really not hacked: the captcha page would not be the issue, that's kind of just first off. So even if you were to show a captcha page to Googlebot from time to time, that would not be flagged as hacked or content injection. What could be happening is that we're getting confused by some of the content on your pages. So for example, if you have, I don't know, in an extreme case, an e-commerce site and suddenly you have one page with pharmaceuticals on it, then our systems might think, well, pharmaceuticals is something that is often used by hackers to inject into websites, and this website looks like it's not a pharmaceutical site, so maybe someone is injecting pharmaceutical content into these pages.
But it could be the case that you're actually displaying this content on purpose, that you're saying, well, we have this special offer, I don't know, maybe something that isn't even pharmaceuticals. I don't know, you have a t-shirt with, I don't know, Viagra branding or something crazy, where you have those keywords on those pages. And when we crawl and index the page, we think, oh, all of these pharmaceutical keywords on a t-shirt site, it sounds like it's hacked. And then that might be something that we would flag. And usually what happens there is, as we reprocess those pages, we realize, oh, it's fine, and the warning goes away on its own. So if you're really sure that there's nothing hacked on your site, that the JavaScript files are OK, that nothing is being injected there, and you have some content that kind of touches upon areas which are sometimes used by spammers or hackers, then I would assume that it's going in that direction. One thing I would do there is maybe bring this up in the Webmaster Help forums, so that people can escalate those specific examples from your site to the team, and we can take a look to see if there's something we need to handle better algorithmically. Because sometimes we flag things incorrectly like this. It can get a bit confusing. And it just helps us to improve our algorithms if we have more examples of where we get it wrong. OK, I'll do that. But just to clarify, are there any plans to show us the actual source code, or what Google is seeing there? I think we try to do that. In some cases we can do that, specifically for malware, like hacked with malware. We try to show the snippet of the page where we found it. But for some kinds of issues, where it's just a "this doesn't feel right" kind of thing, we don't have anything specific to pinpoint. OK, thank you. Sure.

Let me just double-check some of the next questions, and we can run through those quickly. I'm planning to replace my WordPress site with a normal HTML-PHP site. Well, WordPress is PHP, but I don't know. What should we do so that we don't lose any rankings and visibility? There basically is no answer for this. If you're changing your website, you're changing your website, and we'll reflect that in search. That includes things like the backend and the UI, the internal linking, any of that, which can theoretically affect the crawling and indexing of your site. It can also be a positive thing. So you can also improve your site by replacing it.

There's an undesirable sitelink showing up in the Google search for Facebook support. How can we tackle that and get rid of it or fix it? So sitelinks are generated algorithmically. There's no manual way to tweak things up and down and say, I don't want this and I do want that. So there are essentially two things. On the one hand, try to see if this is really something that the majority of people are seeing, whether it's really a problem or not. Sometimes it's just something that you see with personalized results, because you always go there. And the other thing is, if it's really a big problem that this page is being shown at all in the search results, then probably you'd need to noindex it (a quick sketch of that follows below), so that the page really drops out completely, because it's essentially a normal organic search result. Other approaches you can take are things like changing the title of the page to make it clear what it should be about. And then, wow, so many more questions, but we're so short on time. Maybe I can just open up for more questions from you all.
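For reference, the noindex John mentions is just a robots meta tag in the head of the page (or the equivalent X-Robots-Tag HTTP header); a minimal sketch:

```html
<!-- In the <head> of the page that should drop out of the search results entirely -->
<meta name="robots" content="noindex" />
```

The page has to remain crawlable for the tag to be seen.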
Any of you? Hi, guys. Hi. So, a quick one. Well, I hope so. Let's assume we have a very big e-commerce website, let's say, I don't know, 500,000 products, something like that. Of course, a lot of categories holding those products. And assume that every link to a given product has a tracking parameter added to it; for example, they want to see what position the product was at within the category when it was clicked. So it has a p= with a number, or something like that. And all of these product URLs do have a canonical tag to the product without those parameters attached, so they're canonicalized correctly. However, there's no way on the website for Google to actually get to the canonical versions of those URLs directly, only through the sitemap. What would be the biggest downside of doing that? I assume one of them would be crawl budget, because we're kind of forcing Googlebot to continuously go over those URLs, even if they're canonicalized elsewhere. Yeah, I think crawling is one thing. The other thing is maybe we will index the other versions anyway. And canonicalization between those pages can also be an issue. So maybe the answer, for me, is to not show that URL parameter on the product URLs to Googlebot. It's kind of like cloaking, I guess. Yeah, I was thinking of using the hash instead of the query parameter, handled with JavaScript. OK, I mean, one thing you could do is use hash URLs for those keys. Maybe that would work. That could kind of make it so that the key is still passed, but it would need to be processed in JavaScript, and from there you can do something with it. The other thing, I guess, is the URL parameter handling tool. You could work with that to just say, ignore these. The other option is, if you leave it to canonicalization and you're OK with the decision sometimes going the wrong way, then try it out. But keep in mind that it could happen that we canonicalize to those keyed URLs instead of your preferred ones. OK, my solution is, for example, if you have categories, you know, at Google I/O in previous years there was an excellent video about this, that Googlebot only follows URLs in anchor elements. But if you make the visible element a span or a div that works with an onclick handler, Googlebot won't see that onclick URL and won't crawl it. Oh, that's fancy. OK. Yeah. OK. Just out of curiosity, if you use the URL parameter tool to force Google to ignore, well, not ignore, I guess we would choose the option that the content isn't changed, is that the same as using a hash symbol for parameters? Is it basically the same thing? Would it achieve the same effect? Yeah, pretty similar. So we would still occasionally look at those URLs to double-check that it's OK, but probably we would just follow that, yeah. OK. And since that wouldn't involve any development work, that would probably be the easiest route for us to take. OK, awesome. Thanks. Cool.

OK. So I think we have to get out of this room soon. So I'd just like to thank all of you, all of you who came here in person, and all of you who joined virtually. Thank you all for coming. Thanks for submitting so many questions. And I hope to see you all again in the future sometime. I guess some of you tomorrow. Yeah. Yeah. Cool. All right. Thanks, everyone. Bye. And I think we have to stop recording.
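As a footnote to the tracking-parameter discussion above, a rough sketch of the "hash instead of a query parameter" idea; the parameter name and markup are made up. URL fragments are not sent to the server and are generally not treated as separate URLs for crawling, so the position key would only be available to JavaScript on the client:

```html
<!-- Category page: the position key travels in the fragment, not as ?p= -->
<a href="/product/blue-widget/#pos=7">Blue widget</a>

<script>
  // On the product page, read the position from the fragment for analytics only.
  const match = location.hash.match(/pos=(\d+)/);
  if (match) {
    // e.g. pass match[1] to an analytics call here
    console.log('Clicked from category position', match[1]);
  }
</script>
```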