All right. Welcome, everyone, to today's Webmaster Central office-hours Hangout. My name is John Mueller. I'm a Webmaster Trends Analyst here at Google in Switzerland. And part of what we do are these office-hours Hangouts, where webmasters, SEOs, and publishers can join in and ask any kind of website and search-related questions that might have come up. Looks like a handful of people are here already, but I'm sure we'll see more over time. A bunch of questions were submitted already on YouTube, so we can go through some of those. But like always, if any of you want to get started with the first question, you're welcome to jump in now. Or if not...

Hi, John.

Go for it.

Hi, thanks. Yeah. Hi, Martin, great to see you as well. John, I have a quick question about cloaking, basically. The context is that we want to add tracking parameters to our internal link structure. It's always said that adding too many might not be good, because it delays crawling, and it might not be good for a website with a million URLs, which is what we have. So the data team has suggested that when someone clicks a URL, the parameters get added on the fly, and the bots are kept away from those parameters by identifying user agents. So is that a permissible technique? Because the content remains the same; just the tracking parameters are added for the users and not for the bots.

In general, I mean, it's permissible. Like, you can do this. I think it's probably not a great practice, but in general, it's doable. It's essentially similar to the old session IDs that used to be used in the old days, where per session you would add parameters to the links on a page, and then when a user clicks on those links, those session parameters would be passed on so that you can track things a little bit better. In general, for crawling, that's something that makes sense to suppress so that search engines don't run across those URLs. What I would not do is block them by robots.txt, but instead maybe use a rel canonical on those pages to point at the canonical version, or maybe even redirect to the canonical version if you can do that. In general, I'd try to avoid doing this kind of tracking through the URL itself, because it makes everything a little bit messier. When you look at your log files, it's a lot harder to tell which pages actually get the most traffic and how users in general navigate around the website, because you always have these kinds of tracking parameters attached to the URLs depending on the user and depending on how they came in. So I think it's probably suboptimal to do it like that. It's not illegal in any sense, or against the Webmaster Guidelines or anything like that. It's just, I'd say, not a great practice.

Thank you so much. And I hope everyone is safe and business is going OK. Thanks.

Cool. Any other questions before we jump in with the rest?

Hey, John, I have a question.

Hi.

Yeah, so my question is just that I want to clarify how keyword cannibalization actually works. Is the whole concept of keyword cannibalization just referring to the idea that it's counterproductive to target the same keyword with two pages, or is there actually an algorithmic penalty when two pages target the same keyword? So for example, if I rank in the top four with a page, and I publish another page targeting the same keyword, will it drag down the top-four ranking? Or is it just counterproductive to publish another page targeting that keyword?

Yeah. So there is no penalty for doing that.
It's not that there is any kind of manual action or any algorithmic action that would say, oh, you have two pages, therefore that can't be as good. Most of the time, the issue just comes up in that you tend to have two pages that are somewhat middling with regards to how good they are, how strong they are. And if the alternative is to have one page that is much stronger, that could potentially rank better than either of those pages individually, then you're trading off having two pages shown in search, but a little bit lower in the rankings, versus one page that's shown a little bit more visibly in search. In that kind of trade-off situation, often you will prefer to rank a little bit higher, just because you're a little bit more visible in that case. There are many cases where essentially there would be no change in ranking — where maybe the top results are so strong that there is no chance for you to jump in there by combining your pages, or maybe you're already ranking number one and number two, or number one and number three, and it's not that you would be ranking better than number one if you just had one page. So from that point of view, it's always something where I think it's a good idea to look at this, but also to keep in mind the context and to think about what the alternative would be. Don't just blindly see it as something that is bad and that you need to fix, but rather think about what the alternative would be. Would it be better for my site if I had one page? Or does it not change anything at all, perhaps?

I see. Thank you so much.

Sure. All right. Any other questions before we jump on in with the rest?

I have another question, if you don't mind.

All right. Go for it.

So is it a bad idea if we always use non-canonical URLs in our internal links? For example, when we implement an internal link, it always has a parameter, and that parameterized URL canonicalizes to the one without. And I see a lot of websites doing this kind of linking throughout the site. Would that hurt their PageRank flow?

It doesn't hurt the PageRank flow or the PageRank of the pages in that sense, because we see those multiple URLs and we see they're the same content. They have a rel canonical, so we treat them as one page. What can happen, however, is that we pick one of these URLs to use as a canonical instead of the one that you have specified with the rel canonical. So sometimes you'll see these URLs in search — if you do a site query, or if you search for them specifically — and sometimes you'll see them in Search Console and the reports. And it essentially just makes it a little bit harder for you to keep track of things, because you're looking at both of these URLs and saying, well, in my mind, I have to remember that these are the same page, when actually they're multiple URLs, and you could make it easier by just having one single URL. But from a ranking point of view, it wouldn't change anything. It's really just about making it easier for you to understand your site.

OK. It also makes it easier for Google to identify which one is canonical if I link to them directly?

Yeah, yeah. When we pick a canonical, we use the rel canonical, we use internal and external links, we use redirects, and a bunch of other factors as well. So if the rel canonical points at one URL and the internal links point at another one, then we're kind of in a conflicted situation.
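For what that rel canonical setup might look like in practice — a minimal sketch only, with a hypothetical example.com URL and tracking parameter standing in for whatever a site actually uses:

```html
<!-- Served at https://example.com/widgets?src=nav_promo (hypothetical tracked variant) -->
<head>
  <!-- Point search engines at the clean URL so signals consolidate there -->
  <link rel="canonical" href="https://example.com/widgets">
</head>
```

Ideally the internal links themselves would also point at the clean URL, so the rel canonical and the linking signals agree instead of ending up in the conflicted situation John describes.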
I see. So to follow up on that: in Google Merchant Center, they have a feed, right? Is it a bad idea to use a URL that canonicalizes elsewhere as the link in the Google Merchant Center feed? Or should we use the actual canonical product page in the Merchant Center feed?

I don't know for sure. I'd need to check all of these details with the Merchant Center team, because they've kind of opened things up and gone in the direction of more organic search results, at least in the US. But I don't know what their recommendations are in detail, so I'd need to double-check whether everything is handled the same there.

I see. So if I have a question specifically for Google Merchant Center, what is the canonical resource that I can go to? Is there a person or a group that I can get a definitive answer from?

I don't know at the moment. I'd need to look that up. It might be that, as a first step, the Google Ads team would be a good guess, because that's where the Merchant Center has been located so far.

OK, thank you so much.

Sure.

John, can I follow up a bit on that last question regarding internal links and canonicals?

Sure.

So let's say that you have this internal linking where all the links point to pages that canonicalize to other pages, but you figure out the canonicals and respect them, and everything is OK from that point of view. Do they still use crawl budget? Would fixing that and putting the actual canonical versions in the links help with the crawl budget, or does it not make any difference? Do you still need to crawl those non-canonical versions every now and then, because they're within the internal linking architecture, or do you kind of figure out, oh, I know what this is about, I won't really bother crawling that page?

We do still look at them from time to time, but it's not as often as we would crawl normal URLs. So usually what happens is, when we have a set of URLs and we pick one URL as canonical for that set, we will mostly focus on that single URL and crawl that one primarily. We'll still occasionally look at the other ones, but not nearly as much as we would with normal crawling. So when it comes to crawl budget, initially we would look at all of those different URLs. If you went to a big site and you started adding session parameters again to all URLs, then initially we would get lost with crawling. But fairly quickly, we would figure out, OK, these are the canonical URLs, these are the non-canonical URLs, we will focus on the canonical ones, and that should more or less still work out.

But do internal links play any role here? So whether or not you have internal links to those non-canonical URLs, does that affect how often you'll re-crawl those versions, even if rarely?

The main difference is that we will focus on the canonical URLs. But if the signals are 50-50 — we could choose these or we could choose those — then it's possible that we will re-crawl those non-canonical URLs a little bit more often than URLs that we think are completely irrelevant.

OK, so is that the case with redirects as well? If you're internally linking to a page that always redirects to something else, should you replace that internal link with the final version, or will you manage it on your own, so it doesn't matter if you internally link to the redirecting page or to the final destination?

I imagine you would see something similar. I think in practice, this difference is more theoretical than it is practical. If you have access to the server logs of a bigger site, you can probably find pages like that, where you know you're linking to a redirecting page, and you can double-check what the amount of crawling actually is there. I imagine, for the most part, we just figure this out — we pick the canonical and we focus the crawling really primarily on that. So I don't have any numbers to throw out, but I wouldn't be surprised if it's, I don't know, 30 to 1, or some really strong ratio like that, where we'd say we really focus on the canonical URLs, and every now and then, we'll still look at the non-canonical ones, regardless of where they came from.
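If you do want to sanity-check that ratio in your own logs, as John suggests, a rough sketch might look like the following — assuming a standard combined access log where the request path is the seventh field, and with /old-category/ and /new-category/ as hypothetical stand-ins for a redirecting URL and its destination:

```sh
# Googlebot requests per path, most-requested first (field position depends on your log format)
grep 'Googlebot' access.log | awk '{print $7}' | sort | uniq -c | sort -rn | head -20

# Compare how often the redirecting path is fetched versus its destination
grep 'Googlebot' access.log | grep -c '/old-category/'
grep 'Googlebot' access.log | grep -c '/new-category/'
```

Requests claiming to be Googlebot can be spoofed, so for anything beyond a rough check you would also want to verify the crawler.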
OK, I'm just asking whether, if you have a very big site and it's linking to all of these pages that either redirect or canonicalize to something else, it's worth the effort of going URL by URL — or maybe finding some automatic way to replace everything so that it leads directly to the final version — whether that's worth the effort.

I think if you have a really large site and you're doing this at a large scale, I would clean that up. Like, let's say, I don't know, if YouTube had redirects from all of the old pages to new ones and the internal links still went to the old pages, that's something I would say is worth cleaning up, because there are just so many of those URLs, and if we keep running into the old ones and keep trying them, then it's just adding such a mass of unnecessary crawling. But if you have a normal-sized website, even a mid-sized e-commerce site with, I don't know, a couple million URLs, I don't see this playing a practical role.

OK, thanks.

Sure. All right, let me look at some of the questions that were submitted. And feel free to jump in in between as well, or if you have more questions or comments on the questions or answers, we can take a look at that.

I heard Googlebot sometimes submits forms. If so, could it do it if the form is in an iframe as well? And what if the iframe is hosted on another domain? How would that impact crawling, indexing, and so on?

So it's extremely rare that Googlebot would submit a form. It's something we primarily did way back in the beginning, when websites were structured in a way that we could not crawl them properly. In particular, we saw this issue on a lot of government websites, where there was a lot of content on the site, but to find it, you had to go through a search form to actually find links to that content. For sites like that, pretty much the only way to get to the detailed content was to go through the search form. However, for pretty much every modern site, we can crawl normally, and people are used to creating a structure that we can crawl, with categories and subcategories, where, essentially, we never need to go through any of the forms. So I would imagine that most of the people who have sites, who have logs that they can look at — if you look at the server logs and you look at Googlebot, you would probably never see Googlebot submitting any of the forms that you have on the site. It's really extremely, extremely rare, and usually, when it does happen with a website, it's a sign that we can't crawl normally — we realize there's a lot of content, but we can't actually find that content at all. So if you're seeing this happening, I would go down the direction of, what am I doing wrong? What could I be doing differently with regards to my site's navigational structure?
That's, I think, the primary aspect there. And with that in mind, adding more complexity, like iframes or other domains — I suspect a lot of that would just not happen, for practical reasons, because we want to avoid running into a situation where we accidentally enter things like credit card numbers, and Googlebot accidentally goes off and buys things or fills out contact forms with random information. All of that doesn't really make sense, and it causes almost more problems than it helps anyone. So I imagine that if you have a configuration with iframes and other domains, you would probably never see Googlebot go through that.

The one case where sometimes you will see something like this happening is if you have an Ajax-based website, or a JavaScript-based website in general, where you're using some kind of POST request to get data to load on a page — that's something Googlebot might be executing. So if it's part of your page's rendering that it does a POST request to an API, gets some answers, and displays those answers, then that's something we might do when we render the page. It's not that Googlebot is crawling those POST requests as form submissions, but just that, in the process of rendering your page, if there's a POST request, we'll try to do what a normal browser would do, and we might show that. In a case like that, we follow, I imagine, the normal browser security guidelines. So doing things like cross-domain requests, I believe, is just a lot harder. I don't know offhand what the defaults are there with modern browsers, but that's something where you're just adding more complexity again.

We had two goals in mind. The first is showing ratings as rich results for our seller pages when someone searches for a seller name plus reviews or ratings. The second is showing a search box when someone searches for our brand. In February, for case one, we implemented organization schema on our seller pages and added organization attributes like brand name, URL, logo, address, reviews, and ratings. For case two, we implemented sitelinks search box schema on our home page. The result: first, in the Search Console enhancement reports, we started seeing a logo section, where we have around 250 valid items, but in performance, we haven't gotten any rich results for seller pages yet. And second, in the Search Console section for the search box, we started seeing one valid item, but we're not seeing any search box when we type our brand name into Google Search. My boss is upset with me now. My thought for the next step for the seller pages is to remove all organization schema and implement ratings only; one of our competitors is doing the same. Is there any hidden mistake that I should take care of? Our main focus is the search box.

So for reviews, you definitely need to watch out for the guidelines that we have with regards to review markup in our developer documentation. In particular, reviews are only available for a certain set of items that you mark up on a page. It's not the case that you can take anything, just add reviews, and Google will show those reviews in Search. Rather, we only show them for a certain set of structured data elements. So I'd double-check to make sure that you're following the guidelines there, and that the information you're marking up matches the policies that we have for reviews. That's probably the most important part.

With regards to the sitelinks search box, this is kind of a tricky one, and something I see people struggle with from time to time. The hard part here is that adding the markup does not make it more likely that a sitelinks search box will be shown. Rather, if we were to show one, we would use one that's based on your markup. So it's very rare — or I don't know, it feels very rare — that we would show a sitelinks search box in general for queries for sites. And only in those cases where we would show one, if you have the markup, we'll try to use the markup; if you don't have it, we'll just use a default setup. So if you're currently not seeing a sitelinks search box at all, then adding the markup for that will have no effect.
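For reference, the sitelinks search box markup being discussed is the WebSite/SearchAction structured data from the developer documentation — roughly like the sketch below, with example.com and its search URL pattern as placeholders. As John notes, it only changes what the box looks like in the cases where Google would show one anyway:

```html
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "WebSite",
  "url": "https://example.com/",
  "potentialAction": {
    "@type": "SearchAction",
    "target": "https://example.com/search?q={search_term_string}",
    "query-input": "required name=search_term_string"
  }
}
</script>
```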
A question on the FAQ schema: I've seen in the documentation what qualifies as an FAQ and what doesn't. But for my website, the FAQ rich result was showing for some pages, and I don't know why it has disappeared. In Search Console, I can still see that the pages are eligible for the FAQ schema.

It's really hard to say what exactly was happening here. On the one hand, the policies are definitely important — I'd watch out to make sure that you're following the policies. On the other hand, with structured data in general, just because you have marked something up does not guarantee that it's shown in search. There are various things that come into play here: it has to be valid markup, it has to follow our policies, and it has to be a site that's of reasonably high quality, so that we can kind of trust it. All of these things come into play. And with things like FAQ markup and some of the other more visible types of markup, it doesn't always make sense for us to show them for every URL that appears in the search results, because otherwise everything would just look really messy. So it's also worth looking at the queries that you're actually targeting and seeing what the results look like there. That's kind of the direction I would go here.
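For context, FAQ rich results come from FAQPage markup along these lines — a minimal sketch with placeholder text, and the content policies in the developer documentation still decide whether anything is actually shown:

```html
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "FAQPage",
  "mainEntity": [{
    "@type": "Question",
    "name": "Example question that is also visible on the page?",
    "acceptedAnswer": {
      "@type": "Answer",
      "text": "Example answer, matching what users can see on the page itself."
    }
  }]
}
</script>
```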
Question about maintaining multiple TLDs that all contain the same English content: if a company has a .com and expanded to multiple TLDs 10-plus years ago using the same content written in English, and the content is generic, does it make sense to remove the ccTLDs and fold them together with the .com via a site migration, maybe redirecting them back to the .com? Again, the English content definitely applies to all of those countries, and Google is already choosing the .com as canonical for many of those URLs. Some of the ccTLDs do rank in their country now based on hreflang magic, where Google is choosing the .com as canonical and displays the ccTLD in the search results. Maintaining those additional ccTLDs is tough for the company, and it doesn't make sense to keep a complete duplicate of the .com sitting on multiple TLDs.

So yeah, it sounds like you've already figured out that the ccTLDs aren't as critical in that particular case. So maybe it does make sense to fold things together and to pick one version as the canonical version. In general, I would use country-code versions, or just generally local versions, when you really have something unique that you want to do on a per-country basis. If there are maybe policy reasons or legal reasons why you need to have individual country versions, or if you have different content depending on the country version, then have separate URLs. But if essentially it's all the same content and you just have the different country versions for historical reasons, then it probably makes sense to fold all of those together into one version, so that you have one clear, strong version. It makes it easier for tracking as well, and it makes it a lot easier for maintenance, definitely. So I think you've already analyzed that all of this is duplication that you don't necessarily need, so I would tend to fold those together.

Can I just follow up on that?

Sure.

Basically, if a brand doesn't want to expand globally and just wants to stay local, and it's been sitting on a .com for 15 years now, is it worth going ahead and redirecting the .com to the .ca, from a TLD perspective? I know you count that as automatic geotargeting, and there's no need to set that in Search Console, right? So is it worth going ahead with that if they don't want to expand?

I think that's fine, too. A .ca can be active globally as well; it's not that it's suppressed in other countries. So from that point of view, if you just want to have one version and the .ca is your primary version, then that's perfectly fine, yeah.

But the ranking will tank for a little bit and then come back?

I think if you're folding things together, then I wouldn't worry so much about the ranking, because you're taking one existing site and just adding to it. It's not that you're taking two sites and changing both of them; you're building up on one existing site. So from a ranking point of view, that should probably be fairly stable.

OK, thanks.

I have another follow-up question on that. I have a question about x-default on a multilingual site. For example, there are a lot of websites that redirect their root domain to the en-US version, but they put x-default on the root domain. So they're basically pointing x-default at a page that gets redirected. Does that make sense? I mean, that page is not a 200. And I think a lot of the time, the root domain actually has more backlinks than the en-US URL. So in this case, is it actually a good idea to just put the root domain as the x-default alternative, instead of redirecting to en-US and putting en-US as the en-US alternative?

I think in a case like that, where you have a root URL that's redirecting depending on the country version, or the country of the user, that's something where we've said in the past that using the x-default is fine. Essentially, what you're saying with the x-default there is that on the root URL, you have your own logic for handling users from different countries. If you tell us these are the known country versions, then we will send people there when we recognize they're from there. But if people from other countries come to your website, you're telling us, please send them to my root URL, and I will decide where they should go. And from our point of view, that's a perfectly valid decision for you to make.

So from what I hear, if they're using browser detection, then it's OK to point x-default at a root domain that gets redirected?

Yeah, yeah, definitely. The only thing I would watch out for is that the individual language and country versions don't do the browser detection there. So on the root, do the browser detection. But if you send them to the en-US version, then keep them on the en-US version, even if their browser happens to be in French suddenly. Don't redirect them again.
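A minimal sketch of the setup being described, using a hypothetical example.com: the root URL redirects visitors based on their location or browser language, and the hreflang annotations on each language version point x-default at that redirecting root:

```html
<!-- On https://example.com/en-us/ (and mirrored on each language/country version) -->
<link rel="alternate" hreflang="en-us" href="https://example.com/en-us/">
<link rel="alternate" hreflang="fr-fr" href="https://example.com/fr-fr/">
<!-- x-default points at the root, which redirects everyone else according to its own logic -->
<link rel="alternate" hreflang="x-default" href="https://example.com/">
```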
Oh, sorry. Another follow-up question: I heard that it's actually not a good idea to do browser detection, according to Google's guidelines. So should we just not do it on the root domain either? For example, if the root domain just has English content, should we redirect the root domain to en-US no matter what, or maybe just use the root domain itself as the en-US version instead of redirecting it, so we don't have to do that detection anymore?

I think that's up to you. From our point of view, if you have one page where you're doing this kind of browser or country detection, that's fine, as long as we can really access the other versions. That's kind of the tricky part. Sometimes we've seen people do this on the en-US version as well — they also check for browser and location there and then redirect — which sometimes results in Googlebot never finding, for example, the French version. Googlebot crawls from the US, so it gets redirected to the English version, and then we would never be able to index the French version. But if you only do this kind of detection on the root URL, where there's actually no content, then that's not a problem for us.

I see. So in this case, we detect the browser when people go to the root domain. And the root domain has a lot of backlinks, right? But it redirects to different places according to the browser or IP location. How does the PageRank work in this case, then?

So it's tricky. Usually what would happen is we would see the redirect from the root URL to the English version, and that would be the one that we would use for the PageRank calculation. But with hreflang, we can swap out the different URLs. So usually that just works out; it's not that you have to do anything special with regards to PageRank in a case like that.

OK, thank you so much.

Sure.

Hi, Mr. Mueller.

Hi.

I'm Vahan from Search Engine, Germany. So we were experimenting with the Google Newsstand app and built the sections of our website by submitting either an RSS feed or a web location. On certain categories, we don't have AMP, and we have to submit the web location. But when I open the Google News app on the phone and open that section in the app, I've found that articles are not sorted by freshness; they're sorted by something like relevancy. The first articles I see are articles from March, and the freshest articles in that case are pushed to the bottom. But sometimes I see fresh ones first. So is this something that was done intentionally, or is it kind of a bug?

I have no idea. I don't know how the Google News app deals with those kinds of things. Intuitively — I mean, I only see this in Discover when I look at Discover — but there, I feel it's something where we try to understand what the relevance is, and sometimes the newest stuff is just not the most relevant. But I have no idea how that's handled on the Google News side. So it's probably worth checking with the Google News team. I believe in the News Publisher Help Center there is a contact form that you can use, where you should be able to reach someone from the Google News team, and maybe they can help you with this. I mean, it's a bit of a tricky question, because it's almost like a ranking question.
But especially if you find something that looks to you more like a bug rather than a misunderstanding, then that's something I would let them know about.

Thank you.

Thank you. Sure, Mo.

I've seen that as well and have given the News team feedback about it. It used to also be that on desktop you only had articles on Google News from the last 30 days, and now sometimes they'll be months old — it's like it's old news sometimes on Google News. When we did use a feed as a section source, it did pull the order correctly by freshness. But when we switched to a web location — and I want to add that we have ItemList schema implemented on our category pages, so it states the order of the articles through that schema as well, which can help Google understand the order — the web location behaved differently.

Yeah, in the Publisher Center, the web location does not seem to work as reliably as crawling it from a feed.

Yeah, if you're generating the articles from the feed, it's much more reliable.

Yes, but the problem there is that if we generate non-AMP articles from the feed, they open inside the News app. But we want traffic to come to the website, and for that, you have to have AMP. We have AMP, but on certain categories we don't have AMP. So it is a problem.

Oh, man, Chris, I think we need to do a special session just with you for all of the news publishers. No, no, this is perfect. People always come in with these kinds of news publisher questions and I have no idea, and you're one of the people with the most experience in this. So cool to have you here.

What's the status of Google's decision on mobile-first indexing after September — any delay due to COVID-19?

So far, we're still seeing lots of sites shifting over and getting ready for mobile-first indexing. We haven't completely made up our mind on whether we will stick with this date, but we want to give it a little bit more time to settle down and see how things work out. My guess is that within the next month or so, we'll make that call and go one way or the other. It's also really useful to have feedback from any of you who are running into issues around this, where you're seeing that your development team just isn't available to make these kinds of changes, or anything like that. The more feedback we have, the easier it is to make a decision on this. Because otherwise, we just see sites being improved, and we think, oh, well, people are still doing their good work getting things ready, so maybe we don't need to change the date. But send us feedback.

Question about cloaking: the case we have is a site with millions of URLs that is growing, and we want to manage crawl budget efficiently. Oh, I think this is the parameter question that we looked at in the beginning.

Yes, sir. That's it.

OK. Is there anything more that you have that we should cover in that regard?

Yes, sir. Just to make a statement: this would technically be called cloaking, in terms of the definition, right?

Yeah. I think it's tricky with just that name, because there are so many different ways that a website can be dynamic and subtly different. For us, what's really problematic when it comes to cloaking is when the content is significantly different. So when, for example, you have a web page about cars, and when Googlebot crawls it, it's a web page about pharmaceuticals, then that's really hard for us, and that's something where the web spam team might get involved.
On the other hand, if you're just adding parameters to individual URLs, that's something we tend not to worry too much about.

But generally, you would probably advise not doing it and finding a better way of solving it?

Yeah, that's generally what we advise anyway, just because it makes all of the maintenance and debugging so much easier. It's very common that we run across situations where maybe someone from the indexing team will contact us and say, please contact this website and tell them to stop cloaking. And then they give us details, which basically show the site is serving Googlebot something broken and serving users something that works, and they're shooting themselves in the foot. They don't realize it, because when they look at the website themselves, it looks fine — but when Googlebot looks at it, it looks broken. So it's very easy to break things in subtle ways. If you can stick to one version, it just makes everything so much easier.

Got it. Thank you so much, John.

Hey, John, I have a question that's related to some unknown traffic that we're receiving. I don't know if it's coming from a spam bot, but when checking Analytics, the traffic is coming from California, and we're based out of India. I've also checked the source — it's coming from Facebook — and the average session duration is less than 0.2 seconds. I even checked the IP log data, but I'm not able to arrive at a solution. So is there anything that I've missed, or can you suggest a solution?

I don't know. So the IP address comes from Facebook, or the referrer comes from Facebook, or what are you seeing?

The IP log data that I've taken from our website. We haven't checked anything on Facebook itself.

OK. I don't know. So in general, if you're seeing something like this where it's not coming from Google's crawling, and it looks like bot traffic, or it looks like traffic that is irrelevant for your website in general, then from our point of view — purely from the search side — you're welcome to block that traffic. What matters to us is that normal users, when we send them to your website, see a normal experience. If you're blocking traffic that you don't want to have access to your website, that's essentially up to you, between you and those users that you're blocking. And if you feel that these requests are not actually made by users at all, and they don't provide any value for your website, then maybe that's something you could just block.

Is there any way that I can block them? Because I'm not able to get the IP location. So how would I block them?

That depends on your server and on the setup that you have. Sometimes you can check the server logs. Sometimes you can just block an IP range directly on the server and say, don't respond to these IP addresses. It might also be the case that your website is set up in a way where you don't easily have access to that information and you can't easily block things. And sometimes a workaround might be to use something like a content delivery network in between, which gives you a little bit more functionality in that regard. So you'd need to check with the technical people working on your website and work with them to find ways to block that traffic.

OK.
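As one illustration of the "block an IP range directly on the server" option — a hedged sketch only, using nginx and a documentation-reserved IP range as a stand-in for whatever range actually shows up in your logs:

```nginx
# Inside the relevant server or location block:
deny 203.0.113.0/24;   # refuse the unwanted range
allow all;             # everyone else is served normally
```

An equivalent rule can usually be set in Apache, in a CDN, or in a firewall — wherever the traffic is easiest to filter for your setup.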
Hi, John.

Hi.

I have two questions for you today. The first is actually about Ajax calls, so I was happy to see that Martin is here.

Hi.

I'm not actually a dev, so I might get some of the details a little muddled. But we have an infinite scroll implementation that uses an Ajax call to fetch more articles — it's for a category page of articles. And the endpoint for the Ajax call is like a URL, I guess, that doesn't exist as a page; it just responds with a JSON response. Those URLs are being indexed by Google, so they were appearing in Search Console. Initially, we had a bug where, when there was no data — so for category pages that didn't have enough content, let's say — they were producing a 500 error instead of the JSON response. That's when we noticed that these were being picked up by Google. The 500 errors are fixed now, but the URLs are still being indexed. So I don't know what the best way to handle these URLs is, or if we should even do anything with them. Would it be best to just let them be indexed? Because I'm not sure how that affects the rendering of the page. If we let them be indexed, is it the case that Google can then see the additional articles that are being loaded with the call? Or should we try to prevent those URLs from being indexed and crawled at all, for the sake of crawl budget or whatever? Yeah, I'm just not really sure what the best way to handle it is.

OK, so I think there are probably two aspects that are important there. On the one hand, the rendering, which is something where I would not block them in a way that would block rendering, just so that you really have your pages rendered completely and, as much as possible, we can get to all of your content. So don't block those JSON requests with something like robots.txt, because that would prevent us from actually loading the content. That's the one thing. The other thing is with regards to indexing, where one way that you could block those from being indexed is to use an X-Robots-Tag HTTP header, which is something that you specify in the response on the server side, where you say, for this file, here's the content, but please don't index it. That's something you could do from a technical point of view.

Sorry, if we go that route, will it not noindex the main page?

No.

OK. That's what people were talking about, so I was a little unclear about that. Someone said they did that with a robots noindex in the X-Robots-Tag header, and it prevented the parent page from being indexed.

No, that shouldn't be happening. If you see something like that happening, that would be a bug — that would be something we'd want to fix. So that definitely shouldn't be happening. The other thing that's worth mentioning here is that it probably doesn't matter if these URLs are indexed, because most likely the content in those JSON files is content that you have in the rendered version of your page already, and the rendered version of your page is a normal HTML page. We can understand that page really well. So if someone searches for words in that content, we will show your HTML page; we won't show the JSON file. Technically, it'll be indexed, and theoretically, you could do a query and make it appear in search, but for all practical purposes, it doesn't matter at all. So I'd say you can use the X-Robots-Tag to block it from being indexed, but it's not critical. It's a nice way to clean things up, but it's not going to make or break anything for your website.
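What that might look like in practice, sketched as a raw HTTP exchange — the endpoint path is made up, and how you set the header depends on your server or framework:

```http
GET /api/articles?category=news&page=2 HTTP/1.1
Host: example.com

HTTP/1.1 200 OK
Content-Type: application/json
X-Robots-Tag: noindex

{"articles": []}
```

The header only applies to the response it's sent on, so the HTML page that makes the Ajax call is unaffected, in line with what John describes above.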
And also on crawl budget: it doesn't make a difference for crawl budget, because we have to make the request anyway. So if we index it, put it in the cache, and then use it from the cache when you make the Ajax call, it doesn't make a difference. If you noindex it, we would still have to fetch it when the page makes the Ajax request. That's why you should not put it in robots.txt — because if you put it in robots.txt and disallow us from crawling it, then we can't fetch that content anymore, and then we're not seeing the content in the JSON file. So yeah, as long as it's not ranking for anything that you care about, it doesn't make a difference.

I have a question on the crawl budget. Can I?

Sure.

So we are a news publisher, and we're filing 350 stories every day. And suddenly some big news comes along and we file 500 to 1,000 stories on that day. So after the 350th story, will the rest still be crawlable on the same day or not? Or what is the best practice for that?

Probably it doesn't matter. If you look at your server logs — I don't know your website, but I'm guessing you'll see that we make multiple thousands of requests to your website every day. So whether you have 300 articles or 500 articles, that difference is so small compared to the several thousands of pages that we request every day. From that point of view, I don't think it would make a big difference. I think it would be different if you went from 300 articles on one day to 30,000 articles on the next day — that's a significant difference, and then we might have trouble keeping up. But just from a practical point of view, I think that would be hard to do anyway.

OK, I have one more question. We are a news publisher, and we have a brand name. If people search for our brand name, then suddenly we see another website's news article appearing on top of our brand in Google. If I search for my brand name, a news article from a third-party website, which is also a publisher — we're both publishers — appears above our brand name. And we are bigger than that website. So it's very difficult to know how to handle this, because we are the bigger brand — we're double the size of the publisher that ranks above us. I brought this up with you before and you responded, but I didn't feel my question was answered there. So this is the best platform to ask you and get your help.

Yeah, so I think it would be useful to have some screenshots of what exactly you're seeing, because it's hard for me to understand. But it's also something where we would not make any kind of manual adjustments for those kinds of cases. It's very possible that in some situations, if you search for a company name, a news article about that company will rank above it. That's certainly possible. I think seeing that regularly, or always, with different news articles about a company would be kind of awkward, especially if it's a reasonably well-known company. But seeing it once or twice, I think, would be kind of expected.

It started after the COVID-19 coverage. And it's only the COVID-19 live blog. That publisher publishes a live blog on COVID, and that live blog appears above ours, even on our brand name. We're only searching for our brand name, and their article is above ours.

OK, that sounds like something where it would be useful to get some examples.
If you can post some details in the chat, or if you can send them to me on Twitter, then I'd love to take a look at that and pass it on.

That would be great. Yeah, I will. Cool, thank you.

Sure.

John, can I ask you a quick question?

Sure.

What's the difference between noreferrer and nofollow? Because I sometimes see both when I look at the code of different sites. And is there one that's preferable in certain circumstances?

So they do different things. The nofollow is essentially something that we use — or that search engines use — to not pass signals from one URL to the next one with a link. And the noreferrer is something that browsers use to block the referrer from being shown to the other page. Usually, if you click on a link that goes to a different website, the browser would say, this user came from that referring website, and the noreferrer attribute basically just says, don't say anything about where you came from. Usually, that makes sense for cases where maybe the URL itself is perhaps private data. For example, you could imagine a social network where the URLs have a long ID, and if you go to that ID directly, you can see the content, even if it wasn't meant for you directly. So in cases like that, you might not want that kind of URL passed along when someone clicks on a link. That's something where some people choose to do that. I think some CMSs have some defaults as well with regards to how they mark up different links; I don't know if noreferrer is one of those things that people just use. There's also noopener, which is another attribute that a lot of CMSs use. I forget what that one does — I don't know if it prevents it from opening in a new tab or something like that. I don't know.

OK, thank you.

Sure.
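To make the distinction concrete, a quick sketch of how those attributes look in markup — example.com is just a placeholder, and for the record, noopener stops the opened page from reaching back to the opening tab via window.opener:

```html
<!-- nofollow: a hint to search engines not to pass ranking signals through this link -->
<a href="https://example.com/some-page" rel="nofollow">Example link</a>

<!-- noreferrer: the browser omits the Referer header, so the destination can't see where the click came from -->
<a href="https://example.com/some-page" rel="noreferrer">Example link</a>

<!-- noopener: often used with target="_blank"; the new tab can't access window.opener -->
<a href="https://example.com/some-page" target="_blank" rel="noopener">Example link</a>

<!-- the values can be combined in one rel attribute -->
<a href="https://example.com/some-page" target="_blank" rel="nofollow noopener noreferrer">Example link</a>
```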
Hey, I have another question. This is just a question about faceted navigation. On a big website that I purchased, what I did was block all of the faceted navigation in robots.txt. But I recently read on the Webmaster blog that that's not necessarily the best approach, because some of the higher levels of faceted navigation might be useful to have indexed, and blocking them also doesn't consolidate the authority of those pages. So if I change the robots.txt rule right now, there will be a bunch of URLs suddenly becoming crawlable. How hard will that hit my crawl budget? And should I even care about crawl budget, since Google is so powerful nowadays? Also, should I only let the higher-level parameters be crawled, or should I just open all of them up to be crawled and then canonicalize them?

Yeah, so I think with crawl budget in general, most sites don't need to worry about this. Just to give an order of magnitude: if it's less than tens of millions of pages, you probably don't need to worry about crawl budget. If it's more than that, then it really is something where, if you're suddenly duplicating or triplicating all of the URLs on your website, that makes it a lot harder for us to crawl things. But if it's a couple hundred thousand pages and suddenly there are twice as many, then we can usually just deal with that.

With regards to faceted navigation, if you were to unblock that in robots.txt, I think you would see a rise in the amount of crawling that we do, at least temporarily, until we figure out how these parameters actually work. And depending on the site, you might see that affecting the rest of the crawling as well for a period of time — I'd say, I don't know, maybe a couple of weeks, something around that. So if you want to make that kind of significant change, maybe pick a time when crawling from search is not as critical for your website. If you have a season where there's less traffic from search, where you have fewer new products and fewer new articles, then that would be a good time to change things around. Whereas if everything is on fire and your website is already running at its limits, then that's probably a bad time to make significant changes like that.

So would you say that a better way to deal with faceted navigation, rather than blocking it in robots.txt, is to canonicalize?

Yeah, to canonicalize or to use noindex — that's essentially what we would recommend.

OK, thank you so much.

Sure. All right, I need to jump off to another meeting. It's been great having you all here. Thank you all for joining in; it's so cool to have so many people here.

John, one more question, please. Thank you. One really quick one.

One really quick one.

We have a news website called lullantop.com. Earlier, that website was appearing in Google Top Stories, but for the last three to four months, the website has not been appearing in Top Stories for any of the keywords. So I just want to know what we have to do for that and what the next step is. I didn't get any information regarding how to come back into Top Stories. So please help.

I don't have any kind of magic tricks for Top Stories. It's an organic feature, and we use the normal ranking algorithms there. So what I would recommend doing in a case like this is going to the Webmaster Help Forum and getting as much input from other people as possible, and thinking about what makes sense and what doesn't make sense for your particular case. It's not that I have something specific I can do here.

OK.

Cool. All right. I really need to jump off. Thank you all for sticking around. I have the next ones lined up for Friday, and Thursday if you speak German, so we can stay in touch then. Thank you. Stay safe.

Thank you, John. Thank you, John.