OK, welcome, everyone, to today's Google Webmaster Central Office Hours. My name is John Mueller. I'm a webmaster trends analyst at Google in Switzerland, and part of what I do are these office hours hangouts with webmasters and publishers. Looks like a bunch of you made it here already. There are also a bunch of questions that were submitted before the hangout. But maybe, as always, if any of you want to get started with a question, feel free to jump on in now. Or maybe it's too early and everybody needs some coffee first. That's fine, too. OK, I'll just run through some of the submitted questions. And as always, if you have any comments or questions around these questions and answers, feel free to jump on in. If anything comes to mind over time as we go through these, you're also welcome to jump in and ask new questions as well. And we'll have time towards the end for anything more specific from you as well.

OK, so we're starting off with the most fun question. Whoops, someone has a bit of echo. Let me just mute you. Last week, Gary said that pruning content won't work, and this week you said... well, what's up with that? I think we discussed this pretty much in detail in the last hangout, so I would just go back and look at that. I think there are cases to be made for both sides there. And definitely, if you can improve your content in any way, then that's what I would strongly recommend doing. If it's something that you almost regret having put up on your website in the end, then obviously the clean way to deal with that is to take it down or to remove that content. So that's essentially the short version. But there's a bit more discussion around that in the previous hangout, so I'd check that out.

How does Google treat links to images on my site? Let's say I see people linking to images on my site. How does Google treat them? Is that a normal link? Because it's only a file, and there are no other links pointing from that image file deeper into my website.

We see these essentially as normal links, and we use them to understand that this image is associated with the page that's linking to the image, in the sense that for image search, we could theoretically show that image in the search results together with some context from the linking page. So you can have images show up in image search either by embedding those images in a normal HTML page, which is kind of the normal way to do it, with the image tag and just specifying the image URL. But you can also link to images from an HTML page, and we'll also associate that image with that HTML page. And then we can use that pair for image search. And if someone is searching for some content that's kind of like the context or more information around that link to the image, then maybe we would show that linking page as the landing page for that specific image. So that's essentially the primary connection there. It's really mostly around image search.

What is the best way to handle unavoidable expired content like job postings? Should we 301 them to a relevant available job, or 404 them with a custom 404 page? Should we keep expired pages as a soft 404? Or is there something else that we should be doing?

So maybe first of all, you kind of say "unavoidable expired content" as if that's a bad thing. I think it's perfectly natural on the web that some content exists for a short period of time, and afterwards it's not relevant anymore and you remove it. That's kind of how classified sites work, any kind of listings that are temporarily available. That's completely normal, and it's not something that I would say you should see as something bad. I think, kind of like with the first question here, removing content when it's no longer relevant is kind of the right approach. So obviously, if you have something that you're replacing, like if you have an old listing and you're replacing it with a new listing and it's really the same thing, then 301 redirecting is the right approach, because you're replacing one thing with something that's essentially the same. If you don't have a real replacement for that, then doing something along the lines of a 404 or noindex is probably the right approach.

I've seen some sites kind of experiment with this, in the sense that they try to find the right balance to make it possible for users to find some expired content, but not to keep that forever. So one approach might be to say, after this listing is expired, I'll keep it online as a normal 200 HTML page, together with text saying that this listing is no longer valid. And after maybe a month, take that page down with a noindex or a 404 so that it's completely removed. That way, during that short time where people might be interested in that expired listing and might want a little bit of information about what they just missed, they'd be able to find that in search. And in the long run, you're still cleaning things up rather than building up more and more URLs on your website that aren't really relevant to anyone. So lots of variations there. I'd say there is no absolute answer to this. It really depends on your site, the kind of content that you have on your site, how relevant it is when it's expired or just shortly expired, and what you want to do with that in the long run. So those are kind of some of the things that you could be looking at there.
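To make that concrete, here is a rough sketch of the expired-listings pattern described above. It assumes a small Flask app, a hypothetical load_job() helper and data layout, and an arbitrary 30-day grace period; it's one way to express the idea, not a prescribed setup.

```python
# Rough sketch (hypothetical data model): expired job listings get a 301 if there
# is a genuine replacement, stay live with an "expired" notice for a short grace
# period, and are then removed with a 404/410.
from datetime import datetime, timedelta
from flask import Flask, abort, redirect

app = Flask(__name__)
GRACE_PERIOD = timedelta(days=30)   # arbitrary window; pick what fits your listings
JOBS = {}                           # stand-in for a real datastore

def load_job(job_id):
    """Hypothetical data-access helper; returns None for unknown listings."""
    return JOBS.get(job_id)

@app.route("/jobs/<job_id>")
def job_listing(job_id):
    job = load_job(job_id)
    if job is None:
        abort(404)
    if job.get("replacement_url"):            # an equivalent listing replaced it
        return redirect(job["replacement_url"], code=301)
    if job.get("expired_at") is None:         # still active
        return job["html"], 200
    if datetime.utcnow() - job["expired_at"] < GRACE_PERIOD:
        # Keep it findable for a short while, clearly marked as expired.
        return "<p>This listing has expired.</p>" + job["html"], 200
    abort(410)                                # long expired: gone for good
```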
Hi, John. Yes. Can I jump in with a question, please? Sure. OK. So currently, we are trying to do a strategy with microsites on our website. We essentially have 11 million pages on Google already, and we are like a directory in Australia. And we are trying to gift microsites to every advertiser on our directory, and we are trying to increase the advertisers' interest in the website so that they could do stuff on their microsite. So what are the things... I mean, first of all, would you suggest a microsite? That's the first question. Second thing is, what are the things that I should watch out for? I mean, I've read some blogs by Vanessa and some videos by other people. So I just wanted to confirm what your thoughts are on this one.

I think it can be tricky with microsites because it often ends up looking like a collection of doorway pages. So that's one thing I'd kind of watch out for, and really think about what you want to achieve in the long run. And if these are sites that you want to build up in the long run, then maybe that's the right approach there. But if you're just creating a bunch of sites for a ton of different businesses that really don't have any value for the long run, that are just driving traffic to another site, then that's something I'd try to avoid doing for search. So one thing you could do is say you're using these as landing pages for advertising and just noindex them in a case like that. That's something that I think can make a lot of sense.
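As a small illustration of that last point, an advertising-only landing page can be served normally while asking search engines not to index it. This is a hedged sketch assuming a Flask app; the /lp/ URL prefix and the page content are made up.

```python
from flask import Flask, make_response
from markupsafe import escape

app = Flask(__name__)

@app.route("/lp/<campaign>")
def ad_landing_page(campaign):
    # Served to visitors who click the ad as usual, but kept out of the search index.
    response = make_response(f"<h1>Landing page for {escape(campaign)}</h1>")
    response.headers["X-Robots-Tag"] = "noindex"
    return response
```

A meta robots noindex tag in the page's HTML would accomplish the same thing.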
But if you're really kind of just creating a ton of different microsites for all kinds of different businesses, then I could imagine the web spam team looking at that and saying, this is essentially a collection of doorway pages; we're just going to remove them all. Similarly, I could imagine our algorithms looking at that and saying, this is essentially kind of the same thing, also a collection of doorway pages, and we should just be demoting them all in search because they have no value of their own, because the real business actually has their own website, for example. So that's kind of what I would be looking at there. I'd be reluctant to just say, OK, we're just going to create thousands of microsites for our clients or the businesses that we interact with, and we don't really care about everything else, because that might be a lot of work with very little actual gain at the end.

OK. And are we also at... I'm sorry, can I ask another question related to this? Sure. OK. So are we also at risk of kind of losing domain authority and the clicks if we have microsites? That's number one. Second thing is, I imagine since we are doing microsites for thousands of advertisers, if they start putting similar content on the microsites, then are we also at risk of duplicating the content in those areas?

So I think domain authority is a concept that various SEO tools have put together, so I wouldn't worry too much about that, because Google doesn't use domain authority like that. But you are competing with yourself. If you have a normal website and you have a bunch of microsites, then all of these are competing with your content, with your business, essentially. So that's something where you're spreading out a lot of really thin value across all of these microsites instead of concentrating it on the URLs that you really do want to have visible in search. So that's kind of another one of those downsides, where you're diluting the value that you have on your website across all of these different microsites rather than concentrating it and making it really strong on your main site. OK, John, thank you. Sure.

All right, let me run through some of the submitted questions. And I promise I'll get back to you for more questions towards the end. We'll definitely have more time.

Could Google measure my dwell time if I didn't click the Back button, but closed the page's browser tab, because I opened the result up in a new tab? So I think that kind of goes into the theme of "Google uses dwell time for ranking." And for us, we use these kinds of signals primarily in understanding different algorithms, to figure out which algorithms are working well and which algorithms are not working well. And when looking at it with that kind of broader view, these kinds of edge cases with regards to how you interact with a page, like you open it and then you close the whole browser, or you open the page and then you turn off your computer without the browser being able to respond to anything, all of these kinds of edge cases usually even out when we look at it in a bigger picture. So it's not so much that you really need to worry about what exactly Google is measuring there, because we're trying to, essentially, A/B test our algorithms. And for that, we do expect a lot of noise, and we kind of have to live with that.

Should I have privacy policy and terms of service pages set to index or noindex? I would definitely leave those indexable. That's normal content, content that people might want to find in search.
So I would leave those indexable. I don't see any reason why you'd noindex this type of content, because people might be searching for it, and they expect to find something on your site around that.

Some web design agencies are keeping expired domains from previous clients without redirecting them to a site. As a result, multiple duplicate URLs are live. Are those considered doorway pages? I suppose a link from those satellite domains can be dangerous.

So first of all, I think it's a terrible practice for any kind of agency to keep a domain name hostage, essentially, from a client. I think if you're a client interacting with any agency and they have this policy that the domain name belongs to the agency and not to the client, you should probably go to a different agency, because this is terrible practice. If you create content for your website, you need to be able to keep that domain name. That's where you want to have that stable place on the web, regardless of who's currently working on the website. So I think that's the first thing to mention here: if you're putting content out on your own domain name, then make sure that it's really your domain name, and not something that you have to pay an extra enormous fee to an agency for in order to actually get it in the end. And if there are situations where an agency is actually getting a domain name for you, you might want to take the extra step and say, OK, I'll get the domain name and let the agency just set up the hosting and do all of that. That way you're also sure that things like email continue to work, because it's a terrible thing when all of your clients are emailing a domain name that essentially is not yours, and that at some point might get turned off, might expire, or might be kept on a mail server running somewhere where the emails arrive but are not delivered to you as a business. So I don't know. I find that a terrible practice.

With regards to these kinds of older versions of a website being considered doorway pages, I generally wouldn't worry about that. That's not something that the web spam team would look at and say, oh, they're trying to manipulate the search results by creating lots of different sites, because usually it'll be clear that this is like an old site and here's the current site. And that's generally less of an issue there. With regards to links from those old domains, I think that's also irrelevant.

Is a right click, "open in a new window," seen as a normal click? Yes, we do track that, and we do show that as a click in Search Console, in Search Analytics. So regardless of how you open a URL, we do try to recognize and track that as a click. Sometimes you can see that in the search results when you try to copy and paste the URL from the search result with a right click, and you see this long tracking URL associated with it. Sometimes we can do that with, I believe, the ping feature in HTML5, where we ping the tracking URL to make sure that we record that click and show it to you in Search Console.

What are noopener and noreferrer links? Do these carry any SEO value? Those are completely normal links from an SEO point of view. This is essentially only a practice with regards to forwarding a referrer to the site that is being linked to. And I believe WordPress, or one of the other CMS systems, at some point started adding these attributes to links in their CMS. And that's essentially completely normal. That's something that they can do. I'm sure they have really good reasons to do this. So at least from an SEO point of view, I wouldn't worry about this. If this kind of attribute makes sense for you, provides additional value or security for you in some sense, then go ahead and leave it like that. If you don't know what they do and WordPress is setting this up automatically for you, I'd double-check with WordPress. But my sense is the folks from WordPress are really smart people, and they probably have a good reason to do something like this if they're doing it by default.
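For reference, this is roughly what such a link looks like in the markup (illustrative only; example.com is a placeholder, and the snippet is stored here as a plain string):

```python
# rel="noopener" keeps the opened page from reaching back via window.opener;
# rel="noreferrer" additionally suppresses the Referer header. Neither changes
# how the link is treated for search.
EXTERNAL_LINK = (
    '<a href="https://example.com/" target="_blank" '
    'rel="noopener noreferrer">Example</a>'
)
```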
We switched our site from HTTP to HTTPS but forgot to update our sitemap file. As a result, all of our pages dropped out of the index. We fixed the issue, but we still have only two pages indexed, and it's been weeks now.

So I think, first of all, if you forget to update your sitemap file with a move to HTTPS, your pages should not drop out of the index. I suspect if they did drop out of the index, then there's something significantly broken somewhere with regards to the technical setup of these pages, and not with regards to the migration from HTTP to HTTPS for search. So in the worst case, what can happen with this mixed migration between HTTP and HTTPS is that we have some URLs indexed as HTTP and some URLs indexed as HTTPS. It's not that these URLs drop out of the index completely. It's just that we have kind of the old version, because the sitemap file says the old version, and we also have the new version, which we found as well, and there's some competition between these two as to which one we actually use. But it's not that the ranking would change. It's essentially just switching the URLs. So if you're seeing things drop out of the index completely, I would look for other sources of issues, like maybe noindex tags that you accidentally activated, something in the robots.txt file that's accidentally in there, or maybe URL removals that you accidentally put into place. With URL removals in Search Console, you need to keep in mind that they remove both the HTTP and HTTPS versions, as well as the www and non-www versions. So that might be something to look out for. If you can't figure out why these pages dropped, I recommend dropping by the Webmaster Help Forum. The folks there are really good at recognizing these kinds of technical issues and can probably show you right away where these pages are being dropped and why they're being dropped.

What are your thoughts on backlinks from websites with a high domain authority that people pay for? So as I mentioned before, we don't really use the domain authority that various SEO tools use. I think it can be useful information from an SEO point of view, to get a rough overview of how these tools see those sites. But you need to take it with a grain of salt and understand that Google is not looking up the domain authority based on what these SEO tools create and not using that like that in Search. But again, it might be useful to understand, for a new website, if you're just getting started with a client, how they're currently standing and how they're connected with other sites. So I wouldn't say it's a bad thing, but take those metrics with a grain of salt. With regards to backlinks from those sites: paid backlinks are against our Webmaster Guidelines. So that's something where it doesn't really matter if it's a high domain authority or low domain authority, or just any other website where you don't know the domain authority.
If these are paid links that are passing PageRank, then that would be against our Webmaster Guidelines. That would be something that the web spam team might manually take action on, or that our algorithms might look at and say, this is a bit problematic; maybe we need to just ignore those links that are set up like that.

Thank you. Yeah, that was a question somewhere, or a comment. No? OK. Could I ask? I wasn't asking the question, but can you hear me? OK. Yes. OK, so my question has to do with bringing it back to microsites. I was wondering about... I've got a dentist who contacted me. He says, oh, my SEO company says we'll create you a domain name for "cosmetic dentist your city," and then "Invisalign your city" and "dental implants your city," just going with EMDs across the board, and they're going to use supposedly unique content, but maybe duplicate content. But duplicate content aside, what do you think about that strategy? I mean, obviously, you want one domain. You want it to be whole. You want to have backlinks going to it and have a brand. But what do you think about that strategy when, in lower competition markets, the argument is it works, and it's easier than building a brand and building backlinks and actually doing things the right way?

I don't know if you could really say it works that easily. But it's something where I think you're kind of playing the game of doing some things for a temporary gain that might be more problematic in the long run. So with all of these kinds of microsites that you're creating, someone from the web spam team might look at that and say, well, these are all microsites, we'll just take them all out. On the one hand, that might be something that affects your main site as well, because it's kind of a part of this collection of microsites. So that's one thing I'd worry about there. The other thing is that by spending so much time on creating all of these individual sites, you're kind of neglecting your main site. And that's something where it might happen that other people compete with you in a stronger way in some of these markets. And then suddenly your main site, because it's just kind of OK and not as great as it could be if you had spent more time actually working on it, suddenly you have stronger competition that's actually outperforming you in search, because they're more focused on their own site. So that's kind of what I would look at there. And I do know we have some algorithms that explicitly look for this situation where we see lower quality EMD sites, and we try to figure out how they're connected and say, well, all of these are essentially the same thing; we're just going to demote them in search and pick maybe one of these to show in the search results. So in the end, what might happen is you spend so much time creating all of these microsites, and only one of them is actually shown in search, which isn't really what you're trying to do there.

The thing where I could see microsites being useful is if you're using them for advertising. So if you're not explicitly doing them for search, but rather you're using them maybe for print advertising landing pages, where you want to have a short and unique URL that points to a specific part of the services or the content that you have. And for that, I definitely see a good use case to have these kinds of EMD-style domain names.
But again, for search, I'd really recommend trying to focus on one main site and just bringing all of that content into that one site and making it as strong as possible. That makes sense.

There is one more question, totally unrelated, that is just a total buzz in the dental industry. Everyone is wondering, where have my Google reviews gone? And I don't know if this is the appropriate forum for that; I don't know if it's another department at Google. But there are a lot of theories going around, and some people are wondering if it has anything to do with automated tools that allow you to text your patient or your customer, or email them. And it's a real review, but they're given an easy way straight to Google reviews. And some people are wondering if it has something to do with using those programs.

I don't know how these reviews are handled. I do know there's a lot of spam that's happening in the review space, so I'm sure there probably is a lot of work happening on Google's side to try to figure out which reviews are really relevant and which reviews are really worth showing on these kinds of location pages. But I don't really know what specifically is happening with dentist reviews. I believe there is, on the one hand, a help forum specifically for Google My Business. I'd post there and maybe give some information, some sample sites, queries where you're seeing this has happened, maybe screenshots if you have anything like that, so that the people there can give you more specific advice. Because just going there and saying "dentist reviews are missing" makes it really hard for people to try to figure out what exactly might be flagged there or what exactly might have been picked up there.

Thank you. Yeah, it's just... it's being treated like a pandemic almost. It's like, oh, I got 500 reviews and now I've got 400. And it's all happening at once. So I guess it probably has something to do with the filtering. The algorithm is maturing, I guess. Yeah, I also don't know what the policies are there with regards to how you're allowed to promote reviews to your clients. I don't really know. So I'd really double-check with them. Thank you.

OK, what is spam score? And does it really matter for penalization risk? Are there trusted tools to check the spam score while building backlinks? So I don't know what the spam score is. It sounds like something maybe from an SEO tool somewhere, so I'm not really sure what specifically you're looking at there. I'm not aware of any kind of external tool that provides a spam score for a page. The other part there, "while building backlinks," kind of sounds like you're going off and creating backlinks yourself, which might be kind of problematic. So I'd be cautious there when it comes to just focusing on a specific metric and saying, well, I will just place my links on all sites that have a low spam score, assuming that's something relevant. If you're placing those links yourself, then those are not really natural links. So that might be something to keep in mind there.

We have reCAPTCHA on our site, but we're unsure which Google account was used to set it up. Is there an easy way to find out from the key? As far as I know, the API keys are set up in a way that you can't track back to the email account that was used to set that up. So maybe it makes sense to just set up a new one and use the new one.
Otherwise, I don't know if we have any specific reCAPTCHA support available. Maybe on the Google Developers API Google Group there might be someone who can help you find out a little bit more. But in general, if you just have the API key and no other information, then I don't believe you can track back which Google account was used for that.

I've done a 301 redirect for a page for five months. When can I delete the redirect without it affecting my SEO score? So we don't have an SEO score either. In general, a 301 redirect is called a permanent redirect because we kind of expect it to be in place permanently. In general, we try to use this to recognize which URLs have moved on to other URLs. And for that, I generally, as a rule of thumb, recommend having that in place for at least a year. Especially if you're still seeing traffic to that old URL, either from search engines or from users directly, then I would definitely keep that up. If you don't see any traffic at all going through that redirect, then maybe it's a sign that search engines don't really care too much about that redirect, or users are not seeing that redirect anymore, and that's a sign that you can just drop it as well.

Page speed is extremely important to me. When using GTmetrix, one recommendation always comes back to haunt me, and it's regarding "leverage browser caching," specifically for the Google Analytics JavaScript. When are you guys going to fix that? Or should we just ignore it?

I think it's probably tricky there. On the one hand, I don't know exactly what GTmetrix is measuring there. I think it's good that you're using tools to find issues or places where you can improve the loading speed of your pages. But I would use those tools in an educated way, in the sense that you're using their output and interpreting it, and not just blindly following metrics that are in there. So that's maybe one thing to keep in mind: you need to really understand what these tools are telling you and be able to interpret them. The other thing is that I know a lot of the JavaScript that we serve is served in a dynamic way, in the sense that we try to recognize which user is actually accessing the JavaScript, and we serve a kind of locally optimized version of that JavaScript to those users. So what could happen is that some of these testing tools might be accessing the JavaScript file in a way that doesn't match what a normal user would be doing. And in a case like that, our servers might be defaulting back to a simpler setup, which might have different caching settings, different kinds of minified content, different kinds of compression setups. So it might be that these testing tools are not actually showing what a normal user would see. So that's kind of another thing to keep in mind there.

Let me double-check; I think there are some questions in the chat. OK, "I've redirected the page with a 301 redirect. When is the best time to remove that?" I think we just covered that. "How do we check if a domain name we're getting isn't tainted?" Tainted. I think that's tricky to test for in general. If there's a domain name that we remove completely for web spam reasons, then that would be because of something that was hosted on that domain name.
So if you go in there and create new content on that domain, and it has previously been removed from search completely because of maybe the old content on there, you can do a reconsideration request, and you essentially have that domain name back, just as it was before. The tricky part, I think, is when you have a domain name that was used before and has a lot of problematic links associated with it. That's something which might make it a bit harder for you to build on, in the sense that you have all of these connections by default, just by buying that property in kind of a bad part of the city, essentially, that's associated with your domain. And cleaning that up can sometimes be tricky, can sometimes take a lot of time. So that's something where you can use various external tools to look up the links to a domain name and make a judgment call there and see, was this domain name used in a reasonable way in the past, or was it used in a really problematic way? And if you see that it was used in a really problematic way, then you can still try to make a decision based on what you're actually trying to do there. If you're saying, well, this is a really awesome domain name and I really want to use this for marketing reasons, and there's so much value that we get out of this domain name outside of just search, then you might say, well, it's worth it to spend some time to clean this stuff up, or it's worth it to say, well, it might take a bit of time for this to actually be visible properly in search because everything kind of has to be built on top of this shaky foundation. But you might also say, I don't have time to focus on all of this cleanup stuff; maybe I'll just use a different domain name. So those are kind of the decisions that you can make there. And sometimes a really awesome domain name is something that you don't want to pass on just because it has a bit of a weird history, and maybe that's something you can clean up on the side while you're still using your previous domain name, and then switch over when you're sure that all of the old stuff has been cleaned up. Oh, and Jeff links to the spam score tool from Moz. OK, I didn't know that. Cool.

What's the best practice for moving sites to HTTPS, considering them as properties in Search Console? I would take a look at this in our help documentation. We have a lot of information on moving from HTTP to HTTPS, so instead of reiterating all of that, I'd recommend checking that out. It has all of the steps you need, essentially, also with regards to setting up Search Console.

What is mixed content? Mixed content generally refers to when the page itself is on HTTPS, but you have some content within that page that is not on HTTPS. So a common situation is you have the landing page on HTTPS and an image that you pull in from HTTP. And then the browser will warn you and say, actually, this is kind of secure, but you're leaking some information, because all requests that go to the HTTP URLs that are embedded are essentially unencrypted; they're in plain text. So that's what browsers would warn you about. That's why it makes sense to really look at the HTTPS pages and make sure that you don't have this kind of mixed content issue.

I have 25 categories on my e-commerce site. We have some products that are in different categories. Should I worry about that with regards to duplicate content? Is there a duplicate content penalty, essentially? Or will Google see this as thin content?
From our point of view, this is something that is completely natural on a website. We've had to deal with this for a really long time. It's more of a technical issue for us to deal with than a quality issue for you. There is definitely no duplicate content penalty with regards to that on a website. The only downside is we will index all of these different variations, so we'll need to crawl a bit more of the website. Normally, that additional crawling is no problem, especially when you're talking about one product being in a couple of different categories. I don't see that as exploding the amount of crawling that we have to do for your website. And with regards to thin content or not, that's also generally not an issue, because these pages by themselves are valuable as well. What you can do is use the rel canonical to pick one of these versions as your preferred version. That makes it a little bit easier for us to focus on that preferred version. And it's generally the best practice there, but it's not the case that we would penalize your website if you don't do it exactly like that. So no need to panic. There are some things that you can do to help clean that up.

There's a question here about Google News with regards to updating articles and changing URLs. I don't know how Google News handles that, so I'd recommend checking out the Google News Publisher Help Forum. There are some awesome folks active there who can probably give you more information. Also for Google News, there is, I believe, the ability to contact someone from Google News directly through the Help Center, so I'd check that out as well.

I noticed something: when we take a site to HTTPS, the nav bar links get updated automatically, but all the internal links, including footer links, still point to HTTP. Is there a negative impact? Could this hurt us? Is there an easy way to update all internal links besides manually?

So for updating all of the internal links besides manually, you'd have to look at your CMS to see if there's a way to do that. I know, for example, for WordPress, some people have fancy MySQL scripts that they use to automatically switch everything over. That might be an option, depending on your CMS. In general, what happens there is we kind of get conflicting signals. Usually, you will have enough signals to tell us that you want the HTTPS version indexed, and we can do that. But anytime we have different signals, or different URLs leading essentially to the same content, we try to make a decision on which one of these URLs to actually use. So the decision there is usually based on things like redirects: is there a redirect to one of these versions? What are the internal links like? Are they pointing at one of these versions or not? So in this case, the redirect would be going to HTTPS and the internal links to HTTP, so it's kind of this conflicting situation. Rel canonical, if you have that on the pages: if you have that on the HTTPS pages, then that's kind of an extra point for HTTPS. Sitemaps: whether these URLs are in the sitemap file or not; maybe they're already HTTPS as well. So a lot of these things are probably pointing at the HTTPS version, and we'll probably pick that. But I still recommend cleaning that up so that we can really clearly understand that everything is HTTPS. Also, from a user's point of view, if those links go to HTTP, then they have to go there and then get redirected back to the HTTPS version. So it's not awesome. It's not terrible, but it's something that you could probably clean up.
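If your CMS doesn't have anything built in, this is the general idea behind those scripts, sketched here for a site made of static HTML files; the document root and hostname are placeholders. A database-driven CMS like WordPress would need a search-and-replace against its database instead, ideally with a tool that understands serialized data.

```python
# Rough sketch: rewrite absolute internal links from HTTP to HTTPS in static files.
import pathlib

SITE_ROOT = pathlib.Path("public_html")       # assumed document root
OLD = "http://www.example.com/"                # placeholder hostname
NEW = "https://www.example.com/"

for page in SITE_ROOT.rglob("*.html"):
    html = page.read_text(encoding="utf-8")
    if OLD in html:
        page.write_text(html.replace(OLD, NEW), encoding="utf-8")
        print(f"updated {page}")
```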
Internal linking practices: the proposal is to write a script that automatically creates an in-body link, for the first instance of every word on every page, to the commercial page on that topic. To me, that sounds kind of spammy and bad.

Yeah, that does sound kind of weird. I think the main thing that I would find weird there is that, from a user point of view, this would look really strange if you have all of these kinds of internal links within the first sentence of every page. It doesn't seem very useful. With regards to crawling a site in general, if we're able to crawl your site normally, there is no need to do these kinds of internal linking games. If we can understand the context of individual URLs within your website, then that's really what we're looking for there. There is no need to artificially link every page with every other page on a website.

I see in Search Console that Google is crawling 200 to 300 pages per day. But once a month, or at least once every second month, there's one day where Google is crawling 3,000 to 4,000 pages, almost my entire website. Technically, what is happening there? Is this some kind of reconsideration from Google? How should I take that day?

I think this is just our algorithms having a weird day every now and then, so I don't see this as being anything problematic that you need to worry about. It essentially means that our algorithms are seeing that they can crawl a lot more from your website. And then maybe at some point, they're like, well, we haven't checked all of the pages on this website, and we could crawl a lot more, so maybe we'll just get that through as well. So I wouldn't see this as anything problematic. I suspect it's just algorithms doing weird things, which happens every now and then.

If we add pagination with rel next and rel previous, is it necessary to also add rel canonical to the page itself? No, it's not required. Sometimes it's a good practice; kind of up to you.

Hi, John. Yes. Hi, can you hear me? It's my first time here. I've just joined. Oh, fantastic. One question, if that's all right. It's about PageRank. So one thing that we've done is we've tried to limit the number of pages that we publish to Google by essentially leaving out some long-tail kind of pages. And the idea is to concentrate more of the PageRank that we get from external sites onto a smaller set of more important pages. So the question is, is this idea valid, or is it outdated? Thank you.

So it sounds a bit like PageRank sculpting, which is something that probably doesn't work in the way that most people externally think it works. So that's something where I'd be cautious about something like this, especially depending on the way that you set it up on your website. So if you're just not publishing at all some of the content that you have available, that's clearly up to you. If you prefer to have a smaller site, that's clearly up to you. What will usually happen there with a smaller site is we don't have a lot of content from the site, so we don't have a lot of different queries that we can guide there if it's really a smaller site. Whereas if you have a larger site that has a lot of good content, then people who are searching for some specific aspect might have a landing page on your website that's really relevant to that, and we could guide them there.
So that's something which I'd say is more a question of what your overall marketing strategy or online strategy is. Do you want to create a lot of content that covers a lot of ground? Or would you like to create a concentrated version, with the understanding that maybe some of these long-tail queries don't end up on your website?

However, there's no causal relationship between leaving out long-tail pages and then getting a better ranking signal for the remaining ones, right? Yeah, I think there is an element of dilution that's kind of involved there as well, where if you're artificially taking pages and saying, well, I'll take this one big page and split it off into 20 different smaller pages to cover more ground, then that's just diluting the content. Whereas if you're adding additional content, then that's just adding additional value to your website overall. So I wouldn't see that as something that would be problematic for the rest of the website. Thank you.

All right, let me just double-check to see that we have most of the questions covered. And we can go a bit longer as well.

Will using password protection on existing pages affect the pages' ranking? If you put a page behind authentication or behind a password, then obviously we can't really crawl and index that page, so that's a bit of a problem. I think the main situation where this kind of works is if your whole service is behind a password. So if you're an email service provider, then obviously that login page is what you need to have ranked, so we can try to figure that out and show that login page. But obviously any content that's behind a password we wouldn't be able to access, because Googlebot doesn't have your password either.

We're a nonprofit, and people like to help us with product tutorials, but they might also post this content on their own blog, which is duplicate content. What's the best way to deal with this, as they want to help, but we don't want to be penalized?

Good question. So the best way to deal with this is to pick one of these URLs as the canonical and to have the other versions of that content with a rel canonical pointing to your main version. That said, sometimes that's not possible. I think this kind of setup, where you have people who are creating content for you and also publishing the same or very similar content on their own site, is probably something where you don't want to tell them, hey, you can publish it there, but you have to put a rel canonical on there, and thank you for creating this content for us, because they want to see something out of it as well. In general, what I'd recommend doing there is just making sure that the content that's created is clearly focused on you, on your site, as well. So regardless of where users see that content, you get something out of that. So for example, if this is content about a nonprofit, or these are ways that people are interacting with a nonprofit, and they mention your nonprofit, and the video or whatever content that they're creating is kind of about your nonprofit, then regardless of where users see this content, they'll understand that it's associated with your website, and you can get some indirect value out of that. Not in the sense that we would rank your website higher automatically because of that, but in the sense that people know your brand, know the name of your nonprofit, and can go and visit you directly if they feel like doing that. So that might be one way to kind of look at that.
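For completeness, the rel canonical mentioned at the start of that answer is just a link element in the head of the republished copy, pointing back at your preferred version. A tiny sketch, with a hypothetical URL:

```python
def canonical_tag(original_url: str) -> str:
    """Link element the republished copy would include, pointing at your version."""
    return f'<link rel="canonical" href="{original_url}">'

print(canonical_tag("https://www.example.org/tutorials/getting-started"))
# <link rel="canonical" href="https://www.example.org/tutorials/getting-started">
```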
What are some technical reasons that may affect the page cache? So there's the noarchive meta tag, which is pretty much the most direct way. For pages in JavaScript, we only cache the HTML version, so we don't show the JavaScript version in the cache. And sometimes we just don't show a cache at all. That's something where sometimes, for technical reasons, we just say, well, we don't have a cached version of this page available, and that's completely normal as well.

Why does Google recommend noindexing tag pages? Is it because of a lot of pagination? Can't Google handle that? So usually tag pages tend to be fairly low quality, and they tend to look a lot like search results pages. And usually it doesn't make that much sense to show those in the search results. Instead, what we try to do is focus on the actual content on the website, which is kind of why people tend to noindex tag pages: they don't really provide a lot of extra value there.

Oh, wow. More and more questions are coming. We have an e-commerce site with hundreds of products that are identical in specification and only differ in print design. We can't combine them all on one page for practical purposes. Can we set a canonical to the main product category?

So what usually happens with the rel canonical is that our systems try to look at both of those pages and figure out if these pages are really equivalent with regards to the content. And if they're not equivalent, then we generally try to drop the rel canonical to avoid causing any issues. So what might be happening there is you have this one version, maybe, of a t-shirt, and you have this general t-shirt category page. Our algorithms might say that these are not equivalent enough to actually merit trusting that rel canonical, so maybe we wouldn't actually use it. So my recommendation there would be to just keep all of these separate. If these are really unique, in the sense that you can't put them on one page, then I would just keep them separate, keep them separately indexed, so that, for example, in the case of a t-shirt, people can search for, I don't know, a Star Wars Porg t-shirt and end up on a landing page that is specific to that new creature that they put in Star Wars. But otherwise, just using a rel canonical to the category page is something I would recommend not actually doing. I mean, you can try it, and maybe we'll actually follow it, but for the most part it's kind of iffy whether we would actually pick that up and follow it like that. I generally recommend doing that when it comes to things that are more equivalent. So if you're talking about t-shirts and you have a specific t-shirt design, then the sizes, or maybe the t-shirt material, the color of the base t-shirt, maybe that's more of an attribute than an actual different product. And in that case, I'd recommend putting all of those different product attributes on one single product page. So you'd have one landing page for the Star Wars t-shirt design, and it just lists all of these variations of different sizes and backgrounds and materials on that page, so that you have one page for all of these different variations, and you have that as the canonical.

What's the difference between .tech and .com and .com.ar? So .tech and .com are generic top-level domains. All of the new domain names or top-level domains that are coming out are seen as generic from our side, so you can set geo-targeting to whatever you want.
.com.ar, I believe, is for Argentina, which means it's already geo-targeted to Argentina. You can't set geo-targeting to other countries. You can put an international site on an Argentinian domain name, but you can't, for example, take a .ar domain and say, I want to target users in France, and set my geo-targeting to France, because we would block that in Search Console.

Hi, John. Hi. So we are a small company and we have a website, with more than 500 blog posts available now. My question is that on my website the blog is available, and there are 300 to 400 pieces of content on there, and these contents are, day by day, getting old now. So is there a chance to reuse that content for optimizing the site or improving the site?

So you'd like to reuse an old blog post and publish it new again? Yes. In general, you can do that. I would avoid just reusing old blog posts and republishing them. Instead, what you could do is either take the old blog post and just enhance it, so that you have one version of this content that grows in value over time, or you could redirect the old URL to the new URL and build the content out on that new URL. So sometimes, for example, you have a date in the URL itself, and you want to make sure that that date reflects the current date because you've added more information to this page. In a case like that, you'd probably want to redirect. But if you're building out content on a site in general, I'd recommend just using the old URLs and keeping those as much as possible. Anytime you make URL changes, it takes a bit of time to actually reprocess that. So I'd recommend trying to keep the old URLs as much as possible.

So do we have to add content, or do we have to change the date stamp on it? It depends on what you want to do. Yeah, you can add new content to it. You can add an updated date on the page itself, so that users, when they go to this page, recognize that something was actually changed on this specific date. I think keeping the old date on there can also be fine, saying, well, originally published on this date and last updated on this date, for example. That can be reasonable as well.

OK, so some content sometimes is not possible to produce as text; it is video content. So when I'm posting video content as a blog post, is it 100% required to add text as well, or is a video blog alone good enough for the SEO?

So we don't understand what is actually taking place within the video content. For video search alone, that might be fine. You have the description for the video, maybe you have a title on the YouTube landing page, all of that. That might be fine. But for web search, we don't understand what is happening within the video. So if you want this page to rank well in web search, then you need to have some textual content on there. And it's the same with images. A really common scenario is that a photographer will put their photos online, but they don't have any additional text with those photos. And then for us, it's really hard to say, well, what should we be ranking this page for? It basically just says "Nikon camera" and these are the settings on this page. So having textual content, in addition to any of this multimedia content, is a great way to make these pages relevant for web search specifically. Yeah, thank you. Thank you, John. Thank you. Sure.

All right, more questions from any of you. What else is on your mind? Sure, John.
OK, so I wanted to get your thoughts a little bit on crawling, basically. So my question would be, what influences crawling, in terms of the sitemap versus the HTML sitemap? Because what we're trying out as an experiment is that we've submitted 1,200 URLs via an Atom sitemap, and we are also trying to do an HTML sitemap. Now, we've only managed to get 600 indexed, and things are not moving after that. So we are wondering how to improve that and how to keep the crawler happy, basically.

OK, how to improve the crawling, essentially. So in general, if you're submitting these URLs with a sitemap file, then you're already telling us that these exist. What helps in addition is for us to be able to recognize those URLs within the website, so that when we crawl the website, we find the internal links to the same URLs as well. That also helps to make sure that we use the same URLs as in the sitemap file, and you'll see that reflected in the index count for the sitemap file. The other thing to keep in mind is that we don't index everything that we run across. So it's completely normal for us to say, we got 1,000 URLs via a sitemap file, and we don't know this website; we don't really know how important it is to actually index all of this content. Maybe we'll just index, I don't know, 500 of these URLs, or 100, or something like that. And this is something that improves over time, of course. As we recognize that your site is really important and really good, then we can start to crawl and index more and more of the pages on your website. But especially in the beginning, we might be a bit more conservative and say, well, we see there are a couple of thousand URLs in the sitemap file, but we don't really know how important this website is; we don't know how we should treat this overall. So if that's the case, if this is a newer website, then I'd just continue working on the website in general. There's no technical thing that you can do to push all of these URLs into the index. And even if they were indexed, then chances are we wouldn't be showing them in search, because we don't really know what to do with them yet. So that's something where I wouldn't blindly focus on the index count there.

OK. So how to know that? And shall I ask one more question, please? One more question. Sure. Yeah. So this was about duplicate content again. So I have page A showing content and page B showing similar content, and we basically have thousands of pages where we have similar content. But it is very difficult to actually find them, for a webmaster or even a developer, because these pages are dynamically generated. So is the HTML Improvements section, the duplicate titles, the duplicate meta descriptions, a good sign for a webmaster to find the duplicate pages on a site? Or should I be using a different strategy to flush out the duplicate pages?

I'd definitely use the HTML Improvements section there. But I'd also keep in mind that this is just a first glance at what is essentially a bigger website. And if you find any systematic issues there, then I would use that to try to see if there are ways that you can find the bigger set of issues within your website in a more scalable way, especially if you're saying these pages are generated dynamically. Then I worry that maybe the algorithms are set up on your side in a way that you're generating way more pages than you actually need to generate, and that you need to tweak the underlying server setup on your end rather than just adjusting individual pages manually. OK, thank you.
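As one example of what "more scalable" could look like, here is a rough sketch that fetches a list of URLs and groups them by title tag. The urls.txt input file is hypothetical; it could come from your sitemap or from a crawl of the site.

```python
# Rough sketch: find pages that share the same <title>, given a list of URLs.
from collections import defaultdict
from html.parser import HTMLParser
from urllib.request import urlopen

class TitleParser(HTMLParser):
    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title = ""
    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self.in_title = True
    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False
    def handle_data(self, data):
        if self.in_title:
            self.title += data

pages_by_title = defaultdict(list)
with open("urls.txt") as url_list:               # hypothetical list of URLs to check
    for url in url_list:
        url = url.strip()
        parser = TitleParser()
        parser.feed(urlopen(url).read().decode("utf-8", errors="ignore"))
        pages_by_title[parser.title.strip()].append(url)

for title, urls in pages_by_title.items():
    if len(urls) > 1:
        print(f"{len(urls)} pages share the title {title!r}")
```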
All right, I think there's one question here in the chat as well. Does Google recognize and crawl the site archive? And will the old content in the archive keep the link equity and the ranking?

Yes, we generally can try to crawl the archive of a site, and we try to keep that content available in search as well, so that if people are searching for older content, they can still find it there. I think the general aspect of an archive versus the newer content also hints at the newer content maybe being more relevant at the moment. So, especially if you're talking about a news or blog situation, I think it's normal for content to get a lot of traffic and a lot of impressions and clicks from search when it's fairly new, because then it's really the most relevant, and for that to settle down to a lower level, sometimes a much lower level, over time as you have that in your archive. So that's something where I'd say, yes, it does keep the value, but that doesn't mean that it remains as relevant in search or that it would normally be getting as much traffic from search over time as well.

Then this question: does mixed content affect ranking? No, it doesn't affect ranking. It affects how we recognize if a page is on valid HTTPS or not, but in general, it doesn't affect ranking. It's more of a user-visible aspect there, in that people go to your page and, instead of seeing the secure lock icon on top, they see this mixed content symbol. And maybe at some point, I could imagine browsers being more critical about that and saying, well, this is not really a secure page, and maybe they should even be showing a warning to users to make that much clearer.

Jeff asked if that's just browsers like Chrome. I think pretty much all modern browsers are taking a stance on promoting secure content. I've seen Firefox do a lot in that regard as well, and I suspect Safari and Internet Explorer are also doing similar things. So it's not just the case that Chrome is out to get the webmasters. It's really that all of these browsers want to make sure that users who are using them are as secure as possible, because only when users know that they're really secure will they be able to use the web in a freer way. So it's in the best interest of all players on the internet to make sure that you're providing content in a way that matches the current best practices.

All right, any last questions from any of you? What else is on your mind? OK, can I ask one more question? Sure, go for it. OK, so I was asking about microsites some time back, and I was kind of trying to save this question for last. So if we are creating a microsite, would Google consider it as the same entity or a different entity? Because the SEOs that we have here are kind of of two different opinions, so I wanted to get your thoughts on it.

I mean, if this is a separate domain name, we would see that as a separate domain name, so it's separate in that sense. I mean, obviously, you're probably interlinking these with the other sites that you run for them, so that's something where the connection is generally clear. But overall, we would see that as a separate website.
That doesn't mean we would show it separately in the search results. We do filter things out in the search results, even if we know that they are separate websites.

Hello, John. Hi. I would like to ask about a product, for example, a product linked from more than one category. Does it carry more weight in ranking signals because more pages link to that product? I don't know if you can say it has higher value just because it's linked from multiple categories. But in general, when we recognize that something is more relevant within a website, then we'll try to give that more weight as well. So a common scenario is when something is linked from the home page or from the higher-level category pages; that tells us this maybe lower-level page is actually not just any random lower-level page, but an important one within the website. So with that in mind, it can help us to understand that these are actually relevant articles within this website. Just by having it listed on multiple category pages, I don't think that would be that strong of a signal. But obviously, these things do add up across a website. OK, thank you.

All right, I think there's another question about expired content. What's the best way to handle expired content? If we unpublish the article, then it will create a gap in the content and a loss of traffic. So it looks like an archive could be the only answer here.

So in general, if it's expired content, it's kind of up to you to figure out how important this content is in the long run. And you might decide to keep it for a certain period of time and then remove it. You might say, this is a part of our company history; we want to keep this forever in maybe our news archive or in our blog archive, and then you'd want to keep that. So those are kind of the trade-offs that I would weigh there with regards to a loss of content and traffic. Obviously, if people are searching explicitly for that piece of content and you remove it from your website, then we wouldn't be able to show your website for that piece of content, because it's no longer on your website. But in general, when it comes to archive material that you put into the archive, usually it's less something that people are still searching for. But again, it really depends on your website and on what people are doing to try to find your website. And that's something you can look at in Search Console, for example, in Search Analytics. Especially if you have your archive set up under separate URLs, like maybe /archive, or you have your content organized by year, then you could look at that year, 2001, 2002, and pull out those pages separately in Search Analytics, and then see how much traffic is actually going there, or which of these pages are actually driving the most traffic for that part of your website. And based on that, you can make a decision and say, well, I am going to clean this out because it's really something that nobody ever cares about anymore. We wrote about, I don't know, a puppy that visited the office in 1999, and maybe that's not so relevant anymore. On the other hand, it might be that people are actually interested in some of the older content that you put out, and that's not something that you'd want to remove. So kind of up to you.

Hi, John. One more question, please. All right. One last one. Yeah. Thank you. If we allow Google to index the tags, for example, is the content generated from this seen as thin content? Not by default. So it's not that we say, this is a tag page.
Therefore, it's a bad page. Sometimes tag pages can be useful, but a lot of times, especially when people use tons of tags across all of their pages, they're not. I know it's about the crawling, the crawl time from Google, but I'm not sure if it's about Panda. Yeah, so if it's just about crawling, then having a noindex there generally doesn't affect crawling that much. We'll probably crawl those pages a little bit less over time if they have a noindex, but it's not that we would never crawl them. So usually, from a crawling point of view, that's not something you need to worry about. And from the Panda point of view? From a general kind of quality point of view, it really depends on how you set these pages up. I've seen some really good category pages, which are essentially tag pages as well, and it definitely makes sense to keep those on a website, because they have kind of a bigger theme and more information about that tag or category. But on the other hand, I also see a lot of tag pages that are essentially just random search results pages, which don't really provide a lot of value and are essentially only useful for crawling the website and finding different articles on the website. And for that, you'd usually have a normal URL structure anyway, so probably you don't need them that much. And probably, or sometimes at least, you don't need them for users either. So if you look at your analytics, you can kind of tell: are users actually using these tag pages or not? Or are we just creating tag pages because we thought creating tag pages would be a good thing? OK, thank you.

All right, so with that, let's take a break here. Thank you for all of the questions, and thank you for sticking around for so long. I hope I was able to answer some of those questions as well. Maybe I'll see you all again in the future. I will be in Mountain View next week, and I'm trying to see if I can set up a hangout from there as well, so if you're watching this from an American time zone, then maybe we can have something more local next week. And maybe I can also set it up to get a guest, if there is anyone in the Mountain View area who would be interested in joining a hangout live in person. Feel free to let me know. So with that, let's take a break here. Thanks again, everyone, for joining, and I hope to see some of you again next time. Thank you, John. Bye, everyone. Bye. Bye, thank you, John.