I think a lot of people see canonicalization as a kind of topical grouping, which isn't right at all. The pages need to be either identical or near.

Near identical, exactly. Yeah. That's what it boils down to. Canonicalization is about duplication management. Basically, you want to remove duplicates so that we don't have to crawl things multiple times, we don't have to render and index things multiple times, and we don't serve the same thing under three different URLs. That's not good search results, really, right?

Hello everybody, and welcome to another episode of SEO Mythbusting. With me today is Rachel Costello. You are DeepCrawl's technical SEO and content manager. So what is it that you're doing every day?

So I used to be a technical SEO myself, and now I've moved into more of the content production side of things: writing white papers and articles to educate the wider dev community and the digital marketing community about technical SEO and the impact it has.

Awesome, that's really interesting. So you're seeing a bunch of misconceptions and confusions, and we picked an interesting topic, didn't we?

We did.

What's the topic that you want to talk about today?

It's all about canonicalization.

All right. So what are the top myths and misconceptions that the community is dealing with?

So I think the first thing is that people think it is a directive: you set a canonical tag, and it's going to be accepted. Another one is that they use it like a redirect. So if you have a product page that goes out of stock, you add a canonical back to the category page, which doesn't really work that way, because I've heard that the content needs to ideally be identical, if not very similar. So lots of things like that.

Oh, interesting. All right, let's start with the idea that it is a directive, because it's not. It is a signal for us.
So when we talk about canonicalization, we're talking about detecting the same content, or very similar content, that exists under different addresses, at different URLs. We can do many different things to identify these. We can just crawl multiple pages and see, oh, this is actually the same content. We can probably also see whether the same links and the same kind of context are used. But we can also use the canonical tag, right? It's a signal. We're using many different signals to figure out whether something is the same content or not, and the canonical tag is just one of them. So putting a canonical tag on pages that are not the same is not going to work. Putting a canonical tag on each of the pages that are exactly the same is also not going to work. It is a signal. It helps us identify what we want to canonicalize, but it doesn't say, you have to use this.

That's a big one, I think.

And you're right, you should not use it as a redirect either. It's not a redirect.

I think people just want to group link equity wherever they can. It's maybe a bit of a desperate act to try and keep all of their link equity in one place.

It is. It is. Again, canonicalization makes sense if you cross-post the same content on different platforms or different channels, in slightly different locations, for whatever reason you're doing that. That's where canonicalization comes in. But if you have something that goes out of stock, you should either redirect it to something similar that makes sense for the user at that point, or you can just tell us this is a 404 for the moment and it might come back. But do not think that a canonical is the same as a redirect. Also, you're wasting crawl budget that way, because we just don't understand: oh, so you're saying this is the same as the other page, but it clearly is not. So we're just going to keep looking at both.
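The out-of-stock advice above can be sketched as a small decision function. This is only an illustration: `Response` and `respond_for_out_of_stock` are hypothetical names, and the choice of a 302 status is an assumption of the sketch, since the conversation only says "redirect".

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Response:
    status: int
    location: Optional[str] = None  # redirect target, if any


def respond_for_out_of_stock(similar_product_url: Optional[str]) -> Response:
    """Choose a response for a product page that went out of stock.

    Hypothetical helper: redirect to a similar product when one exists,
    otherwise answer 404. It never falls back to a canonical tag that
    points at a non-identical page such as the category page.
    """
    if similar_product_url:
        # 302 is an assumption of this sketch; the source only says "redirect".
        return Response(status=302, location=similar_product_url)
    return Response(status=404)
```

Calling `respond_for_out_of_stock(None)` yields a plain 404, which matches the "tell us this is a 404 for the moment" option.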
But if you have two pages that are identical and you're not canonicalizing them in a way that makes sense, then we kind of have to look into both as well. And sometimes you get these flipping canonicals.

Yeah.

What are the typical problems that you're seeing people have, besides these misconceptions? What are people doing that you think makes no sense?

So I think people are just not quite sure. We've been trying to piece together what the different signals are that come into play. You've got redirects, sitemaps, backlinks and things like that. I think people are trying to weigh up how many of these signals they should add. Maybe they're treating it like a maths equation: if I do these two things, then Google will pick the canonical that I want. I'm always interested to know more about how the signals are weighted, and which ones are preferred over others. This is just my theory, but sometimes it looks to me like Google puts more weight on signals that are more likely to have been implemented by a human than on, say, an auto-generated setting. I don't know if that has any...

Well, duplicate detection and deduplication are actually done without much human interaction. So these are all automated signals. But we do content fingerprinting. We look at things like: what is the gist of it, really? What is the information here? How does this relate to the site structure? What does it say in the sitemap? So we're looking at a bunch of different factors, but they're mostly technical factors. And we are scoring them on an ongoing basis. So it's not that we determine it once and then just stick to it. We are always looking at the fresh content that we got from the crawl and asking: did this change? If it changed, is it now very close to what it was before?
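As a rough illustration of the "did this change?" check, here is a minimal sketch of a content fingerprint. It is not how Google fingerprints content (the conversation does not say); the normalization steps and the names `fingerprint` and `content_changed` are assumptions made for the example.

```python
import hashlib
import re


def fingerprint(text: str) -> str:
    """Hash the text after lowercasing and collapsing whitespace.

    The normalization here is an assumption of this sketch, chosen so that
    trivial formatting differences do not count as a content change.
    """
    normalized = re.sub(r"\s+", " ", text.strip().lower())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def content_changed(previous_crawl: str, fresh_crawl: str) -> bool:
    """Report whether the fresh crawl differs from the previous one."""
    return fingerprint(previous_crawl) != fingerprint(fresh_crawl)
```

An exact hash like this only answers whether anything changed at all; deciding whether the new version is still "very close" to the old one needs a similarity measure rather than a hash comparison.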
Now maybe something that was a duplicate is no longer a duplicate, because its content has changed. That's absolutely possible. But sometimes, especially when pretty much everything shows up in the same URL structure, and it's maybe different language versions of the same thing but it is the same content, we might end up with scores that are very similar. Say we have both versions, and one is at 0.49 and one is at 0.51 for which we think is a duplicate of the other. Then it's really hard to pick which one will be the canonical. And that can change, right? A change in, I don't know, how we crawl things, or how the crawler fetched the data and how it fetched the other pages beforehand, might cause a tiny jump in these two numbers, and then the other one is the canonical. So make sure you give us as clear a signal as possible, and don't confuse the algorithms that are figuring out which one is the duplicate of which. Because if we have two equal pieces of content, how do we know which one we should pick?

Exactly. And you don't want Google to be in that position where it feels like it has to pick for you, or Googlebot feels like it has to pick for you.

And it makes everything more complicated on your side as well, especially if you're using things like Search Console, right? We're gathering data and showing you data based on the canonical. So if it's flapping between two URLs, that's going to look really weird. So is there anything else that you would say is unclear about it, or is there something that makes your life really hard when it comes to canonicalization?

I think it's figuring out the thresholds you need to reach to override Google's decision on what the preferred URL is, because we can align all of our on-site signals. But I saw John Mueller in the Ask Google Webmasters video about canonicalization.
He said that there are two aspects. You've got the on-site signals, but you've also got what Google thinks the user would most like to look at.

And that depends on a bunch of different things. For instance, we might canonicalize one language version over the other if you're telling us that all of them are canonical at the same time and they have pretty much the same content, especially if it's the same language, just for different countries. Then we might show the searcher the version for the country they are in. So say we have a DE version and an AT version, a German version and an Austrian version, that are pretty much the same. They use the same currency. They might even have the same price. If you're unlucky, we might show different URLs to searchers depending on where they are. It makes more sense for a customer in Austria to see the Austrian version of the website rather than the German one, even though the German one is the canonical. So that might be a little confusing and misleading. Any other questions from your side?

Yes. So there was one question I had. I've heard that if Google accepts the canonical tag on a page, it will ignore any unique content on that page. But that's interesting, because surely the pages have to be identical in the first place. If there's any unique content on the canonicalized page, it will be ignored. So how would that work? Would the canonical tag not be accepted, because they're slightly different pages?

So that depends on how different the unique content is. If you have mostly the same content, and then maybe one sentence that's slightly different, we might still think that it's pretty much the same thing, and then we would not necessarily see the unique content, if we think the page is just a copy of another page that is the canonical.
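To make "mostly the same content with one sentence that's slightly different" concrete, here is a minimal sketch that compares two texts by the overlap of their word shingles (Jaccard similarity). This is a common textbook technique swapped in for illustration, not a description of Google's algorithm; the shingle size and the `0.8` threshold are arbitrary assumptions.

```python
import re


def shingles(text: str, size: int = 3) -> set:
    """Return the set of overlapping word n-grams in the text."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < size:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def similarity(a: str, b: str) -> float:
    """Jaccard similarity of the two texts' shingle sets, from 0.0 to 1.0."""
    sa, sb = shingles(a), shingles(b)
    if not sa and not sb:
        return 1.0  # two empty texts are treated as identical
    return len(sa & sb) / len(sa | sb)


def is_near_duplicate(a: str, b: str, threshold: float = 0.8) -> bool:
    """Hypothetical cut-off: treat the pages as duplicates above the threshold."""
    return similarity(a, b) >= threshold
```

With a measure like this, a page that adds one short sentence to an otherwise identical text still scores close to 1.0, while a page with genuinely different content scores near 0.0, which is the case where the canonical tag is pointless.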
If this page is the canonical, then we will probably see the unique content there as well, because it's the page that we picked. However, if the content is completely different, or different enough for the algorithms to decide that this is not a duplicate, then the canonical is pointless. Unless there's another URL that happens to point to the exact same page. Then it becomes interesting again, because then we have two different pointers to the same thing. And we get that often: people link to pages and accidentally include some URL parameter that gets ignored or doesn't actually matter, or there's a slight difference in the way the URL looks. Maybe you have /de/something and then /de/something?cache=false or something like that. That doesn't really matter. Then we might canonicalize to one of these pages, and probably the one that does not have parameters. But that is also debatable. It might also happen that we canonicalize to something with parameters. But that way you are again making it harder for us to pick a canonical, because if you're not saying, oh, this is specifically the canonical we want, then it's back to guesswork.

And I think that's the problem. People are just trying to group pages topically, maybe, with canonicalization.

But that's not how it works.

Not how it works, no. Thank you for confirming that canonical tags and canonicalization are about reducing duplication.

Yes, that's what it is for. Exactly. Awesome. Rachel, thank you so much for being here and talking a little bit about canonicalization with me. I think that was useful, and I hope you enjoyed it. Have a good time. Bye bye.

Hey, everyone, I hope you liked the previous episode. Next episode, me and Glenn are going to discuss site moves, right? Site moves, domain name changes, URL migrations and more. So stay tuned and check it out.