This next speaker is one of the most exciting people I've met in the SEO industry recently. Based in the UK, Areej is an SEO consultant with over 8 years of experience who focuses on all things technical and on-site. She has done an incredible job helping to increase the involvement and representation of women in the SEO community. She is the founder of the Women in Tech SEO community and has spoken at industry events such as MozCon, SMX, and BrightonSEO. The only things she probably loves more than SEO are the Beatles and chocolate. You can find that on our website. If you run a large website, this is one presentation that you want to pay close attention to. Over to you, Areej.

Hey everyone, thanks so much for joining. My name is Areej and I'm here to talk about taking charge of your indexability. I really hope that I get to see everyone in Seattle next year, and for now I'll be around in the chat. So I'm here to talk to you about something that took me a little while to wrap my head around, but spending the last 18 months working on a website with 40 million plus indexed pages forced me to finally get it. We are fully in control of how Google chooses to crawl and index our website. And yes, we are in control, but it can feel extremely scary, because if we don't do it properly then we're not actually going to benefit from the many things that we are doing right. And when I say many things, I mean things like all the awesome content and off-page work that you might be doing. So if we're recommending SEO fixes, whether they're small or large, let's just make a pact right now as an industry to drop the 100-page audits and focus on prioritizing recommendations that are actually going to have an impact. So I'm going to start off by setting the scene. This is a completely fictional example that I'm going to be using throughout the talk today. Let's envision that we all work for a car aggregator website.
Now our users, when they actually log on to the website, there's a number of different things that they can do. Firstly, they can search for cars near their location. They can filter these search results. These search results can be filtered by make or model or year, color, price range, everything you can possibly think of when it comes to cars. They can view car listings and they can contact sellers directly through the website. They can view car history information and they can even read the latest car news if someone is interested in doing so. These are just some of the things that users can do on our website. Now most webmasters, they tend to set up websites in one of two ways. This is specifically with aggregator websites as well. They can either attempt to index absolutely everything to capture as much ranking opportunity as possible, or they can decide to noindex a huge chunk to avoid suffering from index bloat. What is index bloat? It's when a website has an excessive number of low-value pages indexed. So let's go ahead with the first scenario now. Let's say we decide to attempt to index absolutely everything. If we do that, we might end up indexing every possible geolocation, filter combination, car listing page and car history information page. Why would this be a problem in our specific example? Let's try to think about it from Google's perspective. Would Google even bother crawling every possible geolocation, filter combination, car listing page and car history information page? And if they decide to crawl it, would they even bother indexing it in this case? And more importantly, do you actually want to rank for every geolocation, filter combination, car listing page and car history information page? The answer to all three of these questions is most probably going to be no. If we attempt to index everything, this is probably what's going to end up happening. Google might not end up crawling all of our pages.
Google might not end up indexing all of our pages. Our valuable pages, the ones that actually matter, the ones that convert, might not be crawled and/or indexed. And our valuable pages might not even end up ranking. I'm being really generous by using the word might. So let's go ahead and say this scenario is out of the window. This is not going to work. Let's go ahead and try the next scenario, which is noindexing a huge chunk to avoid suffering from index bloat. Well, in this specific case, what tends to happen is that we miss out on ranking potential. So we miss out on the potential to rank for every possible geolocation, filter combination, car listing page and car history information page. So this scenario doesn't really work either. So I guess what I really want to talk about is how and where can we actually draw the line between these two scenarios? Before we make any decision on how we can go ahead and control the indexability of our website, we need to know the answer to this one question. What is the high-level KPI that we're actually being measured against? And whether you come into the business as a new person or whether you've been there for a while, there is a possibility you don't actually know the specific answer to this question. And it's very important to know who to speak to in order to understand this from a business perspective. It most probably isn't going to be organic rank. It's not going to be organic sessions either. And it most definitely isn't going to be some third-party tool's visibility chart. We see a lot of those. And if it is one of those, it probably shouldn't be. It's most likely always going to be organic leads or revenue. This is what the business is measured against. And this is what an SEO function is also measured against. In our specific scenario with our car aggregator website, an organic lead happens when a user contacts a seller about a potential car. So why don't we try to tie these two together?
Let's go ahead and tie indexed pages with organic leads and see what that looks like. So I want to go back to this list now. We know that this is what our users can do. And we know that this is the main functionality that they get out of our website. I tend to understand things better when they're in table format. And so do a lot of other SEOs. So let's try to translate that into a table. Okay, this is what it's going to look like. This might look very, very familiar to a lot of people who work on aggregator websites or even e-commerce websites. These types of templates are always very similar from one site to the other. Specifically for our car aggregator website, we have four main templates. The first being our car search pages, which I'm going to go ahead and abbreviate as our CSPs. These are our result pages when a user actually searches for cars by a location, and it includes all of our different search filters. Secondly, we have our car listing pages. These are our CLPs. These are the listing pages when a user creates or picks a specific car and goes ahead to learn more about it. Thirdly, we have our car history pages, our CHPs. Now, if someone wants to go ahead and try to search car valuation or history, they enter a car registration number and they get all the information they can about it. And finally, everything else can be grouped into another template. This tends to be things like our homepage, blog, about page and so forth. So I need to now understand how I can figure out which of these templates provides the most value. So I'm going to go ahead and analyze some data. And that's usually what we tend to do at the very beginning when we have any kind of case scenario or any technical SEO project that we want to get signed off. It's actually understanding the data behind it. So what data am I going to actually need per template? Well, I've tried to split it into three things. Firstly, I want to understand the number of indexed pages.
Secondly, I want to understand the number of organic sessions. And thirdly, I want to understand the number of organic leads. With each one of these, it's very important for me to also understand what the percentage of the total is, so that I can understand the scale of the data that we're working with. And I'll go ahead and get all of this data for the past 12 months. Because if we only get a few months or so, it might be impacted by seasonality and it might not be as accurate as you want it to be. So these are all fictional numbers. Let's go ahead and imagine we've done a whole bunch of analysis and we've come back and this is what the breakdown looks like on a template-by-template basis. So first up, you can see our car search pages. They, as an example, represent almost a third of our indexed pages, but they actually provide us with 75% of our organic leads. And then if you look at the very end, you've got your car history pages. And these are at a completely different end of the spectrum. So these have almost half of our indexed pages, but they only bring in 1% of leads. If I try to color code it a little bit to make sense of it, you'll find that every other template, as well as our CLPs, represents around 10% of indexed pages each, and they bring in a total of around 25% of organic leads. But there's definitely something problematic going on here when you look at our car history pages. And the main question that I tend to ask myself is, wouldn't we be better off spreading our link equity across more valuable pages? And wouldn't we be better off having Google crawl more valuable pages? And the most important question of all is, is it even worth doubling up our number of indexed pages for only 1% of organic leads? Websites do not, and in most cases probably should not, need to have every single page indexed. Instead, focus on indexing pages that have the potential to provide good search results for your users.
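The per-template breakdown described above can be sketched in a few lines of Python. This is only an illustration: the `TemplateStats` class, the `share_of_total` helper and all of the raw counts are invented here, picked so that the percentages roughly match the fictional figures in the talk (car history pages at about 45% of indexed pages, 10% of sessions and 1% of leads).

```python
from dataclasses import dataclass


@dataclass
class TemplateStats:
    """12 months of data for one page template (all figures are made up)."""
    name: str
    indexed_pages: int
    organic_sessions: int
    organic_leads: int


def share_of_total(rows):
    """Return each template's percentage share of pages, sessions and leads."""
    total_pages = sum(r.indexed_pages for r in rows)
    total_sessions = sum(r.organic_sessions for r in rows)
    total_leads = sum(r.organic_leads for r in rows)
    return {
        r.name: {
            "pages_pct": round(100 * r.indexed_pages / total_pages, 1),
            "sessions_pct": round(100 * r.organic_sessions / total_sessions, 1),
            "leads_pct": round(100 * r.organic_leads / total_leads, 1),
        }
        for r in rows
    }


# Hypothetical counts, chosen only to echo the talk's fictional percentages.
templates = [
    TemplateStats("CSP", 14_000_000, 6_000_000, 75_000),
    TemplateStats("CLP", 4_000_000, 2_000_000, 14_000),
    TemplateStats("CHP", 18_000_000, 1_000_000, 1_000),
    TemplateStats("Other", 4_000_000, 1_000_000, 10_000),
]

shares = share_of_total(templates)
for name, pct in shares.items():
    print(name, pct)
```

Putting the three percentages side by side per template is what makes the mismatch visible: a template with a large share of indexed pages and a tiny share of leads stands out immediately.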
And I know that this feels like a fairly straightforward example, but I promise you that every single aggregator website tends to have one of these. And it's not enough to simply be fixated on our car history pages and leave it at that. There is so much more that we can dissect on a template-by-template basis. For example, if I look at our car search pages, you know, from up high, it looks great. A third of our indexed pages bring in over 70% of our organic leads. I would think, you know what, that's fine. It seems to be efficient enough. I can just leave it that way. But if I go ahead and dissect it, I find that this is the number of indexed filters that actually make up our car search pages. And this is how many values are indexed within each one of these filters. So for example, let's say you've got 10 colors, you've got 500 sizes, you've got four models. If we calculate that across all of the filters, we find that there is a total of over 800,000 potential indexed pages that are purely based on our search result filters. So it's fundamental that we break this down to the next level. It's not enough to just leave it at that and say that the data looks fine; we need to dissect it piece by piece. So just like we did with our top level, I go to the next step and actually start to understand, filter by filter, our car search result pages. What does this look like for indexed pages, for organic sessions and for organic leads? The more we slice and dice our data, the more we're going to be able to make informed decisions about what is worth consolidating, what is worth de-indexing and what kind of technical SEO projects will actually have the impact that we need. So we're going to have a little bit of an intermission here. And we're going to go ahead and look at a typical meeting in the life of an SEO. And I think there's going to be a lot of familiarity around this. A lot of us have been through this before. So we're going to say hello and meet our senior stakeholder. Their name is Mark.
So Mark comes up to us and says, oh, car history information pages, they are critical for our consumers. They can look up any car registration number and they get all the information that they need. So as an SEO, we come in and we say, well, yes, that's true, but they're pretty useless for Google. They're thin pages, they provide low organic value and they double up our indexed pages. So Mark says, well, you know, I don't care about Google. I don't care about this. Never mind. What does this actually mean for leads? So we go in and, you know, we want to show off all the cool analysis that we've just done. And we're like, you know, we're really glad you asked. I spent the past two weeks on this. I did this huge piece of analysis. I used BigQuery. I grouped our page templates. I calculated organic leads per template. At this point, you've completely lost Mark. He doesn't really care about what you're saying. This is of no interest to him whatsoever. So you just go ahead and you summarize it. You say, well, it represents 1% of our total organic leads. In a dream-life scenario, at this point, Mark would be like, is that it? You know, just de-index these pages immediately. You can take all the resources you want, whatever you want, go ahead and do it. These pages are pointless. You're right. In a real-life scenario, what tends to happen is that Mark goes ahead and says, well, you need to work with team A. You have to build a strategy that can convince team B to convince team C to consider convincing team D to potentially de-index these pages. And at this point, you just have to take a super deep breath. Joking aside, I promise you there is a purpose to the story. I know that this is something we all relate to, and these are meetings that we've all been in. And it doesn't matter whether we're agency-side and we're discussing this with clients, whether we're consultants, or, most specifically, whether we're client-side and in-house.
As SEOs, we need to speak the same language as senior stakeholders to get sign-off for these technical fixes. So let's go ahead and try to make our recommendations now. So I'll go back to this table. We know that we want to first focus on car history pages because we know that represents the biggest opportunity for us. If I take a closer look at this, I guess I get a little bit worried because I'm like, okay, yep, that looks fair. 45% of indexed pages, it's a lot. Organic leads, 1%, that's too little. But I don't know, 10% of total organic sessions, that feels like a scary thing for me to simply get rid of. I'm personally very risk-averse. But in this specific case, what good is 10% that doesn't even convert? Every aggregator or classified-type website tends to have something very, very similar to a car history page. And as Mark told us, these pages are extremely useful for our users because they want to look up different cars and different information via the car registration number. But let's be on the safe side and let's make sure that we've done all of our checks. So let's go ahead and do a few checks on our car history pages to see whether they're really worth de-indexing or if we're going to lose out on some opportunity that comes from them. So the first question I ask is, do they take up lots of crawl budget? Because yes, they do represent an indexability problem, but do they also represent a crawlability problem? So in this case, yes, I have analyzed our logs and in an average week, 50% of Googlebot's crawl requests are spent on our car history pages. So that's a lot of crawl requests spent on them. Second question, do they rank for important terms? So I analyzed our ranking data, and car history pages barely rank for any keywords that we even care about. Third question, do they have unique content? Yep, but it's very thin and it's only useful for users on a personalized basis and it doesn't even convert.
Fourth and final question, do they have high-quality backlinks? Nope, they barely have any backlinks. I've done a link audit. Barely anything is pointing at them, and the stuff that is pointing at them is not high quality. Okay, so let's summarize everything we've looked at so far. So we're saying that our car history pages represent 45% of total indexed pages, 50% of weekly crawl requests, 10% of total organic sessions, which, you know, is a bit of a risk there, very thin content, minimal ranking, minimal backlinks, and most importantly, only 1% of total organic leads. Now do we feel more comfortable de-indexing them? So I walk you through this because it's really, really, really important for us to do that. It's not enough for us to simply look at the high-level data without checking whether we might be losing any form of opportunity or impact by de-indexing or consolidating certain sections of the website. So here are some things to remember, which are super, super key. Number one, make sure your sitemaps are set up to automatically remove any noindex pages. I know this seems like a given, but a lot of sitemaps are actually not set up in that way, especially sitemaps that depend on a lot of legacy code. This is an example of something that might feel very easy in principle, and we might take it for granted and think that it's set up, but it can actually be extremely complex, and it might end up deterring or delaying the project that you're doing by several weeks or months while engineers try to fix it. And our car history pages, they will not de-index overnight, and that's something anyone who's been part of a consolidation or a de-indexing project is very familiar with. It can take a few months or more for Google to decide to flush those out of the index. The larger the website is, the longer it might take to de-index.
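As a rough illustration of the sitemap point above, here is a minimal Python sketch of a generator that skips any page flagged as noindex. The `Page` record, its `noindex` flag, the `build_sitemap` helper and the example.com URLs are all assumptions made for this example; a real site would read that flag from wherever its templates decide the robots meta tag, which is often where the legacy-code complexity lives.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


@dataclass
class Page:
    """Hypothetical page record; `noindex` mirrors the page's robots directive."""
    url: str
    noindex: bool


def build_sitemap(pages):
    """Build sitemap XML that lists only pages we want indexed."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for page in pages:
        if page.noindex:
            # A noindex page should never be advertised in the sitemap.
            continue
        url_el = ET.SubElement(urlset, "url")
        ET.SubElement(url_el, "loc").text = page.url
    return ET.tostring(urlset, encoding="unicode")


# Made-up URLs for the fictional car aggregator.
pages = [
    Page("https://www.example.com/cars/london", noindex=False),
    Page("https://www.example.com/listing/12345", noindex=False),
    Page("https://www.example.com/car-history/AB12CDE", noindex=True),
]

sitemap_xml = build_sitemap(pages)
print(sitemap_xml)
```

The design choice is that the sitemap and the noindex decision share one source of truth, so a template that gets de-indexed drops out of the sitemap without a separate manual step.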
And we know there is crawlability waste already, because we've done that log analysis, in which case it is really important for us to be blocking them in robots.txt. It should be on the roadmap, but it should not be done anytime soon. We need to wait a few months to make sure that they're actually flushed out of the index, or else Google is not going to know that they've become de-indexed, because it can't crawl a blocked page to see the noindex. So let's go ahead and do some prioritization. I go back to what I said at the very start, because I think this is something really important and something we all need to bear in mind. Let's please make a pact and drop all of these 100-page audit documents. Prioritization is key when it comes to working with technical teams, engineers, senior stakeholders. The first question they always ask is, what's the priority of this? I personally like to use t-shirt sizing. I think it's really simple. There are tons of prioritization matrixes out there. Some are more complicated than others. I like to keep things really simple, and this is something that technical teams that I've worked with tend to use a lot as well. So at least we're using the same frameworks as one another. So I tend to split this into two things. Firstly, I look at SEO impact. So with SEO impact, it's this idea of how much of an SEO impact will this recommendation likely have on our overall organic leads? And then with tech effort, it's about how much technical effort will this recommendation take to actually implement? And please do not answer this on behalf of engineers. Now, this is a mistake that all of us SEOs have made before, where we try to assess the effort of something that we are not experts in. We wouldn't want engineers to come up and say, I reckon the SEO impact of this is large, medium or small. We wouldn't want to do that for tech effort either. So for every identified issue, what I tend to do with my prioritization matrix is I look at what is the SEO impact? What is the dev effort?
And together that dictates what the priority is. This is an example of what a normal table that I have added to a separate tab in my Google Sheet tends to look like. For example, at the very top, you might find that, you know, if your SEO impact is large but your tech effort is small, that's a high priority right there. That's a really quick win. And that's a really good win that will actually have an impact. On the complete other extreme, you might find that if the SEO impact is small but the tech effort is large or extra large, in this case, this should definitely be a low priority. If you want to learn more about my full framework, I've got a guide here all about getting tech SEO implemented. And it shares a lot of these resources. And you can just visit it on bit.ly forward slash reach dash. So let's communicate. So let's envision, you know, several months have gone by, we've gotten that recommendation the go-ahead, it's been implemented, we've de-indexed our CHPs, and, you know, we've got all of that signed off. So what is the first thing that tends to happen once an SEO recommendation gets implemented? Well, your senior stakeholder, yep, Mark is still around. They tend to show up and they go like, you know, are organic leads up yet? That's the first question they're going to ask you. Our senior stakeholder, Mark, will be expecting results, and they're going to be expecting them really fast. So it's really important to manage expectations. And you need to do that by not just communicating after something's rolled out, but communicating before, during and after. Underpromise and overdeliver is always how I tend to do it. It's so important, you know, don't go in with over-promises of what is going to happen to conversion. And don't go in by underdelivering what the actual output is going to be. We know that de-indexing our car history pages was simply a stepping stone for fixing our technical foundation. It's not going to magically increase our share of voice.
Because projects like that with technical SEO are ones that should have been done right from the start. That's your foundation. Communicating updates is one thing, but communicating updates that are actually backed up with data is a real game changer. Let's go back here for a second. This is how we managed to get sign-off in the first place, by tying our indexed pages with our organic leads. We need to make sure that we keep reporting on these metrics, because these are the metrics that everyone is familiar with. And these are the metrics that people actually signed off on. So keep everyone informed. Update these metrics on a weekly basis. Show week-on-week change, month-on-month change, year-on-year change. And make sure that your dashboards are accessible and open. Stakeholders feel better knowing that the information is there. In most circumstances they don't actually bother checking it, but if they know that these reports live somewhere, they feel better about that. And it's going to serve as a benchmark, and it can be used as a case study for future sign-off. The next time you need to de-index a large section, or the next time you've dissected all of these indexed filters and know some are problematic, you can go ahead and use that as a case study and a benchmark: when we did this to our indexed pages and CHPs, this is what happened. You're able to then go ahead and set that as a benchmark to get sign-off for future projects. So one final thought before we wrap up. I've personally never even worked on a car aggregator website yet. This was just an example. This methodology, though, can apply to all forms of aggregator sites. It can work on fashion, jobs, properties; you always have a CHP that's lying somewhere. When I first pitched this talk, my aim was to actually go into detail about the technical SEO requirements.
I was going to go on about, like, parameter handling and what happens to every filter on a case-by-case basis and so forth. But you know what? The actual technical recommendation, this is actually the easy part. What I find really tricky is the backing up of your recommendation with data. It's the stakeholder management. It's the meetings for the sake of meetings. Those are the really, really hard parts. And those are the ones that, if we don't get them right, we're not able to get sign-off for some of these projects that might be hindering our performance. The first time I personally worked on an aggregator website, like, I completely panicked, I felt completely out of my depth and I didn't even know where or how to start. But right now, like a few years down the line, these are actually the type of websites that I enjoy working on the most. Over time, I've come to realize they all have similar problems that require similar solutions. And once we fix the problems, you know, that actually feels really rewarding. And it can be rewarding, yeah, from an SEO impact perspective, which of course we constantly measure ourselves against. But more importantly, it's extremely rewarding in terms of achieving a higher level of trust with stakeholders. And that just makes the process, and it makes our life, so much easier the next time round. So the past 132 slides, yep, 132, they were based on an example that simply touched on one simple metric. All I did was I tied number of indexed pages, which is a technical SEO metric, to organic leads, which is the KPI that actually matters the most to the business. And this same concept can be translated across far more SEO metrics. As SEOs, it's fundamental that we do not work in silos. And we can achieve far more if we measure our wins using metrics that our business actually cares about. And please remember, it's okay to feel overwhelmed working on something new. We've all been there. Thank you so much. You will find all of my slides here.
They're on Bitly, at areej-mozcon. I am around on the chat if you have any questions, and you can find me on the website, areejabuali.com, and you can find me on Twitter. So please do get in touch. A huge, huge, huge thank you to MozCon for inviting me to speak. And I hope to see you all in Seattle next year. Thanks.