Hi, everybody. Cyrus here. It is my pleasure to introduce our next speaker, Miracle Inameti-Archibong. Miracle is Head of SEO at Erudite, one of the smartest SEO agencies that I've ever known and worked with, in Europe and internationally. A great SEO agency. Now, Miracle is new to MozCon, but she's speaking about a subject that nearly every SEO touches on at one time or another, keyword research, with a cool twist: can we speed it up? Can we make it smarter via automation and APIs? Miracle is going to show us how. The talk is titled Harnessing the Natural Language Toolkit for More Productive SEO. Sounds really cool. Please welcome Miracle Inameti-Archibong.

Hi, everyone. Today, I'm going to talk to you about harnessing the Natural Language Toolkit for more productive SEO. Now, what does that even mean? In a nutshell, it's all about speeding up your keyword research. And why would you need keyword research? Anyone who's doing SEO knows it always starts with keyword research. Keyword research is the process of finding out how users access your site: what are they searching for, what do they go to Google and type in, what questions are they querying before they buy your product? Usually this takes time, and I wanted to see if I could speed up the process.

Now, when do we usually use this keyword research? Sometimes you need to do a big bulk of keyword research when a client comes to you at the beginning of a digital transformation project. Sometimes it's when you're trying to optimize pages. But basically, you always want to understand user behavior when people are searching for products related to you, so you want to do a deep dive, categorizing keywords into topics and themes, just to get a better understanding of your IA (information architecture).

So, how do I do this manually? Usually, I get a bunch of keywords from a suggestion tool, depending on whatever industry I'm querying; the keywords here relate to the medical industry. Then I need to group them, because I get this bunch of keywords, there's so much of it, I can't just pass it on to the client. I need to group it to understand what the user's intent is. So I categorize it by what kind of keyword it is. A simpler example: if I have dresses, long dresses, tall dresses, dresses for evening wear, those kinds of things, I categorize them so it's easy to see that it's dresses by location, dresses by color, dresses by length.

I also look at the frequency, because the frequency determines the size of the opportunity. If colors are appearing more frequently, then we know we need landing pages based on color type. And further to that is the intent: do we need a product page? Do we need a content page? What is the user looking for when they search for these keyword groups? I can't go into all of that now because I've got 15 minutes, so you can click on this link and read a blog post on how to do keyword research manually.

Now, this is an example of the frequency cluster I try to achieve, done using a pivot table, just to show the client at a glance the size of the opportunity. Roughly categorizing 10,000 keywords can take me 28 hours. That's about four days of cleaning, categorizing, and analyzing, and I wanted to speed this process up; even halving it would make my life so much easier.
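As an illustration of that manual frequency step, a handful of pandas lines can produce the same at-a-glance counts a pivot table gives. This is a hypothetical sketch, not part of the talk's script; the file name and the "keywords" column heading are assumptions:

    import pandas as pd
    from collections import Counter

    # Hypothetical export from a keyword suggestion tool, one column: "keywords"
    df = pd.read_excel("keywords.xlsx")

    # Count every word across the whole keyword list
    counts = Counter(
        word
        for keyword in df["keywords"].astype(str)
        for word in keyword.lower().split()
    )

    # The most frequent words point at the biggest landing-page opportunities
    print(pd.DataFrame(counts.most_common(20), columns=["word", "frequency"]))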
So cue the Natural Language Toolkit. Now, what is this? It's a platform for building Python programs to work with natural human language. Basically, this platform does things like classification, which is choosing the correct class for a given input: when you put data into it, it decides what class that data should belong to.

And how does it do that? It does it by tokenization: it takes the data and breaks it into little chunks called tokens. Then it uses things like stemming to reduce the number of inflections in that data. By inflections, I mean things like prefixes, suffixes, and plurals. So if you have sleep, sleeps, sleeping, it knows that those three keywords all relate to sleep. Then there's tagging. Tagging is the process where it takes those keywords and classifies them by part of speech: it knows this is a noun, this is a verb. It does this because parts of speech help it determine the meaning of a sentence; if it knows that a noun coming before a certain verb usually means X, Y, Z, then it chooses that. Parsing, in its simplest form, is just getting machines to understand human language: taking language as you would say it and passing it through so a machine can make sense of it; things like chatbots use that kind of function. And semantic reasoning is just understanding the sentiment behind the words.

So the two key things this script is doing are using classification and tokenization to pull out the nouns and verbs from a list of keywords and tag them up. And why do I pull out nouns and verbs? Because I found that most of my categories and classifications end up being these two parts of speech, and that's why I wanted to pull them out.

Now, how does the script work? The script works by instructing Python to look through the list of data, pull out nouns and verbs, and then categorize them into three sections. The first section is the highest occurring, so it's looking at word frequency: how often does this word appear? The most occurring word gets the first category, the next one goes into the second category, and then the last category.

So how do you use the script? To get started, you need to download this Python library, and I recommend this particular version. It might work with others, but the script was written with this one. Then you also need the following Python libraries. You need pandas 1.2.3, which helps the script understand structured data and pull it out. Then I have TextBlob, which is used for the classification and the tagging; it's the one that looks through the data and decides which parts of speech need to be pulled out. I also have xlrd, and that's for reading Excel files, because I knew that most of the tools we have for keyword research give us an Excel data dump.

This is an example of the script and what it's doing, using TextBlob to look through the search terms and pull out the nouns. It's a really simple script to read, so when I share the link with you, you can go through and read it.

Now, there's a really quick way to install the libraries: you can pip install them. I've provided you with the requirements text file, which just lists the libraries, so download it and note where you've saved it. Then open a Windows command prompt.
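To make the tokenization, tagging, and stemming ideas from a moment ago concrete, here is a tiny example using TextBlob (the tagging library named in the talk) together with NLTK's Porter stemmer. The sample phrase is my own, not from the talk:

    from textblob import TextBlob
    from nltk.stem import PorterStemmer

    # One-off setup: TextBlob needs NLTK's corpora for tokenizing and tagging
    #   python -m textblob.download_corpora

    blob = TextBlob("buy coffee beans online")
    print(blob.words)   # tokenization: ['buy', 'coffee', 'beans', 'online']
    print(blob.tags)    # tagging: pairs like ('coffee', 'NN'), ('beans', 'NNS')

    # Stemming: the three inflections from the talk reduce to one stem, 'sleep'
    stemmer = PorterStemmer()
    print([stemmer.stem(w) for w in ["sleep", "sleeps", "sleeping"]])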
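For reference, a requirements file is just a plain-text list of the libraries, something along these lines. The pandas pin comes from the talk; the other two are left unpinned because the talk doesn't give their versions:

    pandas==1.2.3
    textblob
    xlrd

The install command then points pip at wherever you saved that file, for example pip install -r C:\some\path\to\requirements.txt, where the path is a placeholder for your own.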
Now, copy from where it says pip install. The part highlighted in red, "some path to", shows where you need to paste the path where you saved your own requirements text. Go to the command line, paste all of that in, and run it. That will install the three Python libraries I spoke about earlier.

Now, to run the script, download the script and save it; again, note where you save it. Then prepare your keyword list. Because I've written this script with certain functions in mind, you need to follow the naming conventions that I've used. The first row of your keyword column should say keywords, okay? There must be no spaces, it must be clean, and it must be an Excel file. Not CSV; Excel, XLSX, okay?

Now, copy the command from where it says python, and I'll explain it again. "Some path to" is where you saved the analysis script. "Some path to test" is where you saved the file you want to run, and make sure you name it test. And "some path to test output" is where you want it to download the output file to. You don't need to name the output file; you just need to say, okay, download it to the desktop. This is an example of the command I've run: you can see I saved my script in Downloads, the test file I want to run is saved on my desktop, and the output file goes to Downloads.

Okay, so this is a file that I've run, in the sports industry, and this is just one section. It looks a bit jumbled: you can see it's pulled out the nouns, and it's pulled out categories one to three. To make sense of this data, you can delete the noun and result columns and then sort it in Excel: go to Data, then Sort, sort by category one, two, three, and search through it. And now you can begin to see trends emerge from the data. You can see people are looking for, you know, breakfasts for bodybuilding, fitness regimes, programs.

Now, there's an even easier way of running the script: you can run it via Google Colab. Please, please, when you open this link and go to the folder, make a copy and save it on your Drive, because this is on my own Drive. Make a copy, save it on your Drive, make yourself the editor, and you can do whatever you want with the script. Again, upload the keywords data: click on the folder icon, and it will show you a place to upload. Upload your keywords data; the first row, as before, should say keywords, it should be an Excel file, and the file should be named test. You can look through the code, as I showed you earlier; it's really simple code to read, and you can change the file reference, or if you don't want to, just save your file as test and run it as normal. Now, click on Tools once you've uploaded your file, and then click on Run all. After a few minutes, once it parses the file, the output file will show under the folder icon, and you just right-click and download it. Sometimes Google Drive and Google Colab can be a bit buggy, so just wait a while; if you don't see the output file, just run it again.
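Pieced together from the description above, the categorization logic could look something like the sketch below. This is an illustration under stated assumptions, not the actual script from the talk; only the library names, the three-category output, and the test.xlsx/keywords naming conventions come from the talk:

    import pandas as pd
    from collections import Counter
    from textblob import TextBlob

    # Input follows the talk's naming convention: an Excel file called
    # test.xlsx whose first column is headed "keywords"
    df = pd.read_excel("test.xlsx")

    def nouns_and_verbs(text):
        # Keep only tokens TextBlob tags as nouns (NN*) or verbs (VB*)
        return [word for word, tag in TextBlob(str(text)).tags
                if tag.startswith(("NN", "VB"))]

    df["nouns_verbs"] = df["keywords"].apply(nouns_and_verbs)

    # Word frequency across the whole list decides the category order
    freq = Counter(word for words in df["nouns_verbs"] for word in words)

    def top_three(words):
        # Most frequent word becomes category 1, then 2, then 3; pad if short
        ranked = sorted(set(words), key=lambda w: -freq[w])[:3]
        ranked += [""] * (3 - len(ranked))
        return pd.Series(ranked, index=["category_1", "category_2", "category_3"])

    df[["category_1", "category_2", "category_3"]] = df["nouns_verbs"].apply(top_three)
    df.to_excel("test_output.xlsx", index=False)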
Okay, now to the second part, which I showed you earlier: the frequency part that helps you quickly spot opportunities. This is for two industries, coffee and dresses. It pulls out the most occurring words, nouns and verbs, and tries to form clusters.

Now, for the coffee research I was doing, I didn't put in a lot of keywords, so you can see it's split things down very granularly, while for the fashion industry I put in so many keywords that it makes more sense: I can see the trends here, like the length of a dress, whether it has sleeves, weddings and occasions, things like that. You can change the number of clusters you want; you can increase it, and I'll show you that further down. It also pulls out a word cloud. So for this coffee data, it says, oh, people are searching for scales, they want to buy coffee beans online, a moka pot. It pulls out those kinds of quick snapshots that you can use in your presentation.

Again, to use this, please make your own copy and save it so that you can edit it, break the code if you want, and do whatever you want with it. Prepare the files the same way as in slides 32 to 35: make sure it's Excel, make sure it's clean, keep to the naming conventions, and the first column should be keywords. To change the number of clusters, as I said before, click on the cluster analysis to view the code, and where you see best results (I've used five clusters there), just change it. What it's doing is dividing the number of keywords by 20, getting the average, and whatever is the highest average, that's what it's using to populate the clusters. Now, there is nothing to download here, so whatever tool you use to take pictures of the screen, you can use it to grab a picture of the clusters you have and the word cloud right at the bottom of the script.

Yeah, so thanks for listening to my presentation. I would like to get feedback: if you look through the code and you improve it, or you break it, let me know what you would like to see. And I just want to say a big thanks to Eniola and Mehdi for helping me with this. Thank you.
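For anyone reading along, a rough sketch of what that cluster-and-word-cloud step could look like follows, assuming scikit-learn's KMeans for the clustering and the wordcloud package for the snapshot image. The talk doesn't name the notebook's exact libraries, so treat this as an approximation rather than the Colab code itself:

    import pandas as pd
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.cluster import KMeans
    from wordcloud import WordCloud
    import matplotlib.pyplot as plt

    df = pd.read_excel("test.xlsx")   # same "keywords" convention as before
    keywords = df["keywords"].astype(str).tolist()

    # Vectorize the keywords and group them; five clusters, as in the talk.
    # Change n_clusters to split the data more or less granularly.
    X = TfidfVectorizer(stop_words="english").fit_transform(keywords)
    df["cluster"] = KMeans(n_clusters=5, random_state=0).fit_predict(X)
    print(df.groupby("cluster")["keywords"].head())

    # Word cloud of the most frequent terms, for a quick presentation snapshot
    cloud = WordCloud(width=800, height=400, background_color="white")
    cloud = cloud.generate(" ".join(keywords))
    plt.imshow(cloud)
    plt.axis("off")
    plt.show()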