All right, all right, let's stretch it out, stretch it out. Everybody stand up. Ah, jumping jacks. Okay, our next speaker... those are terrible jumping jacks, I apologize. Our next speaker. You're excited? I'm excited. I've introduced him so many times, and every time he kills it without fail. Yes, I am talking about Michael King, aka iPullRank. He runs the iPullRank agency in New York. He's a former rapper. He's a polymath. He does everything. Mike wrote a blog post a few years ago called "The Technical SEO Renaissance." It's still one of the best SEO articles ever written in the history of SEO, still relevant today, and you can go read it on the Moz blog. I'm happy that he's speaking here today. Let's kick it over to Mike King.

Elon Musk was right: artificial intelligence is definitely going to destroy the world. It just needed a little help from me. When I saw that his company OpenAI had a technology that could generate messages as though a human wrote them, I was intrigued, not impressed. But when they said it was too dangerous to release to the public, I knew I had to get my hands on it. Now here we are. I've had enough machines running GPT-2 to clog every protocol and messaging system in New York City, from the stock market to air traffic control, with coherent gibberish for the past hundred and eight days. The world has fallen into chaos, and no one has proven savvy enough to solve my puzzles and take back control. It's simple, really. Step right up to my three-ring circus and try your luck at ranking in my virtual search engine. Send in your so-called SEO gurus, your marketing geniuses. Let's see what they're made of. Oh, looks like our favorite repeat contender's back. Let's see if her 71st attempt is any better than her others. It's been a hundred and eight days and no one has beaten my game. The future of the city depends on it.

Oh, this time I know I've got it. I've seen all of the combinations, so I'll definitely make it through. Okay, it's the e-commerce site. First, let's get data and a list of competitors: run Screaming Frog from the command line, extract some key data points, and parse it out to BigQuery, since the site is so big. Lots of duplicate content; two page types are exactly the same. Hmm. Last time I canonicalized those, the net effect was a traffic loss of about 15%. Let's set up a split test to make sure that the page-to-page canonical is worth deploying. Okay, no to the canonicalization. This is actually pretty fun. Normally a test like this could take 30 days, but this game simulates it in a few seconds.

Looks like there's been some sort of migration, but the internal linking structure has not been updated. Luckily, he made this easy: all of the URLs are the same except for the removal of the product subdirectory. I can fix this quickly on the database layer. It's basically like find-and-replace in a Word document. Let's back up the database in case something goes wrong. Okay, let me get these going quickly in a spreadsheet: remove the product string from the URL and set up the mapping. Now I can run a batch string replace in MySQL from Node. I'll just open the spreadsheet as a CSV and loop through it. Bam. It'll take a while to run, but everything is fixed.

She's getting better this time. She knows about submitting a request for a crawl increase from the crawl team in Search Console. Who knows, maybe she'll finally get to the next ring. There can't be much more for me to knock out on this challenge.
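A minimal sketch of that CSV-driven batch string replace from Node, assuming the `mysql2` client and a simple `old_url,new_url` mapping file; the `posts` table and `content` column are hypothetical stand-ins for the real schema:

```javascript
// Sketch: batch-update migrated URLs in MySQL from Node.
// Assumes a simple CSV with "old_url,new_url" rows and a hypothetical
// `posts` table with a `content` column; adjust to your schema.
const fs = require('fs');
const mysql = require('mysql2/promise');

async function main() {
  const rows = fs.readFileSync('url-mapping.csv', 'utf8')
    .trim()
    .split('\n')
    .slice(1) // skip the header row
    .map(line => line.split(','));

  const db = await mysql.createConnection({
    host: 'localhost',
    user: 'seo',
    password: process.env.DB_PASS,
    database: 'shop',
  });

  for (const [oldUrl, newUrl] of rows) {
    // REPLACE() swaps every occurrence inside the content column.
    await db.execute(
      'UPDATE posts SET content = REPLACE(content, ?, ?)',
      [oldUrl, newUrl]
    );
  }
  await db.end();
}

main().catch(console.error);
```

Backing up the database first, as she does in the scene, is the part you don't skip.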
This site is pretty much optimized on-page. Now let's set up some Google Ads scripts with a few business rules based on absolute rankings, SERP features from SERP API, and external conditions to get integrated search going. I'm sure that's got to give me some brownie points and get me out of here. I wish I could make the site's internal linking structure update in reaction to these changes, too. I forgot that external data is tied to the real world, and the real-world data is...

No better than the first. Thanks for playing. We have lovely parting gifts. Actually, we don't, but you knew that already.

I can help you beat this. Who is this? Based on some of your techniques, you know exactly who this is. You're right, but I haven't learned anything from you in years, Michael King. Listen, you're the only player left now. What do you think happens when this madman gets bored of you? I went to a meetup at your office once. Is it still in the same place? Of course. I'll be there when you're ready to win this thing.

You guys making plans without me? Now I get to hang out with some has-been SEO celebrity so we can save the world together. This better not be a waste of my time. I haven't been here since that iPullRank meetup. This place is really falling apart, Mike. What did you invite me here for, to film some Whiteboard Fridays? Where are you? You know, magic is just creepy after a certain age, man. It's nice to meet you, too. We've met. I know, Kasey. I also know why you keep going in there. Cool, keep it to yourself. Why don't you tell me what happened to you? Me? Yes, you. You used to be the go-to guy. Honestly, I really admired your work. Before or after I lost my entire team to that madman? Oh, so you did notice. Yeah, I mean, I didn't take out an ad in the New York Times or anything, but I wish it was me instead of them every single day. You're gonna have to do better than that. I don't... I have a family. It'd be irresponsible of me to go in there. I thought I taught them all better, and I thought they'd be able to beat the game, but they couldn't, and I've got to live with that. They all got further than you, but they don't have what you have. My superior skill set and coding ability? Nope, your relationship with the clown. We're not gonna talk about that. Do you know why you keep failing? I'm guessing you're gonna tell me, and then we break into some sort of training montage. This ain't a movie, Mike. If you know so much, why didn't your team make it the whole way through? You keep failing because you're thinking logically. You're not accounting for the accelerated chaos that the Dark Clown is introducing into the SERPs. He's got all the same rules and all the same systems as what you'd expect in real life; it's just that everything is faster and there's a layer of randomness. All the sites you're competing with in the simulation are immediately reactive to your changes. So why am I not getting out of the first ring? Well, here's an example: your fix for the internal linking structure changes was smart, but I think you know you can go a lot further than that, and if it's possible, then the Dark Clown's game expects it. From what I've observed, he's looking for the most complete solutions possible.
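A hedged sketch of the "integrated search" Google Ads script idea from the top of this scene. The spreadsheet of organic rankings and its URL are hypothetical, and pausing keywords where organic already ranks is just one possible business rule:

```javascript
// Sketch: an "integrated search" Google Ads script. Assumes a
// hypothetical spreadsheet of keyword -> organic position pulled
// from a rank tracker. Where organic already ranks in the top 3,
// pause the paid keyword; otherwise keep it enabled.
var SHEET_URL = 'https://docs.google.com/spreadsheets/d/...'; // hypothetical

function main() {
  var sheet = SpreadsheetApp.openByUrl(SHEET_URL).getSheets()[0];
  var organic = {};
  sheet.getDataRange().getValues().slice(1).forEach(function (row) {
    organic[row[0]] = row[1]; // keyword text -> organic position
  });

  var keywords = AdsApp.keywords().get();
  while (keywords.hasNext()) {
    var kw = keywords.next();
    var pos = organic[kw.getText()];
    if (pos && pos <= 3) {
      kw.pause();  // organic already owns this query
    } else {
      kw.enable(); // keep bidding where organic is weak
    }
  }
}
```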
Have you ever seen what eBay does with their internal linking structure? The legendary Dennis G. blogged about this a couple of years back. There are parts of it that are directly reactive to what happens in the SERP: if something ranks on page two, they programmatically build enough internal links to that page to pop it up to page one. A lot of the big e-commerce sites have figured this one out, but every site has a different threshold for the number of internal links that you've got to build to see position changes.

Say what now? Let's take a step back for a second. The web is just a series of programs. The output from the program that is your web server and front-end stack is the input to the program that is a search engine. Obviously, our programs need some composite metrics; those can be used to drive some changes to the logic of the output. The first step is identifying keyword owners. In other words, how do we identify which URL on a site owns which keyword? You can pull position, page authority, and clicks or traffic, and devise a keyword ownership score.

And I didn't send my team in there. They all went in there of their own accord. Some of them were the first to go in, but none of them were given a second chance. Only you got that.

Taking those three values (position, traffic, and page authority, or whatever URL-level link authority metric you're comfortable with), we run this equation on all the pages ranking for a given keyword. Using that, we can determine the best URL for each keyword throughout the site, and then we can adjust the weights as needed to get the best values. What about the on-page targeting? What if a page is better targeted in the page title or on-page copy than the one that the score determines as the owner? Yeah, this is just quick and dirty, but you could add edit distance from the page title, entity salience, or LDA scores to augment the keyword ownership score. Oh, that easy, huh? Well, now you have a way to quickly know which page should own which keyword. With that list of keyword owners, you can bulk-update the anchor text for links across the site on the database level, like you did with the URLs. It'd be a string replace on the database level within the primary content block, based on the anchor text. Although you may want to review it a little closer than just a string replace; you may want to use a DOM parser. And you can add or subtract links at scale based on where pages rank. If something ranks on page two, inject a wealth of links into the body copy and other link blocks, and voila, you've got page-one rankings. But you should A/B test until you find that sweet spot in the number of links that you need to build first. Okay, that's a solid tactic. What else you got, old man?
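A minimal sketch of that keyword ownership scoring, assuming you've already joined position, traffic, and page authority per keyword/URL pair; the weights and field names here are hypothetical, not the exact equation from the talk:

```javascript
// Sketch: pick the "owner" URL for each keyword from ranking data.
// Input rows look like { keyword, url, position, traffic, pageAuthority }.
// The weights are hypothetical; tune and A/B test them for your site.
function keywordOwners(rows) {
  const best = {};
  for (const r of rows) {
    // Higher is better: invert position, roughly normalize each input to 0..1.
    const score =
      0.5 * (1 / r.position) +
      0.3 * Math.min(r.traffic / 1000, 1) +
      0.2 * (r.pageAuthority / 100);
    if (!best[r.keyword] || score > best[r.keyword].score) {
      best[r.keyword] = { url: r.url, score };
    }
  }
  return best; // keyword -> { url, score }
}
```

Edit distance from the page title or entity salience, as mentioned above, could be added as extra weighted terms in the same sum.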
Hmm. Well, the Dark Clown's game is just a simulation. We have one, too. We built it on the back of the Common Crawl and the Wayback Machine data. That's just one of the many use cases of the Common Crawl, and if you're not familiar with it, the Common Crawl is an incredibly powerful dataset of billions of pages and their metadata. You can use it to build a link database, or for broken link identification, or web-scale data extraction, or to build a seed set of URLs for anything, really. I can show you some things that will help you get through all three rings of the circus, but first let's talk some more about composite metrics. Those will really help you drive scalable optimizations.

Okay, two more composite metrics. These help with the scalability of making decisions around whether or not you should optimize or retire a piece of content. When you perform a quantitative content audit in a tool like URL Profiler, there are a lot of quantitative metrics to choose from. If we take some length metrics like page titles, meta descriptions, and word count, combine them with the numbers of images and videos, sprinkle in some analytics metrics like time on page, unique pageviews, and bounce rate, then add the total number of links and social shares, as well as the mobile and page speed scores, we can compute a composite content performance metric. You sure you've got enough metrics in that formula? Similarly, we can take the rankings, search volume, and keyword difficulty of keywords associated with a given URL and calculate a keyword performance potential. Let me guess: y equals something-something times ln(something) plus something. It's fundamentally the same equation with different values plugged in. R is the ranking score, and the values change based on the ranking.

Now, once you compute the CPP and CPS values, you compare the two and determine a threshold for whether or not something is performing well enough, and if it's not, whether it has enough potential for you to optimize it further or whether you should delete it. The rules can be as simple as: if CPS is less than 50 and CPP is less than 50, we delete it. If CPS is greater than 50 and CPP is less than 50, we keep it as is. If CPS is less than 50 and CPP is greater than 50, we optimize. Okay, okay, cool. Enough with the scores, though. Give me something else I can use.

You know you don't have to do things in the order that everyone else does them, right? What? Doing things the way that everyone else does them is going to get you everyone else's results. Find your own way. You don't have to start with the first ring. I... I didn't know that.

Okay, I've mentioned getting data out of Google Search Console a couple of times now. Yeah, I'm surprised you like that data so much. It is what it is, and it's the best keyword-level data we can get as far as your site's organic search performance goes. The UI limitations are obvious: you can only get a thousand rows per filter. On the API side, you can get 25,000 rows per filter. The real question is, how do you filter? Okay, so how do you filter? Actually, the first question is, where do you want to store it? The answer to that is BigQuery, because when we talk about manipulating this data, we want to make it easy to pull things from across the Google ecosystem. Yeah, Apps Script, Data Studio, and Sheets make sense. In fact, all these metrics we've been talking about should be pushed to BigQuery. That way we have persistent URL- and keyword-level metadata that can be used programmatically to improve a website. BigQuery, right?
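A hedged sketch of those threshold rules, using the 50-point cutoffs from the dialogue; `cps` and `cpp` are assumed to be the precomputed composite performance and potential scores:

```javascript
// Sketch: content audit decision from the composite scores.
// cps = content performance score, cpp = performance potential.
function contentAction(cps, cpp) {
  if (cps < 50 && cpp < 50) return 'delete';    // weak and no upside
  if (cps >= 50 && cpp < 50) return 'keep';     // performing, leave as is
  if (cps < 50 && cpp >= 50) return 'optimize'; // underperforming but promising
  return 'keep'; // strong on both counts
}
```

Run against every URL in BigQuery, this turns the audit into a bulk, repeatable decision rather than a page-by-page judgment call.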
Okay, now back to getting data out of Search Console. It's done most effectively using a data structure called a tree, or a keyword tree. The thing about using trees, though, is that you end up getting a lot of the same repeated keywords, and you need a seed set of keywords to calculate trees from. So to shortcut that, I have a JSON object with the common prefixes and suffixes to use as filters. Keep in mind that you can do 200 queries per minute and 100 million queries per month.

Okay, that was cool. Now let's speed up your analysis with Apps Script. Since we're storing all our data in BigQuery, we can create a series of Google Sheets and functions to automate the analysis. You can use BigQuery to do the heavy lifting for your ETL functions and bring the roll-up data back to your Google Sheet. You should consider creating template tables and aligning them with the output from the tools. You can make sheet templates and build functions that you can run regularly. For instance, you could calculate all those composite metrics and store them after you've collected the data, or you can open a CSV, run a series of functions, and then print the results to your sheet. What are some of the common things you're doing with data? De-duplicating, counting, summing, looking for patterns, searching, concatenating. The usual. Okay, good. How about filling in reports, preparing deliverables, emailing data? Well, I am an SEO. Sheets, Drive, Analytics, Docs, Slides, Calendar, and Gmail all have services you can connect to via Apps Script. You can basically have anything in the ecosystem be a trigger. Ah, I'd only been using Ads scripts to do integrated search work based on your absolute rankings model. I was pulling data from APIs, but it looks like this Apps Script ecosystem is much more built out than I knew. Yeah, and it's all basic JavaScript. That's why I recommend it over VBA or trying to tie everything together in Python and using Colab.

Okay, let's talk about entities and topics. You're going to need text analysis to do optimization at scale and to understand the relationships between pages. There are four things you need to master: named entity recognition, LDA, edit distance, and n-grams. This seems like a spot where I should be taking notes. Well, those are the most actionable things to compute when you're considering optimization and where to drop links. They're some of the primary NLP operations that search engines will use when processing pages. We do this with a combination of libraries, tools, and coding functions. If you want a shortcut, you can use Frase's open search API to quickly derive this based on search. If you aren't technical, their front end will give you the same data in a simplified manner. Yeah, yeah, give me the technical version. Okay. I personally recommend doing the named entity recognition using Google's NLP API, because your data will then be limited to the entities that Google knows and cares about. Everything else you can compute with spaCy and a couple of other JavaScript libraries. You'll need the spaCy, LDA, edit-distance, wink-nlp-utils, and n-gram packages to be able to build out enough of an NLP pipeline. I love how this is all JavaScript. Yeah, Python is great and all, but Node is much easier to get into production.
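One way the named entity recognition step might look with Google's official Node client for the Cloud Natural Language API; credentials setup (GOOGLE_APPLICATION_CREDENTIALS) is assumed:

```javascript
// Sketch: named entity recognition with Google's Cloud Natural Language API.
// Keeps the name, type, and salience for each entity so they can be
// scored against the keyword owners later.
const language = require('@google-cloud/language');

async function extractEntities(text) {
  const client = new language.LanguageServiceClient();
  const [result] = await client.analyzeEntities({
    document: { content: text, type: 'PLAIN_TEXT' },
  });
  return result.entities.map(e => ({
    name: e.name,
    type: e.type,
    salience: e.salience,
  }));
}

extractEntities('The Dark Clown hacked New York with GPT-2.')
  .then(entities => console.log(entities))
  .catch(console.error);
```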
Okay, so for competitive comparisons, you want to pull SERP data using SERP API, then crawl the results, extract the content, run it through that NLP pipeline, and then store the scores for comparison. Within your own site, you do the same thing, and then use that to inform internal linking and surface the data in the CMS for content optimization. And then I could do edit distance against the keyword owners in the various tags and use that to further determine how optimized the pages are. Exactly. The use cases for this are pretty crazy. This really helps me turn content into a more technical effort. Yeah, we call it technical content optimization at iPullRank.

Okay, Ahrefs has an endpoint for identifying broken link targets. You can monitor that, check the content of the dead pages in your database, the Wayback Machine, or previous Common Crawls, and programmatically set up redirect rules at the edge, in your application, or in the server config. If you have the content, make sure you look for the closest entity matches on your site, because it's highly likely that Google is comparing the redirect target to what they had indexed. Oh, that must be why there's sometimes no impact when we redirect broken link targets. It depends. That's the longest I've ever heard you speak without saying that.

All right, let's talk about scraping. If you're anything like me, you spend more time fussing with XPath than actually extracting content. XPath is the only thing that annoys me more than regular expressions. Yeah, I prefer CSS selectors. They're more elegant and more intuitive. Chrome will give you XPath or CSS selectors for a given element in the Inspect tab, but CSS selectors just naturally click for me. Yeah, I don't know why I wasn't using these before. I know, right? Chrome will also generate the commands for making an HTTP request in Node, curl, and the command line. You can find it in the Network tab by each request. Nice, I didn't know that was a feature. Yeah, I just noticed it myself recently.

Okay, so we've got two types of clients for our purposes: curl, which is an HTTP request library, and a headless browser, which is... a browser. Yeah, I've been hearing you talk about these for almost ten years now. Yeah, yeah. Okay, curl is something that every SEO should be familiar with. It's a command-line library that powers many of the tools used to crawl the web, and it's available for every programming language. I use it mostly as a low-overhead way to check things like HTTP headers and response bodies. I find myself using it most often to check for cloaking. People are still cloaking? Oh, absolutely. It's much more of a gray area than it was before. Here's a few commands that I run to do those site checks. Seems easy enough.
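The actual commands shown on screen aren't in the transcript; a Node sketch (using the global fetch of Node 18+) that approximates the same cloaking spot-check, fetching the page as Googlebot and as a regular browser and comparing what comes back:

```javascript
// Sketch: a cloaking spot-check in Node. Fetch the same URL with a
// Googlebot User-Agent and a browser User-Agent, then compare responses.
const GOOGLEBOT =
  'Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)';
const BROWSER =
  'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36';

async function cloakCheck(url) {
  const [asBot, asUser] = await Promise.all([
    fetch(url, { headers: { 'User-Agent': GOOGLEBOT } }).then(r => r.text()),
    fetch(url, { headers: { 'User-Agent': BROWSER } }).then(r => r.text()),
  ]);
  // A large size difference is a cheap first signal; diff the HTML properly after.
  console.log('bot bytes:', asBot.length, 'user bytes:', asUser.length);
  return Math.abs(asBot.length - asUser.length) > 1024;
}

cloakCheck('https://example.com/').then(flagged =>
  console.log(flagged ? 'Possible cloaking; diff the HTML.' : 'Looks consistent.')
);
```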
I take it this doesn't come native with my machine? No, but it's pretty easy to find, and there's a JavaScript library, so you can grab the server-side rendered code and store it as needed. Now, for headless browsing, we use a JavaScript library called Puppeteer. It's made for controlling headless Chrome. I typically use it in Node, so let's install it real quick. CDNs and some reasonably sophisticated websites have started to aggressively check for headless browsers, so you want to get the puppeteer-extra stealth plugin to make sure that you don't get blocked. I always knew you had a little hacker in you, Mike. Can't leave hacks alone; the game needs me.

Okay, so with Puppeteer, unless you're building a crawler, what we really want to do is capture the full DOM of all the pages that we want and save them to BigQuery. Then we can manipulate and extract the data that we need later. Decoupling tasks like this really helps you when you need to scale. Now that we've got the pages in the database, we can take our time parsing the HTML with CSS selectors, using a library called Cheerio. It's basically jQuery for the server side. If you're having any trouble scraping, open the page, right-click the element, grab the CSS selector, pop that into your code, and then extract the text, like so. Nice, it's really easy to get anything out of any page, especially when there are JavaScript transformations. Exactly. Is this what working at iPullRank is like? I feel like I just drank from a fire hose, and I just want to play with all these tools and libraries. Depends who you ask.

All right, I'm getting some sleep. You keep practicing and committing what we discussed to memory. We'll make a game plan in the morning. After tomorrow night, my hack will go global and cast us back into a technological dark age.

I have something for you. Don't you want to know what happened last night? Yeah, that tells me everything I need to know. Google Glass? They still make those? Of course. Now, I noticed that they operate on a protocol that the Dark Clown is somehow not in. It'll make it easy for me to communicate with you and see what you see. This time we're going together. Yeah, yeah, don't get all dizzy moving around on me. I'm ready.

I see you're in a different order this time. Yeah, yeah, he gave me the media site this time. Okay, after you run your Screaming Frog crawls from the command line, pull your GSC data, and then compute your keyword ownership, content performance, and content potential scores. I was practicing all night; I've got this. I'll set up the automated tests. Most media sites are using single-page applications and are bogged down by ads these days. Check GSC to make sure they aren't blocking the API endpoint for Google; that can cause rendering issues. Oh, and check to see how long scripts are taking in the Performance tab; that can cause Google to cancel execution. That's a good call. Yeah, and Google adheres to robots.txt on ad networks, so when you're checking your speed measures, make sure you're ignoring the ads that are blocked by robots.txt, too. Got you. Okay, I've found some conflicting canonicals being set between the HTTP headers and the XML sitemaps. Just fix that, let's do a crawl request, and we should be set. I've done everything I usually do. I always get tripped up trying to do something extra. What am I missing? Hmm. It's an e-commerce site; he's looking for speed hacks. How are the Core Web Vitals looking? Nothing new under the sun. How about the code coverage? Surprisingly, every page type on this site has a hundred percent code coverage.
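Before the speed hacks, a minimal sketch of the Puppeteer-plus-Cheerio workflow Mike described above, assuming puppeteer-extra with the stealth plugin; the BigQuery storage step is stubbed out as a return value here:

```javascript
// Sketch: capture the rendered DOM with headless Chrome, then parse it
// later with Cheerio (jQuery-style CSS selectors on the server side).
const puppeteer = require('puppeteer-extra');
const StealthPlugin = require('puppeteer-extra-plugin-stealth');
const cheerio = require('cheerio');

puppeteer.use(StealthPlugin()); // look less like a headless browser

async function captureDom(url) {
  const browser = await puppeteer.launch({ headless: true });
  const page = await browser.newPage();
  await page.goto(url, { waitUntil: 'networkidle2' });
  const html = await page.content(); // full post-JavaScript DOM
  await browser.close();
  return html; // in practice, write this to BigQuery for later parsing
}

async function extractTitleAndH1(url) {
  const $ = cheerio.load(await captureDom(url));
  return { title: $('title').text(), h1: $('h1').first().text() };
}

extractTitleAndH1('https://example.com/').then(console.log);
```

Decoupling capture from extraction, as the dialogue suggests, means you crawl once and can re-parse the stored HTML as many times as you like.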
Okay, let's use rel=prerender. Chrome updated it to be a NoState Prefetch, so it doesn't load everything in an invisible tab the way it used to, but it'll still speed things up. I saw about an 83% speed increase on my site from it. So what we can do is use the InstantClick library to pick the next page based on where the mouse hovers. That's a slick hack. Now let's optimize on the database layer, too. More of the performance gains will be there than on the front end. What's the tech stack? PHP 5.6, WordPress 4.7, WooCommerce 3, and MySQL 5.6 on AWS. Okay, let's upgrade all the software. Let's get them to PHP 7.4 and bump them up to Aurora. Let's set up OPcache, implement Memcached, create indexes, optimize queries, and preprocess some of the derived data. Nice, we're getting one-second load times now, but still no indication that we're done yet. Hmm. Okay, here's one that I don't share often: let's pull a list of competitor URLs and scrape them for inventory. In cases where the competitor inventory is low, let's use that to inform how we boost the internal linking structure. Hey, let's fix the structured data.

Fancy meeting you here. I was going to say that I had my suspicions. You always were a bit off, and you're still making the same mistakes. And I thought you'd be the one to solve this by now. We can't all be right. So what, this is all an elaborate scheme to get back at iPullRank for not hiring you? You were always too confident. It may have started that way, but I've beaten the whole iPullRank team, and I've beaten you 72 times. Now I just need you and Mike King to know I've surpassed you both. So this is a rap beef. Everything has always been a joke to you, yet you've come here for the past 72 days. Well, there's nothing else to do, and I'm the only one who can save the world. Or maybe it's because you wanted to know what happened to Jamie. Well, this has been fun. I'm out.

This room looks just like the others; it's just green. You'd expect this guy to be a little more creative. We don't know what happens when you make it this far. I'm going to stand by to help you work through anything that comes up. Cool. I probably won't need you, but at least this way you can be a footnote to history.

Wait. This is one of the sites I optimized before. The only thing I couldn't solve was how to get viable copy on the category pages at scale. Hmm, is the JSON that composes the page accessible? Look through the XHR requests in the Network tab in DevTools. Yes, I've got it. There are a lot of features here that could be used to generate copy, but how? This stuff is the future. There's a whole subset of natural language generation called data-to-text, where people are taking structured data and turning it into paragraphs. Unfortunately, the implementations are mostly academic right now. Some help you are. Well, at least I picked a cool outfit, if this is the last time anyone sees me. Now, we can take what we do know and apply it to this. Content spinning? Is it 2007? Exactly. Transformer technology works by taking prompts and guessing the next word. It tends to get facts wrong, but the copy is otherwise coherent. We can fine-tune a model by scraping copy from competitor websites, then get the structured data into sentences using a library called RosaeNLG or by using wink-nlp-utils. Then we pull variants on the phrases from the paraphrase.org project. Using each sentence from the spinner, you can then have a prompt to generate a paragraph using GPT-2. Then, boom, we have descriptive text.
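A toy illustration of the data-to-text step, turning structured product attributes into seed sentences that could then be paraphrased and used as generation prompts; this is plain string templating for clarity, not the RosaeNLG API:

```javascript
// Toy sketch: data-to-text seed sentences from structured attributes.
// Plain templating for illustration; a real pipeline would hand these
// to a paraphraser and then to a fine-tuned language model as prompts.
function seedSentences(product) {
  const s = [];
  s.push(`The ${product.name} is a ${product.category} from ${product.brand}.`);
  if (product.colors && product.colors.length) {
    s.push(`It comes in ${product.colors.join(', ')}.`);
  }
  if (product.price) {
    s.push(`Prices start at $${product.price}.`);
  }
  return s;
}

console.log(
  seedSentences({
    name: 'Trailblazer 2',
    category: 'running shoe',
    brand: 'Acme',
    colors: ['black', 'red'],
    price: 89.99,
  }).join(' ')
);
```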
Why do the bad guys always give us a way to stop them?

The Dark Clown saga came to a head last night in Coney Island. Local hacker Kasey Robbins was able to crack the code after 73 attempts. Thirty-six missing search marketers were recovered at the scene. No word on the Dark Clown's whereabouts, but law enforcement is on a manhunt throughout the tri-state area. Stay tuned for these stories after the break: Wall Street rallies after the Dark Clown, and is it time for Google to be broken up? Tech entrepreneur Elon Musk is courting controversy once again. His artificial intelligence firm OpenAI is said to be ready to release its GPT-3 as an API for commercial use. No official release date has been set. We'll have that story, plus the weather.