Hank Yelna is happy. Hank Yelna is recording all of this, so we'd better make sure he stays happy as well. Good morning! First of all: because so many of you are here for the first time, you probably don't know me. Which is good, because then I'm actually talking to people who haven't heard me give this talk a hundred times before. My name is Joost. I'm married to Marieke, she's here as well. I'm the founder of a company called Yoast and a father of four. I'm currently back at Yoast as interim CTO, after having left only about five months ago. That wasn't entirely planned, but there I am. Together with Marieke, the two of us invest in quite a few companies that you might have heard of in the WordPress space as well. Honestly, if you want to hear anything about that, see me later, because that's not what we're here for.

I'm hoping this works... it doesn't really, so we're going to do it like this. I want to talk about websites and the environment. All of you might be thinking: what's the link between those? How does my website have a negative impact on the environment? Because honestly, it does, and most of you probably don't realize how much of a negative impact it actually has.

Websites are hosted by large or small hosting companies. One of the largest hosting companies in the world was actually kind enough to buy Yoast last year. These are huge corporations with lots and lots and lots of servers. If you're Dutch, you might have followed the recent discussions about the data centers we have in several places in the Netherlands. All of these data centers use tons and tons of electricity. And the question is: why are they using all that electricity? Is your site using that much electricity as well? Well, it is. Electricity usage for your website is caused by people visiting your website and by your website generating those pages. So: you have a WordPress site.
I assume that's why all of you are here. That WordPress site generates a page when someone visits it, then that data has to be sent across the internet to your computer, and it needs to be rendered there, et cetera. All of that takes electricity. The question is: how much of this can you actually control? What can you do to make your site less impactful?

I'm going to use an example. This is actually my father-in-law, who has a very nice, very simple website. I built it for him. It's based on WordPress, and it's very tiny; all the pages it has you can see in the menu right there. The front page consists of four files: two images, the HTML of the page itself (the CSS for the page is inlined in the HTML, so that's all one file), and a favicon. So remember that: four files.

This page doesn't get a whole lot of visitors. My father-in-law is retired; there's really no reason for him to have a whole lot of visitors to his website, other than the great articles he wrote, which you really should check out. He had 160 page views in the last 30 days. I took this two days ago, so it might not be exactly accurate, but you get the gist: 160 page views. Now consider that we had four files per page view. That should be 640 hits to his web server in the last 30 days.

Now I'll let you take a guess: how many hits did that website really get in the last 30 days? Do you think it's more or less? More. How much more? Five times more? 50? A hundred? Let's see. In the last 30 days this website had 608,000 hits. 608,000 hits. That is approximately 950 times more than was needed for the actual visitors to those pages. I can tell you, from having looked at many, many, many websites, that this is not uncommon. This happens everywhere. And why is this a problem?
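The gap can be sanity-checked with some quick arithmetic (the figures are the ones from the talk):

```python
# Sanity check of the numbers from the example site.
page_views = 160        # real page views in the last 30 days
files_per_view = 4      # HTML with inline CSS, two images, a favicon
actual_hits = 608_000   # hits the server actually logged in those 30 days

expected_hits = page_views * files_per_view
overhead = actual_hits / expected_hits

print(expected_hits)    # 640
print(round(overhead))  # 950
```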
I'll show you the math later in my presentation on how much CO2 this is, but it means that the impact of this website is made hugely bigger by things that are not visitors, not normal users. What is happening here is bots: search engine bots, search engine optimization tool bots, and lots and lots of hackers crawling the web to find stuff, et cetera. Entire data centers are wasted on stuff like this. This should be a very, very small website with negligible impact, and instead it is serving tons and tons and tons of pages.

It's important to note that this website did not change in the last 30 days. It didn't send any notification to any other system out there that it needed to be crawled. It didn't send any change message to Google. It has everything it needs to properly say: hey, this page hasn't changed. It runs on Cloudflare and it fully supports HTTP 304. If you're not technically inclined you can forget that, but if you are, it means the server can send Not Modified headers. It does everything right, and still it gets a ton of hits.

So this is me seeing that happen and going absolutely batshit crazy. Because think about it: this is a very small site. On larger sites the impact of this is much bigger. And of course search engines need to crawl the web; they need to build up their indexes. As the founder of Yoast I have done my fair bit of SEO. I understand that search engines are actually very useful tools and that we use them to drive a lot of traffic to a lot of websites in the world. So that's not necessarily a problem. But we all use Google, and we might use Bing, and someone somewhere else might use Yandex or Baidu. Does this Dutch website need to be spidered by Baidu? By Yandex? By search engines targeted at other languages and at people who will probably never read it in Dutch?
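As a side note on that HTTP 304 mechanism: a conditional request carries an If-Modified-Since header, and the server answers with an empty 304 response when nothing changed since then. A minimal sketch of the server-side decision, purely for illustration:

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional

def respond(last_modified: datetime, if_modified_since: Optional[str]):
    """Return (status, send_body) for a conditional GET.

    last_modified: when the page last changed (timezone-aware).
    if_modified_since: the If-Modified-Since header the crawler sent, if any.
    """
    if if_modified_since:
        try:
            client_copy = parsedate_to_datetime(if_modified_since)
        except (TypeError, ValueError):
            client_copy = None
        if client_copy and last_modified <= client_copy:
            return 304, False  # Not Modified: no body is transferred
    return 200, True  # full page, plus a Last-Modified header for next time

page_changed = datetime(2022, 9, 1, tzinfo=timezone.utc)
print(respond(page_changed, "Sun, 11 Sep 2022 10:00:00 GMT"))  # (304, False)
print(respond(page_changed, None))                             # (200, True)
```

A well-behaved crawler that honors this only pays for headers on unchanged pages; the point of the talk is that the site supports all of this and still gets hammered.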
Honestly, I don't know why they do. One of the biggest users of this website is Ahrefs. Ahrefs is an SEO tool turned search engine, and they spider a lot. So much that I'm thinking: who's paying for all that? Because it's costing them money too. But all of these tools are spidering and spidering and spidering, and that just causes a lot of traffic.

But even if we say: hey, there are 10 major search engines, and maybe 20 major SEO tools, and some other things, we get to 60 tools. How is it possible that we get to 950 times the amount of hits needed? Where does that problem get created? It turns out it's actually a compound problem. Those search engines and tools crawl too often, and we should talk to them about that. But the other problem is that hackers are consistently going around the web trying to figure out what is broken on your site, so that they can hack into it. And the reason they still do that is because it's meaningful: they actually do find lots of sites that are broken that they can hack into. If we get better at security, that becomes less meaningful and they'll do it less, you'd assume.

But the one I want to focus on now is that sites have way more URLs than you think. You'd think that for an average site with five pages and 30 blog posts, plus maybe a home page, you end up with 35 to 40 URLs in total, right? In reality, in WordPress, this is much more likely: you've created 30 blog posts; you've added some tags to them, not realizing that every tag you create creates a new archive page in WordPress, and therefore a new URL to crawl for all those tools; you've created some category pages; and unless you disable them with a plugin (WordPress core doesn't let you, so you need a plugin for it) you automatically have date archives and author archives. So instead of 35 URLs to crawl, you have way more URLs to crawl already. And this is only the stuff you could actually click to in your browser. Because hidden in each
of these posts, WordPress adds way more stuff. WordPress generates a comment RSS feed for every post, so the comments on that post live in a separate RSS feed that can be crawled. And you're thinking: but search engines aren't interested in that, right? Well, if you ever look at your own crawl logs, you'll see that they crawl your comment RSS feeds every day, if not multiple times per day. For the most visited pages on yoast.com, they crawl the comment RSS feeds every 30 minutes. Those feeds never change, because no new comments come in on old posts, but they still get crawled every 30 minutes.

We also have two oEmbed URLs. oEmbed is a very nice protocol that you might remember from having a Vimeo or YouTube link: you drop it into your editor and it turns into a video, or whatever it is. You can do that with WordPress posts as well. There are two flavors, JSON and XML, and WordPress conveniently adds both of them to the source of every blog post out there. Search engines crawl every one of those links every single day. Within those two oEmbed URLs is one embed URL; actually, on most WordPress sites you can just add /embed to the end of a permalink and you'll get the embed version of that page. Search engines find that in those oEmbed URLs and then crawl that too.

We also create a shortlink for every page. It's just a redirect, but you have to boot all of WordPress to get to that redirect. So WordPress is launched every time some search engine or other bot decides to look at that shortlink and see where it goes. And then there's the REST API link, one of the newest additions to this whole feast, where you can get the REST API version of that post. I don't know why WordPress exposes all of this on every page without ever asking you whether it's needed. I think we shouldn't. But every time I start having that discussion, a certain guy called Matt tells me that he wants to keep it. Then, on top of all this, every hit to your
website, or really every page view, usually has a few more side effects than on the website of my father-in-law, because most of you probably have a lot more CSS files, JavaScript files, and images than just four files per page view. So normally the impact of a page view is much, much higher.

I've taken the liberty of using the WordCamp NL team's site as an example, and I want to make absolutely sure the team doesn't get mad at me, so let me say up front: they can't change most of the stuff I'm going to show you, because they host it on WordCamp.org, and if they were at liberty to change a lot of these things, they would. So this is not on them; this is on WordCamp.org itself. But if you view the source of those pages, you'll see these links. No normal user is ever going to see any of this, but it's there, and every bot on the planet will find it, crawl it, and go deeper and deeper and deeper. This doesn't end.

I started looking at bot logs when I was working for the Guardian, about a decade ago now. I was working on the migration from guardian.co.uk to theguardian.com, and we started generating logs of how often Google actually visits the site.
And where is it looking? Which pages do we need to do something with when we migrate from the one domain to the other? I had been doing SEO for a while, but I was still shocked. Because on large sites like that, what Google does is determine a list of hub pages where all the new articles get posted, so the tag pages and a couple of others, and it would crawl those pages, about 85 of them, every two seconds. Literally 24 hours a day, every two seconds. Bing did the same. At that time, Bing sent about 2,000 visitors a month to the Guardian. I don't know if any of you have ever looked in your analytics at how much traffic you get from Bing, but getting traffic from Bing at all is pretty uncommon in the Netherlands. Yet it would crawl at that rate; it would crawl ridiculously fast. And we were having the discussion: we are literally using entire servers, that we're paying for, just to give Bing the contents of our website. Shouldn't we do something about that? Well, I think we should.

So what can we do? To have less crawling, we need to create fewer URLs, or at least stop linking to them, because the only reason they're crawled is that we're linking to them. That REST API endpoint we're linking to is very useful if you're using it, but funnily enough, most of us never need that link in the page to actually use the REST API. So why not just remove that link, and with it a whole lot of crawling? The same goes for the comment RSS feeds on your pages. Who here still uses RSS on a daily basis? And this is a room full of geeks; I see about 10 hands.
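The hidden URLs from a few paragraphs back are easy to see for yourself: list the discovery `<link>` tags WordPress prints in a post's head. A standard-library sketch, run here against a hard-coded head of the kind WordPress emits (all URLs are made up; feed it the source of a real post to check your own site):

```python
from html.parser import HTMLParser

# rel values WordPress uses for feeds, oEmbed, the REST API, and shortlinks
EXTRA_RELS = {"alternate", "shortlink", "https://api.w.org/"}

class LinkFinder(HTMLParser):
    def __init__(self):
        super().__init__()
        self.found = []

    def handle_starttag(self, tag, attrs):
        if tag == "link":
            a = dict(attrs)
            if a.get("rel") in EXTRA_RELS:
                self.found.append((a.get("rel"), a.get("href")))

# Illustrative page head; fetch a real post and feed its source instead.
head = """
<head>
<link rel="alternate" type="application/rss+xml" href="https://example.com/post/feed/">
<link rel="alternate" type="application/json+oembed" href="https://example.com/wp-json/oembed/1.0/embed?url=https%3A%2F%2Fexample.com%2Fpost%2F">
<link rel="alternate" type="text/xml+oembed" href="https://example.com/wp-json/oembed/1.0/embed?format=xml">
<link rel="https://api.w.org/" href="https://example.com/wp-json/">
<link rel="shortlink" href="https://example.com/?p=123">
</head>
"""

finder = LinkFinder()
finder.feed(head)
for rel, href in finder.found:
    print(rel, href)  # five extra crawlable URLs for a single post
```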
I mean, outside of this room there aren't a whole lot of people anymore who even know what RSS is. But of those people who use RSS regularly: who of you uses comment RSS feeds? I was a very, very big user of RSS back when Google Reader was still around, but I never used a comment RSS feed. It's absolute nonsense that we expose that all the time. So we're going to disable those extra URLs. I had built a nice plugin for that, and then I decided maybe we should just roll this into Yoast SEO. So we did, and I'm going to show you how that works. I also want to talk a bit more about tags, and how to clean up more of those URLs.

So if you use Yoast SEO (and otherwise there are other means to do this, but it is actually fairly hard to make most of these changes) you can disable a lot of these URLs, and I would suggest that you do, because you're actually reducing the number of crawls your website is taking. It also means that Google spends the time it's crawling your website on stuff you're actually interested in. Because a lot of the time, when it's crawling your comment feeds and your RSS feeds, it's not crawling your new pages, and the new pages are the things you want to show up in Google's index.

I also want you to have fewer tags. I've been reviewing sites as an SEO for 16 years now. At least one in three websites has more tags than it has posts. If you realize that the sole purpose of tags is to connect posts to each other by topic, then having more tags than posts is meaningless. The problem is that everybody uses tags the way they use tags on Instagram, which is a completely different thing, because on Instagram you're connecting your post to a much wider array of posts on that tag; here you're doing it solely within your own site. So please, clean up your tags. And while you're at it, clean up your categories too, because WordPress gives you two taxonomies. I don't know why it does that.
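The tag clean-up can be expressed as a simple rule: a tag that holds only one post adds a crawlable archive URL with no value of its own, so delete the tag and redirect its archive to that post. A sketch with made-up data:

```python
# tag archive URL -> posts carrying that tag (illustrative data)
tags = {
    "/tag/seo/": ["/post/seo-basics/", "/post/crawl-budget/"],
    "/tag/wordcamp-nl-2022/": ["/post/my-wordcamp-talk/"],
    "/tag/green-hosting/": ["/post/green-hosting-review/"],
}

# Single-post tags: drop the tag, 301 its archive to the post itself.
redirects = {tag: posts[0] for tag, posts in tags.items() if len(posts) == 1}

for src, dest in sorted(redirects.items()):
    print(f"301 {src} -> {dest}")
```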
I don't think it's necessarily a good idea to give people two taxonomies for a small website. The fact that you can add more is great, but why you'd need two by default, I don't know. So clean them up. And don't feel bad about yourself: yoast.com uses tags exclusively. Why? Because we had categories too, and then we had an SEO tag and an SEO category. Yes, even we at Yoast make stupid mistakes like that. It happens; it's not a bad thing. But make sure you have only one; there's really no reason to have both. Clean that stuff up. It would actually do wonders for your SEO as well. And make sure you redirect them all properly to pages where they fit. If a tag only has one post in it: delete it, redirect it to the post that was in it, problem solved. And don't create all those tags anew.

Now, WordPress has more bad habits, and attachment URLs are my favorite, because I've never made mistakes with those. Yoast SEO has a feature to disable attachment URLs, which you should use, and if you don't use Yoast SEO, your SEO plugin has this feature too, because all of us have it. Why? Because attachment URLs are a stupid idea. When you upload an attachment to WordPress, a file or something else, WordPress generates a URL for it. The thing itself already has a URL, and the attachment page is fairly often not used, but it is very often linked to anyway, because WordPress does this automatically in some cases. Get rid of them, please. It really is better for your site.

And then we have date and author archives. If you run a website with only one author, your author archive and your homepage archive are the exact same thing. Only if you have a lot of authors and a lot of posts do author archives start to make sense. My favorite example of an author archive is on a website called ma.tt, from Matt himself.
I mean, the author archive there is useless, but it's there, and he's the only author. This is all stuff that WordPress does wrong, but unfortunately there are other bad actors in this space as well. On all three of these URLs, WordPress would serve the exact same page. The fbclid parameter that you see in the second URL there, you might recognize: when you click through to a page on Facebook, it adds this. Why does it add this? Well, if you have a Facebook remarketing tag on that page, then Facebook can connect the remarketing tag to that visitor and knows who was there, where they came from, et cetera. First of all, I don't think remarketing is a very good idea in general, but that's an entirely different topic that I'll happily talk to you about later. Google does the same thing with Google Ads, and then Google crawls it itself: it sees links with a Google click ID (gclid) from Google Ads and starts crawling those URLs to check whether they're the same as the URL they actually point to.

We redirect most of these parameters away on yoast.com, but it's actually fairly hard to do that reliably. You have to look at your own website and see which of these are coming in. The sad story is that that means most of you would have to look at your logs, which is fairly technical, and then create redirects for parameters you don't use. Not something anybody is going to do, I think. Luckily, there are some ways to optimize that, and we are talking a lot to search engines about how to improve this.

First of all (and I'll share these slides online later, so you don't have to write this down): if you have a Dutch website, you probably don't need Baidu, the Chinese search engine, and you probably don't need Yandex, the Russian search engine we talked about earlier.
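The parameter problem described above can be handled with a canonicalizing redirect: strip the click-tracking parameters you never use (fbclid and gclid here, plus the utm_ family; which ones actually show up is site-specific and should come from your logs). A minimal sketch:

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Tracking parameters that only identify the click, not the content.
TRACKING = {"fbclid", "gclid"}

def canonical(url: str) -> str:
    """Return the URL with click-tracking parameters stripped."""
    parts = urlsplit(url)
    kept = [
        (k, v)
        for k, v in parse_qsl(parts.query, keep_blank_values=True)
        if k not in TRACKING and not k.startswith("utm_")
    ]
    return urlunsplit(parts._replace(query=urlencode(kept)))

url = "https://example.com/post/?fbclid=AbC123&utm_source=facebook&page=2"
print(canonical(url))  # https://example.com/post/?page=2
```

If the cleaned URL differs from the requested one, answer with a 301 to the cleaned version, so bots stop crawling the parameterized duplicates.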
Seznam, which you probably don't need either, is actually a Czech search engine that has survived. It's as old as Ilse, which some of you might remember in the Netherlands. I now see the youngsters from Yoast thinking: what was that? People don't know it anymore, but some of those search engines survived in their own countries.

And we need to help search engines, because search engines need to figure out what to crawl and what not to crawl. We are talking to Bing about this a lot, because Bing was the worst offender, and it has actually improved a lot over the last few years. I was emailing with Fabrice, who heads up Bing, before this presentation; a very nice French guy who has actually made crawl efficiency an OKR, a performance indicator, for his team. You know why? Because it makes sense economically for them as well: less crawling means a lot less money spent.

So they're working on something called IndexNow. I had very stark criticism of it when it first came out; luckily they've changed the standard based on the feedback that we and others gave them, and it's now actually fairly good. What they're trying to do, more and more, is move to a protocol where you tell the search engine: I've just created this page, I want you to index it. Funnily enough, as SEOs that means we've come full circle. We went from URL submission, where you had to enter your URLs in a form on the search engine to get them indexed, to search engines crawling the whole web by themselves, and now we're going back to URL submission again, because it doesn't make sense for them to figure out by themselves which pages should actually be having visitors.

Another thing that's very important, especially for the search engines that actually support it (everyone but Google), is last-modified dates in XML sitemaps.
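The sitemap idea in one picture: publish a lastmod date per URL, so a crawler only has to refetch what changed since its previous visit. A sketch with made-up URLs and dates, using the sitemaps.org urlset format:

```python
# Sketch: emit <lastmod> per URL so crawlers can skip unchanged pages.
from xml.etree.ElementTree import Element, SubElement, tostring

pages = [
    ("https://example.com/", "2022-09-01"),
    ("https://example.com/about/", "2019-03-12"),
    ("https://example.com/latest-article/", "2022-09-10"),
]

NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
urlset = Element("urlset", xmlns=NS)
for loc, lastmod in pages:
    url = SubElement(urlset, "url")
    SubElement(url, "loc").text = loc
    SubElement(url, "lastmod").text = lastmod

sitemap_xml = tostring(urlset, encoding="unicode")

# A crawler honoring lastmod only refetches what changed since its last visit
# (ISO dates compare correctly as strings):
last_crawl = "2022-09-05"
to_crawl = [loc for loc, lastmod in pages if lastmod > last_crawl]
print(to_crawl)  # ['https://example.com/latest-article/']
```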
We've had last-modified dates in Yoast SEO's XML sitemaps forever, but WordPress core is now, once again, working on getting them into the XML sitemaps of WordPress core, which is very important. Because then what a search engine can do is grab the XML sitemap from your site, see which pages changed, and only crawl those. Simple, you'd think. But Google keeps saying that last-modified is not stable enough, and goes wrong too often, for them to trust the signal. So they'd rather just keep crawling everything. I hope that changes at some point.

Now I'm going to make a slightly big jump, but maybe not really; I'll show you later why. I mentioned before that the WordCamp NL website has a lot of CSS, JavaScript, et cetera, and that doesn't necessarily make it faster. I want you all to build faster websites. Why? Not just because the user likes it and because Google likes it, but because faster websites use less CO2. It won't save you crawls; in fact, for a lot of search engines, if your website starts responding faster, they will maybe even crawl it a bit more, because they assign a number of crawls to your site, like a bucket of pebbles, and they take one out every time they crawl your site. So they might not necessarily crawl that much less. But the difference is staggering.

Remember those 608,000 hits that my father-in-law's website had in the last 30 days? This is a purely hypothetical number (I'm afraid it would actually be higher for the WordCamp NL site), but if the WordCamp NL site had those 608,000 hits over a year, it would produce 80,000 kilograms of CO2. We can talk about our gas bills, and we can talk about everybody needing to drive electric cars, but if you're building websites, this is also a problem.
You need to fix it. That's a lot of cups of tea, and I don't even drink tea. That's 141 trees, for one website. And if you think that's weird: look at how many data centers we have, and where we place them. We have a couple of very big ones in the Netherlands, in Groningen, next to the energy sources.

My father-in-law's website is really fast. I had some fun making it really fast. And it only produces 56 kilograms of CO2 with the same number of hits on a yearly basis. That is a ridiculously big difference. That is why you need to make websites faster, even more so than because we're all too lazy to wait another second: it's just better for the environment. It's 50 times less CO2, just by being a faster site.

Now, what if we optimized crawling? I've shown you some things you can do yourself, and we are continuously talking to the search engines about it. If we optimized crawling for real, and we still gave all those bots a fairly big allowance for crawling the site, say double the 640 hits per month that the page views would justify (I think that's generous; I think it's ridiculous that we'd need anything more), it would use 0.12 kilograms of CO2 on a yearly basis. We can actually change this, together. We can make the whole web use a lot less energy. And we should. It's over 25,000 times less CO2 than what would happen if we kept going like we do now, with designs like the WordCamp NL site and everything it has, which, by the way, is much more common than a site like my father-in-law's.

So, I'm coming to the end of my presentation, and I'm going to ask you: what are you still doing here? I want you to talk about this, think about this.
Start blocking bots that you don't need, create fewer URLs, and complain to people who crawl you excessively, preferably on Twitter and other places where everybody can see it, because then they care. Because if I, on my own, can shame Bing into crawling better, then you can help me do that, and together we can do a whole lot more. I really think we can make this better: better for the environment, better for ourselves. And the funny thing is that, in the end, this should actually bring down the cost of hosting, because right now you are paying your host for those 608,000 hits; that's what those hosts are serving. If you have larger websites and you look at these stats, you'll find that a lot of your servers are running only for bots. So you can lower your own bill as well. That was it, thank you!

Ah, Wendy was still a bit surprised, sorry. We do have time for questions, I think. For quite a lot, actually.

That's a good question. I should have thought about the marketing of this better; we should come up with something right now, actually. "Optimized crawling" would probably be a good term to start with. Just tag me and I'll float it along; I'm @jdevalk on Twitter. You can tag Yoast too, but then a whole lot of people start working on it, so just tag me.

So, the first thing: if you're already using Yoast SEO, go into the settings and enable all those crawl optimization settings. We don't dare to enable them by default for everyone, because if your site depends on some of these URLs being there, we might break it, and we don't like to break sites. So that's the first thing I'd do. Secondly, depending on your audience, I would start blocking bots. There's really no reason not to: if your site's in Dutch, as I said, why would Seznam need to crawl it? And they are crawling your site.
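That bot blocking can start with a plain robots.txt. A sketch of a deny-list for a Dutch-language site along the lines of the talk; the user-agent tokens shown are the documented ones for these crawlers, but which bots to block should come from your own logs, and truly bad actors ignore robots.txt anyway:

```
# Block search engines whose audience will never read this Dutch site
User-agent: Baiduspider
Disallow: /

User-agent: Yandex
Disallow: /

# Block SEO tool crawlers you don't use (AhrefsBot as the example from the talk)
User-agent: AhrefsBot
Disallow: /

# Everyone else may crawl normally
User-agent: *
Disallow:
```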
You can be almost a hundred percent certain of that. And then start improving your website's speed. In the end, the combination of those is what makes a really big difference. Also start thinking about what you're putting online and whether it's really needed; I think that's a discussion we need to have more often, too. And I think a lot of the things you need to do manually now, we should actually do in WordPress core.

Well, keep talking about it, keep asking about it. Ask the WordPress core team; most of them will want to do this.

No, no, really. Funnily enough, the things you do here to optimize for the environment are entirely good for your SEO as well. Most of the time your SEO benefits from thinking about: what needs to be online, what doesn't need to be online, and which message do I want to tell? So no, I didn't have a change of heart at all. In fact, I've been talking about this topic for a decade now; it's just that the time seems right for all of us to actually start doing a bit more about it. It's been a problem for quite a while.

It's also that the search engines have invested a whole lot into better search algorithms, into understanding what's on the page, et cetera, but they're basically crawling in the same way they were 20 years ago. So it's time for some improvement in that area from their side. And it's also time that we, and the world as a whole, start thinking about what we want to allow. Do we think it's okay that you could go home now, start a crawler, and just crawl the entire web? Do we think it's a good idea that everybody can just do that, or should we have some rules around that?
Should it be a bit more opt-in instead of opt-out? There's a whole lot of discussion that should be had around this, and I'm not going to answer all those questions today, because I can't. But I think it is something that society is ready to talk about now. We have all these discussions about data centers, but nobody ever talks about this, and the reason those data centers are there is mostly because of this. Google's data centers are, for the vast majority, doing this and sending out YouTube videos. That's what they do.

Well, I'd hope so, at some point, but PR has never been my strong point. I do think that would actually be a very good route to get some of those extra URLs removed in core.

Yeah. So what WordPress does: you can register an image size, and if you register it, then every image you upload gets resized into that image size as well. There's no on or off, no in-between, no way of saying: hey, I only need these sizes for these types of images. There are very good solutions for fixing this; so far, Matt is blocking all of them. He literally just did, when WebP, which would have been a great improvement, was cancelled from coming into 6.1 by default for everyone. So there are good solutions for this, and so far they're being blocked because they would hamper some people in what they're doing. I don't agree; that might be clear. But yeah, you're right: if you look into your theme, you'll see a lot of image sizes. That's not a good idea, because every upload you do takes longer: what WordPress does on upload is literally make a version of that image in every one of the registered sizes. You don't need them, and you know that, but your theme doesn't know, because it's a stupid little thing. The links stay behind.
I don't know whether that's necessarily a problem. I mean, if it's a stock image that you no longer use on your website, and that someone can't reach anymore, that probably doesn't make that much of a difference. My opinion is that we should change more of these things for users instead of having users change them, because it's just too hard. We should just make this better in core, or in whatever editor it is you want to use.

To answer up front: I actually think you'd be shocked to see how much traffic this is. It's hard to calculate, that's true for all of these things, but consider the 608,000 hits we talked about for my father-in-law's website. That is one of the tiniest websites in the Netherlands; my wife is now mad at me, but still. And we have a ton of sites: the Netherlands has more than 2 million registered domains, probably way more; I don't know if anyone here has the exact number. There are tons and tons of sites, this is happening everywhere, and on most sites there's a whole lot more crawling. yoast.com gets this amount of crawls almost on a daily basis, and that's a huge site, but at the same time there are a lot of sites like yoast.com, and a lot in between. This is a huge amount of traffic. That being said, that doesn't mean all those video servers shouldn't optimize their video streams; of course they should do that too. But the thing is that we talk about data centers as something very far away from us, when actually they are there because we do stuff. And we can't just complain about not wanting data centers and then keep on doing whatever it is we're doing.
Well, I work for one; I'm an advisor to Newfold, and I intend to spread this message far and wide. We host millions of WordPress sites, and just sites in general. If hosting companies together can fix this, that would be ideal, but it means talking to search engines. Luckily, we're now getting to the point where I can talk to some of them, and they are being open to the idea of improving this. But it's also very hard for them to build something that they can actually spread across the web. That's why Google is involved with WordPress: in WordPress they can optimize these things, and then they change it for 30% or so of what they're crawling on a daily basis. There are more sites that run on WordPress, but they don't update, and if they don't update, Google doesn't get the benefit either.

So yes, we should talk about this. Hosts should come together and talk about this with the search engines more, and everybody should talk about it more, also with their customers. Hosts should be helping their customers make faster websites, because that helps as well. And maybe hosts should start helping their customers build robots.txt files and just block stuff. Myself, I'm getting very, very close to going from opt-out to opt-in: just blocking all bots in robots.txt by default and allowing only the search engines that we want to crawl our sites. But to do that in Yoast SEO would be political, to say the least. So we're not there yet.