Okay, hello everyone. Welcome to "Drupal SEO pitfalls and how to avoid them". Could we do a short show of hands? How many of you are developers? And how many of you are in content marketing, or marketing in some form? Okay, so mainly developers and a couple of content marketing people. That's okay: today is going to be more of a technical SEO presentation. We're not going to talk about content marketing; we're going to talk about Drupal-specific problems and how to solve them. To be more specific, we're going to look at 11 Drupal issues and how to solve them.

First, a short introduction. I'm Brent, a Drupal architect, Drupal trainer and Drupal developer at Dropsolid, and I've spoken at a couple of Drupal conferences already. Hi, I'm Wouter, an SEO strategist and evangelist, also at Dropsolid, and SEO is my kind of thing, so I like to talk about these things as well.

So let's dive right into the first one. The background image is giving away the solution for those who are paying attention, but let's look at what the problem is exactly: publicly accessible entities. By default, entities may get a publicly available URL. Think of a carousel that has some slides, or a module that manages team members, where each team member is a separate node that's publicly accessible on its own URL. That's something we do not want: it results in low-value, thin content, and these pages can be indexed by Google. You don't want them indexed, because they don't add any value to your site, and one of the most basic things in SEO optimization is making sure you have quality content. Every page that doesn't add value should go, and publicly accessible satellite nodes are exactly that.

The solution is actually pretty easy: prevent those pages from being accessible. This can be done with a number of modules, but we prefer the Rabbit Hole module. With it, we can configure a node to be inaccessible, to redirect to another page, or a few other options.

Next up, something more about entities: every page should really be an entity. Why is that? Often, pages are generated from content located somewhere else, and those pages are not an editable node in the backend. Examples are home pages and overview pages. You want to be able to configure meta tags and XML sitemap inclusion for every page, and for overview pages and home pages that can be a bit tricky. So you want those pages to be editable nodes as well.

There are actually two good solutions. The first one is Layout Builder; those who went to the previous session already saw its advantages. It allows you to add overviews, like a news overview, to a basic page where you can set up the URL, the meta tags, everything yourself. As you may know, though, it can have some problems with translations, so you may have to install a few extra modules to keep that working. The other solution is to use Paragraphs; we prefer the Block Field or the Viewfield module. Those also allow you to add overviews to a page, which again gives you the ability to configure meta tags and other settings on those pages.
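A side note for the developers in the room: Rabbit Hole is the comfortable way to handle that first pitfall, but under the hood the idea is simply "answer the entity's canonical URL with a 404". A minimal hand-rolled sketch of that idea follows; mymodule and the team_member content type are placeholders, the class still needs to be registered as an event subscriber in mymodule.services.yml, and on older cores the event class is GetResponseEvent instead of RequestEvent.

```php
<?php

namespace Drupal\mymodule\EventSubscriber;

use Drupal\Core\Routing\RouteMatchInterface;
use Drupal\node\NodeInterface;
use Symfony\Component\EventDispatcher\EventSubscriberInterface;
use Symfony\Component\HttpKernel\Event\RequestEvent;
use Symfony\Component\HttpKernel\Exception\NotFoundHttpException;
use Symfony\Component\HttpKernel\KernelEvents;

/**
 * 404s the canonical page of embed-only content, like Rabbit Hole does.
 */
class EmbedOnlyNodeSubscriber implements EventSubscriberInterface {

  public function __construct(protected RouteMatchInterface $routeMatch) {}

  public static function getSubscribedEvents(): array {
    return [KernelEvents::REQUEST => ['onRequest']];
  }

  public function onRequest(RequestEvent $event): void {
    // Only intercept the full node page; teasers, views and sliders that
    // embed the node keep working as before.
    if ($this->routeMatch->getRouteName() !== 'entity.node.canonical') {
      return;
    }
    $node = $this->routeMatch->getParameter('node');
    // 'team_member' is a hypothetical bundle that should only ever be
    // rendered inside another page, never on its own URL.
    if ($node instanceof NodeInterface && $node->bundle() === 'team_member') {
      throw new NotFoundHttpException();
    }
  }

}
```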
Next up: indexable internal search pages. We're talking about basic search result pages here, not a commerce site with lots of facets and filters. By default, these can often be indexed by Google as well. In theory this leads to an unlimited number of indexable pages on your website, because any query parameter can be appended to the URL, and each variation is a unique URL that Google can index. Again, that's low-value, thin content, and you want it out of your website. Google also mentions this in its quality guidelines; you can click through on the link (I'll make the presentation available) and see that Google itself says you don't want search result pages indexed, because it's a bad user experience: somebody is on a search result page in Google, clicks a result, and lands on yet another search result page on your own site. That's not something you want.

The first solution, if you're using Views for your search page rather than nodes like we said earlier, is to install the Metatag module and its Metatag: Views submodule. In the view you can enable the meta tags and configure the page to prevent it from being indexed and its links from being followed. We don't really advise this one, because, as we said, we prefer to use nodes. If you're using a node, you just set up Metatag and, as you can see on the right, you can configure the same fields there, but it's a lot easier than in Views, which a backend user can't easily edit, while nodes are. This setting simply adds a noindex, and optionally a nofollow, to the meta robots tag in the front end, so Google will not index that page.

Next: test environments and pages that can be indexed by Google. Development and staging environments are often accessible to anyone, and often they get indexed as well. Sometimes that's a configuration issue, sometimes maybe a little bit of developer laziness; we really don't know. The same goes for temporary content: paragraph testing pages, lorem ipsum pages, anything like that. We see these things land in the Google index on an almost daily basis, and that's not something you want; I think we can all agree on that. For example, if you have a commerce site and your staging environment is indexed, people could start ordering products with a zero or one-dollar price. These screenshots are all pages indexed in Google: dev and staging environments, paragraph testing pages. You don't want these in the index.

The solution is again pretty easy. On a live environment, preferably you unpublish a test page, because most of the time you don't need it; if you really can't unpublish it, again: Metatag, and prevent the page from being indexed. For a dev or staging environment, we highly advise password-protecting the whole site, for example with HTTP basic authentication via an .htpasswd file. You might think a robots.txt block is enough to keep it out of the index, but later in the presentation Wouter will tell you why that's a bad idea.
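Again for the developers: whether it's a search page or a stray test page, that Metatag checkbox ultimately boils down to a single meta robots tag in the page head. A minimal hand-rolled sketch of the same effect, assuming core's node search route and a placeholder module name:

```php
<?php

/**
 * Implements hook_page_attachments().
 *
 * Roughly what the Metatag module renders when you tick the "noindex"
 * and "nofollow" robots options for a page.
 */
function mymodule_page_attachments(array &$attachments) {
  // 'search.view_node_search' is the route of core's node search results.
  if (\Drupal::routeMatch()->getRouteName() === 'search.view_node_search') {
    $attachments['#attached']['html_head'][] = [
      [
        '#tag' => 'meta',
        '#attributes' => ['name' => 'robots', 'content' => 'noindex, nofollow'],
      ],
      'search_results_robots',
    ];
  }
}
```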
Next up, we'll talk about assets, or website resources, that are blocked by robots.txt. Sometimes resources like icons or images are located in a folder that's blocked for crawlers in robots.txt. In SEO, we want Google to understand and view our page just as a regular visitor would, and if some resources are blocked in robots.txt, we're telling Google: you can't visit that resource, you can't crawl it, don't do anything with it. So if you use those resources on your website, you have a small black box on your site that Google doesn't know about, and it can't fully understand the page the way a normal user does. That's not something you want. So make sure all your assets and resources are in a publicly crawlable folder, and keep an eye on Search Console for any error messages or notifications telling you that Google can't fully crawl your website.

Next up: module overload. We all love Drupal because of the amount of modules; if you need something, 95% of the time there's a module for it. But that's not always a good thing. Be careful not to overload your website with too many bulky or unnecessary modules: each one you add can impact your website's performance, and you want your site to be as fast as possible. Here's a quote from the official Google Webmaster blog saying that site speed really impacts your ranking in search engines. This is Google explicitly telling us: make sure you have a fast website. So if you don't use a module, remove it from your site entirely; it might improve your site speed a little. The solution here is actually mostly you: think twice before installing a module. Do I really need this module, or is there an easier way to fix it? And if you test a module and it turns out it doesn't really fit your needs, uninstall it and remove it from your site; don't leave it on there. It's a bit contradictory, but there's even a module to check for unused modules. You can use it to find them, but please also uninstall that one afterwards, because otherwise you're just adding yet another module. A bit of module inception.

Next: redirects. Redirects are also something to pay attention to. If you don't watch your setup closely, it can result in redirect chains: multiple redirects following each other. This is bad for a couple of reasons. They are not search engine friendly: when Google visits a page and it returns a 301 or 302 redirect, Google adds the page it points to at the bottom of its list of pages to crawl. If you have a very small site, that's not really a problem; the list is short. But on a big website the list is very long, and a lot of redirects in it can mean your content doesn't get updated in Google as frequently. So you want to limit the number of redirect hops where possible. For example, this is a screenshot of a Chrome plugin that shows the redirect trace: a visitor goes to a website, which redirects to the www version, which then redirects to the secure version with HTTPS added. This should be combined into one redirect if possible, so we limit the number of hops. The case I just showed is actually what you get with Drupal's defaults: you've probably all seen the .htaccess file where you have to uncomment a couple of lines to redirect from the www site to the one without, or the other way around. That will work, but there's actually a better way if you think it through. It has to be split up into a couple of things, and for every specific use case you might need a different line, so think along with your website: which redirects do I need, and how do I set them up for my own site? The stock rules work straight out of the box, but they might need adaptations for your site.
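To illustrate, a single combined rule in .htaccess can replace the two stock redirect blocks. This is only a sketch: example.com stands in for your own domain, and depending on your hosting you may need a different variant.

```apache
# Drupal's .htaccess already contains "RewriteEngine on".
# One hop for every wrong variant: http://, non-www, or both.
RewriteCond %{HTTPS} off [OR]
RewriteCond %{HTTP_HOST} !^www\. [NC]
RewriteRule ^ https://www.example.com%{REQUEST_URI} [L,R=301]
```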
Next up: security leaks impacting your SEO. If you allow public file uploads in some way and don't pay close attention, it can result in lower organic traffic. A public file upload could be, for example, a resume upload on an HR page. You want some form of validation on that upload, so spammers can't simply flood your form and add hundreds or thousands of files to your website. Because sometimes, depending on your configuration, those files get uploaded and then indexed by Google. We have a screenshot here from a competitor, blurred to be a bit friendly to them, with a lot of torrents and hacks on their live site. Those are spammers or bots that uploaded files through a field that wasn't secured, and the result is cracks, torrents and things like that on the website. If Google notices things like this, it can punish you with something called a manual action. A manual action means an actual Google employee explicitly decides: this website has spam on it, so we're going to rank it lower. It's really something you don't want, because once you have a manual action, it's hard to get rid of. You can do it, but it's pretty hard. This is a quote from Google again, from the official support site, and the bottom part says that if a site has a manual action, some of the site or the entire site will not be shown in the Google search results. If you get hit with a manual action and your traffic immediately drops by 50 percent or more, that has a huge impact on your business. So it's something to watch out for.

The first solution you've probably all seen as a box on several sites: reCAPTCHA. We advise installing a module for it, but with a warning: the default reCAPTCHA module that most people use has a drawback, because every page with a reCAPTCHA form on it can no longer be cached. So in our company we actually use the Simple reCAPTCHA module: it's less advanced than reCAPTCHA, but it keeps pages cacheable. That's a big issue with the reCAPTCHA module in our opinion: if you have, for example, a newsletter subscription form in the footer, no page gets cached anymore, which of course you don't want. The second solution is private files. When you have a public file upload, you have to know whether the uploaded files need to be accessible to users or not. If not, store them as private files; then there's no way they end up indexed by Google either. A small note: if you're using private files, place them outside Drupal's web root. That should be common knowledge, but it never hurts to mention it.
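The private file setup itself is a one-liner in settings.php. A minimal sketch, where the ../private path is just an assumption for a project whose docroot sits one level below the project root:

```php
<?php

// settings.php: keep private files outside the web root so the web server
// can never serve them directly. Drupal streams them itself (via
// /system/files/...) and checks access on every download.
$settings['file_private_path'] = '../private';
```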
Next up, something important: the difference between blocking Google in robots.txt and a noindex, because many people think a noindex is the same as a robots.txt block, which isn't really the case. If you add a Disallow in robots.txt, it impacts crawling, not indexing; the meta tag block is the other way around: it impacts indexing, not crawling. It sounds pretty weird, but Google can stumble upon a link on an external website pointing to a page that's blocked by robots.txt and still index it. If Google finds a link that's not in its index, it will most of the time say: okay, I'll add it to the index, regardless of whether there's a block in robots.txt, because robots.txt doesn't impact indexing. The result will most likely be a snippet with just the URL, no description and no title: just your bare URL, which looks very weird. I see you all looking at it; it sounds strange, and does it actually happen in practice? It does, more often than you'd think. This is a screenshot of Drupal.org, which has a weird /home/home URL in the Google index without a title and without a description, exactly the thing we were just talking about. If we dig a little deeper and look at their robots.txt, we see they added a comment, "Googlebot picked up strange home page URL somewhere", and disallowed the /home/ directory. So they thought they were removing that page from the Google index by blocking it in robots.txt, but that doesn't impact indexing; now they're just telling Google it can't go to that resource. And even if they place a noindex meta tag on that page, it will never get picked up by Google, because they're telling Google not to visit the page at all. Somewhere along the way they messed up, and they should have used a noindex meta tag instead of the robots.txt rule.

Next: Google Analytics. Google Analytics is of course used to see how your SEO efforts are paying off, so it's important as well, and correct data in Google Analytics is very important too: if you don't have correct data, you shouldn't be using Google Analytics at all. So if you or anyone on your team sees sudden spikes or drops, always pay close attention to them, because there could be a configuration issue. Look at this screenshot from Google Analytics showing the number of users: we see it ramping up, a spike of users. The marketing team might say: hey, our content marketing efforts are paying off, we're generating a lot of website traffic. But sometimes that's not really the case. This particular screenshot is from a customer of ours, and we noticed that around the time the visitors increased, the EU Cookie Compliance module was updated. The result was that every page view of every visitor started a new session: one visitor viewing 10 pages showed up as 10 visits in Google Analytics instead of one user with 10 page views. So our user count went through the roof, but the data was incorrect.

There are a couple of fixes for this. The first one is a two-step fix. First, anonymize your visitors' IP addresses to be somewhat GDPR compliant; you can do this in Google Analytics or in Google Tag Manager, either way. Second, whitelist the Google Analytics cookies in the EU Cookie Compliance module itself. You have to do both of these things to make sure your data is correct.
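To make step one concrete: in plain analytics.js the anonymization is a single setting, and both the Google Analytics module and Google Tag Manager expose the same switch in their UI. A sketch, assuming the standard analytics.js loader is already on the page and with a placeholder tracking ID:

```js
// Standard analytics.js bootstrap with IP anonymization switched on.
ga('create', 'UA-XXXXX-Y', 'auto'); // UA-XXXXX-Y: placeholder property ID.
ga('set', 'anonymizeIp', true);     // Google drops the last octet before storing the IP.
ga('send', 'pageview');
```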
The second solution is more of a developer approach: there is a patch available for the Google Analytics module that makes it compatible with the EU Cookie Compliance module and prevents the tracking snippet from being loaded until consent is given. So it doesn't start tracking users until they've clicked the accept button, whereas the previous solution tracks users from the start, but anonymously; this one won't track them at all until they've pressed OK. Sometimes that's better, sometimes worse.

So those were our 11 specific Drupal SEO cases. We're going to end with some rapid-fire best practices; we won't go into detail, but keep these in mind. Use the Google Analytics module or the Google Tag Manager module, not both: one of the two. Aggregate and minify CSS and JavaScript files where possible. Make sure each page has a well-configured canonical tag; this is something that's actually pretty hard to do, and we can't go into detail because of the time, but come visit us at booth number 13 and we'll gladly explain how to configure a canonical tag correctly. Use Pathauto for nice URLs. Always follow up on the number of pages indexed by Google: if it seems too high, maybe some Rabbit Hole setup is incorrect and you need to remove individual pages from the index; if it seems too low, maybe you've noindexed a lot of pages you didn't mean to. Okay, we have to stop, so the last ones were: check your sitemap, and run through some checklists. That's everything. We don't have time for questions, but feel free to visit our booth, booth 13, just across from the entrance, or come talk to us anytime. Thank you.