We're about to start in a minute and a half. I'd like to welcome you. Thank you for being here. This is my very first DrupalCon as a speaker; it's my fourth as an attendee. I've done a lot of presentations in the past, and I keep track of every single one of them. This is presentation 172 for me. But don't get too excited: it's my first DrupalCon, and it took me some time and lobby work to get in here. Today I'm going to talk about using JSON Web Tokens to cache in Varnish. I'm going to do all the disclaimers now because we won't have any time once I get started. It's only a 25-minute slot, and that's going to be hard for me. I babble and I ramble, and someone is going to walk in here and cut me off when time is up. But I want to tell you that this is not the best solution out there, and a lot of people object to what I have to say. Under the circumstances I was put in, though, it was a lifesaver.
By show of hands, who has heard of Varnish before? All right. But who hasn't? Because that's even more important. All right. Oh, don't worry. I have a slide dedicated to you and just you. And when we reach that slide, I will look at you, okay? All right. So this was marked as a beginner talk. I contacted the people from DrupalCon and asked them to turn it into an intermediate talk, but they haven't changed it on the website. If you came thinking this is a beginner talk about Varnish: I don't have the time to talk about Varnish per se. But you know what? I'm going to cut a deal with you. I'm the author of Getting Started with Varnish Cache. It's an O'Reilly book that was endorsed by Varnish Software, and it's a 101 book that builds up from zero to wherever you need to get. I don't have a copy on me, but I'll raffle one. If you follow me on Twitter, I'll give you the details, and just tweet me. I'll choose a random winner and I'll ship you a copy of my book. All right? Sounds like a plan?
All right, let's get going. Let's press the button. The problem is I don't have a lapel mic, so there's not much walking around here. I know you can hear me all right, but it's a matter of getting the recording in. Before we start, there's a set of truths that need to be established. Should I close these doors? Yeah, let's close the doors. Thank you very much. So there are some facts that need to be established, and I'm going to need your help to make it happen. You can choose to raise your hand, shout, nod, look down at your phone. It doesn't really matter. Let's do it.
Slow websites suck. Yes? All right. Web performance is an essential part of the user experience. Slow websites are just as bad as websites that are down, because in the saturated market we live in, if your website is slow or down, people just go elsewhere to buy the product and then you're screwed. Yeah? Somewhat. Drupal's cool. Yes. The underlying technology, PHP, is cool. I'm a PHP guy at heart. I've met people here from across the PHP scene, and I like it, but there are some trade-offs. When you put heavy load on it, it tends to crumble. It's not the fastest language out there, but it is damn flexible, and PHP is the fastest way to get stuff done. Now, to solve these issues, you can throw money and servers at the problem, and agreed, your infrastructure should scale along as you grow. That's a given. But just throwing servers at the problem is not really the idea. So we end up caching to reduce the impact of an application on the server. Still agreeing? Still with me? Okay. So why would you recompute every single time if the source data hasn't changed? An incoming request goes to Apache or Nginx, which sends it to the PHP runtime. PHP has to boot up some modules. Then Drupal gets initialized. Then you have to connect to your MySQL. See the picture? See where I'm heading? Every single time, and the data hasn't changed.
A lot of people think that caching is just a way to cover up for poor architecture design, and then they'll shit on PHP and Drupal for not being as fast as other languages. But I don't agree, because caching is an essential part of your architecture. Under normal circumstances, and this one is for you, no pressure, this is the way you'll communicate: you interact directly with the PHP runtime that hosts your Drupal application in some way, shape or form. There might be multiple servers. There might be load balancers ahead. There might be back-end servers. There might be separate MySQL servers. There might be a lot of people interacting with the runtime. Now, by adding a touch of Varnish, you have an intermediary system that sits right in front of your server, and neither the user nor the back-end server has any real clue that there is an intermediary system. The client thinks he or she is talking to the back end, and the back end thinks it's receiving requests from a client, where in fact it's just an intermediary that stores computed results and feeds them to the end user upon subsequent requests. Still with me? Because that's an important concept to grasp. But the majority of people raised their hands when they heard about Varnish. Now, in case you are still in doubt, here's a slide I often use. By show of hands, who's seen the 90s blockbuster The Bodyguard, featuring Kevin Costner and Whitney Houston? Yes, thank you. So, it's a little bit of a flashback. Whitney, Drupal. Right. When in doubt, think of Kevin and Whitney.
That being said, hi, welcome everyone. My name is Thijs. I'm @ThijsFeryn on Twitter. Please follow me there. I'm doing this experiment every time I speak: I add this slide and ask people to follow me, and I always see a bump after this slide. And there's an extra incentive linked to it: there's going to be a prize draw afterwards.
So tweet me afterwards if you like it and I'll hit you up with something. You might never have heard of us; we're the market leader in the Benelux area. I also dedicate 50% of my time to our enterprise brand, which is called Sentia. I'm the author of Getting Started with Varnish Cache, an O'Reilly book that is endorsed by the people at Varnish Software, and that's the copy I'm going to raffle to someone who tweets me today, tomorrow or later. I will ship it to you when I get back to Belgium. So, yeah. Let's dive right in.
The story starts when I got contacted, and I work at a hosting company, by a big Belgian TV station. They had a website to host, and we were in the running to host their new website, which was written in Drupal 8. But they had this request: can you help us out with our Drupal 7 website and host it, because it's causing us lots of pain? It's a very popular website that hosts some of their most popular TV shows, which they publish right after the show airs. So as soon as the show has aired they upload the video, and I looked at the stats: up to 80,000 people go to it to get those videos. The videos are not really a problem; they're hosted on a CDN. No real worries. The website itself is Drupal 7 with Varnish, and I have to admit that the hit rate is pretty great. Nothing wrong there. But there's one more thing I want to point out: they had a huge requirement, and that was: hey, we're a TV station, we have commercial goals, and we want end users who watch full episodes of a program, not teaser videos or interviews but full episodes, to log in using Drupal 7. I can tell you that the metal of their servers at their old hosting provider nearly melted, right? 80,000 people, two or three servers maybe. As soon as you log in to Drupal... and you know this better than I do, I'm not a pure Drupalist.
As soon as you log in, the session is initiated, the session cookie is set, and the cache is bypassed. By design, by default, Varnish will not cache requests that carry cookies: as soon as it sees a cookie it will just say, hey, I'm not dealing with this, I will send you to the back end and bypass the entire cache. And even a specialized VCL file, a Varnish configuration file for Drupal, will not get around that, because the session cookie implies that the user is logged in and is seeing user-specific content. So that's important. There is a compromise, however, and they did that: whenever those really popular shows started, they disabled the login module to prevent the servers from completely melting down and torching the data center.
So my idea was: what if we could create cache variations for logged-in users? You go to your URL, and if you're logged in you see the content, in our case the video, and if you're not logged in you see your typical Drupal login screen where we force you to log in. That's the idea, that's the mission of this talk. Now, you know better than I do that the only information we have is that session ID, and all the session data is stored in the database. So we need to find a way to identify a logged-in user without accessing the back end, and that last part is key. I guess this is the most important slide in my presentation: the idea is to push session information from the server to the client, and we'll use JSON Web Tokens for that. I got introduced to JSON Web Tokens in January by this guy, a really clever guy, Marco Pivetta.
He works at Roave, he's very much involved in the Symfony scene, he's like a figurehead there. He talked to me at the conference I organize, a PHP conference in Belgium, and he said: you need JSON Web Tokens. I listened, and he showed me what JSON Web Tokens look like.
This is an example of a JSON Web Token. It looks like one big blob of goo, but in fact it's nicely composed of three distinct parts, color-coded in this case and separated by dots. The first two are just base64-encoded JSON: the first part is the header, and the second part, the pink-purplish part, is the actual payload. The blue part is the signature. It's an HMAC signature using SHA-256 hashing, to make sure the data remains untampered, because this will be client-side stuff, and we've all learned throughout history that we can't trust our end users. So we need to make sure that the payload being sent has been signed off. And this is what it looks like decoded. The header contains an algorithm, in our case HS256, meaning an HMAC signature using SHA-256 hashing, and it also contains the type, JWT. Then we have the actual payload, and all of it has, if you noticed, really short names: alg, typ, sub, exp, and then all the stuff you want to add. The idea is to keep it as short as we can, because all that data has to go across the wire. The more data you insert, the bigger the token gets, and that's not really beneficial, so keep it short and sweet. In the end, for that HMAC signature, we base64url-encode our header, do the same thing with the payload, concatenate them with a dot, and compute an HMAC over that with the secret key. The secret key is known by two parties: the issuer, in our case Drupal, and the validator, in our case Varnish. The end user, the client, has no clue what the secret key is, and rightfully so.
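To make the three-part structure concrete, here is a minimal Python sketch of issuing and checking an HS256 token with only the standard library. It is an illustration under assumptions, not the code shown in the talk: the function names, the example claims and the secret are all hypothetical, and in the setup described here the issuing side is a Drupal module and the validating side is VCL.

```python
import base64
import hashlib
import hmac
import json


def b64url(data: bytes) -> str:
    """Base64url-encode and strip the '=' padding, as JWTs do."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def sign(header_b64: str, payload_b64: str, secret: bytes) -> str:
    """HMAC-SHA256 over 'header.payload', returned base64url-encoded."""
    message = f"{header_b64}.{payload_b64}".encode("ascii")
    return b64url(hmac.new(secret, message, hashlib.sha256).digest())


def create_jwt(payload: dict, secret: bytes) -> str:
    """Issuer side: build header.payload.signature."""
    header = {"alg": "HS256", "typ": "JWT"}
    header_b64 = b64url(json.dumps(header, separators=(",", ":")).encode("utf-8"))
    payload_b64 = b64url(json.dumps(payload, separators=(",", ":")).encode("utf-8"))
    return f"{header_b64}.{payload_b64}.{sign(header_b64, payload_b64, secret)}"


def signature_is_valid(token: str, secret: bytes) -> bool:
    """Validator side: recompute the signature and compare in constant time."""
    parts = token.split(".")
    if len(parts) != 3:
        return False
    header_b64, payload_b64, signature = parts
    expected = sign(header_b64, payload_b64, secret)
    return hmac.compare_digest(expected.encode("ascii"), signature.encode("utf-8"))


# Hypothetical claims and secret, for illustration only.
token = create_jwt({"sub": "42", "exp": 2000000000}, b"shared-secret")
print(token)
print(signature_is_valid(token, b"shared-secret"))
```

Only the two parties that hold the secret can produce a matching signature; anyone can read the header and payload, which is why nothing confidential belongs in them.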
If you want to play around, just go to jwt.io and you can fiddle about. They even have a Chrome extension where you can paste a JWT and see what comes out, and you can do it the other way around as well: throw in JSON and it will turn it into a web token. There's even a way to validate against the secret key: "blah blah blah" gives an invalid signature. What we'll do in our case is store the token in a cookie, and that's where the objections start. A lot of people don't like cookies for that. A lot of people think that JSON Web Tokens have no place in a browser-based environment; people in API land regularly use bearer authentication tokens. That's not something your browser can handle; there we usually deal with basic authentication. So this is a way of shifting server-side information to the client side. But there is an additional benefit: it is a cookie, and it's accessible by any language that has access to the browser, including JavaScript. So what we can do is read stateful data without performing Ajax calls to the back end. It's there. You just have to base64-decode it and read the payload, and it's JSON, and JSON is pretty easy for JavaScript to handle. There was a custom Drupal module built for that: it creates the JSON Web Token alongside the regular session information, the regular session cookie still remains there, and there's a hook in there that makes sure the template reads from the JSON Web Token. I did not write this; this guy wrote it.
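Reading the payload needs no secret at all, which is what makes the cookie usable from the browser. The talk does this in JavaScript; the sketch below shows the same decode step in Python, with a hypothetical function name and made-up claims. Note that it only reads the data and does not verify anything.

```python
import base64
import json


def b64url_decode(segment: str) -> bytes:
    """Restore the stripped '=' padding, then base64url-decode."""
    padding = "=" * (-len(segment) % 4)
    return base64.urlsafe_b64decode(segment + padding)


def read_payload(token: str) -> dict:
    """Return the claims WITHOUT checking the signature (display use only)."""
    _header, payload_b64, _signature = token.split(".")
    return json.loads(b64url_decode(payload_b64))


# Build a throwaway token by hand; the signature part is a placeholder.
claims = {"uid": 7, "roles": ["authenticated user"]}
segment = base64.urlsafe_b64encode(json.dumps(claims).encode("utf-8")).rstrip(b"=").decode("ascii")
print(read_payload(f"e30.{segment}.placeholder"))
```

Because the signature is not checked here, this is fine for showing a user name in a template but not for any access decision.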
Andreas Dereca was brought in by our client to replace the old agency and to just fix stuff up while they were choosing a new web design agency and a new Drupal agency for their Drupal 8 site, and he did a sterling job. You can follow him on Twitter right there. He open-sourced this module right here, JWT cookie, on GitLab; it's not GitHub this time but GitLab. There's a submodule in there that does some example stuff. And this is the JSON Web Token that he introduced to us. The issuer, iss, is my website. I will take this website down soon, because you might screw with my data; it has simple passwords. So that's the issuer, the host name of my website. jti is the session ID, and this should match your Drupal session ID. Next up is iat, issued at, just the UNIX timestamp specifying when the session was started, and exp, which is something we can use to determine whether or not the session has expired. We continue with uid, that's the ID of our Drupal user, the ID from the database. Then roles, which is convenient because I don't want to cache admin users; I want to make sure they have all the preview possibilities by going directly to the back end. So as soon as I see authenticated user, I know the user is logged in, and when I see administrator, I'm going to bypass the cache. And then you have some data, the last part, which will be interpreted by JavaScript. That's all the stuff that Andreas did.
Meanwhile I wrote some VCL code, and the VCL code uses modules too: it uses a VMOD. A VMOD is a Varnish module, something you have to install on top of Varnish. It's called vmod-digest and it's responsible for all the core stuff. And this is where it gets tricky: we're going to go from zero to oh my god in just a couple of seconds. As Samuel L. Jackson would say in Jurassic Park: hold on to your butts. Let's do this. This is boilerplate code: we have to specify that it's VCL 4.0, and we import some modules, being the standard module, the variable module, the cookie module and of course our digest module. Now, all this VCL
code, when you run it, gets compiled, not interpreted: it's translated to C and compiled into a shared object that gets attached and linked to the Varnish process, so this is tremendously powerful. The syntax looks a bit like C and C++, but it's just a domain-specific language. And the cool thing is that these modules aren't just C code; they expose an interface in VCL, so you basically enrich your language. The next thing we're doing is creating an access control list with a subnet of all the internal users. They're allowed to access cron, install pages, update pages; all the others we'll ban. Oh, we haven't reached the end yet. And then we have the back end; in this case our web server is running on the same box as our Varnish. In production you might want to split that up.
What have we got here? There's a lot of code on the slide, but we'll go over the parts that are important. This is vcl_recv: this is where we receive requests, and this is where we mix some of the custom stuff with the basic stuff you'll find in every other VCL file for Drupal. What it does is check whether it's a GET or a POST, whether there's an Authorization header, all that kind of stuff. We'll only cache GET or HEAD, for idempotency reasons, and for state reasons we don't want to cache requests with Authorization headers. And we'll be careful with the cookies. I'm not using the typical regular expression magic for the cookies; I'm using the cookie module, and that will fetch my cookie. Normally, whenever we see the PHP session ID cookie or the dedicated Drupal session ID cookie, we'll just bypass. But in this case we will not do that; we'll interpret it. As you can see here, if we go a bit lower, and I tried to highlight it, the Drupal cookie is variable in name, so we need to find a way to fetch it first. So I'm doing some regular expression mumbo jumbo, looking for SESS followed by alphanumeric data, and as soon as I've figured it out I'm going to store that name in a
variable, and I will use this to filter the cookies. We'll remove every single cookie that we don't need, so all the tracking cookies, all the Google Analytics stuff, we'll just throw that out because it will just screw with the process. And we'll keep the cookies that are important to us, in this case the PHP session ID, no-cache, CI session, CI session in uppercase, the auth token or JSON Web Token, and of course that variable session cookie. All the rest: gone. And if it turns out that after removing all the cookies we don't need, what's left is just an empty string, we'll chop out the Cookie header entirely. And then, and this is the important part, it's in red, we do call jwt. That's a custom subroutine I've written where all the validation happens, and there will be lots of code on screen. I'm warning you; please don't have a stroke.
We continue. This is all typical Drupal stuff, besides the roles check. We know nothing about the JWT up until that call jwt point; after it, we're well aware of what's happening. We have a roles variable, the roles came out of the JWT, and as soon as an admin is there we will bypass the cache. The end result of it all, and you'll see that on one of the next slides, is that I'm setting a custom header, X-Login, which is either true, empty or false. If it's true, we know the user has logged in, and we can fetch that information from the JWT. So if you're trying to access the login page but you're already logged in, we're going to redirect you to /user. That is something we're not doing with web server rewrite rules, because every web server connection we can conserve, we will. So this is regular-expression, .htaccess-rewrite kind of stuff, but all done in Varnish: we're matching URLs and redirecting users so we don't have to consume any back-end connections. To be fair, you could just do that using .htaccess or Nginx rewrite rules and cache the result, but I'm being careful, I'm being prudent here.
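The cookie filtering described above boils down to an allow-list plus one pattern for the variable Drupal session cookie. The following Python sketch mirrors that logic outside of VCL. The cookie names in `KEEP` are placeholders rather than the names on the slide, and the optional leading `S` in the pattern (the prefix Drupal 7 uses for HTTPS session cookies) is an addition for illustration.

```python
import re

# Hypothetical allow-list; the real VCL keeps its own set of names.
KEEP = {"PHPSESSID", "NO_CACHE", "jwt"}

# Drupal names its session cookie SESS followed by alphanumeric data.
SESSION_COOKIE = re.compile(r"^S?SESS[a-zA-Z0-9]+$")


def filter_cookies(cookie_header: str) -> str:
    """Drop every cookie that is neither allow-listed nor a Drupal session cookie."""
    kept = []
    for pair in cookie_header.split(";"):
        name, separator, value = pair.strip().partition("=")
        if not separator:
            continue  # skip empty or malformed fragments
        if name in KEEP or SESSION_COOKIE.match(name):
            kept.append(f"{name}={value}")
    return "; ".join(kept)


header = "_ga=GA1.2.3; SESSabc123=xyz; jwt=aaa.bbb.ccc; has_js=1"
print(filter_cookies(header))
```

When the result is an empty string, the caller would remove the Cookie header altogether, so the request looks anonymous to the cache.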
The thing that really matters is at the bottom: you're seeing return hash. That is an instruction to force Varnish to look the request up in the cache, even if Varnish doesn't like what's happening. Varnish doesn't like cookies, but we're forcing Varnish to cache anyway. The part at the top is useful too, because a lot of people use SSL. Who uses SSL or TLS on their website? The majority, right. Did you know that Varnish does not support TLS or SSL? Thanks for contributing. I like the interaction here. Great crowd, great crowd. So we need to terminate SSL before we enter Varnish; we usually set up HAProxy or Hitch or Pound or Nginx or anything like that. But to avoid getting stuck in an infinite redirect loop, we have to announce that there needs to be a cache variation on the X-Forwarded-Proto header. X-Forwarded-Proto is a header sent from the place where the SSL gets terminated, and it announces the protocol we're using, because even if we're using HTTPS, the internal connection will be HTTP, and for Drupal there would be no way to tell whether or not there should be a redirect to enforce HTTPS. So we should announce this and create cache variations, and meanwhile you should deal with this in your code: if you see an X-Forwarded-Proto header and it contains https, you render HTTPS-based URLs. We'll skip the rest and just move forward. We're almost done here, right? Six minutes to go.
We're going to skip forward to the toughest part, and I don't expect you to understand this. Again, this module is online and I'll share that VCL code too. What we're doing is reading the cookie: we're using the cookie module to read the JWT, and then we do regular expression magic. You remember that there are three distinct bits, header, payload and signature, and that's what you're seeing in the first line, which I've marked in green. We're getting the first group out of it with a regular expression, and that group is the header. And then we can fetch the type and the
algorithm by doing just a find-and-replace action using regsub. In Varnish, regsub is a sort of substitution function, and with it we fetch the data we need from the JSON object. I won't take the time to go over all the regex magic; it's just too cumbersome. Next up we're fetching the payload, the second group, and what follows is the signature, the third group. But what we're also doing is creating the expected signature, so that we can check whether or not the data was tampered with. We do that using our digest library: digest, and then base64url_nopad_hex, yada yada yada. You take the key, you take the header, you take the payload, you put it all together, and there's a signature coming out. Then we can compare, and that's what we're doing.
Next up we're getting the payload, which is derived from the raw payload: we base64-decode it and it becomes JSON. We look for the expiration date, we look for the jti, which is our Drupal session ID, and we look for the user ID. What ends up in these variables is just usable data; all of the garbage has been trimmed out, and you can reuse that data to make certain decisions. Decisions being: if our user ID is not a number, something went wrong and we cannot say that the user is logged in. Or if the expiration date has passed, you're dealing with an expired token. Or the session ID, the value of it, doesn't match the jti. Or when the signatures don't match, you have a clever person trying to inject data to achieve privilege escalation of some sort. But whenever all these checks pass, the user is considered logged in. And here's a decision we're making; you'd replace this with whatever pattern you're trying to match. For our client those were the full episodes; here it's node 2. If you want to access node 2 and you're not logged in, we're going to redirect you to the login page. It's a big decision to make, and this is where it happens: you access node 2 without being logged in, you're going
to get redirected. Right, final bit, because someone is going to cut me off. No, not yet? I've mentioned cache variations, because that was the goal, right? We have all this code to determine whether or not the user is logged in, and here's the final piece you need. Vary headers are powerful HTTP headers; I would advise you to use them. You can issue them in Drupal and say: vary on X-Login. Now, X-Login has to be a valid request header. The browser is not sending it, but Varnish sets it, so we can send the Vary header back and Varnish will create a separate cache item for that URL for the anonymous user.
And to finish it off we need to talk a little bit about Drupal. This is DrupalCon, right? It's not me who did it; Andreas used some modules. He used the Varnish module, of course. He used the Key module to store the key for the JSON Web Token. He added HTTP Response Headers and its UI to set custom Cache-Control headers on nodes, which is convenient, because my vision of Varnish is all about empowerment. That means you as a developer, because I assume most of you are developers, should have control over the cache without having to write custom configuration. Agreed, there was lots of custom VCL here, but the goal when you use Varnish is to have as little VCL as possible and to use HTTP best practices whenever you can. Then we have the two modules, JWT cookie and that example stuff. Let me show you some screenshots. This is the homepage of that example. This is not what it looks like in production; he trimmed out all the custom stuff for the client and just made me a proof of concept. The JWT example cookie module displays a timestamp, and that's convenient: if you refresh the page and the timestamp remains the same, you're hitting the cache. That's a trick I use for debugging. And it is reading the value of the JSON Web Token. We're not logged in, but that's not really a problem, because there is no real cache variation on that page. When I try
clicking on the button, the logged-in-user-only content at node 2, I'm getting redirected to the user page, because I'm not logged in. I log in, and as soon as I'm logged in I'm seeing the cached version, and I'm seeing that I am actually logged in. It's a pretty stupid example, but I had to include it. Behind the scenes, and I'm not going to go through all the code, he uses a Composer package, the lcobucci, or however you pronounce it, JWT library. You can see all the fields he is setting here, and you can extend that if you want more data, so it's convenient. One word of advice: this is Drupal 7, and the site is no longer online in that form. It's now the Drupal 8 version, which was made by an agency that had caching in mind, so it doesn't use this anymore. The approach is very much usable in Drupal 8 land, but I guess you'd have to do some slight refactoring. So hit me up if you have ideas for converting this to Drupal 8. And then in the end he just sets and downloads and validates everything.
Oh, I'm getting the signal, and that's okay by me. I would like to thank Andreas again for creating that module for us and the client. You can download it there. Again, the idea is that you push session information from the server to the client, using a technology like Varnish. If you want to know more about Varnish, buy this book wherever you buy your books: Amazon, wherever you get it, or through a subscription service if that's how you get your books. I'm going to raffle one of those away, so afterwards just hit me up on Twitter. I'll pick a random person and ship you that book. All my presentations will be listed on my website, and the video footage of this one as well as the slides. I'm available on Twitter, on Instagram, and, oof, lots of pictures there. I'm going to stop here... having the data about which user is watching which videos
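As a recap of the two ideas the talk combines, here is a small Python sketch: the checks that decide the value of X-Login, and a cache that keeps one variant per (URL, X-Login) pair, which is the effect of sending `Vary: X-Login`. Everything named here (`x_login`, `VaryingCache`, the claim values) is hypothetical, and the real decision is made in VCL, not Python.

```python
def x_login(signature_ok: bool, claims: dict, session_id: str, now: int) -> str:
    """Return "true" only when every check from the talk passes."""
    uid_is_number = str(claims.get("uid", "")).isdigit()
    exp = claims.get("exp")
    not_expired = isinstance(exp, int) and exp > now
    session_matches = claims.get("jti") == session_id
    ok = signature_ok and uid_is_number and not_expired and session_matches
    return "true" if ok else "false"


class VaryingCache:
    """Toy cache keyed on URL plus the X-Login value, like Vary: X-Login."""

    def __init__(self, backend):
        self.backend = backend
        self.store = {}

    def get(self, url: str, login: str) -> str:
        key = (url, login)
        if key not in self.store:
            # Cache miss: only now does the back end get hit.
            self.store[key] = self.backend(url, login)
        return self.store[key]


claims = {"uid": 7, "exp": 200, "jti": "abc"}
cache = VaryingCache(lambda url, login: f"{url} rendered for X-Login={login}")
print(cache.get("/node/2", x_login(True, claims, "abc", now=100)))
print(cache.get("/node/2", x_login(True, claims, "abc", now=300)))
```

With this shape, thousands of logged-in visitors share one cached object per URL, and the back end renders each variant once instead of once per session.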