The first of our speakers is going to be Giles Thomas. Giles started programming Python when founding a business. He wanted to revolutionize the spreadsheet world by making spreadsheets programmable, and then he tried to sell them to financial companies. That didn't work. Then his team moved over to producing the Python system that they wanted for people like themselves, and that sold a lot better. Giles also plays the guitar. Today, however, he is going to take you on a journey: the journey of an HTTP request through a platform as a service. Please welcome with warm applause Giles Thomas.

Thanks for an excellent introduction, and thanks everyone for coming. The thing that we wrote, the Python system that we wanted, is called PythonAnywhere. It's a platform as a service. It does lots of things; one thing it does is host a fair number of websites. So I just wanted to get a general feeling for how much people in the room know about running websites. How many people here are responsible for the continued operation of a website? Maybe a personal blog or company pages? Okay, so a fair number, maybe 50%. Let's bring that up a little bit. How many are responsible for several websites? Okay. More than 10? Still a few. More than 100? Still one. More than 1,000? Okay, I'm not going to keep going up, because it'll be really embarrassing: it'll turn out that you actually run more websites than we do. We have 24,241 websites running on PythonAnywhere as of this morning. Actually, probably a few more by now. And we've got an infrastructure to run this. It's a simple platform-as-a-service infrastructure, and I'm going to go through a description of it pretty quickly, touching on a few of the details.
But what I'd like to do is leave quite a lot of time for questions, because I think which bits are interesting to drill down into should come more from you than from me guessing at what you're going to be interested in. The websites we host range from very basic things where somebody has just started using a particular framework. So somebody here has been trying out web2py. Maybe they're going to build something; it'll get a visitor a day; maybe they're just a hobbyist experimenting. The next stage is sites which get a couple of hundred visitors a day. This guy is learning Mandarin Chinese in a particular way and sharing his lessons with a few other people, so: a couple of hundred visitors a day. We want to spend almost no resources on the first kind, and just enough on the second kind to keep the site responsive. Other people are running popular technical blogs. This is my colleague Harry's Obey the Testing Goat. It's the companion site for a book he's written, which is awesome, and you should totally buy it if you're interested in test-driven development with Python and Django. He gets maybe 2,000 visitors a day, so the site needs to be responsive and it needs to be up all the time, but it doesn't need much more than that; it's not a high-volume site. This is one of our most fun customers. This guy is running a site that's insanely popular: it gets dozens of hits every second pouring through it. It's actually quite a good selection of music, even if you're not into getting out of your head and so on. It's not Amazon, it's not Google, but it's got to be there, it's got to be responsive, and it's got to be maintained at an affordable price. So how do we do this? Well, here's a very basic set of logos. These are the tools that we use. We run on Linux, obviously. We use nginx for our load balancing and for all of our HTTP needs.
We use uWSGI, which is an absolutely awesome product that manages Python processes for you so that they can serve up web applications. It can serve basically any web application that uses the WSGI protocol: that's Django, that's web2py, that's Bottle, that's Flask, all of the big ones, with the possible exception of most Tornado installations, since Tornado doesn't play so well with WSGI. We do use Redis; I'm not going to go into much detail on that today. We also use Lua for a certain amount of scripting. Now, I know I'm in a bit of danger here, talking up Lua at a Python conference, but we do use it, and it's awesome for the specific use case we have. Now, you'll notice that I've got Django and I've got Python there, and I didn't mention them. Well, all of our infrastructure uses the tools that I've described so far, but all of the configuration is managed by Python. It's managed by a number of Django applications which basically spit out the configuration files that all the other stuff needs to run, and keep the cluster live and doing what it's meant to do. So I promised a description of an HTTP request's journey through this platform as a service, and here are the machines that are involved in that. Each of these blue boxes is a separate physical machine, or a separate instance running on Amazon AWS or whatever. Up in the top left is the user's laptop; he's running Chrome. Down here we have a load balancer and a bunch of back-end servers. So everything apart from that machine in the top left is part of PythonAnywhere's infrastructure. We run on Amazon AWS, but that's not particularly relevant in the context of this talk. Let's say that the person running the browser up here wants to view my friend Harry's website. They want to visit www.obeythetestinggoat.com. Well, their browser makes a DNS request.
The DNS request comes back with the IP address of obeythetestinggoat.com, which is the IP address of this load balancer down here. So it opens a TCP/IP connection down to the load balancer and sends the request to it. Now, the load balancer has to route the request through to the web application: obeythetestinggoat is this web application here, this Python process that's running on this particular physical machine, the middle one on the right-hand side if you can't see the mouse pointer. So the load balancer needs the intelligence to know that obeythetestinggoat.com is running on back-end server 2. We'll just say that's magic for the moment: it magically knows it's back-end server 2, and it makes a connection. Now we have two TCP/IP connections, one from the client to the load balancer, and one from the load balancer to the back end. The back end now needs to identify that the process web app 4 is the Python process running obeythetestinggoat.com. It does that, again, we'll say magically, and makes the connection. The web application code does its calculations: it renders templates, it talks to the database, it does whatever magic it does to generate a page. It sends that back to the back-end server, which sends it back to the load balancer, which sends it back to the client. Now, if you're used to running normal kinds of websites (personally, for example, I have a VPS where I used to host my personal blog), you might be wondering what the point of the load balancer is. Normally you'd simply have a server that looks rather like the back-end server here: it's running a front-end web server like nginx or Apache, and it's got a number of web applications running as Python processes underneath it. And we have this extra step, the load balancer.
That's where the magic comes in, because it's the load balancer that allows us to scale up, scale down, add resilience and failover, and all those other good things that people expect when they outsource running their web applications to a third party like us, rather than renting a VPS. Right, so I said the load balancer knows by magic which back end it's got to send each request to. Our load balancer is running nginx, or rather a specific flavor of nginx called OpenResty. Now, nginx is an awesome web server. It's extremely fast, it's very good at proxying connections through itself like we do through the load balancer, and it has a lot of great plugins. OpenResty is basically nginx with batteries included, and one of the batteries included is Lua scripting. The kind of Lua scripting you can do is insanely powerful: you can do any amount of Lua processing and it works extremely fast. Lua, I think, is a nice language. It's not as nice as Python, but some of the design decisions that make it a less pleasant language to look at and work with are actually very good for speed and efficiency, and I think that's why they chose it for nginx scripting. What we do inside our load balancer is actually really very simple. What I've got here is the nginx configuration file; hopefully that's reasonably readable. At the top here, with init_by_lua_file, we're saying that when nginx starts up it's going to run that script, init_backends. That script basically just sets up some global context, available to any Lua script inside nginx, saying: here is the list of all of the back-end servers. That's all it does. As we go down, we come to our server block, so we're listening on ports 80 and 443. And this location / block is something that's going to be executed for every request.
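A rough sketch of the sort of OpenResty configuration being described (the directives are real OpenResty/nginx directives, but the file paths and variable names here are illustrative, not PythonAnywhere's actual config):

```nginx
# Sketch of an OpenResty load-balancer config along the lines described.
# Paths and names are illustrative.
http {
    # Runs once at startup: populates a global Lua table of back-end IPs.
    init_by_lua_file /etc/nginx/lua/init_backends.lua;

    server {
        listen 80;
        listen 443 ssl;

        location / {
            # Compute $backend_ip for this request's Host header.
            set_by_lua_file $backend_ip /etc/nginx/lua/get_backend_ip.lua;

            # Hand the request off to the chosen back end.
            proxy_pass http://$backend_ip;
        }
    }
}
```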
So what we do is extract the host that this request is asking for, www.obeythetestinggoat.com, from the Host header of the HTTP request; it ends up in a variable we call root_host. We then set a backend_ip variable to the empty string, and then this is basically a function call here: we're calling the Lua code contained in get_backend_ip. Now, you can guess what get_backend_ip does. It returns the IP in that backend_ip variable, and then we go into this little bit of nginx magic, which is proxy_pass. That says: just hand off the processing of this request to the server over there, identified by that IP, and nginx does the rest for us. Let's take a look at that Lua file. This is an interesting bit of code, because it was something we put in for the first cut of our load balancer, which we had working within a week or so. It seemed too simple; it didn't seem complicated enough to work. All it does is hash the hostname that comes in. That's literally the code that Python uses when you hash a string, converted into Lua. So you've got a number derived from the hostname. We then take that modulo the number of back ends, and use the result to index into the list of back ends. That means every single website we're running is assigned essentially randomly to one of the back ends, but stably assigned to the same back end every time. If we add a new back end to the cluster, the modulus we're using increases, and so everything automatically spreads itself out over the cluster again. That's the load balancer. I said that the back-end server also needs to identify which process is running a particular web application. So this is some really basic nginx configuration that any of you who've done stuff with nginx will recognize. All we're saying here is, again, extract the domain name from the request we're processing, in a different way, but to the same effect.
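The hashing scheme described a moment ago can be sketched in Python; this mirrors CPython 2's string hash, the algorithm the talk says was ported to Lua. The back-end list and function names are illustrative, not the real code:

```python
def string_hash(s):
    # CPython 2's string-hashing algorithm, the one the talk describes
    # converting into Lua for the load balancer.
    if not s:
        return 0
    h = ord(s[0]) << 7
    for ch in s:
        h = (1000003 * h) ^ ord(ch)
    h ^= len(s)
    return h & 0xFFFFFFFFFFFFFFFF  # keep it to a fixed width

# Illustrative back-end list; in the real system this comes from the
# global context set up by the init script.
BACKENDS = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]

def get_backend_ip(host, backends=BACKENDS):
    # Stable, essentially random assignment: the same hostname always
    # lands on the same back end, and changing the number of back ends
    # reshuffles every site across the cluster.
    return backends[string_hash(host) % len(backends)]
```

Because the modulus is just the length of the list, adding or removing a back end transparently re-spreads every site over whatever servers remain.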
And what we tell nginx to do is delegate all requests for www.obeythetestinggoat.com to a particular socket. This is all dynamic stuff; this is what the config actually looks like, it's not a sample. So any request that comes into this nginx, it will immediately look for a socket in that particular location, and expect there to be a uWSGI process sitting on the other end of it, running the website that should be on that particular domain. How does uWSGI know that it needs to have a web application running on that socket? Well, uWSGI has a directory containing what they call vassal files. A vassal is uWSGI's terminology for a running Python process, or set of processes, that's responsible for a particular web application. It's configured by a vassal file, and a vassal file basically has various things saying where the code is, what kind of sandboxing you want to apply, how many worker processes you want, but importantly it also has this line at the top here, which is the socket that it needs to listen on. uWSGI is very clever: if a vassal file like this is created in the right directory, it will detect the creation of that file and fire up all the processes immediately, and that means the web application is started. So what we need to do is start a web application when requests come in. This is where things get a little bit more complicated. What happens if a request comes into one of our back ends and there is no process running for that particular web application? Well, I told you the nginx config I showed you earlier was simplified a bit. When nginx tries to connect to a uWSGI back end that's not there, maybe there's no socket, maybe uWSGI hasn't started the processes, maybe it's killed them because they timed out after a certain amount of inactivity, nginx will internally generate a 502 error. Normally that just goes straight back to the browser, and obviously things look bad.
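A vassal file of the kind being described might look something like this (the option names are real uWSGI options, but every path and value here is made up for illustration):

```ini
; Illustrative uWSGI vassal file -- not PythonAnywhere's actual config.
[uwsgi]
; The socket nginx proxies to for this domain.
socket = /var/sockets/www.obeythetestinggoat.com/socket
; Where the code lives and how to load the WSGI app.
chdir = /home/harry/mysite
module = mysite.wsgi
; How many worker processes to run.
processes = 2
; ...plus whatever sandboxing options the platform applies.
```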
What we have here is an error page handler. If there is a 502 error, we essentially jump to this other block here, a fallback error page for 502s, and all we do inside it is check whether there is a vassal file for that particular domain. So let's say we're looking for www.obeythetestinggoat.com and the process isn't running. The first thing we do is jump to this fallback and see whether there is a vassal file for that domain. If there is, we can safely assume that the processes are running, so this was a real 502, maybe something went wrong inside the web application, and we generate a real 502 error. But if there isn't a vassal file for that particular web app, we know we need to start it. Now, you remember that proxy_pass from the load balancer, where essentially we're saying delegate all the work for this request to that IP over there? This is another proxy_pass here, which delegates to a little microservice running locally. That microservice is actually a very small Django application, and it has access to the database that configures all of the websites we run. When it receives a call on its initialize-web-app view, it says: okay, I need to start up that particular web application. It goes to the database, gathers all the information about the user, works out whether we have a container for this particular user running on this particular machine, and starts that up if necessary. It then creates the uWSGI configuration: it generates a uwsgi .ini file, the vassal file, and hands that off to uWSGI. uWSGI starts the processes up and running, and suddenly we can start delegating all the work to them. So why is this interesting?
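The fallback decision just described can be sketched like this (a simplification: paths and names are illustrative, and in the real system the check happens in nginx/Lua while the start-up happens in the Django microservice):

```python
import os

def handle_502(domain, vassal_dir, start_web_app):
    # Called when nginx gets a 502 trying to reach the uWSGI socket for
    # `domain`.  If a vassal file already exists, uWSGI should have been
    # running the app, so the 502 is genuine and goes back to the client.
    # Otherwise the app was never started (or timed out and was torn
    # down), so ask the local microservice to bring it up.
    vassal_file = os.path.join(vassal_dir, domain + ".ini")
    if os.path.exists(vassal_file):
        return "502"            # real error inside the web app
    start_web_app(domain)       # writes the vassal file; uWSGI spots it
                                # and fires up the worker processes
    return "retry"
```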
Well, what it means is that we can scale pretty much transparently. Let's say we've had a busy day, and let's imagine that until now we've only had, say, three web servers in our cluster, and then suddenly things hot up: maybe the web applications we've got are getting busier, or a bunch of new people have signed up and we've got to serve more websites. All we do is create a new back-end server, which is very easy with Amazon, we just fire up a new instance, and then we tell the load balancer about it. Immediately on telling nginx to reload its configuration, it will start distributing requests differently across the back ends, and any back ends that need to start web apps will automatically start them. The ones running web applications they no longer need to run will all start timing out and killing themselves. So we can dynamically reconfigure the cluster very, very simply. Now let's say that something goes wrong. Last night one of our web servers started showing problems, and we got an alert saying liveweb1 was going down. liveweb1 is a particular server we have on Amazon, and every year about this time hardware starts failing on AWS; I think what happens is that all the regular engineers go on holiday and the interns haven't been given enough information on how to manage their systems. liveweb1 started failing, and so all we did was log in to the load balancer and remove it from our list of back ends. Everything then immediately reconfigured itself automatically, just through the use of this hashing function, to run on the remaining servers. That meant, of course, that all of the web apps were running a little bit more slowly, because our machines were closer to their load limit, but that's fine; we have a fair amount of capacity for that. We could fix the broken server, bring it back into rotation with a new IP, and suddenly everything worked again. That was a very, very
rapid tour through how the whole system works, and now I'd really like to hand over to you for any questions, so we can drill down on anything that was interesting. Thank you.

So we've got a lot of time for questions. We have a gentleman here at the front.

This is about initialize-web-app: couldn't you just use uWSGI to start the web apps automatically? Why would you not use that?

The real problem is actually in the amount of work that needs to be done to start the process, because all of our users run inside sandboxed environments, and we have to have control over the code that starts them up. uWSGI, at the time we started using it, and I think still, doesn't have the capability to do all of the setup work required. It can start processes quite happily, and it can run certain kinds of pre-init scripts, I don't recall precisely what, but there was something it could not do, to do with the virtualization, essentially.

Next question: how do you do the sandboxing?

The sandboxing is kind of a roll-your-own thing. If we were starting PythonAnywhere today, we'd probably use Linux containers, or we'd potentially use Docker. We are using Docker for some stuff we're working on now, and thinking of rolling some of those changes back in. But the good thing about Linux containers is that they were built out of reusable components: chroots, process namespaces, network namespaces. All of these have been becoming available for a number of years, and we've essentially rolled our own kind of Linux-containers-lite by plugging those things together.

Okay, and what happens if a worker is down and, let's say, three requests come in at the same time? Is there some kind of a race condition there, right?
Sorry, I don't think I understood that.

So if you need to start a uWSGI process for a website that was down, but three requests come in at the same time, is there some kind of locking to make sure you only start it once, and you don't lose any requests while it's starting up?

Yes, yes, there's locking inside our initialize-web-app code which handles that, and uWSGI does queue things to a certain degree as well, so we've kind of got belt and braces there. We use the code that starts up our sandboxes in various places, including the in-browser consoles and things like that, so we've got locking on our side to protect us from that, and uWSGI does a certain amount of queuing.

Okay, thank you. [Inaudible question about WebSocket support.]

No, we don't, and that's something we really do want to support. All of our infrastructure supports it to one extent or another. I'm not actually sure what uWSGI's support for WebSockets is like at the moment, but the problem is that the WSGI protocol doesn't really support WebSockets, so if uWSGI does support it, it'll be in some kind of extension on top of that. I think that when we do support it, and we definitely will, we'll either wind up rolling something of our own to manage long-running, say, Tornado processes, and use the same nginx infrastructure to route through to the appropriate places, or maybe uWSGI will by that time do something that we can use.

You never said how you deal with persistent state, or with databases.

Sorry, could you say that again?

Well, what happens to the databases of the web apps?

Oh, I see, okay. That's managed separately. We have MySQL instances, and we're working on supporting Postgres instances, just separate, kind of behind those back-end servers.

Sorry, could we repeat the question, please? Do you have some services that manage databases for users?

The MySQL side is all a little bit messy and was built ad hoc. We're adding Postgres support right now, and
we're basically building that as follows: we have a Flask microservice which runs on one of a set of Postgres servers and fires up Docker containers, each of which runs one Postgres instance. So the Flask microservice does the provisioning. We're hard at work on that at the moment; it's working well enough to pass our functional tests, which probably means it's a month or so away from deployment.

Last question, please. There's one.

Hi. You said that when you got an alert on one of your instances, you manually removed it from the load balancer. Is there a reason why you don't set up automatic removal? And also, do you have automatic upscaling, because you said that you have some sort of limit on your instances?

That's an excellent question. It's really been a matter of development time. One of the features we do need to add is automatic instance replacement. When we first created the system, it made sense to do it manually, because each instance failure was a rare enough occurrence, and was kind of unique enough in the way that it failed, that it was better to have a human in the loop. Whereas now I think we've managed to get a list of the different ways in which instances can fail, and we can probably start building in more automated responses. But yeah, that's just a case of not having had time to do it yet.

So thank you very much, Giles, for speaking here.