So we're going to be talking about sticky sessions today. My name is Dan. I'm from Boulder, Colorado, where I work at a little startup in downtown Boulder called VictorOps. We're building an on-call and incident management platform. Boulder is not a bad place to live — it's actually so nice there right now that the real estate market is really booming. Houses are being sold in a weekend with multiple offers. So about three years ago my wife and I went to buy a house. We found a house on the MLS, we went and looked at it, and the realtor said, don't count on being able to buy this, it's going to be gone in a weekend. We said okay. Three months later my wife sent me a text message saying, hey, that place is still on the market. So we went back and looked at it, and I don't know if it's obvious from the photo, but the people who lived there had an interesting style. They had mounted a tree decal on the wall, and it had this awesome three-tone paint scheme going on, and it was really messy, and nobody could see past that. So something that would usually have sold in a weekend in Boulder sat on the market for months. The house had another problem: everything in it was dated. It was a house from the late nineties, but nothing had been updated since it was built, so it had all these brass fixtures and cloudy glass and whatnot. My wife's first comment when she came in was: shouldn't we do something about these fixtures?
Couldn't I go through and upgrade them to brushed chrome, or to oil-rubbed bronze, or something like that? My response, being an engineer, was: we can do that, or we can just wait for brass to become popular again and let the trend come to us, and our house will already be modern and contemporary. That, apparently, does not work. I had tried it a year earlier, when my wife said she needed a white gold band and I said, well, why don't you get gold gold? She said, oh no, no, no, it has to be white gold. And this got me thinking about trends, and how architecture happens: we make one decision, then later we move back to another decision, and it's always this kind of cycle. There are always reasons for it — it's not like people arbitrarily choose trends; there is real impetus in the market. But if you don't understand the trade-offs in the decisions you're making, when you're talking about these fundamental architectural decisions, then you'll just always be cargo-culting whatever everybody else is doing.

So what I'd like to talk about today is a fundamental trade-off that we make when we're building architectures, and that's the statefulness of the architecture. The way that I view the world of web application development, at least, is a very simple architecture: I write my application in PHP or Java or Rails or something like that, and then I stand up a big instance of MySQL behind it. I buy a whole bunch of web servers, hook them all up to a relational database, and kind of call it a day. The underlying decision I'm making here is that I'm separating the behavior from the data. But there's another way to do it: we could store the data for our applications inside the application itself, and this has a lot of interesting side effects.
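To make the two options concrete, here's a hypothetical sketch in Python — the handler names and the `db` interface are invented for illustration, not from any real framework:

```python
# Stateless style: every request ships data over the network,
# from the database to the behavior and back again.
def handle_visit_stateless(db, user_id):
    visits = db.fetch_visits(user_id)      # network hop to the database
    db.store_visits(user_id, visits + 1)   # and a network hop back
    return visits + 1

# Stateful style: the data lives in the same process as the behavior,
# so working on it involves no network at all.
visit_counts = {}  # in-process state: {user_id: count}

def handle_visit_stateful(user_id):
    visit_counts[user_id] = visit_counts.get(user_id, 0) + 1
    return visit_counts[user_id]
```

The trade-off, which the rest of the talk explores, is that the stateful version is fast and simple to reason about locally, but the process itself now holds data you can't afford to lose.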
This is actually the direction VictorOps is moving right now. We started off building in the standard architectural way, and now we're moving to store state inside our application. So I'd like to talk about what it means to store state, so we can talk about where it sits in the architecture and where we store it today; a little bit about history, what people have done in the past, and how we might revisit some of those ideas; some motivations for why we store state in particular places; some dangers of storing it in the application versus a relational database; and then some pointers for further reading.

When I talk about state, in its simplest terms it's the distinction between data and behavior. If I have code that runs, the code is typically the behavior, and what the code works on is the data. A grander way to think of state is as an implicit coupling between time and space in your code. It's also really tricky to understand which parts of your application are stateful. If you look at your code and the order of operations is an important part of how it behaves, then the order of operations is part of the state. This is why people talk about happens-before relationships and build synchronization primitives, so they can make sure an application proceeds in a particular order — that's all stateful behavior; that's all state you're dealing with when you build these things. If you have a coupling to space, then it depends where something happens, and "where" is kind of a nebulous term: it doesn't mean it actually occurs in a different physical location. It could be two threads that are talking to each other.
It could be two processes, or it could be an application deployed in California trying to talk to a server that's in New York. Anything where you care about the fact that two requests hit your server and you don't know where, or that they hit different threads — again, you're talking about state.

But state is also a lens. Here's a very simple function. I wanted to keep it to one line, so it doesn't have a guard for when it hits zero — that makes it an infinite loop — but the point is it's a simple factorial function, and most of the time you might look at it and say this is a completely stateless function: there are no variables being assigned. But if you run this on a real computer, computers have maximum stack sizes, and they have finite memory. So if your maximum stack depth is a hundred and you try to call this function with a hundred and one, it damn well matters which iteration you're on, because at the hundred-and-first call it's going to throw an exception and crash. In this case the state is actually being held in the stack. In the next function, a lot of people would say it creates a side effect — it writes to disk, for instance — so it's a stateful function. Well, if I change your perspective so that you're looking at it from a web server's point of view, where somebody comes in and makes a GET request and doesn't care that it's writing to disk, then it's a stateless function. So whether you're talking about state always depends on how you look at things, and that's always important when you're considering where things should go.

So when I break down the world between stateless and stateful apps: stateless apps typically store their data inside a database, and they ship that data from the database over the network to the behavior, do some work on it, and ship it back to the database. It's a kind of data
shipping paradigm, and typical architectures look like this: two requests come in, and it doesn't matter which server they hit — they go to the database, get their data, work on the data, and it goes back to the database. These apps are often deployed behind load balancers, and the majority of web applications you see on the internet — CRUD applications, for create, read, update, and delete — behave in this manner.

If I break down what's actually happening when a request comes in: we know the engineers wrote this part of the system, and we know they went to the internet and downloaded this other part, so maybe that part works. But users don't care, when things break, that it's the browser's fault, or the engineers' code, or the database. To them, the entire thing is stateful. All applications have state, and keeping it separated — maybe that's not the best thing in the world to do.

A stateful application, by contrast, stores the data right next to the behavior, so the data doesn't have to move over a network to be worked on. Ten or fifteen years ago, when J2EE was still a cool thing to do, people had this concept of sticky sessions. I should caveat that by saying people do still use sticky sessions today, but if you go on Stack Overflow and ask, "Should I use sticky sessions on my application server?", the popular consensus is no, you shouldn't. The basic idea is that every request that comes into a web farm from a particular human will go to the same server. So if I have a browser open on my Mac and I make two requests against a web farm, I will always go to server one. Like in this picture, with two different people: the guy on top is always making calls to server one, and the woman on the bottom is always making calls to server three. And this makes it easier to
reason about things like caching. I can create really simple in-memory structures and just pull them out of memory, because I always know that a person's session is going to be local to my server. And sticky sessions are accomplished in basically two different ways.

In order to spread work through a cluster, you need to do some sort of lookup, because when somebody hits you from a browser, you need to know which server should handle that request — and that lookup can basically be distributed or not distributed. Generally we call this lookup a hash table, because you take something like the IP address, run it through a hash function, and store it against a server's IP address or name. Every time that person hits the load balancer, you route them to the same server. You're using some sort of identifier — it could be a cookie they have in the browser, or it could be the address, though we know that won't help with NATed clients — but some way of identifying a particular browser that's hitting your website. To oversimplify the richness here: when that hash table is distributed, you might use something like a consistent-hashing lookup, or you can send requests to any random server and those servers know how to route to the other servers, so the request moves through the farm until it finds the correct server, which is the one that ends up responding. A non-distributed hash table could be a centralized load balancer that just has a hash table sitting in memory — it's not sharing it with anyone else, and if it were to crash it loses that mapping. Or it could be as simple as a persistent connection to a server: you just open up a socket and send all of your requests to one particular server all the time. You've hidden the load balancer concept, and the "hash table" is your computer's list of connections. So why would we even want to do this?
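The non-distributed lookup just described — hash a client identifier, map it to a server — can be sketched in a few lines of Python. The server names and the choice of MD5 here are illustrative assumptions, not from the talk:

```python
import hashlib

SERVERS = ["server1", "server2", "server3"]  # hypothetical web farm

def route(client_id: str) -> str:
    """Map a client identifier (cookie value, IP, ...) to one server.

    A stable hash is used deliberately: Python's built-in hash() is
    randomized per process, which would break stickiness on restart.
    """
    digest = hashlib.md5(client_id.encode()).hexdigest()
    return SERVERS[int(digest, 16) % len(SERVERS)]
```

The same client always routes to the same server — until the server list changes, at which point simple modulo hashing reshuffles most clients. That reshuffling is exactly the problem a consistent-hashing lookup mitigates in the distributed variant.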
I talked a little earlier about how it makes it simpler for programmers to keep in-memory caches, but there are some other big reasons why stateful applications are an interesting idea. A CPU typically does something on the order of four billion operations per second. Look at the cost of getting information from the different levels of a computer — starting with the CPU's on-die caches, L1, L2, L3, then main memory, disk, and network — and think about each cost in terms of one second of activity instead of nanoseconds and microseconds, which span orders of magnitude we don't normally think in. If going to the CPU cache takes one second, then going to main memory takes about two minutes. That's not super long. If I go to a disk, it takes fourteen hours to get that data. And if I have to go over the network, like in the data-shipping paradigm most stateless apps are built on, it takes six days to get that data. To frame it another way: going to your CPU cache is like turning your head and talking to the person sitting at the desk next to you. Going to main memory is like walking across the office to get some piece of information and coming back to sit down. Disk is like driving from LA to Portland to get the information and coming back. And the network — that's like driving to New York and back every time you want to answer a question. Think about the amount of wasted activity while that's happening: the CPU is just sitting there farting around. So from a pure performance perspective, we are leaving so much on the table by keeping our data and our behavior separate from each other.

There's also correctness when you're programming across the network. If we keep the behavior and the data coexisting, there are actual proofs in the distributed-systems literature about how the levels of linearizability and serializability change, and we can get different guarantees on our systems just by keeping things off the network — you don't have partition problems when you don't have a network involved. I also believe, as an engineer, that ergonomics are very important: the more tools you have, the more you have to deal with. A stateful architecture happens entirely inside processes that you understand. Not that people don't understand MySQL, but it's one less tool to learn if you keep things in native data structures. There's also a whole different realm of resilience that we can build in. One of the first conversations you'll always have when you're building an architecture is: what do we do if the database is down?
Everybody punts on that. They never think about how their code should handle the fact that the place where the state lives might be unavailable once in a while, and when that does happen, most applications just absolutely crap the bed. If you keep the state and the behavior together, you can build in different kinds of resilience; you can change the way you think about and respond to failure. It becomes a lot easier to handle, and whole classes of errors go off the table: there are no network-related problems. You still have to be conscious of concurrency issues, but at least those are all local to one process or one thread or something to that effect.

So there are a bunch of choices in how you might do this. We talked about sticky sessions, and that was codified in a number of environments — J2EE had a really nice implementation of it — but there are several decisions we have to make. One of them is choosing a particular runtime, because there are managed environments where building stateful architectures is not super wonderful.
MRI, the default Ruby runtime, is one of those. You'll notice that when people deploy against MRI, or the PHP runtime, what they typically do is spin up a bunch of processes behind a web server and then kill them periodically, allowing fresh ones to serve new requests — kind of a CGI-exec model, but keeping hot processes around. And it's very difficult to have a stateful architecture if you're constantly killing the memory space where you're storing the state. You also typically need some sort of threading model inside your runtime, because stateful architectures usually have a lot more background processing going on to keep everything polished and working, so you need background threads for that. And you typically need some kind of control over your memory — that could be on-heap, or off-heap memory that you're memory-mapping — just some way to control how much data you're storing.

From a framework perspective — the application frameworks we build applications in — you also need to make choices. You need some way to support making remote calls, because unless you're lucky enough to deploy on a single server, you're probably going to have a cluster of nodes to handle your load, and having those nodes talk to each other is pretty important. You need your framework to handle concurrency, ideally as a first-class citizen and not something bolted on after the fact with really low-level mutexes and semaphores. And you need some concept of clustering, which is an interesting topic in and of itself: you need some idea of membership, so the nodes all understand who the others are, and this can be either dynamic or static. A static cluster is something like a MySQL rollout using Galera, where you specify all the IP addresses of the cluster in a config file, and to add capacity you go back and reconfigure the cluster. That's perfectly fine; people do that. There's also dynamic clustering, which allows your cluster to be more elastic: you can add capacity, and the new node will discover the rest of the cluster and join it on the fly. Both are pretty reasonable ways to do it.

A couple of examples of frameworks with reasonable runtimes underneath them: Akka is built on top of the JVM; it's an actor-based framework, and it gives you all of those things I just talked about. Erlang is a programming language, but it has an application framework called OTP that's designed to help you build real-time systems, and it's also really good for stateful architectures. And the database Riak has an underlying library called riak_core, a distributed-systems framework that forms the basis of how Riak distributes data and scales — or, more generally, a toolkit for building distributed, scalable, fault-tolerant applications. That's straight off their GitHub page. Those all let you build stateful architectures.

A big part of this talk is about the trade-offs we make with these decisions, and of course there are downsides to building stateful architectures. It's important to know about these pitfalls before you get to them, so you don't discover them at 2 a.m.
when the problems actually come up. Probably the biggest issue I have seen rolling out a stateful architecture at VictorOps is serialization. When you have a database, you always feel confident that you can restart it and get your data back, because the database authors have spent a huge amount of time making sure the format they write to disk can always be read back in. So if you're storing your state inside your application, and you hope to be able to restart that application at some point, you need to think about how you're serializing that state. And there are two levels of serialization compatibility. There's writing things to disk, then changing the underlying model and still being able to read those things back from disk — we call that backwards compatibility. The other problem, which happens especially in a cluster, is that you will roll out new versions of your model and then send messages to nodes that are still running the old code, and they need to be able to handle that — that's what we call forward compatibility: I'm receiving messages from the future, effectively, and I need to be able to deserialize them. Both kinds of compatibility are extremely important when you're building a stateful architecture.

You also need to watch out for thundering herds, especially when your application is starting up, or, in the dynamic clustering model, when it's failing over from node to node. This is one of those problems engineers are terrible at finding on their local machines: they have small workloads, they test locally, then they roll it out to production and it craps the bed. This kind of makes sense, too.
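Coming back to the serialization point for a moment, here's a sketch of version-tolerant serialization in Python. The JSON format, field names, and version numbers are all invented for illustration:

```python
import json

CURRENT_VERSION = 2

def serialize(state: dict) -> bytes:
    """Write state with an explicit version tag."""
    return json.dumps({"version": CURRENT_VERSION, **state}).encode()

def deserialize(raw: bytes) -> dict:
    doc = json.loads(raw.decode())
    version = doc.pop("version", 1)  # untagged data is treated as v1
    if version < 2:
        # Backwards compatibility: old snapshots lack fields the
        # current model expects, so fill in defaults.
        doc.setdefault("retries", 0)
    # Forward compatibility: a newer node may have added fields this
    # code doesn't know about; keep (or ignore) them rather than fail.
    return doc
```

Real systems often reach for schema-evolution-aware formats like Protocol Buffers, Avro, or Thrift rather than hand-rolled JSON versioning, but the obligations are the same in either case.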
This might be a bit of an obvious thing — you're serializing all of your data to disk, and then on restart you're essentially trying to fully rehydrate a working database, so of course there's going to be a lot of load on the system when that happens. I've actually heard reports from people running stateful architectures in the real world whose clusters take hours to restart, so you do have to work around that.

You also need to be very careful about the way you use memory. Again, this is something we take for granted in the way relational or NoSQL databases handle their structures: they know when to page data to disk and keep in memory only what they're working on. But given those performance numbers I cited earlier, it's very, very tempting for engineers building a stateful application to just keep unbounded in-memory data structures. And again, it works great on the laptop, but in production it grows to the point where the machine starts paging, and that completely changes the performance profile of the application. This is something we take for granted in stateless architectures.

The good thing is there's a ton of inspiration out there for how you might do this. Basically any distributed database is a stateful architecture, so if you need code examples, design examples, white papers, or case studies, you can read things from the Riak team or the Cassandra team, or even read the Dynamo paper, and get general ideas for how DHTs work in production. There are also framework examples. Akka has a Distributed Data module that helps you build CRDTs into your application. A CRDT is a conflict-free replicated data type; it lets us build eventually consistent data structures with really sane merge semantics, so I can have a counter that increments on any node and know that it will eventually come to consensus, without having to lock across an entire cluster. The Orleans project from Microsoft Research very specifically set out to build a stateful web development framework, and they ended up deploying it — I think the Halo back end is written on top of it. Unison, which is written in Haskell, takes this to the next level: in addition to being a framework, it comes with an IDE and a language, so it's a language, a framework, and an IDE for building distributed systems, and on top of that you can build really nice stateful architectures.

So: we talked a little about what it means to have state and what that looks like, because it's a tricky subject. We talked about how we store things today in the two-tier or multi-tier web architecture, where things live in relational databases and get shipped over to the processes that use them. We talked about sticky sessions and why people didn't like them, some motivation for why you might want to build a stateful architecture, some caveats to making that decision, and some pointers to more information. That's all I've got, but if there are any questions, I'd be happy to answer them.

Audience question: most frameworks like Rails and Django promote these patterns, and in production the state is stored in servers outside of the app server.
So given that you mentioned it's wasteful to go over the network: what's a pragmatic alternative for someone who has an application like this and wants the performance of reading and writing to a local state store? You went over some mechanisms, but they'd all require a big overhaul of pre-existing architectures, and a lot of buy-in from many different people. Is there any incremental improvement on the common pattern?

So, to restate the question: if I'm transitioning from a stateless architecture to a stateful architecture, what are the interim steps I might take? I think the answer is that you have to start piecemeal. Using Redis as an interim cache and shipping things around that way is still the same pattern — you're still keeping the state outside of your process's memory space — but it's certainly faster. And then I think you cleave off certain parts of your system: maybe you start building new services as microservices that are stateful while the old parts stay stateless. This might only make sense for certain parts of your architecture. In the case of VictorOps, for instance, we have some backing services that are stateful, but part of our application is stateless. So it depends: if some of those benefits work for your company, those are the parts of your system that could become stateful. It doesn't have to be holistic.

Next question — actually two questions: how do you rebalance live workloads, and how do you rebalance workloads that are no longer important, dead work? Rebalancing live workloads is a pretty interesting problem.
You actually have to keep the performance metrics of the other nodes in your cluster, and then you can move things around preemptively. For dead workloads: when we talk about keeping the state in memory, that doesn't mean there isn't some way to store that state — or the way to get back to that state — somewhere else. At VictorOps, for instance, we use the event-sourcing model: we write a log of every state change to Cassandra as it happens. That's just a very fast write; all the state changes themselves happen inside the application. So when some piece of work is done and needs to go into a dormant state, we can just kill that shard, and it can always be rehydrated from the log later. Does that make sense? In our case it's more like a transaction log with snapshotting: if you think about it in terms of a counter, we store a log of plus one, plus one, plus one, and then periodically we might say, okay, the current count is 100, and that's a snapshot. If you look at how write-ahead logs are implemented in databases, it's the same design. Anything else? Cool — well, thanks, guys.

Welcome to the developer track at SCALE 14x. This talk is sponsored by Percona, which means I have to read you their marketing blurb: with more than 3,000 customers worldwide, Percona is the only company that delivers enterprise-class solutions for both MySQL and MongoDB across traditional and cloud-based platforms. And with that, I'm just going to turn it over to Jesse and let him do his presentation.

Thanks, everybody. So I'm Jesse Davis — is the mic on? Check, check.
I think it's on. I work for MongoDB, and MongoDB is pretty cool — naturally, we would say that. If you want to find me on Twitter, I'm over here, and at the end of this talk I'll tweet a link to more information about coroutines, so you can find it there.

What I want to show you is how a non-blocking framework that uses callbacks and an event loop would be implemented in Python 3. We're going to go in three stages, so you'll know where you are. First, we're just going to do a basic blocking HTTP client — that'll be our base case, so we understand the simplest possible solution to this problem. But we'll see that it's not very efficient, so we'll replace it with an async framework that uses callbacks. That'll be very efficient, but it will also be just incredibly ugly and awful, so we'll replace that with something quite beautiful called coroutines.

A little bit of setup: I've got a web server running on port 5000, and if we look at what it's serving, we'll see that it says "hello scale" — and we'll also see that it's not very fast. I've deliberately coded it to take about a second to answer each request, and we'll see in a moment why it's so important that this is slow: it's actually talking to slow servers that an async framework is best for.

Let's do the first bit and write a basic blocking HTTP client. I think the simple way to do this is to write a function that will fetch a URL at a path. It's going to need a socket, so we'll import that from the socket module, and we'll connect it to localhost port 5000. We'll need to send an HTTP request to the server — a GET of some path, and we'll tell it the protocol is HTTP/1.0. We substitute the path in there, and since I'm using Python 3 we'll need to encode it before we can send it over the wire.
So we're transforming from a string into bytes and an HTTP request ends for some reason with a double carriage return line feed. I have never known or cared why So once we send this get request then we need to read the server response and Reading a response has this sort of annoying API with Sockets where you just kind of keep reading and you never know how much you're going to get until something tells you that you're done and With HTTP one the way you know you're done is that the server closes its side of the connection and so you get an empty read So the way we're going to do that is we're going to make an empty buffer to store the chunks in and then We'll go in a loop and we'll just do a chunk is Pardon. Yeah Good a Chunk is just whatever we get when we ask for up to a thousand bytes Who we might get one byte we might get all thousand bytes we just say what our maximum buffer size is and If we get something then we'll just add it to the buffer and keep asking for the next chunk or If we get an empty chunk, that's the socket modules way of telling us that the servers closed its connection and In that case, we're done. We've read the full HTTP response. So the response is equal to This is like a Python idiom. This means the empty byte and We use it to join up all the chunks in the buffer into one big byte string and Then we'll decode that so now we have a text string and Let's print it and let's remember to return because we're in a well true loop So let's see if I got this right we'll get the foo URL and run that and It says hello scale a couple hundred times It doesn't seem very fast. So let's see how long that this takes so we write down the start time and We'll print How long this took? 
Percent, time.time() minus the start time. Run that. All right, so it takes a second, and that's not very surprising, because I very carefully coded the server side so that this would take just about exactly a second. You can maybe see this a little more clearly if I just print the first line of the response, just the HTTP status header. So let's split that by newline and print the first of them. There: we see HTTP/1.0 200 OK, and that it took one second. Now, the problem here is: if I get two URLs, how long is this going to take? It'll take two seconds, because I'm doing them serially. Each get must read the full response before the next get sends the next request. There are a couple of ways of solving this problem. Obviously what we want to do is do these two things concurrently, and in Python you can do things concurrently with threads. There's the infamous global interpreter lock, but that doesn't actually get in our way very much in this case. The global interpreter lock means that only one Python thread can execute Python code at a time, so parallel computation is not possible in Python using multithreading. But that's not the problem we need to solve here. We're not actually doing significant computation; all we're doing is dumping bytes into a buffer and then printing them out. We're not using the CPU very much, and the nice thing about multithreading in Python is that Python threads drop the global lock while they're waiting for socket IO. So you can actually use Python threads to do concurrent network operations, as long as you don't need to use the CPU very much.
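Assembled into one runnable sketch, the blocking client just described looks something like this. The talk's server on port 5000 isn't available here, so a tiny stand-in server runs in a thread; the port choice, the response body, and the helper names are my own, not the talk's exact code.

```python
import socket
import threading

def serve_once(listener):
    # Accept one connection, send a minimal HTTP/1.0 response, then close,
    # so the client's empty read signals end-of-response.
    conn, _ = listener.accept()
    conn.recv(1024)  # read the request; its contents don't matter here
    conn.sendall(b"HTTP/1.0 200 OK\r\n\r\nhello scale")
    conn.close()

listener = socket.socket()
listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
listener.listen(1)
port = listener.getsockname()[1]
threading.Thread(target=serve_once, args=(listener,), daemon=True).start()

def get(path):
    s = socket.socket()
    s.connect(("127.0.0.1", port))
    # An HTTP request ends with a blank line: hence the double \r\n.
    request = "GET %s HTTP/1.0\r\n\r\n" % path
    s.send(request.encode())  # Python 3: encode str to bytes before sending

    # Keep reading until the server closes its side (an empty chunk).
    chunks = []
    while True:
        chunk = s.recv(1000)  # up to 1000 bytes; may return fewer
        if chunk:
            chunks.append(chunk)
        else:
            break  # empty read: the server closed the connection, we're done

    return b"".join(chunks).decode()

response = get("/foo")
print(response.split("\r\n")[0])  # just the status line
```

Each call blocks until the whole response is read, which is exactly why two serial gets against a one-second server take two seconds.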
So that's a perfectly reasonable way to solve this problem, but another great way to solve it is async. We can talk about the pros and cons in a little bit, but let's just say for now that we've decided we're not going to do multithreading, and that we want to do concurrent operations on a single thread. It sounds impossible, so let's figure out how to do it. We're going to write an async framework, and the framework has three components: one of them is non-blocking sockets, the second is selectors, and the third is an event loop. A non-blocking socket is just a socket where you've called setblocking(False). Easy enough; now I've made my thing async. Let's run it. That didn't work very well. We get this BlockingIOError, and the line that threw it is the connect line: it threw an exception as soon as I called connect. Well, I have a rule of thumb for Python exceptions, which is: if I don't care about it, I should just ignore it. So let's, I don't know, let's see if that works. Well, we got a different exception, so that's progress. We got an exception on a later line, and the exception was "socket is not connected". This sort of makes sense, right? We told the socket "don't block", and the contract of a non-blocking socket is that every operation either succeeds or fails immediately; it never waits to complete. Since a connection can't succeed immediately (it takes some time to set up the TCP channel), it throws an exception. This isn't actually an error; nothing is going wrong here. It's just saying, "I couldn't do this immediately, so I'm going to tell you that by throwing an exception." It's obnoxious, but this is how they work.
So I'm just going to ignore that BlockingIOError, but that means I immediately get here, and we're not yet connected. So I want some way to wait for the connection to complete, and this is where the second part comes in: selectors. Since the dawn of ages, operating systems have had ways to say that we're interested in events on non-blocking sockets, and some way to wait for those events to occur. Those are functions with names like select or poll; on Linux, the most scalable version is called epoll, and on a Mac like this one, it's called kqueue. Depending on the operating system, you might want to use a different method to wait for an event, but the nice thing is that in Python 3 we've got this selectors module, which means we don't have to worry about where we're running. We can just say "from selectors import DefaultSelector" and make one of those, and that's whatever is best: on my Mac it'll choose kqueue, on a Linux box it would choose epoll. I don't have to worry about it. The way I use the selector is down here: I register the file number of my socket (this is just a number, the file descriptor of the socket; it's four or five or something like that), and I write the list of events I'm interested in. In this case, I'm waiting for the socket to become writable, because that's the next thing I want to do with it: I want to write to the socket. I'll import that from selectors, and then I call selector.select, and here's where I block: select waits for the socket to become writable. Once it is, I'll clean up after myself, and then I'll be able to call send. So let's see how this does. It throws another error. Well, but that's good.
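The non-blocking connect and selector dance just described can be sketched like this, using a local listener as a stand-in for the port-5000 server so the example is self-contained. The listener and variable names are my own; the pattern (ignore the BlockingIOError, register for writability, block in select) is the talk's.

```python
import socket
from selectors import DefaultSelector, EVENT_WRITE

# A local listener to connect to, standing in for the talk's server.
listener = socket.socket()
listener.bind(("127.0.0.1", 0))
listener.listen(1)
address = listener.getsockname()

selector = DefaultSelector()  # kqueue on a Mac, epoll on Linux, etc.

s = socket.socket()
s.setblocking(False)
try:
    s.connect(address)
except BlockingIOError:
    pass  # expected: the connect couldn't complete immediately

# Ask to be told when the socket becomes writable, i.e. when the
# TCP handshake has finished and we can send our request.
selector.register(s.fileno(), EVENT_WRITE)
events = selector.select()  # blocks quietly here until the event occurs
selector.unregister(s.fileno())
print("socket is writable, ready to send")
```

At this point the socket is connected and a send would succeed; the talk's next error comes from trying to recv before the response has arrived.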
So it threw an error, another BlockingIOError, but this time it happened down here in recv, so I managed to get past this part and I got down here. I think that means I just need to do the same stuff I did before, but this time I'm not waiting for the socket to be writable; I'm waiting for it to be readable. So I'll import that and run this. It's not that great, is it? It still took two seconds, and the code is worse. So what's so great about async? This is going to be a theme of the talk: things have to get worse before they get better. So how could we make this better? What we want to do is, once we've registered this socket and we're waiting for writability, we want to somehow do other work until it's ready, so that we can do these two fetches concurrently. In order to be able to do other work, this get function needs to return, so that we can then begin the next call to the next get. So what if I do that? That's not going to work, right? We need to somehow get here once that select finishes. So let's start to sketch out how we would do that. We want to write a function, called writable, that will be executed once this socket becomes writable, once we're ready to do the next thing with it. PyCharm is underlining s and path because those are no longer available in here; those were local variables of get, and we're not in the get function anymore. So let's say those somehow get passed in, so that we have them again. And we're not going to need this select call anymore, because we're assuming that by the time we get in here, the socket is already writable. So now we need to figure out some way to schedule this writable function to be executed as soon as it's ready to run. And here's where all the magic happens: this register function takes a third, optional argument called data, and data can be anything we want. We're in Python.
It's dynamically typed; we could stick in a number or a string, or we could actually stick in a function there as the data. So let's do that. Let's make a closure using the lambda keyword, and what the closure is going to do is execute writable, with s and path captured from these local variables. This is a little function that doesn't actually run right now, but when you do call it, it'll call into writable. So that stuff is spurious now. So now we have an async framework; let's run it. It was very fast, but it didn't do any work. Why is that? Well, down here we called get("/foo") and get("/bar"), and so we registered the two sockets with their callbacks. We're waiting for them to become writable, but then we never call select; I deleted that line. So let's do that: selector.select(). That returns something sort of complicated, so let's print it out and see what the return value of select is. So we run this, and there's its return value. It's sort of a big mess, but we can see it's got two selector keys. These selector keys are a bunch of information about the two sockets that we registered for notifications about. They've got a bunch of junk that I don't care about: we have the file descriptor, we have the file descriptor again for some reason, and we have the list of events that occurred. This is what I really care about: here's that lambda that I registered earlier as the data argument. This is the callback that we want to execute as soon as the socket is writable, so we can pull that out. First we need to iterate over all of the things that the select call returns, so that's going to be "for key, and..." it also returns an event mask, which I don't care about; I only care about the key. And I really don't care about anything in the key except for the data, because that's the lambda that I passed in. So we'll say the callback is equal to key.data, and we'll execute it. So if I run this, now we have an async framework, right?
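The register-a-callback-as-data pattern being built here can be sketched on its own. A minimal working version, under my own assumptions: a socket pair stands in for the two server connections (both sides of a pair are immediately writable), and the callback just records that it ran instead of sending a real request. The talk's version at this point still has a bug it fixes next.

```python
import socket
from selectors import DefaultSelector, EVENT_WRITE

selector = DefaultSelector()
results = []

# Two connected sockets stand in for the two server connections.
sock_foo, sock_bar = socket.socketpair()

def writable(sock, path):
    # In the real client this is where we'd send "GET <path>"; here we
    # just record that the socket became writable, then clean up.
    results.append("ready to send GET %s" % path)
    selector.unregister(sock.fileno())

# The third argument to register is the "data" slot: we stash a closure
# there that captures the local variables the callback will need.
selector.register(sock_foo.fileno(), EVENT_WRITE, lambda: writable(sock_foo, "/foo"))
selector.register(sock_bar.fileno(), EVENT_WRITE, lambda: writable(sock_bar, "/bar"))

# The event loop: wait for events, pull each callback back out of
# key.data, and run it; loop until nothing is registered anymore.
while selector.get_map():
    for key, mask in selector.select():
        callback = key.data
        callback()

print(sorted(results))
```

The crucial move is that select is called only in the loop at the bottom, never inside a callback, which is exactly the mistake the talk runs into next.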
Okay, so it threw an exception here. Well, okay, so far we're making progress, right? Because writable got executed, and that means we have successfully waited for the socket to become writable, and we wrote to it. But what went wrong was that recv threw a BlockingIOError, which indicates the socket's not actually ready to be read yet. I think the reason that happened is that we called select here, and we're not supposed to be calling select here anymore, not inside this callback. What actually happened is that the first socket became writable, so we entered its callback, and then we called select again, and the event we got was something completely irrelevant: we got the event from the other socket becoming writable. And so we thought, okay, great, now it's time to read my socket, right? But that wasn't at all what we were supposed to find out. So what we need to do is delete this select call. Now it's going to get a little complicated. We've entered writable, we wrote our request, and now we're waiting to read the response. So we register that we want to know when the socket becomes readable, and then we somehow want to do something once that's ready. So let's call that readable; this is going to be the callback. And s and buf are underlined in red; they're not available here, so again we'll pass those in. The way we'll get those local variables into a closure, and pass them in when we're ready to use them, is that once again we'll make a lambda that will call readable with s and bug... there was a Freudian slip... with s and buf. All right, and we don't want to do that forever.
We just want to do that once, and I'll fix the indentation here. Okay, and now there's one more thing we need to do, which is: if we read a chunk and add it to the buffer, that means we are not yet finished reading the response. So we need to re-register until the socket is readable again, which means more server response is available for us. And the way to do that is just to copy and paste, which is all I've ever done as a professional. All right, and I think we might be ready to go. So what's going wrong? Anybody? Why isn't this working yet? Yeah, we call select, but we only call it once, right? So we called it, we waited for an event, we handled that event, we called the related callback, and then we were done, and we said, "oh, that took zero seconds." Zero seconds really is great. So what we need to do here is: while True, process the events. Whenever an event happens, we call the associated callback, and then we wait for the next event. This is great. How long did that take? It's still running. I forgot to ever leave the while True. That's correct; I got rid of that while True loop. Yeah, I got rid of that, so we've just got one while True loop here, right? There's no exit condition for this loop whatsoever. So I think we want to somehow know when we're finished, right? Once we've returned from here twice, that means we're done. So I think what we want to do is say, up top, that we've got, like, n_jobs = 0.
So this is the number of things that we're working on, and then here: global n_jobs. In Python you need to declare something as global in order to modify it from an inner scope, so we do that, and then we increment it. So now we're working on one URL, and we work on it and we work on it, and then when we're finally done here, we don't need this return anymore, but we do need a global n_jobs, and then here we need to decrement it. And then, I think, these two gets will both increment, and eventually they'll be decremented again, and we'll only loop until n_jobs goes back to zero. Well, there we go. So that's pretty cool, right? And what's really cool is that we got two URLs in one second instead of two seconds. And what if we get, like, 20 URLs? That also takes one second. So that's pretty neat: we're very efficiently doing concurrent IO on one thread. That's possible because most of the work... basically, there is no work. It's spending all of its time waiting around, so it's hardly ever using the CPU. All it's doing is adding events to the list of things it's waiting for, and that's cheap. And compared to the multithreaded version of this, the overhead for waiting for each of these events is minimal. All you've got is a file descriptor in a list of things that you're waiting for, and a pointer to a closure. Right? It's tens of bytes, compared to the overhead of a thread, which has a stack, and has entries in, like, scheduler data structures deep within the kernel.
I assume; I have no idea how that works. There are hard limits on the number of threads that you can run, which are typically much lower than the limits on the number of sockets that you can open. If for every socket that you're working on you also have a thread, you will run out of threads first; you run out of threads before you run out of sockets. Allocating a thread per socket artificially limits the number of sockets that you can work on concurrently. So that's the problem that async solves: if you have lots of sockets, but you're not doing much work on each, async allows you to scale to a larger number of sockets concurrently. So that's cool, right? Yeah, so the question was: isn't this while-n_jobs loop a busy loop? Aren't we spinning a lot of CPU? That's the concern. No, because of the select call. This is the one place that the framework blocks: this select call will wait quietly until some event occurs, and it's the only place in the entire framework that we're allowed to actually pause and not do anything. We've essentially said that this thing here, the event loop, is the sole part of the code that's responsible for blocking; nothing else can wait. So, what have we seen so far? What time is it?
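The complete callback-based fetch the talk has arrived at, with the n_jobs counter terminating the event loop, can be sketched like this. To keep it self-contained, socket pairs stand in for connections to the port-5000 server: the "server" end sends its response and closes, mimicking HTTP/1.0. The names and the stand-in server are my assumptions; the structure is the talk's.

```python
import socket
from selectors import DefaultSelector, EVENT_READ

selector = DefaultSelector()
n_jobs = 0   # how many fetches are still in flight
responses = {}

def get(path):
    global n_jobs
    n_jobs += 1
    # A socket pair stands in for a connection to the real server.
    client, server = socket.socketpair()
    server.sendall(b"hello scale")
    server.close()  # closed side: the client will eventually see an empty read
    client.setblocking(False)

    buf = []

    def readable():
        global n_jobs
        chunk = client.recv(1000)
        if chunk:
            buf.append(chunk)  # more may follow: stay registered
        else:
            # Empty read: the server closed its side, this fetch is done.
            selector.unregister(client.fileno())
            responses[path] = b"".join(buf).decode()
            n_jobs -= 1

    selector.register(client.fileno(), EVENT_READ, readable)

get("/foo")
get("/bar")

# The event loop runs only while jobs remain, then falls out naturally.
while n_jobs:
    for key, mask in selector.select():
        callback = key.data
        callback()

print(responses)
```

Both fetches make progress in the same loop on one thread, and the only blocking point is the select call.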
We are 32 minutes into the talk. We've written an async framework with callbacks and an event loop, and we've shown that this is very efficient, and also that it's very ugly. We bloated this thing out from, like, 20 nice lines in one function to about a hundred... 70 lines, and it's disgusting. How can we get back to the Edenic beauty that we began with, without sacrificing the efficiency that we've gained? Coroutines. What I'm going to show you is how coroutines are implemented in the Python 3 standard library module called asyncio, which was introduced by Guido van Rossum in Python 3.4. They're built with a Future class; with generators, a Python built-in feature that has been around for many years; and the last bit is a Task class. We're going to go through these one by one. First of all, the future. A future wraps a callback, and a future represents some value which is not ready yet. So I can make an extremely dumb one, which has a callback, and it has this idea that it will resolve, which is some event that we're waiting for. When the future is resolved, it executes the callback that was waiting for it to be resolved. And everywhere that we use callbacks now, I'll replace those callbacks with futures, and we'll see how that goes. So here's a callback: make a future, set the future's callback to this lambda, and we'll pass in the future instead of the callback. Here's another callback, so we'll move that out, and we have one more callback, and that's here, so once again we'll wrap that in a future. Now, instead of registering callbacks, we register futures that wrap callbacks. And so that means that down here, this key.data isn't the callback anymore; this is a future, and so we want to execute the future's call... well, we want to resolve the future, and that'll make it execute the callback. So now we run this. Once again it takes one second, and we haven't gained anything, right?
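The "extremely dumb" future just described can be sketched in a few lines: it wraps a callback, and resolving it runs that callback. The class shape here is a minimal sketch in the spirit of the talk, not asyncio's actual Future API.

```python
class Future:
    """Represents a value that isn't ready yet."""
    def __init__(self):
        self.callback = None  # set later: what to do once we're resolved

    def resolve(self):
        # The awaited event has occurred; run whoever was waiting on us.
        self.callback(self)

results = []
f = Future()
f.callback = lambda future: results.append("resolved")
f.resolve()
print(results)
```

By itself this gains nothing over plain callbacks; it only pays off once generators can yield these futures.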
I only made the code even worse. But I warned you that this was going to be a theme, that things get worse before they get better in this talk, so I'm fulfilling that anti-promise. So we wrapped all our callbacks in futures; the next portion of this is generators. Let's engage in a digression. I've got myself a Python console here, and the way generators work in Python is a little bit bizarre. Generators come from generator functions, and a generator function is a function with a yield statement in it. So let's say that our generator function prints "start", and it's got this statement, yield 1. Because it has a yield in it, now it's a generator function; we'll see what that means in a second. Maybe next it prints "middle", and, I don't know, it assigns a local, then it yields another number, and then it's done. So that's the contents of our generator function. And if we execute it, it doesn't actually run. It doesn't print "start", it doesn't print "middle"; it just returns this generator object. So what's a generator object? Well, it's got two things. It's got some code, so here's its code. It's named after the generator function, because that's where it came from. You can actually see the code; there it is, very enlightening: some compiled Python bytecode. We can ask how much there is: there are 40 bytes of it. And besides code, the generator object also has a stack frame. This stack frame hasn't done anything yet, so its instruction pointer is negative one; it has not executed any bytecodes yet. And like any stack frame, it's also got the local variables, and there aren't any yet, because this hasn't had a chance to create any. So, the way you make a generator run is you call next on it: next(g). Now it's started to run, right? We saw it print "start", and the return value of next was 1: that was the number that it yielded. So it printed "start" and it yielded 1, and now it's actually stopped at the 13th of those 40 bytecodes, so it's in suspended animation here.
It's waiting for me to call next on it again. So if I do that, now it prints "middle"; it's reached the next point in its execution. The return value was 2: that was the next number that it yielded. We can see that its last instruction pointer is now 34, so it's advanced farther through its code, and it has also had a chance to execute x = 7, so now it has locals. If I call next(g) again, it tells me, rather bluntly, that it's finished, by raising a StopIteration exception. So that was our digression. That's how Python generators work. They're incredibly weird and interesting, and they're also very useful, because a generator is a function that you can start and stop at will, that you can cooperatively schedule. So this seems like a pretty cool thing to use to create coroutines: things that can be scheduled asynchronously, to cooperate with each other. So how would that work? I think the idea would be like this: this get function would become a generator function, by putting some yield statements into it.
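The console digression above, as a runnable sketch: a function with yield in it returns a generator object, next() advances it to the next yield, and StopIteration announces that it finished. (The exact bytecode offsets mentioned in the talk vary by Python version, so this sketch doesn't check them.)

```python
def gen_fn():
    print("start")
    yield 1
    print("middle")
    x = 7  # a local, so the suspended frame has something in f_locals
    yield 2

g = gen_fn()       # nothing runs yet: no "start" printed, just a generator object
first = next(g)    # runs to the first yield; prints "start", returns 1
second = next(g)   # prints "middle", binds x, returns 2
print(g.gi_frame.f_locals)  # the suspended frame's locals now include x

finished = False
try:
    next(g)        # the generator falls off the end...
except StopIteration:
    finished = True  # ...and announces it with StopIteration
print(first, second, finished)
```

A function that can be paused and resumed at will is exactly the raw material for cooperative scheduling.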
So instead of declaring a callback here and then having the callback run down here, what if we could somehow just say "yield f", and somehow know the socket is writable by the time this generator is resumed? If that could work that way, then we wouldn't need this readable callback either, right? We could just yield this future, and then somehow the socket is readable by the time we get here. And if it worked that way, we could get rid of this callback as well. Every time we get a chunk, we would just want to go back up here: create another future, then wait for it again, until we had read all of the chunks. So we wouldn't need to keep re-registering the socket down here; if it worked that way, we could actually just get our while True loop back. Then we would kind of be back to where we were: we wait for the socket to become readable, and then somehow, by yielding f, we wait until that event occurs, and then we receive a chunk, we append it to our buffer, and we just do that again until we're done. This loop has the same kind of structure as we originally had in our blocking example. Question? Well, it's about to be. For the moment, all we know is that "yield f" is going to pause this generator, and then somehow we need to get this generator resumed once that future is resolved. So how can that work? Well, let's run this and see. Well, okay, so far so bad. The reason is that what I've got right now is: I called get, but now that get is a generator function, it just returns a generator when I execute it, right? It doesn't execute its code.
We saw that earlier. So we need to somehow call next on this generator until it runs to completion and raises a StopIteration exception. The way we do that is with this third thing, this Task class. So write that down here. Task is the name of the class that the asyncio standard library module uses to call next on the generator, and Task is essentially the thing that turns a generator into a coroutine, by scheduling it: by running it to completion, and resuming it whenever whatever it's waiting for is ready. So it's going to take a generator as its argument and store it away, and then let's say it has a method called step. Step is going to call next on self.gen, to run it to the next yield statement. And whenever one of these things yields, what's the return value of next going to be? It's going to be the future that it's yielding, so that'll pop out here: some future object will be f. So now we want to kind of wait for f to finish, and the way we wait for f to finish is that we assign something to its callback. We say: this is what you should do when the future resolves. So what do we want to do when this future resolves? Call next again, right? Yeah, so the way we do that is we say the callback is step: step is the callback we want to execute. It's like funny recursion. So let's run that. That still doesn't work. Can you see why not? What am I missing? We do resolve the future... oh, we never created a task. Okay, that's a good start: Task, right? So we need to create the tasks and wrap up these generators in tasks. Let's see if that was it. No... oh, I still need to call step. That's right. Okay, so let's do that here in the constructor, so that as soon as you create a task, it gets a generator and starts running it. And then whenever the generator pauses, we say, okay, resume it whenever the thing it's waiting for is ready. Let's run that. This is looking good. I feel good. What if we did this, like, a bunch of times? Too much output to process.
Okay, what if we do this somewhat fewer times? I feel like I have an infinite loop here again somewhere. Where is it? Ah, there it is: we have a while True loop, so now we need to return again. All right. Oh, StopIteration, and that's getting thrown from step. So that's good, right? It's good that we're seeing a StopIteration, because that means we've started to be able to run generators to completion: we've run the get generator all the way to the point where it returns. And you know my philosophy about exceptions in Python, which is to ignore them completely. So what do we want to do here? I think we just want to return, right? Because this task is done. So if we do that... whoo! Right, so we're back to where we were, and for once I've actually made things better. At least... thank you... at least, that's my opinion. It's not as good as what we started with: it's 100 lines of code now, whereas before it was like 20. But it's really efficient, just as efficient as it was with callbacks, and we're back down to a single get function. It's a little more complicated than it once was, and that's probably because we haven't implemented the sort of rich, convenient methods of a real async framework. If this were asyncio, it would be just about as short as the blocking example, because we wouldn't have to do quite so much manually; we wouldn't have to, like, make our own non-blocking sockets and stuff. And by substituting yield statements for callbacks, we've managed to get all the way back to a single function again. We have normal exception handling, we have normal local variables, we don't have to pass them around in closures anymore. It's beautiful again, but it's got all the efficiency of single-threaded concurrency. So let me see... I wonder, do I have time to blather on with the conclusion?
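The finished framework the talk arrives at, Future plus generator plus Task, can be sketched end to end like this. As in the earlier sketches, socket pairs stand in for connections to the talk's server so the example is self-contained; the class shapes follow the talk's description, not asyncio's real API.

```python
import socket
from selectors import DefaultSelector, EVENT_READ

selector = DefaultSelector()
n_jobs = 0
responses = {}

class Future:
    def __init__(self):
        self.callback = None

    def resolve(self):
        self.callback(self)

class Task:
    """Turns a generator into a coroutine: pump it with next(), and
    schedule another step when each yielded future resolves."""
    def __init__(self, gen):
        self.gen = gen
        self.step()  # start running as soon as the task is created

    def step(self, _future=None):
        try:
            f = next(self.gen)  # run to the next yield; get a Future back
        except StopIteration:
            return  # the coroutine ran to completion: this task is done
        f.callback = self.step  # resume this task when the future resolves

def get(path):
    global n_jobs
    n_jobs += 1
    client, server = socket.socketpair()  # stand-in for a server connection
    server.sendall(b"hello scale")
    server.close()
    client.setblocking(False)

    buf = []
    while True:
        f = Future()
        selector.register(client.fileno(), EVENT_READ, f)
        yield f  # pause here until the event loop resolves the future
        selector.unregister(client.fileno())
        chunk = client.recv(1000)
        if chunk:
            buf.append(chunk)
        else:
            break  # empty read: the "server" closed its side

    responses[path] = b"".join(buf).decode()
    n_jobs -= 1

Task(get("/foo"))
Task(get("/bar"))

# The event loop: key.data is now a future, so resolve it.
while n_jobs:
    for key, mask in selector.select():
        future = key.data
        future.resolve()

print(responses)
```

The get function reads top to bottom like the blocking version, with normal locals and a plain while loop, yet both fetches interleave on one thread.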
Okay, a couple of minutes. So let me blather on with a conclusion now. We learned two cool things. One of them was technical: coroutines are a beautiful way to do asynchronous, concurrent processing in a single thread. You can do lots of concurrent IO without allocating a separate thread for each socket, so you can scale out highly concurrent network applications much, much more efficiently with async than you could with threads, and yet they're almost as pretty as threads. If you want to learn more about that, I've set up a page for you at bit.ly/coroutines, and that's got a bunch of links out to a lot of much deeper information. It's got the code that I wrote just now, and it's also got a link to a chapter that I wrote with Guido van Rossum, which has a much deeper and broader coroutine example than this one. So that's the sort of technical hooray. But the non-technical aspect of this is: look at everything that we were able to understand in just one session. We went from absolutely nothing all the way to a coroutine-based asynchronous framework. This thing that looks like it's totally magical is actually completely within your grasp. You can learn not just how to use coroutines, but how they're implemented, from the bottom up, and it really isn't that hard; it really doesn't take that long. So if you follow that link and you read a little deeper, I think that really grasping this awesome new idiom is completely within your grasp. Thank you very much. So, we've got a few minutes for questions if you want. Go ahead. Sorry, I didn't get that. Take the microphone. So, if you want to write an app using this approach, and I try to take as many of the server's resources as I can for my application, that can't be fair, right? Because I'm blocking other processes from system resources. So how can you prevent that?
Like, you know, I grab as many sockets as I can for my application, and that prevents another application from using the sockets on the server. Do you know a way to prevent that, from another developer who's just trying to make their process as fast as possible, you know, and takes sockets away from other applications? Did you get the question? No, I'm sorry, I'm not totally following. So, we have a limited number of sockets we can use on the server. Yes. And if I'm using this approach and I try to grab as many sockets as I can for my application... Right, right, I got it. Yeah. So the question is: once you remove threads as a bottleneck, your next bottleneck is likely the various caps on the number of sockets that you can open, and how to remove those. On Linux there are various ways. I don't know all of the specific commands, but I have heard of Linux boxes running a million concurrent connections. It requires a bunch of kernel tuning, and you probably have to run as root, but it is very much possible to end up with only memory as a bottleneck, which is actually pretty exciting, because you've reached a point where the hardware itself is your bottleneck, and that's the bottleneck that you want to have. That means you're using your hardware to its fullest. Another question? All right, great. Thank you so much, everybody. Welcome to the... this is the third session of the developer track at SCALE 14x. It's sponsored by Percona, and I have to, you know, read their little marketing blurb: with more than 3,000 customers worldwide, Percona is the only company that delivers enterprise-class solutions for both MySQL and MongoDB, across traditional and cloud-based platforms. So, you know, I appreciate them sponsoring this, and I'm going to go ahead and turn it over to R. Tyler Croy.
He's the Jenkins infrastructure lead. So, I tried to do something different for this presentation, in that it's all self-contained in a Docker image. This is how you can run my presentation, if you're on the SCALE 14x Wi-Fi and you want to see me get beat up by a bunch of people in yellow shirts: everyone download this at once. It's about a hundred and ten megabytes, and they will probably hurt me pretty badly if everybody does this at the same time, but it'd be funny to watch. Anyways. I'm assuming that most people here have heard of Jenkins in some form or fashion. Jenkins has been around for about a decade, and a lot of people use it for continuous integration, continuous delivery, or automation of some form or fashion. The really big strength of Jenkins is this very broad plugin community: there are about a thousand plugins, and to date I don't think I've found a use case that I had for Jenkins that there wasn't some plugin to make it nicer. And because not a lot of people know this, I wanted to highlight it: there's this organization called the SPI, which stands for Software in the Public Interest, and that's who we're affiliated with. They're the same sort of organization that holds the Debian trademarks, and I believe Postgres and a couple of other projects. We're a fully formed, mature open source project, and I'm really proud of the work that we've done to get there. And I wanted to highlight some of the sponsors of the project. These are not sponsors in the traditional sense; these are a lot of companies that, for one reason or another, have chosen to participate in the Jenkins community. CloudBees is a big one. CloudBees is my employer now, as of a couple of months ago. They do Jenkins Enterprise, but they also fund a lot of development on Jenkins open source, which we're really thankful for. And the second really big one is the Oregon State University Open Source Lab, and you'll find Justin and Lance running around at the conference today.
The Open Source Lab is one of the primary reasons that the Jenkins project has been so successful. When we parted ways, so to speak, with Oracle a few years ago, the Open Source Lab really helped us out a lot. And then PagerDuty, Datadog, Atlassian, Rackspace, Red Hat: all of these companies have helped us out in one form or another, without expectations of payback or anything. I like to reference them because I think it's a nice thing to do. They support us in various ways; Datadog, for instance, has really helped us out with some of our infrastructure, and PagerDuty as well. These are companies that support open source just because, and I think that's really cool. Anyway, so I work for CloudBees, but I've been in the Jenkins community for the past seven or eight years now. I originally started because I was using it (it was called Hudson at the time), and the project was on Java.net, and I couldn't find answers to the questions that I had. So I went to IRC, which is how I solve all my problems: I go complain on IRC until someone points out how dumb I am, and then I fix something. So I started participating in the Jenkins project then, and as we grew I started to participate more and more. Not being a Java developer, I gravitated to the infrastructure side of things, and as of last year sometime, I became a Jenkins board member, and I'll be one until someone votes me out, I guess. The talk that I wanted to give is largely around infrastructure, and for the scope of what I want to talk about, this is what I mean by infrastructure. We all know about machines, like an AWS EC2 instance, or a physical machine that goes into a rack somewhere. But we're also talking about configuration management: managing files, services, secrets (you know, passwords and credentials and things like that), but also the packaging tooling to package up our applications and get those out, the deployment tooling that drives those out into a production environment, and then
the monitoring and alerting that tells us whether all of that is doing what we thought it was doing. That's infrastructure, as far as the next 30 or 40 minutes (or however long I talk) are concerned. That's what I mean by infrastructure.

In the Jenkins project we have a lot more. If you think about project infrastructures, like Debian's: who here thinks about all of the services that run the Debian project? Nobody. No one ever considers the actual infrastructure behind all of these great big open source projects. So when I got involved in the infrastructure for the Jenkins project, we had a lot of stuff going on. We have a very vibrant plugin developer community, so we've had to build infrastructure for those developers. We have to have JIRA and Confluence so those developers can work effectively, not only on Jenkins core but also on the plugins, and someone's got to be responsible for that.

We also have release infrastructure. Kohsuke, who I think left (he was in here taking pictures), is the founder of the project, by the way. If you want to come by the booth, you can meet the guy who created Jenkins, which is a pretty cool thing about this conference. He really believes in releasing often, so for the last decade a version of Jenkins has been released once a week. And for better or worse, the size of Jenkins, the actual file that we distribute, has gotten bigger over the years as well. A distribution of the Jenkins application is a hundred megabytes right now. So every week we are releasing a hundred-megabyte archive to tens of thousands of people who are pulling it down, testing it, or evaluating it for one reason or another. And that's just Jenkins itself; then there are the thousand or so plugins around it, which all have to be distributed, and there's infrastructure that goes behind that, and someone has to be responsible for it.

So, in the beginning. I wanted to give you a brief history of how we came to be where we are with our project
infrastructure. We were originally on a site called Java.net, which was like SourceForge, but even worse, ten years ago, and we were quickly outgrowing it. The only other successful project on Java.net that I can remember was GlassFish, and Hudson at the time was becoming more and more popular. The Bugzilla that was provided through CollabNet, excuse me, through Java.net, wasn't working, and distribution wasn't working, so we started to provision our own services to run these things. Kohsuke, who worked at Sun Microsystems at the time, had a modus operandi for doing this: he would go find machines in data centers around Santa Clara, where he worked, and he would basically commandeer them. It was this sort of fly-by-night operation. We would find machines in one closet and that became JIRA; we would find machines in another and that became a build machine. Everything was set up manually.

The way that I got involved with Jenkins project infrastructure is that Kohsuke gave me root access to a bunch of machines which had no history and no means of being repeatedly provisioned. If you screwed something up on them, you screwed up the project, because you could only log into that one instance of a machine and futz with things to try to get it to work. And that was awful: genuinely, objectively awful. So in 2014, I learned Puppet where I was working at the time, and, like someone in their early 20s when they learn something new, I went "Great, I'm gonna do this for everything from now on, forever." I was really anxious to apply Puppet to all the things I was responsible for, so we deployed masterless Puppet. And we deployed masterless Puppet because Puppet with a puppet master was terrifying to me: as soon as I got to the point of provisioning an OpenSSL certificate, I threw up my hands and said screw this, I don't want to do it, I don't want to play anymore.
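For context, "masterless" Puppet means skipping the puppet master (and its certificate authority) entirely: each node holds its own copy of the manifests and applies them locally. A minimal sketch of that workflow might look like this; the paths and repository layout are assumptions for illustration, not the Jenkins project's actual setup:

```shell
# Pull the latest manifests onto the node itself (no puppet master involved).
git -C /opt/infra pull

# Dry-run first with --noop to preview what would change...
puppet apply --noop --modulepath=/opt/infra/modules /opt/infra/manifests/site.pp

# ...then apply for real. Run this from cron or a wrapper script for
# repeatable provisioning without ever touching an SSL certificate.
puppet apply --modulepath=/opt/infra/modules /opt/infra/manifests/site.pp
```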
This isn't fun. So masterless is what we deployed, and we didn't really have a development environment for Puppet at the time. I don't think we were unique in this. I went to a meetup in 2011 in San Francisco and we had a testing-infrastructure round table, and if you can imagine, it was like 30 people sitting at a big conference room table going, "How do you do it? How are you doing it?" That's how people were testing infrastructure in 2011, because there just weren't tools available, and we were no different at that time.

So in 2014 we started working with Puppet Enterprise, because I asked nicely and Puppet gave us Puppet. If you work for an open source project, don't be afraid to just ask people for free stuff: if they say no, it's okay, but if they say yes, then you get cool free stuff. We also started using Datadog and PagerDuty at that time, and we started to build a more mature development stack as well. By this time rspec-puppet existed, which was really cool, and serverspec existed (I'll explain more about what these are). Coming from a software engineering background, I all of a sudden had the tools to describe and validate my infrastructure the same way I would describe and validate any other piece of software.

So this is where we are right now. But like I said, infrastructure is hard, and open source infrastructure is even harder, if that makes sense. If you're in an operations role, someone is paying you to endure pain. When you're in an open source operations role, no one's paying you to endure pain; you're just doing it because you enjoy getting paged at night. That's what makes open source infrastructure hard, and it's also why Jenkins infrastructure is difficult: no one wants to volunteer for some of those challenging things. And because we're cheap and we just take whatever people will give us, we have assets across four different data centers. Where I've worked previously, we had assets in one data center, and that was difficult at the time.
But for the Jenkins project, with this spartan infrastructure and very few people guiding it, we have a multi-site infrastructure, which makes things difficult.

Now, this kind of brings me to continuous delivery. It's a logical leap, and I'm asking you to take it with me, because it'll make sense. The big reason that people don't practice continuous delivery, in my experience talking with people, is that it hurts: every time I touch production, something goes wrong; every time I try to do a deployment, I've got to go through this checklist of 20 different things to make sure that someone, somewhere, has verified that this is in the right spot and this is configured correctly, and only then can we do a deployment. And the nice thing about continuous delivery is that it gets better when you do more of those deployments. The more you focus on it, the more you take those things that suck and focus on automating and fixing them, the better it gets. So instead of once a month, or, if you're really unfortunate and work at a big giant corporation, once a quarter or once every six months, doing a big-bang release that just obliterates an entire week of your time, you do these little incremental releases, and you take on these little incremental pieces of risk.

And the big difference between continuous delivery and another concept people talk about, continuous deployment, is that with continuous delivery,
our goal is to get the changes we're making into a state that's ready to go live. It doesn't mean we have to deploy every single commit, but every single commit should be in a state of ready or not ready, and then if we decide to deploy, that's up to us. Everything should be vetted enough that someone can pull the trigger if we need to pull the trigger. In a business you want that so you can get new features, changes, and fixes to your customers effectively and reliably. In the case of open source infrastructure, you want it because no one's got time to deal with broken deployments when you've got an all-volunteer team.

So yeah, I think continuous delivery is neat. But like I said, infrastructure is hard. There are a lot of different things here, and they're very different things; it's not like everything is a piece of software. Some things are just physical machines, which have different requirements. But to get to that continuous delivery world, we needed two ingredients in the infrastructure ecosystem: testability of our infrastructure, and reproducibility.

I want to focus on testability first. We're getting to Jenkins, don't worry, we'll get there. But if Jenkins can't run tests, and you don't have any tests to verify that what you're doing with your infrastructure is correct, all Jenkins is going to do is deploy code for you; you're not going to really get a lot of benefit from it. If you're not already familiar with what I would say are the two most important strata of testing in the software world, they are unit testing and acceptance testing. I'll focus on unit testing of infrastructure first. We've got a couple of tools, and I mentioned these before: rspec-puppet, which we'll go through, and then serverspec for acceptance testing. Unit testing in Puppet sort of requires you to know how Puppet works. If I could just get a show of hands: who's used Puppet before?
Most people. All right, or maybe half, half the people. We'll go with half.
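For the other half, it's worth seeing the shape of the rspec-puppet unit tests he just mentioned. This is a hypothetical spec for an imaginary `jenkins::mirror` class, purely to illustrate the tool; it is not from the project's real test suite, and it needs the rspec-puppet gem and a standard Puppet module layout to actually run:

```ruby
# spec/classes/mirror_spec.rb (hypothetical example)
require 'spec_helper'

describe 'jenkins::mirror' do
  # rspec-puppet compiles the class into a catalog in memory and lets you
  # assert on the resources it declares, without touching a real machine.
  it { is_expected.to compile }

  it 'manages the mirror document root' do
    is_expected.to contain_file('/srv/releases').with(
      'ensure' => 'directory',
      'owner'  => 'www-data',
    )
  end

  it { is_expected.to contain_service('nginx').with_ensure('running') }
end
```

Because nothing is applied to a live system, these tests run in seconds and need no root access or test VM; serverspec then covers the other end by asserting against an actually provisioned machine.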