Hi everyone, thanks for coming to the last session. I am Alan. I'm a site reliability engineer at Bloomberg. I don't run Cloud Foundry. I used to run it (v1) in a previous job, but now I'm a Hadoop engineer. Today I'll be talking about the autoscaler and some of the math behind it, to develop some intuition. This was originally a Kubernetes talk. It's more of a personal research project of mine on autoscaling and auto-configuration in general. About me: I wrote a few books and courses on Docker two years ago, so they're very outdated now. That's just the way it is.

So let's go to the distributed systems we run in general, whether you're using Cloud Foundry or Kubernetes. Basically you have an application, and it has a certain number of instances in your server farm. It receives traffic, and as it processes requests it consumes the CPU of that server farm. That CPU usage gets registered in your monitoring system. Then, depending on the thresholds you set, the goals you set, let's say "I want CPU to be less than 70%", you initiate an event: you page your operations person. They wake up at 2 a.m., they log into their BOSH jump box, they run cf scale, and then they go back to sleep.

Okay, and then let's move to the air conditioner. Just from an overview, the architecture looks similar, right? An air conditioner basically releases coolant to the environment. When the weather becomes hotter, let's say the sun rises, the temperature in the room increases. The sensor in your air conditioner registers that temperature, and the thermostat compares the actual temperature with the temperature you set. Since it's too high, it will configure the coolant valve to release more coolant, thereby lowering the temperature. So the difference is that the air conditioner doesn't have a temperature reliability engineer that gets woken up at 2 a.m.
so that your room gets cooler, right? It's all automated in the thermostat. We have this in most cloud services, Kubernetes and Cloud Foundry. So you have your CF autoscaler, and it's basically the same. You have your number of instances, and you get your CPU utilization from the /stats endpoint. Then the CF autoscaler, or whatever you have (well, the CF autoscaler doesn't work that way), compares the two, the actual value and the target value, and adjusts the number of instances accordingly, according to your rules engine. So in a sense, it's similar to an air conditioner in that it's autonomous, self-correcting, and self-regulating. But we can learn from the math in an air conditioner, which has been around for a hundred-plus years, about what makes an effective autoscaler, and how you know if you need to override your autoscaler, step in, and stop it, right?

The math behind an air conditioner is called control theory. The textbook definition basically shows how you can take advantage of the dynamics in the system, and by dynamics we mean: how is the input related to the output? In terms of our application: how is the input, the number of instances, affecting the CPU given a certain load? Control theory is also used in other areas of automation, like your car's cruise control, your power adapter and events. We can use the dynamics of these and the math to develop the autoscaler, right? And we can use the models as well to identify whether we have a good autoscaler.

The history of control theory started at the start of the Industrial Revolution. So this is a figure of what they call a
centrifugal governor. It basically regulates the speed of a steam engine. Your steam engine is connected to the flywheel, the flywheel turns, and these two weights go up. When they go up, they adjust the valve: when your engine is too fast, the balls go up and the valve gets shut, so you have less steam in your steam engine and it slows down. This made the machines reliable in terms of their performance. This mechanization and optimization led to the Industrial Revolution and made things more efficient, with innovations like being able to dig gold more efficiently, and trains with constant speeds, so you can have a proper train schedule. Looking at the history, it actually took a hundred years before Maxwell wrote a paper on how it works. So basically, from the start of the Industrial Revolution and for the next hundred years, people were cargo-culting the centrifugal governor technology. So in a sense, we in the technology community are doing much better.

Okay, so here's the textbook definition of control theory. Basically, it's the same architecture as the autoscaler I showed earlier, but you can model it in the following way. Normally an introductory textbook will show three components. You have the target input, or your reference value. In our autoscaler that would be the target CPU utilization, let's say within 70 percent or something. Then you have the actual utilization, the output, called y. You take the difference of those and you get the error. You put that error into your autoscaler, or your controller, and it gives you the input value u, which in our autoscaler is the number of instances, and then we use that number to run cf scale, basically. For our application, in an autoscaler, the fourth component that you will add to these systems is called a disturbance. So web traffic is actually modeled as noise.
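The block diagram just described (reference value, measured output y, error, controller output u) can be sketched as a small Python class. This is only an illustration: the class name `FeedbackLoop` and its fields are hypothetical, and the sign convention `error = target - y` is the usual textbook one, not something taken from the slides.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class FeedbackLoop:
    """One pass around the loop: measure y, compute the error, ask the controller for u."""
    target: float                          # reference value r, e.g. 0.70 for 70% CPU
    sensor: Callable[[], float]            # returns y, the measured output
    controller: Callable[[float], float]   # maps the error to the input u
    actuator: Callable[[float], None]      # applies u to the system

    def step(self) -> float:
        y = self.sensor()              # the disturbance (traffic) shows up in this reading
        error = self.target - y        # textbook convention: reference minus output
        u = self.controller(error)
        self.actuator(u)
        return error
```

With this convention the error is negative when CPU is above target, so a controller that should add instances in that case needs a negative gain (or the error can be flipped to `y - target`).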
That traffic is basically noise that changes your output, our CPU utilization.

So here is an example autoscaler. Basically you have your sensor that gets your CPU utilization, so that's your output. You put that in your controller and you get the number of instances. Then there's what control theory and design textbooks call an actuator, basically the thing that runs cf scale. Then I add some monitoring to Graphite, and I added an interval for when to run the autoscaling loop.

So here's the example output. The green one is the number of instances and the blue one is utilization. If you double the traffic, utilization increases, so you double the number of instances. And when you halve the traffic, you halve the utilization, so you halve the number of instances.

This simple autoscaler works because you can model the relationship between CPU and your number of instances with a linear model. Not all the time, but you can get a lot of bang for the buck using a linear model. With a linear model, you can basically model the dynamics as the relationship between the input u and the output y. u and y are the standard variables when you look at textbooks. First, your input u is dependent on your output, so the number of instances is dependent on the CPU. And then here is the derivative of the CPU, so the change in the CPU: how fast or how slow is it changing in response to the environment? The next equation to model is the output. It's dependent on the previous output, so you can see that the system has a memory effect: the CPU utilization now is affected by the CPU utilization previously, and it's also affected by the number of instances, which makes sense.

Any feedback loop system in control theory has these four desirable properties. I will explain the intuition for each of them in a linear system, and the math behind it. So first is stability.
So to begin explaining stability, I'll start with unstable systems. Here the dashed line is the target utilization and the solid line is the actual utilization. We see that the CPU utilization is above the threshold, so we add more instances. But then the utilization goes down too much, so we've overprovisioned. So we go back to removing instances, but then we remove too many, so utilization spikes up again, and so on and so forth. You get this back-and-forth oscillation, and it never ends, basically. And then here's another unstable system. In this case, no matter how many instances you put in your application, it just doesn't improve at all. So the problem is somewhere else, like your storage or database or network. And then here's another unstable one: it goes back and forth, and it doesn't explode, but it's still just as bad.

So a stable system is a system that settles to the target value. In this case our system should converge to the target CPU utilization. Here we're still overprovisioning and underprovisioning, but at some point we get it right. And this is the behavior that you typically want in your system: you gradually add instances and the performance gradually improves.

Another obvious property you want in your autoscaler is accuracy: how well, within bounds, are you reaching your target? And if you reach your target, it also matters how fast you reach it. You can think of it like mean time to recovery in operations. Depending on your business case, you may be able to afford to scale out slower. But in case you want to recover faster, you can explicitly design an overshoot, where you overprovision at first but then gradually settle down to your value. So that's when you want overcorrection. It's a design trade-off.

So, building simple controllers: going back to this architecture, we now know what makes a good controller.
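The stable and unstable behaviours described above can be reproduced with a few lines of Python. This is a minimal sketch under stated assumptions: a generic first-order linear model `y[k+1] = a*y[k] + b*u[k]` with made-up coefficients `a=0.8` and `b=0.5` (not measured from any real application), driven by a controller that simply multiplies the error by a gain. The helper names `simulate` and `classify` are hypothetical.

```python
def simulate(gain, a=0.8, b=0.5, target=1.0, steps=40):
    """Closed loop of y[k+1] = a*y[k] + b*u[k] with u[k] = gain * (target - y[k])."""
    y = 0.0
    history = []
    for _ in range(steps):
        u = gain * (target - y)   # controller: error times a constant gain
        y = a * y + b * u         # system: remembers its previous output
        history.append(y)
    return history


def classify(history, tol=1e-3):
    """Rough label for a trace: does it settle, keep swinging, or blow up?"""
    tail = history[-10:]
    early = history[:10]
    tail_spread = max(tail) - min(tail)
    early_spread = max(early) - min(early)
    if tail_spread < tol:
        return "settles"
    if tail_spread > 2 * early_spread:
        return "diverges"
    return "oscillates"


for gain in (0.4, 3.6, 4.4):
    trace = simulate(gain)
    print(gain, classify(trace), round(trace[-1], 3))
```

For this model the loop is stable only when `abs(a - b*gain) < 1`. With these coefficients a gain of 0.4 settles, 3.6 swings back and forth forever without growing, and 4.4 swings with growing amplitude. The settling case ends up at 0.5 rather than at the target of 1.0, which is the accuracy property: being stable does not by itself mean reaching the target.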
We can use a few controllers with the linear model, so we will basically design the component C here. The first one is proportional control. You take the difference between the actual CPU usage and your target usage, and just multiply it by a certain number, and that would be your number of instances. This control is inherently inaccurate because you're just multiplying by a dumb constant, and the higher your factor, the longer it takes for your system to settle down, and you'll have a lot of overshoots.

To improve the accuracy, you try to average your errors over time, so you use some form of integral control. An integral controller basically takes the previous state and adds a factor of the error. You can also write this recursive equation as an integral and you'll get the same thing. But an integral controller takes a long time to settle down, so you can add a controller part where you use the derivative, or the rate of change, to make it more sensitive. But since you're more sensitive, you're also more sensitive to noise in the system. So typically in practice people combine the integral and the derivative parts to get smooth behavior but still have the system respond in a timely manner.

Even though these linear models are useful, it's important to note that our systems are not linear. There's a serial component in our applications, and there's communication overhead with other dependencies like databases or other microservices. So here I just showed that even though there's a sort-of-linear relationship between the utilization and the number of instances, there are still nonlinear components that can be attributed to things like startup time or warm-up time and other overhead in the system.

Control theory is not only applicable to autoscalers, but to anything you want to be auto-tuned and auto-configured. For example, you want to set your application's keep-alive timeouts.
Let's say in nginx, and see how it affects the response time, or your number of connections and the memory in your machine. The nice thing about linear models is that you can combine these variables. Instead of having just a line showing the relationship between one input and one output, you can do multiple inputs and multiple outputs, so you'll have either a plane or a 3D blob of n-dimensional linear models. But you can start at a small scale.

So in summary, control theory allows us to have an architecture of a self-regulating system, which basically iterates on feedback. Since we're using linear models, which can go a long way, the math behind it can show us what an effective feedback mechanism is. But in the end our models are not completely linear, because they represent the real world, so it's important to reevaluate your models.

If you want to get started on how it works, the O'Reilly book on the right is a good introduction. It has Python code that simulates distributed systems, and you can play with it. The other one on the left was written by some researchers at IBM many years ago, and they do real case studies, like auto-configuring Apache and other things.

And I have a demo. So here I have the example Node.js app in CF. I added squared calls several times to make it slow, so that we can see the changes. I will show the different components of a control theory feedback loop setup in making an autoscaler. First is the sensor, which basically does cf curl on the stats endpoint, and I get the average. Then I have the actuator, which basically runs cf scale.
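A sensor and an actuator along the lines just described might look like the Python sketch below. It is not the script from the demo (which is a shell script). The function names are hypothetical, and the shape of the stats payload (a JSON object keyed by instance index, with `state` and `stats.usage.cpu` fields, served at `/v2/apps/<guid>/stats`) is an assumption that should be checked against the API version of your deployment.

```python
import json
import subprocess


def average_cpu(stats_json: str) -> float:
    """Average CPU usage across running instances from a stats payload (assumed shape)."""
    stats = json.loads(stats_json)
    usages = [
        inst["stats"]["usage"]["cpu"]
        for inst in stats.values()
        if inst.get("state") == "RUNNING"   # skip crashed or starting instances
    ]
    return sum(usages) / len(usages) if usages else 0.0


def read_cpu(app_guid: str) -> float:
    """Sensor: fetch per-instance stats through the cf CLI and average the CPU."""
    result = subprocess.run(
        ["cf", "curl", f"/v2/apps/{app_guid}/stats"],  # endpoint path is an assumption
        check=True, capture_output=True, text=True,
    )
    return average_cpu(result.stdout)


def scale(app_name: str, instances: int) -> None:
    """Actuator: set the instance count with cf scale."""
    subprocess.run(["cf", "scale", app_name, "-i", str(instances)], check=True)
```

Keeping the JSON parsing in its own function means the averaging logic can be tried on a saved payload without a logged-in cf CLI.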
So it's pretty hackish. And then the meat is in the controller, the one I showed earlier. Here I'm using a plain proportional controller, so the integral part is zero. Basically I multiply a random factor by the difference in utilization.

So I have it running here. Okay, and I send the metrics to Stackdriver. Here we have the number of instances, so it's autoscaling between 20 and 21, and the CPU average is hovering around 10%. I'm generating a thousand QPS of load. Here's my load generator in Kubernetes, which is just one load generator, and I'll try to quadruple the traffic. Just refresh until everything is running, and then we can refresh it. So you see the CPU is starting to pick up, and the autoscaler is starting to add instances already. I can refresh the graph. So let's wait until the metrics get loaded in.

I forgot to show earlier, to show that my sleeves are empty: I have a plain CF deployment. I don't have CF autoscaler deployed, so I'm just using this simple autoscaling shell script. Okay, so the CPU... it already added instances, and then CPU increased. I think the Stackdriver load balancer is a bit delayed, but we can wait a bit. In the meantime, I can answer some questions, if anyone has any.

Yes. Ah, yes. So the question was if I have a repo for the code. It's on my personal GitHub page, slash control theory.

Okay, we're looking for SREs at Bloomberg, if anyone's interested. Yeah, so I got that out now. Let's go back here. Okay. Yeah, so the traffic increased, and it shows that I added instances. I can scale it back now, so that we can see that we were responding to a spike. After a while it will go down. So, yep, that's it. Any more questions? Thank you.
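For readers following along afterwards, the controller described in the demo (proportional, with the integral part set to zero) could be sketched in Python as below. This is an illustration, not the demo's shell script. The class name `PIController`, the gain values, and the instance limits are hypothetical. Applying the controller output as a change to the current instance count is also an assumption, since the talk does not say whether the output is an absolute count or an adjustment.

```python
from dataclasses import dataclass, field


@dataclass
class PIController:
    kp: float                      # proportional gain
    ki: float = 0.0                # integral gain; 0 gives a plain proportional controller
    min_instances: int = 1         # hypothetical lower bound
    max_instances: int = 50        # hypothetical upper bound
    error_sum: float = field(default=0.0, init=False)

    def next_instances(self, current: int, cpu: float, target: float) -> int:
        error = cpu - target               # positive when the app runs hotter than the target
        self.error_sum += error            # running sum feeds the integral term
        change = self.kp * error + self.ki * self.error_sum
        desired = current + round(change)  # assumption: output is an adjustment, not a count
        return max(self.min_instances, min(self.max_instances, desired))


controller = PIController(kp=10)
print(controller.next_instances(current=20, cpu=0.9, target=0.7))
```

Clamping to a minimum and maximum is a common safeguard against a runaway loop. Setting `ki` above zero makes the controller keep pushing while an error persists, at the cost of a slower settle.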