My name is Dies Köper. About half a year ago, I proposed that Fujitsu work with the community to build an app autoscaling project. There were rumors that IBM would open source theirs, so I thought it would make sense to work with them. That all took a while; in the meantime I took over the PM job for the CLI, and in January I proposed it to the community again and got support from SAP, and IBM joined us then. And as you know, we went live about two months ago, and Michael Fraenkel, who will lead this presentation, has taken the job of PM. And Bo Yang has been on this project at Bluemix; he's most familiar with the actual solution that we're starting out with. Would you please?

Sure. So what we're going to do is: the slides will show you where we are, and then explain what it is we're providing as part of our MVP release. More importantly, I'd like to get an active Q&A going to understand what people are really looking forward to once we have the foundation in place, right? So right now, as Dies mentioned, we're incubating out of the Cloud Foundry incubator. If you go there, there's a repository that has working code. However, it's based on CouchDB, so that's one of the reasons why, as of last week, we re-incepted to move onto a data store that we would prefer, which right now is Postgres. So we are moving from a NoSQL to a SQL model. At the same time, we've decided to re-architect exactly how the foundation of autoscaling will work, so that we can move forward with the requirements that, at least across the initial three companies participating in this project, we see in the near term.

As mentioned, I'm the project manager for this, and I'm lucky enough to have teams everywhere but the United States. Being on the East Coast, I have teams that are 12, 14, and 9.5 hours offset from me. So I either come to the West Coast, go somewhere else so I can talk to them, or I just make it work.

So the common question is: what is autoscaling? Well, in today's world it's "I can scale up and I can scale down." It's pretty easy; all you need is a human to hit the button every time you want it to happen. So obviously the whole point of the project is to automate this, and the real question is: what is "this"? From here, I'll let Bo describe exactly what we're providing and how it really works.

I'll take some time to give you the basic idea of the autoscaling service, how it works, and also introduce the interfaces we provide today, and maybe in the future, for interacting with the autoscaling service. There are multiple ways we can automate the scaling of your Cloud Foundry apps. One way, which I guess most people think of, is to scale the application based on metrics. The metrics may include the system resources your application is using, like CPU and memory, or metrics specific to your app; in Java, for example, there is the JVM heap. And sometimes you may want to scale based on an internal data structure your application uses: say you have a queue in your application, and you know that if the queue gets too long you want to scale up, so you scale based on the length of your application's queue.
With that, the autoscaling service provides a way for you to scale dynamically based on these kinds of metrics. So, let's go to the next one; I'll come back later to the interfaces we want to provide for interacting with the autoscaling service.

Dynamic scaling is essentially a feedback controller, which is common in mechanical engineering and also common in computer systems. A feedback controller is a control or management system that regularly checks the target system, decides what changes to make, and applies those changes to the target system. It's not a one-off action; it's a regular, continuous optimization: sense and respond on a regular cadence. Metric-based scaling works exactly that way. The autoscaling service collects metrics from Cloud Foundry and decides when to add or remove instances, and how: how many instances to add, how many to remove.

Here I give an example based on memory. The graph shows that at some point more traffic comes in and I may need more memory for my application; then after some time, say after 11 p.m., the traffic goes away and I need less. How does the autoscaling service handle that? The service collects the memory usage from the CF system and checks it against rules we've defined. A rule might say: if the average memory usage across the current instances is greater than a threshold of, say, 70%, your application needs more capacity, so take a scale-out action; you specify, for example, "add two more instances," and the system takes that action. But if your memory demand keeps increasing, that change won't satisfy the need, so the next loop goes through the same check: memory usage is still above the threshold, so it increases the instances again, until the need is satisfied. That is the basic idea of memory-based autoscaling. There's a similar path when you scale down, or scale in: when you don't need so much memory, instances are removed automatically based on rules.

The advantage is obvious: you don't need to keep track of your application's workload; the autoscaling service just senses and responds. But there are some drawbacks too. Because there is a control loop, it takes time. And sometimes you really don't want it to react too fast: if there is noise, or a transient spike in workload, you don't want to scale out and then immediately scale in again, because that causes fluctuation. So the controller may intentionally slow down its actions, and as a consequence it can lag a little behind the resource demand of your application.
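To make this concrete, here is a minimal sketch of what a dynamic scaling policy along these lines might look like. The field names follow the general shape of the open source project's JSON policy format, but they are illustrative; check the incubator repository's documentation for the actual schema. Since JSON carries no comments, note here that the breach-duration and cooldown fields are the knobs that intentionally slow the control loop down to damp fluctuation.

{
  "instance_min_count": 1,
  "instance_max_count": 10,
  "scaling_rules": [
    {
      "metric_type": "memoryutil",
      "operator": ">",
      "threshold": 70,
      "adjustment": "+2",
      "breach_duration_secs": 120,
      "cool_down_secs": 300
    },
    {
      "metric_type": "memoryutil",
      "operator": "<",
      "threshold": 30,
      "adjustment": "-1",
      "breach_duration_secs": 120,
      "cool_down_secs": 300
    }
  ]
}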
One way to mitigate that lag, though not fully solve it, is another mode of scaling we provide, called scheduled scaling. If you really know the workload of your application, that in a given time period you need more instances or more resources, you can scale based on time. In this example it means: 15 minutes before 7 o'clock, give me at least 180 instances so that I'm prepared, and the 15 minutes also gives my instances time to warm up so they're ready to serve the workload. That's the basic idea. If you look at it, scheduled scaling doesn't define metrics; it defines only a time period plus a minimum and a maximum instance number. It doesn't pin the count to one exact number, because you may want to keep some flexibility in the number of instances even during this time period. During the window, any dynamic scaling policy you've already defined, like the memory-based one, still applies, but within the new minimum and maximum instance numbers. That is how scheduled scaling works.

And a little bit about how you get access, because we built this as a service: we follow the CF service broker API. Once you want to use it, it's as simple as using other services. You create a service instance and bind it to your application, and it's done. When you bind the service to your application, one choice is to specify the policy, the rules, as parameters on the bind command. There are other ways to define the policy as well: there is an API through which you can create a policy, attach a policy, detach a policy, and delete a policy. I won't show the JSON format of the policy here, but this slide shows a kind of UI. The open source project doesn't actually have this UI, but it gives you the big picture of what's defined in a policy. There is a minimum instance count and a maximum instance count for your application, similar to when you define an Auto Scaling group in Amazon Web Services. Then, for the metrics, you can define several rules: you select the metric type (today we start with memory, so you select memory) and specify the scaling rules. In this case: if the memory utilization is greater than 80%, I want to increase by one instance; if it is less than 30%, I want to decrease. You can increase or decrease by a specific number of instances, or by a percentage: increase by 10% or 30%, decrease by 10% or 3%. There are also other detailed parameters that control the overall autoscaling activity. I won't describe them here, but if you go to the open source project there will be detailed documentation on things like what the statistics window is, how the statistics are computed, and how to configure the cooldown period applied when scaling happens.

And then there are scheduled policies. We provide two kinds of schedules. One is a recurring schedule: you specify a start time and an end time, and you repeat on specific days. In this case, you can repeat on certain days of the week, say Tuesday and Wednesday; maybe in the future we can consider certain days of the month or days of the year as well. Besides that, you can specify a specific date: you give the exact day, the exact start and end times, and the minimum and maximum instance numbers.
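Sketched in the same illustrative JSON, a schedules block for such a policy might look like the following. The field names and the day-numbering convention (Monday as 1, so Tuesday and Wednesday are 2 and 3) follow the general shape of the project's format but are assumptions; the 18:45 start time mirrors the "at least 180 instances starting 15 minutes before 7 o'clock" example above, and the timezone and dates are made up.

{
  "schedules": {
    "timezone": "America/New_York",
    "recurring_schedule": [
      {
        "start_time": "18:45",
        "end_time": "23:00",
        "days_of_week": [2, 3],
        "instance_min_count": 180,
        "instance_max_count": 300
      }
    ],
    "specific_date": [
      {
        "start_date_time": "2017-12-24T18:45",
        "end_date_time": "2017-12-24T23:00",
        "instance_min_count": 180,
        "instance_max_count": 300
      }
    ]
  }
}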
Besides CRUD operations on the policy, you may want to see what the autoscaler has actually done with your app. So we also provide APIs through which you can retrieve your scaling history: whether a scaling action was successful or not, when it started, and the reason why the scaling was done. A good amount of information is there. Okay, that is pretty much what I wanted to share about how it works and how you can interact with the system.
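As a hedged sketch of what retrieving that history can look like against the service's public API; the host, endpoint path, and auth flow here are assumptions, and the project documentation is the authority:

# Illustrative only: fetch the scaling history for an app from the
# autoscaler's public API. The autoscaler host and URL path are assumed.
APP_GUID=$(cf app my-app --guid)
curl -s "https://autoscaler.example.com/v1/apps/${APP_GUID}/scaling_histories" \
  -H "Authorization: $(cf oauth-token)"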
Right, so for the MVP we're focused strictly on memory. It's easy to explain and understand, and it's the simplest way for us to deliver something, because it's the framework that's most important right now. Where we're looking next is other things we can scale on, which will come after. We're looking at collecting response-time metrics from the router, so you can scale based on either throughput or total response times and decide whether you need to scale in or out based on requests taking longer or shorter. That's similar to how memory works, although, to use Java as a bad example, memory pretty much looks flat from the outside; you don't have a lot of real insight into it, so your response time will be more relevant. Another thing we're looking at is providing a way for application developers to supply custom metrics, because in the Java case you could record the actual heap usage, which is a more representative pattern of what's really happening. You might see that you're only using 30% of the overall memory usage that appears from the outside, so you get better granularity. We're discussing different ideas as to what we expose to the application. It would be a way for the apps to push this quote-unquote metric; to us it would just be numbers that we compare using the basic rules: yes, you exceeded some value, we don't know what it means, we don't care, and magic happens. The devil's in the details on who does the aggregation, and whether you want averaging and other capabilities. That's part of the reason why, by starting with at least one thing, we get a basic understanding of how much we will do as part of the autoscaling service versus how much has to be built out when somebody wants to introduce a custom metric, where it may be a two-part thing and the aggregation is an onus on the application developer, since we may not understand instances and other nuances. So there will be some experimentation when we get into custom metrics. And much like response time, there's the opposite: throughput. As Bo's examples showed, there are things that are obviously external, like the router we can get metrics from, but there may be throughput considerations that are internal, say if you're using a queuing system or some back-end service, and it may be the throughput to those services that you want to drive us to scale in or out.

The last thing I put up was CPU, and I gave it a funky-looking question mark. The interesting thing is that right now I'm proposing we don't even go near CPU, because it makes very little sense given the platform we're on and the metric we're given. We're only given one number, percent CPU, and we really don't know what that number means, because it can fluctuate widely: on, say, an 8-core box, we may get a number that says you're using 1% CPU, or a number that says you're using 800% CPU. I don't think anyone can tell whether that's good or bad. Do I scale out or in at 800% CPU usage? It could be that you're just using the box because there's extra capacity and you can burn that much, so trying to scale in because the number appears high is not the correct action. It would make sense if we knew more about the CPU usage of the host: how much is actually left over, what you're using, and whether you have a cap. But we don't have that information, and we can't easily get it. So right now, CPU is off the table.

That's it for our presentation, but I'm more interested in what people, people being users of the system, are interested in when we talk about autoscaling, and what they're looking for in this type of system. Anyone? Anyone? Questions?

Right, and again, people have to realize that these types of automated systems really aren't meant to react fast, like in your example where you see the number of requests go from 1 to 10 instantaneously. Right, as long as it's a sustained number of requests. That's really what the system is trying to do: smooth it all out. But yes, given the approach we're taking, you can have any number of rules. One interesting issue is that when an application has multiple rules, they can conflict, so there has to be a prioritization: obviously scaling in takes precedence over scaling out if you have multiple actions that need to take effect. And the moment any action is actually taken, Bo mentioned it but people may not have caught it, you enter what we call a cooldown period where no action can be taken, because we need to let the system catch up: even when we say scale up or scale out, it takes some amount of time before, one, that action happens, and two, it gets reflected in the metrics. So there are interesting effects where people want things to happen faster than they will. Now, nothing stops you from doing what I showed on the first slide, scaling the app in and out yourself. However, you do have to be careful, because you tell us the expected minimums and maximums, and if you happen to go outside those boundaries, we may actually rescale you back into the boundaries we know about. And that does depend on what triggers us to detect it, because again, we're focused on the rules; the schedules are only evaluated at the times they trigger, at the edges, not in between. So the service broker is just there as a way to inject the policy into the system, and from that point on, as long as the application is running, the rules are in place, and you can change them. So we're starting with: you provide us the policy per app.
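As a minimal sketch of that per-app flow with the standard cf CLI (the service offering and plan names here are placeholders, and the policy file is the JSON policy sketched earlier):

# Service offering and plan names are placeholders, not the project's
# actual catalog entries.
cf create-service autoscaler standard my-app-autoscaler

# Bind with the scaling policy as parameters; -c accepts inline JSON
# or the path to a JSON file such as the policy sketched earlier.
cf bind-service my-app my-app-autoscaler -c policy.json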
We are looking at, and we've designed a way to do it, where you can have pre-built policies that are named, so when you bind the app you can just refer to that policy if that's what you want. It makes certain things a little harder, because you first have to go to the service and define it on the plan, or define it on the service instance, and then when you bind, you do the attachment.

Okay, so you talked about scaling based on resource usage and performance metrics, but what about price? For instance, if you run on AWS, you don't want to pay for 4,000 resources just because there's some contention. So the question is: can we scale based on price? The way I would turn that around is to say that's what I would view as a custom metric, and you have to somehow relate the pricing to us, because we're not going to know it. We have no idea how you're being charged or why you're being charged. The onus will be on whoever implements the custom metric to relate some piece of data that takes pricing into account so we can do the right thing.

So the end user would want to relate the price to the capacity in some sense. And then you know the cost, so you could compute that, right? Right, so for us, again, it's all a black box. We would see things as numbers. The fact that it represents a price, okay, fine. But the trick is we need a metric that we can just hook up and compare against thresholds, because that's all pricing really is: a high price, a low price, with thresholds. It could be marbles for all we care; at some point it doesn't matter. Thank you. No, but I can control the instances, and that could drive the cost. So it depends on how price factors in. If price is really based on the number of requests I'm making against a service, and that's driven by the number of instances of an app, hypothetically, then if I scale down the number of instances, I'm also potentially scaling down my cost. Right.

I have a quick question about when you push an application that's already been auto-scaled higher, and you're specifying resource limits in, say, the manifest. Will that cause the application to be scaled back down to those initial limits, so the auto-scaler has to kick in again? Yes, so the question is: if your manifest specifies a number that is outside the bounds of the min or max, what will the auto-scaler do? The auto-scaler couldn't care less what your manifest says, because it's just looking at what you specified in the policy and what the actuals, or your desired state, are at the time. Your manifest really reflects the desired state, but the policy will always try to override that, which is why I was saying there are cases where you can fight the system, but the system is going to keep reacting and potentially undo whatever you're trying to do. Correct.

Count me as one who was perhaps naively hoping that you could scale based upon CPU. I was wondering if you could talk a little more about that problem. I understand the number-of-cores problem. If we could provide a factor to the auto-scale service, perhaps, say we know all of our execution agents have eight cores on the box. Are there problems beyond that you could help me understand?
So, one: I already know that that's not true, that people do have different numbers of cores per cell or execution agent, because I made that supposition as well and was then corrected that it does vary. Plus, operators are allowed to put in over-commit factors, so we can lie about all of this, and the problem is that the metrics we get back are not a reflection of the lie, but of actual reality. So we would have to get more information, and then we would have to get enough information that the lies don't matter. Over-commit becomes a real issue: when you start lying about your capacity, you're purposely trying to drive the CPU usage per application lower in a high-density environment. But when you have low density, you have the freedom to consume all the capacity, in which case the numbers are effectively flipped, and the way you'd specify thresholds doesn't make sense.

A concrete example: with a traditional workload you worry that, look, when this box goes to 90%, I'll get an alert saying I need to spin up more instances. Okay, but on a box that isn't otherwise used, you go to 90%, well, hey, that's great, because your one instance is handling all the workload you need, and the fact that it's at 90%, why do you really care? I mean, perhaps it could be an indicator that more capacity is needed, say if I chose a threshold like 80% for my app. Right, so what we can't do is tell whether that is really the case, and you may not have a good signal. You could argue that, yes, once I go to 90%, spin up more instances, and we could do that. The question is: is that correct for most people? Not sure. I mean, we could add that type of support for CPU. It's just unclear, because it's not so much the 90%, it's the fact that you can hit 800%. When you set the threshold at 90%, all that means is you have less than a core, and yes, there's room for eight; you could use all eight, but I won't let you, because I'll keep spinning up more instances instead. That's when you end up paying for more when you should be paying for less. So that's the struggle: if I had known there are eight cores, and, by the way, nobody else is using the excess capacity, so you've basically used all of it because you can, that's when we have to think about it. Scaling out isn't necessarily right at 90%; it's more correct when I'm hitting the threshold of the box. So it's 90% of the available capacity versus 90% of what I think my app is using. Which might be impacted if I have other applications on the box using 50% as well. Yes, and that's why you need more information about the box and the overall usage of the box, independent of your app. Okay, thank you.

So you've talked about horizontal scaling here. Do you plan to support vertical scaling, where you increase the size of the container when the memory goes up? No, that's your problem. I mean, it's somewhat arbitrary, because in reality I would have assumed you've already sized your container to fit your application's profile. And part of the problem, again, is languages like Java.
Some languages are more amenable to that type of adjustment, but if you're doing Java, the heap is tuned to the container size. So unless we can get heap metrics, we can't say, hey, let's grow your memory. And to do that, now we're getting into potential outages, because the moment I say I want to repush, or scale out with increased memory, things have to be, not so much restaged, but restarted everywhere with larger containers. But we do have zero-downtime deployment available in Cloud Foundry, so that could be leveraged. Yeah, except it doesn't happen in all cases, so you have to be careful: you can mistakenly cause things to happen if you try to do certain things. For that type of scenario, part of it is how much we should do as part of autoscaling versus how much you should be doing through your own monitoring of your application, to make sure things are sized properly. Okay, thank you.

With respect to the two autoscale options, which one has precedence? Between what and what? Between scheduled and... Right, so the question is: between the rules, or triggers, and the schedule, which has precedence? They actually do two different things. Schedules are all about setting or resetting the min and max boundaries, and the rules are the triggers that cause an action to be taken. So they work somewhat in conjunction, where the one is what we call dynamic: the triggers, or rules, are the dynamic things that are constantly being re-evaluated, and schedules just occur on some interval. When an interval occurs, the min and max boundaries adjust, and that alone can cause the instance count to go up or down, but the triggers, or rules, are the things that get evaluated to determine whether I increase or decrease within that schedule. So they actually work together; they're complementary rather than opposing each other. Okay, sure. Any other questions? Okay, well, thank you for coming and listening to us talk.