Hi everyone, we can go ahead and start. One quick tip: if you want coffee, they have coffee in the Design Center; we just came from there, so people will be well caffeinated. All right, thanks for coming. My name is Ton Ngo, I'm with the IBM Silicon Valley Lab in San Jose, California. With me here are my colleague Hongbin Liu from Huawei, Julio Ruano from the IBM software development group, and Qiming Teng from the IBM Research Center in Beijing. Jay Lau is also with us; he is with the China Development Lab, but he couldn't join us at the summit today. We are also lucky to have a group of university interns who did a lot of the work in prototyping the initial POC, and we'll show a demo of their work at the end of the talk.

Okay, so we'll talk about scaling. Scaling for VMs is not a new subject; there has been a lot of work in OpenStack on scaling VMs already. But when you add containers to the picture, you have a new dimension, and that is what we will explore today. I use the word "explore" because this is new work, and it is still very much in progress.

I will start by describing a couple of use cases that tie together scaling at the container level and at the VM level. Then I'll pass it on to Hongbin, who will talk about the current support in Magnum for auto-scaling and what we are thinking about doing next. Then Julio is going to talk about Senlin; this is a new project that provides clustering as a service, and auto-scaling is one part of that. Then Qiming will close out by describing how we think we can tie everything together, and we'll show a demo of the initial POC that we put together for this summit.

Okay, so let's first look at the use case. We know that Magnum, the container service in OpenStack, will build a cluster for you, and on that cluster you typically would not run just one application; it's a shared cluster. When we scale that cluster, we look at some kind of policy to trigger the scaling, and utilization is one easy way to do that: we can look at CPU utilization or memory utilization to drive the scaling. We scale by adding nodes to, or taking nodes away from, the cluster. That's what we do at the cluster level.

When we share the cluster, we run multiple apps on it, so we are going to assume that there is some way to trigger auto-scaling at the app level as well. This would be something like watching the latency of the requests coming into the containers, or watching the request queue; there are many ways to do this, but they will be specific to the app, and we scale by adding or removing containers for the app. Typical examples of this kind of app are two- or three-tier web apps, where you have containers that handle the requests coming into the web app. When you have this, you would have some kind of service level agreement that dictates how the app must perform. It could be something in the form of "the latency cannot be longer than a certain level" or "the queue cannot be longer than a certain length," and that determines the kind of scaling that we need to implement at the app level. Next, I'm going to show a couple of scenarios to see what could happen.
At the top I'm going to show the scaling that happens at the container level. "Container" here is generic: it could be a container if you have a Swarm cluster, or a pod if you have a Kubernetes cluster; we'll stay generic with the term. At the bottom I'll show the scaling that happens at the cluster level; this is where we add hosts to the cluster.

To start out, we have a very simple scenario: a single app on the cluster, the blue containers, and we let it scale out based on the load on the app. The way the cluster scales is simple: we watch the utilization on the hosts, and as a node gets fully utilized, we add another node to host more containers. Very straightforward, and the coordination between the two levels is basically that utilization.

Now let's look at the second scenario, where we have more apps running on the cluster: a blue app and a red app. First, assume we have infinite resources on the cluster. Then we can allow the apps to scale out as much as they want, no problem, and again the coordination between the two levels is just utilization. But a cluster is not unlimited; we always have a limit on the resources, so we can only scale out the apps to a certain level. So what happens when you have limits?

Let's see. Suppose the blue containers scale out, and because the blue app is under heavy load, they scale out and use most of the resources in the cluster. Then the red containers attempt to scale out because their load increases, and they find they cannot, because there are no more resources. Is this okay? Well, it could be, if that's how you expect your apps to behave. But what if the red app is the critical app and the blue one is the lower-priority app? Then what you have is an inverted priority: your critical app cannot get resources on the cluster while the lower-priority app is hogging them.

So what do you do? One quick fix is to put a limit on the blue containers so they don't hog the resources: we set a max on the blue app. That works; let's see what happens. Blue scales out and hits the limit, so it stops there. But what happens if red doesn't scale, because there is no load on red? Now you have resources available on the cluster that are not accessible to the blue app, and that's not good either.

That brings us to what we really want: we want to allow the apps to scale as much as they want, but with some kind of priority, so that we can allocate the resources according to what they need. So we now put a priority on the blue and the red apps. Let's see what happens. We let blue scale out, because the load on blue arrives first. Then, when the load on red arrives later, we tell blue to scale back and free up resources, so that red can scale out and meet its load. What just happened is that we added a linkage from the bottom layer to the top layer: at the cluster level, because we hit the limit on the cluster resources, we go back to the containers and tell them to scale back, to adjust their resource usage. With that we now have a two-way linkage: from the top to the bottom, we use utilization to provide the information, and from the bottom to the top, we use priority to adjust the scaling. Now we have a complete loop, and we can manage the two levels of scaling in a smart way.
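To make that loop concrete, here is a purely hypothetical policy file — not an existing Magnum or Kubernetes API, just an illustration of the rules in the scenario — expressing the two-level linkage: utilization drives the cluster level, app-specific signals drive the container level, and priority flows back down when the cluster runs out of room.

```bash
# Hypothetical two-level scaling policy; all keys below are illustrative only.
cat > two-level-policy.yaml <<'EOF'
cluster:
  scale_on: host_utilization       # bottom level: add/remove nodes
  max_nodes: 10
apps:
  - name: red                      # the critical app
    priority: high
    scale_on: request_latency      # top level: add/remove containers
  - name: blue                     # the lower-priority app
    priority: low
    scale_on: request_queue_length
on_resource_pressure:              # the bottom-to-top linkage
  reclaim_from: lowest_priority    # tell blue to scale back so red can grow
EOF
```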
With those scenarios to motivate the solution, let me now pass it on to my colleague Hongbin, who will talk about how Magnum is handling scaling right now.

Hi everyone, my name is Hongbin. I'm a core reviewer on the Magnum project, and I'm going to talk about how Magnum enables auto-scaling of the clusters. First, an overview of Magnum. Magnum is the OpenStack container service project; it basically enables users to run containers on top of OpenStack infrastructure, and it can do several things. It can do provisioning: it can provision Kubernetes clusters and Docker Swarm clusters. It can scale them, which means it can add and remove node instances to and from the clusters at runtime. And it can manage a set of container-related resources: for example, in Kubernetes there are pods, services, and replication controllers, and in Docker Swarm there is the container resource.

This is the overview architecture of Magnum. On the right side there is the Magnum client, a CLI that talks to the Magnum services. Within the Magnum services there are two processes. The first one is the Magnum API, which handles the RESTful requests. This API exposes a set of REST resources, and the most important abstraction is the bay: a bay represents a Kubernetes, Docker Swarm, or Mesos cluster. There is a node resource that represents a node instance; there are pod, service, and replication controller resources, which come from Kubernetes; and there is the baymodel, which stores the configuration of a bay. The second process is called the Magnum conductor, and this is the process that does the real work. It uses Heat templates to deploy the Kubernetes, Docker Swarm, or Mesos clusters, and after a cluster is provisioned, it talks to the Kubernetes API to manage the container resources. So that is the architecture of Magnum.

Now I'll talk about how Magnum is going to enable auto-scaling; this blueprint is close to finished. It does auto-scaling in several steps. The first step is a task running in the background that periodically pulls metrics from the Kubernetes API. Magnum then analyzes the data and sends the metrics to Ceilometer, and if a certain metric goes beyond the threshold specified by the user, Ceilometer triggers an alarm, and Heat does the scaling of the bay. This is how Magnum enables auto-scaling: it is scaling at the bay level.
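As a rough sketch of that flow, assuming the Magnum and Ceilometer CLIs of that era (flag names changed across releases, and the meter name below is a placeholder for whatever metric Magnum publishes), the bay-level pieces look something like this:

```bash
# Create a baymodel (the stored configuration) and a Kubernetes bay from it.
magnum baymodel-create --name k8sbaymodel \
  --image-id fedora-21-atomic-5 --keypair-id testkey \
  --external-network-id public --flavor-id m1.small \
  --coe kubernetes
magnum bay-create --name k8sbay --baymodel k8sbaymodel --node-count 2

# A threshold alarm on the metrics Magnum pushes to Ceilometer; when it
# fires, its alarm action (here, a Heat scaling webhook URL) grows the bay.
ceilometer alarm-threshold-create --name bay-scale-out \
  --meter-name memory_util \
  --statistic avg --period 600 --evaluation-periods 1 \
  --comparison-operator gt --threshold 70 \
  --alarm-action "$HEAT_SCALE_OUT_WEBHOOK_URL"
```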
On the other hand, there is a proposal for Kubernetes to handle auto-scaling at the pod level, and it also works in several steps. First, Kubernetes introduces a new resource called the horizontal pod autoscaler. This new resource maps one-to-one to a replication controller. The component retrieves the metrics from the pods, analyzes them, and uses a built-in algorithm to decide whether to trigger scaling; if scaling is triggered, the replication controller resizes the set of pods. This is how Kubernetes does auto-scaling.
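For reference, here is roughly what that resource looks like in the form it eventually took in Kubernetes — written with the later autoscaling/v1 API for readability, while the proposal discussed here targeted replication controllers one-to-one; the resource names are made up for the example:

```bash
# A horizontal pod autoscaler bound to one replication controller:
# keep average CPU near 70% by resizing between 1 and 10 pods.
kubectl apply -f - <<'EOF'
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: web-hpa
spec:
  scaleTargetRef:
    apiVersion: v1
    kind: ReplicationController
    name: web-rc                   # hypothetical RC name
  minReplicas: 1
  maxReplicas: 10
  targetCPUUtilizationPercentage: 70
EOF
```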
So we can see that Magnum addresses auto-scaling at the bay level, and Kubernetes addresses auto-scaling at the pod level. What we are proposing — this is still under discussion — is to combine the two auto-scaling mechanisms, for the bay and for the pods, to achieve a more complete solution. The proposal is a new component for Magnum called the Magnum auto-scaler. This component is going to manage both the bay and the applications, by managing the pods that run the applications. First, it checks the list of replication controllers in the bay and figures out whether the user has created an autoscaler associated with each replication controller; if there is not, that replication controller is going to be managed by Magnum, and Magnum will do the scaling of both the bay and the pods. It can be customized: the user can specify the data source, the metrics and thresholds, and a number of standby nodes, which can join the cluster right away when the cluster needs to scale.

This graph shows the two-level scaling. It is a standard Kubernetes setup: there is a set of nodes, each running pods, and a dedicated node running Heapster. Heapster gets the metrics from each node and stores them in a storage backend. On the right side there is the Magnum auto-scaler service. It has a collector to collect the metrics from the storage, an analyzer to analyze the data and decide whether scaling is needed, and an operator to do the scaling: the operator talks to the Kubernetes master to scale the pods, and it talks to the Magnum conductor to scale the bay. Senlin is going to handle this auto-scaling, so I'll pass it on to Julio to talk about the Senlin project.

Thank you, Hongbin. I'm going to talk about Senlin and how it fits into what we're discussing here. Senlin was born out of Heat's auto-scaling support, as a general clustering service for homogeneous cloud resources. By treating homogeneous cloud resources as a cluster, we can start to talk about the manageability of those resources in terms of additional use cases beyond just auto-scaling: clusters that are distinctly scalable, load-balanced, highly available, and ultimately more manageable.

The Senlin architecture looks very similar to what you find in other OpenStack core services; we purposely followed the best practices and patterns that already exist for other services. We have a client component, a CLI that users can use to issue REST requests to the Senlin API service. The API service takes the inbound requests and maps them over RPC to the engine, and an important point here is that all the RPC message handling and processing uses the common libraries that OpenStack provides out of the box: we reuse most of the Oslo modules for our message handling and processing. In the Senlin engine there are two primary constructs that really distinguish Senlin as a general cluster-management service: profiles and policies. Profiles are a specification of the operations and characteristics of the cluster resources, and policies are sets of rules that can be enforced or checked based on the actions that are triggered; we'll see an example of this in the demo later in the presentation. Lastly, the Senlin database is a persistent store for information about the policies and profiles of each cluster instance. That's how everything fits together.

Here is another view of the Senlin constructs and abstractions and how they fit together. On the left-hand side we have policies; again, a policy is a set of rules whose enforcement and checking are driven by the actions that are triggered. For example, a placement policy allows you to place resources in suitable locations: you can define placement policies for availability zones, regions, specific data centers, and so on. Scaling is the common use case, and we'll walk through it in the demo: a scaling policy is a set of rules, enforced and checked when scaling in or scaling out, for working with resources that are elastic — see the sketch below.
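For instance, a minimal Senlin scaling policy spec of that era looked roughly like the following; the spec schema evolved across releases, so treat this as a sketch:

```bash
# A scale-out policy: each CLUSTER_SCALE_OUT action adds one node.
cat > scale_out_policy.yaml <<'EOF'
type: senlin.policy.scaling
version: 1.0
properties:
  event: CLUSTER_SCALE_OUT
  adjustment:
    type: CHANGE_IN_CAPACITY
    number: 1
EOF
senlin policy-create -s scale_out_policy.yaml scale-out-policy
```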
Then we have other policies to support other management use cases, like deletion, health, load balancing, or batching. Policies work in conjunction with profiles, which you see in the middle. As I said, profiles are a description — a specification, if you will — of the resources that are part of the cluster, and those resources are homogeneous; I think that's the important point. So, for example, a pod profile describes characteristics and operations that are unique to that type of cluster resource, and a profile type can define operations for deletion, for update, for joining a cluster, that again are specific to that profile type and that type of resource. Profiles and policies work together to manage clusters of homogeneous objects: clusters of Heat stacks, clusters of Nova VMs, clusters of containers, and so on.

This is an example of how it all ties together. Again, the profile describes the characteristics and the operations that apply to the members of the cluster; in this example we have a Nova server type with the various characteristics of that cluster resource. Then we have policies; these are examples of scale-in and scale-out policies, based on a change in capacity, that are associated with the cluster. Finally, a webhook is what enables the policies and profile to act on the cluster; in this example we have a resize action. A webhook takes as input a few pieces of information: the target object, in this case a cluster; an action, in this case a resize; and credentials. The webhook is exposed as just a URL; you can think of it as an HTTP callback. As long as you encode the information for this particular webhook, you can initiate the resize action, and it will behave according to the policies associated with the cluster. We'll see an example of how this comes together.

For policies, this is an example of what we're using in the demo, a simple policy. The key point is that since we separate policies into their own distinct constructs, we can specify them declaratively; here we're using YAML markup to describe the policy. Furthermore, for Mitaka we're exploring mapping existing cloud standards, such as TOSCA policies, onto this YAML markup, so that we have a standardized way of describing policies.

Finally, the trigger is what initiates the action. The key point here is that Senlin provides a generic abstraction for a trigger, and what you see here is an example of a Ceilometer threshold alarm. Users can provide specific implementations of a trigger to work with their existing cloud monitoring service; in this example it's Ceilometer, but you could provide an implementation for Monasca, Surveil, Kiloeyes, or other cloud monitoring services.
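As a sketch of the trigger side, here is how a Ceilometer threshold alarm could be pointed at a Senlin webhook URL, so that the alarm firing initiates the cluster action; the commands are approximate for the CLIs of that era, and any monitor that can call a URL could play the same role:

```bash
# The webhook URL, minted by Senlin for a cluster action, acts as an HTTP
# callback; here we let a Ceilometer threshold alarm invoke it.
ceilometer alarm-threshold-create --name scale-out-trigger \
  --meter-name cpu_util \
  --statistic avg --period 60 --evaluation-periods 1 \
  --comparison-operator gt --threshold 70 \
  --alarm-action "$SENLIN_WEBHOOK_URL"
```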
So with that, that's a brief description of Senlin. I'll hand it over to my colleague Qiming Teng, from the China research lab, to demonstrate how this all comes together.

Okay, thank you; that was a nice presentation of Senlin, I wish one day I can do that. Let's get back to the topic: how we did two-level auto-scaling and put things together to make it work. Before doing that, let's take a look at how auto-scaling is done today on OpenStack. If you have the experience, this chart is very easy to read. Basically, you have a template, and Heat translates that template into real objects backed by other services, such as a server or a Cinder volume or whatever. For auto-scaling, you need an autoscaling-group resource, you need a Ceilometer alarm resource, and you need scale-up and scale-down policy resources, and this whole thing works together to make auto-scaling a supported usage scenario.

There are some limits, though. I'm from the Heat team, so I won't say Heat is bad, but Heat is not designed to be an auto-scaling engine. Heat is an orchestration engine; that is the mission of the project. Auto-scaling can be supported by Heat, but we also see a lot of requirements — deletion policies, placement policies, a lot of hooks. Based on that perception, the team decided that auto-scaling should be offloaded into something standalone that focuses on auto-scaling. If you have auto-scaling requirements, that new project is where you should go, not Heat; Heat will do orchestration only, and do that one thing well. And if you have complaints about auto-scaling, fine: Senlin, you guys solve them.

One of the problems we have today with Heat-based auto-scaling is that we have only one verb, which is stack update. A stack update basically contains everything you want to change. If you want to change an image ID, for example, that is a stack update; if you want to resize your resource group, that is another stack update. That is not very flexible, and it's something we want to change. The other thing is that when you use Heat-based auto-scaling, you have an outer stack, and inside that stack a dedicated inner stack is created for the auto-scaling resource group; and if you are using template resources, those template resources themselves create yet another layer of stacks. The whole thing gets very complicated, and the team is still struggling with how to make it work more stably.

If you use Senlin instead, we provide more knobs, more APIs, more options for you to control, because it is focused on managing a group of homogeneous objects. In the Senlin engine we have two important abstractions: cluster and node. For a node, we don't know what it really is: it could be a Heat stack, it could be a Nova server, but that doesn't matter, because it is completely handled by the profile. The profile is the mold you use for creating a specific type of node, and Senlin really focuses on how to manage the cluster.

In this session we are showing an auto-scaling scenario, and we make it work like this: we have a cluster, we attach scale-out and scale-in policies to that cluster, and we trigger scale-in and scale-out actions on the cluster. When the scale-in and scale-out actions are executed — or, more accurately, before those actions are executed — the scale-out and scale-in policies are checked.
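A minimal sketch of that setup with the Senlin CLI of the time follows; option names varied between early releases, and the image and flavor values are placeholders, so treat this as approximate rather than definitive:

```bash
# The profile is the "mold" for nodes; here each node is a Nova server.
cat > node_profile.yaml <<'EOF'
type: os.nova.server
version: 1.0
properties:
  flavor: m1.small
  image: fedora-21-atomic-5      # placeholder image name
  key_name: mykey
  networks:
    - network: private
EOF
senlin profile-create -s node_profile.yaml node-profile

# Create a cluster of two nodes from the profile, then attach the
# scale-out and scale-in policies so they are checked before each action.
senlin cluster-create -p node-profile -n 2 demo-cluster
senlin cluster-policy-attach -p scale-out-policy demo-cluster
senlin cluster-policy-attach -p scale-in-policy demo-cluster

# A webhook bound to the cluster's scale-out action gives monitoring
# services a URL they can call to trigger it.
senlin webhook-create -c demo-cluster -a CLUSTER_SCALE_OUT scale-out-hook
```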
So this is how things work. Julio just presented the generic solution for Senlin to talk to other monitoring services; the first prototype we worked on uses a Ceilometer alarm, but we plan to support Monasca alarms, Zaqar queues, or your other data-center monitoring systems. We don't think that will be a disruptive change to the Senlin engine itself.

Okay, now something real. Where's the video? Oh, too fast, it's playing really fast. Okay. The first thing you do — we did this POC in the context of Magnum — is create a baymodel, and use that baymodel to create a bay. Then, from the Senlin command line, you can see the nodes created by Magnum: these nodes are from one cluster, and that cluster contains the minion nodes and the master node of the Kubernetes bay. We can see there is still one node being created. Next, when we do a bay-list, the bay has been created, and you can see the details of the bay; it shows the two IP addresses there. And from the Senlin side, we can see the cluster is created and it now contains three nodes.

Now let's see the policies. For this demo we created two policies: one for scaling out and one for scaling in, and the detail we are showing on this screen is for the scale-out policy. Let's continue. For this demo we also have a webhook: if we do a Senlin webhook-list, you can see the webhook we created to trigger the scale-out action. And we create a Ceilometer alarm that says: if the memory utilization of the containers goes above 70 percent, create a new minion node. That's the rule. We can see some of its details.

Here we have two screens. On the left-hand side we will create some workload; it's a Spark workload, a kind of toy application. On the right-hand side we are watching the output from the Magnum conductor. Let's see how things work. We log into one of the minion nodes, and the memory utilization is about 36 percent or so. We do some basic setup to make Spark work; these steps could be automated, you can imagine, but for this demo we didn't go that far. It's a simple benchmark that consumes a lot of memory; we will drive the memory utilization above that threshold and see if a new minion node gets created automatically. Now the memory utilization is about 60 percent. Okay, this is the workload; it has started, and it's eating memory pretty quickly. Okay, 62 percent — and I think it's an average, not a total, right? An average of memory utilization. Okay, 72 now; 72 is above the preset threshold. Let's see what happens. On the left-hand screen, if you do a Senlin node-list, you see a new node being created, and it all happened automatically. Why is that? Because the Ceilometer alarm was fired. If you show that alarm, you can see the alarm we created is in the "alarm" state there.
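To recap the moving parts of the demo at this point, the sequence is roughly the following; the commands are approximate, and the names match the narration:

```bash
# Condensed demo flow: Magnum builds the bay, Senlin mirrors it as a cluster.
magnum bay-create --name k8sbay --baymodel k8sbaymodel --node-count 2
magnum bay-list                # wait until the bay is created
senlin cluster-list            # the bay shows up as a Senlin cluster
senlin node-list               # its master and minion nodes
senlin policy-list             # the scale-out and scale-in policies
senlin webhook-list            # the webhook that fires CLUSTER_SCALE_OUT
ceilometer alarm-list          # the memory alarm with the 70% threshold
# ...run the Spark benchmark to push container memory above 70%, then:
senlin node-list               # a new minion has been added automatically
ceilometer alarm-list          # the alarm now shows state "alarm"
```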
After the new minion node is created, we ssh into the Kubernetes cluster and see that we previously had two minion nodes and now we have a new one; its internal IP address is 10.0.0.6, that's the new one, and we get its external IP as well. Then we ssh into that new node to see whether the Spark load extended to it. Here we are inside the cluster, and we see three nodes: two were created an hour ago — that's when we recorded the video — and one was created about a minute ago. That's the new node, and that's the demo. So we still have some time for questions.

Could you explain what auto-scaled the application containers? I understood that Senlin's profiles and policies work together to scale out the number of nodes, the minions, but what actually scaled the Spark workload?

That's something we still need to figure out. This is a pretty preliminary step toward a holistic auto-scaling solution in the container space. We heard there is a proposal in the Kubernetes community to do their own auto-scaling; if that lands, eventually we need to consider it and incorporate it into the Magnum auto-scaler service. That's the long-term plan, but today we only showed one workflow: if the workload increases inside your containers, the minion nodes underneath can be scaled. That's the point of the demo. Thanks. Yeah, that's the traditional auto-scaling, and it works. Any questions? Thank you.