Okay, so I think I'm going to get started. My name is Thomas, and I work for IBM in the CTO office of the Cloud and Smarter Infrastructure organization. With me is Lakshmi, a colleague from IBM Research, and we are going to talk about Heat and enterprise applications.

Before we go into details, I want to level set on what to expect from this presentation. First, some terminology: what do we mean by application orchestration? When you talk about orchestration, people think about different things at different levels of detail. In this presentation, application orchestration means the deployment of application components, including the underlying infrastructure, and also the management of a deployed application throughout its lifetime. Enterprise applications are typically large-scale deployments with high requirements on scalability, reliability, performance, and so on.

We want to share experiences in that field based on two solutions we work on at IBM. One is application orchestration in SmartCloud Orchestrator, which is based on the OASIS TOSCA standard. Then Lakshmi will present a project called Weaver, a higher-level DSL designed especially for DevOps scenarios. We want to relate this to Heat, of course, and show what we do today in addition to what Heat does, but also how we can already use current functionality. We will also outline which concepts we think are important for software orchestration in Heat and give a view on possible future directions.

I want to start with a couple of examples to give you a better feeling for the kinds of workloads we are talking about. Based on those examples, I will derive some common requirements that we think are important for software orchestration. Then we will present the solutions we have today for software orchestration and how we use Heat in them. And finally, just a couple of words on ongoing activities in the Heat community. I don't want to make this a design-proposal kind of presentation, because that is done in the design discussions, so I won't go down into those details, but I will give a brief summary of the discussions that are going on.

So let's get started. I wouldn't call this a complex enterprise application, but I wanted to begin with it because it is known to everyone in the Heat community: the famous WordPress example. It already allows us to highlight what we think is important. For software orchestration, it is important to have software components modeled explicitly, instead of having everything collapsed into a script that is inlined into instance user data, because that allows us to reuse things: we can reuse an Apache module in another use case, we can reuse MySQL in another use case. It also makes sense to separate a component from the actual application workload, because that gives us a higher reuse factor.

Looking at another aspect: if we want to make the web tier scalable, that has some impact on networking. In this example, the application gets deployed into a private network, and if one tier is scalable, I have to care about things like load balancing, floating IPs, and so on. But in order to keep the model portable, we actually don't want to model the low-level details of networking.
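To make that "everything collapsed into a script" style concrete, here is a minimal sketch of what it looks like in a HOT template as Heat supports it today. The image name, flavor, and script contents are hypothetical placeholders, not the actual WordPress template:

```yaml
heat_template_version: 2013-05-23
resources:
  web_server:
    type: OS::Nova::Server
    properties:
      image: fedora-20        # hypothetical image name
      flavor: m1.small
      user_data: |
        #!/bin/bash
        # Apache, MySQL, and application setup all collapsed into one
        # opaque script -- nothing here is reusable as a separate
        # software component, which is the problem discussed above.
        yum -y install httpd mysql-server
        # ... application-specific configuration continues here ...
```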
I don't want to go into floating IPs or Neutron network resources; I want to keep it very abstract, because that makes the model much more portable. One provider might use Nova network, another one Neutron, and within Neutron you can have very different layouts. To avoid hard-coding a model towards such a layout, we want to use abstract modeling concepts. I have some more details on this later on.

Next, it's getting more complex: this shows a multi-tier SAP application that we also want to orchestrate. There is one server hosting the database instance, with a couple of storage volumes attached according to SAP practices. Then there's a central host, a server hosting a component that shares configuration data and binaries with other components in the system. Then the central instance, where you basically log on and which distributes the load to the dialog instances, basically the worker nodes in an SAP system. And then there's a dialog instance, which handles the actual user sessions. There's a connection between all of them via NFS: the central host shares binaries and config data, the config data is used by the database instance, and the binaries and config data are used by the dialog instances. So it's quite a complex graph of components and dependencies.

What I want to convey in this picture is that from the dependencies, if we have well-defined semantics, we can derive the complete processing flow. Starting from the bottom: the servers don't have any dependencies among each other, so I can deploy the complete infrastructure in parallel without caring about any synchronization. Then, based on the relationships, we can basically walk up the tree. The central host component is the next thing that can be processed; it only relies on its server being up and running, but it doesn't have dependencies on other things. Since that component shares some data, once it is up and running we can export some directories via NFS. Once that has happened, we can process the components on the dialog instance and the database server and mount the NFS shares. After that we can start the database, because it depends on config data shared via NFS. When that is running, the central instance can be started, and finally the dialog instances can be started.

Another important thing in such a model are the connections between components. Since the dialog tier is scaling, so I can have many dialog instances, I need a way to react to scaling events: in the SAP case, I have to update profile configuration data on the central host so that other components in the system know about the new instance. So this connection has some signaling semantics attached. And by the way, this is powered by OASIS TOSCA; it's actually a model that has been built by SAP and a company called Vnomic, so they built it according to the standard and we can run it in our orchestrator today.
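In Heat terms, this dependency-driven flow is exactly what the engine derives from a template today: resources with no relationship between them are created in parallel, while explicit `depends_on` entries and implicit references enforce ordering. A hedged sketch, with all names and sizes illustrative rather than taken from the actual SAP model:

```yaml
heat_template_version: 2013-05-23
resources:
  db_server:
    type: OS::Nova::Server
    properties: { image: some-image, flavor: m1.large }
  central_host:
    type: OS::Nova::Server
    properties: { image: some-image, flavor: m1.medium }
  # db_server and central_host have no dependency on each other,
  # so Heat brings them up in parallel.
  db_volume:
    type: OS::Cinder::Volume
    properties: { size: 100 }
  db_volume_attachment:
    type: OS::Cinder::VolumeAttachment
    properties:
      instance_uuid: { get_resource: db_server }  # implicit dependency
      volume_id: { get_resource: db_volume }
      mountpoint: /dev/vdb
  dialog_server:
    type: OS::Nova::Server
    depends_on: central_host   # explicit ordering constraint
    properties: { image: some-image, flavor: m1.medium }
```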
The third example, and this is actually one where Lakshmi will give more details later on, is a collaboration platform, IBM Connections, which is based on WebSphere. With this example I want to highlight some requirements on placement policies, connectivity constraints, and so on. The deployment consists of a cluster of application servers, WebSphere Application Server. There is a component called the WebSphere deployment manager, a central component that you use for managing all the other components and for deploying workloads, so applications, onto the application servers. In a production environment we have many of those clusters, and the deployment manager is connected to each of them. In front we have, in this example, two web servers that provide high availability and route the traffic to the application servers, and in the back end we have an HA configuration of two databases that provide the data for the application.

What's important here for production environments is that we have constraints on how to place all those nodes. For one such cluster, for this group here, we have an anti-collocation constraint that says at most two of those nodes must end up on the same rack, and for all of the application servers, a constraint that at most two must end up on the same physical machine. That allows us to spread the cluster, to reach performance goals and also to get high availability. The same applies to the web servers: we want them on separate racks. For the databases, we want them on separate racks too, and we also have constraints on the connection to the storage volume: it must be directly attached storage, to be fast. And we have a latency constraint on the network connection between the two database nodes, because the replication must work very quickly to support a hot-failover configuration. So there are a couple of constraints that influence how that model gets deployed onto the infrastructure.

Based on those examples, I'm going to summarize the requirements I already mentioned. First of all, and most important, we need a clear modeling of software components, for the reasons I mentioned: the ability to reuse things. We also want the software components to be stateful entities, because an orchestrator must know when one component is up and running so that it can start the next component. We need dependencies with clearly defined semantics, because that allows us to derive the processing flow. For multi-instance components, like many cluster members within a cluster, there might be special constraints; for example, in the case of WebSphere we cannot join all members to the cluster in one shot, but have to do it one by one. That's just a rule that must be followed in WebSphere, because the nodes cannot join in parallel: they depend on updates of the cluster-wide configuration data.

To ensure portability of application models, we think it is a best practice to tell an orchestrator as much as necessary about the infrastructure, but as little as possible. So don't go down into details about port configurations, VLANs, subnets, and so on, but keep it at an abstract level. And finally, in many scenarios we have requirements on placement, so there's a need for the concept of policies.
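As a purely hypothetical sketch, this is NOT existing HOT syntax; it only illustrates the kind of policy annotation being asked for here, attaching placement constraints to groups of resources instead of encoding them in low-level scheduler details:

```yaml
# Hypothetical annotation style -- no such constructs exist in HOT today.
resources:
  app_server_group:
    type: My::WebSphere::ClusterMember   # hypothetical component type
    properties:
      count: 4
policies:
  spread_app_servers:
    type: anti-collocation               # hypothetical policy type
    targets: [ app_server_group ]
    properties:
      scope: physical_machine
      max_instances_per_scope: 2
```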
All of that was primarily looking at deployment, but deployment is only a tiny fraction of the lifecycle of an enterprise application. The SAP folks, for example, told us that deployment is about 7% of the lifecycle of an SAP system. When we want to manage the application after deployment, we have to think about scaling, and especially scaling based on application metrics: in many cases it's not enough to look at CPU load or memory, we have to look at the transaction rate or the number of concurrent user sessions in the system, and that then drives the scaling on the infrastructure level. We need to be able to properly handle events like scaling, failover, or any changes to the topology on the application level, because just spinning up another server or activating a software component is not enough; in many cases you really have to update other software components in the system to make them aware of the new component. Updates should be possible for long-running applications, ideally online or in a rolling manner. And finally, you can have completely custom flows that you maybe cannot even express in a topology model, so you need the ability to have a workflow orchestration system work with the instance data of the deployed application.

So now let's talk about some solutions we have for addressing those scenarios. The one I'm going to talk about is based on SmartCloud Orchestrator, and here I'm just showing a very high-level view. SmartCloud Orchestrator has a pattern management component that understands patterns and can do software orchestration. It makes use of an automation library, a library that contains the installables, the scripts, automation modules, and so on, as well as the schema definitions of components: what properties they have and how they can be changed. It sits on top of an infrastructure management layer that gives us VMs, IP addresses, and so on, and that layer actually talks to the OpenStack APIs to create servers. Then there's a workflow orchestration component that can interact with the deployed patterns, or you can also do some pre-processing and post-processing by means of workflows. And we support the TOSCA standard, so you can feed TOSCA models into the system and let it manage those patterns.

Again, the general flow during deployment: when you deploy a pattern, for all the infrastructure resources a request goes to the infrastructure management layer, which then goes down to OpenStack. OpenStack gives us the infrastructure resources, creates the VMs, and also bootstraps an agent inside the VMs. The agent then talks back to the automation library, downloads the instructions for how the software should be configured, gets from the pattern the information about the dependencies between components, and then sets up the complete software on top of the infrastructure. You can have interaction with a workflow orchestration system, so you either do pre-processing or post-processing, or you manage the application throughout its lifetime.

Next, I want to have a closer look at the infrastructure, the software-defined infrastructure handling in the system, and that's where I actually come to the relation to Heat, where Heat is really helpful already today.
We have that infrastructure management layer, and the current design is that it requires some setup of pools: it requires setup of networks and IP groups, so pools of IP addresses that can be assigned to the VMs, and it also requires the pre-allocation of storage pools. Out of those pools it then allocates resources for the pattern deployments. What we want to do is add more flexibility to that infrastructure layer. For example, we want to be more dynamic and have custom creation of networks on the fly instead of relying only on pre-existing networks, we want to have a richer set of topologies, and we want support for more types of infrastructure resources. So we asked ourselves: how can we get this without implementing all that complexity in our infrastructure management layer? And what we realized is that Heat does all of this for us, so we don't have to implement it, and that makes life much easier. Let me show you how we do this.

For the use case, I'm again using this SAP system. Let's assume we got this pattern from SAP for a training scenario: it has an identical configuration for each trainee in a class, so that they can all use the same documentation and follow the instructor. An important detail is that the SAP system ID is part of that fixed configuration. Now, a peculiarity of SAP systems is that a system with a given system ID can only exist once within a network, because there is discovery traffic on the network; you can deploy the first system, but a second one using the same system ID won't come up, or will even corrupt the other system. So what you can do is deploy each of the SAP systems into a different network. The pattern therefore includes the definition that each system should go into its own private network, and then you hand out a floating IP for accessing the system to the respective classroom participant.

Now, how do we realize this? We are using Neutron in OpenStack for that purpose. We have a base setup, the base tenant network, and for each system we create a new network and deploy the system into it. We basically treat the complete pattern as two different parts: we tear the pattern apart into an infrastructure part and a software part. For the infrastructure part, we generate a HOT template and give it to Heat. The HOT template contains the definitions of the VMs, the networking definitions, the volume definitions, and also the bootstrapping information for our agent. Heat takes that template and brings us from the base setup to the target state: it creates a new network with the VMs in that network. When the VMs are up and running, our agents get bootstrapped, connect back to the software orchestration layer, pull down the information about the upper, software part of the pattern, and set up the software on top of what Heat created.

What's interesting: these are screenshots of that scenario taken from the Dashboard. For the thing that looked pretty simple in our pattern editor, we get 20 OpenStack resources. This is a screenshot of the topology view in Havana; you cannot see in the screenshot what the individual resources are, but it already shows that it's pretty complex. There are a lot of dependencies between the resources, and you would have to care a lot about how they are created and how the parameters are passed, and Heat does all of that for us.
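A hedged sketch of what such a generated infrastructure part might look like: one private network per SAP system, plus a server whose user data bootstraps our agent. The resource types are standard Heat types, but the names, CIDR, and agent URL are hypothetical placeholders:

```yaml
heat_template_version: 2013-05-23
resources:
  system_net:
    type: OS::Neutron::Net
  system_subnet:
    type: OS::Neutron::Subnet
    properties:
      network_id: { get_resource: system_net }
      cidr: 10.0.7.0/24
  router:
    type: OS::Neutron::Router
  router_interface:
    type: OS::Neutron::RouterInterface
    properties:
      router_id: { get_resource: router }
      subnet_id: { get_resource: system_subnet }
  db_server:
    type: OS::Nova::Server
    properties:
      image: sap-base-image        # hypothetical
      flavor: m1.large
      networks: [ { network: { get_resource: system_net } } ]
      user_data: |
        #!/bin/bash
        # Bootstrap the orchestration agent, which then pulls the
        # software part of the pattern from the automation library.
        curl -o /tmp/agent-install.sh http://automation.example.com/agent
        bash /tmp/agent-install.sh
```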
Some observations from the prototype. As I just mentioned, Heat brings enormous value for the infrastructure setup. In this use case we used 10 resource types, 10 types of OpenStack resources that Heat supports, with many dependencies between the resources, and we didn't have to implement any of this in our infrastructure management layer. We could hand off all this complexity to Heat and basically do one call instead of quite a number of calls. It reduces the complexity, and it also offloads a lot of processing to Heat that we don't have to do in our stack.

The next observation is that a relatively simple pattern from a user point of view turned into a pretty complex infrastructure setup. If you follow the best practice of keeping the infrastructure model abstract, you can make the pattern much more portable. In the pattern, we just had the notion of a network to express that each system should go into its own network, but we didn't define Neutron networks and subnets and ports and routers and router gateways and so on; we kept it abstract. One thing where Heat is also helpful here is the concept of provider templates. That's Heat terminology: you define lower-level resources in a separate template and then use that template as a more abstract resource type in another template, which allows you, for example, to switch from Nova network to Neutron, or between different Neutron configurations you want to use in your data center.

And finally, the agent bootstrapping allowed us to do software orchestration on Heat as it exists today; the software orchestration works because we have the dependency handling and everything in our engine today. What we are currently discussing with the Heat community is adding more capabilities for software orchestration, so that Heat also handles the dependencies between software components. Then it will work the same way for other software configuration providers: Chef, Puppet, or simple scripts can also be deployed as software components, and if Heat does the orchestration, we can reach the same functionality.

Finally, I want to say something about autonomic behavior. In the prototype we built, we can actually have autonomic behavior in two different layers. Our agent framework can do things like scaling or monitoring of components, it can trigger failover behavior, and it can do this based on application metrics. You can have agents configured to monitor application transaction rates, throughput, user sessions, and so on, and then trigger scaling in the infrastructure. In Heat you can also do auto-scaling, but today, at least out of the box, it is based on infrastructure metrics like CPU and memory load. I guess more will be possible with the Ceilometer integration, but what you get out of the box is infrastructure-driven scaling. Now, if you enabled both, you would get a conflict, because if there are two chefs in the house, they will compete with each other. So what we are doing to solve that for now is that we don't use Heat auto-scaling; we do the monitoring completely in our layer of the stack, and then we use the stack-update functionality to trigger the addition of another server from the application layer.

At this point I will hand over to Lakshmi, but maybe let me stop to see if there are any questions.
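A minimal, hedged sketch of the provider template mechanism just mentioned; the abstract type name and file names are hypothetical:

```yaml
# env.yaml -- environment file mapping an abstract type to a concrete
# provider template. Swapping neutron_network.yaml for, say, a
# Nova-network-based template changes the concrete layout without
# touching the application template.
resource_registry:
  My::Abstract::Network: neutron_network.yaml
```

```yaml
# Application template: refers only to the abstract type.
resources:
  system_net:
    type: My::Abstract::Network
```

The stack would then be created with something like `heat stack-create -f app.yaml -e env.yaml mystack`, keeping the application model itself free of data-center-specific networking details.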
All right, then I hand over to Lakshmi to tell you a little bit about Weaver, which is another technology for software orchestration.

Thank you. Just to give you some background: this is a project we started at IBM Research about two and a half years back. We realized that deployment automation is only one part of a bigger puzzle, and that continuous integration and continuous deployment provide a very strong use case for deployment automation. Most of the time, we have found when talking to customers that they are ready to invest the effort and time to build a model that automates their deployment only for the benefits they get later on in terms of continuous deployment or continuous integration. So from day one, Weaver was born in the DevOps world.

It's a Ruby-based DSL, and we made the conscious choice of preferring code over a visual description; many people really liked having a concise description. So it's a Ruby-based DSL for continuous deployment, and it allows the developers and ops folks to come together and describe what they want for a deployment. There are parts of the application that are described as application components: what you see on this side is the application developer's role, describing application components, for example a database component or a web server component, et cetera. Then you have the ops side of the puzzle, where an ops specialist describes the infrastructure components; infrastructure here could be virtual resources, physical network resources, or even middleware components for that matter. And then the ops folks and the dev folks sit together and bind, or map, the application components onto the middleware and infrastructure components.

As you would expect, there can be multiple mappings: one for a test environment, in which you probably want to map all the application components onto a single virtual machine, say, and a different one for production. So what this allows us to do is nicely capture the variations across different deployment scenarios in this environment topology, and keep these two pieces separate so that they can be independently reused. It enables separation of concerns, with dev folks describing what they want and ops folks describing what they want independently, and it allows reuse of those pieces independently. A given deployment is described in this DSL using these three different parts, and then there is a compiler that takes this model, compiles it, does some dependence analysis and some validation, and then does the deployment. Currently it can deploy to EC2, to OpenStack, and to a couple of other clouds.

What I want to talk about here is that we wanted to validate our premise that these kinds of different views are needed and that they really add value. And what better way than to apply it to a big application? So what we did was take IBM Connections, and Thomas already presented a picture of it earlier. This is an application that is reasonably large, has all kinds of traditional enterprise components, and is actually used by thousands of IBMers every day. So it's a real application.
One of the advantages was that we could go and grab the ops folks and the dev folks, bring them together, and ask them to describe their pieces of this bigger puzzle, so that we could really validate the approach. It was not easy to get all of them to actually do that, because there is a lot of inertia as well as a lot of legacy, and the continual question is: is it really worth it? At the end of the experiment, we were happy to see that they were reasonably convinced that it really is worthwhile. In a sense, it's a validation for Heat's goal too, right? This is a DSL we explored, but Heat is aiming at the same space, and eventually it will be the dev and the ops folks who will be working together on developing Heat templates and deploying them. So that's the bigger context.

Now this application, let me see how I am doing, I have five more minutes, okay, basically consists of WebSphere application nodes which are grouped into clusters; that's what you're seeing in the middle. It has these web servers in the front, and a database node and an NFS server in the back. And it has constraints in terms of availability; a couple of them are the collocation and anti-collocation policies that Thomas already covered, so I'll skip those and go to this one: what were the difficulties we faced?

First, this was all done manually; even the installation of the whole application uses some custom installers. There is no visibility into such an installation, so it took a while to actually derive all the dependencies and put them into a reasonable automation structure so that this could be automated. And we wanted to not just automate running those scripts; we wanted to go deep and automate the wiring of the infrastructure components as well as the wiring of the application components, all automatically. Like I mentioned earlier, the agility requirements motivated the effort: they were doing very slow releases, they wanted to offer more frequent releases of the Connections app, and maintaining the configurations across releases, versioning them, et cetera, was getting very hard, together with all the availability and performance requirements.

What I'm showing here is a snippet of the bigger description of this application. One could ask: why use a DSL? One of the things a DSL brings is a very concise specification. You saw those four clusters, right? This is a Ruby-based DSL, and a loop in Ruby allows you to succinctly say that this is what I want. Now imagine you are deploying a big Hadoop application and you want hundreds of nodes, where each node has, say, an IP that is based on its ID, which is part of this numbering. You could easily specify all of that very concisely, as compared to having a spec that is unrolled, in the sense of replicated; see the sketch below for what such an unrolled spec looks like.

The second thing is that it also allows you to attach external scripts. In this case we use Chef as the configuration management tool, and the DSL allows you to attach external automations to a node so they can be directly leveraged. One of our design principles is to be able to use Chef automation as-is: the community writes Chef scripts, and one should be able to just take them and use them.
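Here is that promised sketch of an "unrolled" specification, in HOT-style YAML to stay in the vocabulary of this talk; the node names and properties are hypothetical. A loop, as in the Ruby DSL, collapses this repetition into a few lines and can derive per-node values such as names or IPs from an index:

```yaml
# Unrolled: one nearly identical resource per cluster member.
resources:
  was_node_1:
    type: OS::Nova::Server
    properties: { image: was-image, flavor: m1.medium }
  was_node_2:
    type: OS::Nova::Server
    properties: { image: was-image, flavor: m1.medium }
  # ... and so on for every additional member, each copied by hand.
```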
This means we cannot expect anyone to change anything in the scripts, and that really paid off for us later on: when we asked these folks to bring the scripts they use today, they could just reuse them. Otherwise, changing the scripts is a big nightmare. One of the things that enables this is the include mechanism, and also this kind of late binding. Late binding is a way to express that a value used for some configuration is going to be produced by one of the entities created during deployment. We distinguish between values that are available when a deployment starts and values that are produced as part of creating a resource, like an IP address, or as a result of a software configuration, like a URL or a port number. So this is a construct that allows one to very neatly say: this is a late binding of a value to a variable. And then we have the constraints that Thomas also talked about, which are all attached here.

Moving on, I want to highlight one more point, which is the software orchestration needed to do this deployment. I'm showing only a small slice of the picture, because I'm not showing all the values that flow in; there are quite a lot of them. What I want to highlight is that there are actually several layers of things happening, which is something Thomas also highlighted. At the bottom is the infrastructure layer: what I'm showing in green are the calls made to the infrastructure layer to create the base resources. Then there is the middleware that is installed on top of it, which has its own dependencies in terms of how it has to be set up. The deployment manager for the middleware actually requires a particular dependency structure that needs to be preserved: first it needs to start, then the deployed nodes need to start, then something is pushed into the deployment manager, which computes a profile for each node and pushes it back to them so that they can apply it and get started. And then there is the application on top, which has its own dependencies. So you can see that the dependencies that need to be satisfied are at a very fine granularity and appear at different levels: there are dependencies at the infrastructure component level and also at the software component level.

As for deploying on OpenStack, I just want to give the big-picture view now. What we do today is take the Weaver source, the description, and a compiler does the analysis, maps it, and directly uses the OpenStack API to deploy it. When we do this, we use ZooKeeper for software orchestration. What we would like to do instead is generate a HOT template and delegate the infrastructure creation and orchestration, as well as the software component creation and orchestration, to Heat, so that we can leverage a lot of the nice features that are in Heat. To see the feasibility of this, we built a prototype for some applications. In that prototype we used the current features in Heat, and we are pretty excited about the software component features that are coming up in Heat. In the current prototype we had to use ZooKeeper to do the coordination, but we are hoping that will go away once Heat has its own coordination, so we can just use that.

So that is it. Okay, thanks, Lakshmi.
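In Heat terms, the same late-binding idea is expressed today with intrinsic functions like `get_attr`, which defer a value until the producing resource exists; Heat also derives the creation order from this data dependency. A minimal sketch, with hypothetical names and script:

```yaml
heat_template_version: 2013-05-23
resources:
  db_server:
    type: OS::Nova::Server
    properties: { image: db-image, flavor: m1.medium }  # hypothetical
  web_server:
    type: OS::Nova::Server
    properties:
      image: web-image                                  # hypothetical
      flavor: m1.small
      user_data:
        str_replace:
          template: |
            #!/bin/bash
            # The DB address is "late bound": it only becomes known
            # once db_server has been created.
            echo "db.host=%db_ip%" >> /etc/app.properties
          params:
            "%db_ip%": { get_attr: [ db_server, first_address ] }
```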
So, to wrap up the presentation, I'm going to quickly talk about some activities in the Heat community. One of the hottest topics from my perspective has been the discussion around adding software orchestration to Heat and to the HOT language. As you have seen, we have two solutions, at least two in IBM I must say, that do software orchestration and that can leverage Heat as it is today for the infrastructure, and then add agent-based or ZooKeeper-based orchestration on top. And beyond IBM, there are many solutions out there with the same requirements. So I think it's clear to everyone in the community that something common like this should be implemented, and in a common project.

The key goal of the software orchestration work is really to move away from the current inline scripts in instance user data, and from the concept of wait conditions and wait handles, which sometimes make it very complex to write a template, towards a clear software component model: the ability to model your software pieces and to define dependencies, data flow dependencies, and so on in a HOT template. That gives better flexibility and a clear separation of software from infrastructure, which in turn gives you more flexibility in deployment topologies, because you can say that two components should go on the same server or on different servers without changing your component definitions. Those are all features that will be enabled. Another design principle is to not duplicate things that are already done well in other technologies: it should be possible to use Chef and Puppet without re-implementing the concepts they already have, so Heat shouldn't grow its own configuration management mechanics; for that you can use Chef. And another very important point: it should be user-friendly. It should be possible for a normally skilled person to write a Heat template and define software components and how they are deployed on the infrastructure.

There are actually two discussions, at least from my perspective, and we had those yesterday in a design summit session. One is: what are the constructs in the HOT language to define this? How do we express software components, and how do we express dependencies and data flow between components so that we can do real orchestration? The other is: how can this be implemented? How are the software configuration tools bootstrapped inside the instances? How is the metadata passed to those tools? How is the signaling done, to signal when one component is ready and the next one can be started? Those are all things that need to be solved, and here we can see which of the existing mechanisms in the engine can be reused, but without surfacing the complexities that we have today in a template.

Another discussion we brought up was around policies and placement. Like we said, there are requirements for placement, for high availability, for performance reasons, constraints on network connections, and so on. I would say this is not a Heat-or-HOT-only discussion; it requires very close interlock with other projects, since a lot of this has to be dealt with in Nova or Cinder or Neutron. But from a Heat and HOT perspective, I think we should think of a way to annotate a HOT model in an intuitive way, to allow template writers to express those constraints.
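For context, here is a hedged sketch of the wait-condition pattern as Heat supports it today, using the CloudFormation-compatible resources; image name and script are hypothetical. The template author has to wire the handle URL into the script and signal completion manually, which is exactly the complexity the software component model aims to hide:

```yaml
heat_template_version: 2013-05-23
resources:
  done_handle:
    type: AWS::CloudFormation::WaitConditionHandle
  server:
    type: OS::Nova::Server
    properties:
      image: some-image        # hypothetical
      flavor: m1.small
      user_data:
        str_replace:
          template: |
            #!/bin/bash
            # ... install and configure the software ...
            # then signal Heat by PUTting to the pre-signed handle URL
            curl -X PUT -H 'Content-Type:' \
              --data-binary '{"Status": "SUCCESS", "Reason": "done", "UniqueId": "1", "Data": "ok"}' \
              "%handle%"
          params:
            "%handle%": { get_resource: done_handle }
  software_done:
    type: AWS::CloudFormation::WaitCondition
    depends_on: server
    properties:
      Handle: { get_resource: done_handle }
      Timeout: 600
```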
And then we have to discuss how this metadata gets passed down to the underlying services.

Can I add something here? When I mentioned the way we currently did the experiment, I did not show one important piece. In this picture, here, between the DSL compiler and the OpenStack API, there is a component called a placement engine that sits in between and tries to find a placement of resources that satisfies those constraints, like the collocation policies, network policies, et cetera. Some of our colleagues here are now working with the Nova scheduler group and other groups to see how that kind of more intelligent, constraint-based placement could be incorporated into other components in OpenStack. I think once that evolves, the ability to express these kinds of policies at the Heat level and then exploit them underneath will become even more relevant.

Yeah. So with that, I'm ending the presentation. We are really looking forward to an exciting Icehouse development cycle, and I just want to say it's been great working with the Heat community. I started in April and I have learned a lot. In some cases, I'm sure, we will come with requirements and say this and that must be done, and it's always good to have all the experts to talk to, because sometimes they just tell you: oh, it already works, do it like this. It's really amazing to get all this information and to work on closing the gaps together, so I'm looking forward to that.

I also share the same sentiment. I should thank the Heat community; they have been extremely welcoming in all the discussions and all the debates. So thank you very much.