So thank you everybody for coming. Like I said, my name is Yaron Parasol, I'm from GigaSpaces, and I'm the product manager of Cloudify Cosmo, the project I'll speak about today. And as she said, my colleague couldn't make it; he's a much better speaker than I am, but you're stuck with me, so bad luck.

So I'm going to do a tiny spoiler here and tell you what this talk is all about, what the critical points in this talk are. The reason I'm here is that I think workflows and workflow engines are a critical component, a critical part, of automating applications and automating your DevOps operations over the cloud. I'll also talk a bit about Amazon OpsWorks, and I'll claim that we're missing something like OpsWorks in OpenStack, and look at how it could fit into OpenStack: which projects it could or should fit into, and how it should be built.

So that's the talk today, and we'll start with a use case. We're talking about a very serious and big SaaS company. Meet Petsy: supporting pet artists since 2013, and doing great business over OpenStack. All of their operations, all of their production and testing and everything, run on OpenStack. They have quite an impressive stack here. They have, of course, their web front end with Nginx and Gunicorn and Postgres and all of that. Then they need to ship a lot of information into the analytics part, with Hadoop and Mongo. And of course they have all those tools that everybody needs in production, such as Nagios and Graphite and Logstash, and even Jenkins for continuous integration, which hopefully one day they'll get up and running.

So their business is doing great, especially those cat art products, but rolling out new code is painful. They have a lot of problems: every time a node crashes, the MTTR is not that great; they can't really do any kind of serious integration testing; they don't have continuous deployment. And they're looking to improve, because otherwise they can't keep their margins and they can't keep their customers satisfied. So obviously they need to automate, and they would like to automate everything, and in this session we'll try to help them figure out the right way to do it.

So we'll have a closer look at DevOps processes. I'm going to go through most of the well-known processes that one would need in order to run a SaaS operation, or any kind of production operation, on the cloud. And I'll claim that it's all about workflows and triggers. Let me explain a little what I mean by that. Workflows seem quite trivial; everybody knows what a workflow is and what a workflow engine is. But we'll see that many of the operations, many of the processes we encounter when we try to deploy an application, scale it, et cetera, are actually composed of very complex steps with many dependencies, delicate timings, and all sorts of things that require a workflow. That's why I claim that workflows are very, very critical, and we'll see that. And the other bit is triggers.
So some operations can be triggered manually, and that's fine, but we also need other kinds of triggers. If you want to be really automated, there are many things that need to be triggered by events, coming either from a simple scheduler or, in many other cases, from something we'll call a policy: something that can check a set of rules, check your monitoring, check other conditions, and make a decision to invoke something. And that something would be a workflow. So if we have a combination of triggers and workflows, I claim we can automate everything in a much better way. Let's have a look.

So the first thing would be deployment. Okay, not every day do you need to set up a new environment from scratch in production; that happens more rarely, for example when you're starting a new data center, upgrading your cloud, or starting a new business. But how about continuous integration? One way to do efficient continuous integration is to automatically set up an environment from scratch, test, and then tear it down. So automated deployment is probably the most basic, and at the same time most complex, process we need to automate before we can go further.

So what about automated deployment? We talked already about the trigger: it would be either a manual trigger or a continuous integration server such as Jenkins. How about the flow? The flow is composed of what we often call the three-layer cake. First we have the IaaS: the different cloud components, whether it's the network, the storage, or the virtual machines. They all need to be created, and they have dependencies; the VMs depend on the network, and sometimes on the storage, so we need to time the creation of these objects. Then we have the middle layer, the middleware: the different servers and containers, the web servers, application servers, database servers, et cetera. Again, they have dependencies among themselves, and dependencies on the network, the VMs, and the storage, so there are a lot of interdependencies. And finally we have the application artifacts, which again have dependencies: they need to reside within those containers, they often need to configure dynamic connections between different components, and sometimes they also play a part in tweaking the middleware and the infrastructure-as-a-service layer. So we need a chain of steps, some of which can happen in parallel and some of which must happen in sequence, with different checks along the way to know that something actually started so we can invoke the next step in the chain, et cetera.

Going further, there's infrastructure upgrade, which happens a lot. It can be a security patch in the OS, an upgrade of your container version, et cetera. Now we're talking about something that has to be done gradually, because you don't want downtime. So we have a similar kind of process, a similar kind of workflow, but in the middle we need to act based on triggers: install one node, pause, verify that it's actually working, and then add another node. And if it isn't, we need a way to roll back. So it becomes more complex. And then continuous delivery is even more demanding: again, we need to push code, and many times we have to tweak the other layers.
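To make that three-layer cake and its dependency chain a bit more concrete before we go on, here is a minimal sketch of how such a deployment might be written down as data. To be clear, the node types and keys here are invented purely for illustration; this is not any real template syntax, just the shape of the problem:

```yaml
# Hypothetical sketch of a three-layer deployment, illustration only.
# Each node declares what it depends on, so an orchestrator can derive
# which steps may run in parallel and which must wait.
nodes:
  app_network:                      # IaaS layer
    type: network
  web_vm:
    type: virtual_machine
    depends_on: [app_network]       # no VM before the network exists
  db_vm:
    type: virtual_machine
    depends_on: [app_network]       # can be created in parallel with web_vm
  nginx:                            # middleware layer
    type: web_server
    host: web_vm
  postgres:
    type: database_server
    host: db_vm
  petsy_app:                        # application layer
    type: application_module
    host: web_vm
    depends_on: [nginx, postgres]   # needs connection details from both
```

The orchestrator's job is then to derive from those declarations that the two VMs can be created in parallel, while the application module has to wait for both middleware nodes to come up and expose their connection details.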
Back to continuous delivery: again, we need to apply different policies to make sure we're doing it the right way. And if we're talking about red-black deployment, we're back to the first process we've seen, a whole environment setup. So again, we're talking about many steps and a complex process.

And node failure. Node failure is something that happens, and in the cloud the right way to remediate a node failure is self-healing, automatic healing. So again, we need a policy that detects node failure, that knows we actually have a node failure. It doesn't have to be a whole VM crashing; it can be a part of our stack that isn't functioning well. In that case, remediation means setting up a new node, but again doing the entire set of dynamic configuration, going through the entire graph of dependencies, and making sure everything is reconfigured. Sometimes we need restarts. So again, a workflow process. And scaling is basically very similar to remediation, very similar to auto-healing, just with maybe a different policy, a different trigger. And again, we're talking about some complex flows.

So to sum it up, what we actually see is that the three-layer cake is not something we can silo. It's not something where we install the infrastructure, then somehow put the second layer on top, and then push code into that. That's too naive a model, because we see all those dependencies: the different containers have connections between them and configuration dependencies, they all depend on the network and the load balancer and the storage, and the application components need to tweak those layers as well. So to boil it down, we need something that will help us tweak the entire set of components and arrange everything so that the process is smooth and reliable. We need some kind of workflow.

So I'll do what we usually do: I'll go and look at Amazon. I think Amazon should always serve as a benchmark, and also as something we want to be better than. So let's look at how DevOps automation is done on Amazon Web Services. On AWS we can see two sets of ways to automate your DevOps. The first path is do-it-yourself. I think this is the path most large businesses have taken so far, especially in the absence of other frameworks. Starting from the simplest option, which is the most work for the user: you just consume the different APIs of the cloud, you complete that with other tools like configuration management tools, and then you do everything yourself. You write your own scripts that orchestrate everything, and you combine that with your monitoring tools. So the do-it-yourself approach takes you as far as you can code and as many resources as you can put into it. But of course it means you do it yourself; there's no framework.

Then you can take it a bit further and use CloudFormation. For those not familiar with it, and I guess everybody is, it's a templating framework that basically orchestrates the creation of your IaaS components: you can create several VMs, you can bind storage to them, and in Amazon you can also create some basic networking components and add your VMs to the network. So again, that's only touching your IaaS layer, but it's better than just using the API with your scripts.
And then you have the higher-level services, which give you a bit more convenience. In that area we see basically two different approaches. The first is the PaaS approach. Amazon doesn't call Beanstalk a PaaS, but I think for all practical purposes it's a PaaS layer, a simple PaaS layer. It helps you very quickly push web application code into Amazon, but for you it's a black box: all you can do is push code. If that code fits the container they provide, you're done. But serious customers like Petsy need something with a much wider and more varied stack. So we need something like Amazon OpsWorks; we need DevOps automation.

So let's look at the OpenStack equivalents. In the do-it-yourself part, it's very easy: OpenStack offers almost every API that Amazon Web Services offers. Whether it's Nova, Cinder, Neutron, or Ceilometer for monitoring, everything is there, and as we heard today in the keynotes at this wonderful event, there are a lot of developers working to make it even better. So on that front we're covered.

And then there's Heat. When Heat was started, the goal was very simple and straightforward: to grab away Amazon users. Amazon users had CloudFormation, so OpenStack had to come up with something similar. So basically Heat took the exact syntax, the exact templating system, almost the exact API, and built the copycat on OpenStack. Excuse me for a second. And with regard to PaaS, there's OpenShift by, sorry, by Red Hat, and Cloud Foundry by Pivotal Labs. And now there's a new kid on the block; we're very excited about this new project, Solum. I think Adrian is here, so if anyone is interested in talking about Solum, this is the guy for you. We are very interested in Solum, very excited that it's coming.

And the OpsWorks piece is missing. I think we need something like OpsWorks in OpenStack: something that will allow us to automate all the DevOps processes and help us cope with those complexities of the three layers I just showed. So let's look a little closer at OpsWorks. The OpsWorks model is quite simple. There's a stack; it represents the different tiers of your application, so basically the topology. In each tier you can put the cloud resources you want and assign the application stack you need there. Then you can deploy it, and you can define the scaling groups to scale it. This is a good starting point, but I think we can do better, because it's still too rigid. It's limiting in the sense that there's a closed state machine, a closed set of stages that you have to use, and if you need other processes, or you need your processes to go in a different manner, you cannot do that with OpsWorks.

So we're actually suggesting that OpenStack needs something that is, let's say, OpsWorks++. The main differences we see, or the main goals we would like to set, are these. First of all, integrate with Heat. We have Heat to orchestrate the infrastructure-as-a-service components, so why not use it? It's a great shortcut, a great tool. And we would also like to support cross-cloud, because we're big OpenStack fans, but users don't want to get locked into any particular cloud.
So our orchestration, our DevOps tool, should work across clouds. It should allow users to move freely between different OpenStack private and public installations, between different clouds, and even between bare metal and the cloud. Custom workflows are probably the most important requirement. As I mentioned, different organizations have different needs with regard to workflows, and there will always be new processes you want to support. So having custom workflows in the framework is key, and that requires basically a workflow engine and some additional considerations.

Again, OpsWorks limits you to a specific tool. They use some flavor of Chef, not even the enterprise Chef. Different users and different developers have different tool preferences, and one tool doesn't fit all the tasks that need to be performed during installation. So we want a framework that is actually pluggable and can play with many different tools. Finally, we want the monitoring and the policies to be open, because monitoring is one of the most important inputs for policies, and policies need to allow any set of rules, because the business logic for triggering an event is very different from one business to another, from one application to another. So having a policy engine that lets you define any set of rules as a policy, and as many policies as you want, is again very critical to supporting all those different workflows.

So how do we build this workflow piece? I'm going to introduce you to the Cloudify Cosmo project. It's something we started around a year ago, and it's a 100% open source project. I'm going to show the architecture and the concepts. Cosmo is going to be released around Q1 2014, and today I'll show you a demo; everybody is welcome to fork this demo on GitHub and try it, or come later to our booth and talk with us about it.

As the title of this presentation indicates, Cosmo was pretty much inspired by OASIS TOSCA. TOSCA is an evolving standard for orchestrating applications on the cloud. We don't strictly adhere to TOSCA; we're mainly inspired by TOSCA concepts. We really like the way of thinking. We think that when it comes to the implementation details it might be too verbose, and we don't like XML and so on, but the basic concepts are there and we really want to adopt them.

So there are three main building blocks. The first one is application topologies. A topology is exactly what you'd think: the set of components and infrastructure that compose your application, how it's arranged, and what the dependencies are; I'll talk about that in detail in the next slide. Then there are the workflows we talked about, and the policies I also described.

So let's look at the details. Application topologies are composed of nodes and relationships. Nodes are the different parts of the three-layer cake. They can be the infrastructure-as-a-service components: the hosts, the network, the load balancer, the security groups, all of which need to be provisioned and arranged in a certain model. Then there are the middleware components: the web servers, the application servers, the databases, and so on. And finally the application artifacts themselves: the database schema, the application code, and sometimes more than one module. So again, the third layer in the cake is also made of nodes. Nodes also define actions.
Actions are the different operations you can invoke on each node, for example install, start, stop, upgrade, scale. Each of these operations is abstract, and you tie it to something that will actually execute it. We call that a plugin. The plugin can be anything: Ansible, Chef, Puppet, a shell script, Python, the OpenStack SDK, of course Heat, and so on and so forth. So you can mix and match the tools you want by declaring, when you create a template, which plugins to use for each action on each node.

Then you have the relationships. The relationships are really critical, because at the end of the day, to have a working topology on a cloud you need dynamic configuration, so you need relationships to be implemented on the fly. Again, you need actions to actually configure the relationships, and you need to share metadata and runtime data using requirements and capabilities. Requirements and capabilities are counterparts. Requirements are the metadata a node declares about what it needs, for example how much hardware or what kind of connections. Capabilities are, for example, what the Postgres database declares at runtime: okay, I'm a Postgres database, I am capable of providing connections on this port and this IP. When the workflow runs, it acts on the nodes according to the types and the order of the relationships. That way you can ensure your nodes are created in a way that is really functional: when each node is created and started, all the dependencies it needs are in place.

So the workflow describes exactly which steps to take on each node and in which order; some things can happen in parallel, and some things, like I said, need to be timed accordingly. The workflow engine reads the different metadata from the different nodes and creates tasks, using the plugins, that execute the workflow. And the last bit is policies. Policies can be used by a workflow, for example, to make sure a node actually started correctly and is functioning and available. They can also be used later to trigger other processes after the deployment is done, for example to decide that you need to scale, that you need to auto-heal, et cetera.

So how did we build it in practice? This is a sketch of the architecture. What you can see here is that right now the user can push a DSL, a template written in YAML, that describes the workflow, the topology, and the policies. Later you'll be able to do that from a GUI, and you won't even need to know what the DSL looks like. Once you push the template, or the blueprint as we call it, it's parsed and split into the metadata of the different nodes and the topology information. The workflow engine has the different workflows, and then you can start executing any DevOps process you want. For example, you can start by installing. The workflow engine will traverse the different tasks, the different steps in the workflow; it will read the metadata, figure out which node needs which plugin and which properties, and create a task for a task worker. By the way, here we're using Celery with RabbitMQ as the queue. And then you have agents, and the agents can consume the tasks.
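Since everything in the model hangs off that blueprint, here is a rough feel for what one might look like. This is a hypothetical sketch, not the actual Cosmo DSL; the keys and plugin names are invented to illustrate the ideas of actions bound to plugins, and of requirements and capabilities:

```yaml
# Hypothetical blueprint snippet, illustration only; the real Cosmo DSL
# may use different keys and structure.
nodes:
  postgres_db:
    type: database_server
    actions:
      install: chef_plugin          # abstract action, concrete executor
      start: chef_plugin
    capabilities:
      db_connection:                # published at runtime: "I'm Postgres,
        port: 5432                  # reachable on this port and IP"
  app_server:
    type: application_server
    actions:
      install: puppet_plugin        # a different tool for a different node
      start: shell_plugin
    requirements:
      - db_connection               # must be satisfied before start
    relationships:
      - type: connects_to
        target: postgres_db
```

From a snippet like this, the workflow engine can work out that app_server's start action must wait until postgres_db has published its db_connection capability at runtime.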
So, back to the agents. For example, we can have a Heat agent that has a Heat plugin, and it can actually create the different parts of the infrastructure we need. In the same manner we can do it on any other cloud, using CloudFormation or boto or whatever. Then we also have agents that are installed, as part of the workflow, on the application virtual machines; they can install whatever plugins they need, like Chef or Puppet or Ansible or Salt, and actually execute the creation and configuration of the different nodes.

Finally, we need the policies. So we need some metric collectors. That doesn't necessarily have to be something we ship out of the box; it can be anything you choose, and again, the collectors are installed as part of the workflow. You can also report to the policy engine from Logstash or any other tool you want. The policy engine is a CEP, a complex event processing engine; we're actually using Riemann. The policy engine has a set of rules written in Clojure, and it can make decisions about the stream of events it gets from your application, including schedules, and then trigger events. Those events can be events you want to consume directly, through third-party systems or as a user, or, in most cases, they can trigger a workflow.

So if we combine it all together: I can install an application; once the application is up and running, I can add policies to check performance and availability; and if I'm losing availability or performance, the policy fires and actually triggers another workflow to remediate the condition. So basically, this is the architecture. Any questions about the architecture? Yes? Yes, yes. So I'm going to give a demo next. And by the way, all the materials of the demo are online; I'll give the URL soon.

The live demo I'm going to show is installing Mezzanine. Mezzanine is a Python CMS application. The steps are as follows: create the VMs, two in this case, one for the front-end stack and one for the database; install Postgres on the back-end VM; install Gunicorn, the application server that runs the Django application; install Nginx, which in this case acts as the load balancer and also serves the static content; create the database on Postgres for the application; push the application into the containers; configure the connections; configure the Nginx routing rules; and finally start the components in the right order.

So let's have a look at the demo. Sorry for that, just a second. We'll have a partial look, and later you can see it online. What you can see here is a network; in this case, I also defined a network in the blueprint. Then there's the web server host, and inside you can see Nginx, Gunicorn, and the Mezzanine app, in a similar way. I'm going to present the runtime and the progress later, once we implement that. So right now you can see how the tool knows how to model the blueprint in a GUI. And then look here: using Kibana, I can already trace events that indicate the progress of the workflow and of the different plugins being installed in the application. What is Kibana? Logstash is a log collection tool, and Kibana is a web UI for Logstash. Right now we still haven't implemented a GUI that collects the events and presents them in a more, let's say, user-friendly way.
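A quick aside on the policy piece from the architecture before we finish the demo. Conceptually, a policy ties a stream of metrics to a workflow trigger. The following is a hypothetical sketch; in Cosmo the rules are actually Riemann streams written in Clojure, so this YAML is only meant to convey the concept, and every key and threshold here is invented:

```yaml
# Hypothetical policy declarations, illustration only.
policies:
  web_tier_availability:
    node: nginx                        # the node being watched
    metric: http_response_time_ms     # fed by whichever collector you chose
    rules:
      - condition: moving_average_60s > 500
        trigger_workflow: heal        # policies decide when; workflows do how
      - condition: no_data_for_30s
        trigger_workflow: heal
  web_tier_scaling:
    node: nginx
    metric: requests_per_second
    rules:
      - condition: moving_average_300s > 1000
        trigger_workflow: scale_out
```

The division of labor is the point: the policy engine only decides when something should happen, and the workflow engine owns how it happens.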
So, as I was saying, we're using Kibana, which is quite easy. The events are in JSON; Kibana knows how to enrich them and organize them for reports, and you can query, you can sort, that kind of thing. So right now we're using Kibana just to give an indication of how well the installation is progressing. And finally you can see the application. By the way, it's all running on HP Cloud. You can see the application up and running here; if you want, you can actually add blog posts, et cetera. It's working.

So that's the demo; let me go back to the presentation. The DSL and all the different parts can be found in the Cosmo Mezzanine example, which actually uses the Cosmo Manager, the underlying infrastructure. You can get that on GitHub. It's also included in this presentation, which will later be available online on SlideShare and on the conference website.

Looking a little into the near future: like I said, we're going to implement a full-blown web UI that will allow you to run any of your workflows, design your blueprints, and also trace the progress and the runtime, including collecting metrics and monitoring and getting different reports on those metrics. And how do we see this fitting into the OpenStack ecosystem? That's probably the most important question here today. We see this need already identified by the OpenStack community: there was a call for a TOSCA-like DSL a long time ago, which hasn't been implemented yet. We're excited about the Solum project; we think it's a good place to bring this kind of workflow engine into. We see similar ideas in the Solum blueprints, we see the need for cross-cloud, and we see the need to support continuous deployment and delivery. As of last week, we have joined Solum, and we are here to engage with the Solum developers and the Solum leads. Hopefully we'll be able to contribute the relevant parts into Solum.

And with that, I conclude. Thank you for coming to this presentation. You can meet us at booth C27 in the exhibition area; we'll also be giving some other panels and talks. And you're most welcome to try Cosmo and follow Cosmo on GitHub. Any questions?

Yes, can you repeat? My question was: could you explain the integration with Heat that you're planning to have? Because looking at the workflows, looking at the diagram you showed us, it seems that most of the parts are already in Heat. So maybe it's unclear to me, but could you please clarify this point, and maybe tell us what still needs to be done at the moment?

Sure, thank you for the question. The way we see Heat: I think Heat already provides good orchestration at the infrastructure-as-a-service level. So in case our users don't want to bother with creating plugins for the different OpenStack APIs, and they want something that already provides that kind of service, they can create the lower level using Heat, through a Heat plugin that we will provide, and just add a Heat template into that plugin as part of creating the entire application.

Yes, I think... This could be related to the Heat question as well. The use cases you listed earlier, like updating or patching and things like that, make a case for workflows, I agree. But when you actually want to reason about those workflows, you would make use of the dependencies that are described in the model, in the application, right? And how do you, like, where are those dependencies?
Like, for instance, this web server needs this database server to be up, or there is some software that needs configuration from another component. It looks like those kinds of things fit very well in the node description, and I would like to know how you see that being used in the workflow, and where it fits in your case.

Okay, I hope I understand the question; if not, please correct me. If I understood correctly, you're asking how I model the different connections between the nodes, and how the workflow acts on them? Okay, so let me go back a few slides, just a second. The topology is made of nodes and relationships, and the relationships describe exactly this. When the install workflow runs, for example, it iterates through the nodes, finds the dependencies, and times the creation based on those dependencies. In the relationship, you describe whether the relationship needs to be materialized before the source node starts or after. For example, you have a database, which is the target node, and an application server, which is the source node. In some cases the connection needs to be configured after the target has been created but before the source has been started, and in other cases after the source has been started. For example, if you configure a WebSphere application server, it needs to be done after WebSphere has started. So all of that is modeled in the relationship, and the workflow understands the semantics of the relationship. The relationship also declares the actions that actually materialize it. So the workflow just traverses the different nodes, finds the relationships, and executes accordingly. Also, we have a start-detection policy that you can declare for each node, and the workflow will wait for that policy to declare the node active. So, for example, it won't start materializing relationships for a node that hasn't yet been indicated as started. Does that answer the question? Well, actually, I'll talk to you more later. Okay, you're most welcome.

One more question? One more question? Yes, please. Yeah, you said, I'm a member of TOSCA, you said that you use TOSCA-like templates. Why did you depart from TOSCA exactly? You said that you didn't like XML; is there anything else you didn't like about the TOSCA specification? And could there be a way to make the templates you have created similar to TOSCA, or compatible?

Okay, thank you for this question. I think that for the most part, we didn't like the XML and some of the verbosity; we thought it wasn't user-friendly. Although right now we're thinking more about a GUI layer above the blueprints, we still think there are many users who would like to use different tools, different editors, to create the blueprints, and even create blueprints through automated processes. So this is the main reason we departed from TOSCA. There were other, smaller reasons. One thing we started doing exactly like TOSCA and didn't like that much was the interfaces. At the beginning we started with the interfaces, and then we decided that the actions need to be declared very straightforwardly on the node, so we flattened it a little, and we're still working on that. However, I think we could still use a TOSCA parser and translate it into our model, or, if TOSCA is willing to cooperate with us and define a YAML model, we'd probably like to stick with TOSCA.
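To illustrate the timing semantics from the dependencies answer above, here is a hypothetical sketch of how two relationships with different materialization points might be declared. Again, the keys and values are invented for illustration; they are not the real Cosmo syntax:

```yaml
# Hypothetical relationship declarations, illustration only. The
# "materialize" values encode the timing semantics described above.
relationships:
  - type: connects_to
    source: django_app
    target: postgres_db
    materialize: after_target_created   # wire up the connection before
    actions:                            # the source node is started
      establish: write_db_settings_plugin
  - type: connects_to
    source: websphere_app
    target: postgres_db
    materialize: after_source_started   # WebSphere datasources can only be
    actions:                            # configured once the server runs
      establish: websphere_datasource_plugin
```

Combined with the start-detection policy, this is how the engine would know to configure the Django connection before starting the app, but to configure the WebSphere datasource only once the server is up.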
So that, the possibility of converging with TOSCA, is great news, and we'll be happy to contribute our thoughts and some examples, and to collaborate. Any other questions? Any more questions? There's one question here, in the front. I was just going to ask, can we see that GitHub URL again, please? Sure. Yeah, one more in front.

So you said TOSCA is very similar to your DSL. So why did you create your own DSL, why not use TOSCA? What's the gap between TOSCA and the DSL? Our DSL? Yeah. So I think this is the question I just answered: we wanted something that is not XML and less verbose, and that's why we created our own DSL using YAML, which is a great tool for that. We wanted something very declarative and very simple.

So in terms of capability, between your DSL and TOSCA: for application description, if TOSCA can describe an application in detail, can your DSL achieve the same? Sorry, what? My point is, if TOSCA can describe one application in detail, can the DSL also achieve that? Yeah, so right now we still haven't put capabilities and requirements in place, but we're following TOSCA's way of thinking. We're just changing the syntax a bit; we didn't change any major concepts from TOSCA, so we're following TOSCA.

All right, I think that's all we have time for, so thank you very much, Yaron. Thank you, guys.