All right, well, hello everyone. Thanks. My name is Steve Milner. I work at Red Hat, and I've been there for about 11 years. I'm currently on the Platform Infra team, and for the last 12 to 14 months I've been working on the Commissaire project, which is a lightweight REST API for node management.

Before I start, let me tell you what we're going to talk about. We're going to talk about the problems that Commissaire is trying to solve. Then we'll move into the prototype that we did and the lessons we learned, the architecture, and how you can set up your own instance. I'll show a quick demo, and then cover how you can help out and join the project.

So, problems. Every project and product has to have problems it's trying to solve; without that, it's just a fun little toy to play with. There are a couple I'm going to talk about right now.

The first one is siloed management. Silos are generally bad. The products and projects that create silos are not necessarily bad; in fact, they usually exist to fix issues. As an example, though, host inventories can become a problem. If you have virtual machines, physical machines, and cloud infrastructure, your host inventory might be spread across libvirt, Ansible, and Puppet, maybe AWS, DigitalOcean, et cetera. Ops folks hopefully have access to all of these things so they can query them and figure out what they have running. Generally speaking, something gets lost: somebody doesn't have access to DigitalOcean, or somebody doesn't realize they're using Foreman for certain things. So one thing we're trying to do with Commissaire is provide a common API to pull data from those other sources, other products, other projects, and let ops see it easily: what do we have, what clusters are the hosts part of, and where do they live? That's one of the issues we're trying to solve with Commissaire.

Another issue is that node infrastructure often doesn't get the love it needs. Before working in engineering, I worked in IT, and in other IT organizations as well, where it's really easy to see that people are mostly interested in the applications that are running. In this case, I used slides.com to make these slides. I don't really care what's underneath. I don't care if it's on AWS; I don't care if it's spread across libvirt and other things. I just want to get my slides done. That tends to be what a lot of people care about. Under the covers, though, the operating systems running those things often fall behind on patches and updates, which also leads to clusters starting to diverge from each other. You may have a whole bunch of infrastructure out there, and instead of the whole cluster being at the same patch level, you'll have hosts at various patch levels, systems that have broken off, systems that are down that you had no idea were down. You thought the cluster was healthier than it was. Commissaire is here to attempt to keep these things in sync and together.

We're also trying to simplify cluster management: essentially, provide a REST API instead of assuming that people will create their own scripts, playbooks, or configuration management and execute them themselves.

And the last problem we're trying to fix is actually a good problem to have, and that is: Linux is full of awesome.
We take for granted all the things that Linux gives us, whether that's simple things like the sysfs file system, or system containers, which Giuseppe talked about on the first day (if you didn't see that talk, please do check it out online), or systemd, procfs, et cetera. All of these things have one thing in common, though: the awesome is out of the network's reach. To look at them across a cluster, you generally have to write a playbook or run parallel SSH commands or things like that. If we look at other projects that have dealt with this in other ways, there's etcd, which took /etc and put it, in a sense, onto the network. Obviously it's not exactly the same, but it's the same type of idea: configuration was put onto the network. We want to do that with a lot of the underlying subsystems that Linux provides, and not just at the host level; we want to do it at the cluster level as well.

I want to talk a little bit about the MVP architecture we had, and the lessons learned, before we jump into the current architecture and how you can set it up. What is MVP? Minimum viable product. Basically, we did a prototype first and then moved into this minimum viable product before really diving in to make the architecture as stable and as nice as we wanted.

Initially, this is what we had. The user or external system accessed Commissaire, and Commissaire was one application, one thing. It was a REST interface, it was a process manager, it was executing Ansible, it was dealing with Kubernetes, it was saving its data to etcd, et cetera. We found this actually worked pretty well, with one big exception: not everybody necessarily wanted to be running containers for every cluster they had.

So when we moved on to the new architecture, we decided to start splitting things out. Here's a good example of that. A lot of this stuff is now split. The same flow is there, but instead of one monolithic application dealing with etcd and Kubernetes and all these other things, we moved it out into microservices connected to a quote-unquote bus, which I'll talk about shortly. We also have pluggable authentication, logic inside the request handler, and everything split out as much as possible. What that gave us: it was much easier to scale, since you didn't have to scale the entire thing just to add more services; it gave us global work queuing, which is kind of an obvious thing when you have a bus; and better separation of logic, which made development easier. The only negative we ended up with was three code bases instead of one, which meant you installed three things rather than one. So the negatives were very minor, and for us it was worth it.

The REST interface is JSON-based and versioned, just like you'd expect from a modern REST API, but we also use JSON-RPC inside. So on the outside it's generally structured JSON, and on the inside we use JSON-RPC. Out of curiosity, does anyone here use JSON-RPC for anything? Okay, cool, a couple of hands. The REST interface also holds the initial 0.4 logic. More importantly, we have pluggable authentication, and we have TLS built directly into the application server, so there's no need to have Apache or Nginx or anything fronting it; you can run it as is.
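To make those two layers concrete: JSON-RPC is a small, standardized envelope around JSON. A minimal sketch of a request on the internal bus might look like the following; the method name and parameters are made up for illustration, not Commissaire's actual internal API.

```json
{"jsonrpc": "2.0", "id": 1, "method": "storage.get", "params": {"model": "Cluster", "name": "os"}}
```

A response carries the same id back, with either a result or an error:

```json
{"jsonrpc": "2.0", "id": 1, "result": {"name": "os", "status": "ok"}}
```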
In terms of pluggable authentication, we've actually had a couple of community contributions come in, for Keystone as well as basic auth, and we also added client-side certificates not that long ago.

So, the bus. Earlier I said bus, but I also noted that it's not exactly a normal bus: we use Kombu to do virtual AMQP. Does anyone use Kombu here? Okay, one hand. Kombu is really, really cool. It essentially lets you use things that are not queuing or AMQP systems as if they were. For instance, you can use Redis as AMQP, or MongoDB as AMQP, and of course you can use Qpid or RabbitMQ and the like as well. You kind of get to choose what you'd like to use, and if you already have something, you can take advantage of it for the bus. Internally we use JSON-RPC at this level as well, to keep things the same, and for us it significantly simplified process management. That makes sense: we're splitting that out from the REST interface, so it becomes a lot easier.

Under the covers, we moved from splitting things between Kubernetes and Ansible to just using Ansible, and we've kept all of that within the microservices. Has anyone tried to use Ansible as an API? Okay, we have a couple of hands. The reason we split it out and started shelling out to it is that initially we used greenlets. We had greenlets trying to execute Ansible as if it were an API, which meant greenlets executing threads, executing multiprocessing, executing code, and we found some really interesting issues by doing that. So we split that out into the microservices and just use commands there. That also let us help upstream with Ansible's move from Python 2 to 3, which was nice. One other thing we're trying to do, though we haven't done it yet, is move towards standard playbooks. We have playbooks right now that we utilize, but we want to start using the OpenShift Ansible playbooks so we're not redoing the same type of work.

Setting up: if you want to set up an instance manually (we have other options as well), there are a couple of things you need. First, you'll need a bus. As I said before, it's Kombu-based, so you'll need Redis, Qpid, RabbitMQ, whatever, anything that Kombu supports. You'll need a place for storage; we default to etcd, so we put all our data in etcd right now. It is pluggable, so other backends can be added. And then there's an optional container manager, so OpenShift or Kubernetes. By the way, just because you have one of those installed doesn't mean all clusters must be part of an OpenShift or Kubernetes cluster; you can have some clusters that are hosts only, and others on two different container managers.

Configuration. I'll go through this a little quickly, since most of it is self-explanatory. This is the configuration file for the application server. You can see the bus URI, which is the Kombu URI; in this case we're using Redis. You can see the authentication plugins here; you can have multiple ones. What's interesting is that we use a Python package name to describe which plugin is in use, and in this case we're using the HTTP basic auth plugin, backed by a JSON file.
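The slide with the configuration file isn't reproduced in this transcript, but based on that description, a minimal application server config would look something like this sketch; the key names and paths are illustrative assumptions, not copied from the project docs.

```json
{
  "bus-uri": "redis://127.0.0.1:6379/",
  "tls-certfile": "/path/to/server.crt",
  "tls-keyfile": "/path/to/server.key",
  "authentication-plugins": [
    {
      "name": "commissaire_http.authentication.httpbasicauth",
      "filepath": "/path/to/users.json"
    }
  ]
}
```

The "name" value being a Python package path is what makes the plugin system pluggable: you can point it at any importable package.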
So you can have multiple plugins listed here, but you can also write your own, put it in any package you want, and import and use it here. If it's something you want to test before sending upstream, or something custom for an internal company, you can do that without having to hack on the code directly; you can use our plugin system for it.

This is one of the services, the container manager service. Again, it's pluggable; here we have a Python package that's being used. Note that this one is called an OpenShift container manager, but we just named it that: right now we use the same API calls for both Kubernetes and OpenShift, so you can use the same handler. Eventually we're going to split it, and we'll have a Kubernetes one and an OpenShift one with more OpenShift-specific options to take advantage of, but for now they're the same.

Storage is the last service we'll talk about, real quick. Storing data is also a microservice on the bus, and you can have multiple storage handlers; again, a package that's imported. Here we're saying all models are going to be saved using this storage backend. You can split storage across multiple etcd clusters, and in the future, if other backends are added, say SQL or MongoDB, you could decide that some models go to different places, whether for auditing reasons or scalability reasons, et cetera. There's a sketch of such a service config below.

The command line tool we use is called commctl, and it's based on the kubectl idea. Granted, you could still use curl or a browser to access the REST interface, but commctl makes it a little easier for an operator on a system. Its configuration file is simple; again, it's just JSON, and if you don't have one, it'll prompt you for each of these things. It's nice to have the username and the endpoint defined and generally leave the password out, so it's not just stored in a file. You'll notice, too, that this one configuration has multiple endpoints listed; that's a cheap way of doing HA. If you had multiple Commissaire instances up without, say, HAProxy in front of them, commctl will essentially look for the first endpoint that gives it a response and use that, which can be helpful. A sketch of that config follows as well.

Here are some of the commands we support today, and we'll see them in the demo. We've actually added a number of other commands in the last four days, which didn't make it into the slide, but you'll be able to see them in the documentation. Most things take create, delete, get, and list, just as you'd expect, but some things have extra options. Cluster has deploy, restart, and upgrade, which are cluster-wide operations. Host-related stuff has status and ssh, so you can access those machines or see their current status. And then we have some helpful utility functions, like passhash, which just creates a bcrypt password hash for the file-based authentication backend.
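As with the application server, the service config slides aren't captured here, but a storage service config matching the description might look like this sketch; the key names and module path are illustrative assumptions.

```json
{
  "bus-uri": "redis://127.0.0.1:6379/",
  "storage-handlers": [
    {
      "name": "commissaire.storage.etcd",
      "server_url": "http://127.0.0.1:2379",
      "models": ["*"]
    }
  ]
}
```

The models list is where you could later split models across backends: one handler matching, say, audit-sensitive models, and another matching everything else.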
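And a commctl configuration along the lines described, with the password left out so it isn't stored on disk and multiple endpoints for cheap HA, might look like this; the username, ports, and key names are placeholders, not the project's documented schema.

```json
{
  "username": "demo",
  "endpoint": [
    "http://127.0.0.1:8000",
    "http://127.0.0.1:8001"
  ]
}
```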
So let me show a quick demo. We'll look first at the configuration file; here we just have a dumb testing user with a password of "a", and one endpoint. This is all running locally, by the way. We'll use commctl to list our clusters, and I have one that should be defined already. Yeah, there we go.

We'll grab it and see its configuration. So it's OpenShift A1. It doesn't have any hosts yet; it's completely brand new, so it's considered status okay because nothing is failing. And it's in the default network, which is set up automatically when you install Commissaire; you can add more networks and associate clusters with networks and all that kind of good stuff.

Next we'll show how you can set up a host record. Real quick, before I jump in: this is one way to do it. Another way is to use cloud-init to have infrastructure automatically join clusters when it comes up, and we have some tools to create that configuration, since it's specific to an organization. So you can have things come up in AWS and automatically register, bootstrapping and joining clusters and container managers. Keep that in mind as you see me do it manually here: it can be done automatically as well.

So we'll go ahead and create one based on a local VM. We'll give it the key, and we'll add it to the os cluster we looked at earlier, with the --cluster or -c option. We should get a pretty empty response; that's because it hasn't taken a look yet. If we look at it now, it should actually be pulling information. We got some data, and it's in the bootstrapping phase. So now it's installing packages and configuring itself based on what type of cluster it's in. And pretty quickly, yeah, it's already jumped into the Kubernetes system here; as you can see, it's still inactive. We'll add a couple of other hosts now, just to show a cluster with multiple installs. Normally this would take a little longer because of the network, but this is all local, all running off an SSD, and I'd already done a dnf update on the virtual machines, so there wasn't a whole lot for it to grab and pull in. You can see here that the first one is bootstrapping. If we do a list, we can see all the machines: the second one is already active in the Kubernetes cluster, and the third is still bootstrapping, but it should be done now. If we switch back over, we can see, yep, they're all there and they're all ready to take pods. So that was pretty quick and pretty easy, and that's because it was all part of a container manager cluster.

If we look at the cluster command, we can see a couple of the operations I talked about earlier. We'll go ahead and show the restart operation. First, you'll see we have three total, three available, status okay, nothing degraded. Looking quickly at the restart command, it takes two subcommands, start and status. It sounds weird to say "cluster restart start", but you're essentially starting the restart process across the cluster. In this case we'll do it on the os cluster, and it'll go through each host, stopping and starting, which again will be quick with local SSDs and VMs. We'll give it a second and then check the status of that operation. Yep, it restarted all three; it tells you when it started and finished, and its status is finished. If we ping one of them real quick, we should see that it's still up, or rather that it came back up. Yep, cool. So that's a quick demo.
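For reference, the cluster state the demo queried before the restart, three hosts total, three available, nothing degraded, would come back from the REST interface as a JSON record conceptually like this; the field names are a guess for illustration, not the actual schema.

```json
{
  "name": "os",
  "status": "ok",
  "hosts": {
    "total": 3,
    "available": 3,
    "unavailable": 0
  }
}
```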
There are obviously a whole lot of other options, command line arguments, and other types of configuration you'd want to play with, but I'm running out of time, so let me jump to how you can help out.

There are a number of ways to help. Obviously development: bug fixes, PRs, features, all that kind of stuff is very helpful. We'd also love to hear more about what people would like to see in terms of how the API looks; we have our own ideas, but having the community jump in and say "we'd like it to look this way" would be very helpful. Documentation: we're pretty good at writing documentation, but not so great at how we lay it out. Matthew Barnes, who's the other guy working on this, and I are good at putting stuff out there, but we're not exactly wordsmiths, so we could use a lot of help there. And then QA and testing. Does anyone here use behave for any type of testing? Okay, no hands, all right. Real quick, let me show you an example of that. This is what a behave test looks like, and it's what we use for our end-to-end testing. It's really awesome: it's English, just listing out what you expect to happen. So if you don't have time to write a feature, or you don't yet have the technical ability to write one, you can write out these kinds of stories explaining exactly what you want, and they run against our testing infrastructure to verify we've implemented it. That makes it much easier for us to verify, and they run continuously to verify that nothing gets removed that isn't supposed to be removed. So this is one way you can help out if you're not necessarily technical or don't have the time.
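The behave example from the slide isn't captured in this transcript, but a scenario in that style, hypothetical rather than taken from the Commissaire test suite, reads like this:

```gherkin
Feature: Bootstrapping hosts into a cluster

  Scenario: Creating a host in an existing cluster
    Given a running Commissaire server
    And a cluster named "os" exists
    When I create the host "192.168.152.110" in the cluster "os"
    Then I will receive a host record for "192.168.152.110"
    And the host will eventually report the status "active"
```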
You can find all our stuff under Project Atomic on GitHub. We hang out in the #atomic channel on Freenode, and we use the same mailing list as the rest of Project Atomic, atomic-devel. So with that, any questions?

Yes, sir. Yeah, that's correct. The question is whether Commissaire acts as a central point that then uses Ansible to execute against the cluster, and the answer is yes. We have all of that hanging off as microservices on the bus. That can be in one place; as an example, we provide an all-in-one Docker image, and we also provide a Vagrant configuration that brings up a whole bunch of different machines, where the microservices are on different machines in different places. The REST interface takes the commands, sends them on the bus, and the Ansible executions happen in those microservices, which could be local or remote.

Yes, sure. The question was: what's the benefit of having Commissaire in front of Ansible? The main one, I would say, is that you have a central point you can do authentication against, and you have repeatable playbooks, so it's not somebody running them off their own machine. Granted, a lot of companies and people will set up a bastion host, put their Ansible playbooks on it, and manage everything through SSH keys. However, this ends up being, I want to say, easier to manage: you have one place for people to execute things, and it's going to be the same execution, with authentication based on, say, Keystone, or probably OpenShift in the future, or local authentication if that's what you want. That's versus saying, "okay, hey guys, make sure you send an administrator your SSH key," and then revoking SSH keys and handling authentication manually like that. And again, if people do have things more distributed, it makes auditing a little easier, because you have one place to audit, or at least a smaller footprint. Does that make sense? Sir, cool.

Yes, sir. Well, you're correct that you can't do as much, but when we did the MVP, one piece of feedback was that some people had traditional clusters they wanted to manage with this as well. They wanted to do atomic upgrades or deployments, or system management, with it too. And some of them had both: they had traditional stuff, or OpenStack deployments, and they had container deployments, which were separate, so they had clusters for different things. But they didn't want to say, "well, when we're doing container stuff we'll use one system, and when we're doing non-container stuff we'll use bare playbooks or something else." So you're right, there's less you can do from the feature perspective, but there are still things you can do.

At the moment, right. That's correct, and actually we have some folks who are interested in that and are looking into adding it as a function, but right now, yeah, you can add a host into a cluster of hosts, but you can't necessarily register it with OpenStack just yet. We do plan to support that. However, we want to do that kind of thing cluster-wide as opposed to host-wide, and the reason is that we want to keep hosts in clusters from becoming, as I said earlier, special snowflakes: we don't want hosts in a cluster to differ from each other and get out of date relative to each other. So we're planning on doing that in the future, but we want to do it in such a way that it happens consistently across the cluster, not "add it to this host, now that host, now that host" as you would with a playbook or parallel SSH.

Sure, yeah. The question was: if you're not interested in a CLI for this type of management, would you just use Ansible modules for it? The answer is: kind of. Another one of the benefits, and I should have said this earlier, is that being a REST API makes integration with other systems, in my opinion, significantly simpler. Calling a REST API is easier than having things call a shell on a specific machine to kick something off, because that's exactly what we're doing while providing the API in front of it. So you could do that, and that's probably what a lot of people would do, but if they were trying to do integration between systems, say with CloudForms or something, they would probably use an API, or else reintroduce the same type of functionality directly into their product. So the basic answer is: yeah, they could.

Right, exactly, yeah.
Right. If you put Ansible on top of it, I mean, there might be use cases for that, but then you're pulling back in one of the same issues we're trying to solve, which is, yeah, to have things specific, repeatable, and simple. So, okay, thank you.