Welcome to another edition of RCE. Again, this is Brock Palen. You can find us online at rce-cast.com, where you'll also find links to all of our Twitter accounts and blogs, the complete back catalog, and a link to an RSS feed. I also have with me again Jeff Squyres from Cisco Systems, one of the authors of Open MPI. Jeff, thanks a lot for your time.

Sure, Brock. How's it going? All right, it's been a while since we've done one of these.

Yeah, so I think by the time we get this one out, people will either have just submitted their XSEDE papers or they're still frantically working on them, depending on plus or minus a day or two of when we get this out.

Yeah, the XSEDE 15 technical paper deadline is March 30th, and we're recording this on March 23rd, so we'll see if this even gets on the air in time for people to hear it sometime right around then.

All right. Well, what do we got today, Brock?

So today we have something that we've been talking about around our shop, and we finally decided we should find out more about it. So today we're going to be talking to Mike Place, who's one of the creators of SaltStack. So Mike, why don't you take a moment to introduce yourself?

Yeah, hey, Brock and Jeff. It's really good to be with you; I appreciate you having me on. I'm on the engineering team here at SaltStack. I've been here about 18 months; I was the fourth or fifth engineer hired here, and we're growing very rapidly, so we no longer have a very tiny engineering team. We now have a relatively small engineering team. But yeah, like I said, I've been here about 18 months.
I've got going on what almost looks like 20 years of experience, terrifyingly enough, going back to the mid '90s when I worked at little indie ISPs around here, and then spent some time over at Novell. And now I'm here having a good time, one of those lucky people that gets paid to write open source software, and it really doesn't get much better than that.

Okay, so we're specifically going to be talking about SaltStack. So give us the one paragraph: what is SaltStack?

Sure. SaltStack is a remote execution and configuration management platform. Most people know us for the configuration management piece, but we're actually a much larger systems management piece of software than simply config management. To start with configuration management, most people know the major players in that space: folks like Chef and Puppet and CFEngine, and us. Configuration management, for those who don't know, is simply the idea that you can manage the configuration on your systems in a stateful manner, so that you can either declaratively or imperatively declare the intended state of your systems, and then, either with a single command or with a small set of commands, enforce that state across either a single system or many tens or hundreds of thousands of machines all at once.

Beyond that, SaltStack is a remote execution platform. Because we have masters, and then we have agents running on large numbers of machines, people who want to do things like, for example, patch a security hole can do so with simply a single command and make sure that that's done across many, many servers.

Now, just out of curiosity, further distinguish this for me, because the other configuration management systems also have at least some flavor of remote execution built into them; if nothing else, you know, just pure SSH execution, these kinds of things. What makes SaltStack different?
One of the things that makes SaltStack the most different is that we focus very heavily on remote execution performance. We're well known for using a high-speed messaging bus; we started out with ZeroMQ under the hood, which allows us to scale very rapidly. So while many other configuration management systems may have a remote execution component to them, we're very focused on the performance piece of that component, to allow remote execution to truly scale. And that opens up a lot of really interesting possibilities that you just don't have if you have a slower transport under the hood.

So when you say parallel execution, you're talking about not serially running a thing on a node at a time. You're not talking parallel processing, which is what a lot of our listeners are familiar with.

That's right. Yeah, we're not talking parallel processing; we're talking parallel execution: the idea that from a single manager node one can issue a command that is run simultaneously, or near simultaneously, across a large number of agent machines.

Okay, so how does this compare to something like Kickstart? Like, can you actually take a machine from bare metal, or do you rely on something else for that?

Right. We normally exist in the virtualization and containerization space. We have done some bare metal work in the past, but we aren't particularly focused on that; most of the time we pick things up after an operating system has been provisioned. That said, we do have a piece of software called Salt Cloud, and Salt Cloud is designed to bring systems up either in public clouds or private clouds, or, you know, near to bare metal, simply with hypervisors on the ground.

Okay, so you said something very interesting there. You said "a slower transport." Now, what do you mean by that? Because I'm a networking guy, I work for Cisco, and transport has a very specific meaning to me, but I think you're talking about something a little more general than that.
Sure. So we've done a lot of work actually on the transport layer, because we find that it impacts performance so heavily. When I'm talking about transport, and when most people who are in this space are talking about transport, they're talking about the difference between, say, SSH as a transport versus HTTP as a transport, versus, for us, like I said, ZeroMQ just running over TCP. And we may get into this a little bit later, but one of our efforts over the past year has actually been to introduce a new transport that we call RAET, the Reliable Asynchronous Event Transport, which is a reliable transport that we have built on top of UDP. It's designed both to be used with Salt or potentially on its own, and like I said, we've done a lot of development work in that area, so we're very excited about it.

All right, and that sounds actually pretty sexy to me. So you have your own little daemons running either on top of a VM or in VMs and things like that, for sending these kinds of control messages around? That's how RAET is used?

Right. Well, Salt is used, you know, most commonly, like I said, in a master-agent mode. We call our agents "minions," from Despicable Me, the movie, for those who've seen it. So we have agents running on all of these minions; those agents connect back to the master, and they simply listen for commands, which are published, and then they act upon them. And those commands can be, like we said, something as simple as remote execution, or something as complex as enforcing an intended state from the configuration management layer: laying down new packages or new configuration files, or ensuring that those configuration files are in a given state on a system, and so on and so forth. And so that's the basic idea of what's happening.
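As a concrete illustration of the command model just described, a single published command reaching many minions at once might look like the following from the master's CLI (an illustrative fragment, not from the episode; it assumes a configured master with connected minions, and the package name is hypothetical):

```shell
# Publish one command; every connected minion runs it in parallel
salt '*' pkg.upgrade

# Or patch a single package, targeting only minions whose id matches a glob
salt 'web*' pkg.install openssl refresh=True
```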
Of course, we build quite a bit on top of that, but that's fundamentally what's on the ground.

Now, you mentioned you've spent a lot of time on scalability and whatnot. Can you explain how that's achieved as well? I mean, do you have a hierarchical kind of connectivity, or a mesh-like connectivity, or something other than linear? Because just linear, even done well, doesn't scale out really well. How do you do that?

Right. So, you know, we believe that performance really comes from making good architectural decisions. For us, especially in the beginning, using ZeroMQ has been a really good architectural decision. So what we're able to do, using ZeroMQ, is have a push-pull pattern between the masters and the minions. ZeroMQ is effectively threaded under the hood, so it can publish commands nearly simultaneously to all connected listeners. In our model, minions then self-select whether or not they should run the command in question. They immediately fork a new process so that they don't block, do the work that needs to be done, and then reply back over another TCP connection when the work is completed. Now, of course, we can also scale this model out so that it is hierarchical. But we find that we scale quite well in our default configuration, often up to many thousands of connected nodes on a single master. And as a result, many of the people who end up scaling into a hierarchical model end up doing so for reasons other than simple scaling, which we're very proud of.

So what are some of those reasons? If it's not for scaling, why would someone want to do that?
Often we find that they do that so that they can segment, to make logical segmentations for access control or for security, versus simple raw scaling, because the message bus underneath is so fast that they find they don't need to. The secondary reason is that most of the scaling problems are with the returns, when they come back. At times we can occasionally see a thundering herd problem handling those returns, but again, we usually don't run into that on the master side until many thousands of returns are being handled at once. We use on the master a ROUTER-DEALER pattern in ZeroMQ, so that we can spin up many back-end processes and simply forward the work off to them, and continue to have a high-speed listener for the received events, which then, like I said, forwards them to a queue on the back end to be processed. Salt is designed to be highly asynchronous at every step of the way, and so we work very hard to avoid blocking operations anywhere in the architecture.

So this is kind of interesting, because I'm thinking about the way I've run HPC systems for, you know, a decade plus, and normally I'm not that worried about the performance of my provisioning and configuration management system, except when loading the cluster, which is only every so often. So once you run into the cloud world, I'm guessing the entire model in which people normally operate their environment around SaltStack is different than what I'm used to. Can you describe the way people normally use this, and why they would need this performance?

Yeah, that's a great question, and it's one of the points that we try and drive home, which is that we feel like SaltStack, instead of being a pure configuration management system, is a high-speed event bus upon which configuration management is only one service. We find that in many infrastructures, people end up building their own messaging buses.
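Going back to the fan-out model for a moment: the minion-side self-selection described above, where the master publishes one message and each agent decides for itself whether the command is addressed to it, is at its simplest glob matching against the minion's id. A toy sketch of the concept (an illustration only, not Salt's actual code):

```python
import fnmatch

def should_run(minion_id, target):
    """Does this minion's id match the target glob the master published?"""
    return fnmatch.fnmatch(minion_id, target)

# The master publishes one message to everyone; each minion self-selects.
minions = ["web01", "web02", "db01"]
selected = [m for m in minions if should_run(m, "web*")]
print(selected)  # ['web01', 'web02']
```

In real Salt the matched minions then fork a worker process so the agent never blocks, and return results over a separate connection.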
Of course, this is really common in the HPC world, for example, and they build those messaging buses to connect applications together, to connect applications to systems, to what have you. We see this vision as something that we call event-driven infrastructure. Event-driven infrastructure is the idea that, if you have a high-speed event bus connecting all of your systems and applications together, you can make all of the data that's ingested and broadcast onto that bus a first-class citizen, whether that data comes from kernel messages being emitted or from monitoring high up in your application stack. Because you have this high-speed event bus, you can use that data and you can program against that data. You can watch events coming in from your application stack and use the configuration management engine as a service to, let's say, tune your kernel, for example. Putting all of that data on a single plane, with a configuration management service that can be reactive to that data, we think is what allows people to build really interesting next-generation, highly scalable, highly flexible, highly reactive and reflexive systems.

Okay, so this is why you see this as built for the cloud: because you're kind of dynamically provisioning and spinning stuff up as needed.

Sure, auto-scaling is certainly one component of that, and we see people doing things like that quite frequently. But it even comes into play in smaller cases, say, for example, continuous integration, where people have points in their process that are blocking, right? So let's say, for example, a very typical continuous integration workflow might be: check in code, then wait until the tests are run, and then, if the tests pass, go out and provision things into a staging environment, right?
Most people sort of tend to glue those pieces together. They have to glue their test framework into their code deployment framework, and so on and so forth. We think that by making all of that event-based, it makes all of those pieces easier to work with. All of a sudden you can simply watch for your test system to emit events onto the event bus saying that the tests have passed, and then your configuration management piece can take over and start to deploy that code as needed, so on and so forth.

So on your website it says, and I'm going to quote here, "SaltStack is orchestration and automation software for cloud ops, IT ops, and dev ops." And I think I can grok what you've been saying here over the last several minutes into that, but could you define what you mean by that, and possibly even define what you mean by those three terms?

Sure. When we talk about IT ops, we're talking about traditional IT operations. You know, we're obviously very much in the DevOps space, and I say "DevOps" kind of with quotes around it, because DevOps is a term that is growing every day and seems to encompass more and more things every time you turn around. But SaltStack is very much in defense of the sysadmin. We're very supportive of the idea that, while DevOps has its place, and that place is very important in trying to connect dev teams to ops teams, there are still a lot of people out there just doing traditional system administration, and they still need really good tools to do that. So we don't want to shirk away from the sorts of problems that those folks face day to day. Anyway, that's what we would call IT ops.
Of course, cloud ops: that's folks who are provisioning stuff either in public or private clouds, who need to deal with auto-scaling problems, who need to be able to understand what their cost metrics might look like, and who need to build systems that can dynamically provision themselves and can have configuration laid on top of them after the regular cloud instances are provisioned. And DevOps, of course, is the last piece. We see DevOps very much in terms of the original vision that was laid out by John Allspaw and those folks in 2009, when they spoke at Velocity and talked about this idea that there is a common language that can be shared between dev teams and ops teams. Our stance has always been that we need to be able to connect dev and ops, but we don't want to dumb down either side. So when we come at DevOps, we try to come at it not by creating a common language that's so abstract and so simple that it doesn't provide enough power for either side, but by doing everything we can to speak to both sides in voices that they can understand.

So I went to a meetup just last week where they were demonstrating Salt, and they were running on a bunch of machines on the AWS cloud. But when they were doing things, they were running on already-running minions, and it really just looked like another configuration system: push out a patch, install some package on these classes of machines. And they used Pillar to put in dynamic data and things like that. But what I want to know more about is, when you say cloud integration and cloud ops, what exactly does that get down into meaning?

Sure. Normally when we talk about that, we talk about our Salt Cloud product. Salt Cloud is a cloud provisioner. It allows you to make maps, let's say with your AWS credentials and your Linode credentials and what have you, and then spin up machines
of a given size or with given resource parameters, across a single or multiple cloud providers, bootstrap them with Salt if desired, and then bring them up into their intended role so that they can be ready to be inserted into a production environment. So really it's the step all the way from deciding which clouds, be they public or private or what have you, all the way through the instance provisioning step, and then through the configuration management provisioning step. And then of course configuration management, or in general systems management, can take over to turn that node into whatever it is that you need it to be.

Okay, so I think Brock and I are probably having a little bit of a hard time wrapping our heads around this, because, as he said, this is very different than what we do, so it's against our bias. But it seems like you're aiming at different types of applications than we traditionally do. Something that is, I hate to use this word because we've already used it several times, dynamic, right? I want to start a service, do some things, take it down, and then do some other things, and potentially be reusing resources with the next service that I start up. So it's much more involved with creating and destroying individual elements that can be used to service different actions that might be occurring in either a pipeline, or some kind of parallel pipeline, or many simultaneous pipelines, and things like that. Is that a little more accurate than what Brock and I might be thinking? And you know, I really love the idea of how you explained that the provisioning side is programmatic, or at least that's how I grokked what you said: that I can write code that says, okay, now spin up this, and I'll get back, you know, did that work, did that fail, or whatever other fine-grained kind of results I might get from that. Sorry.
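The Salt Cloud profiles and maps described above might look roughly like this on disk (an illustrative sketch, not from the episode; the provider, image, and host names are hypothetical, and credentials would live separately in the cloud provider configuration):

```yaml
# /etc/salt/cloud.profiles.d/web.conf
ec2_web:
  provider: my-ec2-config     # hypothetical provider entry with AWS credentials
  image: ami-12345678
  size: t2.medium

# /etc/salt/cloud.maps.d/webfarm.map
# A map ties profiles to named instances to bring up together
ec2_web:
  - web01
  - web02
```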
That was a big blob of a question there. I hope that made some sense.

I'll try and respond to it as best I can. I think there are a couple of things to say about it. One is that, personally, I see the configuration management space starting to split down the middle a little bit. There are some tools which are very focused on the provisioning side, right? Like, what does it take to get this machine, either from bare metal or from a stock OS, provisioned with the intended packages and configuration files and what have you installed? And of course that becomes more and more popular as containerization takes hold. People talk, of course, about immutable infrastructure: this idea that you provision it, and then once it's provisioned, it doesn't change. Or at least that's the theory. But of course, anybody who's managed systems knows that systems have a real life cycle, and that life cycle goes beyond simply standing up the machine, provisioning it, and putting it into production. You have to deal day to day with things like security patches, with configuration drift, bit rot, all the things that cause systems to, well, misbehave over time: disks filling up, what have you. And so we try to stand on both sides of that divide. Salt Cloud does a very good job on the provisioning side, but Salt, the configuration management engine, along with the agents that are running on these machines, can allow you to manage the full life cycle of the machine beyond simply provisioning, and be able to use it and manage it day to day.

So do you work with, like, native package managers and stuff like that when you get down to the configuration management part, or
and then, okay, if you do work with native package managers, do I have to know anything about those package managers if I, say, run some Red Hat and some Ubuntu machines that use two different package systems?

Right. We do abstract away most of the differences between the various package managers. So, for example, with Salt, if you want to install Vim, it's package.install vim, and whether it's RPM or dpkg or whatever it is under the hood, we'll figure that out for you, and we'll end up shelling out to the correct commands. That does not, however, account for the differences in package naming. A very common example is that some distributions name Apache "apache2," where other distributions might name their package "httpd." We do not abstract away those differences, simply because the risks of doing so are a little bit high; you only want to abstract package management so far. So we allow for those differences in people's configuration files with Salt, the ones that declare the intended state of a system. Those are normally written in YAML; they're simply data structures. But we also allow people to interpolate Jinja, so they can say, for example: if this is a Red Hat system, install the package apache2; if this is an Ubuntu system, install the package httpd, or what have you. I may have reversed those; I don't remember which is which off the top of my head, but I think you get the idea.

All right, now you said something interesting in there, too. I think you said salt-dot-install and a package name. Were you referring to a particular language binding? Is there a preferred language that your users use?
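The Jinja-in-YAML pattern Mike describes might look something like this in a state file (a sketch, not from the episode; for the record, Red Hat-family systems name the package httpd and Debian/Ubuntu name it apache2, which is the mapping Mike suspected he had reversed):

```yaml
# webserver.sls
apache:
  pkg.installed:
{% if grains['os_family'] == 'RedHat' %}
    - name: httpd
{% else %}
    - name: apache2
{% endif %}
```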
Right. So when they're using the Salt configuration management engine, they declare the states of their systems, and the language in which they make those declarations, by default, is YAML, with some Jinja interpolation if they wish. However, Salt is fundamentally agnostic about the way those data structures are declared, because we ingest them as native Python data structures. Which is to say, whether you want to use YAML and Jinja, or any other templating language (I mean, XML, if that's really what you want to do), so long as it can get rendered down to a Python data structure, we're good with it. It could come from an external source, as far as we're concerned. We're effectively language agnostic, so long as things get rendered down to Python data structures that we can understand. Which is great, because it allows people to use the syntax that they would really like to use. Some people like YAML, some people don't, and that's completely fine with us.

So one of the issues I run into a lot of times with these configuration systems is: okay, it's easy for me to install an RPM, but RPMs don't come with the correct config files for that specific host providing that specific need. How do you handle, okay, once you've installed, you know, httpd, how do I get the right virtualhost.conf onto that node that's supposed to serve that one site?

Right. So the standard way to do that is to declare one state, which is to install the package, and then a second state to lay down the configuration file needed for the package that you've just installed; yeah, your customized configuration file. Salt, in its state system, has a set of requisites which can do things like say: first install the package, and then lay down the configuration file, because obviously doing those things in the correct order very much matters.

Can it even go a step beyond that? Like, if you're looking at the application level,
say I'm adding more HTTP servers to serve a very busy site, can it also then tell the elastic load balancer to add this extra machine to the round robin?

Absolutely. So what it can do is create those Apache configs, right? And then we have requisites that say, for example: watch to see if on this given state run we have added an additional vhost, and if we have, bump the load balancers, or configure more machines, or what have you. So that requisite system allows people to create stateful relationships that are actually associated with their deployment workflows.

So you just mentioned some very interesting hooks there. Do you also have hooks down at the hardware level, too? I know you said you concentrate a lot more on virtual machines and hypervisors and containers and things, but do you have hooks into, say, IPMI, or other BMC or bare metal kinds of things as well?

You know, we just put an IPMI module into our develop branch. I don't recall off the top of my head whether that is going into our release, which is going out the door in a few days, but if not, because Salt is very modular, it's quite simple to just pluck a module off the develop branch and use it on your local system. BMC I do not recall support for, but I know that IPMI is definitely in.

So what if I wanted to add support for this? What's Salt written in? You've mentioned Python a couple of times; I'm assuming that's what it is.

That is correct. It is written in Python, and creating modules is actually quite easy. What we have is this idea of execution modules, right? An execution module is a collection of similar routines.
So, for example: package.install, package.remove, package.upgrade. And let's say you wanted to build out additional support for kernel tuning, which is there right now, but you wanted to add some additional functionality. You could do, you know, sysctl: you could simply drop in a new function called ip_forward, and what that would do would be to turn on IP forwarding for the kernel. Doing that is really just a matter of writing Python code to do what you want on the system. The really nice thing is that when you're writing that Python code, you already have access to all of the other Salt execution modules in that namespace. So you don't have to reimplement all the details of shelling out, for example, to run a particular command and avoiding shell injection and all of that. You can actually use all of the other execution modules in your development, and because of that, writing new execution modules becomes really easy, because you have access to all of the execution modules that have already been written.

So just out of curiosity, since I'm a developer myself, something I like to ask other developer crews: what version control system do you use, and why?

We are an open source project, and we are hosted on GitHub. We've been very happy with GitHub. They are very kind to allow us to use and abuse their machines; we do quite a lot of traffic on GitHub, and we've been very happy with what they've done for us. Every time we see those guys, we try to gush and go out of our way to really thank them, because they do really wonderful work.

So I'm going to ask our usual "what's the largest" question, and for our high performance computing people it normally means: what's the largest cluster? What's the largest number of machines?
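The execution-module pattern from a moment ago, where a new function reuses the already-loaded modules through a shared namespace, can be sketched in self-contained form. In a real Salt module the loader injects the `__salt__` dictionary of other execution functions; here a plain dict with a placeholder stands in for it so the sketch runs on its own, and the function is illustrative rather than Salt's actual sysctl module:

```python
# Stand-in for Salt's injected __salt__ namespace: a dict mapping the names
# of already-loaded execution functions to callables a new module can reuse.
__salt__ = {
    "cmd.run": lambda cmd: f"ran: {cmd}",  # placeholder, no real shelling out
}

def ip_forward(enabled=True):
    """Hypothetical sysctl-style function: toggle kernel IP forwarding.

    It reuses the shared namespace's cmd.run rather than reimplementing
    command execution (and its shell-injection safeguards) itself.
    """
    value = 1 if enabled else 0
    return __salt__["cmd.run"](f"sysctl -w net.ipv4.ip_forward={value}")

print(ip_forward())  # ran: sysctl -w net.ipv4.ip_forward=1
```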
But I'm thinking there are a couple of different ways you could slice this for Salt. So first: what's the largest number of machines that's been managed? But then also, what's the largest number of, say, unique configurations it's ever dealt with?

Yeah. The largest number of machines that we know of is in the many tens of thousands. We have large companies, like LinkedIn for example, that manage many tens of thousands of machines. And the great thing is that they do it on reasonably sized hardware; they don't have an entire rack of Salt masters to do this. To the best of my knowledge, they have one big Salt master, something like 16 cores, and they have many tens of thousands of machines. And there are a number of installations, and they're not the only ones, running at that scale. So it's really not uncommon for us to scale up there. That said, we have a lot of installations that are a few machines, a couple of machines, a dozen machines, what have you. So it's really all over the map.

So let me ask you a slightly different spin on that. What is the most unique usage of SaltStack that you've heard of? Perhaps something that you really look at and you go, wow, I never even would have thought to use SaltStack that way.

Yeah, we just had our user conference, and we had a guy submit a proposal to us about how he's using Salt in sub-Saharan Africa, and it was a wonderful talk.
I believe it will be on our YouTube channel soon. I know that there have been some uses in Africa where they've used it to rapidly provision machines to respond to the Ebola crisis there. I have also talked to some folks, also from Africa, who have used Salt to manage machines that are off in very remote locations, be it in small African villages or what have you, where they basically need machines that they can leave alone and not touch for six months, and ensure that they are going to continue to run and that everything will be happy, and that, when the machines are contacted, they can use a very lightweight, quick-to-respond remote execution platform. So yeah, I think the machines in remote African villages is probably our most surprising use case.

So you mentioned it's an open source project. Specifically, what license is Salt distributed under?

It's distributed under the Apache license.

Okay. And then, finally, where can people get involved and find more information about SaltStack?
Sure. The best place to come is our GitHub page, which is github.com/saltstack/salt. There you'll see links, of course, to very typical open source project things like our mailing list and our IRC channel. We have many hundreds of people in the IRC channel, and it's very easy to get involved there. We have a very active developer community, and it's quite simple to submit patches or feature requests or what have you. We do tend to pride ourselves on the quickness of response to the bug reports and the patches that we get. So it's very common for people to submit patches to us and see them merged into Salt in as little as a couple of minutes; frankly, an hour or two at the most is a very typical turnaround for us to get code that's been submitted to us into the code base.

Okay, so open source is great, but a traditional thing that you hear, and not everybody understands, is that it is still possible to have a successful business model even with open source software, even though you're giving away the goods, so to speak. This is something that I hear a lot. So what is your business model? How do you guys, you know, put food on the table?
Right. We do that in a couple of different ways, and the first thing to say about this is that Tom Hatch, our CTO, gave a wonderful keynote address, one that I think extends far outside of SaltStack, about how open source businesses can make money. I expect that by the time this recording is posted, that should be up on the SaltStack YouTube channel, and I really encourage anybody who's interested in how open source companies can make money to go and watch that. The second thing to say is that Salt is not open core, and we have absolutely no plans to be open core. We do have proprietary software. The proprietary software that we sell is in the form of a management GUI, which just got released last month. It's a management GUI that allows you to bring a lot of the, you know, admittedly abstract concepts into a very nice, clean web interface, and manage your systems in a way that the entire breadth of an IT staff can understand. And so there's been tremendous interest in that as a product. Of course, we also go out and offer trainings, we do a lot of integration work, and of course we sell many, many support contracts for larger installations that are interested in having developer support.

So what's coming in future versions of SaltStack? You've mentioned a couple of things. What do you think are the important bits coming?
Right. The stuff that we've been working on recently, we've been very excited about. We've just released two new features that we call engines and beacons. I have mentioned a couple of times this high-speed event bus that we have, which allows you to have a singular message transport for events, whether they originate from the operating system or from the application. Beacons are a technology that allows you, on the Salt minion, to effectively monitor certain events on that minion. For example, inotify is one of the beacons that we just wrote, so you can watch a particular file or directory, and if there are changes to that directory, an event will be emitted onto the event bus, which can then be programmatically responded to, or simply logged for auditing or what have you.

The other side of that coin is engines. Engines are processes which run on the Salt master (again, that's the command side of our command-and-control model), and engines allow you to watch that event bus and do whatever you like with events that you see. And so it gives you tremendous flexibility, because all you have to do is watch for events on the wire and then write whatever Python you like to go out and respond to them, be it using configuration management to go back and make changes on your system in response to events, whether it's alerting, whether it's simply notification, or whether it's just logging for an audit trail. Engines really allow you to take this idea of an event-driven infrastructure and actually sit down and write code to respond to those events.
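The watch-the-bus-and-react idea behind engines can be sketched as a tiny event loop. This is a toy illustration of the concept only, not Salt's engine API: real engines attach to the master's event bus, whereas here the "bus" is just a list of (tag, data) pairs so the sketch is self-contained, and the event tags and handlers are hypothetical:

```python
def run_engine(events, handlers):
    """Dispatch each incoming event to every handler whose tag prefix matches."""
    log = []
    for tag, data in events:
        for prefix, handler in handlers.items():
            if tag.startswith(prefix):
                log.append(handler(tag, data))
    return log

# React to file-change beacons and to test-suite results on the same bus
handlers = {
    "salt/beacon/inotify": lambda tag, data: f"react: {data['path']} changed",
    "salt/test": lambda tag, data: "tests passed, deploy",
}

events = [
    ("salt/beacon/inotify/web01", {"path": "/etc/httpd/conf"}),
    ("salt/test/suite/success", {}),
]
print(run_engine(events, handlers))
# ['react: /etc/httpd/conf changed', 'tests passed, deploy']
```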
So we're very excited about that technology. Of course, we're going to continue to develop RAET, which is, like I said, the Reliable Asynchronous Event Transport, and which we believe is going to give us even more flexibility in the future. While ZeroMQ has been really wonderful for us, we've reached a point where we want to be able to do some things that ZeroMQ doesn't make as easy as we would like, and so we're very excited; we're going to continue to do a lot of development in that area. And of course, the last thing to say is that we're working very hard on our enterprise GUI and trying to make sure that that's as good as it can be. So those are all the things that we're going to be very focused on in the coming months.

Mike, thanks a lot for your time. Again, what's a place where people can find more information about SaltStack?

Either saltstack.com or our GitHub page, github.com/saltstack/salt.

Thank you. Great. Thanks so much, Mike. Great. That was good fun.