Welcome to another edition of RCE. I'm your host, Brock Palin, and I have with me again Jeff Squyres from Cisco Systems and Open MPI.

Hey Brock, good afternoon. This one's going to be a good one. It's kind of near and dear to my heart, because my role is, you know, a parallel systems developer guy. I've got about 50-plus nodes at Cisco that I use for development and testing, and I am low enough on the peon scale that I don't get any technicians or sysadmins to help me out and keep these things running. So I have to do it all myself, and any tools that can help me do this are very greatly appreciated. I want to hear about them.

Yeah, so the tool is Bcfg2, and we have with us two people who work on it: Narayan Desai and Cory Lueninghoener. I think they're both from Argonne National Laboratory, outside Chicago. So guys, welcome to the show.

Hi. Hello.

So go ahead and introduce yourselves: say your name and say a little bit about how you got started.

I'm Narayan Desai. I work at the Mathematics and Computer Science Division at Argonne National Laboratory, and I'm sort of a half step between a system administrator and a system software developer, so I work on system management software and things of that sort. Basically, when I started working on Bcfg2 I was a system administrator responsible for a variety of HPC systems around MCS, and we needed something to help us cope with the configuration complexity and scale that we were seeing at that point. This was in 2002.

And I'm Cory Lueninghoener. I'm one of the system administrators working with the Leadership Computing Facility here at Argonne, working on our big 40-rack Blue Gene system and the visualization cluster that goes with it, and all the file servers and login servers and all the extra support pieces that go into keeping a large HPC resource alive.

Why don't you give us a little bit of an intro, maybe a little bit about where B-C-F-G came from?
So, we call it "bee-config." Basically, when I started out here at the lab, I was a system administrator responsible for a bunch of the large-scale HPC systems here. We had at the time a relatively large cluster, about 320 nodes, called Chiba City, and a small administrative staff, and basically configuration problems, right? We were supporting a group of computer scientists that were doing development on a variety of sorts of system software, HPC system software and numerical libraries and things of that sort, and these researchers needed access to a wide array of types of machines, and large machines, and things of that sort. And so we needed tools to be able to effectively manage both heterogeneity in our environment and the configuration complexity that you would see from the requirements of a lot of different sorts of users. So we have moved a little bit more towards system software development, and basically the transition has been from a system administrator to halfway between system administration and software development, which I think is actually a really interesting place to be.

Cool. So actually, I'm glad to hear you call it "bee-config," because to me "B-C-F-G-2" just doesn't really roll off the tongue very easily. And when I was googling around for information, you know, preparing for this interview, I found all kinds of things about natural gas sizes.

Yeah, so I started working on Bcfg in about 2002, and picking Googleable names was not at that point an important thing to consider when naming software. And so I realized about a year later, when I actually did a public release and started seeing actual external visibility for Bcfg, that when you search for it you got "billion cubic feet of gas." And it wasn't a term that I was previously familiar with.

Let's hope that your software doesn't produce a billion cubic feet of gas.

Yeah, I don't know.
I mean, the thing was really kind of funny. At this point people always sort of grumble about the name, because it doesn't actually communicate much about what Bcfg2 is, other than that it has something to do with configuration.

And where does the "two" come from? Because on the logo on your website, is that a squared or is that a two?

It's a two. There was a Bcfg1, and it was a miserable failure. Basically, it got to the point that it was deployed on a bunch of systems here at the lab, but it wasn't very flexible, and when you actually tried to extend it from a single small group of administrators to a larger group with a more complicated environment, it really didn't work very well, flexibility-wise. And so we scrapped it and redesigned the way that it worked. We kept a bunch of things about the overall operational model, but made it more flexible and replaced it with an implementation that scaled better. So this is actually version two of Bcfg. We're actually close to version one of version two. Clearly I don't have this story very clear.

Clearly. So what does the "B" in Bcfg stand for?

Well, originally the idea was built around this notion of validation. The basic idea was that you want to have something equivalent to diff and patch for configuration, right? So you've got two machines, and one of them is different from the other, and you want to be able to say: how are these machines different? Or, given this difference between machine A and machine B:
Let's apply it to machine B, right? So intuitively it's a pretty simple idea. Once I started digging into this conceptual model, the thing that I realized is that we needed some sort of validation mechanism to build a configuration from a machine. As you start looking at the way that software gets layered onto machines, you frequently have an artifact on the machine where you take something like an RPM, and then you reconfigure it with some configuration files, and you associate that with a service that you've turned on, and all of those things are very interdependent. We call those bundles in Bcfg2, and so the "B" originally stood for "bundle": it's the bundle configuration tool. It's not a particularly meaningful aspect of Bcfg2 at this point, but the name has kind of rolled forward with us. Plus it gives us that "billion cubic feet of gas" Google hit.

There you go. I guess, if you actually want that. So, okay, does Bcfg2 get involved anywhere with installing? When installing on, say, bare metal, do you still rely on something like Kickstart or AutoYaST, and then you just call Bcfg2? Is Bcfg2 something you automate or not? I mean, where does it get used?

So it's very useful at that point. If you're starting out, say, building a cluster, and you have all the bare metal sitting around, and you've predefined what classes of machines you want, then it does drop in after the Kickstart or whatever building phase you have. Whatever distribution you're using, with the method it uses to build machines, you build just a base machine for all of the different styles you want. If you want compute machines or login servers or management machines or whatever, you can build them all from the same Kickstart image, and then Bcfg2 will run on top of that to turn everything into the individual types of machines that you want.

Is it Linux only?
No, it's not. It's actually used on Solaris and OS X to some extent. We have some interest in a Windows port, but we haven't actually put any time into that. There are some folks around the lab that are really interested in it, but we don't have any usable software at this point on Windows. Generally, it's POSIX. So if you support POSIX, then Bcfg2 can do useful things for you.

Okay, so does Bcfg2 redo everything, or does it rely on the underlying package managers and stuff like that? Does it just support RPM and the OS X packages and things like that, or is it something more like... I don't know if you've ever heard of a tool called Radmind, which literally tracks by the file and a checksum on that file, and you lose track of all the RPM data.

Right, so it doesn't work like Radmind does.

Okay.

It actually has integration logic to talk to different package management systems and service management systems and things of that sort, and then there's a driver to talk to POSIX back ends, right? So for example, we have package management drivers for APT, for RPM, for OS X packages... the OS X package driver, I'm not sure if it ever got integrated, because the functionality there doesn't give you everything that you'd want. But we have fully functional drivers for Solaris System V packages, IPS, and Gentoo package management, and all the service management systems on those platforms are well supported as well.

Okay, so [unclear] can you do hierarchical kinds of configurations? Say I have two different classes of login nodes, like a GPU login node. Can I say that the GPU login node inherits from the login node?
Yeah, exactly, you can.

Okay, so I don't have to repeat myself for things that are common on a single system.

Right. It might be useful to describe the basic architecture of Bcfg2. It's a client-server model. The client is responsible for taking the configuration specification it gets from the server, validating the current state of the client, figuring out a reconfiguration path between where it is and where the server says it should be, and then performing those operations or not, depending on the command-line options supplied to the client, right? So if you're in dry-run mode it won't make the changes, and things of that sort.

On the server side there are definitions for what machines should look like, and this might include things like a group that's associated with login nodes or compute nodes or GPU nodes or whatever. Then you can build what are called profiles, which are combinations of groups that describe the attributes of a given machine. So in the example that you were describing, what you would probably end up with is two different profile groups, one for login nodes and one for GPU login nodes, and then a series of groups that are included in those profiles. So there might be a group associated with GPUs that you would include on all of your machines that had GPUs, regardless of whether they were login nodes or not. And then you can basically add the bits for GPUs to the bits for login, if that makes sense.

Sounds a lot like inheritance.

Yeah, it basically works like inheritance. There are a couple of details that are slightly different, but by and large, yeah. And multiple inheritance clearly works.

Okay, so how do you specify these things? I admit I was trolling through your website before, and I saw references to a configuration file. I didn't dive deep enough to see what it is. But how do you specify how this stuff
is configured? Do you list by RPMs? Do you list by RPMs and files? I mean, can you mix the package managers? Well, how do you specify, first?

Well, so the structure of these Bcfg2 specifications basically amounts to a series of entities that each describe a given sort of atom in the configuration, and these atoms are all typed, right? So you might have packages or services or config files or things of that sort, and these can be mixed. Things like package or service entries have types that associate them with the drivers on the client side. So in some cases you actually end up with machines that have multiple package management systems or multiple service management systems. The way that, say, Solaris systems look, with a combination of SMF and legacy services, is an example of this. You need to be able to use both drivers at the same time, and that's all supported through the system. The way that this looks on the server side is that you build this configuration specification that includes a bunch of definitions.
So say, for example, you want SSH on all of your machines and you want a particular sshd configuration. The way that this would end up looking in your configuration is that there would be a rule saying: if you're running RHEL 5, this is the version of the SSH package that you should get, and it's of type RPM, and it has this version, and things of that sort. And then in a different place you would have a configuration file that RHEL 5 systems should get, you know, for sshd.conf, or sshd_config. And for sshd_config you can actually go through and use groups to define different configurations. So say your login machines allowed logins from a particular set of locations, or didn't allow root logins, or things of that sort. You could then register multiple instances of the sshd_config with Bcfg2, and that would be served out as appropriate to clients and merged into their configuration.

So, to say the same thing in a different way: Bcfg2 has a concept of what a package is, and what a config file is, and what a service is, and what all these individual things that live on a computer are, and it has the ability to aggregate those together for interdependencies. So you can tell it that the sshd service depends on the SSH package, which also depends on the individual configuration file you've put together. So you can take all of the individual pieces that exist on your machine, given to you by the distribution, and put them together into the bundle, which it then knows how to install on the different classes of machines that you've defined in the hierarchy of groups, for that inheritance bit.

Cool. Okay, so that actually sounds genuinely useful. What is the security model here? Because obviously a lot of this stuff is going to have to execute as root. So do you sign commands back and forth, or authenticate some other way? How can I know that this client is talking to an authenticated server, and vice versa?
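To make the bundle and group ideas from this part of the conversation concrete, here is a minimal Python sketch. It is not Bcfg2's actual specification format or API; the class names, the `pick_variant` helper, the priority scheme, and the package and group names are all illustrative assumptions. It only shows typed entries collected into a bundle, and one config file registered in several group-specific variants.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Entry:
    """One typed 'atom' of configuration (hypothetical model, not Bcfg2's schema)."""
    kind: str          # "Package", "Service" or "ConfigFile"
    name: str
    driver: str = ""   # e.g. "rpm" or "apt"; selects the client-side driver


@dataclass
class Bundle:
    """A set of interdependent entries that are managed together."""
    name: str
    entries: List[Entry] = field(default_factory=list)


# The package, its config file, and the service that depends on both.
ssh_bundle = Bundle("ssh", [
    Entry("Package", "openssh-server", driver="rpm"),
    Entry("ConfigFile", "/etc/ssh/sshd_config"),
    Entry("Service", "sshd"),
])

# Several registered variants of the same file: (required group, priority, content).
# A required group of None means the variant applies to every client.
SSHD_CONFIG_VARIANTS = [
    (None, 0, "PermitRootLogin yes\n"),
    ("login", 10, "PermitRootLogin no\n"),
]


def pick_variant(variants, client_groups):
    """Return the content of the highest-priority variant that applies to the client."""
    matching = [v for v in variants if v[0] is None or v[0] in client_groups]
    return max(matching, key=lambda v: v[1])[2]
```

With these assumed definitions, a client in the `login` group would be served the variant that disables root logins, while a compute node would fall back to the default variant.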
So we use HTTPS with SSL certs. The thing that we've actually found kind of tricky in cluster environments is that the bootstrapping problem is a real issue for people when they're doing multi-hundred-node installs. And so we've basically implemented everything up to client-side certs. So clients are authenticated based on their certs, and the server has a cert, and everything is signed by a CA, and so forth. But there are also lower modes of security that can be used in situations where they are appropriate. So it's sort of the best of both worlds.

So, Bcfg2... you were talking before about a cluster environment. I assume that also translates well into other kinds of data center applications. But do you also target the desktop environment, as long as the machines are of the POSIX flavor?

Yeah, actually, originally when we developed Bcfg2 we were thinking mainly about a cluster environment. But once we had it working well in our clusters, we found that it actually satisfied most of the requirements that we had in our desktop workstation environment as well, and so we've deployed it across everything at this point. We've actually been really surprised by the wide range of environments that Bcfg2 has been deployed at in production. I mean, we see basically everything from web-style shops to cloud stuff to finance and other stuff in the national labs and clusters, and then every once in a while you see that guy with 20 machines that runs Bcfg2. So we really get the whole range.

Okay, I'm actually curious about the scale of loading some of these systems. Does the Bcfg2 server itself actually host the system-native packages? Like, would I actually dump the RPMs in and Bcfg2 moves them to the client, or is that handled some other way?
Well, so generally that's handled through the underlying package management plumbing, and so Bcfg2 doesn't provide the transport for that. In some cases those resources are co-located on the Bcfg2 server, but that's not universally the case, and if you need more bandwidth you can always move them to a different location.

Okay, through URLs and things like that. Okay. Yeah, because that was going to be the next question. I mean, what do you do when one loading server for doing, you know, a thousand machines at a time is no longer large enough? So, for example, if I'm running RHEL, Bcfg2 would just tell the clients, "use this yum repo"?

Exactly.

Okay, and I can just set up however many yum repos, and they could be a local NFS mount or HTTP or anything, right?

Yeah, exactly.

So completely flexible, my choice.

Yep.

All right, so that actually scales out pretty well, because then your Bcfg2 server is really just a metadata server. You're sending out instructions and config files, and relatively speaking that's probably not a lot of data, and you could scale up to a very large number of clients. Is that a correct assumption?

Yeah, these configurations tend to be relatively small. Of the volume in the configuration specifications, the thing that we've seen that usually tends to be the largest is, in large environments, your SSH known_hosts file gets pretty big. But we're still talking on the order of maybe a megabyte of data per client.

Okay, and so the bulk of the data movement is really through yum or YaST or something else, and you can set that up in whatever scalable fashion you need for the number of clients you've got. Is that kind of what people tend to do?
Yep.

It also makes it really easy to start integrating Bcfg2 into an environment where you are already using yum or apt or whatever you have for the underlying package management. You can easily bring Bcfg2 in and define all of your packages, and since it's running on top of the same thing, you're not actually changing anything on the machines. You've just got an extra layer to help you manage them.

So then how does Bcfg2 differ from something like, say, Red Hat Satellite Server?

Well, the major way that Bcfg2 is different is this validation model that I talked about a little bit earlier. Many of the other configuration management systems that exist out there, including the Red Hat up2date kind of stuff, are mainly imperatively driven. In some cases that means that there are explicitly commands that are pushed out and executed on clients, or that the semantics of the protocol between the clients and servers are functionally imperative in nature. Bcfg2 is a declarative tool. So the specification that gets passed to clients is a set of descriptions of the way the machine should be configured, not a set of steps to make it be configured in that fashion, if that makes sense. Basically the idea here is to let the Bcfg2 client handle the hard problems of figuring out the transitions between states, and that way, if you have a client that is in an unexpected initial configuration, it's easier to robustly handle the reconfiguration process.

So it will verify... right, but... Okay, so Bcfg2 will verify.
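The declarative model described here, where the server sends a description of the target state and the client works out the steps, can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: state is reduced to a dictionary of package names and versions, and the function names `plan` and `run`, the action verbs, and the `remove_extra` switch are all invented for the example rather than taken from Bcfg2.

```python
def plan(desired, current, remove_extra=False):
    """Compare desired and current state and return the actions needed to converge."""
    actions = []
    for name, version in sorted(desired.items()):
        if name not in current:
            actions.append(("install", name, version))
        elif current[name] != version:
            actions.append(("change_version", name, version))
    if remove_extra:
        # Entries present on the client but absent from the specification.
        for name in sorted(set(current) - set(desired)):
            actions.append(("remove", name, current[name]))
    return actions


def run(desired, current, dry_run=True, remove_extra=False):
    """Return the planned actions; apply them to `current` unless dry_run is set."""
    actions = plan(desired, current, remove_extra)
    if not dry_run:
        for verb, name, version in actions:
            if verb == "remove":
                current.pop(name)
            else:
                current[name] = version
    return actions
```

Because the plan is computed from whatever state the client is actually in, the same specification works for a freshly installed machine and for one that has drifted, which is the robustness argument made above. In dry-run mode the same comparison doubles as a verification pass.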
So that's actually a nice thing, because... can you actually run Bcfg2 almost in a Tripwire kind of mode, like run it from cron and notify yourself when something's not quite right?

Absolutely, yeah. So this is actually a really interesting thing. Someone pointed out early on that you can actually run Bcfg2 without any configuration specification and it can tell you things about your machine, right? Because it supports these discovery mechanisms, it can look at things like what services are enabled on which systems, and which packages are installed on which clients, and things of that sort. So yes, you can kind of use it like Tripwire. The thing that makes it slightly different from Tripwire is that it doesn't go down to such a low-level security focus, right? We try to keep things at kind of a high level, to provide useful data to sysadmins. So if someone goes out to a machine and upgrades a version of a package, you'll see that the version of the package was upgraded, not that 852 files were changed.

Okay. But what if that machine is not supposed to have a package on it? Will Bcfg2 actually remove it?

It can. So it will detect it, and depending on the mode that you run the... I mean, this is one of those things where we find that there is near-violent disagreement between our users. Some of them absolutely want the Bcfg2 client to remove those packages, and other ones are completely terrified of Bcfg2 removing those packages, and so both modes are supported.

Okay. Here's another common configuration issue that I would imagine comes up. What happens when you've got the smart local administrator, right? So let's say you're doing... well, it could even be an HPC cluster, but it's probably more apt for a desktop kind of environment, where somebody locally installs their own RPM, and possibly it even conflicts with something that Bcfg2 said should be on there.
What do you do?

So there are a variety of different modes that you can deploy Bcfg2 in. As you guys mentioned earlier, one of the actually really popular ways to deploy Bcfg2 is to run it in dry-run mode most of the time. A secondary part of the system is a reporting subsystem, and basically the way that this reporting subsystem works is that every time a client phones home, it gets its configuration specification, it figures out the actions that it could take, and it may perform some of those actions depending on the mode that it's running in. After it's finished doing whatever it's going to do, it uploads its current conformance to the configuration specification back to the server, and this information is stored in a database. And there's a web front end and all that kind of stuff. This data that gets uploaded contains everything from extra packages that may be installed, to incorrect versions of packages that may be installed, to diffs between the versions of config files that are on the system and the version that the Bcfg2 server thinks should be on the system, right? And so this is actually very close to the conceptual notion of a configuration diff that I talked about earlier, and there are actually tools on the server side to take that and render it into configuration rules on the server side. That's a process that requires the use of an administrator's expertise in order to figure out why that change was made and which client group it applies to, so we don't generally do that in any sort of automated fashion, but there are programs to streamline that process.

And so there are a variety of sorts of configuration patterns that you might want in your network. The one that everybody thinks of to start with is kind of a star pattern, where you have one source of truth, if you will, and configuration changes get pushed out to all clients from there. In some sense this is the simplest workflow to
support, because you don't need to worry about pre-existing state and capturing any data. You just need to worry about turning the pre-existing state into the desired state. Another model that you could have is the exact opposite of that, where the Bcfg2 client runs in dry-run mode, collects all sorts of information about the client configuration, and sends that to the server, where it can then be processed and merged into the configuration specification by an administrator. So if you basically have a group of development machines where people are making deliberate changes to configurations all the time, you may want to run in that mode. It'll probably be system-administrator intensive to do that, but all the plumbing is in place to do that, and to support combinations of those two models, where you have particular clients that changes are being made locally on, and those changes get pulled back upstream through an automated or automation-assisted process.

Okay, so you've hinted, actually even earlier in this conversation, at applying the diffs of configuration information, and I think from what you're saying it sounds like you're examining a wide variety of things, and you have a smart client that knows how to apply the diff. So for example, you look and you say, "oh, there's libfoo version 1.2.3, but the configuration says I'm supposed to have version 4.5.6." Then you'll do an rpm -U or whatever is appropriate to do that, and since you know this is an RPM, it'll do an RPM update. But if it's a local configuration file, you either copy a new one or apply a diff to that file, and then kick the server to restart it or something like that. Is that about right?
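The conformance data that a client uploads in this discussion (extra packages, incorrect versions, and diffs of config files) can be pictured with the small Python sketch below. It is an assumption-laden illustration, not the Bcfg2 reporting schema: the function name, the dictionary keys, and the `server:`/`client:` labels are invented, and only the standard-library `difflib.unified_diff` call is real.

```python
import difflib


def conformance_report(desired_pkgs, installed_pkgs, desired_files, actual_files):
    """Summarize how a client differs from its specification (hypothetical format)."""
    extra = sorted(set(installed_pkgs) - set(desired_pkgs))
    missing = sorted(set(desired_pkgs) - set(installed_pkgs))
    # Map package name -> (installed version, version the server expects).
    wrong_version = {
        name: (installed_pkgs[name], version)
        for name, version in desired_pkgs.items()
        if name in installed_pkgs and installed_pkgs[name] != version
    }
    file_diffs = {}
    for path, want in desired_files.items():
        have = actual_files.get(path, "")
        if have != want:
            file_diffs[path] = "".join(difflib.unified_diff(
                want.splitlines(keepends=True),
                have.splitlines(keepends=True),
                fromfile="server:" + path,
                tofile="client:" + path,
            ))
    return {
        "extra": extra,
        "missing": missing,
        "wrong_version": wrong_version,
        "file_diffs": file_diffs,
        "clean": not (extra or missing or wrong_version or file_diffs),
    }
```

A report like this stays at the level a system administrator cares about: it says a package version changed or a config file drifted, rather than listing every file that was touched.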
I mean, you have these drivers that you mentioned earlier for the different packaging systems, and they all kind of have a general engine that they follow for upgrading and removing and editing and gathering and things like that.

Sounds like you're ready to become a Bcfg2 user. You've got it all down.

It's because I have to manually administer my own cluster. I get more exposure to system administration than I really should.

So yeah, that is the general way that things go. The server has an idea of what versions of packages are available, what the latest one is, and everything. The clients have their knowledge of what they have. They ask the server, "what have you got?", and it sends that back. On the clients, if it's installing a package, it'll do the RPM upgrade, the install, whatever needs to be done, and then anything that has changed due to that will get checked one more time. The config files that go with those might have changed, because the package might have obliterated what you have there, so it'll check that and make sure that the config files associated with that bundle are actually correct. It'll check the services and restart them if it needs to, if you're running an SSH or NTP bundle, things like that, where you need to restart the server when a new package or new config file comes. And then it reports back to the server, "I'm all clean," so you can look at it in your reports and know that you've got a nice clean set of machines on your network.

So I have a different question. We've been talking about diffs between what the server expects and what the client has. What about within the server? For example, I have a set of updates I want to apply to a bunch of login nodes. I apply them,
I apply them I find they're incorrect as be config have the idea of go back to the way everything looked 12 hours ago so generally the way that we handle that is we keep the repository under some sort of version control system and Going back from the perspective of the server is a matter of changing the specification from version acts of you know package libfoo to version y of package libfoo and Then we count on the underlying package management service management tools on the client side to do the requisite downgrading and so I Have kind of a funny story. This is actually back when we were working on be config one Someone messed up a configuration for a machine that was running a test of Debbie and Sarge and The production environment here was Debbie and Woody at that point and the rules on the server side got changed So it says basically describe this system as no longer running Debbie and Sarge but running Debbie and Woody and the be config client Beautifully downgraded all the packages and when we next logged into it. It was running Woody again You know, this is a kind of an outlier But the the server supports everything that it needs to be able to do that kind of thing Cool, okay, but I can't look inside be config and actually say okay Here's the way it looked you're relying on the revision control system. Just kind of keep track of your be config configs Well, so we rely on the the version control system to keep track of the repository state The reporting system actually keeps historical data about the state of the clients at a given point in time And so you can actually go and reconstruct if you realize for example that the machines weren't clean at the time of The upgrade and they had the incorrect version of lip-foo. You could then go and look at the reporting database and figure out Which version they had been running. 
It's more of an auditing system.

Okay. Well, that makes perfect sense to me. I mean, it sounds like you're using building blocks of other tools where it's appropriate, because for as many different version control systems as are out there, there are a lot of different philosophies on how to do it. You want a central repo, you want a distributed repo... and also, why duplicate that functionality in Bcfg2 when there are a bajillion other tools that do it very well? I would imagine you also have to resist the kind of feature creep, to keep it a configuration management tool and not, for example, a health monitoring tool. I could see an easy temptation to say, "oh well, I can put network checks in while I'm checking all the software, and make sure that the InfiniBand or iWARP or Myrinet or whatever it is you've got is up and running properly, and put that in the database too." Do you get feature requests like that from time to time?
There is. I mean, one of the mechanisms that the client supports is the use of client-side probes. Probes are basically a way for a client to assert information about itself in the Bcfg2 metadata that you can then use for configuration generation. And this is everything from... you might want to probe the system architecture, or probe the hardware configuration of machines, or things of that sort. This happens before the configuration specification is generated. People have done all sorts of interesting-slash-horrific things in there to pass information around, in different cases that might be better served by a monitoring system. But one of the things that I've really learned over the last seven or eight years, through the process of working on Bcfg2, is that it's really hard to come up with hard and fast rules that describe all system administration situations. There are a lot of weird requirements out there, and sometimes having a full-fledged monitoring system isn't necessarily possible for some reason, and so in a pinch it can be used. I wouldn't recommend it, but I'm sort of nervous about saying that you absolutely shouldn't do it either.

That's fair enough. I was on mute so you didn't hear me, but I laughed very loud when you said "interesting-slash-horrific." You know you've made it, right, when people use it for something that (a) you could not have predicted and (b) you're not quite so sure about.

When people start using Bcfg2 as an email client, then you will have made it.

Yes. So, a follow-up question on that. You mentioned these probes and things like that. How are probes implemented? Do you have them as plug-ins? Is it easy for people to add their own probes? Like, say I have either a wacko package manager, or I only want to apply this configuration if you have a particular kind of hardware, or something along those lines. Is it easy to extend like that?
So the probes functionality is very easy to extend. It's just a shell script that you write, that you want to be run on your clients, and whatever your shell script returns, whatever it sends out to standard out, gets pumped back up to the server. Then on the server side you can use templates or other plug-ins to chop that up and do whatever you want with it. And so it could be as simple as your probe going off and asking what video card is installed in your machine, so that you can decide whether you want to install the NVIDIA drivers, the ATI drivers, or drivers for whatever weird piece of hardware you have, up to asking it for a full hardware inventory, or what its uptime has been, or whatever you need, to do crazy things inside of the templating engine or the other plug-in engines on the server side.

So, these different classes that can inherit from each other: can I, say, in a larger site, have a group of sysadmins who are responsible for just providing stock Red Hat 5, and then delegate to other people? "Okay, you can play in your own little area over here and put your custom configs on top of this, but you can't touch these other groups." Or is that normally handled with multiple Bcfg2 installations?
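The probe flow described above, a shell script on the client whose standard output is handed to the server for interpretation, can be sketched as follows. This is a hedged illustration only: `run_probe`, `groups_from_video_probe`, and the `gpu-nvidia`/`gpu-ati` group names are made up for the example, and the real server-side handling in Bcfg2 (templates and plug-ins) is not shown.

```python
import re
import subprocess


def run_probe(script):
    """Client side: run a shell snippet and return whatever it wrote to stdout."""
    result = subprocess.run(
        ["sh", "-c", script], capture_output=True, text=True, check=True
    )
    return result.stdout


def groups_from_video_probe(stdout):
    """Server side: turn raw probe output into group names (hypothetical mapping)."""
    # Match whole words so that "compatible" is not mistaken for "ati".
    words = set(re.findall(r"[a-z0-9]+", stdout.lower()))
    groups = set()
    if "nvidia" in words:
        groups.add("gpu-nvidia")
    if "ati" in words:
        groups.add("gpu-ati")
    return groups
```

On a real client the probe script might be something like a filtered hardware listing command. The important design point from the interview is that the client only reports facts about itself, and the decision about which drivers or groups follow from those facts stays on the server.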
At this point a majority of that kind of stuff is handled with multiple Bcfg2 installations, but we're pretty sure that it could be implemented using version control system commit hooks. The basic mechanism that you would use for that is you would define a series of groups that correspond to the different support responsibilities, and static priority levels for them, and you would enforce that any check-in of new configuration rules corresponded to a group for which this user was authorized, at a priority that this group should always apply at. But effectively what this amounts to is process. When you actually start doing distributed stuff, you quickly end up in the weeds of figuring out what your process should be, who's actually responsible for what, and how this should actually work, and that usually ends up being more complicated than the technical aspects.
Okay. And this is on a different kind of point: do you see Bcfg2 used often on diskless systems?
We do, actually, in a bunch of cases. The major benefit is that it provides a fairly flexible mechanism to describe your different roles. So if you end up with a differentiated system, you still get a lot of mileage out of the fact that you can describe a relatively sophisticated configuration compactly. The only difference is that those configurations can be worked on offline. You don't need to actually do reconfiguration operations on the clients themselves.
Sort of similar to VMs. A reconfigure operation would just be a reboot then, right?
Exactly. Well, yeah, that's one way of looking at it.
Here's a question out of left field. What's the biggest installation of Bcfg2 that you're aware of? And when I say biggest,
I guess I'm asking in terms of number of clients.
Well, the Pleiades system at NASA is the largest one that I know of. That was, I think, number three on the Top500 list last November. That Top500 was actually a really good list for us, because we had two systems in the top five.
Oh, cool. How many nodes was that, do you remember?
Sixty-five, sixty-six hundred nodes.
Cool.
So the typical configurations we end up seeing are in the 300 to 2000 node range, and you can actually scale out to multiple servers and things like that if you need to.
Okay, actually, that's exactly my next question. When you say multiple servers, do you mean multiple configuration servers and metadata that federate with each other, or what do you mean by that?
Generally, you just synchronize their versions of the repository with your version control system, and then you can run multiple instances of the Bcfg2 server that different client pools talk to.
Oh, nice. Well, that's a good use of building blocks there. Okay, let me ask you another somewhat random question. These are coming from my notes on all the things that we've talked about, and I wanted to go back and ask more. What kind of tools does Bcfg2 give you out of the box? Let's say I've gone to deploy it. Presumably there's a rich set of commands that I can use, and you made a fleeting reference to a GUI web front-end kind of thing. What kind of tools would an administrator have to play with?
Well, when you initially deploy Bcfg2, what you have is a server-side daemon that reads a file system directory hierarchy that contains configuration rules, and then there's a client-side tool that can be used to validate the client configuration, install configuration changes, and then report its state up to the server. The next thing to set up is the reporting system, which is a database-backed web app that you can use to look at the current states of clients and their past states. Then there are also a variety of server-side command-line tools for interacting with the repository: verifying it, interrogating it about what it would do in particular situations, and things of that sort.
Okay, so actually I want to back up a little bit. When you said the largest install, you referred to the largest number of clients being managed. I'm actually curious, for a single Bcfg2 installation, about the largest number of classes, as in unique configuration setups for groups of hosts.
That's an interesting question. I don't know. I can say that for our configurations, or for configurations that users have shared, we've easily seen upwards of 75 or 100 profile groups, which are these classes of functionality that clients are directly associated with. Then there is the number of groups in these configurations, which basically correspond to aspects of the nodes themselves.
You can end up seeing several hundred in your repositories. Actually, getting back to the previous question that Jeff asked, one of the tools that we've got is a program that analyzes your Bcfg2 metadata and generates Graphviz output, which is then suitable for printing on your plotter for wall display, because generally there are several hundred nodes in your graph.
Fantastic.
So we've actually found that that tool is really useful for providing new administrators with a map of what's actually going on on the machines, how machines are similar to one another, and what roles they play.
I would imagine it's also excellent fodder for all the pointy-haired managers. They can say, "Hey, what's our machine look like?" and you can just point right at it and say, "There it is, sir," and they say, "Excellent," and walk away.
In some cases it can be used to scare auditors as well.
Yes. Yeah, I keep a copy of it lying around so that we can show it to the auditors, to show that, yep, we know what our machine is looking like and what all the machines are doing, and here are all the packages that are on them, and everything.
Okay, so what's the strangest use of Bcfg2 you've ever seen? I'm guessing, you know, some strange device out there that happens to be POSIX compliant. What have you actually seen people use it on besides a traditional desktop setup?
Well, I think Corey's going to give me a funny look when I start talking about this, but there's a machine that we built here that's kind of fun. It's a cluster that CS researchers can check out nodes on, and basically the nodes are built on the fly and configured for them. They have a variety of, not custom, but high-end networking hardware, InfiniBand, Myrinet, that kind of thing, and GPUs.
It's basically our hardware zoo, and we actually use Bcfg2 there to support handing out dynamic root privileges to the user that's been allocated the node, and providing custom configurations and things like that when they're needed. So that's an example of a highly dynamic environment that's been managed, and it's kind of a cool thing, at least for folks doing HPC systems research, because this is something that a lot of our users have said that they've wanted in the past. It's sort of like clouds without virtualization.
I think that the most unexpected, strange things that I personally run into are the interesting templates that people write up. Every so often I'll run across somebody who's written up some strange thing that I would never have thought of. People are using templates to do account management, to create all of their accounts and delete them when they need to, or to do static network configurations, so you don't have to bother with a DHCP server that might go down, or that you don't want in an environment. All of the sorts of things that come up when you've written a programming language, and people are going to do crazy things with it, fit in with the templating engine: people will put out strange configurations, generated in strange ways, that aren't always what you intended in the first place.
So one cool template that's an example of what Corey is talking about: a student this summer wrote a template that would go out to machines and make sure that the machines were only running the set of kernels that were specified on the server side, plus any kernels that they currently had booted, so that you would never end up in the case where you had a machine that was running a kernel that it no longer had modules for. He did this with a combination of templating and probes to figure out what the current kernel was. But you see
all sorts of interesting and weird stuff in the templating space. Corey's definitely hit that one on the head.
So this actually brings up a different idea, this system of having Bcfg2 dynamically change things. How do you talk to it? Because I could definitely see having Moab, a cluster scheduler, actually reading in that this job says "I need this configuration set," and it actually tells Bcfg2, "Oh, hey, blow away this node and install this class on it." Can you actually do stuff like that?
That's actually exactly what we're doing. It's not Moab; it's a resource manager called Cobalt that's written here at Argonne. We're doing that on this machine called Breadboard right now. Basically what happens is that when users schedule their job, they submit which OS it should run with, and at that point the machine gets provisioned. It doesn't actually get provisioned through Bcfg2. We were very careful to make sure that Bcfg2 didn't end up in that business, because provisioning machines is a really, really ugly process. Particularly if you support multiple architectures or change hardware at some point in the future, things get really hairy. So we've got a really thin provisioning system that uses SystemImager to push out node images, and that gets called dynamically when a job starts. That SystemImager image actually calls into Bcfg2 in the same way that you would from Kickstart in the post-install.
Okay. So what's coming in the future for Bcfg2?
Well, we're literally, hopefully, a couple of weeks away from a 1.0 release candidate, and 1.0 is substantially better than the current production release of Bcfg2. It's considerably faster, and it supports a lot more flexible things. I've described the metadata system in a little bit of detail. Basically all the things that we've talked about so far are introspection, so that a client's configuration reflects things about that client. A lot of the new capabilities that we've added in 1.0 have to do with providing constructs so that templates can reflect not only things about the client, but things about other clients or collections of clients. For example, one of the things that you can do with 1.0 is say that, when you're configuring NTP, you should configure all NTP clients to point at all the clients that are currently NTP servers, and things of that sort. Someone down the hall here has written a template that completely configures Ganglia, either for multicast or for unicast, using templates and client group memberships, and everything else happens automatically.
Cool.
So yeah, we're really excited about 1.0. It provides a more interesting level of flexibility, where you can really describe not just commonalities in configuration but your overall patterns of configuration across your machine. We're working up an example repository for clusters specifically, which we're trying to make suitable for all of our clusters here in MCS, and hopefully most of that will be suitable for publication. You need to worry sometimes, but we can get most of the private stuff out of it pretty easily.
I just don't share the config files that have the important stuff, you know, the secrets.
So what license do you guys release this stuff under?
Bcfg2 is BSD licensed.
Okay, and what's your development model for this?
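The cross-client NTP case described above can be sketched roughly as follows. This is a plain Python illustration under stated assumptions, not the Bcfg2 1.0 template syntax or API: the function name `ntp_conf_for`, the `ntp-server` group name, and the shape of the `groups_by_client` mapping are all hypothetical. The point it shows is that the generated file for one client depends on the group memberships of every other client.

```python
def ntp_conf_for(client, groups_by_client):
    """Render ntp.conf 'server' lines for one client.

    groups_by_client maps hostname -> set of group names for every client,
    which is the cross-client view the template needs. The "ntp-server"
    group name is an assumption made for this sketch.
    """
    servers = sorted(
        host for host, groups in groups_by_client.items()
        if "ntp-server" in groups
    )
    # A server should not list itself as its own upstream.
    return "".join(f"server {host}\n" for host in servers if host != client)


if __name__ == "__main__":
    # Hypothetical group memberships for three clients.
    metadata = {
        "ntp1": {"ntp-server"},
        "ntp2": {"ntp-server"},
        "node01": {"compute"},
    }
    print(ntp_conf_for("node01", metadata), end="")
```

With this shape, adding a host to the `ntp-server` group changes the generated configuration of every other client on its next run, without anyone editing a list of server names by hand.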
Are you the two main guys, or do you have people who submit patches to you? How do you do that?
So I'm the development lead. There are about a half a dozen people that send in patches periodically. It's actually really interesting writing software for system administrators, because, at least I've found, system administrators tend to be very concerned about things that will break on them in the middle of their production operations. So our goal is actually that users should not be writing code. That said, we frequently have people come in and write new features or submit bug fixes or things like that, but I just want to emphasize that that's not part of using Bcfg2 on a regular basis. The development community consists of me and probably three or four other people that I get patches from on a regular basis, and then an irregular group of three or four more that I get patches from from time to time. The user community, I don't know. We've got an IRC channel, and I think there were 42 people in it earlier today. So Bcfg2 is used by a moderate number of sites. I don't really have a sense of scale, and since Bcfg2 is included with a bunch of Linux distributions, we don't actually know who's downloading it and using it. Oddly enough, I only found out that NASA was using it for Pleiades on the SC show floor last year. So you don't always know who's using it for what, and I've found that sysadmins, in a lot of cases, particularly if they're in commercial environments, are hesitant to actually share what they're doing.
So we don't actually have very good data, but we know of easily upwards of a hundred deployments of Bcfg2 in a variety of environments.
Yeah, this is a pretty common answer we get back from a lot of open-source projects, and we have the exact same issue in Open MPI. If you ask how many users Open MPI has, I couldn't tell you, for all the exact same reasons that you just cited. So, a random question, since I'm a developer myself, and I ask this of all the open-source developers that we interview: what version control system do you use, and why?
Well, originally, back in the old, old days, we used BitKeeper, and we liked that a lot until the license got difficult for us, being at the lab.
Right, until it was no longer cool, unfortunately.
Right. Then we switched to Subversion, largely because of Trac, and so we've been using that, though at this point I'm using git-svn to commit into the Subversion repository, and I'm doing all of my development in Git, as are at least one or two of the other core developers. So I think that we're in the process of transitioning to Git, but we haven't quite gotten there. Git isn't supported where we are hosting our code right now in MCS, and as that changes I suspect we will be migrating.
Okay, guys, so how about some contact information? Website, IRC channel, where can people find Bcfg2?
So we are at bcfg2.org, B-C-F-G-2 dot org, and we are in #bcfg2 on Freenode.
Okay. Well, thank you very much. This was really informative, and actually I'm going to have to check this out myself. Thanks a lot, guys.
Thank you. Appreciate that.