Hey guys, Brock here. The following is a joint podcast between the RCE crew and the Food Fight Show, which is a podcast that focuses on people who use Chef. The first half of the show is going to be us talking with them to explain what the needs of the HPC community are, from my view as an admin and from Jeff's view as a developer. In the second half we get into the details of what Chef is, for us and our listeners to understand. So, hope you enjoy the show.

Hello, and welcome back to the Food Fight Show. I am your co-host, Nathen Harvey, and with me today as usual is my co-host Bryan Berry. Bryan, how are you today? Oh — Bryan, the mute button. Don't forget the mute button when typing. So I'm coming to you today from Annapolis, Maryland, where it is sunny and beautiful. Bryan is coming to us today from Rome, Italy, where it is mute and quiet.

So maybe Bryan will fix his audio issues here in just a moment, but while he does, I wanted to share some information for you — some Chef news. Actually, it's not necessarily Chef news, but those of you who are listening: if you're listening to the Food Fight Show, I bet you have something to say about DevOps, or system administration, things like this. We have some great conferences coming up where you should submit talks, or at least plan on going. There are a bunch of conferences coming up around DevOps Days, and a lot of them have CFPs that are closing very, very soon, so get your talk submitted as soon as you can. The DevOps Days with CFPs closing soon include India, London, and Portland, and probably a couple of other locations as well — check the devopsdays.org site. Also, the CloudStack Collaboration Conference in Europe has an open CFP that's going to be closing soon, so if you're doing some stuff with CloudStack, maybe that's worth checking out. Bryan, can we hear you yet? Yes, we can, and the CFP for one of my favorite conferences is coming up.
That is FOSDEM, the Free and Open Source Developers' European Meeting, and that's in Brussels, I believe February 1st and 2nd. The call for proposals for the main track is due, I believe, October 1st, and the call for the dev rooms — to organize a dev room — is October 15th. Is that correct, Nathen? That sounds about right. The dev rooms — I know there are always lots of dev rooms that are a lot of fun. Bryan, you and I participated in a dev room last year, right? Oh, we did. It was crazy fun. Yeah, and it's just neat, because we go to US conferences, and they're fun — you get people from all over the US — but you get a much wider spectrum of people in Brussels.

Yeah, the thing about FOSDEM that I remember most is that it was huge. Oh my gosh, it's so, so big. But the other thing that was cool was the config management room: it was overflowing the entire time. That's both cool and sad, because it meant some people couldn't come in and hang out with us and talk about config management. A couple of folks have come up with a solution for that this year. You know what I'm talking about, Bryan? I do know, and that is Config Management Camp, organized by our good friend Kris Buytaert, which will be immediately after FOSDEM, for two days. So you can get a little bit of configuration management in at FOSDEM, and then follow on for two whole days.

What I like about this is that Kris has been doing Puppet Camps, which is great, but I think he noticed, as many of us have also noticed, that whether it's Puppet or Chef, we're solving a lot of the same problems, and it makes sense to have a broader topic. Absolutely. And so, yeah, I'm working with Kris and a bunch of other folks to help organize this Config Management Camp, the two days after FOSDEM. I think maybe we should have Kris come on the show and talk a little bit about it in the next episode or two. What do you think? Oh, definitely. I agree.
I also think that for a lot of Americans it may feel like a long trek, but it's a really great opportunity to come to Europe and see and do something technical, but at the same time go to a new place and meet some new people. Absolutely — and drink some fantastic beer. Good enough for me.

All right, enough of conferences. We're here to talk about Chef and HPC today. This whole show, I think, kind of spawned out of an email that came onto the chef-dev list from a couple of guys over at RCE-Cast. So I'd like to welcome them today. We'll start with you, Brock. Welcome to the show.

Hi, thanks for having me. Forgive me — I'm still recovering from a little bit of an illness here, so I may sound a little funny. To introduce myself: I am Brock Palen, and I am one half of RCE-Cast; the other half is going to speak here in a minute and correct everything I say that is wrong. RCE stands for Research, Computing and Engineering. I personally reside at the College of Engineering at the University of Michigan — I'm an alumnus, a former student employee, and now a full-time employee there, where I am one of the admins for the high-performance computing group. I had this idea to start a podcast, which became RCE-Cast — that's rce-cast.com — where we like to explore and find out about things going on in our community. It's a relatively small community, but there's a big split between the scientists and, you could say, the administrators, and we like to keep track of everything that's going on in terms of best practices and what new things are out there. Excellent. And Jeff, introduce yourself.

Hi — hopefully you guys can hear me. I'm a first-time Hangouts user, so hopefully I'm doing this right. My name is Jeff Squyres.
I used to be the MPI guy at Cisco. MPI is kind of the lingua franca of what's used in these high-performance computing codes; it's a middleware library, and MPI stands for the Message Passing Interface. But now I'm actually one of the MPI guys at Cisco — we have tripled in size, so there are three of us here. It's actually really cool to be working with a bunch of other MPI guys at Cisco.

Brock came up with the idea for RCE-Cast several years ago, and I think he hit most of the high points there. The idea is that, as a system administrator, Brock gets exposed to different things than I do, because I'm a developer: I develop the tools that people use, but I'm not a user. So there are at least these three different communities that we try to bring together. We talk to different administrators, we talk to different high-performance computing projects, we talk to developers, and we try to make sure that everybody knows about some of the cool new things that are coming that people aren't necessarily aware of. Basically, what it comes down to is we find cool people and talk about cool topics, and sometimes it's not directly related to HPC but to the infrastructure around HPC — which is how we got turned on to Chef, and that's how we reached out to you guys.

Great, great. We also have Ben — Ben Cotton, my colleague and, I think, Condor-in-chief at Cycle Computing. Ben, can you introduce yourself? I'm Ben Cotton, and as Bryan said, I am the resident Condor expert — or I should say HTCondor — at Cycle Computing. I previously worked at Purdue University in the research systems group; I guess I'm Brock's counterpart, at a better Big Ten institution. All right, so we get to fight today on the Food Fight Show. That's awesome.
I brought my knife. So, you guys can argue over obscure American sports. Go Irish. No idea what you guys are talking about.

And we have, as well, the wonderful Matt Ray, a longtime contributor — great to have you here, Matt. Also representing the University of Texas: I actually got to work on what was Ranger at the time, which is the HPC cluster here at UT, though I haven't done any HPC stuff in ten or fifteen years. But I do work with a lot of people with some very large deployments, and hopefully I have something useful to add to the conversation.

Yeah, and before we get on — this year the University of Maryland, where I'm an alumnus, bought their way — I mean, moved — into the Big Ten, so I have to come and represent. Go Terps. You're talking about the Big Ten of supercomputers, right? Right, the Big Ten of supercomputers.

So, we keep using this word, HPC. What does it mean? Brock, define HPC for us. Yeah, okay, I'll start and everyone can fill in the holes I miss. So, HPC — it depends what you want it to be. The most common case we see is things that are horsepower-bound: scientific applications, or business applications. We're seeing more business applications that are limited by the amount of horsepower, either in a single system or just that sort of thing. So we tend to have large stacks of machines. I have about 1,500 nodes here at Michigan, and we're actually a relatively small deployment — Purdue installs that many every year, and we do it every four years. And then places like UT have these gigantic national-lab setups, where they've got some of the largest machines in the world, with over-100,000-processor-type systems and things like that.
So that's the traditional type of system, where you have a lot of CPU horsepower. And then it's kind of interesting, with a Condor guy here: there's also high-throughput computing, and I'd like to include those guys in high-performance computing. They're the large-scale serial job farm — the need to get a lot of small tasks running quickly, as opposed to one large, tightly coupled task. So there's no parallelism in the sense that you need an MPI library for communicating between nodes, but you have this need to maybe run a million of something. Say you have a million little images of the sky from a telescope and you want to run a little analysis on each of them. You could farm those out on something like the Open Science Grid, or the Purdue Condor pool, using what we call high-throughput computing. That's still very near and dear to us in traditional high-performance computing.

And then there are people who just need more memory. I mean, I've got machines with a terabyte of RAM, and people get on machines with 30 terabytes of RAM in a single system image. Most people are happy if their server has 128 gigs, or 512, and we have people yelling at me that a terabyte is not enough system memory — and I tell them to just parallelize their code and use multiple nodes. So this is generally the case: CPU horsepower, memory horsepower, throughput horsepower. If it's bigger and you can't do enough, come and talk to us and we've got a way to get it done.

So what distinguishes traditional HPC from something like Hadoop? Okay — traditional HPC is what I would consider CPU-bound; Hadoop is what I would consider data-bound. We have a small Hadoop setup here.
We're exploring it more for people who have data-intensive applications. There are two fundamentally different ways: on a traditional HPC cluster you have a batch queue where people submit work, and when resources become free we run the application on that resource, and we move data from a file system to where it's running. Hadoop takes the opposite approach: the data are already distributed on all the compute nodes, and we move the computation to where the data are, because moving the computation is the cheaper thing. So, data-intensive as opposed to computationally intensive.

So I actually want to dig into this a little bit. Are there HPC systems now that are more and more doing things like Hadoop — that is, moving the computation to the data? It seems like it would be quite an expensive task to always move the data. Well, I would say the lines are becoming pretty blurred here, actually. I think Brock's analogy of moving the compute to the data versus moving the data to the compute — well, moving the data to the compute has been the way people have been doing it for 30 years, right? That's the HPC way. The Hadoop way is only a couple of years old, and it started with this idea of: I have so much data that I can't move it, and therefore I want to bring my compute to the data rather than the other way around. But the Hadoop community is still fairly immature. They're starting to realize, oh yes, now we can do these problems that we couldn't do before — but they're also starting to realize they don't actually perform that well. So, yeah, I can do it, but it takes a week to compute, and that's better than not being able to do it at all, but really I'd rather be able to do it in an hour or two. And so there are some interesting research and industry experiments going on right now, seeing what we can blend between the two.
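The data-versus-compute tradeoff described here can be sketched with some back-of-the-envelope arithmetic. All numbers below are made up for illustration, not from the show:

```ruby
# Toy comparison of the two models: traditional HPC ships the data to a free
# node; Hadoop-style systems ship the (tiny) program to where the data lives.

DATASET_GB = 10_000.0   # data to be analyzed (hypothetical)
PROGRAM_GB = 0.001      # size of the analysis code itself (hypothetical)
LINK_GBPS  = 1.25       # a 10 GbE link, expressed in gigabytes per second

move_data_to_compute = DATASET_GB / LINK_GBPS   # batch-queue model
move_compute_to_data = PROGRAM_GB / LINK_GBPS   # Hadoop model

puts "ship the data:    #{move_data_to_compute} seconds"   # 8000.0 seconds
puts "ship the program: #{move_compute_to_data} seconds"
```

Once the dataset outgrows what the network can move in reasonable time, moving the computation becomes the only practical option, which is exactly the assumption Hadoop was built on.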
What can we learn from the HPC community and apply to big-data kinds of things, and what big-data techniques from the last couple of years can we apply to HPC? So I don't think there's a definitive thing that can be said about what big-data stuff we've done, because we don't know yet — everybody's still exploring. That's a weasel answer, but it reflects reality, at least.

I just want to put it out there that the biggest pain in my ass in dealing with HPC systems is NFS. Continue the conversation. I can give you some things that are common in the HPC world that will make NFS seem like a cakewalk — but we use them for a reason. There are actually a lot of things, if you want to talk about what a traditional HPC infrastructure looks like, that we use that are weird — things that your general IT shop, even a large-scale one like a web shop that may have something that looks like a compute cluster, won't have. We'll have these parallel file systems — Lustre, Gluster, PanFS, GPFS — where I can write to a single namespace at a gigabyte a second, or a DVD a second, about five gigabytes a second, and my Lustre file system is very small. Matt mentioned that he worked on Ranger; Ranger had this very large Lustre file system that was doing tens of gigabytes a second. And now we have the Sequoia file system, which is 40 petabytes, single namespace, one terabyte per second, and this is completely outside the scope of any type of — and this is a POSIX file system, right? This is not an object store. This is a POSIX file system, which is really weird and odd. And then we get into these weird networks.

POSIX, yeah. You know, the world that Hadoop grew up in — the big prevailing assumption with Hadoop was that the network was crap, right? And that is not necessarily true anymore. The Hadoop way was: let's just write everything to disk, because that's
universal. And whether that was a network file system or a local file system — it was almost always a local file system, but they even used it to cross between nodes sometimes. That's just not a correct assumption anymore. If you look at the HPC way, we assume that the network is awesome, and we do things at the expense of the network, because we have super low latency and super high bandwidth. And so that's what I was talking about with the meshing of the two ideas — where are we going to fall within that? I don't really know, but I thought that was a point worth bringing out. Yeah, Jeff's been working on some really interesting stuff with the Cisco gear.

My local cluster — the one that has my best network — is 14,000 processors on a 40-gigabit-per-second InfiniBand network. So every node has a 40-gig link. Now, it's not 40 gig for data; it's more like 32 gig of data. Yeah, but that's still nothing to sneeze at. Yeah, 32 — I mean, you're still looking at roughly three gigabytes per second of capacity for every server, and I've got 14,000 cores all on that network. And these aren't the uplinks; these are host-level connections that are three times faster than the 10 GigE that most places still only use for uplinks. And this same network has sub-10-microsecond latency, and we do direct memory access over the network. There are a lot of weird things we do in the network, and in HPC the assumption that the network is awesome is totally true.

And when we say sub-10-microsecond latency, that's a half-round-trip ping-pong through the core, right? So it's actually less than five — in fact, depending on what you're doing, it gets all over the place, so sub-10 is the safer thing to say. People will quote you sub-2 and things like that. But you're talking really significant latency differences here, and performance and everything else.
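The 40-versus-32-gigabit point comes from line encoding: QDR InfiniBand signals at 40 Gbps but uses 8b/10b encoding, so only eight of every ten bits on the wire carry data. A quick sketch of the arithmetic:

```ruby
# QDR InfiniBand: 40 Gbps signaling rate, 8b/10b line encoding.
signal_gbps = 40.0
data_gbps   = signal_gbps * 8.0 / 10.0   # usable data rate in gigabits/s
ceiling_gbs = data_gbps / 8.0            # theoretical ceiling in gigabytes/s

puts "data rate: #{data_gbps} Gbps"      # 32.0 Gbps
puts "ceiling:   #{ceiling_gbs} GB/s"    # 4.0 GB/s per host link
```

Real applications land below that 4 GB/s ceiling once protocol overhead is paid, which lines up with the roughly 3 GB/s of achievable per-server capacity quoted in the conversation.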
So these are some of the things we do that are different.

Interesting. So it sounds like most of the people that you talk to on your podcast are using their own hardware rather than the cloud? Right now, yes, but we're seeing more of it. We see it more for things like your HTC-type work, because of the network, and the cloud not always being deterministic, or reliable, or performant, honestly. So there's a lot of fear of virtualization, because virtualization steals performance, right? Right off the top you're going to lose — I'm going to make up a number and probably make all the virtualization people angry at me — but let's say you lose five to ten percent of your performance CPU-wise, and at least another five to ten percent off your network performance, just because of the nature of how virtualization works. And traditionally, HPC shops have been all about the cycles, right? I want to get the maximum performance that I can. I run my HPC servers at a hundred percent. We don't have idle time.
So the assumptions that many people make about virtualization Don't really hold for hpc where you're you're taking advantage of all that idle time on the server to run You know a different vm and things like that that is just not uh, not what we do in hpc All these codes are designed to use all the cores Lots and lots of memory and a hundred percent of of the cpu now that being said some people are exploring the use of Virtualization because they want to do things like oh my code runs in red hat six your code runs in Slash, whatever your code runs in ubuntu, whatever and so I want to give you your custom setup Um and virtualization is a convenient vehicle to do that sometimes But it's got this baggage that not everybody has really Figured out how to really tweak the performance when you're running at a hundred percent yet And I think the interconnect speed is really kind of the big issue You know, it's hard to find somebody who's going to offer you 32 gigabit reliably in the cloud at this point Yeah, I have no doubt that that's coming You know, I'm sure there's a variety of vendors who would love that kind of business and a lot of cios Who would like to move the data center into somebody else's hands? But for the time being if your if your code is interconnect bounds and the cloud just isn't an option a lot of times And also getting your data to the cloud, right? 
So a lot of these HPC codes have petabytes worth of data that they need to crunch on, and so you might be down to the traditional analogy: dump a bunch of tapes or DVDs into a station wagon, or FedEx, and drive it to the cloud, because that's faster. Yeah — Amazon has that as a service.

So there are a couple of things I can tell you. At Michigan we're actually working with Amazon and trying to make that an option, but mostly for excess capacity. Two things: HPC systems, especially the larger ones, have always had some sort of batch queue system, which really makes the HPC system look like a private service cloud that provides a single service — we provide compilers and Red Hat 5 or 6, and you just go. And so, with the service I run here, researchers can come and lease CPU time from us on a monthly basis, and they get the InfiniBand network — the low-latency, high-performance network — they get our software library, they get our Lustre file system, and they get expertise, a lot like what Cycle Computing provides to people who want to use Amazon specifically, or some other cloud provider. I have over a hundred unique research projects and 500 unique users a month, and this lets me smooth out my usage. So my local system looks a lot like a private cloud, and we've already hit certain economies of scale, because we get academic pricing and things like that. Now, that doesn't necessarily translate to the business space, but we can look a lot like a cloud that provides a single service.

And that's a good point. From the standpoint of an individual researcher, the days of having your own cluster tucked in a corner of your lab somewhere — that's just not
I mean that's just not Manageable when you can have a central, you know university or business or you know, whatever infrastructure that you know Where you can share resources because not everybody's going to be hitting the cluster at the same time Right, but I still have the problem every now and then where even though I have um, you know You know this Into five digits in terms of processor cores I have a research group that likes to show up and use us for burst capacity And they want to use us for burst capacity at 2000 cores at a whack And that's a significant percentage of my system that I can't necessarily always provide that Um, most of the time I can um, but sometimes it'd be nice to be able to just Let a couple hundred cores trickle out to amazon and extend our network out to them because they are doing serial processing They don't need the high performance network. Well, actually they Do because they're a genomic shop and they're moving large quantities of data, but um, If I could only trickle a couple hundred out of this Right, right So I could what I feel like is a weak link in the current Um, current setup that you uh, you're describing is probably yourself That is it's In it's a very competitive job market that we see in the dev ops world and I would think it'd be very hard For universities to pay and retain Uh, really talented administrators Well, you'll notice I'm not at purdue anymore Exactly You know, but that's that's a problem. 
I mean, you know universities And you know, I'm trying to speak too broadly, but Um, you know, they have certain benefits that they offer that the private sector can't you know tuition discounts or you know stability or whatever But you know when when you're if you add a couple zeroes those those are overwhelmed Yeah, but you know on one hand if you have an offer from you know, uh, a national center out in california or You know University in the midwest and one of them's paying you twice as much and is a you know Much more exciting geographic location You know, I think the job market is it's definitely a uh It favors the practitioner more than the employer for hpc Yeah, I mean it quickly gets into calculus just like there is with with any job Kind of figuring out what you're doing, right? I mean the cost of living too at the university of michigan I'll wager is significantly lower than living in silicon valley, for example Um, and so, you know, there there's a lot of a lot of data points But I will say too that there is a dearth of qualified hpc system administrators Right, so brock is among the few the typical way that things are done is that, uh, you know Random chemistry professor is like, oh, I need a 32 node cluster to run my my codes So they get a bunch of funding they buy a bunch of stuff and then they say hey you graduate student Go run my cluster and you get a cluster that you know kind of sort of works on tuesdays Um and and finding somebody qualified to run that and keep it running You know as you dev ops guys know it's a complicated job And even though in some cases they run very well by themselves Every machine needs care and feeding and someone who actually knows what they're doing so Yeah Brock you're a rare guy Well, I can tell you though like my background my um, I was a student here in the engineering school. Um My background is nuclear engineering. I'm a scientist type thing. 
And this was a summer job for me. I was swapping dead fans and processors on a 32-bit AMD cluster, which at the time was on the Top500 with only 105 nodes — and now I've got over a thousand and I can't scratch the list. You're going to have to explain what the Top500 is there, Brock. Oh, okay — the Top500, top500.org. There's a standard benchmark for HPC systems — you can get into a gigantic religious fight about how valid the benchmark is — but it's the top 500 publicly acknowledged HPC platforms in the world ("publicly acknowledged" because I know some that are not on that list that would be). And the interesting thing is, some of them aren't HPC platforms. You'll go on the list and you'll see "unnamed internet services company." I doubt that's actually a traditional HPC system; I bet that's just a cloud platform they decided to run the benchmark on real quick. But it's an interesting data point for where these giant machines are and what they're being used for — and what "giant" means, in terms of cores, in terms of interconnect, in terms of RAM, things like that.

And when Brock and I say "machine," we're usually talking about a cluster of commodity servers of some flavor or another, right? So Brock was talking about how we had a bazillion cores, but we usually refer to that as one machine, because to the user that's really what it looks like. They submit their job into the machine, it runs for a while, and it comes back and gives them their results. Right. It's all about providing a specific service. A cluster normally consists of a head node, which will run some scheduler and batch queue software and some other administrative tasks. Or head nodes? Yeah, you could have a bunch of head nodes. If you were a better school, you would have a bunch of head nodes, right?
Yeah, right, right. So we have a head node, and we have a couple of administration nodes that run some auxiliary services, and then the main pieces. Normally, in a basic cluster, you can get by with a head node, which could also double as a login node — though normally you want your own login node — and then many, many compute nodes. These are the stamp-out-as-many-as-you-can-afford, lowest-possible-cost systems. And so this means they have one hard drive; the hard drive blows up, we don't care, we slap a new one in, we reload the box. Normally you don't have redundant power supplies. You don't spend money on that stuff, because no node is sacred. You don't care — you reload them. They need to all look the same anyway, and that gives us some flexibility. And really, from my point of view, we have three or four system images: a head node, a login node, and a compute node. So you kind of have these golden images that you have pre-built, right? Right. Many of the largest HPC clusters out there don't even have hard drives in the nodes anymore; they just boot off the network.

It's not just money — I mean, it takes power. Right, yeah. My system is relatively small. My data center has a touch screen at the end of it that shows how much power I'm drawing. Last time I was in there, I was at a quarter of a megawatt — 250 kilowatts was being drawn — and I wasn't even under full load, and the data center is only half full. We draw a lot of power.

I remember a couple of years ago I was down at Sandia National Laboratories, and they had a really big machine at the time.
It was called Thunderbird, for any HPC-ers out there who remember this kind of stuff, and we were trying to do these Top500 benchmark runs using, I think, on the order of 4,500 servers or so. This was a couple of years ago, so I won't mention core counts, because they're different than they would be today. But they actually had two power stations, on either side of the building, to feed the data center. And I remember we had walked out to lunch — so we left the building — and as we were walking back I got a call on my cell phone from one of the techs, who said, yeah, we just had a crash. So I was talking him through it: all right, now run the job again. And I heard him hit enter on my phone, and then I literally heard and saw both power stations spooling up to feed all the power to those servers, putting them back under load. It was pretty impressive.

Yeah — our last data center was actually constructed in a location based on what the local utility could provide us for power. It was where we could get land, and where it was zoned correctly, and where there was a megawatt of power. Yeah, and that's why you see companies like Google and Microsoft and Facebook building out in Oregon, near hydroelectric stations — they can buy cheap electricity for cooling and power. A lot of those sites are rehabbed aluminum factories; the process of refining aluminum uses tons of electricity, so the power is already there. Yeah.

So this kind of brings the conversation around to Chef, which is what you guys approached us to talk about a few months ago. So — can you talk to us about what your interest in Chef is, or what you would like to know about it?
We have a few experts on the panel. So, I think I'll take that one, because I was the one who actually put the call out there. Chef was actually requested by a listener of ours — it was a user request. I had heard of it; there were a lot of places where it was popping up. But other tools — and you can correct me on this, as to how they're the same or different — Puppet, Bcfg2, CFEngine: those tend to be pretty popular in our space. And we do have this problem of: I have a thousand machines, I want them all to look the same, I want to push a change to them, I want to reload them all from the network because they don't have hard drives, I want to do all these things — and I have a couple of machines that need to look a little different. So we have the same configuration management problem that any large infrastructure would have. And I want to do it scalably, and I want to do it reliably — and I want to have a good bus factor, too, right? So that if Brock gets hit by a bus, the next guy can pick it up without having to be a freaking genius. It might be a grad student. Yeah — I want to manage 10,000 nodes with one guy. Can I do that? Yes. Podcast over, thanks guys.

So, if you've had a chance to see some of the talks from our conference back in the spring: Facebook gave a talk where they said they've got a staff of six or seven guys — I'm not sure exactly what the number is — but they said, we support Facebook's infrastructure with six or seven guys, and our rollouts are — we have clusters of 15,000 nodes, multiple clusters per data center, multiple data centers per continent, and we're on multiple continents.
They wouldn't give exact numbers, but this is a staff of seven or eight supporting that many physical nodes. So Chef is definitely up to the task.

What makes Chef different from a lot of the other tools you mentioned is that we push the configuration out to the edge, to the nodes. Rather than have a single server in the middle that's going to calculate everything in advance and sort out the relationships between your 10,000 machines for you, it provides a search engine. A node comes in and says, hey, I'm a worker node — what am I supposed to do? The server says, here are your cookbooks; go configure yourself. The node comes back and says, I need to know about that cluster head server, and the Chef server says, oh, a search — here's your result. Then that node configures itself and pushes its data up to the server. And if somebody needs to talk to that node, they ask the server: where's that guy? There's none of this "I'm building a map of everything in my infrastructure at one time and keeping it on the server." The centralized server is just a search engine. And so it's a different architecture that allows you to scale a lot better. Maybe the Cycle Computing guys could talk about that.

Oh, sure. My CEO, Jason Stowe, spoke at ChefConf, and he talked about how we've run clusters as big as 10,000 nodes where Chef is a major component. You can do a lot of things with different tools like Puppet or CFEngine, but because Chef uses a full-blown programming language, that gives you infinite flexibility — and you have to have that flexibility.
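The search-driven pattern described here can be sketched with the Chef server reduced to a plain in-memory index. Everything below — node names, roles, addresses — is invented for illustration; the real mechanism is Chef's `search(:node, 'role:...')` against the server's index:

```ruby
# Stand-in for the Chef server's node index: the server plans nothing in
# advance, it just answers queries about the nodes that have reported in.
NODE_INDEX = [
  { 'name' => 'head01',   'role' => 'head_node',   'ipaddress' => '10.0.0.1'  },
  { 'name' => 'worker07', 'role' => 'worker_node', 'ipaddress' => '10.0.0.42' },
]

# Minimal stand-in for search(:node, 'role:head_node').
def search_nodes(role)
  NODE_INDEX.select { |node| node['role'] == role }
end

# A worker node asks "where is my head node?" and renders its own config.
head = search_nodes('head_node').first
puts "scheduler_host=#{head['ipaddress']}"   # scheduler_host=10.0.0.1
```

Because each node resolves its own dependencies at converge time, adding the 10,001st machine is just another client registering and searching — not a change to a central plan.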
It makes all the difference. Also, because it's a full-blown programming language rather than a small, limited language, it's easy for people to build new primitives on top of it. That may make it more complex to get started with, but it also means we have a really vibrant ecosystem of components that people build on top of Chef, and that makes a big difference when you have a lot of different systems. In our case at Cycle, we have many different customers, each with a very different stack, so we'll never have three images that can take care of everything. For us, Chef gives us the flexibility to accommodate all that variance.

Okay, two things. For Cycle, do you run one Chef instance company-wide that you build these different customers on, or different ones?

Different ones — one per customer, sometimes one per cluster. It really depends on the setup.

Okay. And you say it's a full-blown programming language — what is it? Something you can just find and learn?

It's Ruby. Even the configuration is written in Ruby — it has a configuration DSL built on top of Ruby, so you don't have to know everything about Ruby. But if you need to scale out what you're doing, do something more complex, or access Ruby libraries — and chances are good you'll need to — you have a full-blown programming language available to you. And everything you do is backed by source control. You said you've got this graduate student —
Well, you can probably find somebody who can learn Ruby; it's not an uncommon language. And everything they do to deploy your infrastructure is checked in — hopefully you're using something like Git or Subversion — so I can recreate any piece of my infrastructure from source code.

Does it understand native package managers and things like that?

Yes. It provides a number of resources that are abstractions over the underlying operating system components. If I need to install, say, an MPI library at a certain version, I just write a package resource with the name and the version. Under the covers, Chef knows: I'm on CentOS, I need to use yum; I'm on Ubuntu, I need to use apt. Assuming those packages are available to it, it just handles that. So your code — your infrastructure — becomes more agnostic about what it's actually running on.

Let me dive a little deeper. When you say you're using Ruby for the configuration files, what else is there? Is there an API? Based on what Bryan was saying, it's kind of infinitely flexible and extensible — what does that mean?

The server does have an API. All communication between client and server happens over an HTTP REST interface, and there are client libraries written in Ruby, Java, and Python — I think there's a .NET one floating around, and a PHP one. So you can talk directly to the server to push data in from external sources if necessary, or to query it. If you need to manage Chef from another system, building integrations is pretty simple.
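As a concrete sketch of the package example just described — the package name and version here are invented for illustration, not taken from the episode — a Chef recipe stating that policy looks roughly like:

```ruby
# Hypothetical recipe fragment: the package name and version are placeholders.
# The same resource converges via yum on CentOS/RHEL and via apt on Ubuntu,
# because the platform-specific provider is chosen under the covers.
package "openmpi" do
  version "2.1"     # pin the version the cluster expects
  action :install   # the default action, shown here for clarity
end
```

The recipe states only the desired state; which package manager actually runs is Chef's job.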
When I was talking about components earlier, I was referring to the DSL we have for Chef — an easy way to represent common resources. When you're dealing with people who don't have much experience, it's really great to present them with a DSL that matches the problem they're working on. For example, I've written an EBS-mount resource that represents an md software RAID on Linux, striped at RAID 0, with either XFS or ext4. Writing that logic out in Ruby or bash would be a ton of code, but with the Chef DSL I can represent it in only four lines. I'm easily able to extract out the common case so that someone with less Ruby experience can reuse it. You see this pattern again and again: people abstract a very common pattern and represent it very simply.

What's a DSL?

Domain-specific language.

Okay, that was my question.

So we have things like a file resource, a mount resource, resources for iptables rules, a package resource, interfaces, services, templates. Most of the things sysadmins care about are already resources built into Chef, and you can add your own as you come up with things. Like Bryan said, if you find yourself writing the same shell scripts over and over, you can turn those into reusable resources and build up libraries of how applications are managed.

And when we talk about resources, there are a couple of things that are common across all of them. Let's take the package resource as an example. What you write in your recipe — which is the thing you write within Chef.
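Bryan's actual resource isn't shown in the episode, but the "four lines" a reuser would write might look like this sketch — the resource name and attributes here are entirely hypothetical:

```ruby
# Hypothetical usage of a site-specific resource like the one Bryan
# describes: stripe EBS volumes into an md RAID 0 device and mount it.
# The resource name, attribute names, and device paths are invented.
raid0_mount "/scratch" do
  devices %w(/dev/xvdb /dev/xvdc)  # EBS volumes striped into one md device
  fs_type "xfs"                    # or "ext4"
end
```

The point is that all the mdadm and mkfs plumbing lives inside the resource, so the person reusing it only states intent.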
It's basically your configuration recipe. What you write is a package resource: it has the name of the package and then some attributes about that package — maybe the version you need installed, things like that. Conceptually, underneath that resource is a thing called the provider, and the provider is where all the heavy lifting happens. The provider knows I'm running on CentOS, or I'm running on Ubuntu, so it knows which package system to go out to and which system calls to make to ensure that package is there. The other thing the provider does is make sure the resource is in line with the policy. When you declare a package in your recipe, you're essentially saying that package must be installed. The provider verifies that it's installed at the appropriate version, and if it's not, it takes action — it installs the package so that it does meet the policy. You can think of the same pattern for everything else. All right, so I've installed a package.
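The resource/provider split described above can be sketched in plain Ruby — this is not Chef's real implementation, just an illustration of the idea that the resource states policy and the provider acts only when current state diverges from it:

```ruby
# Plain-Ruby sketch (not Chef's actual code) of convergence: compare the
# desired state against the current state, and act only on a difference.

def converge_package(db, name, version)
  if db[name] == version
    :up_to_date          # policy already met; take no action
  else
    db[name] = version   # "install" the package to meet the policy
    :installed
  end
end

packages = { "openmpi" => "1.8" }                  # pretend package database
puts converge_package(packages, "openmpi", "2.1")  # diverged, so it acts
puts converge_package(packages, "openmpi", "2.1")  # second run is a no-op
```

Running the same policy twice is safe: the second run detects the system already matches and does nothing, which is why repeated chef runs converge rather than reinstall.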
Now I need to configure its settings, and we have a template resource that lets you do that. On the template resource you specify other attributes, like the ownership of the file as it gets written to disk, the permissions on that file, and maybe some variable content that goes into the template. And there's a whole system of attributes you can use, so that in various environments or various roles you get different values for them. Then, of course, Chef updates the template to ensure it's in line with policy. So with Chef, as you write your recipes, you're really stating policy: this is what the configuration of the system should look like. Over time, as you execute Chef, your system converges on that policy.

All right, another pop-up question. Somebody earlier made reference to cookbooks. What are cookbooks?

Sure. A cookbook is basically a way to package up the configuration of one thing. Think about a piece of software you need running — an MPI environment, say. The cookbook will include recipes — this is the package I want installed, these are the configuration templates I want written out to disk — plus some metadata about the cookbook itself, the template files you're going to write to disk, and maybe other libraries included within the cookbook. So the cookbook is really a way to package up one or more recipes along with the other files required to configure that thing.

So here's a question. Is there a way to have the client send something to the server saying, hey, I found I have a GPU in here? We have machines that have these GPUs for GPU computing,
and they need extra software installed. Can it push back — oh, you have that, so you also need this recipe that installs all the NVIDIA drivers?

Absolutely. When you're writing your recipes, you can inspect the node you're on: do I have this number of cores, is this device on my PCI bus? If so, make this other set of configuration apply. That's one of the advantages of being in Ruby. And other machines can do the same through the server: if I'm a job scheduler and I need to send work to other machines, I can ask the Chef server, find me all the machines running with this many cores, this much memory, this device on their PCI bus, so I can send them work. Search is an integral part of what Chef does. The examples we usually use are a software load balancer searching for the applications it balances, or Hadoop workers looking for their master. In this case we can say: I'm looking for things I'm allowed to run work on.

That sounds a lot like our traditional resource manager — what do you have, and what's it currently doing?

It's not as dynamic as a traditional resource manager. It's a very nice, very simple interface to deal with — the Chef search on the server side — but it's only updated after each successful chef run.

Ah, that's actually a question I wanted to ask. What do you define as a chef run? Is it invoked by cron, on demand, on reboot? When is a chef run?
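In recipe terms, the GPU example and the search example might look like this sketch — the attribute keys, cookbook name, and search query are placeholders, not from a real cookbook (on a real node you'd check the Ohai attribute output for the exact keys):

```ruby
# Hypothetical recipe fragment: attribute keys and names are invented.
# A node can inspect its own hardware attributes at converge time...
if node["cpu"]["total"].to_i >= 16
  include_recipe "site-gpu::nvidia_drivers"   # made-up cookbook name
end

# ...and a scheduler node can discover peers through Chef server search:
workers = search(:node, "role:compute")
workers.each { |w| log "worker at #{w["ipaddress"]}" }
```

Because search results come from the server's index, they reflect what each node reported at its last successful chef run, as noted above.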
Yes. So here's why I was asking. One of those blurred lines I was talking about — why people are exploring cloud-based computing — is that I want to change my environment based on the user, or the user's group. When this user is using this node, I need these 16 extra libraries, and I don't want to put them on the network file system, because that's traffic I then have to deal with; I'd rather stage them locally on those nodes. And when that user is done, I want to take them off, or do some other kind of configuration action.

Right. So, the chef-client: you can trigger it, you can put it in cron, you can run it as a daemon that checks in every 10 minutes or whatever, you can run it on demand. We have knife ssh, so we can push runs out, and we have our own tool coming soon that uses ZeroMQ, so we can tell 2,000 machines at a time: everyone run your chef-client now and grab the latest pieces.

But one of the things you mentioned is taking a machine and repurposing it, and that's a bit of a different use case. Typically we push people toward thinking of machines as ephemeral. I might have a big beefy box, run a workload on it, and when I'm done I want to recycle that box. I don't want to uninstall and reconfigure it for the next guy, because maybe they trashed /tmp and I didn't notice they filled the disk. Unless you tell Chef to clean up after itself, it's not going to clean up after the previous person. So typically you just recycle boxes as you use them, and the next guy is guaranteed a fresh box.
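The trigger modes just listed correspond to standard chef-client and knife invocations — a sketch, with the role name invented and flags limited to ones in common use:

```shell
# Common ways to kick off a chef run (role name is a placeholder):
chef-client                          # one-off run on the node itself
chef-client --daemonize -i 600      # background daemon, converge every 10 min
knife ssh 'role:compute' 'sudo chef-client'   # push a run to many nodes at once
```

The knife ssh form is the "push" mechanism mentioned above: it searches the server for matching nodes, then runs the command on each over SSH.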
I uh, I do a lot of open stack stuff and I see Open stack is picking up a lot of traction and and hpc Especially using like the the bare metal excuse me The bare metal driver for the the scheduler For for nova whereas instead of using virtualization You actually have physical machines that map to An image type so I say hey, I need 10 boxes You know triple excel boxes and those are available to me and it puts a fresh installation Of whatever operating system Yeah, tough time I feel like getting all choked up. This is really funny. I love this stuff No, but uh, I I think I you know, I expect to see a lot of open stack showing up in hpc environments really soon Uh, because you get your quotas you get access to bare metal Um, you can manage all your resources and recycle them automatically You know and put them all behind an api Yeah Yeah I didn't know it existed. Yeah It's it's already the case that the scheduler we use for our local resource called um moab Um from adaptive computing I already keep track of like what os it is and like image type and you can actually request that when you submit your job And this is actually how they do their like dynamic You have one scheduler that runs both a linux and a windows cluster simultaneously and they like reload them dynamically But they rely on some third party tool like chef to do that Um, and so it could just be like, okay, these four machines are idle and they would fit It could just send a thing to chef saying like hey make those windows machines That work Um, we would tell open stack to make those windows machines or you know tell some other provisioning tool chef doesn't Manage chef doesn't a provisioner. Uh, it takes over after the os is on the box um, but You know we've integrated with just about every provisioning tool out there. 
So it's easy enough to do. There's a new extension to the Chef server that should be coming out open source any day now, so I hear.

And you can't talk about it?

From the Opscode guys, I hear it's coming, and I really push them, because it can't come out soon enough.

You were right in the middle of the description and you cut out.

I said there's a new component coming out for the Chef server, and it can't come out soon enough, because it would be very instrumental for this kind of dynamic reallocation of services. It wouldn't be an engine for driving the rules, but it would certainly be a much better channel for communicating with your different nodes on a real-time basis and discovering what resources they have.

Okay, let me ask a hypothetical question. You guys being Chef experts: if I wanted to reload 10,000 nodes all at once, how would I set up a Chef infrastructure — assume a Red Hat/CentOS environment, since that appears to be most popular in our space — to take bare metal and make sure it's configured the way I want as quickly as possible?

So, in the HPC space there are lots of tools that do hardware provisioning — Rocks is the only one I can think of off the top of my head. Things like xCAT, Razor —

We use Kickstart and Cobbler.

Right. So Kickstart or preseed put the OS on there, and there are applications that manage that PXE-boot lifecycle. People have integrated Chef with those tools: after the OS goes on the box there's usually a late command that says, put the chef-client on this box. Then that box, when it boots up, checks in with the Chef server and says, reporting for duty, what am I supposed to do? And you can pass in a run list when that machine is provisioned.
So you can say: go get me a CentOS triple-XL image with 64 GB of RAM, and when it's done I want it running this work stack. It will provision the OS — bare-metal PXE boot — put Chef on the box, and check in with the Chef server, which says, somebody told me you're supposed to be running this; here are your cookbooks, configure yourself, check back when you're done. And then it joins whatever sort of cluster is out there. That's a pretty common workflow around bare-metal provisioning.

So is the Chef server the one handing out the packages it wants installed after the initial Kickstart, or does it still just use yum, with yum pulling over HTTP from wherever you have your repositories?

Chef is not the repository for your packages. The Chef server holds your cookbooks, which are the recipes for how to set up these applications. You can still use apt repos, yum repos, local mirrors, remote mirrors, whatever you need. Chef won't do that for you, but it's flexible enough that you can use whatever mirrors you need to get your applications onto your boxes. What a lot of people do, for the sake of scalability, is host their own mirrors on something like S3, or even CloudFront, so they're pretty much guaranteed that whatever provides their packages won't break down in the middle of a big run.

So it sounds like one of the secrets to your scalability is that Chef is really — very crassly speaking — a metadata server. It's just saying what to do; it's not providing the actual stuff, like the packages, that consumes all the bandwidth. And that's how you can scale out to 10,000 servers and whatnot. Is that right?

Absolutely.
Yeah. On the use of search, we give the analogy that when the internet started, you had Yahoo trying to provide a directory of everything on the internet, and then people like AltaVista came along and said, forget that, here's search — go find what you're looking for. That's kind of Chef's approach: we can't hold everything for you, but we can index as much of the data as possible with search. Where you get your packages — a local mirror, S3 — that's not Chef's expertise.

Okay. So if I were to install Chef, I would still need to set up my own LDAP server, my own PXE-boot server, my own whatever server, and Chef is the traffic cop that says: oh, node 37 just came up — by the way, set yourself up to talk to that LDAP server over there, and to that evil NFS server?

Yes — and you'd have cookbooks that set up your LDAP server and register that box with the Chef server. So the node that says, I need to log into LDAP — where is that?
You don't have to know these things in advance. And if that LDAP server goes down, you redeploy it on some other machine without having to care — without a big spreadsheet of IP addresses where you go, oh, LDAP's over here now, update all the configs. Those configs are dynamically generated.

Oh, I think I finally get it. Traditionally, in our configuration management, we'd have a config file with the server name hard-coded in it. You're saying you can just refer to that resource, and Chef keeps track of which box currently has it.

Yes. It really doesn't care. This works totally great in the Amazon EC2-type world, where you really don't know what they're going to give you — you just have something reporting back saying, hey, I'm yours, do something with me. Or if you have a data center where you're frequently recycling machines and putting different workloads on them, you don't have to care about the previous tenant; you blast them and say, here's a hundred boxes, go to town.

Can I steer things a little bit, though? Like I said, our head nodes generally have mirrored hard drives, because they're critical infrastructure; compute nodes don't. Is that one of those things I can tell it — find a box that has two drives?

That's more your provisioning tool. When you go to your hardware API you say, hey, I need a box of this size, and when it reports back, go put this run list on it: this is a cluster controller node.

I can see it being extremely useful at extremely large scale.
What would you recommend to somebody who actually has a relatively small infrastructure, say 32 nodes or less?

So, Opscode's internal OpenStack deployment is currently 25 nodes, and we run about 300 VMs on it. That's kind of our use case, and of course we use Chef for everything. My own lab, where I do a lot of OpenStack deployment, is only four boxes right now — I lost another yesterday. I'm pretty much always destroying my hardware and recycling it, and I use Chef because I don't want to have to manually configure anything. I can PXE-boot those machines and in five minutes they're up and running Ubuntu 13.04 with whatever I need, or running RHEL. It's easy to recycle my hardware.

And I would say anything larger than zero machines is a good candidate. Even if you just have one machine, if your cookbooks are backed up somewhere and that machine gets hit by a meteor, you get your replacement and restore from the cookbooks. You don't have to spend that first day
You don't have to sit, you know spend that first day Getting everything tweaked just the way you wanted it again I have a great example of this and that is um, we run, uh our own git repository using the git lab software, which is an excellent, uh Tool to manage your git repository And I have a set of cookbooks that I use to set up Git lab It looks just like github now when it comes time to upgrade git lab Uh, I will use that cookbook to set up a new server And migrate everything over And the migration process will be a million times simpler because I have uh, I have you know I have basically have the the configuration in code for this very critical service They mentioned the the learning curve and you were just kind of emphasizing that again by writing code So it's kind of prerequisite for this that you need to be able to read or at least Tangentially grok ruby code Not really. I mean if you can write bash you you'll probably be fine with uh with chef Oh, how um All right that confused me a little bit because you were saying before that ruby was the language of chef So where does If you can if you can grok simple shell scripts you I you'll be able to grok understand the The the chef configuration language. Okay And we have lots of you know resources for new users. Uh, you start at learn chef.com Learn chef.com learn chef.com all right conversation Brian, uh, you got cut off again, but I know what you were trying to say was we're reaching the end of our time together for today So any any last thoughts or questions before we start to wrap up for the day No, just like I say thank you for doing this a joint show. It seems like you know, we answered your questions and you answered our So this is gonna be pretty good. I think our our listeners and hopefully you know, some of them will find out about chef Yeah, I certainly learned a lot about hpc today. 
I thought it was just a great conversation. Well, with that, why don't we move into the picks? This is the point in our show where we allow each of our panelists a moment to share something with our audience, technical or not — a book, a beer, whatever you like. Brock, I know you have some picks prepared, so I'm going to go with you first.

I had two different things. One: for about the last three years now I've really been enjoying a website called marginalrevolution.com, which is a blog by two econ professors at George Mason University — one of them is a foodie. It's very non-argumentative, unlike some of the econ blogs out there about policy and things like that. It's food, it's weird things people actually sell in other countries, it's current policy in places like Greece. It's really been an interesting read of a condensed kind of view, and I really like the way they present it.

The other thing — I actually need to go grab it, because I knew I'd be bouncing the entire time — is that I use standing desks. For those listening: I had a back problem.

Hey, me too — standing desk.

But the other thing I do, when I can't stand anymore, is sit on a yoga ball. It actually helps me quite a bit with my lower back problem, but I knew I couldn't sit on it during the show, because I would just be bouncing around like a hippity hop, like when I was five years old.

Don't let it stop you! Matt often joins from a very special type of desk also. Matt, you can tell us about that.

Are you done, Brock? Okay, go ahead — segue into my picks. So, I had back surgery two months ago,
but I have a treadmill desk, and I've been on Food Fight a few times walking on it. The setup I have is from treadmilldeskdiary.com — I pretty much have his setup, a really cheap IKEA desk on top of a cheap treadmill. I usually walk an hour or two a day, and it helps my back; I'm doing much better. So I guess that's my pick: treadmilldeskdiary.com.

My other pick is a music pick. There was a 2007 movie called Sunshine — Danny Boyle, the director of Slumdog Millionaire and 28 Days Later. Really interesting, good sci-fi movie, but the soundtrack is really hard to track down — it's only on iTunes, no CDs of it. It's been popping up in a lot of commercials lately and I couldn't place my finger on where I'd been hearing it. So go check it out, it's really, really good — a combination of classical music and some of the guys from Underworld.

Awesome. Jeff, how about you for some picks?

All right, I've got two. One I've been playing with recently: when the Raspberry Pis came out, I thought, wow, those are kind of cool, but I can't think of a reason why I would want one. I was talking to a friend of mine about it and he said, you're an idiot — it's 25 bucks, buy one and then you'll figure out what you want to do with it. I was like, okay, that's a good point. So I bought one, and the idea that I came up with: I have one of those Nest thermostats in my home. It has a nice motion sensor, a nice interface — it's spouse-friendly, right? — and it programs itself well. But one of the things that's a bummer for me is that this Nest thermostat is actually in an awkward room.
It's in our dining room, and we don't walk through there very often, so the motion-sensor bit really doesn't work very well for us, and a couple of times it's had false positives — it thinks I'm not home even though I am home. How this all ties together: somebody wrote nest.py, a little Python script that talks to your Nest. It can query it and get all the information, and it can also set home or away, set certain temperatures — all the stuff you can do in the normal Nest interface. So I wrote a little program that's actually up on GitHub, called ruby_slippers. It's probably an incredibly poorly chosen name, because it has nothing to do with Ruby; it has everything to do with "there's no place like home." Basically, all it does is monitor for my iPhone. If I'm at home and my iPhone is talking on the network, it says, oh, he's home, and makes sure my Nest is set to home; and vice versa, when it doesn't see my iPhone, it sets the Nest to away. So it's kind of my replacement for the motion sensor, and it's been fun — a little recreational coding on weekends and nights to get that going.

Along the same token — probably because I work for Cisco — the network I have in my home is probably a little more complicated than what you'd find in most people's homes. I have a variety of Cisco gear in my house, because, you know, everybody needs a Catalyst router in their house to do the complicated L3 forwarding you need to do. But recently — and this is going to sound like a plug, but I don't mean it as one — we bought a company called Meraki about a year ago or so, and they have an incredibly slick interface for managing switches and WAPs.
They have this really nice cloud-based web interface, and I just bought a bunch of Meraki gear that's supposed to arrive later today. I'm going to be playing with that stuff all weekend — setting it up, configuring it, and seeing what the power of it is. It actually brings network management down to a much simpler level, so you don't have to be an IOS expert or an NX-OS expert or things like that. It's going to be kind of fun to play with, and yes, I am totally a geek and I love this stuff.

Awesome. Ben, how about you?

Okay, my first pick: I'm going to throw out the LISA conference, hosted by USENIX — November 3rd through 8th in Washington, DC. It's the Large Installation System Administration conference, and there are a lot of really sharp sysadmin types there every year, some great trainings and technical sessions, and a lot of hallway track and beer.

Yeah, I'll be doing a full-day intro to Chef during LISA. Want to learn more about Chef? Come join me.

I will be writing that up for the blog team, actually, so I guess we'll see you there.

All right, and for my second pick, I'll go ahead and throw out today's Google Doodle, which is the piñata game, if you haven't played it yet. If you're listening to the audio later, go back and find it. I've managed to waste several hours of my company's time by sharing it with my colleagues this morning.

Yeah, you really killed the productivity this morning.

I do what I can.

Anything else, Ben?

That's about it.

Cool. For my picks —
I want to pick one of our former guests: Matt Wise, who came on the show to talk about ZooKeeper. I've been working a lot with ZooKeeper, and he's recently given me some great tips, specifically on how to use stunnel to encrypt connections between clients and the ZooKeeper ensemble, because ZooKeeper doesn't actually have a security model when it comes down to it. And for my next pick, a music pick: the band Chvrches — "churches" with a V. Those are my picks.

Awesome. I'm going to follow up your music pick with another music pick, one that's been going around the office at Opscode a lot lately: a band called Unlocking the Truth. Now, if you haven't seen or heard of this band, it's a sixth-grade metal band, and they are incredible. If you listen to the music now that I've told you they're a sixth-grade metal band, you'll know it; but if you just listened to the music without knowing, you'd have no idea. These kids are absolutely incredible, absolutely wild.

It's like a metal Hanson.

It's more like a metal Kris Kross.

I've just dropped a link into the chat window for those of you who are watching — maybe I'll drop a link in IRC also, and definitely in the show notes — but check that out.

So while we're talking about children, today is my daughter Shannon's —

Can I ask, what truth are they unlocking? Like, multiplication tables?

Go and watch the video and you will see — it's pretty awesome. I will say that the first comment on the video I just posted is from a guy named David, and his comment is: "Today's lesson: be a bully and wind up homeless. Do not mess with metal." He says a couple of other things too, but that's the truth they're unlocking, I suppose.

Wow. Yeah, it auto-completed that in Google search for me.
So there you go — it's good stuff. And my second pick, which I started to say: it's my daughter Shannon's seventh birthday today, so I just want to wish Shannon a happy birthday.

Happy birthday, Shannon! Happy birthday!

Thanks so much, everyone. And if you're listening on the YouTube stream, if you want to wish my daughter a happy birthday, I'm sure she would appreciate it. In any case, that brings us to the end of another awesome episode of the Food Fight Show. I learned a lot about HPC; I hope you guys learned a little bit about Chef. This has been great. So, listeners of the Food Fight Show, thanks again for tuning in, and be sure to tell your friends about us. You can find more information at foodfightshow.org. Where can we find more information about the RCE podcast?

For RCE, the best place is rce-cast.com — the entire back catalog is there.

Excellent, excellent. Yep, we have all of our episodes at foodfightshow.org as well. Brock, Jeff, thanks so much for coming on the show — it's been great to have you. And Ben and Matt, thanks for joining us.

Thanks for having me. Thanks, guys.

Excellent. Well, until next time, Chefs: keep it hot!