Welcome to another edition of RCE. Again, this is Brock Palen. You can find us online at RCE-cast.com or in the iTunes podcast library. I also have with me Jeff Squyres from Cisco Systems, also well known for the Open MPI library. Jeff, thanks again for helping me out.
Hey Brock, how's it going?
All right, good. So this afternoon we've got a project that I've heard about for a long, long time, but I actually don't know too many details about it, so I'm interested to talk to our guests and find out what's what.
Yeah, at Michigan we've toyed with the idea of using xCAT. We were a big customer of one of xCAT's big supporters for a while, and we're still using our old system, so we're still in the market for something like this. Our guests today are Vallard Benincosa and Egan Ford, who are here representing the Extreme Cluster Administration Toolkit, also known as xCAT. So, Egan, why don't you take a moment to introduce yourself.
Thanks. This is Egan. Right now I work at IBM. I lead one of our teams of HPC and cluster architects. I'm also the leader of the xCAT open source project and its creator. Val?
I work for a startup company called Sumavi. Previous to that, I was at IBM working on Egan's team. I'm one of the lead developers at Sumavi, and I still contribute to the xCAT project and have a lot of fun with it.
Vallard was on both of my teams. He was both a Linux cluster architect and one of the core xCAT developers.
So why don't you give us the 10,000-foot view of exactly what xCAT aims to accomplish?
xCAT is primarily a provisioning system. The goal from the beginning was to get an operating system onto as many boxes as quickly as possible, and that's still one of its primary goals. Originally it was not for HPC. Development started around 1999 for Web 1.0, and there was this explosion of people who needed to provision Linux at scale. That fueled the ideas, the development, and the design of xCAT, because at the time we couldn't find at-scale provisioning solutions. The big picture is: get the OS on the box, because once you have the OS on the box and you have SSH, there are a lot of different tools out there that you can use to manage your environment. However, as xCAT evolved over time, more and more features were added so that people could use xCAT for broader cluster management and do more than just put OSes on boxes. But the things that xCAT does, hardware control, console management, discovery, boot target control, and OS provisioning, are all in the provisioning family: let's get OSes on the machines, at scale, as quickly as possible.
What exactly is IBM's relationship with xCAT? Was this something that started off inside IBM? Because if I remember right, IBM had another tool that did what xCAT did for its Power series.
Yeah, that's right. When I first joined IBM, I was tasked with working with that group. It was a tool called CSM. xCAT had already been created, but it was just used by the field technical people; it wasn't used by the developers, and it wasn't really sanctioned as IBM's official way to go. Egan and his team kept plugging along and adding all these great features, and the tool that was used for Power, CSM, actually went end of life about two years ago. IBM's strategy now is that xCAT will be its HPC deployment tool going forward, and CSM is end of life.
And just to add to that, before CSM there was PSSP.
And I think that may be what you remember; it predates both xCAT and CSM. Unfortunately, PSSP wasn't available for non-Power platforms and doesn't support Linux or some of the other things that were taking place as we were doing xCAT for Web 1.0 and then later HPC. So something new had to be created, and that's how xCAT got started. Then IBM, as they started looking toward the future for a way to take the best ideas of PSSP and the best ideas of xCAT, created CSM. But there was enough penetration with xCAT, and xCAT was doing some other things and supporting other OSes like Windows that fell outside of CSM. So we came back together again in 2007 and said, let's take the best ideas of xCAT and the best ideas of CSM, and we started together on xCAT 2. It's new development, a new team, a new architecture, new code, but with 10 or 15 years of ideas rolled up into it.
So how did the decision come about to open source xCAT versus keeping it proprietary? What was the thought process that led to that?
Well, my thought process personally was that I liked open source, and xCAT 1 wasn't open source. It came with source code, but that's not the same as open source: you couldn't modify it and redistribute it the way traditional open source licenses allow. It came with source code to offer a certain amount of flexibility. With xCAT 2, there was a decision by all of us, and we agreed that it should be open source. Some of IBM's interest in doing that was that we could develop cluster management with a smaller investment on IBM's side, because we would have the open source community to help with the development and the evolution of cluster management. The HPC space moves very quickly, and an open source development model and that type of collaboration with customers and users can help speed that along. And we asked our customers. They came back and said, yes, we would like something that's open source, something we can contribute to and collaborate on like a lot of other things in the Linux HPC space, but we would also like support. So we focused on creating something that was open source and met our customer needs and our IBM needs, but we were also able to come up with a way to provide support contracts and so on for those higher-maintenance customers.
Gotcha. So I'm assuming that xCAT supports a variety of different hardware and software platforms. What's the gamut of what you support?
I'll take this one. For hardware right now we support, of course, all the IBM Intel machines, and then there are also the IBM pSeries and zSeries; those are in there. But since we have IPMI support, most white boxes will work for doing remote power, and let's face it, a lot of them are all the same PXE-boot, get-up-and-going type environments. In addition to that, some of the guys from HP actually wrote an HP plugin so that we have control over their blade chassis. So we're able to do some of those neat commands that we can do with the IBM blade chassis on HP blade chassis as well.
I'll just add that support means different things to different people. What hardware xCAT works with is one thing, what the community supports is another, and what vendors support via support contracts and things like that is another.
So when you say support, I'm assuming you mean what platforms the xCAT code can automate or function with, and Vallard pretty much stated that. But if you're asking, well, if I want to buy xCAT support from IBM, what do they support? We're limited to supporting the environments that we can test, document, and develop on, and so it's not going to be as expansive as what xCAT might work with.
Yeah, actually, I meant what Vallard answered, but your clarification was a much better phrasing of my question, so both answers are appreciated. So I've got a question. On the host side, for managing a host, the bare minimum to make things work is that you just require IPMI, you said. Is there any host-side software that needs to be installed to have continuous management of a host past it being installed? Or does your OS load not really need to be modified to work with xCAT?
Well, one of the design things we learned very early on from customers is that they didn't want any clients on there, and if they did, it was something they would already have had in their environment; some people really like Ganglia, some people like Nagios. So from the xCAT perspective, we just install the machine. Once it's installed, it signals back to the management server saying, hey, I'm installed, and xCAT flips the PXE boot, or whatever boot mechanism you use, so that it will boot from the hard drive, or whatever it may be, the next time in the case of stateful. Then it just boots back up, and if you didn't put any agents on it, there won't be any agents on it.
So xCAT doesn't require them anymore?
It doesn't require them, and it doesn't even have any. We don't have any code to install on the machines. I wanted to backtrack to your previous question, because we didn't completely answer it. You asked what OSes xCAT supports: AIX, Linux, and Windows.
Really, you guys support Windows? The xCAT server, does it run on Windows, or can you actually load Windows boxes from Linux using xCAT?
The latter. xCAT is a provisioning system that requires Red Hat, SUSE, something Red Hat-like, or AIX as the OS for the management node and the management infrastructure, if you have multiple nodes assisting xCAT. But the target machines can be Windows, Linux, AIX, ESX, ESXi, VMware's hypervisors. Those are the target OSes that we can provision.
Oh yeah, I was going to add the ways that we install the machine. For Windows you have several options: you can install it via ImageX using Microsoft's native tools, or you can use standard imaging, not ImageX, using the unattend file that Microsoft uses, and you can do it onto iSCSI or write it right onto the disk. Then with Linux you have the stateful options; you can do it stateless, meaning the operating system runs in memory; and the other option is what we call statelite, where some of it is in an NFS root, some of it can be in memory, and some of it can be on hard drive partitions. So it's quite flexible. And for ESXi we do that stateless as well. I'm fairly certain we're the only open source solution that can actually take VMware ESXi and just download it directly into memory and execute it. We do that with all the hypervisors we support; with Xen and KVM it's fairly easy, since we have stateless, diskless Linux today, so we can do that with those hypervisors as well.
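To make that provisioning flow a little more concrete, here is a minimal sketch using xCAT 2's stock command-line tools; the node name is hypothetical, and the exact arguments vary by release and image type.

    # A sketch, not a copy of any real configuration; "node01" is a hypothetical node.
    # Tell xCAT what node01 should do on its next network boot:
    nodeset node01 install       # stateful install of the configured distro
    # (or "nodeset node01 netboot" for a stateless, in-memory image)

    # Power-cycle the node so it PXE boots and picks up that target:
    rpower node01 boot

    # When the install completes, the node reports back and xCAT flips the
    # boot target so subsequent reboots come off the local disk.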
Yeah, actually last week I was working on that new ESXi stuff. ESXi 4.1 actually has a kickstart file, which is pretty different from what VMware has done before, so we're able to kickstart those just like the normal ones: install it if needed, or stateless, or stateful, your choice.
So is the limiting factor for which operating systems you support these different bootstrapping mechanisms? You said before there is no client-side software, so presumably any limitations must come from how you bootstrap the operating system. Is that a correct statement?
I think so. I mean, there are all these different ways. There's also partimage support, so people will take images using partimage, and we can blast that image out the same way. And then there's also the concept of boot targets: if you already have your things set up and you know exactly how you want it to be, then you just tell us what you want your PXE file to look like, we'll put it in there, and everybody can boot off it how you want. Is that sufficiently vague, or enough detail for you?
That is absolutely perfect.
The boot target support is actually quite nice. We have a number of IBM customers using xCAT who have developed their own provisioning methods, or have some other external system, open source or otherwise, and as part of cluster and cloud management they just need something that can build the TFTP structure, update DHCP, create the PXE or gPXE files, and maintain the node definitions so it knows how to boot. We'll continue to have that for people who want to roll their own provisioning systems.
Yeah, they may just want xCAT so they can power the machines off and on in a cool way. Some people I know just use xCAT for the remote power control and the remote console control and do their own provisioning, so it's quite flexible in what it can be used for.
So I want to clarify something. Since xCAT has nothing client-side, if you want to use xCAT for pushing out updates and such, it doesn't really do that out of the box; you need another tool. xCAT just gets your boxes up.
The short answer would be yes; the primary purpose is to get the box up. We do have tools for after-the-fact administration, things like parallel shell and parallel copy, and ways to push out updates and files, so we do have some code that does that, but it piggybacks on top of SSH or RSH; that's the daemon we're using on the machine to perform those types of functions. We don't have a dedicated service that sits on the machine and waits for xCAT to tell it to do something.
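As an illustration of that after-the-fact administration riding on SSH, here is a minimal sketch with xCAT's parallel shell and parallel copy commands; the node group "compute" and the file being pushed are hypothetical.

    # "compute" is a hypothetical node group defined in the xCAT database.
    psh compute uptime                        # run a command on every node over SSH

    pscp /etc/ntp.conf compute:/etc/ntp.conf  # push a file out to all nodes in parallel
    psh compute service ntpd restart          # then restart the service everywhere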
Now, there are some philosophical differences. We've been advocating the use of stateless since 2005, and we've had it in xCAT since 2005. Early on we used Warewulf, and later we transitioned to writing our own, because it became very important to us and very important to our customers; it's even stated in their RFPs. That's a different way of doing things. You move away from the concept of "I've got a thousand machines and I have to go out there and touch them and manage them" to managing centralized core images that you just push out. If you need to make a change, you make it someplace central and then you reboot or reprovision the machine. In the last five years the vast majority of the customers I've worked with have stuck with this model, so we haven't really developed nice, easy lifecycle management tools, things that can hold your hand and roll out a new kernel on your machines for you. We have some tools that will help you do that at scale in a distributed fashion, but you're going to have to know what you're doing in those types of environments. I'm still a strong advocate of stateless and want to stay down that path, because it just makes administration so much easier.
So you mentioned something about turning machines on and off; I want to hear more about this. What does xCAT have to do with controlling the machines' power state?
Oh, we wrote our own IPMI library that just goes out and does it; we tested it at scale, and it fits within the same infrastructure that xCAT has, where you have a list of nodes or a node range, and it will just go out and you can turn them off or on. You can also do it with different machines like blades, where you talk to their management interface, which may not be going over IPMI. The other thing you get with that is remote console, which gives you the serial console, and some vendors offer remote video support, so instead of having to open their web interface and get to it, you just run a command like wvid and it pops open the interface so you can see it.
So is this power control integrated with any resource managers or other allocation managers or data center management, to, say, power off less critical machines in certain states?
Yes, but it's not xCAT making those decisions. xCAT is the senses and the muscle, but it doesn't have a brain; it's not an adaptive management type of solution. In environments like that we partner with Adaptive Computing's Moab. Moab understands how to talk to xCAT via its client-server XML protocol, and it treats xCAT as one of the additional resource managers you might have in the environment. If there's no workload destined for those machines, then Moab instructs xCAT to power them off. We have these environments in production at a couple of universities, the Department of Energy, and one bank on Wall Street right now. It's one of the paths we're taking with xCAT: to have it be this platform for HPC cloud, and to provide not only powering machines on and off. We also support power capping on IBM hardware, and we have interfaces for that, and for making decisions on other metrics that can be collected: is the machine healthy, maybe I shouldn't put workloads on there if it's destined to fail because it's had too many ECC errors or so on. And maybe I want to do thermal balancing, or find other metrics that I want to use for making decisions about which machines to turn on and what workloads to place on them. So yes, it can participate in that, because it does have the ability to remotely control power and read other hardware metrics in the data center, but xCAT by itself can't make those decisions; something else needs to make those decisions.
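For reference, here is a minimal sketch of that remote power and console control using xCAT's standard commands; the node names and range are hypothetical, and in a Moab-driven setup Moab issues the equivalent requests through xCAT's client-server interface rather than a shell.

    rpower compute stat          # query power state across a (hypothetical) node group
    rpower node01-node10 off     # power down an idle slice of the cluster
    rpower node01-node10 on      # bring it back when work shows up

    rcons node01                 # attach to the serial-over-LAN console
    wvid node01                  # pop open remote video where the vendor supports it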
I'll just say we're like those cartoons where you have the evil guy who does all the thinking, and he's got some big guy who does all the work. That's kind of how it works: xCAT is the big guy who does all the work, and then there's someone smart doing the thinking, like Pinky and the Brain or something, I don't know.
So it'd be more accurate to say that xCAT is integrated in with Moab than to say that Moab is integrated in with xCAT, right?
I don't know how to answer that. Each is a standalone product that can stand on its own. Moab needs resource managers, whether it be Torque for launching jobs, which we'd call an application manager, or xCAT to install OSes, which is a provisioning manager. It needs resource managers, and it supports a whole bunch of different ones; xCAT just happens to be one of them. But Moab doesn't need xCAT to power things off; there are resource managers that do IPMI natively, for example, and if that's all you have in your environment, you wouldn't need xCAT, and then maybe they could do something else for provisioning, who knows. At least with xCAT you get it all there at once. And again, xCAT doesn't need Moab to operate. You can hire administrators who are intelligent, they do exist, and you can ask them to turn machines off for you.
Let me follow that up with another question. You've done some partnering with Moab; have you done partnering with other resource-controlling or resource-manager types of application suites?
Others are interested. One I can speak of would be the Virtual Computing Lab out of NC State. They've been automating xCAT for about five years now, and they use xCAT for both bare metal and virtual machines. It's an application-on-demand or desktop-on-demand type of environment, but they also do HPC jobs as well. If you're a student at NC State and you need Mathematica, you go to this portal and you say, I need Mathematica on Windows, and 30 seconds later it's provisioned and ready for you and gives you instructions, through a web client or RDP or whatever, to get access to that application. So VCL would be the workload manager and the portal that provides both the job queuing and requests as well as the scheduling, and then that controls xCAT on the back end. VCL is open source, part of the Apache project, and it currently supports both xCAT 1 and xCAT 2.
Cool. Yeah, at Sumavi we are also talking to a number of other people in this space as well, so it's been really interesting.
But partnering with Moab was easy; we've had a 10-year relationship with them. Most of the largest xCAT data centers use Moab, and it's going to be those same data centers that start looking at building larger systems and consolidating various different OS personalities and wanting to provision them on demand.
They're the same ones that have the challenge of consuming megawatts of power and want to be able to save energy, and the same ones that aren't very tolerant of job failure. When you have lots and lots of machines, when you have 30, 60, 90 thousand memory DIMMs, you can have daily failures, and if a machine is sick you want to know it and avoid it. So partnering with Moab made perfect sense. We started this collaboration in 2005, by around 2008 we had a demo, and we've been rolling it out since early 2009.
So the OS that is tied to a physical box or a given VM, xCAT doesn't really care; it's fluid? I can say this host is now Windows 2003, and three hours later it's Red Hat 5.4, and two hours later it's Red Hat 4, based on customer demands? Is it that flexible, that simple?
As long as xCAT has the images, it can load it.
Yep. Yes.
Okay, I've got to get me some of that. That would be ridiculously handy.
Yeah, it's been quite nice. All you do is run your one command to set the boot target, netboot or whatever it is, and then it'll just reboot the machine, and it's great.
Okay, so then I would have to ask xCAT what the state of all my nodes is at a given point in the data center. You said it also collects other information; what are the most common metrics I can get from xCAT? What power does xCAT give me to look quickly at the state of my data center?
We give you temperatures. If your IPMI supports it, you can get the power usage. You can see if a machine is on or off. You can get the serial number, which is quite handy when opening support tickets against hardware that's failed. You can get fan speeds, and in some cases you can get the BIOS versions. A lot of it is dependent on what the hardware vendor supports, but there's quite a bit; you're just querying either the service processor, or you're querying the node via SSH, where you can get whatever you want. A common one we would use: let's make sure all our clocks are in sync, so you just run your psh command and see if all the clocks agree. It's very similar to pdsh, which I know a lot of you might use; this one just comes natively with xCAT, and it's one that we had from the beginning.
There's additional information you can get, and as we start moving more toward HPC cloud, we're learning that we have to deliver a lot of other information about the state of the machine, and sometimes you don't get all the information from one source; xCAT would be one source. Vallard just described all the hardware information that we can gather, and that's pretty agnostic and OS-independent. We do collect some OS information. We have a nodestat command that Moab leverages to understand more about the state of the machine: is it pingable, what services are up, and things like that. We do that by automating nmap to fingerprint the machines and hand that information back to Moab through our XML. That helps Moab understand: well, I told the machine to provision, I gave it 10 minutes, I know it's on and I know it's pingable, but for some reason Torque hasn't checked in yet, or their own resource manager for Windows hasn't checked in yet, so I'm not entirely sure what state the machine is in. In a situation like that, they usually rely on Torque or some other resource manager to provide information on the state of the machine.
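Here is a minimal sketch of the kinds of queries just described, using stock xCAT commands against a hypothetical node group; exactly which sensors come back depends on what the vendor's service processor exposes.

    rvitals compute temp         # temperatures (fans, voltages, etc. via other keywords)
    rinv compute serial          # serial numbers, handy for hardware support tickets

    nodestat compute             # OS-level state: pingable, installing, services responding
    psh compute date             # ad hoc check over SSH, e.g. are the clocks in sync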
That said, xCAT does have a monitoring infrastructure. We don't supply monitors; you have monitors plug into the infrastructure, and right now Ganglia, PCP, and IBM's RMC are different monitors you can have out there. As they collect certain bits of information, it gets injected into the xCAT database, so when you query the database you can collect that other information too. As we move forward, you're going to start seeing that not only are we collecting machine or OS status, but we're also going to start collecting application status, and that information is going to be stored in the database so that you can make queries and decisions very easily, instead of asking xCAT to go out and get the information for you. The data is, and will be, timestamped, so that your workload manager can decide whether that information is new enough or falls outside of policy and xCAT has to be asked to go refresh it. We've got a lot of that in there now, and moving forward we'll be getting more and more, but we have to start doing application and more OS monitoring, and it's really going to rely on existing monitoring solutions, because we don't like to reinvent the wheel unless it's absolutely necessary, or if it's a lot of fun. For the most part we want to use established things like Ganglia and RMC and PCP, collect the information, and get it in there.
So, for actually loading new equipment: clusters are getting bigger and bigger, and probably the simplest system I've seen so far is Rocks; you run insert-ethers, you turn on a host, and when it sees it, you turn on the next one. How difficult is it to add systems to xCAT, especially if I'm ordering a thousand nodes, or one of those full shipping containers from IBM filled with machines?
That's one area where xCAT really differentiates itself. xCAT was written by myself and Vallard and people even lazier than us. We don't like the concept of powering machines on one at a time; that's too error prone and can't be automated. So since 2002 we've been collecting MAC addresses in a deterministic fashion by mapping the MACs to some physically known quantity. In xCAT 1 we used terminal servers. Terminal servers were very common; there wasn't serial over LAN or any type of console over Ethernet, so you had terminal servers, and when we built a new system, the one known quantity we could rely on was that this node was plugged into this terminal server on this particular port. Then when you turn machines on, if the MAC address is not known, the node gets a dynamic address and downloads a small version of Linux into memory that collects information on the machine, and xCAT then starts querying those terminal servers and finding MAC addresses, and when it sees a MAC on a certain port, it knows exactly which machine that is and can update the appropriate tables. With xCAT 2 we've moved toward doing that with Ethernet switches instead, because terminal servers just aren't around anymore; I don't think I've seen one in the last couple of years, it's all serial over LAN now. So the one known quantity for us is the Ethernet switches, and we query them via SNMPv3. It's all encrypted and secure; we've been told as part of the xCAT 2 requirements that we can't have any plain text on the wire.
We query the switches, and it doesn't matter what the vendor is, they're all pretty consistent, and it's not too difficult to collect the MAC addresses of the machines. So that kernel image boots up, it keeps pinging to keep its MAC address alive in the switch, we know exactly how everything's wired, and that's stored in the database. You can store it via regular expression, so you can be even lazier when defining your system, as long as whoever cables it does so in an orderly fashion, and we will collect the MAC addresses that way. For things like blade systems, where you have absolute addressing, blades are in certain slots and you can query that from the blade chassis manager, so that's always been very simple. You don't really have to do anything but query it, like the BladeCenter's advanced management module, and say, give me all the MAC addresses for the blades; I've stored in my database which nodes are in which slots in which chassis, and therefore I know what the MAC addresses are. So MAC address collection has been automated for nearly eight years now; we call it automagic discovery. If a node fails and someone calls IBM, they can come out, replace the node, and walk away, and it will self-discover. If it gets a new MAC address, it will recognize that the system board got replaced, and the discovery process is fully automated: it boots up, downloads, notifies xCAT, xCAT goes and gets the MAC address, looks at the chain table and says, oh, well, after I collect your MAC address you're supposed to flash your firmware, and then I need to put this OS on you. It's fully automated, so that even a manager can do it.
So as long as people cable it up correctly. I'm going to have to talk to some people I know about that.
Well, that's good for you, though, because you want to have a good record of where your machines are connected to your switch, in case you have to analyze network traffic or any other types of events that can come up. So instead of having just an Excel spreadsheet where you keep the data, you have an active database that is updated, and you can get reports from xCAT as to where everything's connected. It's quite handy.
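As a sketch of how that switch-based wiring knowledge is stored, here is roughly what the mapping looks like in the xCAT database; the node, switch, and port values are hypothetical, and the exact column set varies between releases.

    # Dump the table that maps each node to the switch port it is cabled to:
    tabdump switch
    # #node,switch,port,...
    # "node001","switch1","1",...
    # "node002","switch1","2",...

    # When an unknown MAC shows up on switch1 port 2, discovery concludes it is
    # node002, records the MAC, and then follows whatever the chain table says to
    # do next (flash firmware, install an OS, and so on).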
So it sounds like you can put in 10,000 nodes, and it'll figure out where all 10,000 of them are and load them up right away.
Exactly.
I kept hearing you guys say scale, scale, scale. Just what kind of scale does xCAT actually work at? How long does it take to boot a 10,000-node stateless cluster using something like xCAT?
That's one of those "it depends" answers. You have to design your systems for scale. xCAT can't give you more bandwidth, or fix the way your environment was cabled, or the poor decisions that were made in the network topology, so you have to exercise a lot of common sense and make sure you design the system to be managed at scale. That said, we did do some benchmarks on a couple of systems, one 10,000-node system and one 4,000-node system, and we could provision both of them in under 10 minutes, if xCAT was up and xCAT's infrastructure nodes were up. That's part of the secret to its scale: having more than one thing out there to provide the necessary bandwidth and throughput. So if the infrastructure is all up: once the xCAT management node is actually installed, configured, and set up, which is the hardest part and can take days, the infrastructure nodes take about 10 minutes to boot. They're stateless as well; they have a special image on there and a specially configured version of xCAT, and you don't work with them directly; xCAT kind of forms a cloud with its infrastructure nodes and it's just one thing. But as long as all that is up, and you provide enough bandwidth, and we understand what the network bandwidth ratios are and use those when we design these systems, you should be able to provision any size system in a reasonable amount of time. Our design point for xCAT 2 is 100,000 nodes, which has become quite handy, especially as we've been doing more and more virtual machines. Scale is becoming something that, if you didn't think about it before when you were designing your systems management, is really going to hit you now as you start dealing with more and more VMs.
So through the course of the conversation you've mentioned a couple of differences between xCAT 1 and xCAT 2, but you said in 2007 you re-architected, took the best ideas, and started from scratch. What are some of the user-noticeable differences between xCAT 1 and 2?
A real database, for one.
Yeah, it'd be easier to tell you how they're similar: they've got the same name. Well, that's not quite true, because we started calling it the Extreme Cloud Administration Toolkit.
Yeah, jump on the bandwagon there.
The command structure, the commands and the command-line arguments, remained the same, but everything else is different. xCAT 1 was written for one customer, and then we just kept evolving it. It was written mostly in Korn shell, and although our largest customer had 30,000 machines being managed with it, and we had the concepts of service nodes and stateless and all these things we learned from 1999 to 2005 that are important and should be in xCAT, it was getting more and more difficult. We did have three layers of abstraction to help make things a little easier, but it wasn't designed to scale with developers, so 99 percent of the work was done by myself. Where we learned the most was in 2005, when Dave Jackson and I got together and said, let's try to solve one of the problems customers have, and that's silos of compute resources: let's find a way to combine it all and then provision on demand. We had an on-demand center, and there was a lot of talk about utility computing and so on in 2005. So we started doing that, and we really discovered that xCAT 1 wasn't written to be automated. It didn't do a good job of handing back information. Especially if you ask a hundred machines to turn on, you can't just get an exit code of zero or one if a subset of the machines failed; you need to be specific about which ones failed so that a decision can be made. You don't want to constantly query the information to do corrective actions; you want to know exactly what failed and then perform the corrective actions on those machines. And we found that even in these large environments, updating the DHCP environment for 30,000 machines could take hours, because it was all written in script and it wasn't very efficient.
So some of the big items would be client-server, with all communications encrypted, and the database. We still maintain multiple abstraction levels, so that when you plug in a new power method, you can write a Perl plugin, put it in our plugins directory, and xCAT can start using it without a rewrite of a bunch of the front end; that remained. xCAT 2 has documentation, which is a huge change, and man pages, even man pages for the database structure. The Eclipse Public License, that's a major change. But it's only the xCAT name that carried over, and we kept that name because xCAT had good brand recognition in the US and in Asia and some parts of Europe. The architecture is from the ground up, completely different, and it was done by a team of people, not by any one individual, so the whole development process is significantly different as well. There's really nothing similar between xCAT 1 and xCAT 2 except for the name and some of the commands, so that someone going from xCAT 1 to xCAT 2 wouldn't find it completely foreign, but that's about it.
So what's the largest public machine managed by xCAT?
The largest one that I know of would be the LANL Roadrunner. It was number one on the Top500 list for 18 months; I think it's number three now. It's got roughly 10,000 elements; it's a hybrid cluster of Cell blades and Opteron blades, all stateless. The second largest I can think of would be SciNet: 4,000 nodes, an HPC cloud. It was number 16 when we benchmarked it last year; I'm sure it's still in the top 30. Roughly 4,000 machines, again all stateless. It's one of the cleanest data centers in the world as well: we power the machines off when they're not in use, we avoid sick machines, users can provision whatever OS they want on demand, so it's kind of a true HPC cloud environment, but very green as well. And its Linpack result was done over Gigabit Ethernet, not over InfiniBand; the cluster is a hybrid of various technologies, and InfiniBand is only on about one fourth or one fifth of the systems, so Moab has to make sure it puts the right workload on the right networks. There's also a POWER6 cluster as part of that system as well.
Are those the two systems you mentioned earlier that only take 10 minutes to provision once xCAT's running?
Yeah. On the Roadrunner system, and we tested this in manufacturing, not at the customer, since it's easy to test these things before you put the customer's configurations on there, each connected unit has its own broadcast domain, and each one of those by itself we could boot up in about 10 minutes once the infrastructure was up and running. So in theory, since each of those is a standalone system from a networking and broadcast-domain point of view, each one of them should boot up in 10 minutes in parallel. SciNet is really 4,000 nodes on a single broadcast domain, where we have 12 service nodes that operate in an HA and load-balancing fashion, and that one we timed repeatedly at booting up with a stateless version of Linux in about eight minutes.
So let me take a different tack here. Who else is involved in the xCAT community? You've mentioned at various points teams and multiple people, and you two obviously represent two different organizations. Who else is involved? Who are the contributors, core contributors, users, things like that? How do you function as a community?
I know I'm outside of IBM, so I basically communicate through the mailing list. So there's me; I know that Adaptive Computing has added some features to xCAT; and I think there are a couple of other people on the mailing list who will say, I found this feature to be useful, and they'll want to contribute it, so we'll give them SVN write access and we'll add their code in there after it's reviewed by Egan or Jarrod or somebody.
I'd say the vast majority of developers with SVN commit access are from IBM, and they're spread across China and Poughkeepsie and Raleigh; that's where the concentration of those developers is, and they're representing some of the different hardware platforms that we have to support. But Sumavi and Adaptive have SVN commit access, LANL has SVN commit access, and I'm actually going through our SourceForge page right now, and some individual users do too. It works the same as any other open source project: you submit some patches for review, we review them, and after a time we gain some trust, or we get tired of looking at your patches, and we give you SVN commit access. All communication is done through either private email or the mailing list, but Vallard has been a guest speaker on our weekly xCAT architecture calls. We have two-hour calls every Thursday to discuss the xCAT roadmap and IBM's objectives and customer requirements and so on. xCAT's roadmap is aggressively steered by the input that we collect. Adaptive Computing has participated on some of those calls when it's a question of what we have to do better in terms of xCAT for cloud and things like that. So sometimes there are face-to-face meetings or conference calls where we get together and discuss these things.
So one question I like to ask a lot of other open source projects is, what version control do you use and why? I heard you say Subversion; do you have any particular reason for using Subversion?
SourceForge, I don't know. It's just that when I set up the SourceForge project, one of our senior developers, Jarrod, checked in the code, and we've used SVN ever since. I don't think there was a real debate or a strong preference by many; SVN seems to be a fairly common standard.
Yeah, I know at Sumavi, for our private code, we use GitHub. I just don't think we even considered it back then; I don't know that we knew much about it.
I don't think it was considered either. We just used SVN; it wasn't really thought about.
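For anyone who wants to contribute the way just described, the mechanics are ordinary Subversion; the repository URL below is a placeholder rather than the project's real address.

    # Check out the code (placeholder URL; see the project's SourceForge page for the real one):
    svn checkout https://example.org/svn/xcat/trunk xcat
    cd xcat
    # ...make your change, then generate a patch to send to the mailing list for review:
    svn diff > my-feature.patch
    # Committers with SVN write access land reviewed changes with "svn commit".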
So what's coming in the future for xCAT?
We've got a lot of cool stuff coming up. From our perspective, we want to add more hardware support for different vendors. I'd also like to do more partnerships with people from different software companies to do more integration. Specifically, there's been a lot that we've been doing around VMware right now; we just added the ESXi 4.1 kickstart support, which is pretty cool. And I think I'd like to do a little bit more with Windows, just because I've had quite a few customers asking me for that. One of the things we recently added was an image export capability, so that people can share images; not a store, but a repository where more people could grab standard vanilla images and download them, and make xCAT that much easier to use. Those are some of mine; maybe you have some you can add, or maybe look at the wish list.
The image capability that Vallard is working on is very nice, and I think very critical for the future. To be clear, it's not an image so much as a complete definition, because it could be stateless or stateful or various different provisioning methods, so it doesn't have to be pre-installed; it could just be a kickstart file with a couple of post-install scripts in a tarball that you export and can import somewhere else. So that's a very nice feature. xCAT has several roadmaps. An obvious one, from an IBM perspective, is to support the latest IBM hardware as it comes out and the latest OS updates as they become available; that's always on our internal roadmap, because we know when we're going to come out with new product and we know when OSes are going to be released by vendors, and we need to make sure we have that support in place at the time of release. We already have Red Hat 6 support in there, which is nice. So that's a given. There are also a lot of requests for cloud-like solutions and for automating xCAT, and we are learning a tremendous amount about the kind of work we have to do around error reporting, robustness, and the scale necessary to work in these cloud environments; that's constantly evolving xCAT. And we're learning that not everybody who tries to control xCAT does it in the same fashion. It's better to make single large requests to xCAT and let it figure out how to handle the scalability than to send in lots of individual ones, and xCAT's not unique in that; for databases and other services out there, a lot of little requests is a lot more difficult than fewer large ones.
So we're working on our own queuing system and a way to aggregate similar requests, to make xCAT more efficient and to alleviate the pressure that can sometimes be put on xCAT in an automated environment. These automated environments are like having a thousand administrators constantly doing things, and 99 percent of it is just querying information, so we're working toward making that a little more robust. But if you really want to know what's in xCAT's future, if you want to know what the most exciting thing is, ask yourselves, or go out and start asking people. We are driven by requests; specifically, the customers we work with define our roadmap more than anything else, and things we had never planned on putting in xCAT end up there. Just in the last year, ESXi support and this new statelite provisioning system that Vallard created were two huge efforts that went into xCAT driven strictly by customer requests and customer demand. So I can see about six months into the future, because I know what I'm working on, but I can't tell you what's going to be in xCAT next year, because I need people like yourselves to tell me what you want in xCAT next, and that's what we're going to use for our direction.
So, Val, what was the idea that required you to start up Sumavi, and why is what you're doing there different from what IBM was doing with xCAT?
Well, I was actually very happy at IBM; I had a great time there and I learned a lot. But I just wanted to do more with xCAT than I could do there at IBM. I wanted to be able to enhance the code, and I wasn't being funded to work on the code, and I wanted to get a team of people who could be more focused on the requirements I was seeing. So we started Sumavi with the intention of: let's enhance xCAT, let's make a commercial version of it, because there was a lot of user-unfriendliness to it. Once somebody understood it, it was very easy, but we wanted to go after more enterprise users, and people who might be more on the Windows side, who weren't so used to heavy command-line use. So at Sumavi, with xCAT, we've really focused on developing a very easy-to-use web interface, and we've enhanced some of the plugins to make them a little more reliable in the data we're getting back. We've contributed that back to the xCAT source as part of that work, because we don't want this to fork from xCAT. So it's been a very good experience, and I think it's been helpful to people who aren't so good with Linux and just want to get their system running.
Okay, well, thank you very much, both Egan and Val, and this show will be up soon. Thank you.
Thanks for your time.
Thanks, guys.
Yeah, thanks a lot for having us.
No problem. Thanks.