Welcome to another edition of RCE. We are back with another episode, with the post-SC hangover finally wearing off, going into the holidays, and having already had some work holiday parties. I have again Jeff Squyres from Cisco Systems, one of the authors of Open MPI. Jeff, thanks again for your time.
Wow, life is tough for you over there in Michigan, already having holiday parties.
Yeah, it's terrible. It's terrible.
That's right. It is good. It's like you said, it's the post-SC detox. All the crushing deadlines of Supercomputing are past, and we're heading right into the end-of-year roundup here. So we'll do a couple more recordings and then, well, I think the plan is to release one, and the next one will be next year, right?
Yep. Yeah, this will be the last recording for 2012, and we already have recordings lined up for 2013. So let's roll into our guest here. Our guest this time is Stuart Martin from the University of Chicago. He is one of the people working on the Globus Toolkit. There's been a lot of buzz around Globus and implementations of it recently, so I'd like to get into the details of what Globus is. Stuart, why don't you take a moment to introduce yourself?
Hi Jeff, hi everyone. Yeah, Stuart Martin. I've been working on the Globus Toolkit project for about 15 years. I've been employed by Argonne and the University of Chicago at present, and started with the toolkit in the early days, when the first few lines of code were going in.
So what is the toolkit? It seems like there are a lot of pieces to it.
Yeah. The toolkit comes as a number of components that work together. The toolkit really is middleware, software plumbing, for high-performance file transfer, cross-organization security, and remote job management. Those are the main parts of it, and remote login as well. And it's middleware, as I said: it's really designed for developers in communities like high-energy physics, the Earth System Grid, and Open Science Grid to integrate into their custom end solutions.
So I actually learned about a lot of the pieces of Globus under different names. Could you cover some of the names of the big pieces?
Right. So for high-performance file transfer there's GridFTP. For the cross-organization security there's the Grid Security Infrastructure, GSI. MyProxy is a service to provide proxies for users and services. Then there's GSI-OpenSSH, OpenSSH using GSI, for secure login. And for remote job management there's GRAM, Grid Resource Allocation and Management; that's what that stands for. So those are the main pieces in the toolkit.
So what's the history of Globus? You implied in your intro that it's been around for a while. You said you even wrote some of the first lines of it long ago. How did this all start, and where's it going?
Yeah, so maybe just start back in 1995 or so, at the Supercomputing conference. There was this I-WAY project, basically a one-week experiment at Supercomputing to build a national-scale grid with major computing facilities, their high-speed networks, and virtual reality CAVE displays, with approximately 60 applications as demonstrations. It was basically an event that galvanized what became the grid computing path, really, with Ian Foster, Steve Tuecke, Warren Smith, and Jonathan Geisler. You could almost say it was Globus Toolkit version negative one. Then in 1996 we got a DARPA grant to actually fund the Globus Toolkit, and that got us started. Over the years we've gotten funding from the National Science Foundation, the Department of Energy, NASA, and NIH; we've even had private companies, IBM, Sun, and Intel, fund parts of it at times. The late 1990s or early 2000s is where GT really got traction with the high-energy physics community and LIGO, as a fundamental part integrated into their custom solutions. Over maybe the last five years it has really transitioned into a stable production service for tens of thousands of users per day, installed on thousands of servers worldwide. In our usage statistics we see over one petabyte per day moved using GridFTP, and we see several hundred thousand jobs per day using GRAM for job submission.
So you used the word grid in there a lot, and grid is one of those buzzwords that has been around for a long time. Could you expand on exactly what grid means in the Globus world?
Right. Grid is a term that's used for an organization building out an infrastructure that will serve their scientific community, their researchers, to do things easily across a set of computers.
So you've got grids in the US like Open Science Grid and XSEDE. XSEDE is maybe a set of 10 to 12 sites or machines. Open Science Grid, I think, is more on the order of, I'm not sure, but 20 and maybe up to a hundred resources and machines. So it's kind of like an assembly of smaller communities, universities, throughout the United States, largely. And then Europe has grids. So what the grid does is make it easy for scientists who would normally use just one machine to harness sets of machines and be more opportunistic, to use a variety of computers. Otherwise they would have to implement some of the remote interactions on their own; grids provide that standard remote interaction so they can use those remote computers more easily.
Okay, so you've given a few allusions there to who uses these grids. You're talking about scientists, but you've also talked about developers, particularly with the toolkit aspect of it. So let's focus on the toolkit part here. Who is the toolkit for, who uses it, and how is it used?
Right. So who uses the toolkit is thousands of installations worldwide, and these are grids, as we've said. So, in Europe.
There's a comparable institution like PRACE, which has a set of supercomputing resources, typically, or university compute cluster resources, assembled together to make it easier for scientists to do things. The scientists come from all disciplines, from high-energy physics to chemistry computations to genome searching. All disciplines, really; it's not targeting one. So I guess the toolkit and grids tend to be infrastructure that enables a variety of end-user custom solutions for the different disciplines.
Okay, so let's talk about some of the individual parts and what's unique about them. Let's start with GridFTP, because I think that's probably the best known. What exactly is GridFTP, and why should someone use it over traditional file transfer methods?
Right. So GridFTP is a high-performance file transfer service. It's not just a simple upload/download manager, and it's not a tool where there's always a human in the loop. It's something where you can delegate credentials and then, offline, a service can do things on your behalf. So it has elements of reliability and security in large-scale environments. And GridFTP itself is a set of enhancements on top of FTP. Some of these enhancements are the GSI security extensions to FTP, and then also parallelism, pipelining, and file offset markers for restartability.
Now when you say pipelining and parallelism, can you go into a little bit about what you mean there? Are you talking about parallelism across multiple network links, for example?
Right. So parallelism is the ability to use multiple TCP streams to a single process. By doing it that way you can send data much faster.
So there was a thing called a striped GridFTP server. What exactly is that?
Right. So striping with GridFTP is a way to send, I would boil it down to, one file extremely fast over the network. Striping is basically transferring portions of a single file at the same time using multiple GridFTP processes. You transfer a single file to a single file, but over the network you're really breaking up that file across multiple processes, with TCP streams to them, and bringing it back into a single file.
Okay, so with the striped servers, when I had done it I had actually used multiple machines at an XSEDE site. Is that a common setup?
Right, good point. Yes, many do use that method, and can achieve really high bandwidth throughput using it.
Okay, so how does authentication get handled in all this? You said that it might not necessarily involve human intervention. How is authentication handled in something like that?
Right. So one of the components, the GSI security infrastructure in the toolkit, allows for delegated credentials. Delegation was identified as a key thing that would be needed with these types of services and grids, so that a user can come on and doesn't have to be present all the time, and services can do things on their behalf.
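As a rough illustration of the parallel-stream and striping idea described above, the sketch below splits a file into byte ranges, lets each range stand in for one stream or stripe, and puts the pieces back by offset. It is not GridFTP code and does no networking; the function names `split_ranges` and `reassemble` are made up for this example.

```python
def split_ranges(total_size: int, streams: int) -> list[tuple[int, int]]:
    """Return (offset, length) pairs that together cover total_size bytes."""
    if streams < 1:
        raise ValueError("streams must be >= 1")
    base, extra = divmod(total_size, streams)
    ranges = []
    offset = 0
    for i in range(streams):
        # The first `extra` ranges carry one additional byte each.
        length = base + (1 if i < extra else 0)
        if length == 0:
            continue  # more streams than bytes: skip empty ranges
        ranges.append((offset, length))
        offset += length
    return ranges


def reassemble(total_size: int, blocks: list[tuple[int, bytes]]) -> bytes:
    """Write each (offset, data) block at its offset; arrival order is irrelevant."""
    out = bytearray(total_size)
    for offset, data in blocks:
        out[offset:offset + len(data)] = data
    return bytes(out)


if __name__ == "__main__":
    payload = bytes(range(256)) * 4  # 1024 bytes standing in for a file
    ranges = split_ranges(len(payload), 3)
    # Each hypothetical stream carries one range; reverse to mimic out-of-order arrival.
    blocks = [(off, payload[off:off + n]) for off, n in reversed(ranges)]
    assert reassemble(len(payload), blocks) == payload
```

Because every block carries its own offset, the receiver can accept blocks in any order, and knowing which ranges have already arrived is, presumably, the kind of information the file offset markers for restartability rely on.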
So, right, you can delegate a credential to a service using X.509 and security standards, and MyProxy, Globus, and GSI implement these standards to implement delegation.
So how is this different from, say, plain vanilla passwords or SSH keys or a technology like that?
It's similar in some ways, in that with SSH you need to put a key on the remote machine, and with credentials in GSI you get a unique ID, a DN, with your credential, and that needs to be put in a grid-mapfile. But there are also more complex ways that you can set up authorization. CILogon and other technologies are coming on where your access at your campus can be used to access a remote machine under something like a group authorization. So it's much easier to get access at a site: if a site wants to enable a set of researchers from the University of Chicago or a set of researchers from the University of Michigan, that can be done at a larger scale and is more resilient to changes in personnel, because the authorization can be handled at the institution.
I see. And so GSI is the part of the Globus Toolkit that allows this to happen in a uniform, standardized way across multiple different organizations. That's the key here, right?
Right. And with GSI, again, what the toolkit provides is a little bit of plumbing, which MyProxy then builds on top of as a service.
Okay, so there's a thing called GRAM, and distributed resource management and such. Can you go into what that is and how it works?
Right. So one thing that scientists do is log in and use a remote computer.
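Returning to the grid-mapfile mentioned above: it maps the DN from a user's credential to a local account. A minimal sketch of that lookup follows, assuming the common layout of a quoted DN followed by one or more comma-separated local usernames per line. The DN and username in the usage note are invented, and `parse_gridmap` and `local_account` are hypothetical helpers, not part of the toolkit.

```python
import shlex
from typing import Optional


def parse_gridmap(text: str) -> dict[str, list[str]]:
    """Parse grid-mapfile style text into {DN: [local accounts]}."""
    mapping: dict[str, list[str]] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = shlex.split(line)  # honours the quotes around the DN
        if len(parts) != 2:
            raise ValueError(f"unexpected grid-mapfile line: {line!r}")
        dn, accounts = parts
        mapping[dn] = accounts.split(",")
    return mapping


def local_account(mapping: dict[str, list[str]], dn: str) -> Optional[str]:
    """Return the first local account mapped to this DN, or None if unmapped."""
    accounts = mapping.get(dn)
    return accounts[0] if accounts else None
```

For example, parsing the single line `"/DC=org/DC=example/CN=Jane Doe" jdoe` and looking up that DN would give `jdoe`, while a DN that is not listed gives `None`, which corresponds to a user the site has not authorized.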
There's usually some type of batch scheduler there, something that allows all the scientists to submit the jobs they would like done and schedules all of them across these large supercomputers. What GRAM does is basically provide a remote interface to that local service. It will allow you to submit jobs from remote, and cancel jobs, and do that securely.
Okay, so does this sit on top of the local site's, the local resource provider's, resource manager and such? So you still use PBS or LSF or something like that, and GRAM talks to that?
Right. One key distinction we made early on was that we would not require a local site to change their local system to adopt this. GRAM works on top of the local system, so you can add it if you'd like. Local users can still use their local access, but this enables a new capability to interface with that machine and system from remote, so they can do these remote jobs.
So when you say that GRAM sits on top, what does it do? Does it actually interact at an API level with the underlying schedulers, or does it just basically invoke CLI commands on behalf of remote users? How does that work?
Right. So there's a layer in GRAM that is for interacting with the remote machine, the supercomputer, and its batch interface. Typically it's done through the command-line interfaces offered by the remote batch scheduler. So you'd use qsub for PBS, and qstat; these commands are typically what GRAM uses. Alternatively it could use APIs and things, but that typically hasn't been done. It seems to be easier to implement things with the command-line tools.
Okay, and how much do you expose the local policy to remote users? Because every site has a different religion about how things are scheduled, and where you're allowed to run, and for how long. We actually had a previous show talking about three entirely different models from three different institutions about how they run their HPC resources. How do you expose that to remote users but preserve some degree of commonality, to preserve this ease of use that you're clearly going for?
Right. I think the most common way is queues. You can set up queues in almost all batch schedulers that I know of, and that's one way they can provide some similar submission, yet some control and some segmentation, if they want, of the different jobs. GRAM can interface to those queues as well, to allow users to specify those queues and then submit to them.
Okay, so GRAM is submitting these jobs to all these different remote things; you're tying together all the different remote resources a user may have access to. Does the user actually have to have a local account on every machine, or does GRAM run them all as, like, a Globus user?
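The command-line layering described above, where a remote interface sits on top of the local scheduler and shells out to qsub and qstat, can be sketched roughly as follows. This is an illustration under stated assumptions, not GRAM's actual adapter code: the helper names are hypothetical, `qdel` is assumed as the PBS cancel command (only qsub and qstat are named in the discussion), and `-q` is the PBS option for choosing a queue.

```python
import subprocess
from typing import Callable, Optional, Sequence

# A runner takes an argument list and returns the command's stdout.
Runner = Callable[[Sequence[str]], str]


def run_cli(argv: Sequence[str]) -> str:
    """Run a scheduler command and return its stdout (raises if it fails)."""
    result = subprocess.run(list(argv), check=True, capture_output=True, text=True)
    return result.stdout.strip()


def submit_command(script: str, queue: Optional[str] = None) -> list[str]:
    """Build the qsub invocation, optionally targeting a specific queue."""
    argv = ["qsub"]
    if queue:
        argv += ["-q", queue]
    argv.append(script)
    return argv


def status_command(job_id: str) -> list[str]:
    """Build the qstat invocation for one job."""
    return ["qstat", job_id]


def cancel_command(job_id: str) -> list[str]:
    """Build the qdel invocation for one job (assumed cancel command)."""
    return ["qdel", job_id]


def submit(script: str, queue: Optional[str] = None, runner: Runner = run_cli) -> str:
    """Submit a job script and return whatever job identifier qsub prints."""
    return runner(submit_command(script, queue))
```

Passing the runner as a parameter keeps the command construction testable on a machine with no scheduler installed, and the local site's scheduler stays untouched, which matches the point that sites should not have to change their local system.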
Yes, you do need a local account at the remote machine, and you need to be authorized and accepted just like any other user. Then this just enables a new capability for you: instead of logging in locally, you can do it remotely, and you can maybe take advantage of other grid client tools that are more high-end for your needs.
So let's say I'm a remote scientist and I want to submit a job to your organization through various Globus commands. How does my executable get to your organization, and any supporting input files, and the output files, and things like that? Does this all happen under the covers via GridFTP and delegation and things, or do I have to pre-stage everything?
Users do different things. Some will stage those through GridFTP. Maybe some files are already staged. If they're very large files, maybe they stage them by hand, individually. Then maybe, as a process, they stage some files through GridFTP again, and then submit their job. In addition, GRAM also provides staging capability before and after the job. So you can do it in a variety of ways, and users do it differently depending on their needs.
Okay, so GRAM does a lot of things. In which cases should GRAM be used, and in which cases should GRAM not be used?
Good question. Some will use GRAM to submit every one of their jobs. In some cases, maybe where you've got short-running jobs and you have many running jobs, some users will use GRAM more for a pilot job. What that is, is something like Condor-G, which is a high-level client to manage a personal queue of jobs, that would submit one job through GRAM to start up a remote Condor daemon, and then Condor would talk directly to the daemon.
So it's a way to bootstrap your own chosen grid client method for submitting jobs down into the supercomputing cluster.
Okay, let me switch direction here a little bit. I'm an open source developer kind of guy, and I know that you have quite the community around all this development effort too. Can you tell us a little bit about how your community is organized? What's your role? How do you guys make decisions, things like that?
Yeah, so we're open source and have been from the beginning. We've interacted with various communities that have come and gone. I guess we've changed over the years. We haven't always been agile scrum; that's come along more recently. We maintain open mailing lists: a Globus Toolkit user email list, a Globus Toolkit developer email list, and a commit list, so you can see our commits if you'd like to, and you can plug in wherever you'd like to get information. That's one way of distributing information between users, and hundreds of users subscribe to these lists. So we communicate that way, and then we maintain our log of tasks in JIRA and choose a set of tasks to do each sprint. We run two-week sprints. With that, you have a meeting every two weeks, review what was done, set the next priorities, choose the next set of tasks to do, and put those into sprint buckets of two weeks. That's out there to be seen as well: in JIRA the toolkit project is open, and anybody who would like to can take a look at what we're doing on a daily or weekly basis.
So now you say you do an agile sprint kind of development style. Do you do this every two weeks? Do you have a sprint every two weeks, or is it more a methodology where you're just making
two-week bites of work, so to speak, that get distributed around the community?
Right, so we do two-week sprints pretty religiously, and come together. It's a time to have that meeting to review what was done and what's next. Certainly there can be tasks or efforts of work that run past two weeks. You just break it up into smaller chunks: get the design done, then get implementation phase one done in another sprint, and just pound through all the work in order to complete the job on a larger, long-term scale, as some things take a longer time to implement.
Do you get a lot of commits from outside the University of Chicago space? Like, how widespread around the country are you guys?
Well, we work pretty closely with the Open Science Grid community and their developers and their team, who build the Open Science Grid-specific solution. They base their solution in part on the Globus Toolkit, so they maintain many patches and communicate some of those patches back to us. That's very helpful in making the toolkit a better and more reliable product. And that's how open source works, right? So that's a healthy relationship there. And XSEDE is similar, where they base some of their services and capabilities on the Globus Toolkit, so we communicate issues and patches with them as well. And then in Europe, there's the Initiative for Globus in Europe.
That was a recently funded effort to bring Globus to the European community: a more local group in Europe that supports their users' needs, answers questions maybe more quickly, and works more closely with their community there. And really we do work with grids throughout the world, in Canada, Australia, New Zealand, Europe, PRACE, the US. So there really are grids throughout the world that are based on the toolkit.
So here's a question I like to ask other development communities, just for the heck of it, out of my own genuine curiosity. What version control system do you guys use, and why?
Right, so we've used CVS. Maybe there are some newer ones out there, but we've used CVS, and we've kind of built some tools around it to do everything we need. It provides us with branching and merging. And I guess also we've had some people who have been on the project for 10-plus years. So, you know, you don't need to fix what isn't broken, right?
Let's move on to the future here. What are the coming plans for Globus? It's very mature now; it's been around for a while. What are the plans?
Yeah, I don't know that we have any huge plans for the toolkit, in that it's mature. It will evolve as the community needs.
So I guess another way to answer that is that there's another effort we're doing with Globus Online, which uses the Globus Toolkit and provides an end-user capability for high-performance file transfer on top of the toolkit. That's a recent effort; we've been going a couple of years with Globus Online. These are the types of things that the toolkit enables, these custom high-end end-user solutions, and we would anticipate new things like this coming up. So what's coming for the toolkit is maybe little changes here and there, and some added features for users, but I don't think there's anything earth-shattering planned at this point for the toolkit.
Okay, Stuart. Well, thank you very much for your time. Where can we find more information about the Globus Toolkit?
Yeah, well, we have a website, Globus.org, and then Globus.org/toolkit to find out more about the toolkit. And at Globus.org you'll find Globus Online, which also leverages the toolkit, and you can find information about that for end-user file transfer.
Thank you very much.
Thanks, Stuart.
All right. Well, thank you guys.