Welcome to another edition of RCE. Again, this is Brock Palen. You can find us online at rce-cast.com. And once again, I have Jeff Squyres from Cisco Systems, also one of the authors of Open MPI. Jeff, thanks again for your time.

Hey Brock, here we are in December, the traditional downtime for HPC, since we're all recovering from Thanksgiving and Supercomputing and whatnot.

Yeah, the follow-up to SC, and all the holidays and family stuff starts coming in, and things just kind of slow down a lot.

Slow down. So, you know, there may or may not be another RCE-Cast at the end of this month. We'll see what happens. You started something new here in December, didn't you?

Yeah, actually, I started it at SC. It's something I said I was going to do, and I quickly just threw it up. But you'll find on the RCE website a link not only to Jeff's blog, but also Brock now has a blog.

And Brock talks about himself in the third person on this blog too.

Yeah, yeah, okay.

All right, so let's just jump right into this, then. Brock, I believe you've known our guest here for a couple of years now, right?

Yeah. So this podcast is more of a follow-up to a previous podcast we had, RCE-37, about TeraGrid. TeraGrid recently went away and has been followed by something called XSEDE, and there have been some changes. So I have on someone I have worked with through TeraGrid and now XSEDE for a few years: Phil Blood from the Pittsburgh Supercomputing Center. So Phil, why don't you take a moment to introduce yourself?

Hi, Brock and Jeff. Thanks for having me on your program. I have also listened to your podcast and appreciate the work you're doing here, so I'm excited to be here.
Yeah, so I work at the Pittsburgh Supercomputing Center, and I have for coming up on five years now. My background is in biomolecular simulation. My PhD was at the University of Utah with Greg Voth, who's now at the University of Chicago. I looked at really large-scale biomolecular assemblies, membranes and proteins and things, and basically was a user of these big national systems. I enjoyed working in that realm as much as I did working in the science area. So eventually I joined up with PSC, and now I work as a scientific consultant and researcher with different research groups across the country, trying to help them use the facilities at these national resources to the greatest extent possible, to maximize the return on their time and investment in research. Beyond PSC, I have also become part of XSEDE, as Brock mentioned, working in various areas there: in outreach and education, in some training, and also through extended collaborative support with researchers from an XSEDE standpoint. So that's a short intro to what I do.

So that's interesting: you were working on your PhD, were looking for resources, found these resources, and the organization never really lets you go. The interesting thing is that's kind of how I got started in this field, too. I had an opportunity when I was in school, went into it, found I liked it, and never actually left.

Happened to me with MPI, actually.
Oh really?

My first MPI Forum meeting was in '94, and...

Wow, that was a long time ago. My advisor asked me when I started, "Are you more interested in science or computation?" At that point I said computation. And he said, "Have you ever heard of something called grid computing?" There was a project in the group related to grid computing; it was sort of the idea of Folding@home and BOINC that we were looking at at the time. I got really excited about these things, and I knew that whatever I ended up doing in science, which I also found very interesting, I wanted to do it through computing. That's what drew me to, at the time, TeraGrid, as you said. I looked at that and said, wow, that looks really cool. I never would have imagined at the time that I would end up actually working in and being a part of TeraGrid, and then continuing on to the next generation of this national cyberinfrastructure. So it's something I enjoy doing a lot.

So you used a key word there: cyberinfrastructure. From a researcher's point of view, and from being a support person at a university that consumed TeraGrid resources: what exactly happened to TeraGrid? It seemed like it was infrastructure. Why did it go away?
Well, it all has to do with NSF funding cycles. TeraGrid was funded in several cycles in the previous decade. I was just reviewing the history of that. It started out with a solicitation for what the solicitation described as a distributed terascale facility, and the initial award was made to San Diego, Chicago, Caltech (and I'm probably forgetting one), and NCSA to create this distributed terascale facility. In their proposal they called it TeraGrid, and that's how TeraGrid was born. Then the funding for TeraGrid was extended into TeraGrid phase two, and the Pittsburgh Supercomputing Center actually joined at that point as part of the extended terascale facility, as the solicitation called it, and they put down a big resource at that time; I think it was LeMieux. Then there was additional funding that funded TeraGrid through 2010 or so, and then the NSF put out another solicitation. They called this one XD, which stood for eXtreme Digital. What they were looking for was taking it to this next generation of what they call advanced digital resources, moving beyond just thinking of it as these big universities or big centers with big iron into a more inclusive sort of digital solution for advancing science. That's why they called it eXtreme Digital. The winning proposal in response to that solicitation was called XSEDE, which stands for (if I get this right, I'll be impressed) Extreme Science and Engineering Discovery Environment. So that's how that evolved, in a nutshell.

Okay, so what's different about XSEDE versus TeraGrid? You talked a little bit about the evolution here, but concretely, what does that mean? What was different between these programs?
Well, the first answer to that, I guess, is that we're trying to keep everything that was good about TeraGrid and leave it alone, or at least, I guess the mantra is "do no harm." So a lot of the things there are actually very similar. A lot of the partners are similar, although there's been some shifting around. The main computational resources right now are basically the same ones we had as TeraGrid ended. So there are a lot of things that are similar, and we can talk more about some of the things that are similar and different. But the main difference, I think, that we're moving towards in XSEDE is making things more inclusive of all the different ways that people compute. One way to illustrate that: in TeraGrid we had what we called resource providers, and the focus was on the resource, and what resource do you have to contribute? That would basically be a machine that they're contributing cycles on, and of course there were people there contributing expertise too. But now in XSEDE we have the idea of a service provider, and the service provider may not even have a machine to contribute. If you look on the website you can see the list of all the service providers, and one of them is actually the Jülich Supercomputing Centre in Germany. They're certainly not providing a machine to TeraGrid (or to XSEDE; you're going to catch me on that), but they are providing services and expertise that we are providing to users.

Now, what does that mean? Does that mean I can call them up for help, or they help write code, or what? What does it mean to provide expertise?
So that's a good question. You won't call them up or write to them directly, necessarily. Like we did in TeraGrid, we have centralized support mechanisms. One of the things that is the same in XSEDE as in TeraGrid, although there are some differences that we can talk about: one of the best things people noted about TeraGrid was in-depth support from people who are familiar with their science domain but also familiar with the computing. We called this advanced support in TeraGrid, and now we call it extended collaborative support in XSEDE. This type of support is really defined by what length of time it is going to take to work on a particular project. So yes, we can bring in experts from wherever they are. I'm not sure if there are any international limitations here, but there is a facility in XSEDE to call upon people who are not necessarily paid by XSEDE to come in and provide expertise. But we do this through a centralized mechanism: you would apply for this kind of support through a central mechanism, and then the managers and people within XSEDE will look for the people with the best background to help you with a particular problem. XSEDE is serving as a fabric to bring together all these different resources, both in people and in compute resources, to address problems and help people solve problems in science.

So now what do you get? What does Jülich get, for example, to be a service provider like that? Do they get allocations on other people's machines?
Is it kind of a quid pro quo thing? What is the return on investment for providing resources like that?

So now you're stepping a little bit outside of my direct knowledge. I'm not sure exactly in terms of those agreements, and this is probably a good time to point out also that XSEDE is very much a work in progress right now, and a lot of these things are being defined. I've heard some discussion: as we start to step into these different realms of service providers, how do you define exactly those relationships? There's some internal discussion going on of how to establish a structure within XSEDE for all these different service providers. So these questions are not necessarily all answered at this time. As for the terms of these specific agreements: there's a core set of providers.
There's a core set of centers that have PIs on this grant with XSEDE, and the lion's share of the actual funding goes there. But as I said, there is funding available to reach outside of the key XSEDE partners. So a service provider might have people there who are being paid through XSEDE through some mechanism. We have the means of reaching out with the funding available to XSEDE to pay people at other service providers, or they might not even be a service provider, just individuals at institutions. Now I'm speculating, but someone might become a service provider by virtue of being paid, because they have some expertise that we need to leverage in XSEDE that some of the funding could go towards. I'm not sure of the details of that, and I think some of those details may still be being worked out. But that is one potential return on investment that can be given to someone who'd be interested in being a service provider.

So you said you wanted to keep a lot of the good. I actually tell a lot of our users who were TeraGrid users and are now XSEDE users that if you were just using resources that were there before, just approach it as though nothing changed. Am I okay telling them that?
Right, yeah. That is our intent: that people should be able to use it as if nothing has changed. Now, some of the interfaces are evolving. There used to be a portal that was portal.teragrid.org, the TeraGrid portal, which is now portal.xsede.org. So there are some of these interface changes happening, but the basic functionality should remain, and by changing something we shouldn't make it more difficult for you to do what you were doing before. If that does happen, then you should yell, and there are actually a couple of ways to yell. You can email help@xsede.org if something's broken or not working right. help@xsede.org is the place where you can send really anything, and it'll get to the right place. If you find that a change was made that just makes your life harder, or is not as good, or you have some feedback that you want to give (it's not necessarily broken, but maybe you have some suggestions for making it better), there's another mechanism that's been set up as part of XSEDE, which we didn't have in TeraGrid, which is feedback@xsede.org. That actually brings us right into one of the main differences between XSEDE and TeraGrid: in XSEDE there is a formal mechanism for taking in requirements from users, cranking that through a process, and ultimately generating the architecture that we put down. So XSEDE is meant to be flexible and responsive to user requirements and changing user requirements, and there's a formal process for doing that in XSEDE.

So why don't we step back for a moment, for those who haven't listened to the TeraGrid podcast. What exactly are the kinds of resources?
What are the primary resources someone may be interested in when approaching XSEDE, that XSEDE can provide, and what are some of the lesser-known resources that XSEDE can provide?

So I guess you can divide the resources up into your main compute resources, your data resources, visualization resources, and then sort of special resources. Those may fall into the compute category, but they're special for some reason. In terms of compute resources, we have kind of two types. We have the high-end, very high-scale, very tightly coupled systems, what you'd call MPP or massively parallel systems. The primary ones right now are at TACC in Texas and at NICS. Kraken, a Cray XT5 at NICS, has a hundred thousand cores, and at TACC there's a 65,000-core Sun Constellation cluster. Those are the two big high-end massively parallel systems on XSEDE right now. Then you have a number of smaller systems that are usually InfiniBand-connected, so they still have very good scaling performance, but not necessarily to tens of thousands of cores like on the other two systems; more of the thousand-cores-or-less type of system. There are a number of those spread out at different sites. There's another one called Lonestar at Texas. And there's a resource called Trestles at San Diego, which is actually special because it's meant to be high availability.
So there are some ways to cut in and get jobs done that need to be done urgently. It's not allocated at a hundred percent, so that they can accommodate those kinds of things. Then we have shared memory systems, and the biggest one that we have on TeraGrid (and actually it's the biggest shared memory system in the world right now) is Blacklight. It's an SGI Altix UV system with 4,000 cores and two segments of 16 terabytes of shared memory. And then we do have data allocations that you can get on XSEDE, both for archival and for a wide-area file system that would be usable from different sites across XSEDE. We have a GPU cluster at NCSA called Forge. We have data-intensive machines that are coming on board at San Diego called Gordon, and there's a precursor to it called Dash. So there's a lot of variety in the systems. Then there's visualization: there are some visualization systems at Texas and also at NICS. So quite a large variety of systems. I can go into detail on any one of those that you want me to talk about.

So what about resources in terms of people? You mentioned there are some resource providers that provide things besides actual compute hardware. What if I'm trying to scale up to one of these massive things? Like I've got some really neat science, but I just don't have the computational expertise to take it up to the scale that would be required.

Right. So we have an allocation system, and within that same allocation system where you go online and request the computational resources or data resources, you can request extended support for your application if you know this is not a basic problem, or even if you're wondering whether you might need extended support. You go into the allocation system, and right now there's a series of five questions where you
describe your need for advanced support or extended support. That goes to the XSEDE management. They look at it. It's also peer reviewed, so reviewers give a recommendation on it. Then we come back and talk to you about your needs, and we try to match up people on XSEDE who have expertise that can help you solve your problem.

Back on TeraGrid we already had something called advanced user support. What exactly is extended support in XSEDE, and how is it different?

In XSEDE we've kept the good things that we had in TeraGrid. The basic model in TeraGrid was that an individual research team would come and ask for help on a specific problem, as we talked about, and then we'd find the right people to help them with that problem. You could work with a person for up to a year at a time; a certain percentage of their time would be dedicated to working with this group, so you could do some substantial work. That we called advanced support for research teams. We also had advanced support for gateways: people who want to create a portal that uses XSEDE resources, or TeraGrid resources, on the back end, so you have a web interface that helps scientists get their work done.
We still have that in XSEDE. One of the new things that we have in XSEDE is that we've expanded the idea of helping communities: we have extended support for community codes. If there are community codes that a lot of people can benefit from, we have a team that's dedicated to identifying those and helping to harden and develop codes that a broad range of researchers can benefit from. The other thing that's new is that 25% of the extended support funding in XSEDE is dedicated just to seeking out researchers and science domains that haven't traditionally come to use these national resources but now have a need for it. So we're reaching out to people in economics and in humanities. There's a big data explosion in genomics and bioinformatics that maybe we can help with now. So we have a lot of resources now just to reach out and build communities that have not traditionally been built around this national cyberinfrastructure community, and help them learn how to use these systems, and allow them to teach us what they need in order to advance their research using these systems.

So what about in the future? TeraGrid kind of added resources over time, and older resources fell off the back of the truck, and XSEDE took over allocating these resources. What's coming in the future?

There was a new system announced recently that was won by the Texas Advanced Computing Center, TACC. This new system is going to be called Stampede, and I don't know a lot of details about it off the top of my head, but it's going to be big.
It's going to use Intel's new processor, the Many Integrated Core (MIC) architecture. So that's coming. I understand that NSF will also be putting out a solicitation for another big resource, so that is in the works. Those are the two really big solicitations that I'm aware of: one that was just won and is in the works, and the other that is going to be coming in the next year or so, I believe. So Stampede at TACC should be coming online in 2013, and then there will be another solicitation; I guess that one will come on for 2014 or 2015.

So you talked a little bit about systems here. What about the other side of it, which is becoming a big deal these days: storage? How is storage managed on XSEDE? Is that all managed locally at each computational site? Like, TACC has their own storage, and if you want to run on TACC resources, you've got to get your data down to their storage and then run, and things like that? Or what is the model?

So, yes, the default model is exactly what you said: when you get a compute allocation on one of these systems, you have their local storage at your disposal. Usually that's some big parallel file system where you actually do all the data-intensive stuff coming off your computation. They usually have some archival resource, although not every site necessarily has an archival resource; that varies a little bit. But there are also some resources that extend beyond the boundaries of the individual service providers. Currently the one that is allocatable is a wide-area Lustre file system called Albedo. That's not necessarily XSEDE-wide right now.
This is something sort of in the works, but it is available, and if sites have an interest in mounting this, then they can mount it, and users can use this file system across different XSEDE resources.

It's a multi-organization parallel global file system. We're using all kinds of buzzwords there.

Right, right. It's parallel. It's not so global, because it's not all the way across XSEDE, but it is wide area across organizations.

And is the intent to someday be global, or is it still just a decision per site?

So right now the intent is that there are two different kinds of file systems that would go across boundaries. One would be this relatively high-performance Lustre file system that would span different sites. I think the intent is that eventually you would have one that would cross all or most of the main resource-providing sites. Then there's also something that's in the works right now that has to do with another thing that's different about XSEDE, which is a push to bridge to local campus resources. That is what we call the Global Federated File System. That is something that anyone, even people outside of XSEDE, any campus, could deploy on their site and be able to integrate their file systems with XSEDE file systems. It would not be as high performance as the Lustre solution. It wouldn't be intended for high performance, but it would be truly global.

That's fascinating. I think we could talk about that for an hour, so let me not go off in that direction, even though I really want to, because I think there are some really fascinating things to talk about there. Let me ask a slightly different question. How do researchers move their data on and off XSEDE storage? Is it this federated thing? Is that what people typically do today?
Or are there some other standardized mechanisms to get my data to a random XSEDE organization's local storage?

Okay, so the standard mechanisms are very similar to what existed in TeraGrid. The thing I just mentioned, the Global Federated File System, is one of those things that is in the works in XSEDE. There are actually pilot sites in the process of being identified right now that are going to try that out, and we hope to make that standard in the future. That would be one way to get your files onto XSEDE when that is put into production. As for the basic mechanisms: in general, you would do very basic things to get data onto XSEDE. A lot of researchers find that scp is going to work great, so that's one way to do it. There is a new data transfer service that's being integrated into XSEDE called Globus Online. That is more of a GUI mechanism and also has a lot of nice characteristics for managing data transfers and optimizing data transfers. You can use it from your laptop with a GUI interface, and the Globus Online folks actually handle the data transfer. You can establish different endpoints: TeraGrid had endpoints, XSEDE now has endpoints, other campuses have endpoints, and you can establish endpoints at your campus and on your laptop and wherever you want, and use that to transfer data. So that's another mechanism. And there's also something called GridFTP, if you're really into high-performance transfers. Globus Online uses GridFTP under the hood, or you can use it explicitly on your campus.
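
[Editor's sketch: as a rough illustration of the scp route mentioned above, the following Python snippet builds an scp command line and, by default, only prints it. The username, hostname, and paths are placeholders rather than real XSEDE login nodes, and the helper names `build_scp_command` and `upload` are hypothetical. It assumes an `scp` client is installed if you switch off the dry run.]

```python
import shlex
import subprocess


def build_scp_command(local_path, user, host, remote_path, recursive=False):
    """Return the argv list for copying local_path to user@host:remote_path."""
    cmd = ["scp"]
    if recursive:
        cmd.append("-r")  # copy directories recursively
    cmd.append(local_path)
    cmd.append(f"{user}@{host}:{remote_path}")
    return cmd


def upload(local_path, user, host, remote_path, recursive=False, dry_run=True):
    """Print the scp command; only execute it when dry_run is False."""
    cmd = build_scp_command(local_path, user, host, remote_path, recursive)
    print(shlex.join(cmd))
    if not dry_run:
        # Raises subprocess.CalledProcessError if scp exits with an error.
        subprocess.run(cmd, check=True)
    return cmd


if __name__ == "__main__":
    # Placeholder host and paths: substitute the login node and scratch
    # directory given in your own allocation documentation.
    upload("results.tar.gz", "myuser", "login.example.org", "/scratch/myuser/")
```

For many large files, the managed services described in the interview (Globus Online, or GridFTP directly) are likely the better fit; this sketch only covers the simple single-copy case.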
So those are a few ways, and there are probably others that I could go into.

Yeah, the Globus Online stuff's neat. We actually set up UMich endpoints for our local resources here, so it makes moving back and forth between XSEDE and our local resources really easy. And actually I'm trying to get those guys on this show, so if you know anybody over there, please let me know.

Sure, yeah, we can do that. That integration is ongoing, so it'll become even easier in the future to use Globus Online with XSEDE. It's currently in the process of being fully integrated, but it's pretty easy even now to use it. I've used it, and we recently set up an access point for some researchers who are doing a genomics application. They wanted us to host some data for them, and we set up a Globus Online endpoint where researchers could pull down data from XSEDE. It was a special instance, but it's something you can do. If you ask for things, then we try to do our best to get it to you; we try to be flexible and help the community with what they need. So we put aside some storage for this and created a special server. We put up a Globus Online endpoint and an FTP endpoint, and researchers could get at the data in different ways. So we are trying to do things to host data also, and distribute it to communities that need it and can take advantage of it.

So, keeping with this data line: does XSEDE have any facilities for dealing with HIPAA or privacy-sensitive data?

No is the short answer. I think the official policy is that researchers should know the security restrictions of their data, and they're responsible for it. If they have restrictions, then they should make sure that what we offer in XSEDE is compliant with what they've agreed to do in terms of
keeping their data secure. So pretty much the onus is on them. Now, we certainly have security mechanisms in place in XSEDE in terms of just not letting bad guys get at data, cybersecurity and all that; there's a lot of effort in that. But as far as regulations like HIPAA, it's up to the researcher to make sure that they're allowed to put their data in different places on XSEDE.

So let me use that to spin off in a slightly different direction. Say I'm a random researcher at some non-XSEDE-affiliated university. How do I get involved in XSEDE? I've got some science I want to run and I don't have resources to do it. How do I get an allocation?

So the basic mechanism is the portal, which is kind of a one-stop shopping spot for everything you do in XSEDE. You get an account on the portal, and anyone can do that. It's just like getting an account on any online service: you create a username, you create a password, you verify that you're a human, and then you've got an XSEDE portal username. Then within that portal you can go to the allocation site. In the portal you can look at the different resources that are available. The portal also has a place to look at what software is currently available, although we can pretty much install anything you want as long as you have a license or it's free. Then you can decide what resources you need. To get started is really easy: you submit your information, you pick some resources you want to use or try out, you upload a CV, and you write a paragraph saying why you want to try these resources. That's basically a rubber-stamp kind of process.
We just try to direct people to the right resources when we get those kinds of requests.

And so do I have to pay for these kinds of things? Do I need to give a credit card number, or is this all researchy-credit kinds of things? I mean, if you've never even heard of me, I'm from the University of Nowhere, right? How do you authenticate and regulate who gets on these systems?

So for academic nonprofit researchers, there's never any money that changes hands. It's all free of charge. It's a service provided through the National Science Foundation. You do have to have an investigator or staff member at an academic nonprofit university or institution in the United States to be the PI, but if you're a foreign collaborator of that person, you can get an account. So that's one thing: the PI does have to be at a US institution. But there are no payments. The way it's controlled is through peer review, and then the centers have a say in how much they can actually allocate. So the constraints of the system and peer review are the way that we decide how much we allocate to each person. But the startup that I just described is pretty much available to everyone. We'll pretty much give anyone a startup to try things out, and you can get up to 200,000 core hours through a startup. So it's actually a not insubstantial amount of time you can get through that startup. Then you can try things out, and based on what you do there, you can submit a research proposal, which would be peer reviewed and allocated based on input from reviewers and from the different centers.

Cool. All right, so, again going off in a different direction, because my mind just bounces around, I apologize. You mentioned a couple of things throughout the questions here about some upcoming initiatives and upcoming
hardware and things like that. Give us one more thing that's coming in the future that you guys are actively working on now. One of the themes that I've heard through what you've been saying, and what's fascinating about this project, is that you're not pretending to have all the answers right now, and you're actively trying to figure out what is the best way for people to access and use these things. So give us one more initiative that you guys are working on.

Yeah, thanks for coming back to that. When I talk about new things coming on, you're absolutely right that we're looking at federating with other providers of different things, like resources or expertise, and being the fabric for tying these things together. Something that is right in the works now is, for example, making the Open Science Grid a service provider on XSEDE. I'm not familiar with the intimate details of where that is in the process, but eventually, as I understand it, OSG will be a service provider, and people who get an allocation on XSEDE can request an allocation on the Open Science Grid, which, for those who don't know, is more of a high-throughput computing solution. They do that very well, with lots of serial or very small parallel jobs, for people who just need to run a bunch of those. So OSG is going to integrate as a service provider in XSEDE, and you can look forward to other types of partnerships like that, with other service providers coming online in XSEDE.

Okay, Phil.
Thanks a lot. It's xsede.org, and all the information is on there about where to get started and what to find out about.

Yeah, thanks for having me on. If you have questions about these things, we're there to help. If you're just getting started, emailing help@xsede.org is a great place to start if you just want to throw out a random question and have a real person come and talk to you about how to get started.

Okay, thanks again. You can find us online at rce-cast.com. Again, I'm Brock Palen. You can find me on Twitter at brockpalen. And Jeff, what's your contact?

I am Jeff Squyres on Twitter, and I've got the blog, which is the best way to find me, off rce-cast.com. So Phil, thanks again for your time. Appreciate it.

All right. Thanks, Jeff and Brock.
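
[Editor's sketch: the 200,000 core-hour startup figure mentioned in the interview can be put in perspective with a little arithmetic. The snippet below assumes the simplest accounting, where one core used for one hour costs one core hour; individual sites may charge differently (for example, by whole node), so treat the numbers as illustrative. The function names and the example job size are hypothetical.]

```python
STARTUP_BUDGET_CORE_HOURS = 200_000  # upper bound quoted in the interview


def core_hours(cores, wall_hours):
    """Cost of one job under the assumed 1 core x 1 hour = 1 core hour rule."""
    return cores * wall_hours


def runs_within_budget(budget_core_hours, cores, wall_hours):
    """How many identical jobs fit completely inside the budget."""
    cost = core_hours(cores, wall_hours)
    if cost <= 0:
        raise ValueError("cores and wall_hours must both be positive")
    return budget_core_hours // cost


if __name__ == "__main__":
    # Hypothetical job: 1,024 cores for 12 hours of wall-clock time.
    cost = core_hours(1024, 12)
    runs = runs_within_budget(STARTUP_BUDGET_CORE_HOURS, 1024, 12)
    print(f"One run costs {cost} core hours; {runs} such runs fit in a startup.")
```

Under these assumptions, a 1,024-core, 12-hour job costs 12,288 core hours, so a full startup allocation covers 16 such runs, which is enough for the kind of trial scaling work a startup is meant for.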