Welcome to another edition of RCE. Again, this is Brock Palen. You can find us online at rce-cast.com. You can find an RSS feed, a link to subscribe in iTunes, and all the old shows on there. As I think we mentioned for some of the podcatchers on the show a couple back, all of the old shows, all 70-plus of them, are available online there. Also, I have here Jeff Squyres from Cisco Systems, one of the authors of Open MPI. Jeff?

Hey, Brock. Sounds like we're getting close to 100. We're only 20-some away, right?

Yeah, we're getting there. Amazing that we've been doing this that long. I didn't even realize that. Well, we have lots of good stuff to talk about in the future, but today we have a guest from Purdue. We actually have two Purdue people in a row on here. We have Gerhard Klimeck from Purdue, and he's going to talk to us about HUBzero and nanoHUB. So Gerhard, why don't you take a moment to introduce yourself?

Okay, so you heard my name, Gerhard Klimeck. I'm a professor of electrical engineering at Purdue. I'm also director of the Network for Computational Nanotechnology, which created nanoHUB and operates nanoHUB as a national infrastructure. We also spun out HUBzero to enable other folks to spin out their own hubs and disseminate simulation technology to other audiences.

So HUBzero is like the base package, and nanoHUB is... okay, so what is HUBzero?

So HUBzero, you can think of it as a portal infrastructure, if you like the word portal, or a science gateway infrastructure, if you like it that way. The goal is to enable researchers, or other folks that might have simulation tools, to disseminate these tools to a much broader audience, sort of like cloud computing, where the end users really don't have to install anything. It's really like a cloud computing environment that's end-to-end from a user's perspective, and it's powered by research tools that come fresh out of research.
So it's geared towards more rapidly disseminating knowledge and access to simulation tools.

So what's the relationship then between HUBzero and nanoHUB? Because I think you said in the beginning that nanoHUB came first, right?

Right. The relationship is such that as we developed nanoHUB and its infrastructure, we realized this is really powerful and can help many other fields of science. We sort of spun out the software stack, the middleware, that is the framework that holds simulation tools, the collaboration environments, and the user support aspects. But it's basically an empty hub. So HUBzero is the infrastructure, which is open source. You can download it from hubzero.org and install it yourself, and you can spin out your own hub, and then you have to fill it with content. Over the years we created an infrastructure group at Purdue that's now focused on HUBzero development and deployment. nanoHUB is kind of like the granddaddy of it, and in a business sense, it's the biggest customer or partner of the HUBzero development team.

So, just curious: what's the software license of HUBzero?

Boy. It is truly open source. It basically embeds the LAMP web stack, and I think it is LGPL or it's BSD. I don't remember which one of the two it is, but it's of that flavor: take it, use it, and you can put in your own material, and you don't have to commit the changes or make your own source publicly available. So it's of that open source flavor. I'd have to look it up, I'm sorry.

So there are lots of different portal systems out there, and a lot of places get by with things like MediaWiki installs for sharing information back and forth. What are some of the unique things about HUBzero, and maybe some of the things you've done on nanoHUB
that you really think are, you know, unique for our industry?

So what I would say is that most science gateways or portals are a little bit like what you get at your bank, right? You log in on a remote portal, you fill out some little fields, you push "run" or "get me the query," and you get some data back. In general it's more or less data in a raw form that's not very much graphically represented. Typically, if it's graphics, it's GIF images or something. Either way, it would be hard to truly interact with the data that comes out of this query or out of that simulation.

What's really unique is that we basically run full-fledged simulation tools or tool sets, with real graphical user interfaces that were developed in a Unix-type framework, and we pipe that into the user's browser via VNC-type technology. So they're truly interactive tools. They run as if they were running on your local machine, but they're remote. The user doesn't have to install anything and doesn't have to put into place the connectivity to grids, etc. So it's the interactivity that I think is unique.

You said something very interesting there, though. When I was doing a little research before this and playing with some of the simulations on nanoHUB, I just kind of assumed they were JavaScript-y kinds of things. But you said they're more like VNC kinds of things. So are they running on your back-end servers, and you're doing a VNC-like view in my browser?

That's exactly it.
So basically you have a VNC view into a virtual Unix machine that runs an X11-type environment, and we can host any X11, Linux-based tool in this environment. The front end might run in this virtual machine. The computation, if it's lightweight enough, might also run in that virtual machine, or it might get dispatched to some bigger machine, depending on the required resources. So it looks like a thin client, or like a Java applet in a sense, and the only thing that's Java about it right now is the VNC client that runs on your machine. The rest really runs on the back end, and possibly on an even further remote back end on a parallel computer, if some codes require some 200 cores for three to four hours. Some of the simulation tools run in seconds. I think that's the unique piece, and the origin of these tools is really the unique piece.

Interesting. Can you give us some specific examples of simulations and tools that people can use?

Sure. One of the earlier ones was a tool called Nanowire. Very creative name, right? You can imagine that it models a nanowire, or electron flow through very small wires. That was PhD thesis work that, in typical academic fashion, would of course be getting very dusty, virtually, on some DVD backup system or some tape backup system, and nobody would be able to use it. We have developed a technology called Rappture that allows us to put user interfaces on top of these tools really rapidly, within a few weeks, and then deploy this research code to a broader community. So there are two novel aspects to that. Number one, these are codes that normally would be condemned to nothingness, and we put a second life on them. Number two, the user interfaces are rather friendly.
You don't have to read a 200-page manual before you can operate this tool. So that would be one tool.

There's another tool, maybe. We have a donation from the former Bell Labs, where we can host their device simulation tool, the one they used to design their semiconductor chips. You can run the tool raw, meaning with a gobbledygook input language that's hard to use, or you can have derived versions that have very easy-to-use interfaces. These tools can then be used in classes that teach maybe p-n junctions or MOSFET transistors and things of that nature. So suddenly these tools that were condemned to expert use are rapidly being used in classrooms, and by experimental researchers that normally wouldn't touch them.

Now a follow-up question. One of the things you said earlier is that some old dusty academic code might have been condemned to death, but now you're giving it a new life inside of a virtual machine. Does that mean you have a couple of different types of virtual machines, perhaps one with older operating systems that were required for some of these codes?

We actually don't do that right now. Pretty much all our virtual machines, and all the codes we have, are reasonably portable. We haven't found one that will not port to the next version up. We had one tool where we played around with that for a while, but these older virtual machines have security issues too, so you have to be careful there as well, right? So as of now, I think it's pretty much a homogeneous virtual machine back end, but in principle it could be managed tool by tool. So that is conceptually there, but in practice I don't think we do it. But some of them are Windows, some of them are Linux, so in that sense we have at least two flavors.

So how hard is it to set this thing up?
You're talking about virtual machines, and making user interfaces for tools, and VNC viewers, and having the whole LAMP stack up. What's the process?

So there are two aspects to that, right? Let me take the second step first, because that's the more typical one. Let's assume you had a hub up and running; somebody set it up for you. We tend to create user interfaces and tools in teams of a graduate student, an undergraduate student, and a faculty member. For summer undergraduate students, we make those projects about a month and a half or so. They learn this Rappture toolkit and they learn the science of the tool, and by the end of the summer they actually have a tool up and running that after a while can literally be used by thousands of people. That assumes knowledge of Rappture and a little bit of Unix knowledge, and off we go. It's basically a Unix machine running in your browser, and that's how they develop.

Now, how to set up that whole framework? Out of the box it is not necessarily simple. We distribute HUBzero as an overall virtual machine, so you can install this whole thing as a virtual machine. But to install it for real, as a real infrastructure, you probably need at least a 1U unit that has some 8 or 16 cores and some hefty memory. You have to know about virtual machines, etc.
And there's quite a bit of documentation that goes along with HUBzero, but still, I think you have to have some serious Unix and server knowledge to really set it up. So it's not necessarily a thing you just do in an afternoon for the heck of it. You kind of have to be dedicated to want to do that. We have a couple of institutions that actually run their own hubs now. The alternative, another model that we have done here at Purdue, is that this HUBzero development group hosts hubs for Purdue entities and also for external entities under a sort of subcontract. That has happened. So those are the two models we have: in a sense, we host it, or you can download the source and go at it. Those are the two modes that are actively being pursued.

Hosting like that is really quite neat. And the turnaround time, to have a student understand and write an entire user interface for a scientific application in only a month, is really impressive, that you can have that going that fast.

Yeah, I actually really like it. So it's pretty cool.
I mean, there are 235 tools or so on nanoHUB. Again, we allow anybody to have any interface, like Qt or even MATLAB interfaces or whatever. I think about 220 or so use Rappture. We don't indoctrinate people that they must do it, but it is pretty easy.

But what about power users? Graphical user interfaces are what we've been talking about. What if I need to do a gigantic sweep, the kind of thing a standard HPC user of the old cloth, you could say, would do?

All right. What we offer, typically to the developers and not to the end users, is an application we call Workspace. That workspace is an instance of a Linux workstation that again runs in your browser, and it has the whole software stack of nanoHUB or HUBzero installed. So it also has the whole grid computing capability in the back end installed. Let's say you have a big code like LAMMPS, a molecular dynamics code, which we also have on nanoHUB, and I believe we also have some instances of LAMMPS installed on high-end computing systems. You could in principle compose your input deck for LAMMPS in this workspace and then use our submit infrastructure to submit your input deck with LAMMPS to some compute resources. So it's a front end to grid computing, even for the geek who knows what all these keywords are that I just mentioned.

Could you give us a little more detail on Rappture? How would I go about making my own tool? Do I need to link in certain things, or do I just write any old GUI and it fits in afterwards? Give us some examples of that.

So all this info can actually be found on rappture.org.
That's with two P's: R-A-P-P-T-U-R-E dot org. It relinks to some page on nanoHUB that explains it in some detail, and there are videos and podcasts on that as well. But the gist of it is that you describe your input and output through an XML descriptor, and from the input description in this XML it renders a GUI. We also have a little bit of a GUI builder that dumps out the framework of the XML file for you, so you don't have to do too many XML push-ups.

Now there are two ways to interface that GUI to your code. Probably the most prevalent one is that people are in love with their input language and their output files as much as a mother is in love with her child. What they typically do is write small scripted translators, in Python or MATLAB or Perl or Tcl, to this XML-based Rappture description. There are APIs in Rappture for any of those scripting languages, and also C, C++, and Fortran, in which you can write those little translators. So you translate the Rappture input into your, say, gobbledygook input deck, or maybe an extremely well-structured input deck for that matter. Your script then executes your code, you get some output files, and then the script transforms these output files into the XML-based data object that Rappture likes to get. Then it's all framed, right? Basically you have a workflow; your script is a workflow. It does the GUI generation, the translation, the execution of a code or of multiple codes if there's a sequence, and then the processing of the output. That's the typical way. So it's sort of like a wrapper around the science code that you may not want to touch.

But there are some scientists that also said, well, I conjured up this input deck and I really don't like it all that much. I like Rappture as an I/O handler and a database handler, so I throw out my own I/O, and I just adopt, in my C, Fortran, or MATLAB code,
I adopt the Rappture I/O as a library, and then you don't have to write all these translation scripts anymore. So there are two ways to go about it.

So I want to shift a little bit here. We've been talking a lot about submitting jobs to back-end infrastructure using these nice, easy-to-build GUI interfaces, which is kind of a unique thing to the nanoHUB and HUBzero setup. What are some of the other things you can do on this thing? I mean, does it have all the traditional features of what we'd call a portal: places for storing documentation, collaboration across units, departments, colleges?

Yeah. In the web infrastructure we have a lot of these items that you just listed. For the tools most specifically, we have question-and-answer forums, and we have wish lists where people say, "This is a cool tool, but I wish it would do this other thing, the next big thing." So there's an exchange of ideas, and developers can jump on it, or people can join the development team. Then there is a group mechanism where some people aggregate information into groups, and these groups could be open, they could be by invitation only, or they could be completely hidden so they can't be discovered. These groups and these tool pages are basically wiki-type pages. They follow a wiki format, so they can be easily edited. Some professors now have their own group pages, even their home pages, on nanoHUB, because there's a lot of exposure to some 200,000 users that roll by on nanoHUB each year. So they see that as an opportunity, and as an easy way to edit their own group pages on nanoHUB. So these collaborative tools are there. There's no chat, for example.
There's no video chat, and in a sense I feel there's a whole lot more that can be done. But then we also play catch-up with other standard environments; Google, for example, has wonderful collaboration environments. But it's very clear that users need it and would like to have it in one place, so we strive to achieve some of that.

So how do people typically use the HUBzero infrastructure? I mean, how do they organize their information? Do you typically see a lot of wiki use, or documents that are uploaded, or simulations? I'm not even phrasing the question well because it's so open-ended. How do people typically use these hubs to actually collaborate? What information do they share?

Right. So you're asking about the people that share, which is in general quite a small number of people compared to the ones that use. We have some 3,000 content items or more on nanoHUB, and some 230 tools. The tools are distinct publications in a sense. They have digital object identifiers, they have authors, and they stem from a collaboration team. By far the larger number of content items are seminars and lectures, or lectures in a classroom. We have, I think, by now 55 or 60 complete classes in nanotechnology that are videotaped and processed, with PowerPoint slides, voiced over, etc. That constitutes content. There are some 800 authors now in total on nanoHUB, and 3,000-plus content items. So that's a typical way of using it: it's a publication venue, if you wish to look at it that way.

Then some teachers put together their course wikis, where they basically say, "In the first week we do these exercises, and please run this tool," etc. So they almost build a syllabus in nanoHUB. We did some of that for them, where we created something like tool-powered curricula.
So it's a one-stop shop for faculty members and students to come to for a particular class, say, for teaching semiconductor devices or for teaching quantum mechanics for engineers. These are sort of aggregates, and then these faculty members and students can buy into the aggregate, so to speak. Of course, it's free, so they don't really buy; it's a figurative buy. I would say the biggest gain we stand to see right now is more people contributing more of the day-to-day usage components, like the files they might distribute to their students with their course layout, or the homework assignments, etc. That we don't see a whole lot. We see it a little bit.

This use of nanoHUB or HUBzero as a tool for education: was this something that was expected from day one, when you first built the collaboration environment?

Yeah. I mean, it stemmed out of research. Anecdotally, it was geared towards sharing a research tool with another research group, and enabling them to use the tool without having to reinstall it, or as a matter of fact to rewrite it on a PC, because most experimentalists were on PCs and didn't run Unix-type tools. So it stemmed out of research, but really early on people realized, "Oh, I have small tools or bigger tools, or even circuit simulation tools, that my students wouldn't have to install, and my system admin doesn't have to install. So let's use them in the classroom. I don't have to install anything; I can just go into the classroom and launch it and demo it in the class and show people how to access it."
So the use in informal education has been very evident early on, and I have kind of cool data to show now, where I can really show that use in research and use in the classroom overlap. You can't necessarily say a priori, "This is going to be a research tool," or "This is only a classroom tool." They overlap, and I can show quantitatively how these tools are used in the classroom and are also being cited in the scientific literature. So the separation of classroom versus research is really not a good one.

So this sounds like a tremendous service, particularly since it's free. If I am a given professor out there, I can just latch on to all this material that's there. How do you do this for free? The people and the equipment and the bandwidth alone have got to be relatively expensive, or at least they're not free. How do you provide this for free?

Right. So the cute answer to that is: it's free for the users. It's not free to me as the director, or to the organization that hosts it, or actually to the funding agency that funds it. The agency that funds it is the National Science Foundation, which provides very good grants to Purdue and its affiliated universities that operate nanoHUB and fill it with content. And it's not free to me because I have to continually prove that this infrastructure that we're building is actually useful and is being used by people. So we do a lot of studies to show that it's impactful and how it's impacting research and education, etc. That's why I know these assessment numbers. We are now in year 10 of a 10-year grant. Last October there was a call for proposals to reconfigure the NCN and nanoHUB.
We handed in a big proposal in January. This whole thing is under review. There will be new content providers as well, and that is also under review by NSF. I'm quite hopeful that nanoHUB, as some 200,000 users use it every year, will continue in some shape or form for another five plus five, so ten, years, based on some organization that will win this grant, and that there will be continuity. So I'm quite hopeful that that will be the case.

So how do you gate things? You know, like the amount of computation consumed on the back end for these different workloads. I mean, you mentioned LAMMPS. LAMMPS can consume a lot of CPU depending on what kind of simulation someone is trying to do. And then how do you gate adding new tools to the public toolbox? You have over 300 now. How do you keep that from exploding into 13 copies of the same thing?

Well, there are a couple of questions in there, right? One is a matter of how you can possibly think of providing enough resources in terms of compute cycles. There are two aspects to that: the front end, running the visual front ends and the GPUs that do some of the visualization, and then the back end that provides the cycles. The high-end compute cycles we distribute are part of another NSF allocation on TeraGrid, or XSEDE as it's called now, so we can dispatch jobs into that queuing system. We haven't really pushed the very high-end jobs very much, mostly because we want to make sure this whole grid infrastructure is actually working, and working reliably. There were some technical difficulties to overcome, and we feel more and more comfortable that we have overcome them or can monitor them. So far we haven't run out of cycles. So that's one aspect, cycle provisioning.

The second one you mentioned is sort of quality control slash duplication control. In a sense, what we offer is, you can think of it as a tool publication venue, right?
Where people feel that their tool should be on nanoHUB, or they want to enable it. We now have people installing tools on nanoHUB who are not directly affiliated with us, meaning they have other NSF programs and they see benefit in disseminating their tools. And these tools that are created outside of the core unit of NCN and nanoHUB are now also being used by yet a fourth party, say, for research. So we're starting to be like a dissemination venue for other folks. We're an infrastructure in terms of both the suppliers and the users, which is kind of a cool thing.

In that, we're actually kind of unregulated. The things that we demand from these tool developers are: all right, give us a paper, a scientifically peer-reviewed paper, if it's a completely new tool, or give us a decent reference for what the model is that is in this tool; please try to provide a first-time user guide to the tool; and stand by your tool if there are questions on the science components. Those are the requirements we have towards tool developers. And yes, we actually have some tools that sound similar. They are running similar models. But you know, there are different brands of cars, right? They all have four wheels and a steering wheel and a handbrake and all that stuff, but some people like one over the other. I think that's a fair thing and a good thing, and then you can do performance metrics of one tool versus the next. People have their personal preferences, and they have their belief system of what the better model is. So we open that up for people to utilize.

So how many hubs does Purdue itself host? You mentioned that you guys do a lot of hosting, internally and externally. Do you have any numbers on how many you host?
I mean, the exact numbers are, I think, on hubzero.org. But I think we have a total of 40 hubs, and I think 12 or so are running outside. I'd have to go on the web and look right now, but those are the ballpark numbers. So the majority we host, for Purdue entities or on behalf of other entities.

Now, is nanoHUB the biggest of those, would you say? I mean, that's kind of a subjective question.

Well, it's certainly the biggest one in terms of number of users and, call it, market penetration and global use. There are other hubs, like GlobalHUB, which is a collaborative engineering hub that has large numbers of users, in the tens of thousands. There's another very large hub project, which is called NEEShub. It's the national earthquake engineering infrastructure, which is also NSF funded. It's a very large project, and they use hubs to share data, both data of physical experiments and simulation data, and also the simulation tools that model these experiments. So that's another very large project. Those are the two major, very large projects, and then there are lots of smaller projects that utilize hubs.

Do you have any kind of statistics on how many people outside of Purdue are hosting their own hubs?

Yes. My gut feeling, what I have in the back of my head, is 12 or so: 12 fully operational outside hubs that are not hosted at Purdue.

So there are all these hubs out there and other people using them. What are some of the strange uses you've seen of HUBzero?

The strange uses?

Yeah, something that you never originally envisioned it being used for.

Oh, I see. Hmm. What I didn't expect to see... I mean, I expected it to be used for research and for education. What really surprises me is
how rapidly these former research tools migrate into the classroom. We analyze usage patterns of these tools and do user-to-user correlations, and that's how we know. When there's a cohort of people all showing up on Tuesday, all using the same tool, and roughly coming from the same IP pool, we know, aha, these guys are doing a homework assignment, right, or a project assignment. That's unique, say, to a certain tool, and it might happen in many classes. But we can measure the time between the tool's publication, when it showed up on nanoHUB, and the time it shows up for the first time in a classroom, which is sort of an adoption rate in a sense, right, or a first-time-use analysis.

In a sense you can think of these tools as new textbooks, or parts of textbooks, and the typical textbook takes some four years or so to write. We have tools that show up within weeks in the classroom once they're published. Literally, a couple of tools show up within a week, some of them in less than a month, and on average, sorry, the median time is less than six months. You might say, well, these week adopters, where it takes a week to adopt, that is probably the faculty member himself creating a tool for his class. That's true, but we can also prove now that there's outside adoption, and that starts within about a month: people that didn't create a tool adopt the tool for their own class. That is so fast compared to four years of writing a textbook. That was quite shocking to me. I would have thought it's a year or something, but it's literally weeks to a few months, and that's kind of cool.
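As a rough illustration of the usage analysis described above, here is a minimal sketch. It is not nanoHUB's actual analytics code: the session rows, the tool name `tool_a`, the "first two octets" notion of an IP pool, and the three-user threshold are all assumptions made up for the example. It groups sessions by tool, day, and IP pool to spot a likely class, then measures the lag between publication and first classroom use.

```python
from collections import defaultdict
from datetime import date
from statistics import median

# Hypothetical session log rows: (user, tool, day, IP address).
SESSIONS = [
    ("u1", "tool_a", date(2012, 3, 6), "128.10.4.17"),
    ("u2", "tool_a", date(2012, 3, 6), "128.10.4.52"),
    ("u3", "tool_a", date(2012, 3, 6), "128.10.9.3"),
    ("u4", "tool_a", date(2012, 3, 7), "18.7.22.1"),
]


def ip_pool(ip, octets=2):
    """Coarse 'same IP pool' key: the leading octets (an assumption)."""
    return ".".join(ip.split(".")[:octets])


def classroom_cohorts(sessions, min_users=3):
    """Return {(tool, day, pool): users} for groups that look like a class."""
    groups = defaultdict(set)
    for user, tool, day, ip in sessions:
        groups[(tool, day, ip_pool(ip))].add(user)
    return {key: users for key, users in groups.items()
            if len(users) >= min_users}


def first_classroom_use(sessions, tool, min_users=3):
    """Earliest day on which a classroom-like cohort used the tool."""
    days = [day for (t, day, _pool) in classroom_cohorts(sessions, min_users)
            if t == tool]
    return min(days) if days else None


def adoption_lag_days(published, first_use):
    """Days between tool publication and first classroom use."""
    return (first_use - published).days


# Hypothetical publication date for the example tool.
published = date(2012, 2, 20)
lag = adoption_lag_days(published, first_classroom_use(SESSIONS, "tool_a"))

# With lags for many tools, the median summarizes the adoption time.
hypothetical_lags = [lag, 40, 200]
median_lag = median(hypothetical_lags)
```

A real analysis would need a better notion of "IP pool" than address prefixes and some way to separate a class from, say, a research group, which the interview does not describe.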
That was really surprising to me. I think that would be the odd one, I would say.

So as a software developer myself, I'm always curious about people who are writing other software. I have two questions that I like to ask them, just for pure curiosity's sake. Number one, what language did you write most of HUBzero in? And number two, what type of version control system do you use for your code base, and why?

Okay. So the standard front end is this LAMPS business, right? Linux, Apache, MySQL, PHP; I forget what the S is. Basically the front end is PHP, in close adoption of the Joomla content management system, which we then extended quite a bit. So that's the web front end. Then the middleware that manages virtual machines and grants access and all that is primarily written in Python. And sorry, what was the other question?

The revision control system.

Oh, yes. So far we've been using SVN, and we're moving towards Git. Git seems to be more general for us, for multiple people working on projects with multiple permissions. That is what I remember Michael McLennan, our software architect and sort of the software genius of HUBzero, identified, but I'm not quite sure what the detailed differences between SVN and Git are.

So what are some of the things coming in the future for HUBzero? You mentioned you have a new proposal in. What kind of features do you want to see happen?

Hey, cool question. I think professors always like to talk about what they'll do in the future, right?
So yes, there are some cool things we envision doing. We now have over 850 citations in the literature that cite nanoHUB usage, many of which show nanoHUB simulations that the authors have run, maybe compared to experiments. Wouldn't it be cool if you actually read this in an interactive journal, say on your iPad, and you read this paper and click on the graph, and it goes back to the tool on nanoHUB that ran this? That would make scientific results really duplicatable in a sense. And then maybe you can compare to your own simulations. You can actually get the numerical data out; you don't have to scan the graph and try to guess the data off of it. So we want to embrace the data storage of these simulation results and engage publishers and scientific communities like the Institute of Physics, IEEE, and Springer. We actually have agreements with them that we will have special issues where we will push these interactive journals and host the results, and then also start to manage some of the experimental data that is out there, to compare it to simulation results.

Everybody talks about big data, but there's lots of small data to be handled. We think we have a way to create user interfaces for this data browsing quite rapidly, and then to allow people to foster this notion that a good data set is like a publication, right? It may be just as valuable, or even more valuable, than the paper that described it, so other people can utilize it. So that's one of the new aspects of what we're up to in nanoHUB and HUBzero: to make interactive journals and to make scientific results reproducible.

So what's some of the contact information for HUBzero, where people can find it and download it, and for nanoHUB also?

That's pretty easy. hubzero.org, so H-U-B-Z-E-R-O dot org.
That's the framework in a sense, and you'll find all kinds of information on HUBzero there, including how many other hubs are running. There are links to all the other hubs. And nanoHUB is N-A-N-O and then hub, H-U-B, dot org, and you can find contact information on there. My name is Gerhard Klimeck and my email is gekco, G-E-K-C-O, at purdue.edu, so you can send me email if you have questions as well. That would be very welcome.

Okay, thank you again, Dr. Klimeck.

You're very welcome.

Thanks again. This was great.

All right, wonderful.
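For readers who want to picture the wrapper workflow described in the interview (inputs described in XML, a small script that translates them into the science code's own input deck, runs the code, and translates the output back into XML), here is a minimal sketch using only the Python standard library. It does not use the Rappture API or the real Rappture XML schema; the element names, the `RADIUS`/`LENGTH` deck format, and the stand-in "legacy code" are all hypothetical.

```python
import math
import xml.etree.ElementTree as ET

# Hypothetical driver document: NOT the real Rappture schema, only an
# XML description of inputs in the same spirit.
DRIVER_XML = """
<run>
  <input>
    <number id="radius"><current>5.0</current></number>
    <number id="length"><current>20.0</current></number>
  </input>
  <output/>
</run>
"""


def read_inputs(root):
    """Collect {id: value} from the XML input section."""
    return {n.get("id"): float(n.findtext("current"))
            for n in root.find("input").findall("number")}


def to_input_deck(params):
    """Translate XML inputs into the legacy code's own input language."""
    return "\n".join(f"{name.upper()} = {value}"
                     for name, value in sorted(params.items()))


def run_legacy_code(deck):
    """Stand-in for executing the untouched science code on its deck."""
    values = dict(line.split(" = ") for line in deck.splitlines())
    volume = math.pi * float(values["RADIUS"]) ** 2 * float(values["LENGTH"])
    return f"VOLUME {volume:.3f}\n"  # pretend contents of an output file


def write_outputs(root, raw_output):
    """Translate the code's output text back into the XML document."""
    key, value = raw_output.split()
    node = ET.SubElement(root.find("output"), "number", id=key.lower())
    ET.SubElement(node, "current").text = value
    return root


def wrapper(xml_text):
    """The whole workflow: read XML, build deck, run code, store results."""
    root = ET.fromstring(xml_text)
    deck = to_input_deck(read_inputs(root))
    return write_outputs(root, run_legacy_code(deck))


result = wrapper(DRIVER_XML)
print(ET.tostring(result, encoding="unicode"))
```

The second approach mentioned in the interview, calling the I/O library directly from the science code, would remove the two translation functions; the documentation linked from rappture.org describes the actual bindings.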