of RCE. This is Brock Palen, and I have with me Jeff Squyres from Cisco Systems, one of the devs of Open MPI. Jeff, thanks again. Hey, Brock. You may notice our audio is a little different this time, Jeff, because we're doing this on a phone instead of the usual setup. Yeah, this is a great experiment here. We're testing various different recording services for reasons that are completely uninteresting, and Brock has the fantabulous fortune to be on a cell phone, so he sounds a little bit like he's on the other end of a tin can and string. Yeah, yeah. And as usual, we have our website, rce-cast.com. On there you can find our Twitter links, you can see who's going to be on upcoming shows, and you can send us questions about things you'd like to hear about. There's also a nomination form, so if there's anything you'd like to hear on the show, please submit it there. Jeff, you also have a blog we link to off of that page. I do, and it's almost like we have to mention this every show, isn't it? I have a blog on there. I write random musings about MPI and various HPC topics and things like that. And we also get, via the social networks, questions and comments for upcoming shows, of which we've got a couple for our show today.

And our show today is, Brock? It is LAMMPS, an MD code from Sandia, and so we have one of the guys who works on that. Actually, I've been lurking on the LAMMPS list for quite a while for stuff related to my job, and this guy's ability to answer questions quickly and concisely is amazing. I don't know how he even manages to work on the code, ever. But we have Steve Plimpton; he's actually at Sandia. So Steve, welcome to the show, and please tell us a little bit about yourself.

Great, thanks. Happy to be here. Yeah, I've been at Sandia about 20 years, I guess. Hard to believe; it's gone by fast. Sandia is a Department of Energy laboratory in New Mexico, and I came to a computational modeling group here with a physics background, doing simulations of solids, and got involved in a parallel computing group. That was about the time, 20 years ago, that parallel computing was just starting to take off for scientific simulations. And so over the years, I've just been involved in a variety of different algorithm and code development projects, most of them around different kinds of particle simulations, and the one that has consumed most of my time and energy has been this LAMMPS project you mentioned. That's an open source code that we've distributed for the last five years or so that's found some use and utility for different people and groups, and so we kind of support it and interact with those people in a collaborative sense.

So we're here to talk specifically about LAMMPS, but we actually had a question from a pretty heavy user of LAMMPS. Carolyn asked, how did you get started in high performance computing? You said that you kind of ended up at Sandia, but how did you end up explicitly on LAMMPS?

Yeah, so the group here at Sandia initially was just looking to explore parallel computing as a way to do a new kind of high performance computing for a variety of science simulations, and the background I had in doing some molecular dynamics in graduate school was kind of my entry into looking at parallel algorithms for doing MD. That led initially to some collaborative work with some companies in the mid-90s that were supportive of that effort. That's where the initial version of LAMMPS was created, in Fortran.
I guess that's been 15 years ago, and so over time, that initially was a proprietary project with those companies to develop the initial version, and then we moved it to open source and rewrote it in C++ about five or six years ago, and have kind of just gone on from there.

Okay, so LAMMPS is an MD code, but MD can mean a lot of things. Can you describe exactly what LAMMPS's focus is?

Sure. So more specifically, it's a classical MD code, which would differentiate it from, say, quantum-based codes or other kinds of molecular dynamics. Classical means that it just uses simple empirical formulas, so it's kind of a coarser model, say, than a quantum code. LAMMPS, more generally, you can think of as just kind of a Newton solver or a time integrator for a collection of particles, and the particles can be at an atomic scale. They can be actual atoms or more coarse-grained pieces of molecules or molecules themselves, so that's kind of the traditional classical MD code, but more generally they could be mesoscopic particles or even macroscopic particles, little granular particles, or even pieces of a continuum model. So there are interaction potentials and boundary conditions and options in LAMMPS to simulate particles at a lot of different length and time scales.

One thing, for those who didn't catch it: we've been saying MD quite a bit here. MD refers to molecular dynamics, right, Steve? Yes. Okay, just to make sure we're all on the same playing field here. But you mentioned that a couple of years ago you rewrote LAMMPS in C++. What prompted that? Why did you guys do that?

Well, the initial Fortran version we'd probably put eight to ten years of effort into, and what we found is that as you use your molecular dynamics code, your MD code, on a new project, you typically have a new model: maybe you need new properties with your atoms, new boundary conditions, new force fields. And as we reworked that loop many times, the old code got kind of crufty and difficult to work with, and we realized we needed a more general sort of code framework that gave you the flexibility to add things easily. So that was kind of the motivation for rewriting the code, and we thought C++ had matured enough on high-end parallel computing platforms that the compilers were good, and the performance, if you wrote more C-like C++, was basically equivalent to Fortran, and so we thought it was a good time to try to make a more flexible, general code.

The flexibility of the input format for LAMMPS is one of the many reasons a group I know chose to use the code. What's been the pace at which new methods and new features have been appearing in LAMMPS?
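To put the "Newton solver or time integrator" description above in concrete terms, here is what one velocity-Verlet time step looks like; this is a generic textbook sketch, not code from LAMMPS:

```cpp
// One velocity-Verlet time step for a set of particles: half-kick, drift,
// recompute forces from the empirical force field, second half-kick.
// Generic illustration only -- not LAMMPS code.
#include <vector>

struct Particle { double x[3], v[3], f[3], mass; };

void verlet_step(std::vector<Particle>& atoms, double dt,
                 void (*compute_forces)(std::vector<Particle>&)) {
    for (auto& a : atoms)                    // v(t+dt/2), then x(t+dt)
        for (int d = 0; d < 3; ++d) {
            a.v[d] += 0.5 * dt * a.f[d] / a.mass;
            a.x[d] += dt * a.v[d];
        }
    compute_forces(atoms);                   // f(t+dt) from the interaction potentials
    for (auto& a : atoms)                    // v(t+dt)
        for (int d = 0; d < 3; ++d)
            a.v[d] += 0.5 * dt * a.f[d] / a.mass;
}
```

Everything that distinguishes one particle model from another, whether atomistic, coarse-grained, granular, or continuum pieces, lives in the force computation; the integration loop itself stays the same.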
Okay, yeah, so when I was talking about flexibility, I wasn't focusing so much on the input options, but more on, I think, the latter part of your question: the ability to add new methods or new features to the code with relatively low overhead, and to do it in a way that doesn't conflict with the features that are already there. That was one of the design goals we had with the new C++ version, and in hindsight, I don't think we had any great vision necessarily, but giving that flexibility and allowing things to be added fairly easily was probably the best feature we put in the new version of the code, because it has enabled a lot of people, not just ourselves but others, who've wanted to either contribute code that ends up making it into the main release, or just modify it for their own purposes and use it internally. I know a lot of people do that. It has made for a relatively low barrier for people to do that, and so I think that is one thing people like about the code.

So how much of a community have you built up around LAMMPS? Random user contributions are kind of important. How does it work? Are you the core developers there at Sandia and you get random other contributions, or have you evolved and gotten some other core developers outside of Sandia? How does your community generally work?

I would say it's pretty informal and grassroots. Certainly the core group of developers that have been with the code from the beginning are here at Sandia, a small group of us. There are several active people who we've just met through mail lists and other email interactions who have contributed significant pieces to the code. The community around LAMMPS I think has mainly been built through the mail list and through people sending things in to us that we add. I wouldn't call it a truly open source project in the sense that there's a zillion people who check into the main repository and the code kind of grows randomly; I'm sort of a gatekeeper for that process in terms of checking things into the main version, to keep some consistency and make sure there aren't new bugs that get generated accidentally. But there has been a lot of stuff that has just come out of the blue that we've massaged a bit and added to the main code. And like I said, I think other people just kind of do that on their own and are able to keep up with the main branch with their own private things they've modified and added.

So extending it to the community outside of just specifically LAMMPS, there are a lot of MD codes out there. We've had GROMACS on the show, and HOOMD by Josh Anderson, which actually draws a lot of its influence from LAMMPS. What really distinguishes LAMMPS from these other codes, and what is your relationship with other MD projects out there?

That's a good question. So yeah, there certainly are a lot of MD codes. MD in general I think has been popular for high-performance computing because these kinds of particle simulations are well-suited and scalable for high-end parallel computers. A lot of the big, openly available open source MD codes that are out there have sort of a bio emphasis. They've been used for proteins and DNA, big bio kinds of simulations. So GROMACS, which you mentioned, NAMD, and some of the older codes that have a long legacy and a really wide user community, like CHARMM and AMBER, are in that category. So that's one way LAMMPS maybe is a little different from those.
I would say for bio problems those codes certainly have a lot more features and often are kind of tuned and optimized for those kinds of problems and do quite well. It's hard to compete with some of the feature sets those codes have. LAMMPS tries to be a little more general in terms of not being just bio-specific, but having a lot of force fields and interatomic potentials for material systems and some of these other coarse-grained and mesoscopic and continuum systems I mentioned. You mentioned HOOMD; that's a code, as you said, that Josh Anderson has been working on for a few years. I think it's an interesting code that's different from all the others I just listed in that it's kind of designed from the ground up for GPUs. He's made a lot of nice contributions and shown a lot of impressive results for getting really fast speed-ups on that specific hardware, and I think he's trying now to broaden the class of problems that HOOMD, his code, can work on. So it's kind of a new, interesting entry in this field of open source MD codes.

So do you actively collaborate with any of these other projects, or do you keep tabs on what techniques they're using? Is there any kind of cross-pollination between projects?

Yeah, I wouldn't say interaction or collaboration happens in a formal sense, but it certainly does informally. I mean, we read papers, and I'm sure that happens vice versa; the different groups write about the different things they're adding. We may hear of a method. Somebody on our mail list may say, we're able to do this in another code; this would be a nice feature to add to LAMMPS. And so if it's something that's relatively easy for us to do, we might pick up a feature that way. Somebody who's really active in our user community and mail list is a person from Temple University, Axel Kohlmeyer, and I think he's also active, for example, in the HOOMD community, so there's some cross-pollination that happens that way.

Yeah, I know. I've met Axel, and he's a really bright guy. Yeah, he works on not just MD codes; he works on other kinds of open source materials modeling codes as well, so he's a really great contributor. Yeah, he's in VMD and everything else, too. Yeah, he's in VMD also, that's right. And he's still willing to help people, too. He's a really nice guy.

So actually, I'm curious about moving into some of the performance stuff. It seems like every now and then it comes up on the LAMMPS list: running, say, on a four-core processor but actually only using two cores because of memory bandwidth issues and things like that. What are some of the things you look at when trying to get good single-core performance out of an MD application?

So the fundamental issue, I think, with MD is that you have this collection of particles that's moving around, and typically the interactions that you want to compute are for particles that are geometrically close to each other, so you need some efficient way to find which particles are close to each other. In the MD community that's often called a neighbor list that you would create, and then you use that neighbor list every time step to compute force interactions. But in a performance sense, you have this collection of particles that's moving around and reorganizing itself, and then you're computing essentially random interactions between pairs or triplets of particles, depending on the model.
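As a rough illustration of the neighbor-list idea he describes, here is a toy cell-list search: bin the particles into cells the size of the cutoff, then only test pairs in the same or adjacent cells. This sketches the general technique and is not LAMMPS's actual neighbor-list code:

```cpp
// Toy cell-list neighbor search (illustrative, not LAMMPS's implementation).
#include <cmath>
#include <unordered_map>
#include <utility>
#include <vector>

struct Vec3 { double x, y, z; };

// Pack three cell indices (assumed to fit in 21 bits each) into one map key.
static long long cell_key(long long cx, long long cy, long long cz) {
    const long long M = 0x1FFFFF;
    return ((cx & M) << 42) | ((cy & M) << 21) | (cz & M);
}

std::vector<std::pair<int,int>>
build_neighbor_list(const std::vector<Vec3>& p, double cutoff) {
    std::unordered_map<long long, std::vector<int>> cells;
    auto idx = [&](double r) { return (long long)std::floor(r / cutoff); };
    for (int i = 0; i < (int)p.size(); ++i)
        cells[cell_key(idx(p[i].x), idx(p[i].y), idx(p[i].z))].push_back(i);

    std::vector<std::pair<int,int>> pairs;
    const double cut2 = cutoff * cutoff;
    for (int i = 0; i < (int)p.size(); ++i) {
        long long cx = idx(p[i].x), cy = idx(p[i].y), cz = idx(p[i].z);
        for (long long dx = -1; dx <= 1; ++dx)       // only the 27 nearby cells
        for (long long dy = -1; dy <= 1; ++dy)
        for (long long dz = -1; dz <= 1; ++dz) {
            auto it = cells.find(cell_key(cx + dx, cy + dy, cz + dz));
            if (it == cells.end()) continue;
            for (int j : it->second) {
                if (j <= i) continue;                // count each pair once
                double ddx = p[i].x - p[j].x, ddy = p[i].y - p[j].y, ddz = p[i].z - p[j].z;
                if (ddx*ddx + ddy*ddy + ddz*ddz < cut2) pairs.emplace_back(i, j);
            }
        }
    }
    return pairs;   // reused every time step for forces, until atoms move too far
}
```

The list is rebuilt only occasionally, but note how the force loop that consumes it jumps around memory: the (i, j) pairs land on essentially random particles, which is exactly the cache and memory-bandwidth issue described next.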
And so that means you're kind of hitting random places in memory trying to pick up particle attributes, and doing that in a cache-efficient manner is the challenge. There are various schemes people use to try to make that efficient. LAMMPS, I wouldn't say, is necessarily very sophisticated in that context, but we try to do things like create these neighbor lists efficiently and use them efficiently so that you get good single-core performance. But I'll add one thing: the memory bandwidth issue that you mentioned is important, because you're making these kinds of random hits into memory, and even if you organize your list of particles and neighbor lists and such so that you have better cache performance, you are still often memory-bandwidth limited in how much data you can pull in to do these computations, especially for force fields that are fairly cheap, so you pay a cost for how fast you can get particle coordinates and forces in and out of memory.

Now, are there any other software tools or frameworks out there that help manage this kind of memory locality and overlapping cache misses and things like that, or do you rely mostly on compiler optimizations and your own code strategies for this kind of stuff?

In LAMMPS, it's more the latter. I mean, we basically try to write clean, simple C-style code that the compilers can do a good job with, and maybe have some higher-level algorithms that sort the particles and do things in a way that helps with cache performance a bit. I am familiar with one method, though I guess it's not so much single-node performance: the NAMD code has some higher-level things it does with a runtime system called Charm++ that tries to reorganize the computation across different processors in order to get better parallel scalability. So that's kind of a higher-level thing they do for performance optimization.

So LAMMPS, though, is actually able to run on multiple processors using MPI, and IBM was out at my day job one time trying to sell us a Blue Gene, and they actually held up LAMMPS as an example of getting super scaling because they were running one atom per core, which seemed totally crazy; I'd never heard anything like that. What kind of methods does LAMMPS use to get good MPI performance?

So I hadn't heard about this one-atom-per-processor number. Offhand, I would think that's probably not a limit in which LAMMPS does very well, so I'd want to know more about that. But for a parallel machine running MPI, what we do is take the simulation domain and chop it up into little blocks, one per processor, so it's what's termed a spatial decomposition of the simulation domain. And then to get good MPI performance, we just try to do things in a semi-synchronous fashion where you have big messages bundled up, going back and forth, a few big messages between processors a few times per time step, in order to get optimal bandwidth and low-latency performance under MPI.

So do you do parallelization with both MPI and threads, or solely with MPI?

Certainly initially in LAMMPS it's been all MPI. We targeted distributed-memory platforms from the very beginning. There's been some work by a few people more recently, especially for multi-core boxes and machines where you may not want to run MPI down to the individual core or process level, to add some options for a more hybrid model.
So we would still always do MPI at the high level, say between nodes, but allow some kind of threading or OpenMP-style constructs in some of the computational kernels at a low level. And I would say to this point the results of that have been kind of mixed. It's still hard to beat all-MPI performance with threading, at least on machines with a modest number of cores per node, but as you get higher and higher core counts on nodes, that might change in the next few years.

So in taking your serial code and parallelizing it, what would you say some of the biggest challenges were? What difficulties did you run into, both algorithmically and programmatically, in trying to use parallel technologies?

You're asking about back in the early days when we first designed LAMMPS as a parallel code, you mean? Yes. Yeah, so I guess one thing we never did with LAMMPS is we never really started with a serial code and tried to make it parallel. And I think this is true of probably most groups who write parallel codes for MD or other kinds of applications: I think you're usually better off designing for parallel from the start rather than trying to adapt a serial code, which has a different set of problems. With the spatial decomposition method I mentioned, LAMMPS will run in parallel on some number of processors that you allocate, but it will also run on one processor, and in that case the spatial domain that the one processor owns is just the entire simulation box. So you can write the code so that a processor knows about its own domain and knows about its borders, creating some copies of atoms, something that's called ghost atoms. Doing that on one processor is the same idea that you would use on multiple processors: a processor still owns a little domain and knows some boundary information that it shares either with itself or with other processors. And so in that sense, the one-processor mode for the code is really essentially the same as the many-processor mode.

Okay, it's good to hear you say that, because we actually tell people the same thing: it's better to design for parallel than try to adapt, depending on the scope of your application, of course. But the real thing I'm going after here is, what makes it hard to be parallel? Is it the algorithms themselves, or do you have challenges with individual MPI implementations or underlying technologies, or do you find that you never have enough network bandwidth? What kinds of things do you run into?

I see. So it depends on the physical model, and whether, for example, you have long-range charge interactions, because that alters some of the computational methods that you need to use. But for something that really just involves short-range interactions, which for a lot of materials problems is a good model, typically we can scale up as large as you want. The number of atoms per processor is sort of a rule of thumb that you try to follow, so that you have enough atoms that you're not spending all your time communicating information with other processors and paying a high cost for that. But as long as you have a few hundred or a few thousand atoms per processor, we can typically scale up quite nicely on most parallel machines that have a good MPI implementation.
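A rough sketch of the spatial decomposition and ghost-atom exchange being described: each rank owns a block of the domain and, a few times per time step, swaps the atoms near its boundary with a neighboring rank in one bundled message. This is illustrative only, not LAMMPS's actual communication code:

```cpp
// One bundled ghost-atom swap along one dimension of a spatial decomposition.
// Illustrative sketch only -- not LAMMPS's communication code.
#include <mpi.h>
#include <vector>

struct Atom { double x[3]; };

std::vector<Atom> exchange_ghosts(std::vector<Atom>& boundary_atoms,
                                  int left, int right, MPI_Comm comm) {
    int nsend = (int)boundary_atoms.size(), nrecv = 0;

    // First agree on counts, then ship all boundary atoms in one large message:
    // a few messages per time step, so latency is paid rarely and bandwidth is used well.
    MPI_Sendrecv(&nsend, 1, MPI_INT, right, 0,
                 &nrecv, 1, MPI_INT, left, 0, comm, MPI_STATUS_IGNORE);

    std::vector<Atom> ghosts(nrecv);
    MPI_Sendrecv(boundary_atoms.data(), nsend * (int)sizeof(Atom), MPI_BYTE, right, 1,
                 ghosts.data(),         nrecv * (int)sizeof(Atom), MPI_BYTE, left, 1,
                 comm, MPI_STATUS_IGNORE);
    return ghosts;   // ghost copies let this rank compute forces near its boundary
}
```

Doing one such swap per dimension covers edges and corners as well, and on a single rank the "neighbor" is just the rank itself, which is why the one-processor and many-processor modes can share the same code path, as he notes above.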
And so there, the rule of thumb might be that we hope to spend no more than 10 to 20% of the time communicating and the other 80 to 90% computing. Now, if you have a model that a lot of the biological problems need, where you have these long-range charges, then you essentially have a long-range Coulombic problem to solve, and that's most often done in big MD codes with an FFT-based solution. So now you have a big three-dimensional FFT to do across all your processors, and that can often limit scalability, because it takes a certain amount of all-to-all communication to perform those FFTs. That's a scalability bottleneck that a lot of codes face.

I have a lot of users who do the more materials science kind of work and not the biological work, and that's what they're using LAMMPS for. The question they're always asking is, should we invest in InfiniBand or Ethernet? What's your rule of thumb for deciding between the two?

Well, nothing beats an actual benchmark and an actual run of the problem, especially for the kind of problem that you want to run. But generally, like I said, if you were doing one of these short-range materials science problems, where the interactions are short-range, you'd shoot for spending no more than 10 to 20 percent of your time doing communication. The numbers I've seen, and I don't necessarily have a hard number off the top of my head, are that as you go to bigger systems, hundreds or several hundred processors, you typically do start to pay an overhead if you have poor communication hardware, and so even something that should scale nicely up to hundreds or even thousands of processors will start to take some hits if you don't have a good communication fabric on the machine. So there's a case where paying the extra money for something a little more expensive might pay off.

You actually expect a significant speed-up for a large number of processors even with Ethernet, up to a decent size?

Again, I don't necessarily have hard numbers, so I wouldn't want to mislead somebody. I guess the safest thing to do would be to try it. It's also a function of what model you're running. I mean, there are models that are inherently more computationally intensive just due to the force field interactions, versus cheaper ones, and the communication part is kind of independent of that, so if you have something that's heavier computationally, with a relatively smaller amount of communication, then you might be able to get away with Ethernet on a larger number of processors. Okay.

Now, a question that came in from the social networks, from Carolyn, went something like this: a code like LAMMPS chose a certain decomposition scheme for parallelization roughly a decade ago. Can Steve discuss other types of MD parallelization schemes and where they apply, and whether it is possible for distributed computing hardware to change significantly enough to make an alternate scheme even better? That's a good question.
Yeah, before we had started on LAMMPS, myself and some others here at Sandia actually worked on alternate schemes, trying to figure out good ways to parallelize particle simulations like molecular dynamics, and we had some that were more particle-based or were based more on splitting up the forces and ignoring some of the spatial information. At the time we started with LAMMPS, as the question alluded, we decided that at least for big problems, where there are lots of atoms, the spatial decomposition approach would win: it just scaled better in the limit of large numbers of particles. What's changed in the last few years is that sometimes people don't want to run an infinitely huge number of particles, billions or trillions; they want to run a relatively fixed-size problem that's relatively small, maybe a hundred thousand or a few hundred thousand atoms, but they want to do it on more and more processors. Back 15 years ago nobody really thought about machines with a hundred thousand cores or a million cores or something like that, but they're now starting to be deployed and are being proposed for future petascale and exascale machines. And so if you want to do, say, a biological problem of a protein in water, and the protein's not infinite in size, it has maybe a hundred thousand or a few hundred thousand atoms, and you really want to run it on a million cores, then you're in the mode Brock mentioned a few minutes ago, one atom per core or a few atoms per core. Having a code that works well in that limit is kind of a different problem, and I don't think LAMMPS and the methods in LAMMPS are necessarily the best for that. So there has been work by other groups to try to develop new methods for that, and probably the best known is a private company, D. E. Shaw, that's investing a lot of money in going after that very problem and has come up with some clever decomposition schemes. I'm forgetting now the acronym or the name they use for those in their recent papers, but they have methods that are kind of hybrids of splitting up forces and spatial information in ways that get you better performance at these small numbers of atoms per processor. So surprisingly, at least from the perspective of 10 or 15 years ago, this whole area of optimal ways to split up work for an MD code is actually an active area of research where some clever ideas have come out in the last few years.

So speaking of speeding things up, you guys recently released accelerator support in LAMMPS for certain methods: GPUs and FPGAs and such. Accelerators tend to favor a very different type of approach; how is that working in with the regular LAMMPS model?
Yeah, so I think all the accelerator stuff we've actually released in the code has been GPU-specific, for NVIDIA chips, using a CUDA library that one of our developers wrote and has added some capabilities on top of. So the support for that in LAMMPS is still fairly limited. What's been done so far is to take a few of these potentials, these pairwise potentials, that are one of the big computational kernels in a typical MD run, and make a GPU version based on this CUDA library. I think the performance numbers for that have been kind of modest speed-ups: if it's a cheap potential, like Lennard-Jones interactions, maybe a factor of four or five speed-up on the GPU over single-core performance; for a more expensive potential, like an aspherical potential between ellipsoidal particles called Gay-Berne, which is quite expensive because it involves orientational dependence and other things, I think the speed-ups were better, up to about a factor of 100 on the GPU. But one of the challenges for LAMMPS is that we have this huge menu of potentials, of interactions, and making a GPU-ized version of each one of those is a tedious, time-consuming process. So I think we're making incremental steps in that direction and also trying to look at some higher-level reorganization issues that would help on GPUs, like building the neighbor lists and things like that. We're moving in that direction, but I'd say it's a challenge for us because we have a lot of legacy code that wasn't written with GPUs in mind.

You mentioned CUDA in there, and I'm going to kind of repeat one of my previous questions that I asked about MPI. You've already mentioned that some of your legacy code didn't map well to the abstractions of GPUs, but what other kinds of challenges did you run into, and were there other parts of your code that did map well to accelerator-based models and their way of thinking?

Yeah, so the real issue with GPUs in general is that if you're running an MD simulation and you have your calculation with all the data stored on the CPU, you're doing time stepping fairly rapidly, depending on the size of the problem, but it might be many time steps per second, for example. And every time step you need to do these force calculations, which is something that is known to optimize well for GPUs, but you have to ship the data back and forth between the CPU and the GPU every time step, and that means sending the particle coordinates and maybe some of this neighbor list information down to the GPU and getting the forces back. A code like HOOMD sidesteps that problem because it was designed from the ground up to be an all-GPU code, so it keeps the data and basically all the operations on the GPU and just occasionally gets stuff back to the CPU. That's why it's able to get some really impressive performance numbers with that kind of model. For us in LAMMPS, everything starts on the CPU, so we've been trying gradually to coarse-grain things and leave stuff on the GPU longer: build the neighbor list on the GPU, and for the number of time steps that the neighbor lists are valid, let that data just stay on the GPU if it can. We can do that for some simple models, and people are experimenting with that, and it does improve performance, but there are a lot of other options in LAMMPS, say other diagnostics or things you might want to compute, and some of those need to happen back on the CPU or they'd have to be ported to the GPU.
So it's again just kind of an incremental thing: how much of that calculation can we push onto the GPU and make GPU-ized versions of those various calculations?

So the accelerator functionality is something new. What other new stuff would you like to see added to LAMMPS?

We have kind of a laundry list or a wish list of things we're either working on ourselves or know of other groups that are working on and are kind of collaborating with. Most of those things are science-based or feature-based: new capabilities or new options in the code to do new kinds of problems. Is that the kind of question you're asking, or were you more sticking to this performance or optimization mode and talking about new features for accelerators or things like that that would be interesting to work on?

Either, whatever is most interesting to you.

Okay, well, let me mention the science features first then. There's a lot of interest in, and maybe I should state something that's a common complaint or a common bottleneck for molecular dynamics codes, and that is that you're always limited in length scale, the number of particles you can simulate, as well as time scale, and time scale is probably the more interesting or more crucial bottleneck. You always seem to want to run your systems longer than you can afford computational time to do the models. And so the whole area of coarse-graining your interactions, getting away from the atomic scale if you can as you do larger molecules or mixtures of molecules and solvent, and methods and techniques that allow you to coarse-grain and do your computations faster so you can get out to longer time scales, is a common theme and a big area of research. We've got some efforts trying to let you do, for example, nanoparticles in solution, and this is a collaborative effort with some companies that contribute to it as well, and so we've been developing methods that let you coarse-grain your solvent in various interesting ways to focus on the big nanoparticles and coarse-grain the nanoparticle-nanoparticle interactions. We've got some techniques we're developing and hope to release soon in the code for that, allowing you to do more interesting kinds of big nanoparticles that are not just big spheres but ellipsoids or aspherical particles that might be gridded up in some way to make an interesting shape, and how those shapes interact with each other. That's an active area of research for us. Other methods are more tuned for solid-state systems, where you can accelerate time with some of the methods that, for example, Art Voter at Los Alamos has worked on for the last few years. These are things like parallel replica dynamics and temperature accelerated dynamics, which require the ability to find events and barriers in a sophisticated parallel way. Those are techniques and methods that we're trying to add to the code and try out on different problems. Let's see, those are things we're involved with directly here at Sandia. There are some other groups doing some interesting things that I think will be added soon to the code. There's an electron-based force field, to let you do something more akin to a quantum dynamical simulation, that some folks at Caltech have been adding to LAMMPS, which we hope to release soon. There are other people who are coupling LAMMPS to other codes, either at the mesoscale or the continuum level, to do coupled particle-finite element or particle-mesoscale Monte Carlo kinds of simulations.
And so that's one option LAMMPS provides: to use LAMMPS as a library, which makes it easy to work in tandem with other codes and is letting some people do multi-scale kinds of problems. And so we try to release the hooks and the options that get added to the code to support those kinds of calculations in the main version as well. So that, I guess, would be a summary of the science directions we're trying to go.

Something you said there, I remember seeing it on the website but I didn't think of it ahead of time: you can actually use LAMMPS as a library instead of using it as its own application?

That's right. In fact, if you look in the code, the main.cpp, if you build it as a standalone code, is just about ten lines of code that instantiate LAMMPS as a library and hand it an input script (a sketch of that kind of driver appears below). So if you just take that away, you build LAMMPS as a library, and then you could have your own driver program that instantiated one or more versions of LAMMPS. Say you wanted to run MD in different regions of a big continuum model and have some overlaying mesh that you built yourself in your own application, and let the mesh be fine-grained in different parts with particles; you could certainly write an application that called LAMMPS as a library on top of it to do that.

That's a really cool feature, to be able to, like you said, hook stuff from LAMMPS into your own code; the options are kind of endless there.

Yeah, the other thing that people have done, maybe even more than that, is, because LAMMPS has this extensibility option, you can write something within LAMMPS that will get invoked each time step, or whenever you want, to call some external library. So people have used LAMMPS as a driving program to call other libraries, sort of the inverse of what we just talked about. So it's flexible; you can use it in either of those modes.

Now, a derivative but related question here: what license is LAMMPS under?

GPL. Just totally open source code, with the caveat that if people use it and modify it and want to distribute it themselves, they need to keep it GPL.

Okay, so with GPL, do you see people out there who view that as a barrier to adoption? This is something from the industry side of things; there is a lot of trepidation, some fears founded, some unfounded, about using GPL code. What have people's reactions to the GPL been?

Yeah, I guess I'm aware of some of those issues, but we haven't had any direct feedback on LAMMPS where some company has said, we'd like to use it but GPL is a barrier. From a philosophical point of view, the reason we went with GPL in the first place, and I'm not really an expert on all the variants of open source licenses, is that the one thing we wanted to prevent was a company, for example, commercializing LAMMPS, or taking LAMMPS and putting some wrapper around it and selling it as their code that they would make money off of. As I say, we were just kind of philosophically opposed to that, so if that's what GPL prevents, then I guess we're happy. We'd like it to just remain an open code that anybody can use for any purpose, but not have somebody else try to make money off it, because we'd prefer it to be open and for people to be able to use it freely.

So yeah, we kind of interrupted you there. You got through the science additions you wanted to add to the code, but what about some of the system functionality you'd like to add?
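The roughly ten-line driver he mentions looks something like the following. This is a hedged sketch that assumes the LAMMPS C++ headers of that era (lammps.h, input.h, namespace LAMMPS_NS); check the names against your own copy rather than treating it as the actual main.cpp:

```cpp
// Minimal driver that builds on LAMMPS as a library: create an instance on an
// MPI communicator and hand it an input script. Sketch only -- verify against
// the main.cpp in your LAMMPS version.
#include <mpi.h>
#include "lammps.h"   // LAMMPS_NS::LAMMPS
#include "input.h"    // the Input class behind lmp->input

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    // A coupling application could instead create several instances, each on a
    // sub-communicator, to run MD in different regions of a larger model.
    LAMMPS_NS::LAMMPS *lmp = new LAMMPS_NS::LAMMPS(argc, argv, MPI_COMM_WORLD);
    lmp->input->file();       // process the input script given on the command line

    delete lmp;
    MPI_Finalize();
    return 0;
}
```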
Sure. So all of the optimization things we talked about, for example for the GPUs, are an ongoing area of interest, and we have effort going on in those areas. I don't know if we have any brilliant ideas as to things that are going to make big differences. One outstanding research challenge, and I think all the big, especially bio-related, MD codes face this, is how to make the long-range Coulombic part, with the FFT-based solvers, work well both on a GPU, which means FFTs on the GPU, and also in a multi-GPU sense, across GPUs. I know that's an active area where different groups are trying out methods and trying to figure out ways to do that better, and it may lead to non-FFT-based solutions, more real-space, maybe multigrid or other kinds of methods, that might work better in parallel in the presence of GPUs. But a related hardware issue: it seems like most of the designs for the really high-end, these petascale and exascale machines, are going to achieve those big peak performance numbers through hybrid kinds of nodes. So you'll have nodes that have multiple cores plus a GPU accelerator, obviously with its own internal high number of cores. And so creating an MD code with kernels that are designed to work well on those hybrid platforms, where, say, you're doing your force calculation or your neighbor list in some way that splits the work evenly across many cores, say 16 or 32 cores, plus a GPU, and is able to exploit all of that available compute power in a way that's cleanly available to the person writing the program: I don't think those programming models really exist yet, but designing your code to take advantage of that is kind of a new challenge that's out on the horizon.

So this is a question that I ask all open source developers: what source code repository do you use and why, and how do you see that affecting your community, or not?
So we use SVN, and it's an internal repository at Sandia, and we love SVN; it's fine, perfect I guess, for our model of developing and upgrading the code. The one hitch it puts in place for us is really a Sandia institutional thing, and that is that because it's an internal repository, we don't have the ability to give random external people access to it, even in a read-only sense. So we're actually working with Axel at Temple University, who we mentioned a few minutes ago, to put a mirror of our SVN repository up; hopefully it'll be up in the next month or so. That's for people who want to keep up to date with LAMMPS and the latest bug fixes and new features that are released, because we kind of just continually release those incrementally, we don't do really major releases every six months, we're more in a continuous mode, and that would allow other people to keep their own checkouts up to date from that external mirror.

So let's get some contact information. There's a LAMMPS mailing list on the LAMMPS website; what's the website address? LAMMPS, with two M's: lammps.sandia.gov. Okay, you can get the code there and find the mailing list and register for it; it's pretty active, and it's a great source of information, I've found.

I just thought of one thing, a little anecdote about the name. It is an acronym and stands for something that's explained on the website, and when we came up with the name 10 or 15 years ago we didn't envision this, but it was a great choice, because it's a common word, lamp or lamps, but it's misspelled with two M's, and that makes it very easy for people to find on the web. If we'd ended up with the acronym LAMP, I think no one would be able to find the website by doing a Google search, because they'd see household appliances. Excellent.

So we typically ask this of all of our guests here too, but what's the weirdest, most unanticipated use of your software that you've seen, where someone is just kind of using your software and you say, wow, I never would have thought of that?

That's a good question. I think I will sidestep that for a second unless I can think of something really unusual, but one thing that often happens: when we started up the mail list two or three years ago, we realized we needed a more open forum for answering questions, and there is a large number of people who post to the mail list who ask very beginner questions, so we try to be polite, although some of the people who answer on the list aren't as polite as others. We certainly get a lot of people with questions that are really Unix questions or Linux questions: I can't run a program, I tried to point and click at the LAMMPS directory on my Windows box and nothing happened, why is that? So we get a lot of crazy newbie questions. But I'm trying to think of anybody who's used it for problems that are just totally out of the box, and I'm not coming up with anything off the top of my head. It's certainly been used and adapted in a lot of ways, for particle dynamics models we didn't anticipate when we first started, but those are more kind of science questions than bizarre issues.

Okay, Steve, well, thank you very much for your time, and this show will be out soon; you can find it at rce-cast.com. Okay, thanks. Thanks, Steve. Thanks for your interest.