Welcome to another edition of RCE. Again, this is Brock Palen. You can find us online at RCE-cast.com. You can find links to all the blogs and Twitters and the entire back catalog. There's also an RSS feed, and you can find us in the iTunes library. Once again, I have Jeff Squyres from Cisco Systems, one of the authors of Open MPI. Jeff, thanks a lot for your time.

Hey Brock, sure, good to be here.

Hey, you remember that time when that guy wrote that blog article about that thing that kind of blew up the whole HPC community and whatnot? He was saying that MPI is killing HPC and all that stuff. Do you remember that?

You're talking about our friend to the north. Some of us wrote some replies to that and had different takes on what was going on.

Yeah. Okay, so that's Jonathan Dursi. So what do you know about Jonathan?

Well, he seemed to actually raise a bunch of interesting points. I can't say I agreed with everything that he wrote, but he did raise a bunch of good points.

I'm kind of thinking that we should have a special edition of RCE and actually talk to this guy, different from our normal talk-to-an-HPC-project sort of show. Let's talk to someone who's actually using all this HPC stuff, and let's go in depth about that blog article. What do you think of that?

Hi there, Brock. Hi, Jeff. It's Jonathan Dursi.

Oh, what a coincidence. We were just talking about you. All right, all the canned setup aside: Jonathan, thanks for joining us today. I wonder if you could give us a quick background on yourself and introduce yourself to our listeners.

Sure, thanks, I'd be happy to. So my name is Jonathan Dursi. I blog now a little bit at dursi.ca. And, you know, like a lot of us in this field, I think, I started off in HPC as a user. I was an astro grad student at the University of Chicago's ASCI Flash Center back in the day, where I had to use and develop large HPC simulations.
And like a lot of us, I got more and more interested in the computing. I did a postdoc here in astro at the University of Toronto, and I ended up joining a new HPC center here, SciNet. And so there I had the opportunity to work with a lot of different users doing a lot of different things. I also helped coordinate with people at other HPC centers. And most recently, I'm lucky enough to be working at the Ontario Institute for Cancer Research, where I'm working on cancer genomics, computing, and algorithms, and I'm really getting exposed to a lot of different things there.

Cool. All right. So it's in the context of all of this stuff that you wrote your most recent series of two or three blog entries or so. We wish we could have done this a little bit earlier, but it took us a week or two to get all of our schedules together for this recording. I don't want to just read off the URL to your blog, because it's too long to listen to, but it's definitely in the show notes, and the short version of it is dursi.ca, d-u-r-s-i dot ca. Go to the blog links under there, and if you're listening, you'll be able to find the specific content that we're talking about here. But Jonathan, I wonder if you could just give us a summary of what your message is and what you're trying to say.

Yeah. So this is the most exciting time to be doing big technical computing that I can remember, and I've been doing this for a little while. There are more big numerical computing tools available to us and our users that we can choose from, shape, improve, like never before. And, you know, there's this moment where we in the community can decide what we want HPC programming to be like for the next five or ten years. There doesn't have to be so much repetitive boilerplate that we're asking researchers to do. We and the researchers can really get to the good parts.
And if we choose this right, we can choose to be relevant and helpful not only to our traditional HPC users, but to this growing group of biology and big data users that are out there doing really interesting research. And so when I wrote this post, "HPC is dying, and MPI is killing it," I was really concerned that we as a community were sort of sleepwalking into this really exciting and rapidly changing time, sort of stuck on default, doing things the way we've been doing them for the past two decades.

So I can't agree with that more, and if you remember my reply about that exact topic, I'm excited about these tools too. I've been working with a lot of non-traditional users who have structured data and are kind of moving from spreadsheets to DBMS systems to using things like Spark. In my notes I had "legacy and user education," but it's almost not legacy, it's almost like latency. How do we push forward faculty and students being aware of this stuff? Because if you ask them, they don't even know these things exist.

You know, I think that's exactly right. And it's a tough problem, because even we're just learning about some of these things, right? I think it is true that researchers, and especially grad students and postdocs, do to some extent look to us, the people who work at, say, HPC centers, for guidance on what technologies to choose. They're the ones taking our classes and showing up in our offices with questions. So I think we collectively play a pretty big role in deciding what gets used, by nudging researchers in various directions. Of course, you can't force a faculty member to do much of anything. There are people who are using their own tools, and that's what they're happy with, and that's great. But for people who aren't sure what they want to use, we have a responsibility to guide them towards tools that will work best for their research.
And that's not always necessarily the stuff that we have the most experience in. So nudging things forward, or changing directions, will take some time, and that's why I was really insistent that we start thinking about this now. And this will require us, the staff, to be learning new things and interacting with the broader community, learning how people solve different problems. But, you know, that's why I took these jobs. That's what really excites me. And I just want to say, I really think this podcast actually does a lot of really good work exposing us, the people who are going to work with the researchers, to a whole bunch of tools that are out there that a lot of us wouldn't hear of otherwise.

Flattery will get you everywhere on this show. Absolutely.

Yeah, so just to add on to that, something that I've found is that a lot of students... so I work at a university, not a research lab, and I wonder if the experience is different. You can tweet us whether or not you think it's a different experience at research labs. But graduate students pretty much do what other people in their lab are doing, which pretty much always trickles back to what their faculty told them to use. And we're finding that they are a lot of times the least educated in terms of what new things are out there. And so when you ask someone, "Why are you using Fortran 77?" it's like, "Well, isn't that all there is?" They just don't know there's anything else. I think a very illustrative example is a lot of the math libraries that have been out there for a long time. BLAS and LAPACK have been around for a couple of decades now, and many times I'm still educating users. They're like, "Hey, I know it's only a triply-nested for loop, but you really shouldn't write that yourself." So getting more education out there, I think, is really important.

It really is. And it's tough, because doing that well is sort of a bigger policy question.
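As an aside: the "triply-nested for loop" mentioned a moment ago is the textbook matrix multiply that BLAS routines like dgemm implement with blocking, vectorization, and cache tuning. A rough sketch of the hand-rolled version, in plain Python purely for illustration (the function name is made up; real code would just call an optimized library):

```python
# The classic hand-written matrix multiply: three nested loops.
# Libraries like BLAS compute the same thing far faster, which is
# why users are told not to write this themselves.

def naive_matmul(a, b):
    """C = A * B via three nested loops (illustration only)."""
    n, k, m = len(a), len(b), len(b[0])
    c = [[0.0] * m for _ in range(n)]
    for i in range(n):          # rows of A
        for j in range(m):      # columns of B
            for p in range(k):  # inner (dot-product) dimension
                c[i][j] += a[i][p] * b[p][j]
    return c

a = [[1.0, 2.0], [3.0, 4.0]]
b = [[5.0, 6.0], [7.0, 8.0]]
print(naive_matmul(a, b))  # [[19.0, 22.0], [43.0, 50.0]]
```

The point of the LAPACK/BLAS education effort is exactly that this loop nest, correct as it is, leaves most of the machine's performance on the table.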
But to really help researchers make use of even the tools that already exist and are venerable, like LAPACK, as you point out, that takes a lot of boots on the ground, a lot of people working with researchers, so they can take advantage of even what already exists. If we want researchers to be able to take advantage of new things, all the more so, because they're just not going to be able to pick up a book on... I'm going to use Spark as an example. I don't even think Spark is necessarily ready for HPC use. But there's no book, "Spark for HPC," that a diligent grad student is going to be able to pick up and learn from on their own.

Yeah, and I have to color my remarks with the fact that I haven't worked directly with users since I left academia nine-ish years ago. I've worked with some customers and whatnot, but it's not quite the same as just walking down the hall to some faculty member or grad student. So temper all my answers here with that. But I generally want to agree with you, and it's such a huge problem that there are multiple dimensions to it. There's not just getting the education out there, but getting people to pay attention to the education, right? The pressure back when I was in academia, and I'm guessing it's still the same, is the whole publish-or-perish thing. Researchers are looking for the fastest way to get their results, with the minimal amount of what is, from their perspective, junk, so they can get their research done. They really want to focus on the science of what they're doing, and you certainly can't fault that at all. I mean, that is what they're doing. And so the minimal way to get there is to copy what your buddy is doing, take code that your advisor has already written, or whatever piecemeal knowledge you can scrape together, so that you can focus on the science. That's the way I saw a lot of people going at it.
So we would have seminars and classes all the time about either using MPI directly or using some higher-level tools, and they would be sparsely attended. And the reason, inevitably, was, "Oh, I didn't have time. I really wanted to go to that. It looked interesting, but I just didn't have time. I've got this paper due, or we've got a conference coming up," or something like that. And it's the trade-off of short-term immediate goals versus long-term learning, which will provide more benefits. But it's really hard to see that, and sometimes even hard to justify that to funding agencies. Do you guys see the same thing? Is this still true?

So we certainly see it in Canadian centers. I mean, I think we have similar programs to, say, XSEDE, or at least familiar ones. And it is tough. And I'd never suggest that if people have a working code and they just need to tweak it, they should rewrite it into something new. That would be madness.

Yeah, but, sorry to interrupt you, it's not even that, right? They take what was working, and then everything looks like a nail after that, right? Because they've got a hammer. And they're like, oh, like Brock said earlier, this is the way that it's done. So therefore I need to make all of my work fit into whatever that model was that I was given.

I think there is a window, too. So SciNet, where I was, held a number of workshops on sort of big data and machine learning tools, and those were absolutely packed to the rafters, because I think grad students have a pretty good sense that that's something that's very employable right now. So I think there is some interest in new things, maybe not necessarily even for the reasons that we'd hope for. But ultimately, I think people will pick up on what works. And right now, the stuff that they have around the lab that works is often fairly old.
But once people in their labs do start having some successes with different approaches, then that does gather some interest. But it takes a lot of time and effort to make those first inroads, and that's a much harder problem than a technical problem.

Yeah, I mean, the approach I've mostly used, given this is my job, is pretty much shock and awe: go find something that doesn't work at all, and use Hive or Spark or something like that to show how quickly I can work with, say, massive quantities of survey data for the social scientists. That's really where I finally started to get people's attention: something that didn't work at all, and not only can I now do it, I can do it in very reasonable time periods. And that's pretty much the only success. Things had to get so bad that they didn't even work before faculty even started asking around for how they could do something. And that's kind of rough.

I think, actually, there's probably more appetite for that in the new and emerging disciplines using big computing than in the more traditional fields. I know, certainly, when I was an astro grad student, I probably would not have been super interested in innovative new approaches. We had something that kind of worked, and by God, that's what we were going to keep doing.

I almost wonder if part of the reason a lot of these tools came up was because they were developed in almost completely different communities. And I really hope that our community does not get bogged down in the not-invented-here sort of idea. But it seems like there's a bit of a split. The Spark and Hadoop and big data stuff is really being driven by the commercial space and not by the academic space, and it's a completely different set of users working with completely different sets of data in completely different disciplines than we've traditionally worked in.
And we really should take one of those big-tent approaches, because I'm already starting to see things like Spark implementing linear algebra, and people doing traditional compute implementing large graph partitioning systems. And each of them is strong at different parts of those things, so we should be merging to some degree.

Well, I think we in the HPC centers, or computational science centers, are used to seeing that in other areas. How often have we connected some user doing chemistry research with someone who actually has a very similar problem, but they're in something crazy like forestry? We copy these methods. Or even faculty members on the same floor, in the same hall, who don't realize that their neighbor is working on similar problems. So I think we're pretty good at bridging these silos that build up within our institutions, and I think we can do some of that across to the big data communities, even unilaterally. We certainly don't need to wait for an engraved invitation to start filing issues or pull requests for new features. And something that they love hearing, of course, is success stories: a researcher used their thing for something completely unexpected.

Well, one thing I don't want us to get sidetracked on: part of your articles got a little criticized for saying, well, big data, that's totally different than HPC. And regardless of whether people agree or disagree with that statement, I don't think that was your point in bringing up big data. You had this interesting graph about the growth of big data compared to the growth of MPI and HPC. And I think your point was more: here's an at least sort of similar set of technologies and problem space, so why is their community growing way faster than ours? Is that where you were going with that? Not so much saying we should do big data, and that all the things big data is doing should be done in HPC.
Yeah, so I think that's absolutely true. For decades, the HPC community was sort of the repository of expertise on big numerical computing. And at some point, other big computing groups started growing up around cloud computing and internet-scale computing. And that was sort of a different set of problems, in a way, so we sort of sat that out. Original tools like Hadoop MapReduce weren't necessarily super useful to our users, and so fine, that's a different thing. But now it turns out all those people collecting big data actually want to analyze it. And they start doing things like you mentioned, Brock, like wanting to solve big numerical linear algebra problems, or solving PDEs on something that looks an awful lot like an unstructured mesh. And there's all this growth going on in that area, on problems that would be very familiar to us. But we've sort of sat ourselves out on the sidelines, and we're not participating in this. And I think that's a shame for a couple of reasons. One, we have expertise that can be applied to these problems, but people will go off and solve them independently on their own if we choose not to jump in there. And also, these new tools are being built up, and eventually some of our users are going to look over there and say, actually, that would be fairly good for my problem now. Performance has gotten good enough. I think I'm going to try using the next generation of Spark or Flink or Moomoo or Plop or whatever the next thing is. And if the big data community is the only community actively providing those tools, then that's where our users will go. And they should, if that's what's solving their problem. So that's what really motivated that post: this concern that we're intentionally sitting out of a bunch of really interesting problems that we have expertise that can be applied to.
I must say that the response to that blog post actually makes me slightly less concerned, because I did get a lot of responses agreeing that there's a problem and we should participate somehow. I'm still a little concerned, because there's not a lot of consensus on what we should do next. But many more people are looking at this than I had realized at the time.

Well, I just want to take one side note out of your answer there: I need to go register all those domain names like Flup and Flinky and whatnot, to make sure that I get in on the next gold rush.

Yes, there's definitely a monopoly on good names in the big data community.

So if we move along and we think about our traditional users, our traditional scientific users, the people who have always had the really compute-intensive rather than data-intensive work: what have we learned from looking at some of the things we see from the internet-scale and big data communities that we should probably be implementing? Like, what should come after MPI?

So that's a great question, and I think this is a question that we all have different perspectives on in this call. I'd be very interested to hear what your answers are. I think the one big thing we've learned by looking around, and it's not just the big data community, we're seeing efforts inside our community like Chapel, is that there is an awfully big advantage to having our scientists or researchers or application developers building on layers that have stuff done for them, so that we can architect tools and platforms that have a lot of batteries included. So we've mentioned linear algebra. It just does not make sense to have users writing linear algebra routines. They end up spending a lot of effort for a worse result by any measure. Same thing with MPI collectives. I'm sure, Brock, you've had users walk in and it turns out they're looping over sends or something.
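To make the "looping over sends" point concrete: with a naive loop, the root rank sends to each of the other p-1 ranks one after another, while a tree-structured broadcast (the kind of algorithm MPI_Bcast implementations typically use) doubles the number of informed ranks each round. A toy sketch in plain Python that only counts communication rounds (real collectives also tune for message size, topology, and more):

```python
# Compare communication rounds: a root looping over point-to-point
# sends versus a tree-structured broadcast where every informed rank
# forwards to one new rank each round.

def loop_of_sends_rounds(p):
    """Root sends to each of the other p-1 ranks, one at a time."""
    return p - 1

def tree_bcast_rounds(p):
    """Informed ranks double each round until all p ranks have the data."""
    rounds, informed = 0, 1
    while informed < p:
        informed *= 2
        rounds += 1
    return rounds  # equals ceil(log2(p))

for p in (8, 1024):
    print(p, loop_of_sends_rounds(p), tree_bcast_rounds(p))
# 8 ranks: 7 vs 3 rounds; 1024 ranks: 1023 vs 10 rounds
```

The gap is the difference between O(p) and O(log p), which is exactly why calling the collective beats hand-rolling the loop.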
There's stuff that it just does not make sense for a researcher to reimplement. This sort of boilerplate has been done. These primitives have been built, and they can be used more effectively than the users writing their own. So I think that's the biggest thing: this idea that having researchers code at the lowest possible level is good for performance isn't true anymore. There's been tons of work on building up layers that can be used more productively by researchers. When it comes right down to it, halo exchanges or tree updates aren't that different from linear algebra. People have written them, they're really, really performant, and it would be awesome if we could provide more tools so that researchers can call those things instead of writing their own.

Yeah, I struggle with that myself, because conceptually you're absolutely right. We want to give somebody the perfect tool so they can call one line of code and then do the thing they want to do: their science, or whatever problem it is they're trying to solve. But the problem, at least on the implementation side... I don't want to generalize this, so I'll say my perception and my experience has been that as soon as we build a tool, it makes one user, or a very small number of users, happy. And then the next set of users come along and say, oh yes, I'm doing something similar to that, but it's a little different, right? And so the big characterization of HPC codes is that there is no silver bullet. I don't know if that's an accurate representation or not, but there is no one universal solution that fixes everything, and that's why there are so many homegrown HPC applications. Yes, there are some very popular and powerful ISV applications out there that a lot of people use with a lot of success, and they just happen to use MPI or some form of parallelization underneath the covers.
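For listeners unfamiliar with the halo exchange just mentioned: in a domain-decomposed stencil code, each rank owns an interior region plus "ghost" cells holding copies of its neighbors' edge values, refreshed each step. A toy single-process sketch with "ranks" simulated as Python lists (no actual MPI here; a real code would use nonblocking sends/receives or a library-provided exchange):

```python
# Toy 1-D halo exchange: each "rank" is a list of the form
# [ghost_lo, interior..., ghost_hi]. The exchange copies each
# neighbor's edge interior value into the adjacent ghost cell.

def halo_exchange(ranks):
    for r in range(len(ranks)):
        if r > 0:               # fill left ghost from left neighbor
            ranks[r][0] = ranks[r - 1][-2]
        if r < len(ranks) - 1:  # fill right ghost from right neighbor
            ranks[r][-1] = ranks[r + 1][1]
    return ranks

# Three ranks, two interior cells each; 0 marks uninitialized ghosts.
ranks = [[0, 1, 2, 0], [0, 3, 4, 0], [0, 5, 6, 0]]
print(halo_exchange(ranks))
# [[0, 1, 2, 3], [2, 3, 4, 5], [4, 5, 6, 0]]
```

The pattern is as regular as a matrix multiply, which is the argument for shipping it as a library primitive rather than having every researcher rewrite it.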
But for all of those, there's still a bazillion homegrown applications that people are writing to solve their own unique problems. How do we address that, right? How do we address the fact that one of the actual strengths of our field, that there is so much diversity, is, by everything we've been talking about, also getting to be a major curse?

Jeff, I'm going to inject here and somewhat disagree with you. So you and I were talking to a developer not that long ago about why a lot of these big data things are written in Java. It's like, well, they're companies. They don't care about that last bit of performance, because an algorithm or an implementation is only going to live for a year or two and then be replaced by something else. And I see that fitting what happens in the academic space a lot too, at least for the average grad student. They have four years to get something done, they graduate, and then that code pretty much dies from there. A very small fraction of them, I would think, actually survive for a long time. We've already had a lot of the long-lived codes on the show; there's not a whole lot of them. And in terms of the success of making a tool like that, look at the success of MATLAB, look at the success of R, and then look at the not-so-much success of things like Julia. I don't know what makes it work, but there are definitely examples of higher-level environments that people really gravitate to because they find it easier to get some work done.

So I think this is a really important discussion to have, too, that both of you have brought up: the median MPI code is not a hugely well-financed big community code that has had tons of optimization focused on it. The median scientific programming project is something that's put together for a few papers. And sometimes one of those tools will be so successful that it grows into something bigger, but to do that growing, it ends up having to be substantially rewritten anyway.
So I think we in the HPC world tend to focus on the codes we see most often, which are GROMACS, LAMMPS, any of a number of things that have had a lot of developer focus on them, but the median scientific developer is probably just trying to get something done so they can get four papers out. And it would be nice if we had tools, and maybe they have to be different tools, but tools that addressed both of those sets of needs. And it's tough, because we in our community have been horribly burnt by tools that promised much more than they could actually deliver. We're lucky now, I guess, that people have stopped trying to sell us automatic parallelizing compilers, but there were an awful lot of this-will-fix-everything projects proposed, and they didn't fix everything. Is the answer more focused tools, domain-specific languages? Is the answer higher-level tools where you can delve down into the guts if you need to change something? Maybe that would work, and I think there's a lot to be said for that approach. I mean, sometimes a tool user does need to go into the lower levels. We've all had situations where we've had to rewrite a loop to tile it for cache performance, or explicitly vectorize it for something. But we don't start by writing code like that, or else we'd never get anything done. We use high-level stuff, and then we delve into the low level only if there's evidence that we need to. But it's tough. There's not even one best serial programming language for scientific computing. Maybe there isn't one best thing for parallel scientific computing.

Yeah, you raise a lot of good points there, too. I guess I kind of see those as the extremes. We've got these upper-level general languages that are easy, but not necessarily the best performing, because they're designed for general purposes, and by definition you're trying to do something very specific.
But then there are the domain-specific ones as well that are super focused and super optimized for one particular set of applications or problems or whatever it is they're trying to solve. But then trying to separate the wheat from the chaff among all of these choices becomes a super daunting problem. At Supercomputing every year, we see a whole range of both products and research projects that, just like you said, claim to solve all the world's problems. And I'm sure I'm at least somewhat guilty of that as well, because everybody's looking out from their particular foxhole at their particular problem, saying, this is going to do it. This is going to solve everything. This is going to make users' lives better, and birds are going to sing, and children are going to be laughing, and all these kinds of things. And for some of them, that is true. For most of them, it's not. You're trying to publish a paper or sell some product or whatever it is you're trying to do. So trying to distill which ones I should pay attention to gets into the problems we were talking about back at the beginning, about education: what actually works? What's going to help this particular user get their problem done? And so is that part of the problem, that there's really so much new stuff that it's hard to know? And so people gravitate towards a lowest common denominator, for example, in this case: oh, I'm going to write to MPI, because I can't figure out some of these newfangled tools or know which one's any good for me to use. What do you guys think of that as a supposition?

I think there's some truth to that. I do want to say that, as far as problems go, that's a pretty high-class problem to have: that there's so much cool stuff out there that it's not necessarily obvious what to do. There are already things that we can point users to in pretty clear conscience, right? Someone who's writing, say, Fortran or MATLAB or something, we can be pretty confident in pointing them to things like Coarray Fortran.
And that is baked into the Fortran standard now. It will outlive all of us. And for smaller projects, there are a lot of things that I think are still pretty tractable. One thing would just be not asking them to write raw MPI, and instead using libraries that are extremely well tested, like Trilinos or the like. There are lots of things we can start pointing them to that aren't all the way to crazy-town Hadoop-type things, unless that's actually what they need for their problem. So...

There's the quote of the podcast right there: crazy-town Hadoop things. All right, so let me play right off of this, because this was also one of the criticisms of your blog entry: well, no, there are some fantastic MPI-based libraries out there where people don't have to know MPI. They just call some magic function, parallelization happens underneath the covers, and they don't even know that it happens. And one of your retorts to that really struck a resonant note with me. You said, yes, I agree with you, there are a lot of fantastic MPI libraries out there, but the leakage of MPI abstractions is way too strong. I don't even remember the exact word you used, but the essence was that it's harmful, that users have to know more MPI than they really should. Am I characterizing your argument properly?

So I think there are two things there. I think MPI, the standard, the whole model, makes that leakage inevitable, and I see that as a problem. That's a shame, but it doesn't mean that users shouldn't use these things. I think that if we move to something else, maybe more than one other thing, I think new libraries, new frameworks would have less of that, maybe. But having said that, you know, any of these extremely well-built, well-tested libraries is a huge productivity win for our researchers over having them reinvent that particular wheel.
Yeah, so I think I actually agree with most of your points there; I don't need to restate all of them. But I also agree that there are lots of quality MPI libraries out there that do take a lot of the burden off. But here's my question: you know, as a member of the MPI Forum, trying to shape the next generation of MPI, trying to actually stir up exactly this discussion in the MPI Forum itself, what should we do? Like, if we had a blue sky today and we could make the next-generation MPI, right, that looked sort of like what we have today, so it's not completely unfamiliar: what are some of the mistakes, and the things that guarantee leakage of MPI abstractions out of these otherwise wonderful libraries, that you would like to see fixed?

Well, so, you know, MPI has been enormously successful over the last two decades, and it has this huge installed base that it has a lot of responsibility to. And so that's part of the natural life cycle of a software project. So I kind of view MPI as being what it is. I'd be surprised if it made big changes at this point, and in fact, I think it would be unreasonable of us to ask it to. But maybe not using the word MPI, maybe just: what do we want HPC programming to look like? What would be the ideal stack if we were starting from scratch? I'm not sure that we know yet, but I do think there are a lot of really interesting projects that we can begin supporting and see how they grow and succeed, and start pushing in one direction or another. I think it would probably be a huge mistake to start a new standards body at this point, while things are just congealing. But there are a lot of things we can learn from. So Chapel is something that I mentioned. Chapel, I think, is really interesting because it separates the data decomposition from the operations on the data.
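That separation can be loosely mimicked even in plain Python. In the sketch below, which is only an analogy to Chapel's domains and not Chapel itself, the decomposition (which "locale" owns which index) lives in one object, while the application code iterates the index space without caring which decomposition is in use; all class and function names here are invented for illustration:

```python
# Loose analogy to Chapel's idea: data decomposition lives in a
# "domain" object; application code is written against the index
# space and works unchanged if the decomposition is swapped.

class BlockDomain:
    """Assign indices 0..n-1 to locales in contiguous blocks."""
    def __init__(self, n, num_locales):
        self.n, self.num_locales = n, num_locales
    def owner(self, i):
        return min(i * self.num_locales // self.n, self.num_locales - 1)

class CyclicDomain:
    """Assign indices to locales round-robin instead."""
    def __init__(self, n, num_locales):
        self.n, self.num_locales = n, num_locales
    def owner(self, i):
        return i % self.num_locales

def local_indices(domain, locale):
    # "Application code": identical whether the domain is block or cyclic.
    return [i for i in range(domain.n) if domain.owner(i) == locale]

print(local_indices(BlockDomain(8, 4), 1))   # [2, 3]
print(local_indices(CyclicDomain(8, 4), 1))  # [1, 5]
```

Changing the decomposition means swapping one object, not rewriting the loops, which is the "change the decomposition without blowing up all your application code" property discussed here.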
One is implemented in the domains and one is implemented in the application code, and you can write your own domains if you want. I think that's really interesting. And that stops some of that leakage, or at least means that if you do realize things aren't decomposed the way you'd want, it offers the hope that you can change the decomposition without blowing up all your application code. And there's this idea in, say, Spark, of having a very particular abstraction, a resilient distributed dataset, basically a big data table, and having a bunch of operations on that. I think that works really well for a lot of applications, and you see things that aren't necessarily obvious built on top of it, like graph libraries. So I think this idea of multiple levels at which you can interact with the tools is really powerful, and it'll take us a little while to decide what is going to be best for most of our users, but that doesn't necessarily mean we have to hold off from playing with some of these things. Small short-term scientific software projects, of which there are tons out there, are wonderful opportunities to adopt some of these new technologies and try them. In many of these tools, you can put something together that works very quickly and judge its success. And many of these short-term projects don't necessarily have to have blazing-fast performance; getting something coded quickly and working can be a huge win, more than hitting, you know, 80% utilization on the CPU.

So, something about your point about MPI leaking through: for me, that's actually kind of been a feature, because what leaks through is the underlying structure. An example I had recently was with Spark: I had a person who wanted to invert an 800-gigabyte matrix. That's all they wanted to do, just invert it.
And if you Google for "Spark invert," you'll find something, but it turns out it only runs on a local CPU. At that point, because the communication that Spark uses is not exposed to the end user, I was pretty much stuck. I was at the point where it was almost the equivalent of having to modify MPI itself: I'd have to dig inside the core library. It felt very dirty, and I felt very much at the mercy of the developers. Now, I'm very new to Spark, so I may be completely wrong on all this. But it feels like if some of that isn't exposed, and you can't live within the constraints that were put around you, this is the ease-of-use versus flexibility trade-off, then you're at the mercy of the developers. You either have to wait for them to put something in place, or you have to become a very, very sophisticated developer yourself, and I could see arguments that that wouldn't be great. Now, the nice thing about Spark and some of these other things is that they run on top of generic infrastructure like YARN containers, and you can run Spark beside a bunch of other things. Hopefully in the end we can run Spark and MapReduce and HDFS and POSIX and MPI and serial jobs all alongside each other in the same flexible environment, because that will serve our users best: being able to support whatever it is they are comfortable using. Telling a user that they have to use something is not necessarily the best thing. Yeah, but I don't know if that was the full point, because being able to dive into the guts of something when you want to, or when you need to, is certainly a wonderful thing, right? One way to do that is just, hey, open source, go look at the code yourself. That's not a great answer from an abstraction point of view, but it is an answer that gets used. And it certainly would be better if the abstraction itself supported it, saying, you could completely ignore MPI and do your thing.
Or, if you want to dive a little deeper, we provide some hooks to give you access to the MPI underneath, or whatever. But I think the point is that being forced to adhere to underlying abstractions you shouldn't otherwise have to care about is what you want to avoid, right? The total newbie user doesn't want to have to know anything about MPI; they just want to call one line of code, it does some magic for them, and they have their answer. That's kind of the gold standard. I don't know if we can ever get there for every single problem in the world, but that's what you want. And then for the more advanced user, when they say, ah, okay, that one line of code is not doing exactly what I want, I want to tweak it, I want to adjust a few things, you have some hooks to go deeper. That's where I think some of these more modern architectures are going, and not necessarily just in the HPC world, but in other types of applications: mobile applications, web applications. You look at some of the frameworks out there for PHP and Python and Perl and the like, and they are multi-layer beasts where you can get hooks inside when you need them, or otherwise you use the top-level services and take what they give you, right? Yeah, I think I agree with both of you. A too-rigid, or not even too-rigid, too-opaque high-level abstraction is no better for users on average than one that's super low-level; it's just that the problems are different. Ideally, though, you would minimize the number of times you have to dig deep into the lower layers to change something. So Spark, for better or worse, is growing distributed linear algebra, and it will have that at some point, and then it will be a fixed problem for everyone. Ideally that's the way things would go. Of course, that doesn't help your particular user now.
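The layered design the speakers describe, a one-line "magic" call for newcomers, with hooks into the lower layers for advanced users, can be sketched in a few lines of Python. All the names here (`Solver`, `Transport`, `on_step`) are invented for illustration; this is a toy, not any real library's API.

```python
class Transport:
    """Low-level layer: normally hidden, but deliberately reachable."""

    def __init__(self):
        self.log = []

    def send(self, dest, payload):
        # Stand-in for real communication (e.g. whatever MPI-like
        # substrate sits underneath the high-level library).
        self.log.append((dest, payload))


class Solver:
    """High-level layer: the 'one line of code that does some magic'."""

    def __init__(self, transport=None):
        # Hook 1: callers may inject their own lower layer instead of
        # forking the library to change communication behavior.
        self.transport = transport or Transport()

    def solve(self, data, on_step=None):
        # Hook 2: callers may observe or tweak each step without
        # reimplementing the solver.
        total = 0
        for i, x in enumerate(data):
            if on_step:
                x = on_step(i, x)
            total += x
            self.transport.send(dest=i % 2, payload=x)
        return total


# Newbie path: one call, no knowledge of the layers underneath.
print(Solver().solve([1, 2, 3]))  # -> 6

# Advanced path: reach into the hooks rather than into the guts.
t = Transport()
print(Solver(t).solve([1, 2, 3], on_step=lambda i, x: x * 10))  # -> 60
print(t.log)  # -> [(0, 10), (1, 20), (0, 30)]
```

The point of the sketch is only the shape: the default path never mentions the transport, while the advanced path can swap it or intercept steps without touching the solver's source.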
Spark's still very new, I won't say immature, but it doesn't have all the features that we and our users necessarily want, and that can be frustrating. That's something people have to seriously think about when they're considering these things for a new project: does this have the library support that I know I can get elsewhere? So I wonder if maybe the most useful thing is to rally the community around something like Chapel, and also pair it with a comprehensive library of libraries. Think of CPAN or CRAN, for Perl and R respectively. Those communities have built up a standard practice: if you need to do something, search for it, and when you find it, you run install.packages or the equivalent, and it comes down with your MPI library, or however low in the abstraction stack you want to go, already set up underneath. And I think the funding agencies would be interested in this, because it should reduce deployment time. By deployment time I mean grad-student productivity, time to real science rather than time spent starting from int main or begin program. Also, there's a lot of talk about code reuse and not recreating the wheel over and over. Well, how about you create a wheel and publish it in a standard place where it's easy to get at, and make an entire ecosystem for the entire community? Now, I just came up with this off the top of my head, so I could be completely off base. But if we can build or choose a set of tools that support that, tools flexible enough to load in different modules... I mean, you mentioned CPAN and CRAN, which have enormously benefited those two communities. And without question a big part of why those two platforms succeed is that there's a community standard that if you build something useful you really should contribute it, and the tools make it incredibly easy to do so.
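The CPAN/CRAN-style "search, install, use" workflow described above can be sketched in Python with the standard library plus pip. The helper name `ensure_package` is invented for illustration; the pattern, check whether a module is importable and fetch it from the package index if not, is the part that matters.

```python
import importlib
import importlib.util
import subprocess
import sys


def ensure_package(name, pip_name=None):
    """Import `name`, installing it with pip first if it is missing.

    A minimal sketch of the package-ecosystem workflow the speakers
    describe. `pip_name` covers packages whose install name differs
    from their import name (e.g. import `yaml`, install `PyYAML`).
    """
    if importlib.util.find_spec(name) is None:
        # Not installed yet: fetch it from the package index.
        subprocess.check_call(
            [sys.executable, "-m", "pip", "install", pip_name or name]
        )
    return importlib.import_module(name)


# Usage: json ships with Python, so this resolves with no network call.
json = ensure_package("json")
print(json.dumps({"win": True}))
```

In R the analogous step is `install.packages("pkg")` against CRAN; the shared idea is that discovery and installation are one cheap, standard step rather than a per-project build effort.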
So much so that discovery actually becomes a little bit of a problem, which, again, is a pretty high-class problem to have. Okay, this has all been really good, but I want to end on one thing: MPI has been tremendously successful. It brought about a standard way of creating parallel applications back when every vendor's system was different. That is almost the first step in what we've been talking about here, and it's really the case that we need to push forward and keep trying to make all this stuff better. So let's not discount the success MPI has had in taking us that first step. Oh, I think you're completely right. It was a huge win. It was vastly better than anything else available at the time. And it's taken us 20 years, and I think now we have the opportunity to decide for ourselves as a community: what do we want the next 20 years to look like? Yeah, good, this is a great point. Let's end on a positive note, because there is still a lot to be thankful for and happy about in what our community has been able to achieve. But the point is valid, and very important, that we need to keep looking toward the future and not get stuck in a status quo of "it's good enough," right? It's always been a challenge for us on the MPI Forum to find out exactly what users want. We get an amazing lack of replies when we ask, hey, what do users want? It's a real challenge, and that might be part of the disconnect. We need to know, because the members of the MPI Forum are not just focused on MPI; all of us touch many different things in the HPC community and work on many next-generation projects. So maybe MPI will continue to be a substrate, or maybe just the technologies that implement MPI will continue to be the substrate. What is the next set of things that we need to be working on?
What are the features that users need to have in their applications to be more productive, and things like that? These are the types of things that we, as implementers of tools, libraries, and MPI implementations, and in the MPI Forum itself, really want to hear about from end users and from people with boots on the ground trying to get stuff to run day after day. Yeah, I agree. I'll just add: not just users, but the tool builders and the library builders too, what do they need to support those same researchers? Jonathan, thank you very much for your time. It was great being here after having listened for so long. Thanks, both of you. Well, you are very kind, sir. We appreciate your time, and I really appreciate you writing this series of blog articles. I think this was a fantastic conversation. I feel like we could keep talking for another hour at least, but we do need to wrap it up, because people only have so much time to listen to a podcast. But thank you very much for your time, sir. That was Jonathan Dursi; you can find him at dursi.ca.