 This is Brock Palen from RCE, and I have Jeff Squyres from Cisco. Greetings! We have with us two developers from the VisIt visualization project. Sean, Jeremy, you guys there? Yep, yeah. Hey there, Brock. Hey, Jeff. Thanks for taking some time out with us. Quickly, can you guys give us your affiliation and how you're involved with this project? Sure. So, this is Sean. I'm from Oak Ridge National Laboratory. I'm the visualization lead there, and I was one of the original designers of the VisIt system, and I'm still one of the developers of it. And I'm Jeremy, and I'm also at Oak Ridge. I came from Livermore, where I was also one of the original developers on the project, and I'm still involved with it in a few ways and a few different projects as well. Well, guys, thanks for taking your time to talk with us today. I wonder if you could give us the short version. What is VisIt? Yeah, so VisIt is an interesting animal. It's a turnkey visualization tool. It's a tool that's designed for people to be able to get understanding out of scientific data sets. It can be pretty pictures. It can be statistical analyses and graphs. It can be pretty much anything you want if you're trying to get a better understanding of the results, generally, of a high-performance computing simulation. That's the 10-second version of what VisIt does, but obviously the answer can be much more complex. So what would one of your users do? They do a big parallel run, and they get a couple terabytes of data, and then they use VisIt to visualize it? Sure, I can maybe go through a simple conceptual workflow here. You've definitely got the right starting point. Typically, a scientist will write a whole bunch of data out to disk. So the first thing that you need in your workflow pipeline here is reading the data off of disk. So we have about a hundred different file format readers right now.
I can certainly talk about those in more detail, but there are some that are ASCII-based, some that are binary, HDF5 and NetCDF, things like that. A lot of them support parallel I/O. And then after you've read the data, one thing you might want to do with it is create new expressions; for example, you might want to calculate density from volume and mass and derive new quantities. We have dozens of different ways of operating on your data — capabilities for slicing, subsetting, transforming your data — and then typically you go through and plot the data to your screen. So we have about 20 to 30 different plots. These include ways of rendering your data, whether it's mapping values to colors or doing volumetric rendering. And beyond plotting your data, you might want to do quantitative analysis, whether it's queries or other ways of extracting data — maybe science-based queries, you know, calculating the centroid of some object, maybe feature-extraction kinds of things. And you might want to save out the results. Obviously we can save images, as you'd expect from a visualization tool, but you can also use VisIt as a larger tool in a processing pipeline. You can save data files out, for instance, for importing into ray tracers, or out to actual scientific data files for importing into other tools. And the entire time you're interacting with VisIt, you do it through either a graphical interface or a Python interface or a Java interface. So we have a few different front-end interfaces you can swap out and use in the same way. Oh, that sounds like a lot of code there. So is this a very large project? Are there a lot of developers on it? Yeah, there are a good number of developers. I don't know that we have an exact number right now. We were taking a look at the code repository and trying to figure out who has done major development on it in the last six months or so. Jeremy, I think you had put together a list. Do you have a count of how many people have done that?
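The read-derive-plot workflow Sean and Jeremy describe can be sketched conceptually in plain Python. This is an illustrative sketch only — VisIt's actual expression and plotting machinery is far richer, and every function name here is ours, not VisIt's API:

```python
# Conceptual sketch of the workflow described above: read per-cell data,
# derive a new quantity via an expression (density from mass and volume),
# then "plot" by mapping values onto a small color table. All names and
# the toy color map are illustrative, not VisIt's real interface.

def derive_density(mass, volume):
    """Derive density per cell, the way a VisIt expression would."""
    return [m / v for m, v in zip(mass, volume)]

def map_to_colors(values, colors=("blue", "green", "red")):
    """Toy pseudocolor plot: bucket each value into a color table entry."""
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0
    return [colors[min(int((v - lo) / span * len(colors)), len(colors) - 1)]
            for v in values]

# Hypothetical per-cell data "read from disk":
mass = [2.0, 6.0, 12.0]
volume = [1.0, 2.0, 3.0]
density = derive_density(mass, volume)   # [2.0, 3.0, 4.0]
plot = map_to_colors(density)
```

In VisIt itself, the analogous steps are opening a database, defining a scalar expression, and adding a plot — the sketch just makes the pipeline shape concrete.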
Yeah, sure. So, I mean, this list of names just came from actual commits to the repository, but we have more developers than this. But I can just give you the brief list. We have Sean Ahern and myself, Jeremy Meredith, from Oak Ridge, as well as Dave Pugmire here. At Lawrence Livermore National Laboratory, we certainly have some of the original developers still there as well: Kathleen Bonnell, Dave Bremer, Eric Brugger, Hank Childs, Cyrus Harrison, Mark Miller, and Brad Whitlock. All those folks are at Lawrence Livermore National Lab. At the University of Utah, in the Scientific Computing and Imaging Institute, I believe, are Allen Sanderson and Tom Fogal, and Gunther Weber is at Lawrence Berkeley Lab. So those are the people who are, maybe not full-time, but something approximating full-time, committing to the project. And certainly the project started back in 2000 with about half a dozen full-time developers. So overall, yeah, it's a very large project. A whole lot of man-years have gone into it. And it's now actually gotten even international. We've got a couple of developers in Britain who are contributing. We've got someone in France who is helping with internationalization. It really has exploded in the last couple of years in terms of contributions. Yeah, and the University of California, Davis. I mean, it's certainly got people at half a dozen or a dozen or even more universities, whether they're contributing just smaller code chunks and so forth. When you install VisIt, there's a contributors file that lists them a little more comprehensively. Gotcha. Okay. Well, let me digress from that and go back into the what-is-VisIt kind of category here. Is VisIt itself a parallel application? Because it sounds like, to operate on super huge datasets, it itself might benefit from going parallel. So yes, it is, but I think it's important to understand somewhat of what the architecture is with VisIt.
Before we talk about that piece: VisIt is actually a number of different components which can all talk to each other. And certain parts of it are parallel and certain parts are not. The basic concept of VisIt is a client-server architecture, where the user will run a front-end client, which is generally a GUI that takes advantage of their local graphics card. And then the actual data processing happens out on a server, what we call the engine. And generally, we have the engine sitting close to wherever the data happens to be. It might be on their local workstation, but it might be on a large parallel supercomputer somewhere, or a visualization cluster. And that engine piece, where the data I/O happens, where the analysis algorithms run — that one has been parallel, I think, since the inception of VisIt. We knew that we needed to be able to go to large-scale parallelism. And so with that one, actually, we've gone to thousands of processors. I think the largest run we've done has been over 8,000 processors. But the user is not stuck with doing that. It also allows you to go all the way down to very small datasets that are just on someone's local workstation or on their laptop. And so it can scale up and down depending on what the data needs and what the user needs. So one of the things, in keeping with the theme of a turnkey application to make things easy for users, is that we try to hide a lot of the complexity of launching multiple components as well. So for instance, the first time somebody sets up for, say, some big machine at NERSC, they'd create a host profile, which includes some information not just about the machine names, but what kind of job launcher and what kind of MPI it's using. That way, when the user wants to connect, even if they're starting VisIt on their little local laptop, when they go to open a file, they can actually just punch in the hostname of that big machine.
It will go out, connect to that remote machine, let you browse the remote file system, open files. And at some point, if you need to launch a parallel engine to do the analysis on your data there, it hopefully tries to ask you just a bare minimum of questions — how many processors, how many nodes, which bank do you want to use — and then it will understand the batch system, launch your job, and track all of that for you. So we definitely hide all that as much as possible to make it just a few mouse clicks. And all the interaction with remote data should feel exactly the same as interacting with local data. So you already have all the information there for the batch system, so they don't have to create a PBS file or an SGE input or any of that stuff. That's all done by VisIt itself? Yes, exactly. So basically the way we do it is that VisIt already has knowledge, in its internal scripts, about a plethora of different launch systems — you know, mpirun, Moab, SLURM, you name it — batch schedulers as well as the job launchers underneath them. And VisIt will create the PBS script for you, or will create the mpirun machine file, whatever's necessary for communicating with that job control system, so that the user doesn't have to deal with it. Many times the user doesn't care. They really just want to say: I want to view my data, which is on machine XYZ, and I need 256 processors. Go do whatever you need to do to launch over there and connect back. And VisIt handles all that for you. So I don't have to have all these things set up for running on my local system? I don't have to have MPI and all that just to run VisIt on my laptop, if I want to run the engine and the viewer both locally? That's correct. Yes, you're exactly correct. If you don't have the complications of a job control system, you don't need to think about it. VisIt already knows how to talk directly to localhost.
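As a rough illustration of what a host profile buys the user, here is a minimal Python sketch of turning a profile into a PBS batch script — the kind of file VisIt generates internally so nobody hand-writes one. The field names (`nodes`, `ppn`, `bank`) and the `engine_par` binary name are illustrative stand-ins, not VisIt's actual internal format:

```python
# Minimal sketch: generate a PBS script from a host-profile-like dict,
# mirroring the idea that VisIt writes the batch file for the user.
# Field names and the engine binary name are illustrative only.

def make_pbs_script(profile):
    nprocs = profile["nodes"] * profile["ppn"]
    return "\n".join([
        "#!/bin/sh",
        f"#PBS -l nodes={profile['nodes']}:ppn={profile['ppn']}",
        f"#PBS -A {profile['bank']}",          # which bank/allocation to charge
        f"mpirun -np {nprocs} engine_par",     # launch the parallel engine
    ])

script = make_pbs_script({"nodes": 32, "ppn": 8, "bank": "science"})
```

The user-facing questions (how many nodes, which bank) map directly onto the dict; everything else is boilerplate the tool can fill in.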
The only time you need to think about any of that is if you want to attach to a remote resource. And even then, it doesn't necessarily have to be a parallel resource. It's very easy to set up a host profile for a remote resource where you just say it's not parallel, and it just uses SSH to connect. You then could say, oh no, wait, this is parallel — check the parallel box — and then you describe the parallel information. That's what goes into a host profile. But it's not necessary if you don't need that level of complexity. So I'm a little confused here. You have the engine running on a cluster, which is going through the batch system. How do I see my plots, and how do I make it interactive? There's the viewer part I see on my local laptop. Does that connect to the cluster somehow? Jeremy, I think you're probably best to... Yeah, sure. So let's talk about a couple different pieces here. The main central repository of all of VisIt's state — where it stores what plots you have open — is called the viewer. And that's what runs, typically, on your local workstation. First of all, just one aside very briefly here: there's no reason you couldn't SSH out to the remote big cluster and run all of VisIt there. You would run the graphical interface there and the viewer there, and you would use something like X11 to connect back. But that's typically not the best way to do it, because X is so sensitive to latency. So what we've done is we've moved the things that require graphics to the laptop and put the engine, which requires parallelism, on the remote side. So VisIt can be run either way. It's just typically a better idea to separate the two, because you're making the best use of the hardware and you're not relying so much on a low-latency connection, since we're just shipping either data or images over the network.
So what happens is, when you launch VisIt, typically the front end that you've chosen — whether it's the Python interface or the graphical interface — launches along with the viewer, and that opens up a window on your laptop or your workstation and interacts with the graphics card. And that viewer is the one which initiates an SSH connection. So if you then go to open a file locally, it just launches an engine locally, whether a parallel or just a serial version. But if you select a remote host name, then yes, it will initiate an SSH connection, find the right directory for VisIt, launch the remote VisIt on the remote machine, and then that connects back. And in the cases where you're behind firewalls or don't have access to open up ports, more recently we've added an SSH tunneling mode that should forward everything through SSH and make it hopefully still completely transparent, without having to go through a lot of rigmarole. So if I can follow up on that: the piece of VisIt which does all the data processing, the engine, is not the piece — and it never expects to be the piece — that displays the plot. So Brock, your question was, how do I see the plots that I'm asking for? The engine knows that it must communicate with the client piece, what we call the viewer, to be able to display that. So the engine, after it generates geometry or after it generates imagery, sends those across the network back to the user on their local workstation, and then the viewer displays that information. So the viewer and engine work in concert with each other to be able to make those plots visible to the user. And if you have a small dataset — one thing we could just mention quickly is whether you're doing rendering locally or remotely.
If you have a reasonably sized dataset, or something that you've reduced down to maybe a slice through a gigantic dataset, or maybe if you're on a high-bandwidth, low-latency connection, what you might want to do is send all the geometry down and have it rendered using your local graphics card. But we don't have to do it that way. We have the ability to switch over to a parallel rendering mode where it actually does all the rendering in parallel on the remote machine and just sends you images. That's better if your dataset is huge — if you're talking dozens of billions of cells — or you're on a connection that happens to be low bandwidth, and it's faster to render images in parallel remotely and just send those images to your viewer on your local machine. A follow-up question to this here. Do you guys have any GPU-aware code? So if you detect that you've got some kind of graphics coprocessor, how do you, A, take advantage of that, and B, how do you know when it's better to do it locally versus remotely, which may be parallel with big horsepower back there? Yeah, so there are sort of two questions there. Let me first answer the GPU one. VisIt does use GPUs in a couple of different ways. First of all, we can do parallel rendering out on the back end, on the engine, using GPUs. So we've got a mode where we can, say, use hardware resources. If we don't have that, we use Mesa, so we're doing software rendering, but we can take advantage of graphics cards if the cluster happens to have them. Now, that's purely for rendering. We don't do GPGPU types of stuff at this point. So that's one GPU exploitation. The other one is on the front end, with the user's graphics card on their own local workstation, where we've got a number of ways that we're exploiting it. Probably the best one is a GPU-enabled volume renderer that was incorporated from the University of Utah, called SLIVR, which uses fragment programs to be able to do its volume rendering.
So that's kind of cool — that was actually put in in the last year or so. So that's the GPU exploitation. The second question had to do with how VisIt makes a decision as to what to do. Should I do remote rendering? Should I do local rendering? We've got a heuristic built into VisIt which measures the geometry load. It's not tremendously sophisticated — we're not measuring network latencies and things like that. What we do is we look at the engine and determine how big the ultimate rendering load is. If it's over some number of millions of polygons — let's say the number is 4 million polygons — if it's above that number, let's use what we call the parallel rendering mode and send images back to the client. If it's below 4 million, instead, let's send the geometry. And that heuristic is tunable. So the user at any time can say, you know what, I've got a pretty powerful graphics card and my network is really good; I'd like you to send geometry more often. So I'll crank that threshold up to 10 million. And they can pull it up and down as they like. So you can force it to be one or the other? Of course. With my obvious network bias here, let me ask something that's a fairly obvious question to me. Do you take the network capacity into account automatically? So you can say, oh, this is a 10-gigabit link or this is a 20-gigabit link, so the amount of data that I've got to push back and forth — does that influence the decision of remote versus local, or is that a user-tunable, user-decidable kind of thing? So right now, we do not take that into account. This is a user-tunable thing. But there are some research efforts that we've got going on to be able to think of the network more as a first-class entity. None of that is in production, or even close to production, right now. Those are just some research questions that we're looking at.
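The geometry-load heuristic Sean describes is simple enough to sketch directly. The 4-million-polygon default matches the figure he quotes, though the function name and return values below are ours, not VisIt's:

```python
# Sketch of the scalable-rendering heuristic described above: below a
# tunable polygon threshold, ship geometry to the client's graphics card;
# at or above it, render in parallel on the engine and ship images back.

def choose_render_mode(polygon_count, threshold=4_000_000):
    """Pick a transport based on geometry load; threshold is user-tunable."""
    return "send geometry" if polygon_count < threshold else "send images"

choose_render_mode(1_000_000)                         # small plot: geometry
choose_render_mode(9_000_000)                         # big plot: images
choose_render_mode(9_000_000, threshold=10_000_000)   # user raised threshold
```

Cranking the threshold up or down is exactly the "force it one way or the other" knob mentioned in the conversation.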
Okay, so you're saying there's a cutoff threshold — when the geometry or the problem is over a certain size, you do the rendering on the cluster and then you send images to the client. How big have these things gotten? Exactly how large of a dataset have you actually rendered with VisIt before? So there are a number of ways you can measure size. We can think of dataset size. We can think of how many mesh cells we've ever processed — and that one is in the tens of billions of cells. There's one that I can think of very specifically that was 27 billion, but that was a couple of years ago. I think we've crossed that now. I haven't heard of anything out to 100 billion or so, but I think the largest is somewhere around 50 to 60 billion cells. Let's see — we can look at image size. I think we've rendered out to images that are about 16,000 pixels squared. I think that's where we've gone. But you can also think about dataset size. Jeremy was just recently doing some datasets that were fairly large. Yeah, actually — well, there's one that wasn't me, but I was talking to one of the other developers who was mentioning something. I believe it was hundreds of terabytes. And partly that was because there were a lot of time steps. But processing the entire dataset from beginning to end, it'll touch literally hundreds of terabytes on disk. Now you don't actually read all those hundreds of terabytes into memory, right? The readers are a little bit more intelligent than that? Right, yeah, exactly. So a lot of times when people talk about the size of the output of some simulation — and maybe this is more from the scientific computing side — they talk about its total space on disk. But usually that's split over a number of time steps. So that's why, when we talk about size, we're usually talking about the amount from a single time step. So yes, usually we process just a single time step at once. So the actual readers — those are plugins, or do those have to be compiled into VisIt itself?
No, they're definitely all plugins. So I mentioned at the beginning that we have about 100 different file format readers. Some of those are based on ASCII text, unfortunately, because ASCII text doesn't parallelize well. So if you're running on a thousand processors, that's probably not what you want to base your I/O on. But nevertheless, some codes write out ASCII, so we do have to support that. And some readers are based on HDF5 and NetCDF and things that support hyperslab I/O and other things that scale a bit better. There are scientific I/O libraries like Silo or Exodus, or even things for molecular data like the Protein Data Bank. Some of these are specific to codes or infrastructures like S3D or ANSYS or LAMMPS. Some of them are modeling formats like STL or Wavefront, and of course there are general-purpose ones. We can read JPEGs and PNGs. And we have a plain-text reader which supports things like whitespace- and comma-delimited files in various flavors. So you could export a spreadsheet of numbers from Excel and read that into VisIt using the plain-text reader, which you tell how many columns you have and how many lines to skip at the beginning. So we have a whole bunch of different readers, and they're all plugins. And you can write these plugins without having to download the entire VisIt source code and so forth. Obviously, if you're using a library that we don't already support — you know, some scientific I/O library — you'll have to link it in against your plugin, but there's no reason that VisIt can't load it afterwards. And I can maybe give a few seconds on the typical plugin development process. We have a tool which will create a small little XML file for you. And it records things like, you know, the name of your plugin, gives it a version number, and so on.
For things like databases, there might be some options you want to specify when you're opening the file — you know, whether to treat the third variable as the z-coordinate or not. And so you specify a few of these things, and then we have a few scripts which go through and generate the boilerplate code. Most of the code you don't need to edit; it's just things that provide information to VisIt, so it just gets generated and compiled in. And there's about one file that's set up with template boilerplate code for you as well, and you go in and fill in the functions. So for something like a file format reader plugin, about the only thing you have to do is go in — there's one function that returns a list of the variables in the file, and there's one which will return, let's say, a VTK object when you ask for some mesh, or a data array when you ask for a variable. And that by itself is enough to get a simple reader working. And every file format reader we have is a plugin. So, you know, there's nothing we do in a plugin that you can't do yourself. Follow-up question for you on this here. So, you know, with all these different plugins and whatnot, it almost becomes a software engineering issue of who maintains them and who develops them over time, who adds new features and things like that. As Brock mentioned at the beginning, I work on a different open source project, an MPI project, and we're based on plugins as well. And we kind of have this general rule of thumb that he or she who cares will implement and maintain. Is that kind of the same rule that you have followed here, or have you developed all of these plugins to address a specific need? Or does a group come in and join the project and write the plugins so that they can have VisIt support their data? How do you generally do that? I think it's probably all of the above.
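As described, a reader plugin's job boils down to two functions: one listing the variables in a file and one returning a variable's data. Here is a hedged Python sketch of that shape, using the plain-text reader mentioned earlier as the toy format. The real plugin interface is C++ and returns VTK objects, so the class and method names here are purely illustrative:

```python
# Sketch of the two functions a file-format reader plugin fills in.
# The toy format is comma-delimited text with a header row of variable
# names, like the plain-text reader described in the conversation.
# Real VisIt plugins are C++ and hand back VTK objects; this is only
# a conceptual stand-in.

class PlainTextReaderPlugin:
    def __init__(self, lines, skip=0):
        header = lines[skip].split(",")               # variable names
        rows = [[float(t) for t in ln.split(",")]
                for ln in lines[skip + 1:] if ln.strip()]
        columns = zip(*rows)                          # rows -> columns
        self._vars = {name.strip(): list(col)
                      for name, col in zip(header, columns)}

    def get_variable_list(self):
        """Populate the metadata: what can the user plot from this file?"""
        return sorted(self._vars)

    def get_var(self, name):
        """Return the data array for one requested variable."""
        return self._vars[name]

reader = PlainTextReaderPlugin(
    ["# exported from a spreadsheet", "pressure, temp", "1.0, 300", "2.0, 310"],
    skip=1)
```

Implementing just these two entry points is what "enough to get a simple reader working" means in practice; the generated boilerplate handles everything else.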
There are, you know, funding groups which are interested in our maintaining readers for specific things. We've got some where whoever's interested can maintain it. We have other ones that have actually languished, and we haven't poked at them for a while because people haven't been that interested. And then we have other ones where external groups maintain their own readers and may not even talk to us about it. Given that it's a plugin model, they can have their own plugins that aren't necessarily part of the source that we distribute with VisIt. So it's everything — all of the above. Excellent. It's the chaotic open source model. Exactly right. So we should mention also that these plugins — so far Jeremy's talked about the plugins for reading in data, but actually all of VisIt is a plugin model. Every plot that we have, every way that you can put anything on the screen, is a plugin as well. As are just about all of the data operations. So slicing and reflecting and projecting and all of those other things — those are plugins just as well. So it's very easy for people to add new capability. And those all use the same template generation code that Jeremy mentioned. So it's a fairly powerful model for extensibility. And, you know, it's kind of like the phrase about eating your own dog food. Right. If we force ourselves to do everything we do — in terms of plots and operators and file I/O — through plugins, then that's the best way to ensure that anybody else out there can do whatever they need, also through a plugin. Absolutely. Yeah. We do very much the same thing in the project that I work on as well. So, kind of along the same theme there — we talked about how you attract developers for all those very different reasons. Who uses VisIt?
And I know that's kind of a difficult question to answer, because it's open source, and so it's not like you can look at who's bought a license to say exactly who uses it. You can only track who downloads it. But are there public users? Or do you go to a presentation, look at somebody's slide, and say, oh, that plot was generated with VisIt? You know, how many people use this stuff? So it's really hard to count, as you mentioned. We don't have a license that we can count. We can take a look at downloads and look at what countries they come from. But we can also look at things like our open mailing lists that people ask questions on, and we can track where those different questions have come from. And it really comes from all over the place. We're getting questions from Japan and Brazil and Korea and Canada — it's all over the place. But generally, VisIt is used by people who look at data from computing of some type. Maybe it's small scale, maybe it's large scale. But yeah, there have been times where we look at a movie or we look at an image and go, hey, wow, look at that — that legend is our legend. That's neat. We've never even heard of these people before. We've had application areas we had never really thought of using it for — climate simulation, geographic information systems. It has gone a lot farther than we had ever really expected. But I will say our core set of users are people who are doing simulations. It certainly can be used for other things — it could be used for medical visualization, for example — but we didn't design it that way. It certainly could be, though. Very cool. So we have found in Open MPI that the largest country where all of our questions come from on the public mailing list is a country called Gmail. And so it's actually very difficult to know who these people really are. Yeah, that's very true. So at that point you count downloads, right? Yes. All right, let me ask on a different tack here.
How exactly did this project come about? You've talked about the various needs that it addresses and whatnot. But at some point — I think you said the project started around 2000 or so — there must have been a critical mass where somebody said, all right, let's do something about this and create some software to address this need. Can you give us a little of the backstory? Yeah, I can give a little bit of that history. So VisIt came out of Lawrence Livermore National Laboratory, and a large proportion of the developers are still there. It actually came out of the ASCI program — out of the effort to bring high-performance computing to the modeling of nuclear weapons designs. The ASCI program really was at the forefront of high-performance computing. And before a lot of other people needed it, they had a need for doing large-scale parallel data analysis and visualization. And there weren't a lot of tools out there. At Livermore, we had a tool called MeshTV that had evolved over the years to do a lot of things and had done some amount of parallel visualization. But it was clear at some point, around the end of the '90s, that MeshTV was being pushed in ways it really wasn't designed for. We needed a new architecture, a new system to allow us to deploy visualization and analysis algorithms that MeshTV simply couldn't handle. And so we had a new hire who came on, a gentleman by the name of Hank Childs, who started working on a prototype exploring the VTK data model — the Visualization Toolkit. He was looking at new models for parallel pipeline visualization deployment, looking at a new type of client-server model. And that work that he did has become VisIt. It supplanted MeshTV a long time ago and has been what we've built upon ever since. And obviously it's gone through evolution over the last, what, 10 years or so. But that's the history of it. So it really did come out of the weapons program.
But it has since escaped the nuclear weapons lab. As we've mentioned, the generality that we've built into the system seems to apply across all sorts of different application domains. So to some extent, I guess it's not surprising that it's now being used for a lot of unclassified, open science work that it was never really designed for from the get-go. But the generality certainly allows that. So that's sort of the history of where it came from. So you mentioned VTK in there. How much has VTK influenced the design of VisIt, and how much of the work is actually done in VTK? So VTK is the Visualization Toolkit that comes out of Kitware. It's sort of the uber library for designing visualization tools. It forms the underpinnings of VisIt. But at the time we designed VisIt, there were a lot of pieces that VTK didn't have — and a lot of them it still doesn't have. The biggest thing that it didn't have at the time was any type of parallelization model. Now, since then, VTK has had MPI-based parallelism added to it. But at the time we designed VisIt, there was nothing there. So we had to build all of that on top of it. So there's an entire infrastructure for doing data decomposition and communication inside VisIt that's entirely separate from VTK. The other thing we've done is extend VTK's data model. VTK is very powerful, but the data that you can represent in VTK is somewhat limited. For example, it's not very good for representing molecular datasets and understanding data on the atoms versus the bonds. It's not very good for doing mixed-material models — if you had, for example, an arbitrary Lagrangian-Eulerian mesh where you've got material advecting through the mesh, it doesn't really describe that very well. There are other parts of the data model where we had to build something above VTK. We use VTK as an underlying storage layer and mesh representation layer.
But we build all sorts of other data model on top of VTK so that we can have a very rich data model to describe new types of datasets. Geographic information systems is another one that VTK really isn't designed for. So it'd be difficult to guess how much of VisIt is VTK and how much is not, but there's a lot of it which is not — which goes well beyond what VTK has. So with VTK and a lot of other libraries, you're actually leveraging a lot of work from others, taking advantage of what they've already done well? Absolutely. VisIt actually depends upon a lot of other libraries. Given the open source model of collaboration, we like building upon the good things other people have done. VTK is just a perfect example of that. Mesa is another one. HDF5 is a wonderful I/O library that we're able to get a lot of acceleration through. I'll give you another example. Over the last several years, Sandia National Laboratories developed a rendering and compositing library called IceT that was used for very, very fast compositing work. They incorporated it into VTK and some of their own tools, and recognizing the power of that, in the last year or so we have rolled it into VisIt. So we're now using the compositing library that Sandia contributed to the open source community. So there are actually a lot of external dependencies for VisIt, so that we can build upon the strengths of others. So then have you done anything to make building VisIt for your cluster specifically — to match your MPI library, to match your endianness and all that stuff — have you made that easier? Because with all these dependencies, do I have to go and build all these libraries myself? So we have something called — well, let me back up a little. There are definitely things that aren't too bad. For instance, you can build VisIt using whatever version of VTK, whatever version of Qt you might have installed on your system.
For other things where we've made patches or are dependent on a specific version, we have a script now called build_visit which knows enough, on a whole bunch of systems, to go download these libraries and build and package up VisIt around them. And for the most part, you run build_visit, check a few boxes, and it's able to go download these things and build them. Previously, we had this build-notes text file, which is pages long, which talks about what you need to watch out for on an AIX system when building this particular library, and where you'd have to patch things manually if you went and downloaded them yourself. But thanks to build_visit, that's all now rolled into one script that should make things fairly trivial for you on most platforms. Now, you specifically mentioned MPI. VisIt isn't using anything special with MPI. We can run on any MPI platform that's available. So we're not using anything specific to, let's say, Open MPI versus some other MPI. It's pretty vanilla. So there's not a lot of extra detection that we have to do there. So let me ask, I have to ask now, since you hit my other obvious question about MPI: what do you use in MPI? Just the point-to-point kind of stuff and collectives, or what do you use? Well, I'd have to take a look at that. We certainly use barriers. We've got a number of collectives. We've got a good number of point-to-point calls. We actually did a survey. There was a paper that we produced at Oak Ridge National Lab looking at the complexity of scientific simulation codes from an MPI-call point of view. So we went and looked at about a dozen different simulation codes and looked at what MPI calls are happening. And VisIt actually used about an order of magnitude more than a lot of the other simulation codes. Now, that may be because it's not a simulation code. It's a data analysis code, which has a lot of diverse things that it's doing. But there's a lot of extra work that's going on.
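One concrete example of that extra MPI machinery is VisIt's use of separate communicators for the different pipeline roles (I/O, data processing, rendering). Here is a minimal, MPI-free sketch of what a rank-to-role coloring could look like. The specific policy below (first ranks do I/O, last rank renders) and the function name are purely illustrative assumptions, not VisIt's actual scheme; in a real MPI program the returned color would be handed to `MPI_Comm_split` to build one sub-communicator per role.

```cpp
#include <cassert>

// Hypothetical pipeline roles, one sub-communicator each.
enum Role { ROLE_IO = 0, ROLE_PROCESS = 1, ROLE_RENDER = 2 };

// Illustrative assumption: the first `io_ranks` ranks do I/O, the last
// rank composites/renders, and everyone else processes data. The result
// would be used as the `color` argument to MPI_Comm_split.
int color_for_rank(int rank, int size, int io_ranks) {
    if (rank < io_ranks) return ROLE_IO;
    if (rank == size - 1) return ROLE_RENDER;
    return ROLE_PROCESS;
}
```

With 8 ranks and 2 I/O ranks, ranks 0 and 1 would share the I/O communicator, rank 7 the rendering communicator, and ranks 2 through 6 the processing communicator.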
We've got our own separate communicators depending on what processors are doing which pieces of the pipeline, whether they're doing I/O or processing or rendering. We don't have any MPI I/O in there. So it's all just network-based communication. But that's where we are. Okay, very cool. So jumping back a little bit, I heard somebody say Qt earlier. Does that mean you support the big three platforms for rendering, so Windows and Mac and Linux? Yeah, absolutely. So that was one of the reasons for picking the libraries that we did. MeshTV, as Sean pointed out, was sort of a predecessor to VisIt. It was very limited, and it depended on Motif and things which certainly didn't work well cross-platform. So the things that we chose definitely all work on Windows, OS X, Linux, AIX, any Unix out there. So yes, absolutely. We have binaries on our webpage for all of those. Now, quick question. While looking through the material on getting data into VisIt, I noticed something called LibSim. What exactly is LibSim? Yeah, so there's a couple different ways to think about this. Let me see if I can describe this succinctly. So the typical way that we think of people doing their visualization is by running a simulation, writing out terabytes of data to disk, quitting the simulation, and then going and launching a visualization tool. And that's certainly one of the incentives for wanting to run VisIt's parallel compute engine where your data lives: we're using the same disks that you used to write your simulation out. But there's a couple downsides to that, not the least of which is waiting for all the I/O. I mean, there are certainly simulation codes where you have to choose how many time steps you want to run before you write out your data. You might miss something interesting. It takes a lot of time to do that as well. So that's one motivation. Another one... well, let me talk about what it is first.
So VisIt connects to a back-end server, this parallel compute engine. The parallel compute engine is the thing that loads the data off the disk, operates on it, and so forth. So what the simulation library does is, by adding a couple files to your simulation, it lets you load at runtime the pieces of VisIt which will turn your simulation code into this big parallel server. So what happens... maybe I can walk you through how you'd actually use this. You launch your simulation code, and if you've instrumented it, what you can do is launch VisIt and go connect to your simulation, which, every time step it goes through, listens to see if anybody's trying to connect to it. So you go and connect to your simulation, and this might pause your simulation. And then what you can do inside VisIt's GUI is go make a plot. So your simulation might be paused, and then what it will do is go through VisIt's section of code. It's loaded VisIt's parallel engine at runtime to do the data processing. So if you ask VisIt to make a plot for you, it's now asking the VisIt piece inside your simulation code. And it knows enough about how to read the current data of your simulation out of memory. So basically you avoid writing all the stuff out to disk; you can just leave it all in memory, and you're reading from the data structures of your running application. You don't have to worry about getting everything down to disk and spinning it back up from disk. Right, yeah. And so the way I couched this originally was as a performance argument. But another one is that potentially you could then have it creating a plot at every single time step as it goes.
So you could actually create... you know, if you ran for 10,000 time steps but only wrote every 10th one to disk, you could make a movie of every time step without having to do a whole lot of I/O. So it's also a way to find things that you might be missing. And there are some other capabilities. So one of the capabilities built into this whole simulation interface in VisIt is the ability for the simulation to advertise capabilities to VisIt, which will pop up in this window. So for instance, you might have some computational-steering things. You might have buttons that appear in VisIt that say run one cycle and update my plot, or run 10 cycles and then update my plot. You might do things like relax some mesh. You could actually expose a lot of the things you might want to do while your simulation is running as features through VisIt, and you can control them while you're watching it. So you could almost turn it into a very simple and lightweight GUI front end for your simulation, with plotting and analysis capabilities as you go. Okay, so that's really neat. So you can basically watch your stuff as it's going and direct the simulation a little bit. I've seen some people do work where they've done different types of interactive sessions, but not using VisIt, so it'd be neat if they could tie all this together. And it simplifies things quite a bit, because all the plotting and all the operators and all that's already there. Right, yeah, exactly. And so one of the design goals of the way it works right now is that you don't want to pay the cost. This is big, right? You don't want to pay the cost in terms of memory usage and so forth if you're not going to wind up using VisIt, especially since on some machines we are very limited in the amount of memory. So the way it's currently set up, on demand at runtime it will load the rest of VisIt's libraries in.
Obviously it requires some operating-system support, so that couldn't always work that way on every machine out there, but certainly on a lot of the ones that we've been using, it can load VisIt on demand, and you only pay the memory and performance cost if you wind up using it. Okay, so on a different topic: what's the most unusual use of VisIt you've ever seen? Something you did not expect when you originally set out on this project? Yeah, so Sean's mentioned a couple of these already, and I don't know that they count as unusual, but VisIt's history isn't necessarily where it is today. For instance, I spent a good portion of my time working on molecular types of visualization, and this is a problem domain that was not even conceived of as a use for VisIt when we wrote it. Certainly we have a general architecture, and we expected people would be able to write new plugins and new format readers and so forth, so it didn't present much in the way of a real technological challenge, but even just getting into that game at all is something I think is rather unusual. I think Sean might have other examples as well. Yeah, so geographic information systems are one that we never even considered. I mean, that's outside even the area of scientific computing, and so we've had people write plugin readers for GIS formats, and now you can plot, let's say, the buildings of a city at the same time as you look at the results of a simulation of, let's say, some flow through the buildings of that city. Or medical visualization: I had to have a CT scan done, and I asked the people running the CT machine if I could get that data. They handed me a CD, and in 30 minutes' time I was able to pull it into VisIt, take a look at my bones, take a look at the inside of me, and segment out different things. VisIt was never intended to do medical visualization, but it can certainly do that.
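Circling back to the instrumented-simulation workflow described a few exchanges ago: the stubbed, library-free C++ sketch below illustrates the general shape of a steered main loop. Every name in it (Viewer, SimState, process_viewer_command, the command strings) is hypothetical; the real LibSim API differs, and the real library is loaded into the simulation at runtime rather than stubbed like this.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for an attached viewer; the real LibSim polls a
// socket each time step and loads VisIt's engine on first connect.
struct Viewer {
    std::vector<std::string> pending;  // steering commands from the GUI
};

struct SimState {
    int cycle = 0;
    bool paused = false;
};

// Commands the simulation "advertises"; in VisIt these would show up as
// buttons (run one cycle, run N cycles, ...) in the simulation window.
void process_viewer_command(const std::string& cmd, SimState& s) {
    if (cmd == "pause")       s.paused = true;
    else if (cmd == "step") { s.cycle += 1; s.paused = true; }
    else if (cmd == "run")    s.paused = false;
}

// Each time step: drain any viewer commands, then advance unless the
// viewer has paused us (a real loop would block and keep listening).
void main_loop(SimState& s, Viewer& v, int max_cycles) {
    while (s.cycle < max_cycles) {
        for (const std::string& cmd : v.pending)
            process_viewer_command(cmd, s);
        v.pending.clear();
        if (s.paused) break;
        s.cycle += 1;
    }
}
```

The key design point the developers describe is visible even in this toy: the visualization side injects commands between time steps, so the simulation stays in control of its own loop and pays nothing when no viewer is attached.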
So you guys have mentioned a couple times that VisIt's a pretty large code base. What's the predominant language that you use for programming? You mentioned language bindings in a couple of different languages, Java, Python and so on, but what's the core of VisIt written in? So the core of VisIt is C++. Back in 1999, 2000, we really had to explore to see whether we could do the things we wanted to do in a true cross-platform manner. Can we depend upon the Standard Template Library? Can we do exceptions in the appropriate way? And eventually we decided yes, the C++ compiler technology had matured enough at that point that we could depend upon it. And that's where we are: 90% of the code base is C++. We certainly have bindings in a number of different areas, but C++ is where we've taken almost everything. How big is the code base? If you had to give a rough swag, how many lines of code are we talking? We're getting close to 2 million lines of code. That's a little bit misleading, because not all of it is human-generated. I recently went into the code and scrubbed out all of the code that we have scripts to generate from templates, and I think the actual hand-coded, human-generated code is a little bit north of a million lines. Does that include comments, Sean? No, it does not include comments. I was able to find a tool that scrubbed out blank lines and comments, and so it's actually a little bit north of a million real lines of code. Cool, that's pretty large. Yeah, it is a pretty big code base. What kind of development environment do you guys use? Well, I think we're not tied to any integrated development environment at all. I use XEmacs for text editing, and make, and... We use Subversion.
Subversion holds all of our code repository, but when it was at Livermore we were using commercial products like ClearCase and ClearQuest. Since it has escaped the weapons lab, we're now using open source technologies like Subversion, and we're using Mailman mailing lists, but it's nothing much more special than that. Good, good to hear. What license is VisIt under? We use the BSD license. Any particular reason for that? I think it provides us a lot of... It allows us to go open source, but many times if you use the GPL, that can be somewhat scary to commercial entities, and there's no reason that we should have to scare them off. I think the BSD license gives us the flexibility that we're looking for, both on the open source end and also on the collaboration end, so if needed we could very easily put together closed-source plug-ins, should we need to do that. I actually can't think of any right now, but we wanted to be able to provide the ability for people to do that. We should mention, though, that while VisIt doesn't have any requirements on the IDE that you use, it is very much an Autotools-based system. We use Autoconf to do all of the discovery of the platform you're compiling on: figuring out where libraries are, how we should be linking, how shared libraries are set up, and so forth. That's actually been very powerful for us, but we're about to move away from it, because there is a problem with going towards Windows: it's been a manual process for us to create Windows project files. Basically all of that work falls on one developer who, every time we put out a new release, has to make sure all the Windows stuff works correctly. We're in the process of moving towards a CMake-based system so that we can have better and easier cross-platform development for all of VisIt. Cool. Yeah, that's a popular argument.
We have very similar arguments in Open MPI, because we're Autotools-based too, with Automake, Autoconf, Libtool and so on, and the Windows support is always problematic. Hey, cool. Thanks a lot for your time, guys. I'm going to go ahead and wrap it up here. I'm personally going to promote VisIt to a couple of people I know who I think could really take advantage of it. Jeff, is there anything you had left? Yeah, I just want to say thanks, guys. Appreciate your time today. This was fascinating stuff. This is not really stuff that I delve into much at all, so it's really very interesting to hear about the output of big parallel runs. So this is great stuff. Thanks. Should we put a couple web links out there in case people want to find more information? They could go to visit.llnl.gov or visitusers.org. Those will both take you to various other places that give you information. Yeah, let's just slow that one down a little bit there for those who are not familiar. So it was visit... Let me try that again. Sorry, yes. So you can go to visit.llnl.gov, and that's sort of the homepage, where you can find the binaries and source code. There's also a wonderful gallery if you want to see some examples of images that people have made in various domains. The other one is visitusers.org, and that is where we have a wiki; it also contains pointers to the mailing lists and various documentation and forums and all sorts of other wonderful information. I should mention also that at those sites, not only can you download the source and find the mailing lists, but we also have tons of documentation. There's probably over a thousand pages of documentation for VisIt, depending on what you want to do with it, whether you want to develop or just make some visualizations or so forth. So download, start reading, and jump on the list. Have fun. Okay, cool. I'll put the links in the notes for this podcast when it goes out, so everyone will be able to see them there.
Okay, so thanks a lot for your time, guys, and we will be in touch. Well, thank you very much for being on. No problem. Thanks a lot. Thanks.