I'm totally honored to be here. It's the first time, I think, that I've been correctly invited to an imaging conference, because there's someone who shares my name at Santa Barbara who invented Eigenfaces, and I'm not him. I'm also not the singer-songwriter; you might find him at turctunes.com. Anyway, I'd like to share with you a little bit about imaging from the perspective that I come at it from.

The prompt for this (is this a little hot? The mic, I can't quite tell. Okay.) was to go over who "I" am, which is why I use the scare quotes, common practices in the domain that I come from, what we do well, and what we're struggling with. So I've tried to follow that outline. I'm going to tell you a little bit about some of the work that I do. I'm coming from the National Center for Supercomputing Applications, where I run a group called the Data Exploration Lab, and we've been supported in one way or another by NCSA, the Gordon and Betty Moore Foundation, the NSF, and NumFOCUS, so I want to say thanks to all of those organizations and people before I get going.

I come from a background in astronomy, and so this is a picture of the universe. At the top is the beginning of the universe, and down here where I'm standing is the present day. Up at the top there's what looks like this: the recombination era, and then the Dark Ages, where there was no visible light as a result of neutral hydrogen and no stars. Then the first stars started forming in the first galaxies, and then we get into the present day, when things are optically thin. My background is all above that dividing line: I did my thesis work on the formation of the very first stars in the universe, and I've worked since then on things like the formation of the first galaxies, galaxy formation, and so on.

When people think about astronomy... actually, here, I'm going to show a video. This is all from my thesis work, and it's the kind of thing people think about a lot when they think about imaging in terms of simulations; all the work I've done has been through simulations, and again, I work on the first stars. This is a visualization I made a couple of years ago. I show it all the time, so I'm sure some of you are bored; please feel free to go ahead and check Twitter and/or Snapchat, I guess. (Oh no, please don't play.) We're now zooming out from one of the star-forming regions in the early universe in a simulation, going over about 15 orders of magnitude in length scale, all the way out to a region about the size of our galaxy. Then we switch to another point of view, where we actually see that star turn on, illuminate its surroundings, and ultimately go supernova, injecting heavier elements into its environment. Those heavier elements then change the way stars form in subsequent generations.

Now, all of these images, this entire movie, were generated with Python code using a software volume renderer, which we've since ported to the GPU: a renderer that takes multi-dimensional data at varying resolutions and generates image planes from it. I'm going to talk a little bit about the different needs we have for imaging, and about what I thought about when Ariel invited me to speak and said "imaging," because to me images are often three-dimensional.
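Just to make that concrete, here's a minimal sketch of what driving that kind of scripted, software volume rendering looks like from Python, using yt (which I'll introduce properly later); the dataset path and the zoom factor are made up for illustration:

```python
import yt

# A minimal sketch of scripted, software volume rendering with yt;
# the dataset path and zoom factor here are hypothetical.
ds = yt.load("simulation_output_0042")
sc = yt.create_scene(ds, field=("gas", "density"))
sc.camera.zoom(3.0)           # push in toward the star-forming region
sc.save("volume_render.png")  # rasterize the image plane to a file
```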
We think about images in terms of some sort of organization, but they're not necessarily always going to be at the same resolution, and they're certainly not necessarily going to be regularly gridded.

This is a figure from Shannon's 1948 paper on the theory of information: you've got a source and a transmitter that turn a message into a signal, a source of noise, and then a receiver and a destination. When we think about astronomy, typically people focus very much on the process of light moving through the universe and then reaching a person. That's the typical way people think about survey telescopes, about observational data, and so on. But to be honest, I actually haven't used observational data in a really long time, and I probably should have mentioned that to the organizers before I gave my talk. In fact, if I were ever to visit the Hubble Space Telescope, I'd probably leave the flash on or something when taking my images. I think more about the way we generate data, or generate images, from simulations.

So, the process of generating images and making them really gorgeous. Now, I didn't realize that earlier in the day someone from Pixar was going to be totally schooling all of us on making images. Fortunately, in the domain of simulated observations from telescopes, we do sometimes focus on making things look pretty, but it's also an art form to make things look awful. Because we have a notion of ground truth when we conduct a simulation: we've got a dataset that has been simulated to some degree of accuracy, and we can state that in some formal representation. But when we observe that data, we are implicitly looking through a telescope, through something that has flaws, via a process we may not entirely understand. And so we need to do things like inject noise and spread the signal, and so on; I'll show a tiny sketch of what that looks like in a moment. Again, that's why imaging in astronomy is often focused on the receiver, whereas the work that I do is often focused on the transmitter part of the information-transmission process.

I'm going to briefly talk about how we think about all this, because it fits into the outline that was provided. We often break things down into three layers. First, the way the bits are represented on disk: this is things like file formats. Second, the way the data is represented: discretization methods, for instance, or the way we conduct our simulations. In some cases that's Lagrangian, where data points follow a particular region of interest; in other cases it's Eulerian, where data flows across fixed boundaries in a grid structure. And on top of that we have the notion of an implicit model: when we look at all these different pieces of data, we are in theory looking at a star, at a galaxy, at a something. That's really the point at which the human observer comes in. And we have all these barriers between these different layers.
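Here's that promised sketch of "making data look ugly": start from an idealized image, smear the signal the way a telescope's optics would, then inject noise. The Gaussian blob stands in for real simulation output, and the numbers are arbitrary; this just assumes NumPy and SciPy:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

# Hedged sketch of degrading a pristine simulated image before comparing
# it to observations. The blob is a stand-in for real simulation output.
y, x = np.mgrid[-64:64, -64:64]
truth = np.exp(-(x**2 + y**2) / (2.0 * 8.0**2))    # idealized "ground truth"

observed = gaussian_filter(truth, sigma=3.0)        # spread the signal (PSF-like blur)
rng = np.random.default_rng(0)
observed += rng.normal(0.0, 0.05, observed.shape)   # inject noise
```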
I'll give some examples of the file formats and data formats that we deal with. Oftentimes we deal with grid-based data, which is probably the thing most at home for people coming from imaging in other domains: you imagine a spatial region where, at every point in some regularly divided way, you've got a data value. This is kind of like an image buffer or a pixel buffer, except it can be in three, four, five dimensions, whatever. Often we're solving fluid flow across those boundaries, but in some cases we also deal with the fact that particular regions are fundamentally more interesting than others: perhaps in one region galaxies are collapsing and in another they're not; we have voids in the universe, for instance. This is something that image compression techniques, JPEG 2000 for instance, can deal with natively, but in our case what we typically do is insert higher-resolution elements. So we have irregular resolution elements that might be abutting, across which we need to solve for fluid flow, or which we might need to regularize into some common structure for visualization or analysis.

The alternate method (and I hope there aren't too many astronomy simulators in the room, because this is where people often find themselves at fisticuffs) is the Lagrangian approach, where you have a number of particles or sampling points that move with the flow. This is what you often see in galaxy collisions. Has anybody ever seen a visualization of a galaxy collision? You've got all the little particles, and the other little bits of particles, and they collide. That's often done through a Lagrangian technique, where the sample points move throughout the simulation. This can also take the form of things like an unstructured mesh, a Voronoi tessellation that moves, arbitrary Lagrangian-Eulerian techniques, and so on, but it still represents the same basic thing: there's a volume of data, and we have fields defined at every point, although they aren't sampled in a regular way. And what we want is to be able to sample these things uniformly.

This is not, in fact, an eldritch horror. It's a star-forming region in one of these simulations, but it's composed of a number of different levels of resolution, all abutting and sometimes overlapping, that we need to regularize in some way in order to properly analyze. These are images of all the different overlapping grid patches, where you might have finer-resolution data, that we combined to get this final image. In the next one, I've colored it by level of resolution. At the bottom is some coarse resolution, and typically it goes by factors of two in these simulations, though in unstructured regions it may not, and each level goes up from there. We need to be able to deal with differences of about two to the 30, two to the 35, something like that, in resolution between the coarsest and finest areas in these calculations. That means, for instance, that if we generate an adaptive image pyramid, as you might do with something like Leaflet for geospatial imagery, it's going to be extraordinarily sparse. This is kind of a fun thing: you might have something that in the very center goes up to extremely high resolution but in the outskirts is very, very coarse. That's something we have to deal with at the rasterization step of our imaging.
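As a hedged sketch of what "regularizing" looks like in practice with yt (the dataset path is hypothetical), you can resample a multi-resolution sub-volume onto a fixed-resolution buffer:

```python
import yt

# Resample multi-resolution AMR data onto a uniform buffer;
# the dataset path is hypothetical.
ds = yt.load("some_amr_dataset")
cg = ds.covering_grid(
    level=5,                        # sample everything at level-5 spacing...
    left_edge=ds.domain_left_edge,  # ...starting from the domain's corner
    dims=[256, 256, 256],
)
density = cg["gas", "density"]      # a uniform 256**3 array, regardless of
                                    # how the underlying patches abut or overlap
```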
I want to show just two fun things. One of the ways we're attempting to deal with this is multi-patch GPU volume rendering. This is a bit of a slow video, so I'm going to skip ahead, but inside this region are about 10 levels of refinement, and we're dumping all of these onto the GPU and doing adaptive level of detail to determine which get displayed when. I see that Stefan is in the audience, so I want to apologize in advance for the color map; that was not my choice. It should switch to viridis in a moment as part of the demo. There we go, right? That's viridis, right? Yeah, okay.

But we also need to deal with situations involving totally unstructured data, where we have neither a fixed, regularly spaced set of points nor an irregularly spaced set of points, but instead different cells: they might be hexahedral, they might be tetrahedral; you might have pyramids, wedges, and so on. Those cells may also represent different shape functions going on inside the different elements, which need to be sampled appropriately. This work was done by Andrew Myers, who is in the audience back there, and I just want to give a shout-out to Andrew.

So we're dealing with all these different things, and if we return to the notion of the bits, the data, and the model, what we really want to do is compare at the level of the model, and so we need to abstract out the bits and the data. Moving on to the next bullet point in the outline that was provided, I've divided up the file formats used in astronomy. There's FITS, which I think people are probably familiar with: the Flexible Image Transport System. It has evolved over many years into a relatively static data type that's commonly used for interchange, and FITS itself, to my understanding, is extremely robust, but the metadata affiliated with FITS is another source of contention and problems. Recently, people from the Space Telescope Science Institute have proposed an alternate data format called ASDF, which to my understanding is basically "not HDF5."

And then there's the simulation world. I'll talk at the end, when I get to the things we're not good at, about the reason for this, but the simulation world essentially has a plethora of different file formats. Each one of these is a different file format generated by simulations, and in fact they're not only different file formats, they're different discretization mechanisms. Some of them store their data as octrees, and mind you, some are depth-first traversals and some are breadth-first traversals, which a reader has to match exactly (there's a little sketch of the difference below). Some store themselves as patch-based data, where you have different patches corresponding to different areas and different levels of refinement, but they don't always overlap, sometimes they do, and sometimes you can have things that overlap multiply. Those are some of the other problems we run into.
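Here's that sketch: a toy illustration, not any particular simulation code's on-disk layout, of why traversal order matters when a tree of refinement nodes is flattened to disk:

```python
from collections import deque

# Toy illustration of depth-first vs. breadth-first flattening order.
class Node:
    def __init__(self, data, children=()):
        self.data, self.children = data, list(children)

def depth_first(node):
    yield node.data
    for child in node.children:
        yield from depth_first(child)

def breadth_first(root):
    queue = deque([root])
    while queue:
        node = queue.popleft()
        yield node.data
        queue.extend(node.children)

tree = Node(0, [Node(1, [Node(3), Node(4)]), Node(2)])
print(list(depth_first(tree)))    # [0, 1, 3, 4, 2]
print(list(breadth_first(tree)))  # [0, 1, 2, 3, 4]
```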
Then you've got your particle formats. Particle formats are pretty cool in the sense that, for the most part, if you know how to read them, you can just read them; they're all values at discrete points in space. But something that seems to plague particle formats is their lack of self-describing metadata, which is super fun. And then there's unstructured data: data that might take the form of cells of different shapes, for instance. I don't even have Voronoi tessellation cells up here, because the fun thing about Voronoi cells is that you don't actually store the mesh at all, you store the Voronoi point locations, which adds another interesting wrinkle. But these are all the different things we need to deal with if we want to be able to make cross-model comparisons in imaging.

There are a number of different software tools that have evolved along these lines. The primary one (and I saw Kyle here earlier) that people use for imaging now, or at least that people are starting to use, is Astropy, which is a pretty cool suite of tools that has drawn from a variety of different communities and is an enormous success in the astronomy community. But there are still pipelines that rely on lower-level tools: things like CFITSIO, Source Extractor, and IRAF, which is an impressive piece of work although it is somewhat older in its heritage than the others. And for simulation datasets there are things like yt, which I'll talk about briefly, pynbody, pymses, and some others that I think come locally from Berkeley. So we have all these different software tools, and for the most part interoperability can actually happen, which is nice.

I'm going to tell you a little bit about the yt project, which I work on. It's Python-based, with some C, although we've now replaced almost all of the hand-coded C functions (uncharitably referred to as "write-only code" by one of our contributors recently) with Cython. It's community-developed, with a code of conduct, a governance structure, and 100-plus contributors, for volumetric and non-spatial data. It's been used in a bunch of papers, et cetera, et cetera. But the point behind the yt project is to be able to take some simulation, some dataset on disk, select out the regions you're interested in, abstract away things like the big data that might live behind them, chunk them in a parallel and efficient way so you can analyze them as you go, memory-efficiently and hopefully IO-efficiently, and also represent the state in the simulation.

I'm going to talk about fields, because this is something astronomy could learn quite a lot from GDAL about. GDAL has this cool feature where you can embed workflows that get executed on the fly, or at least that was my understanding; regardless, I'll go ahead and compliment GDAL for being cool. Anyway. We have a notion of representation of state: you might imagine a field having something like a name, units, context, and then a prescription for that field. Some prescriptions say "this field was stored on disk": a primitive that gets written out during the course of your simulation, such as density, momentum, or energy. And then there's some mechanism by which you derive something from those. The canonical example I like to refer to here is a very simple arithmetic operation: you can compute the kinetic energy in a region by taking the mass in that region, multiplying it by the velocity squared, and dividing by two.
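In yt that looks roughly like this; a minimal sketch, assuming the public add_field API, with the field name and units chosen for illustration:

```python
import yt

# A minimal sketch of a derived field; the name "kinetic_energy_density"
# and the units are illustrative choices, not anything canonical.
def _kinetic_energy_density(field, data):
    # e = rho * v**2 / 2, computed on demand from stored primitives
    return 0.5 * data["gas", "density"] * data["gas", "velocity_magnitude"] ** 2

yt.add_field(
    ("gas", "kinetic_energy_density"),
    function=_kinetic_energy_density,
    units="erg/cm**3",
    sampling_type="cell",
)
```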
So in principle, this is something that can be computed on demand and isn't necessarily stored, whereas the primitive variables are stored. We write out different mechanisms for representing those in Python, and they get computed on the fly, next to the data, so as to avoid pre-generating them in memory. And then things like spatial operations, where you might be doing a gradient or a divergence, we try to make easy, although, as you can see, the boilerplate code for that is a little bit nasty. But the point of this is that we, again, want to present a mechanism for analyzing at the model level rather than at the data or bit level.

This is one of the simplest datasets in our collection of sample data; yt is open source, and we have a number of different datasets online at our website. This one is called IsolatedGalaxy (the galaxy0030 output). It's really small: a model of a galaxy. You can't quite see it, but out here are the outskirts of the galaxy, and here's the nucleus. It has a number of different fields baked into it, and what we attempt to do on top of that is define a catalog of fields that can be derived from it. We infer which fields can be derived based on the semantic information we encode, both in yt and in the dataset, in order to figure out what we can generate. As an example, there are 43 primitive fields in this file. From those, we can generate 363 derived fields. Among them there are 35 distinct SymPy symbols that correspond to independent units, and it averages out to about 2.5 primitive fields per derived field. This is a histogram of the number of dependencies per field: most of them operate on one, two, three, or four fields, I think I counted that right, but we can also find combinations of up to eight or nine fields that get combined to generate a single derived field.

From the perspective of imaging, I wanted to talk about a couple of the algorithms we've implemented. One is volumetric segmentation, specifically topologically connected sets: generating multi-resolution, topologically connected sets across different resolution boundaries, so that you can identify, for instance, clumps in a star-forming region and then evaluate them for things like boundedness, peculiar velocity, and so on. These might spread themselves across multiple different resolution regions. In essence, this is really just a parallel union-find system (I'll show the serial core of it in a second), and it deals with irregular resolution as well as fixed sampling points, so you can do contours on particles too. We've also implemented marching cubes, and ray tracing, including radiative transfer, so properly doing the emission and absorption across multiple fields, with volume rendering on top of that.

And finally, the part that's perhaps most interesting for imaging: the rasterization and pixelization level. We have all these different types of data coming in, irregular, totally irregular, or multi-resolution, possibly overlapping, and we have mechanisms for pixelizing and rasterizing sub-selections of that data in various coordinate systems, including spherical, Cartesian, and so on, and then dumping that over to matplotlib.
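Here's that serial core, just to make "union-find" concrete; this is a toy version, not the parallel, resolution-aware machinery itself:

```python
# Toy union-find: the serial core of the connected-set (clump) idea.
class UnionFind:
    def __init__(self, n):
        self.parent = list(range(n))

    def find(self, i):
        while self.parent[i] != i:
            self.parent[i] = self.parent[self.parent[i]]  # path halving
            i = self.parent[i]
        return i

    def union(self, i, j):
        self.parent[self.find(i)] = self.find(j)

# Merge neighboring cells that exceed a density threshold; afterward,
# every cell sharing a root belongs to the same connected structure.
uf = UnionFind(5)
uf.union(0, 1)
uf.union(1, 2)
assert uf.find(0) == uf.find(2)   # cells 0, 1, 2 form one "clump"
```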
But moving forward, we're also dealing with gravitationally warped and distorted regions. You might have a coordinate system that's defined in terms of the Einstein radius of a black hole, but inside your dataset it's been mapped to a regular Cartesian dataset, and we want to make visualizations that accurately represent the data. So: things like volume rendering across irregular and multi-resolution grids; Lagrangian coherent structures, taking the volumetric segmentation and connecting it across time steps; and then, and this is really what I would consider the holy grail of all of this, taking datasets generated with fundamentally different representations of the data and being able to visualize them uniformly. You wouldn't know it by looking at this, I hope, but many of these use octree data structures or particle data structures or patch-based data structures, and using the same script we're able to conduct visualization of all those different things simultaneously (I'll show a sketch of what that looks like before I wrap up).

Here's a pretty picture. I've been showing this for years, so I'm sorry if any of you have seen it, of basing our visualizations on scientifically motivated data: looking at the distribution of density and temperature, and then basing the volume renderings on that. I really like this next picture, and I probably won't ever take it out of my talk, because it was generated by a 10th grader working with Stella Offner at UMass Amherst. Different coordinate systems; I'm going to skip through this because I recognize it's now 3:27. And then layering models on top of visualizations, such as gravitationally bound regions like this.

Simulated observations are another area we've been working on: taking, for instance, gravitationally bound clumps inside a star-forming region, ray tracing outward, and computing the rotation measure from the magnetic fields in them. Essentially taking our simulations and saying, this is what it would look like through a telescope, and then applying observability limits to it. This is an image where you'll see we've taken a simulation, generated an absolutely gorgeous spectral line, and then made it a little bit messier; again, the art of making data look ugly. We also generate things like simulated X-ray observations. This is a set of images generated from one single simulation and then convolved with different X-ray telescopes, including the now-late Astro-H.

And we're dealing with some other types of data too, including seismology, so guiding seismological visualizations based on these things, and also non-Cartesian ray tracing: if you have a spherical domain and you want to cast Cartesian rays through it, without aberration and without resampling onto a Cartesian grid, you should be able to do that, and then be able to guide those visualizations. I was really proud to see NeuroDome, which used yt as its visualization system, in a theater, and I actually got to see it with the person whose skull is in it. So that was cool and a little weird. And then dealing with things like nuclear reactors. I think this is a nuclear reactor. Sorry? It's a fusion toroid. Is that not a nuclear reactor? A fusion reactor, okay, I see; I don't work at a nuclear plant.
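And here's that promised sketch of the "same script, different representations" point; the filenames are hypothetical, but the shape of the script is the idea: load, then plot, regardless of the underlying discretization:

```python
import yt

# Hedged sketch: one script, three fundamentally different discretizations.
# yt.load infers the frontend from the file, so nothing downstream changes.
for fn in ["enzo_amr/DD0042/DD0042",               # patch-based AMR
           "ramses_octree/info_00042.txt",         # octree
           "gadget_particles/snapshot_042.hdf5"]:  # particles
    ds = yt.load(fn)
    p = yt.ProjectionPlot(ds, "z", ("gas", "density"))
    p.save()
```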
So anyway, to end the talk, I want to mention: what are we good at? I would say that in the field of astrophysical simulations we are awesome at reinventing things. This is one of our core competencies, and even more core to that is thinking that we ought to. So that's great. On the simulation side, I would say we're pretty good at moderate parallelism: dozens to thousands, maybe tens of thousands of cores. Applying model-based information to datasets, being able to apply semantically rich information to the data, I think is one of our core competencies. Multi-resolution, regularly spaced data I think we're all right at. And in-memory data models: things like derived fields that get applied to simulations and other fields.

Things we're not good at: not reinventing things. This is something I'd be happy to go on about at length in private. Extreme parallelism: at least in the work that I do, we're not at exascale. There are outliers that are at exascale, including a BIDS fellow I haven't seen yet today, but I have some conflicting feelings about whether or not that should be a core focus for us. Interactivity, as opposed to scripts and workflows: interactivity is something we're working on, at least in the project I'm involved in, although we did recently load galaxies into the HTC Vive, which was pretty disorienting and cool. And provenance, workflows, and using knowledge from image processing: that, I think, is one of the things we aren't good at, in simulations at least. Observers are probably much better, and in fact, any time I've said something unflattering, just assume it doesn't apply to observers, because they know where I live.

So I guess that's how I wanted to end my talk. Thank you very much for giving me this opportunity to talk to you. This has been an awesome conference so far. And thanks.