I'm Mathew Cherukara; I lead the Computational X-ray Science group at the APS, and today I'm going to talk about some of the ways we're using high-performance computing and AI to enable essentially new kinds of techniques and new kinds of science at the APS. Before I start, I want to emphasize that this is the work of a great many people; I just happen to be the person presenting today from the Computational X-ray Science group. We're a diverse group of about 15 PhDs, with expertise ranging from applied math and signal processing to high-performance computing and workflows, to world-renowned crystallographers like Brian Toby and Bob Von Dreele. More recently, we've been hiring a lot of early-career staff and postdocs who come to us with expertise in deep learning and automatic differentiation. The work I'm going to talk about today particularly highlights the contributions of Tekin Bicer, Saugat Kandel, Yudong Yao, and Anakha Babu, former members of the group who have now moved on to other opportunities. Outside of the CXS group, I also want to acknowledge a lot of people at the APS, the Mathematics and Computer Science Division, the Data Science and Learning Division, and the Center for Nanoscale Materials, as well as folks at Lawrence Berkeley and at NVIDIA as we start pushing more toward the hardware side of computing. In particular, I want to highlight Tao Zhou, a staff scientist at the CNM, and Henry Chan, also a staff scientist at the Center for Nanoscale Materials. Funding for this came from Laboratory Directed Research and Development at Argonne and from a Department of Energy project on AI at scientific facilities, from data to inference at the edge of experiments. I'm going to start by talking about numbers, essentially laying out a business case for AI, and describe some of the data and compute needs we have today and what we expect in the near future; I suspect a lot of this will be familiar to folks at synchrotron and X-ray facilities around the world. Then I'll talk about a couple of examples of how we're using AI to address this. The first is AI for analysis: I'll show how we can make our analysis workflows hundreds of times faster, and sometimes even more accurate, and how this enables, for the first time, real-time analysis on really high-rate data streams, like gigabit-per-second data streams. Then I'll talk a little about how this in turn enables intelligent steering of experiments; essentially, how do you maximize information gain from an experiment in a minimal amount of time? And longer term, we're interested in extracting knowledge autonomously from our scattering data: how do you learn the physics directly, without going through a multi-stage process of data inversion and interpretation, and how does that relate to the physical questions we're trying to answer? Just as a quick overview, in case you're not intimately familiar with the APS: pre-pandemic, we welcomed users from around the world, from academia, industry, and government, about 6,000 users every year. We operate nearly 70 beamlines, all capable of independent operation, each providing a unique take on materials characterization, and many of them multimodal in nature.
That in turn implies that we need unique data workflows and compute solutions for each of these beamlines, because each of them is unique and bespoke, and they're often more complex instruments than just one particular technique. The primary motivation, for AI and for how we're doing computing at the APS, is data volumes. This is a plot of the data produced by the APS over the last few years, now reaching about 10 petabytes or so, and what the data will look like after we go down for a big upgrade and come back up: we're going to see a massive increase in data. And if you look across facilities, at LCLS, or across the Department of Energy facilities in the US in total, we're going to see a 10 to 100x increase in data rates. Another way of looking at this is the historical capability of instruments like these. The curve in red shows the photon flux over the last 60 years or so, and you can consider photon flux a proxy for data rates. The curve in green is Moore's law, which has been slowing down and flatlining. What you can see is that, consistently over the last 60 years, the rate at which we generate data at synchrotron facilities has been outpacing Moore's law, and this gap is going to continue to increase. So we need essentially a new way of thinking about how we analyze data at our facilities: just plugging in a bigger and bigger computer is not sustainable, because of this increasing gap. We do that too; we now have a supercomputer plugged into some of our beamlines, and I'll talk briefly about it, but that is still not going to be sufficient for our needs. So AI will become a necessity to fully exploit the instrumentation and the facility itself. And as you're all very well aware, materials characterization very often involves the solution of some form of inverse problem. Whether you're doing imaging, spectroscopy, diffraction, or some combination of them, you typically have to solve an inverse problem, and inverse problems are computationally expensive to solve. The third motivation is how we want to do science going forward: we want to solve problems and study science that we can't today. For example, we're interested in why materials fail catastrophically. Studying this means imaging a material in action at multiple length scales, from the nanoscale to the macroscale, to see the origin of failure in the material and how it evolves, grows, and leads to a massive failure at the component scale. If you want to do this, you need some form of automated steering of your experiment, and you essentially need an agent capable of making decisions about where to acquire data, what modality to acquire, et cetera. And if you want that, of course, you need real-time data inversion: even if the data takes 15 minutes or half an hour to be analyzed, that's too slow; you essentially need to be able to do data inversion on the order of seconds or less. So because of all these challenges, over the last couple of years we have been integrating high-performance computing resources at our leadership computing facility directly with beamlines and instruments at the APS.
Polaris is a top-20 supercomputer, and a portion of that machine, about four petaflops of its processing power, is now made available directly for use at the APS. We're building up software solutions that tackle a lot of inverse problems; things like ptychography, XPCS, high-energy diffraction microscopy, and many others now directly utilize these machines for analysis. Data is moved directly from the beamline to the machine, processed there, and moved back for visualization and analysis at the beamline. This also includes deep learning training at scale and deployment at the edge, which I'm going to talk about specifically in the next few slides. The examples I'll discuss in more detail are in coherent imaging, because that's my background and I'm more personally involved in some of these projects. Coherent imaging is going to be a big part of the APS-U. There will be a large number of ptychography beamlines, and in addition you're going to have beamlines doing things like coherent surface scattering; ATOMIC, which will do 3D Bragg CDI; POLAR, doing polarization-modulated spectroscopy; various CHEX beamlines doing combinations of Bragg CDI, XPCS, et cetera; high-energy 3D Bragg CDI; and the in situ nanoprobe, which will do 3D Bragg ptychography. So there will be a great number of coherent imaging beamlines at the APS-U, and all of them will rely on us being able to do the data analysis, which in the case of coherent imaging is phase retrieval, very quickly and accurately. Just to give you an idea of how much of a jump in capability this is (I think most of the community here is familiar with ptychography, so I'm not going to describe it): today you can measure on the order of a square millimeter at 10-nanometer resolution in a few hours; after the upgrade, that's expected to be on the order of a square centimeter at approximately 10-nanometer resolution in a day or two. I like to make the analogy that this is the equivalent of mapping out the entire land area of the United States at few-meter resolution, except that instead of taking a simple image at every spot, you have to solve a complex inverse problem, which is phase retrieval. And phase retrieval, as many of you know, is an iterative process: you apply certain constraints in real space, you apply some constraints in reciprocal space, and you do a lot of 2D or 3D Fourier transforms to recover the real-space image. The challenge is that you have to do a lot of complex-to-complex FFTs, you need hundreds to thousands of iterations, and you typically also need many different starts to get high-fidelity results, because you can get trapped in local minima; it's a non-smooth landscape you're trying to converge on. It's also often very sensitive to the choice of hyperparameters that tune your phase retrieval: which algorithms, how many iterations of each, how to apply the constraints, and so on. That is expert knowledge; it takes a lot of time to build up the expertise required to do successful phase retrieval. So our approach has been to replace iterative phase retrieval with neural network methods that learn to solve this inverse problem, from diffraction data to sample amplitude and phase, in a single shot.
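To make concrete what these networks replace, here is a minimal sketch of one classic alternating-projection scheme (error reduction), assuming a far-field geometry and a known support mask; real pipelines alternate algorithms like ER and HIO, refine the support with shrink-wrap, and run many random starts, and all the names in this sketch are mine, for illustration only.

```python
import numpy as np

def error_reduction(measured_magnitude, support, n_iters=200):
    """Minimal error-reduction phase retrieval (illustrative sketch only).

    measured_magnitude: square root of the measured diffraction intensities.
    support: boolean mask marking where the object is allowed to be nonzero.
    """
    rng = np.random.default_rng(0)
    obj = rng.random(measured_magnitude.shape)  # random starting guess
    for _ in range(n_iters):
        # Reciprocal-space constraint: keep the current phase estimate,
        # but impose the measured magnitudes.
        farfield = np.fft.fftn(obj)
        farfield = measured_magnitude * np.exp(1j * np.angle(farfield))
        # Real-space constraint: zero out everything outside the support.
        obj = np.where(support, np.fft.ifftn(farfield), 0)
    return obj
```

Each iteration is a pair of full FFTs, so hundreds to thousands of iterations, times many random starts, is exactly the cost that becomes untenable at the data rates I just described.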
In the case of ptychography, for example, you acquire a lot of overlapping data, which provides the oversampling required to recover the lost phase information; you run this through an iterative engine of some sort, and you get sample amplitude and phase when you're done. In our approach, we instead learn a mapping from a single diffraction pattern, measured at one scan location, to what the sample amplitude and phase look like at that location: a single-shot mapping from one to the other. We don't explicitly use the overlap information in any way, but it's used implicitly in the generation of the training data. And because this is no longer an iterative process, just a single shot, once you've trained the network you give it the diffraction data and you get out a prediction of the amplitude and phase. This approach is hundreds of times faster than traditional iterative methods, and you also require significantly less data. This way of addressing the problems of phase retrieval has been widely studied by a lot of different people, and here are just a few examples from the literature: one from electron microscopy, from Jianwei Miao and others; one from optical microscopy, I think; another from single-shot ptychography; and one from Esther Tsai and colleagues at Brookhaven doing this for X-rays. In all of these examples you see different approaches to replacing iterative phase retrieval with neural networks that learn to solve the inverse problem in a single shot. What I'm going to talk about in the next few slides is how to implement these sorts of approaches on the high-rate instruments we have at the APS, how we can get essentially real-time images from these instruments, and specifically how we need to develop instrument-specific or technique-specific solutions to some of these challenges. The workflow at the APS looks something like this, and this animation shows how it works; it is a bit complicated, so it takes some time to break it down. This portion of the animation represents the data acquisition at the instrument: you have a focused zone plate scanning a sample, you measure the coherent diffraction in the far field, and you acquire a sequence of these images as you scan through the sample. We move those images to a high-performance cluster, Polaris or anything else, and we then do phase retrieval; I'll come back to why we still do phase retrieval. When you're done with phase retrieval, you're left with training data that you can do supervised training on. What I mean is that you have the diffraction data, and you also have the images obtained by conventional phase retrieval, so you can do distributed training on a high-performance computing resource: you take the diffraction data and the phase-retrieved images, give them to a neural network, and tell it to learn the mapping from the input diffraction to what the sample looks like (a rough sketch of such a training step follows below).
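As a rough illustration of that supervised step, here is a sketch in PyTorch, assuming paired arrays of diffraction patterns and phase-retrieved amplitude and phase patches; the architecture, sizes, and loss here are stand-ins of mine, not the actual PtychoNN design.

```python
import torch
import torch.nn as nn

# Toy encoder-decoder mapping one diffraction pattern to amplitude and phase.
class Diffraction2Image(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.amp_head = nn.Sequential(
            nn.ConvTranspose2d(32, 16, 2, stride=2), nn.ReLU(),
            nn.ConvTranspose2d(16, 1, 2, stride=2),
        )
        self.phase_head = nn.Sequential(
            nn.ConvTranspose2d(32, 16, 2, stride=2), nn.ReLU(),
            nn.ConvTranspose2d(16, 1, 2, stride=2),
        )

    def forward(self, diffraction):
        z = self.encoder(diffraction)
        return self.amp_head(z), self.phase_head(z)

model = Diffraction2Image()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

# One training step on a synthetic stand-in batch: diffraction patterns as
# inputs, phase-retrieved amplitude and phase images as the labels.
diffraction = torch.rand(8, 1, 128, 128)
amp_label = torch.rand(8, 1, 128, 128)
phase_label = torch.rand(8, 1, 128, 128)
amp_pred, phase_pred = model(diffraction)
loss = loss_fn(amp_pred, amp_label) + loss_fn(phase_pred, phase_label)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```

In production this loop runs distributed across many GPUs on the cluster; the point here is only the shape of the idea: the labels come from conventional phase retrieval, not from some separate ground truth.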
When you do that, you're left with a trained model, which we then push to an edge computing device that sits at the beamline. This edge device is responsible for the live inference the user sees: the acquired data is essentially split into two streams, one going to the high-performance computer and one to the edge device, and the edge device, running a copy of the trained neural network, makes a near-instantaneous prediction of what the sample image looks like. The reason we're still doing phase retrieval is that we're constantly updating that copy of the neural network: samples change, the instrument drifts over time, and we want the network to have the latest information as part of its training. But this breaks the dependence on phase retrieval for real-time imaging. If phase retrieval now takes half an hour or an hour, that's okay, because the live images reach the user through the neural network running on the edge device. When phase retrieval is done, a new copy of the network is pushed to the device; if that doesn't happen, you can still use the old copy of the network, for some time at least. Using this approach, we've been able to do live inference on a two-kilohertz stream of 128-by-128 detector images. We were essentially bandwidth limited to one gigabyte per second; the detector computer's network card couldn't handle more than that, which is why we topped out there. In principle we could go even faster, and to larger images. Here's a recording of the beamline computer while this is happening. This is the data being acquired: you can see the diffraction data and some of the underlying structure from the zone plate. This is the live inference from the neural network: the network gets this image and immediately predicts what the sample should look like. The image on the bottom is a stitching of these predictions as the scan progresses: this is the scan location, moving across in a Fermat spiral, and this is the cumulative image obtained by stitching together all of the scan locations and the neural network's predictions. To summarize this portion of the talk: without high-performance computing, we're talking about days or weeks to do the phase retrieval and image reconstruction, and you pretty much need the full data set. With high-performance computing, scaling the ptychographic phase retrieval problem out to many, many GPUs, we can cut the reconstruction time to minutes or hours. But with this hybrid approach, where we have both high-performance computing and a trained neural network running at the edge of the instrument, we can cut the reconstruction time to the order of milliseconds. Okay, so I'm going to switch gears a little to a different coherent imaging technique, in this case Bragg CDI, partly to contrast how we approached ptychography with how we approached BCDI, and why. I'll also highlight something we're doing more and more, which is incorporating prior physics knowledge into our neural network architectures and machine learning solutions. In 3D BCDI, similar to ptychography, we're solving the phase retrieval problem, except in this case we directly measure a small volume of reciprocal space around the Bragg peak, and from that we want to reconstruct the full 3D image of a nanoparticle or a single grain in a polycrystalline material. So we essentially want to solve a 3D-to-3D inverse problem, as opposed to ptychography, where you reconstruct 2D images and stitch them later.
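Before getting to the network, note that the BCDI forward model itself is simple and cheap: a support constraint, a 3D FFT, and a normalization. Here is a minimal sketch (my naming and normalization, for illustration), which is what makes the unsupervised training described next possible.

```python
import numpy as np

def bcdi_forward(amplitude, phase, support):
    """Map an object guess to estimated far-field diffraction magnitudes.

    amplitude, phase: real 3D arrays describing the current object guess.
    support: mask confining the object to a finite region.
    """
    obj = support * amplitude * np.exp(1j * phase)  # complex-valued object
    magnitude = np.abs(np.fft.fftn(obj))            # 3D FFT to reciprocal space
    return magnitude / magnitude.max()              # normalized for comparison
```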
So in this case we have a 3D convolutional neural network that learns the mapping from the reciprocal-space data directly to sample amplitude and phase, and again, once trained, it does this in a single shot. The difference is that, unlike in ptychography, where we do supervised training, meaning you need both the diffraction data and the corresponding sample images to train the network, here, by incorporating known physics, that is, the forward scattering model, we can train the network in a completely unsupervised manner. The way that works is: you have some input diffraction data; you pass it through the untrained neural network, and it makes a prediction of what the image should look like. Initially this is a terrible image, but that's okay. We then take the predicted object amplitude and phase, form the complex object, and pass it through the forward model: we apply a support constraint, do a 3D FFT, do some normalization, and get what the estimated diffraction data should look like. You compare that to the input diffraction image, compute a loss function, and use that to update the network weights. In this entire training loop, you never have to show the network images of what the actual particle looks like. All you need is data you've already collected at the instrument, all the 3D BCDI data collected over the last 10 or 15 years. You can imagine how attractive this is to us, because it means that simply by collecting more experimental data, as the instrument keeps operating, we can train more and more accurate neural networks, in a completely autonomous fashion. The network gets better and better, and it's done in a completely unsupervised manner; to reiterate, we eliminate the need for ground-truth images in training. So in principle, you would never have to do phase retrieval again.
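Concretely, one such unsupervised update might look like the sketch below, using a differentiable version of the forward model above; `model` here is a hypothetical 3D network returning an (amplitude, phase) pair, and the actual AutoPhaseNN architecture, loss, and normalization details differ.

```python
import torch

def unsupervised_step(model, optimizer, measured_magnitude, support):
    """One self-supervised update; no ground-truth images are ever needed.

    measured_magnitude: (batch, 1, D, H, W) measured diffraction magnitudes.
    support: float mask, 1 inside the allowed object region and 0 outside.
    """
    amplitude, phase = model(measured_magnitude)            # network's object guess
    obj = support * amplitude * torch.exp(1j * phase)       # complex object + support
    est = torch.abs(torch.fft.fftn(obj, dim=(-3, -2, -1)))  # differentiable forward model
    est = est / est.amax(dim=(-3, -2, -1), keepdim=True)    # normalize like the data
    loss = torch.mean((est - measured_magnitude) ** 2)      # compare to the measurement
    optimizer.zero_grad()
    loss.backward()  # gradients flow back through the physics into the network
    optimizer.step()
    return loss.item()
```

The key design point is that the loss is computed against the measured diffraction itself, so the network improves with every data set the instrument produces, with no reconstructed image ever needed as a label.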
Here are some results from this network. The top row is phase retrieval, the middle row is the raw prediction from the neural network, and the bottom row is what happens when you take the prediction from the neural network and run a few iterations of phase retrieval on it; so if you previously had to run a hundred iterations of phase retrieval, now you run just ten or so. The chi-square error is listed here; lower is better. Although the images from the raw neural network prediction look very good, the chi-square error is a little higher. But with the hybrid approach, taking the neural network prediction and running a few iterations of phase retrieval, you not only end up being faster, you also end up with lower chi-square error, and the images, as you can see, are in some cases even superior to phase retrieval, which often ends up with artifacts. And of course, we are a user facility, and one of the things we really care about is making this available to our users. The AutoPhaseNN model is now incorporated into production software that users run while operating the instrument, and that they can take home with them for their data analysis. This is a package called cohere that Barbara Frosik and Ross Harder have been developing for some years. It gives you options for a lot of data preparation, preprocessing of the data, display, and so on, but you now also have the option of choosing the neural network prediction, instead of a random or flat guess, as the starting point for your phase retrieval. When you do that, you get something like this: here, phase retrieval was run for 600 iterations and took a certain amount of time; Yudong, who recorded this video, instead runs just 50 iterations of ER starting from the neural network prediction, and in a fraction of the time it takes to run phase retrieval from scratch, you have a good reconstruction. As the video plays out, you can see a comparison between the different reconstructions: phase retrieval run as in the past versus phase retrieval starting from the neural network guess. The three images you see here are phase retrieval, the raw neural network guess, and the combination where we start from the neural network prediction and run a few iterations of phase retrieval; you can see that this last one produces a very pleasing image, which we can also quantify with the chi-square error, and it's the best reconstruction, even better than phase retrieval, while being faster. So I've shown two examples of networks we're deploying at the APS, PtychoNN for ptychography and AutoPhaseNN for 3D BCDI, and two different approaches to training them: with PtychoNN we train in a supervised manner, in line with the experiment; with AutoPhaseNN it's unsupervised, the network essentially learns by itself how to do phase retrieval, and it's trained offline. The reason we approached it this way is that we're governed by engineering criteria. For ptychography, you're acquiring a lot of data sequentially, very quickly: you're running a detector very fast. With BCDI, a measurement typically takes a few minutes, because you have to acquire a full 3D data set, which involves physically rocking the sample stage. So inference times for BCDI don't have to be so aggressive; if you can do it on the order of a second, that's plenty. Whereas for ptychography, you basically have to get inference from the neural network at the same speed you're running the detector, otherwise you just keep building up a backlog of images coming in from the detector. This means our network sizes are in turn governed by this requirement on inference time. For PtychoNN, that means having a very small network: we quantize, prune, and make the network as small as possible so we can keep inference time really low. With AutoPhaseNN, since this is not a criterion, we can train a large network, more than 10 million parameters. And this in turn has consequences for how these networks generalize. PtychoNN, because it's so small, is basically only good for this instrument and this sample. AutoPhaseNN is a generalizable network: it's large, it gets to learn more things, and it's good for convex objects with weak phase; we're now working on extending it to more complex objects with defects in them, which have a lot of strong phase, and in principle this means increasing the size of the network to make it capable of learning a broader space.
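To give a flavor of what "quantize and prune" can mean in practice, here is a sketch using PyTorch's stock pruning utility plus a half-precision cast, which is often the simplest win on an edge GPU like a Jetson; the model is a stand-in of mine, and the actual deployment pipeline uses its own tooling.

```python
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

# Stand-in for a small trained inference network.
model = nn.Sequential(
    nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 1, 3, padding=1),
)

# Prune half of the smallest-magnitude weights in each conv layer, then
# bake the resulting sparsity into the stored weights.
for module in model.modules():
    if isinstance(module, nn.Conv2d):
        prune.l1_unstructured(module, name="weight", amount=0.5)
        prune.remove(module, "weight")

# Cast to half precision for inference on the edge GPU.
model = model.eval()
if torch.cuda.is_available():
    model = model.half().cuda()
```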
I've talked a lot about coherent imaging, but that's not the only technique where we're applying this philosophy of training on high-performance computing and deploying on edge computing devices. This is a network called BraggNN that Zhengchun Liu, who's now at Amazon, Hemant Sharma, who's in the CXS group, and many others at the APS have developed. It's a convolutional neural network that does very accurate peak fitting for high-energy diffraction microscopy: it takes patches of the detector and predicts, to very high sub-pixel accuracy, where the peak locations are. This network is 200 times faster, and more accurate, than the pseudo-Voigt approaches currently used for peak fitting. And just as with ptychography, they've deployed this network on an edge computing device to process streaming data: you read the data right off the detector (here's a video of that happening), identify the peaks, and fit them to high accuracy. Okay, so now I'm going to switch gears a little and talk about how this real-time analysis that AI enables in turn lets us do more exciting things with instruments, in particular how we can start steering experiments to target the features we're interested in. This solves a more general problem: you put a sample in, you don't know what's in it, and the question we typically face at the start of an experiment is how to acquire data so as to maximize the information gain in a minimal amount of time. In the approach I'm going to show, we initially sample a few points randomly, and we have a pre-trained network that, given a set of measurements, has essentially learned where to sample next so as to maximize that information gain. Again, we put this in the loop of the experiment so these decisions are made very fast and the instrument is steered very quickly to target locations, and we can use this approach to reconstruct images with far fewer points than you'd need in a high-fidelity raster scan or another kind of fixed scan. The particular sample here is a tungsten diselenide film, and we measured the signal at the 008 Bragg peak in a scanning diffraction modality at the nanoprobe beamline at the APS. This is what our edge setup looks like (this is for ptychography as well; I guess I didn't show the image earlier): looking upstream at one of our beamlines, we use this NVIDIA Jetson mini computer, sort of like a Raspberry Pi, except it comes with CUDA cores and the ability to run very fast neural network inference. It's inexpensive, it's low power, and it's sufficient to keep up with a lot of our edge computing needs when it's running a trained neural network. Our workflow goes something like this: you sample a few points randomly; the trained neural network suggests the next 50 or so most important points to sample; we run route optimization through Google's OR-Tools to optimize the scan path over those points (a sketch of this step follows below); we acquire those points, compute the change in the image, and then loop back so the neural network suggests the next set of points to sample. All of this happens in a completely automated fashion: the user starts the scan, and this cycle continues until the change in the image is minimal and the scan is done.
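That route-optimization step is essentially a small traveling-salesman problem over the suggested points. Here is a minimal sketch with Google's OR-Tools, over a handful of hypothetical (x, y) scan coordinates; a real setup would also pin the start to the current stage position rather than returning to a depot.

```python
import math
from ortools.constraint_solver import pywrapcp, routing_enums_pb2

points = [(0, 0), (3, 1), (1, 4), (5, 2), (2, 2)]  # hypothetical suggested points

manager = pywrapcp.RoutingIndexManager(len(points), 1, 0)  # one "vehicle", start at node 0
routing = pywrapcp.RoutingModel(manager)

def distance(from_index, to_index):
    a = points[manager.IndexToNode(from_index)]
    b = points[manager.IndexToNode(to_index)]
    return int(1000 * math.dist(a, b))  # OR-Tools expects integer arc costs

transit = routing.RegisterTransitCallback(distance)
routing.SetArcCostEvaluatorOfAllVehicles(transit)

params = pywrapcp.DefaultRoutingSearchParameters()
params.first_solution_strategy = routing_enums_pb2.FirstSolutionStrategy.PATH_CHEAPEST_ARC

solution = routing.SolveWithParameters(params)
index = routing.Start(0)
order = []
while not routing.IsEnd(index):
    order.append(manager.IndexToNode(index))
    index = solution.Value(routing.NextVar(index))
print("visit order:", order)  # drive the scan stage through the points in this order
```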
The cool thing about what we did is that we trained this neural network on an utterly unrelated image: the famous "cameraman" image. It was trained on 100 different masks at different sampling rates: we block out different portions of the image, and the neural network has to learn where to sample from so as to get closest to the actual ground-truth image. This is an approach we adapted from a paper by people at Argonne and Charlie Bouman at Purdue. And these are the results. On the left you have the full-resolution image you would have acquired if you measured every single scan location; this image was taken with a 100-nanometer step size. The results on the right show the corresponding image if you use the AI-guided acquisition approach, and the plot on the bottom right shows the locations the neural network chose to scan. You can see it starts focusing on edges of features, where things are changing the most, and it ignores portions of the sample that are relatively flat, where there's not a lot going on. So this image is acquired with roughly 20 to 25 percent of the number of scan points of the full scan, and you can see it captures all of the features in the image. Again, here's a video of this running at a beamline; this is the sector 26 beamline at the APS, and this is a live recording of the computer. The bottom row shows the points measured so far, and the image here is the interpolated image; the text will walk you through the rest of the video. This little red dot is the live location of the beam, and this path in white is an optimized route through all of the points the neural network suggests are the most important to sample. This is the live detector image. You can see it discovered a new feature and so spends more time measuring around it, just to make sure it captures all aspects of that feature. We can also apply this sort of automation and intelligent steering not just on the experimentation side but also on the instrumentation side. Luca Rebuffi in the optics group, Saugat Kandel, and others have been working on something called AutoFocus, the idea being: how can you automate the process of aligning and focusing mirrors? They're doing this by working with a digital twin of the beamline in the modeling software OASYS: they develop these models in simulation and then deploy them on the instruments, and they've been getting very good results. In other related work, people like Yine Sun, Nikita Kuklev, Ihar Lobach, and others at the APS are looking at AI-based steering of the accelerator itself, applying AI to increase the efficiency of operations and to predict things like power supply trips ahead of time.
Okay, so with that I've reached the end of my talk. I hope I've convinced you that AI will be an integral part of APS-U beamlines. We talked about AI for analysis, and I showed some examples of how we can accelerate workflows and enable real-time imaging for the first time. I showed an example of autonomously steering the microscope to features of interest. And one thing I've not talked about today, in the interest of time, is that longer term we're interested in learning physics directly from the data we measure. Putting these into production is integral to what we do; writing the papers is only part of it. Some of our AI-for-analysis techniques are already going into production, as we saw; in the next few years we expect some of our steering models to go into production as well; and longer term we hope to see some of our AI-for-knowledge models in production too. All of our code, data, and trained models are freely available and always will be; you're welcome to try them out and let us know if you run into any issues. Thank you, and I'll take any questions.