Yeah, that would be better. Yes, they can see you. I have another talk on a similar topic. All right, so I'm going to speak about Phoebe, which is a code package that Andrea Cepellotti and I and some others from the Kozinsky group at Harvard University have been developing over the last year or so.

In writing this code, our goal is to compute transport properties, something that's really central to the focus of the Kozinsky group. We want to compute transport properties such as the electrical conductivity when we apply an electric field, or, when we apply a thermal gradient, the thermal conductivity or thermoelectric effects. Traditionally, these kinds of calculations have been done using the Boltzmann transport equation. To set up the Boltzmann transport equation, we look at how the particle distribution function, labeled here as f, changes in response to the applied field. We consider contributions such as the term from an external force, changes due to a thermal gradient, as well as changes due to particle collisions. In solving the Boltzmann transport equation, we consider the system close to equilibrium and linearize the equation. Then, in the formalism Phoebe uses, we can rearrange the Boltzmann transport equation into a matrix equation: the force and diffusion terms go into a vector b, and all of the contributions related to particle scattering go into the scattering matrix, which we have labeled A. What we are solving for is the solution vector, the delta f term, which is the out-of-equilibrium part of the particle distribution, and we can use that to calculate the transport coefficients we're interested in.

While this problem is set up in a way that's nice and easy to look at, it is a difficult computational problem for lots of reasons. As we saw in the last talk, the calculation of particle interactions is difficult: the calculation of the scattering matrix is computationally burdensome, and the scattering matrix can also be very large, so you have to come up with a good computational framework to solve the problem. We have done this by writing this code, which we have called Phoebe; as I said, it's been an effort in the Kozinsky group over the last few years. We have named it Phoebe because it is a package of phonon and electron Boltzmann equation solvers. The goal of Phoebe is to offer a range of BTE solvers which vary in cost and accuracy to suit the specific needs of your material transport problem. We have written it to be very high-performance-computing oriented: we have focused a lot on making it as parallel as possible, and we've also implemented GPU capabilities. Additionally, it is written in C++, and we have done this to provide a suite of objects which will be useful for future developments. While I can't get into every single thing that Phoebe can do within this talk, an overview of the contributions to transport coefficients that we offer is shown here: we are currently able to calculate contributions from electron-phonon interactions as well as phonon-phonon interactions.
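To make the structure of that linearized problem concrete, here is a schematic of the matrix form described above, written in my own illustrative notation rather than the exact symbols on the slides:

```latex
% Boltzmann transport equation: drift, field, and collision terms for the distribution f
\[
  \frac{\partial f}{\partial t}
  + \mathbf{v}\cdot\nabla_{\mathbf{r}} f
  + \mathbf{F}\cdot\nabla_{\mathbf{k}} f
  = \left.\frac{\partial f}{\partial t}\right|_{\mathrm{coll}}
\]
% Writing f = f^0 + \delta f and linearizing in the applied fields turns this into
\[
  A\,\delta f = b ,
\]
% where b collects the force and thermal-diffusion terms, A is the scattering matrix,
% and the transport coefficients are linear functionals of the solution vector \delta f.
```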
These two interaction channels give us a pretty broad range of the contributions to the transport properties we're interested in, which, as I listed on the first slide, are the electrical conductivity, thermal conductivity, and Seebeck coefficient. Additionally, we have some special features: we can also output electron and phonon viscosities, which are relevant to hydrodynamic systems, and we can calculate contributions to the transport coefficients related to the Wigner distribution. On top of this, we have what's called the electron-phonon averaged approximation, something that was developed by Boris Kozinsky in the past, which allows accelerated electron-phonon calculations.

The overarching way that Phoebe works is that we first perform some calculations using density functional theory and density functional perturbation theory to get the electron and phonon properties and any interaction matrix elements needed for the calculation. Then we calculate scattering rates, fill in the scattering matrix, and use one of our solvers to solve the BTE and find the resulting transport coefficients. That setup is similar to other codes, and we are also doing something else that is familiar: we're using the electron-phonon Wannier interpolation procedure. So, something we've seen before: we take the electron-phonon matrix elements on a coarse grid, which we currently calculate with Quantum ESPRESSO, and we do a Wannier function calculation. The critical step is taking these electron-phonon matrix elements, transforming them from the Bloch space where they're calculated to the real-space Wannier basis, and then back-transforming them to an arbitrary mesh of wave vectors, so that you can converge the Brillouin zone integrations required for the scattering rates on very fine k-point meshes. So this is a familiar process.

You might say: okay, the workflow and the electron-phonon calculation are similar to other codes, so what is different? Why did you do all of this to write Phoebe? The first point we want to make is that this full scattering matrix approach gives us access to some BTE solvers which would otherwise not be possible. Second, we're doing these calculations for both phonons and electrons, and basically every feature we have can be done for both in the same way. This is nice because it gives us access to thermoelectric predictions of things like the figure of merit, where you also need the phonon contributions, and additionally there are some transport properties where you need information about both phonon and electron transport, like phonon drag or the thermal conductivity of metals. Additionally, as I said earlier, we are focusing a lot on making sure that this code package is as high-performance-computing oriented as possible. We've written it in C++, and one of the things that gives us access to is some really useful libraries for acceleration. The library we have really leaned on is called Kokkos. What this allows us to have is a performance boost throughout the code, as well as GPU acceleration, and it gives us this basically regardless of the underlying hardware you want to run the code on. Kokkos handles all of the interfacing with the architecture and all of the architecture-related optimization, and we just get to write one piece of code that works on everything.
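Since the gauge issue discussed next concerns exactly this interpolation step, here is a schematic of the transform in standard electron-phonon Wannier interpolation notation; this is my paraphrase of the procedure described above, not the exact expressions from the slides:

```latex
% Coarse-grid matrix elements in the Bloch basis are Fourier-transformed to the
% localized real-space Wannier basis (R_e, R_p label electron and phonon lattice vectors):
\[
  g_{mn\nu}(\mathbf{k},\mathbf{q})
  \;\xrightarrow{\ \text{coarse grids}\ }\;
  g(\mathbf{R}_e,\mathbf{R}_p)
\]
% Because g(R_e, R_p) is short-ranged, it can then be back-transformed onto arbitrary,
% much finer meshes of wave vectors, which is what lets the Brillouin-zone integrals
% for the scattering rates converge:
\[
  g(\mathbf{R}_e,\mathbf{R}_p)
  \;\xrightarrow{\ \text{fine meshes}\ }\;
  g_{mn\nu}(\mathbf{k}',\mathbf{q}')
\]
```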
Additionally, there are some comments that are important about the way we have fixed the gauge in our DFT calculations when we do this Wannier interpolation. This makes it so that, although we're currently interfaced with Quantum ESPRESSO, we hope that in the future any DFT code where the wave function and electron-phonon matrix element information is available could perform the same gauge-fixing procedure and also interface with Phoebe. So, just a few slides on this, as it is an important point for us.

If we return to the electron-phonon matrix elements, we notice a problem when we want to do this interpolation that is sort of tricky. When you do a calculation in Quantum ESPRESSO, the gauge of the wave function is essentially random. That means that when you do a calculation with Wannier90, you're not dealing with the same wave functions you had when you did the density functional perturbation theory calculation, and this means you can't just interpolate the electron-phonon matrix elements in a simple way; you have to deal with this gauge problem. There are some different workarounds. One which has been used by other codes is, after you have calculated the changes to the potential and the Wannier function information, to recompute the electron-phonon matrix elements. What we have done, which is a little bit different, is to select a gauge at the start of our DFT calculation and then use that same gauge in the further parts of the electron-phonon calculation.

So, the modification we have made: if you look at the standard Quantum ESPRESSO workflow, at each step needed to calculate the matrix elements there is basically a different gauge. The gauge used in the SCF calculation, in the density functional perturbation theory calculation, and in the non-self-consistent calculation for Wannier90 are all different, because Quantum ESPRESSO computes the wave functions on demand, so we have different gauge information at each point. That leads to the gauge mismatch problem we have to address. The way we have done this is that we ship Phoebe with a patch to Quantum ESPRESSO that you can apply. When you run the very first SCF calculation, we write out information about the plane-wave coefficients, which contains our gauge information, as a reference; we save it only for the irreducible k-points. Then, when we run the further calculations, we read that information back in and rotate the irreducible k-point plane-wave coefficients to whatever other points we need for the calculation. What this enables is that, at the end of the day, we do not have to recompute the electron-phonon matrix elements on the full k-point mesh; we can just use the ones computed for the irreducible mesh. So this is a nice way that we get some additional acceleration in Phoebe.
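To spell out why the gauge matters, here is the mismatch in a minimal illustrative notation (a single random phase per state; in degenerate subspaces it is really an arbitrary unitary rotation):

```latex
% Each diagonalization fixes a Bloch state only up to an arbitrary phase,
\[
  \tilde{\psi}_{n\mathbf{k}} = e^{\,i\varphi_{n\mathbf{k}}}\,\psi_{n\mathbf{k}} ,
\]
% so the same electron-phonon matrix element computed in two runs with different
% random phases differs by those phases,
\[
  \tilde{g}_{mn\nu}(\mathbf{k},\mathbf{q})
  = e^{-i\varphi_{m,\mathbf{k}+\mathbf{q}}}\,
    e^{\,i\varphi_{n\mathbf{k}}}\,
    g_{mn\nu}(\mathbf{k},\mathbf{q}) .
\]
% This phase noise spoils the smoothness needed for Wannier interpolation; fixing one
% reference gauge in the first SCF run and reusing it in every later step removes it.
```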
All right. So to return to the workflow: we perform our DFT calculations with this fixed gauge, then we take in our matrix elements, calculate our scattering matrix, and solve for our transport coefficients. To cover the basic range of solvers we offer in Phoebe: at the absolute bare minimum, where you're not even relying on the electron-phonon matrix elements or anything like that, we go down to the constant relaxation time approximation. We offer the electron-phonon averaged approximation solver, which just uses the coarse electron-phonon matrix elements and does not do any additional interpolation, but still gets pretty accurate results for the transport properties. We offer the traditional relaxation time approximation solution; in this case the scattering matrix is reduced to basically its diagonal, and we can save some memory by holding only that. Then we have some exact solutions to the BTE: a traditional iterative solver, as well as a variational solver, which gives better convergence and is guaranteed to converge. And finally, we have the relaxons solver, which is useful in some specific cases; if you have, say, a hydrodynamic system and you want to know about the lifetimes, you really have to go with this one.

Now that we have established how we're doing these calculations and what kinds of calculations we offer, we wanted to show one example case that we used when we were benchmarking and checking through all of our capabilities. We ran the skutterudite cobalt antimonide, which has a relatively large 16-atom unit cell. Because we have calculations of the Seebeck coefficient and of the contributions to the lattice thermal conductivity and electrical conductivity in the same code, we can make a full prediction of the thermoelectric properties of the material. Here we have, versus temperature, the lattice thermal conductivity, electrical conductivity, and Seebeck coefficient. We can see that they have really nice agreement with the experiment, shown in black, and that in this particular case we didn't get a significant change between the RTA and iterative solvers. We can also look at what happens if we vary the doping concentration of the material and examine the electronic properties, so the mobility, the conductivity, and the Seebeck coefficient. Again, if we compare to the experimental points in black, we're getting really good results, and we have a lot of confidence in what Phoebe can do at this point. And again, we can use it to come out with the entire thermoelectric figure of merit, which is something we're particularly interested in doing.

All right, so having shown what Phoebe can do and how it does it, I also want to make a comment on our computational performance. Here we did some benchmarks with a converged electron-phonon calculation for gallium nitride, for both CPU and GPU capabilities. We find that we basically have ideal scaling when running on CPUs. The line on this plot to look at is especially the calculation of the scattering matrix, shown as a dashed line; it is the bulk of the calculation and the most expensive part, so this is the thing to focus on. We can see that for the red line, without GPUs, we basically follow the ideal scaling line: as we increase the number of processes, we reduce the time accordingly. We are additionally proud to say that when we do this with GPUs, in this case up to 64 GPUs, we found the same kind of nearly ideal scaling, and that for this calculation one GPU gave us roughly the same acceleration as about 100 CPUs.

For one last note on computational performance: part of the way we get this performance benefit is by accelerating the interpolation of the energies and matrix elements, and we had to be a little bit clever about how we did this. If you do just a single diagonalization and Fourier transform of your Wannier information, it's not really worth putting on a GPU: say you have maybe 10 Wannier functions, then N times M is maybe 100 matrix elements, and the number of lattice vectors R is maybe something like 100 as well, so it's not a huge amount of work. You're limited in the number of parallel threads you can use, and it's not worth moving that calculation onto a GPU. However, if you batch over wave vectors and do a whole bunch of these at the same time, you can take advantage of the number of threads that are on GPUs and get a pretty significant speed-up. We've found that this has been a real benefit in doing these calculations. And it's a similar kind of thing to what we do for the electron-phonon interpolation when we're calculating the scattering matrix, which is part of why we get this nice scaling and speed-up.
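As a rough illustration of that batching idea, here is a minimal sketch using the Kokkos library: it interpolates a Wannier Hamiltonian, H(k) = sum over R of exp(i k.R) H(R), for a whole batch of wave vectors in one kernel, so a GPU has enough parallel work. The names and sizes are mine for illustration and this is not Phoebe's actual implementation:

```cpp
#include <Kokkos_Core.hpp>
#include <Kokkos_Complex.hpp>

// Minimal sketch of batched Wannier interpolation H(k) = sum_R e^{i k.R} H(R),
// evaluated for many wave vectors in a single kernel so a GPU has enough threads to use.
// All names and sizes are illustrative, not Phoebe's actual code.
int main(int argc, char* argv[]) {
  Kokkos::initialize(argc, argv);
  {
    const int numK = 4096;  // batch of wave vectors
    const int numR = 100;   // Bravais lattice vectors in the Wannier representation
    const int numW = 10;    // Wannier functions (each matrix is numW x numW)

    // H(R) in the real-space Wannier basis, phases e^{i k.R}, and the output H(k).
    Kokkos::View<Kokkos::complex<double>***> hR("hR", numR, numW, numW);
    Kokkos::View<Kokkos::complex<double>**>  phase("phase", numK, numR);
    Kokkos::View<Kokkos::complex<double>***> hK("hK", numK, numW, numW);
    // (hR and phase would be filled from the coarse-grid calculation; omitted here.)

    // One 3D parallel loop over (wave vector, m, n); Kokkos maps it to whichever
    // backend the code was built for (CUDA, HIP, OpenMP, serial, ...).
    Kokkos::parallel_for("fourier_interpolation",
        Kokkos::MDRangePolicy<Kokkos::Rank<3>>({0, 0, 0}, {numK, numW, numW}),
        KOKKOS_LAMBDA(const int ik, const int m, const int n) {
          Kokkos::complex<double> sum(0.0, 0.0);
          for (int iR = 0; iR < numR; ++iR) {
            sum += phase(ik, iR) * hR(iR, m, n);
          }
          hK(ik, m, n) = sum;
        });
    Kokkos::fence();
    // A diagonalization of each hK(ik) block would follow to get interpolated energies.
  }
  Kokkos::finalize();
  return 0;
}
```

Batching like this is what turns a set of tiny 10 by 10 problems into enough independent work to occupy the thousands of threads a GPU provides.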
All right, so to wrap up the presentation and recap what I have just said: we have developed Phoebe, which is an efficient and GPU-accelerated code for phonon and electron Boltzmann equation solutions and transport property calculations. It focuses on a full scattering matrix formalism, so that we have access to a comprehensive range of Boltzmann transport equation solvers. We feel there's a lot of potential for future developments, and though I don't have time to go into those here, one thing that can definitely be added is additional scattering contributions: if you want to add something like electron-electron scattering, or another kind of quasiparticle interaction like that, it's something you can just add onto the scattering matrix and then use the BTE solvers in the same way. Additionally, we're interested in adding more transport properties and other specific effects to study with this code.

So then I have to thank the Phoebe team. First we have to thank Boris, who has provided us a lot of invaluable advice and also tremendous freedom in pursuing this project. Andrea has been absolutely critical to the project and has really guided its direction; I've been a key participant in the implementations; Anders Johansson has helped us out a lot with the Kokkos GPU acceleration; and Natalya Fedorova helped us with the EPA implementation. We have a paper released on arXiv where you can look at a lot of the details, and we also have a website with extensive documentation as well as tutorials showing you how to run the code.

One thing we also wanted to bring up in the context of this workshop, and of some things that have been discussed recently, even as early as this morning: when we do these calculations we need a number of files from Wannier90, some lists like these that we have to read in. Right now we are parsing them sort of manually in C++, which, you know, is fine, it's working, but there might be an easier way to do this. Additionally, it would be nice if we could have fewer files to read in; if there were some way to post-process this information into a single format, that would be helpful to us in using these pieces of information. Okay, so then I'm taking questions about anything in the content of the talk. Thanks so much.

Okay, so thank you so much for the very clear talk. Of course we are open for questions; please raise your hand. I guess we have already two.

Thank you for the talk. Could you expand a bit? It was very nice to have these slides where you compared the features that are different from existing codes, and one of the points was this full scattering matrix. So can you explain, I mean, what do you mean exactly, and how is it not available in EPW?
Well, certainly there are some other codes that also have the full scattering matrix; I know Phono3py will do that for the phonon case, for example. But we definitely wanted to make sure that this was the central piece of the code; we wanted to structure the code around formatting things this way, because we felt it was what worked best with the kinds of BTE solvers we wanted to implement. I don't think that's always the case; a lot of times I think there's more focus on trying to format things as vectors and save memory that way. We wrote the code so that you can do that, but you can also go to the full scattering matrix and have it, which is not something I talked about, distributed in memory as efficiently as possible, and we do a lot of things to reduce the size of the scattering matrix before using it. So there's just a lot of focus on this scattering matrix formalism. Thank you.

Thank you for the talk. I had a question about your programming language: most of the programs in this field are written either in Fortran or Python, and you're using C++, so could you share your experience?

Yeah, definitely. There were a couple of reasons we wanted to use C++, and one of the key reasons was that there were these different kinds of libraries available to us that we felt would help a lot with the acceleration. Certainly the Kokkos library is something that has made things much easier, because we did not have to write specific CUDA code or something like that to get the GPU acceleration. That's an effort at Sandia National Laboratories; it's quite a nice project we hadn't looked into before, but unfortunately it's not available in Fortran or languages like that. So, some of the library interfaces; and also we would really like people to develop this code further in the future, and I think a lot of people now get more experience with things like C++ than Fortran when they're taught.

Okay, so we have a few more questions, but I have a follow-up question first. I wanted to ask if you can elaborate a bit more on this Kokkos library. One of the biggest problems in porting applications to GPUs is not coming up with a clever algorithm but, you know, learning how to write in, let's say, CUDA Fortran, which would then only work on some GPUs; there are all these problems that we all know, and it seems that with this you are sort of solving part of the problem. So I was wondering, how does it work? Is it like you're calling something like LAPACK, but it's not LAPACK, it's something on the GPU? Because if you need to parallelize, say, a for loop, you still have to do it no matter which library you use.

So it's something that you can add into parts of the code without disrupting the structure too much. It's not quite like OpenMP, but it's a similar kind of thing, in that you don't have to write the code in a specific way around Kokkos, other than maybe telling it that some data structures are meant to live on GPUs or something like that.

So are there directives, so that you can write it a little bit like OpenMP?

It's closer to OpenMP than to writing things with MPI and communicators.

Yeah, maybe I can elaborate a little bit on this.
It's a little bit tricky to explain if you're not overly familiar with how GPUs work, but the idea is that in the code you write the kernel, that is, the code that is going to be executed on the GPU, and the Kokkos library basically provides a way to compile this set of instructions that you provide in the way that works best for the current underlying architecture. So it's a little bit different from OpenMP: you actually have access to finer, lower-level control if you need it, but if you don't, it abstracts away a lot of the challenges you would have if you were actually using CUDA.

Yeah, Kokkos will actually compile to CUDA if you are using an NVIDIA GPU, but it also supports Intel and AMD GPUs, and essentially, since it's an effort backed by Sandia National Laboratories, they have an interest in supporting all the major supercomputers that are going to be, at least, publicly available in the United States, which actually covers most of the vendors.

Really nice talk, thank you. Just two things. The files that you said it would be good to package up: I think once we move to the library, that becomes quite easy, because the state of the calculation becomes extremely well defined and essentially it can be dumped into a file of whatever format you'd like. I don't know how urgent that issue is for you.

No, not urgent, but it would be much nicer to do it that way, so perhaps in the future.

Yeah, so once it's a library, and Jerome can maybe say more, it just becomes very easy to dump the entire state of the calculation in one go. I'm also really interested in the gauge fixing, because I think this problem isn't unique to you; this must be something that people face a lot. So the first question is that I'd like to find out in more detail how you do it, and I guess the second is a comment, maybe for others here as well: it would be really good if, say, the electronic structure code had an option where you could switch it on and it uniquely fixed the gauge, so that it had a criterion it used to fix the gauge and every time you ran it you knew it would be in the same gauge.

This is basically why we offer this patch to Quantum ESPRESSO: we've just modified routines like c_bands to output this information and then read it back in. So it is basically something that you could maybe turn on as an option if it were integrated into Quantum ESPRESSO. So, to answer that first question, in terms of the gauge fixing, what specific kind of...

Well, we could talk more afterwards. We had some great discussion about this with the all-electron people at lunch, but yeah, if you want, we can do it afterwards and answer more specific questions now.

Okay, so there is a question online, but before that, is there anyone in the room? Okay, there's one.

Thank you for the very nice talk. My question would be: do you think it would be interesting, or also difficult, to add the possibility to simulate the presence of an external field in your transport calculation, like a magnetic field or something like that?

So, I'm working right now on doing some basic magnetic field additions, in some ways that have been explored a little bit before. But definitely yes, I think it should be very possible, depending on the method; we've talked about different levels of that. An external field, certainly; and for the magnetic field it should just be an addition to the scattering matrix, basically one function to apply.

Okay, so there is a question from Sergei online: does the code allow refining the k-grid near a scattering rate singularity, and how large of a k-grid is feasible?
Sorry, the question is: does the code allow refining the k-grid near a scattering rate singularity, and how large of a k-grid is feasible?

Okay, so the second question is easier: how large of a k-grid is feasible. Because we have written this to be extremely parallel, with as many cores as you have you can distribute it, and you should be okay. On the first question, I don't think we have this right now, though I think it could be very possible. The definition of the k-mesh in the code is something we could replace with something else, and I don't think there would be a problem there; there just hasn't been a need for it at this time.

Okay, so there's another question from Stepan: in the presentation there was a Fourier transform; I guess you do it with a plain Fourier transform rather than a fast Fourier transform. Did you think of using fast Fourier transforms?

No, I don't think so; we're just doing it in the traditional Wannier interpolation format.

Okay, anyone else? Okay, so if there are no more questions, I think we can thank Jenny again. Let me stop recording now. Yes.