Okay, thank you all. First of all, thank you very much for inviting me, it is a real pleasure. I am Ivan Carnimeo, I work with the Quantum ESPRESSO group in Trieste and Udine, and my contributions go into the Quantum ESPRESSO code. In this talk I will present the work carried out over the last two years on optimizing hybrid-functional calculations for large systems in Quantum ESPRESSO. Hybrid functionals have good predictive capabilities in most cases for most molecular properties. On the other side, plane waves are very good as well: we know that they scale well to large systems and they are used in many applications of molecular dynamics and so on for their treatment of large systems. The problem is, as you might know, that when we try to use a hybrid functional in plane-wave codes the computational burden becomes very large. The goal is to have a fast and efficient method which reduces this computational cost and extends the range of tractable systems beyond the ones which can be treated currently. Basically, in this presentation I will first present our approach and the theoretical framework on which it relies, and then I will show some benchmarks and test cases where the computational performance, in terms of efficiency and time to solution, is always compared with the accuracy which can be obtained when going to large systems. So basically in these test cases the timings and some molecular properties will be compared. The main problem is that we want to solve a set of Kohn-Sham equations where the Kohn-Sham operator contains the Hartree-Fock exchange operator, which depends on the density matrix.
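In formulas, the problem just described reads as follows (standard hybrid-functional notation; the symbols are the usual ones, not necessarily those on the slides):

```latex
% Kohn-Sham equations with exact (Hartree-Fock) exchange
\hat{H}^{\rm KS}\,\psi_i(\mathbf{r}) = \varepsilon_i\,\psi_i(\mathbf{r}),
\qquad
\hat{H}^{\rm KS} = \hat{H}^{\rm loc} + \hat{V}_x ,

% density matrix as a sum over occupied orbitals
\rho(\mathbf{r},\mathbf{r}') = \sum_j^{\rm occ} \psi_j(\mathbf{r})\,\psi_j^*(\mathbf{r}') ,

% exact-exchange operator: a sum over orbital pairs
\left(\hat{V}_x\,\psi_i\right)(\mathbf{r})
  = -\sum_j^{\rm occ} \psi_j(\mathbf{r})
    \int \frac{\psi_j^*(\mathbf{r}')\,\psi_i(\mathbf{r}')}{|\mathbf{r}-\mathbf{r}'|}\,d\mathbf{r}' .
```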
The density matrix can be written as a sum over orbitals, and the Hartree-Fock exchange operator is defined as a sum of contributions coming from pairs of orbitals. So in order to evaluate the Fock potential in the SCF we have to calculate these electrostatic integrals, where the density in the integrand is the product of two molecular orbitals. These integrals are evaluated with FFT procedures, so each single evaluation is relatively inexpensive; if it were done just once it would be quite fine even for large systems, because we can exploit the scalability of FFTs. However, the problem is that we have to do those integrals many times, because for one evaluation of the Fock potential the number of integrals to compute is at least the square of the number of occupied Kohn-Sham states. So the problem is that we have many of those electrostatic integrals to be computed. What we have exploited as the basis to reduce the computational burden of this calculation is orbital localization. Our claim is that by exploiting orbital localization, i.e. localizing the molecular orbitals, we can significantly save computational time in the evaluation of the Fock potential. To understand this, we can just notice that if psi_i and psi_j are two canonical orbitals, then the product of these two orbitals is usually delocalized over the whole system, because the two orbitals themselves are delocalized. However, if we have a localized representation of the orbitals, then the product of two localized orbitals is really, really small. For large systems it happens that if the two orbitals are far enough apart, the product can simply be neglected.
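As a minimal sketch of one such electrostatic integral evaluated with FFTs (plain NumPy on a cubic grid; the function and variable names are mine, not Quantum ESPRESSO's):

```python
import numpy as np

def pair_exchange_potential(psi_i, psi_j, box):
    """Electrostatic potential v_ij(r) of the pair density psi_i*psi_j,
    solved in reciprocal space: v(G) = 4*pi*rho(G)/G^2 (G = 0 dropped).
    psi_i, psi_j: real orbitals on an (n, n, n) grid; box: cubic cell edge."""
    n = psi_i.shape[0]
    rho_g = np.fft.fftn(psi_i * psi_j)                 # pair density -> G space
    g = 2.0 * np.pi * np.fft.fftfreq(n, d=box / n)     # reciprocal-space axis
    gx, gy, gz = np.meshgrid(g, g, g, indexing="ij")
    g2 = gx**2 + gy**2 + gz**2
    coulomb = np.zeros_like(g2)
    np.divide(4.0 * np.pi, g2, out=coulomb, where=g2 > 0)  # skip the G=0 term
    return np.real(np.fft.ifftn(coulomb * rho_g))      # back to real space
```

Each call costs only a pair of FFTs, which is cheap done once; the trouble is that the SCF repeats it for every orbital pair, hence the square scaling in the number of occupied states.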
So the way we exploit localization is that for large systems we localize the orbitals, so that there will be many orbital pairs which are far enough apart: we don't really need to evaluate n-squared integrals like this, but only a much smaller number, because many of the orbitals are not really interacting with each other. So let's see how to transform this proof of concept, this general idea, into a practical algorithm which can be used in the SCF. The main ingredients are the following. First, we need a fast and efficient localization procedure. To this purpose I have to remember that we want to do it inside the SCF, so the orbitals need to be localized many times, at every SCF iteration. So we need a fast algorithm, and something which can be scaled to large systems. Then, assuming we have a localized representation of the orbitals, we have to project the exchange operator onto the localized representation, and we will show how we do it. Finally, once we have the exchange potential expressed in terms of localized orbitals, we need a fast and efficient criterion for truncating the orbital interactions. So let's see how the localization has been done. The SCDM method has been chosen. I will not go through the details of this method because it has been beautifully explained in the previous talk; I will just focus on the way it has been implemented in Quantum ESPRESSO, with some care about going towards large systems. The starting point of SCDM is the real-space representation of the molecular orbitals on the real-space grid. This is a huge matrix, whose dimension is the dimension of the real-space grid times the number of occupied Kohn-Sham orbitals. So this matrix is very large, and the localization is done by multiplying this matrix by a transformation matrix. What we need to know now is that this transformation matrix is obtained with a QR decomposition of the huge matrix psi.
So if we imagine that we have a large system, we have a huge molecular-orbital matrix, because one of its dimensions is the real-space grid, and we have to perform a QR decomposition on this huge matrix. There are basically two ways to apply this method to large systems. One option is to distribute the matrix among the processors and perform a distributed QR decomposition using ScaLAPACK subroutines. In this case we still need to perform the full QR decomposition of the huge matrix, although the matrix is distributed among processors. Another option, which has been proposed by Damle and Lin in this article, is to use a prescreening of the matrix. What has been noted is that although the matrix is huge, the points which really matter for the QR decomposition, and for building the transformation matrix, are only the points where the norm of the density is high. So since this matrix is very sparse, we can prescreen the real-space grid, select the points which really matter for the localization, and perform the QR decomposition only on this reduced ensemble of points. This is the way we have chosen. In practice, this has been implemented in Quantum ESPRESSO using two thresholds. One threshold is on the norm of the density: in the prescreening we screen the real-space grid and keep the points for which the norm of the density is higher than the threshold. Then, since we want to further restrict this ensemble, we have included another threshold, which is on the norm of the gradient: we further screen the points, selecting those for which the norm of the gradient is lower than the threshold. This is aimed at picking the peaks of the density. In this plot, the ratio between the localization obtained with the prescreened SCDM algorithm and with the full SCDM algorithm is reported on the y-axis, and the fraction of the grid used is on the x-axis.
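A toy version of this prescreened SCDM step might look like the following (SciPy's pivoted QR standing in for the production implementation; the threshold values and the crude one-dimensional gradient estimate are illustrative only):

```python
import numpy as np
from scipy.linalg import qr

def prescreened_scdm(psi, thr_rho, thr_grad):
    """Prescreened SCDM localization (sketch).
    psi: (n_grid, n_occ) orthonormal orbitals on a flattened real-space grid."""
    rho = np.sum(np.abs(psi) ** 2, axis=1)      # density norm at each grid point
    grad = np.abs(np.gradient(rho))             # crude stand-in for |grad rho|
    mask = (rho > thr_rho) & (grad < thr_grad)  # high density, near a peak
    # pivoted QR on the reduced ensemble of points only
    _, _, piv = qr(psi[mask].conj().T, pivoting=True)
    cols = np.flatnonzero(mask)[piv[: psi.shape[1]]]
    # SCDM rotation: unitary polar factor of psi at the selected points
    u, _, vt = np.linalg.svd(psi[cols], full_matrices=False)
    return psi @ (u @ vt).conj().T              # localized, still orthonormal
```

The point of the prescreening is that the QR decomposition, the expensive step, now acts on a matrix with only the selected rows instead of the full real-space grid.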
Here is the dimension of the matrix: the fraction means the fraction of the matrix which has been used for the decomposition, and here is the localization which is achieved. We can see that for this system, using just 1% of the real-space grid points, we can achieve 100% of the localization. So basically the localization doesn't vary a lot, even though we are using a very, very small fraction of the total number of points. The reason is that the information contained in the large matrix is just the number of occupied states: no matter how dense the grid we choose to represent the orbitals is, the rank of the matrix is just the number of electrons, or occupied states. So this is the point. Using this prescreened algorithm we can go towards very large systems. Here for example we have a water cluster of about 700 atoms, that is, 240 water molecules. The total real-space grid is 11 million points, and we could perform the localization using only 3,000 points. So with this algorithm we can perform fast orbital localization in an efficient way for large systems. Let's move on. We have a way to localize the orbitals; now we have to project the exchange operator onto the manifold of localized orbitals. This is very easy indeed, because we can exploit the ACE formalism for this purpose. The ACE method has been implemented in recent years in order to speed up the evaluation of the SCF with hybrids. It is basically a projection procedure. The point is that in the SCF we need to evaluate the Fock potential many times, in all the iterations. So instead of evaluating the exact potential every time, we can build a projected operator and use the projected operator rather than the exact one in the SCF. The application of the projected operator is much cheaper than the application of the exact one.
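The projection just mentioned can be summarized as follows, in the notation of Lin Lin's ACE construction (my summary, not the talk's slides):

```latex
% apply the exact exchange once to the occupied orbitals
W_k(\mathbf{r}) = \left(\hat{V}_x\,\varphi_k\right)(\mathbf{r}),
\qquad
M_{kl} = \langle \varphi_k | W_l \rangle ,

% Cholesky factorization of -M (M is negative definite)
M = -L L^{\dagger},
\qquad
\xi_k = \sum_l W_l \left[(L^{\dagger})^{-1}\right]_{lk} ,

% adaptively compressed exchange operator
\hat{V}_x^{\rm ACE} = -\sum_k |\xi_k\rangle\langle\xi_k| ,
\qquad
\hat{V}_x^{\rm ACE}\,\varphi_k = \hat{V}_x\,\varphi_k .
```

Since $\hat{V}_x^{\rm ACE} = W M^{-1} W^{\dagger}$ is invariant under a unitary rotation of the orbitals, building it from a rotated set gives the same operator on the occupied manifold.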
When we use a projected operator we should always be very careful about what we are doing, because of course what matters is the manifold, the space onto which the projection is done. In our case the operator is projected onto the Kohn-Sham manifold, the orbitals with which we are solving the SCF; but since the canonical and localized orbitals are related by a unitary rotation, the manifold is the same. So basically we can use the same computational framework as the ACE, but projecting the ACE operator onto the localized manifold, and in practice this is equivalent to projecting onto the canonical orbitals. So we can just construct the ACE operator using localized orbitals and apply it to canonical orbitals to solve the Kohn-Sham equations. In order to stress the fact that the ACE operator is computed using localized orbitals, I just put an L here, to distinguish the method for later convenience. So this L-ACE means an ACE where the projection is done on localized orbitals. Now we have a localized representation, and we know how to project the exchange operator onto such a set of orbitals. We need a criterion to truncate the exchange integrals. The point is that now we have to compute this exchange potential using localized orbitals, and we know that most of those orbitals are far enough apart that the product density is vanishing in many cases, especially for large systems. So we have to evaluate how much two orbitals overlap with each other. If the overlap is very small, then the electrostatic potential is also very small and it can be neglected. In order to evaluate the overlap between two localized orbitals we have used this integral, which is the absolute overlap. If I remove the absolute value, this integral is just 0 or 1 because of the orthonormalization condition; if I put the absolute value, then this integral measures how much the two distributions overlap with each other.
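The absolute-overlap criterion can be sketched in a few lines (function names and the default threshold are illustrative, chosen here only to mirror the values quoted in the talk):

```python
import numpy as np

def absolute_overlap(phi, dv):
    """S_abs[i, j] = integral of |phi_i(r)| * |phi_j(r)| over the grid.
    phi: (n_grid, n_occ) localized orbitals; dv: grid volume element.
    Without the absolute values this would just be the identity matrix."""
    a = np.abs(phi)
    return dv * (a.T @ a)

def pairs_to_keep(phi, dv, eps=1e-3):
    """Boolean mask of the orbital pairs whose exchange term is evaluated."""
    return absolute_overlap(phi, dv) > eps
```

For orthonormal orbitals the diagonal of `S_abs` is exactly 1, so the diagonal pairs are always kept; only the weakly overlapping off-diagonal pairs get screened out.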
So if these integrals are 0, or close to 0, we can assume that the two distributions are not overlapping, so we can neglect the pair in the exchange potential and save the time of evaluating the corresponding potential. Here these integrals are plotted as a function of the distance between the centroids of the two involved orbitals. We can notice that for canonical orbitals the correlation is very low, because the canonical orbitals are delocalized, so in most cases the absolute overlap is 1. However, if we localize the orbitals, then we see a correlation between the value of the absolute overlap and the distance. So we can set a threshold and decide to cut the exchange potential whenever the absolute overlap is lower than this threshold. Even using a very small threshold like 0.001, we are skipping more than half of the pairs of orbitals, and this is a significant computational speed-up in the evaluation of the potential. So all this methodology has been implemented in Quantum ESPRESSO using three thresholds. Two thresholds are related to the SCDM localization: tuning them means varying the size of the matrix on which the QR decomposition is done. The third threshold is the really important parameter, because it tells us how many orbital pairs are neglected in the calculation of the exchange potential. So while the first two thresholds just tune the localization of the orbitals, this one is really related to the accuracy of what we are doing, because it is the approximation we are making. So let's see how this method works for real systems, for real benchmark calculations. Using for example 80 CPUs, here we have a cluster of water molecules, and we see that with the old code, without even the ACE, the computational time explodes. Using the ACE, 100 water molecules cost about 1,000 minutes.
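Putting the screening to work, the truncated build of the exchange potential is conceptually just a double loop that skips the screened pairs (a schematic sketch; `pair_potential` stands for the FFT-based Poisson solve discussed earlier, and the names are mine):

```python
import numpy as np

def truncated_exchange(phi, keep, pair_potential):
    """Apply the exchange using only the orbital pairs flagged in `keep`.
    phi: (n_grid, n_occ); keep: (n_occ, n_occ) boolean screening mask;
    pair_potential(a, b): electrostatic potential of the pair density a*b."""
    n_g, n = phi.shape
    vphi = np.zeros_like(phi)
    n_kept = 0
    for i in range(n):
        for j in range(n):
            if not keep[i, j]:
                continue                      # screened pair: skip the solve
            v_ij = pair_potential(phi[:, i], phi[:, j])
            vphi[:, i] -= v_ij * phi[:, j]    # accumulate -sum_j v_ij(r) phi_j(r)
            n_kept += 1
    return vphi, n_kept / n**2                # potential, fraction of pairs kept
```

The returned fraction is the quantity quoted in the benchmarks below (e.g. only 16 percent of pairs included), and the saved Poisson solves are where the speed-up comes from.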
If we use the thresholds, we see that the fraction of orbital pairs included in this case is just 16 percent, and the computational time is really, really lower. The accuracy of the method, on the other side, is still quite high. I have tried the three main aspects: the energetics, the electronic structure and the forces. Regarding the energetics, I have calculated the binding energy of these water clusters, and we see that the errors of the L-ACE, with different thresholds, with respect to the exact method are really, really low. Kcal per mole is quite a large unit, and these errors are of the order of 10^-3 kcal/mol, which is really, really small. On the other side, the position of the HOMO state is practically the same even with the truncated potential. Here we have the distribution of the errors on the forces. We see that using a threshold of 0.001 the error on the forces is of the order of 10^-6 Ry/a.u. And I think the important thing is that, increasing the threshold, of course the variance of the error increases, because we are making more approximations, but the distribution is still centered: it means that we are not introducing any overall translation and the method is still consistent. So let's try to go towards larger systems. I have tried with more computational resources, 576 processors, to treat larger systems: here we have the water cluster of about 700 atoms. Again the accuracy is really the same: this is the position of the HOMO with the approximate and the full method, and this is the binding energy. Notice that the full method here takes 25 hours for an SCF, while the truncated method takes just 2 hours. I tried to do the same for nanoparticles: in this case I tried this titanium dioxide nanoparticle of 2,250 atoms, and the full calculation with ACE took 11 hours, while with the truncated potential it was just 3 hours. And here we see the comparison of the density of states over a wide range of energies.
This comparison is done for the smaller cluster. We see that we cannot distinguish the black line, where the exact density of states is plotted, from the red line, where the truncated potential is used. As a final test case, here we have a silicon nanoparticle; the biggest one is 1,000 atoms, 15 angstrom radius. Of course for silicon the cutoff used for the plane waves is much lower than for the other test cases. Anyway, in this case the full ACE method took 16 hours of computational time, while the truncated L-ACE takes just 4 hours, and again the error on the gap is still really reasonable, and also the error on the total force is still reasonable. So let me just summarize what we have obtained. I guess I can say that the method provides quite a high accuracy and also a good efficiency, because usually in the test cases we have found computational times 3 to 5 times faster than using the full ACE method. Another advantage, I think, is that it is suitable for any computational facility, because the computational speed-up depends on the system you are treating, not on the number of processes or on the parallelization. Of course we have drawbacks, because the method is still approximate, and we have to remember that if we want to use it; and some thresholds need to be tuned in order to set the localization and the truncation of the exchange potential. At the moment this has been implemented for the Gamma-only case, which is the most relevant one because it is the case where the cells are larger, and also for spin-polarized calculations; this is a really recent development which has been done in the last few days.
Let me also say here that reduced exchange grids are also compatible with the truncated potential: basically we are combining the computational speed-up of the truncated potential, based on the localization, with the computational speed-up given by using a lower cutoff for the exchange potential. So we can provide three cutoffs for the plane waves: one for the wave functions, one for the exchange potential and one for the density; the cutoff for the Fock operator is not bound to be the same as for the density, and we can use a lower one. From the test cases it seems that this works for nanoparticles or fragmented systems. A point for future developments could be the extension to k-points, in particular for large surfaces. Further improvements are also possible for the pairs which are included: the electrostatic potential can be computed in real space using a Poisson solver on reduced grids, exploiting the fact that the orbitals are localized, or also using FFTs on reduced boxes. Finally, this can also be combined with other parallelization strategies; at the moment only the g-vector parallelization is implemented. So let me just acknowledge the people I have worked with: Stefano Baroni and Paolo Giannozzi, with whom I have done all these developments, but also the people with whom I have had really useful discussions: Stefano de Gironcoli and Pietro Delugas about the Quantum ESPRESSO code, and of course Lin Lin regarding the ACE and the SCDM approach. Thank you for your attention. Thanks.