Okay, so we move to our next speaker, Junfeng Qiao from EPFL, who will build on what Valerio said and show you how you can use the SCDM method implemented in Wannier90, using the ideas Valerio showed you on projectabilities and fitting to extract optimal parameters for SCDM, and how all of this is embedded into AiiDA to deliver a high-throughput, automated Wannierization workflow. So thanks a lot, Junfeng, the floor is yours.

Thank you, Antimo. Hello everyone, I'm Junfeng Qiao from EPFL. Today I'm going to talk about AiiDA, which is a computational infrastructure for automated workflows and data provenance.

I guess you have a rough feeling for the Wannierization procedure: we have to go through the self-consistent calculation, the non-self-consistent calculation, all these steps, up to the final maximally-localized Wannier functions. In each step, we have to manually edit the input files and look at all the output files to make sure the calculations are running smoothly. Even more, if I change the structure, I need to redo everything. This looks a bit cumbersome. Imagine instead a tool that can automatically generate inputs from a user-provided, arbitrary structure, and can submit pw.x and all these executables to the local machine or a remote HPC cluster. It can recover from errors, whether due to some hardware issue or some non-optimal parameters. And it can track and query all the inputs and outputs, so we have full reproducibility of all our calculations, and also data management. If we had this tool, we could save a lot of human labor and a lot of time to do research. AiiDA is the tool that comes to the rescue. AiiDA is the Automated Interactive Infrastructure and Database for computational science. Essentially, it is a computational-science infrastructure for high-throughput workflows with full data provenance. AiiDA itself is written in Python, so it's easy to use.
It is MIT-licensed and fully open source on GitHub, so everyone can have a look at the source code, open pull requests, and make it better. The main feature of AiiDA is its scalable workflow engine, meaning that you can submit anywhere from a few calculations to hundreds of calculations simultaneously. It has built-in support for high-performance computing: it can handle the SSH connection, and upload files to and download files from the remote cluster. Moreover, it automatically generates the full data provenance, to make sure your calculations are reproducible even after several years. It has a flexible plugin system, so that you can easily integrate third-party codes into your AiiDA ecosystem. In short, AiiDA is a tool to reduce the human labor of running a single workflow, and, most importantly, it is particularly suitable for high-throughput calculations.

Before you start using AiiDA, I'd like to clarify a bit the concepts behind the software. Suppose we have a single calculation. If we want to fully reproduce that calculation, we have to store the calculation itself, store all its inputs, and store all the generated outputs. Moreover, we have to store the interconnections between these entities. If we reuse these intermediate calculations, we get a directed graph. For example, if we start from a structure, we can run a self-consistent calculation on it. We can also run a relax calculation to obtain a relaxed structure, and we can compare the distance between the original structure and the relaxed structure to see the volume change. Then we can run a self-consistent calculation on the relaxed structure. With these several calculations, the nodes become a network; it's a directed graph. And this network grows quickly in complexity, even for simple workflows. For example, for this molecular-dynamics study of lithium in a solid electrolyte, it might look simple when you start.
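The structure → relax → SCF chain just described forms a small directed graph. The following is a toy sketch of that idea (the node names and the dictionary representation are invented for illustration; they are not AiiDA's real data model, which stores nodes and links in a database):

```python
# Toy sketch of a provenance graph: calculations and data are nodes,
# and input/output relationships are directed edges.
# Each entry maps a node to the nodes that directly produced it.
edges = {
    "scf_1": ["structure"],
    "relax": ["structure"],
    "relaxed_structure": ["relax"],
    "scf_2": ["relaxed_structure"],
    "charge_density": ["scf_2"],
}

def ancestors(node):
    """Return every node that contributed to producing `node`."""
    seen = set()
    stack = list(edges.get(node, []))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(edges.get(parent, []))
    return seen

# To reproduce the charge density, we need its full input history:
print(sorted(ancestors("charge_density")))
# ['relax', 'relaxed_structure', 'scf_2', 'structure']
```

Traversing the edges backwards recovers everything needed to reproduce a result, which is exactly why storing the interconnections, not just the files, matters.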
But if you want to fully reproduce all your calculations, you have to store a rather complex network. And for larger databases like this one, with thousands of calculations, you can see that the AiiDA provenance graph is quite complex. For a network of this scale, it's very difficult for a human brain to remember everything. That's why we need a tool like AiiDA to automatically generate this provenance graph.

So how does AiiDA automatically generate the provenance graph? In AiiDA, we track the provenance from the moment of data creation. Instead of a simple Python type like the integer 5, we use an AiiDA type, aiida.orm.Int. This might seem a little complex, but essentially, by using AiiDA types, AiiDA itself can automatically track the provenance of the data. For example, when we pass this Int(5) as the input to a calculation, AiiDA will automatically track the input and track the generated output. That's why AiiDA can automatically generate the provenance graph.

This might still be a little abstract, so let's look at some practical code to understand provenance tracking a bit more. Suppose we want to calculate 3 plus 4, times 5. This is quite simple. If we do this in Python with functions, although that is not the most straightforward way, we can define three functions: first an addition, then a multiplication, and then an add_multiply that calls these two functions and returns the result. So we have this add_multiply function to do the calculation and get the final value. In AiiDA, we put Python decorators before these functions. The @calcfunction decorator essentially turns a simple Python function into an AiiDA calcfunction. With the AiiDA calcfunction on the right, you can see that AiiDA automatically tracks the provenance: we have this orm.Int, this 5; this is the AiiDA type.
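To make this concrete without requiring an AiiDA installation, here is a toy, plain-Python sketch of what such a decorator conceptually does. This is not the real AiiDA implementation (the real decorators are `aiida.engine.calcfunction` and `workfunction`, and they store nodes in a database rather than in a list); the `PROVENANCE` log here is invented for illustration:

```python
# Toy sketch of decorator-based provenance tracking (NOT real AiiDA code).
PROVENANCE = []  # global log of (function name, inputs, created output)

def calcfunction(func):
    """Record inputs and the created output of a data-producing step."""
    def wrapper(*args):
        result = func(*args)
        PROVENANCE.append((func.__name__, args, result))
        return result
    return wrapper

@calcfunction
def add(x, y):
    return x + y

@calcfunction
def multiply(x, y):
    return x * y

def add_multiply(x, y, z):
    """Plays the role of a workfunction: it orchestrates the two
    calcfunctions but creates no data of its own."""
    return multiply(add(x, y), z)

result = add_multiply(3, 4, 5)
print(result)  # 35
for name, inputs, output in PROVENANCE:
    print(name, inputs, "->", output)
```

Running this computes (3 + 4) × 5 = 35 while the log records that `add` produced 7 from (3, 4) and `multiply` produced 35 from (7, 5), which is the essence of the provenance graph AiiDA builds automatically.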
So when we pass this AiiDA type to the workfunction, it calls the calcfunctions and generates all the provenance, up to the final output. Essentially, you only need to add the @calcfunction and @workfunction decorators to turn your ordinary Python functions into AiiDA calcfunctions, and you get this automated provenance tracking almost for free.

For a more complex case, you can use a Python class: you create your own subclass of the WorkChain class provided by AiiDA. Essentially, in a define method you declare the inputs of your WorkChain, and you write down the procedure of your calculation: you go from setting up some initial values, then you validate the inputs; if you have relaxation settings, you do the relaxation, and if not you just skip it; then you run SeeK-path to generate the primitive cell, run the self-consistent calculation and the bands calculation, and finally output all the results. This essentially models all your manual calculation steps.

So for a more complex case, say we want to calculate a band structure. We start from the crystal structure and launch an AiiDA workflow; it will automatically generate the provenance graph, and we get the final band structure. This automatically generated provenance graph enables us to reproduce a specific calculation; it is like a log of what happened in the past. As for the AiiDA workflow engine, it is a flexible Python interface to encode our complex scientific steps, like the WorkChain class I showed before. The workflow engine provides robustness against SSH connection drops, and also allows us to easily implement error handlers for common errors, like running out of walltime. And it has the additional bonus that it automatically tracks the provenance to ensure full reproducibility. As an example, we have the AiiDA Quantum ESPRESSO plugin.
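The WorkChain outline just described can be sketched as a toy class in plain Python. This is not the real `aiida.engine.WorkChain` API (which uses `define`, `spec`, and an `outline` with `if_()` conditionals); the class and step names here are invented, and each "step" just logs itself instead of launching a calculation:

```python
# Toy sketch of a WorkChain-style outline (NOT the real AiiDA WorkChain API).
# Steps run in a fixed order, share state through a context dict, and one
# step is conditional, mirroring the band-structure outline described above.
class BandsOutlineSketch:
    def __init__(self, inputs):
        self.ctx = {"inputs": inputs, "log": []}

    def _step(self, name):
        self.ctx["log"].append(name)

    def setup(self):            self._step("setup")
    def validate_inputs(self):  self._step("validate_inputs")
    def should_relax(self):     return self.ctx["inputs"].get("relax", False)
    def run_relax(self):        self._step("relax")
    def run_seekpath(self):     self._step("seekpath_primitive_cell")
    def run_scf(self):          self._step("scf")
    def run_bands(self):        self._step("bands")
    def results(self):          self._step("results")

    def run(self):
        self.setup()
        self.validate_inputs()
        if self.should_relax():   # conditional branch, like if_() in AiiDA
            self.run_relax()
        self.run_seekpath()
        self.run_scf()
        self.run_bands()
        self.results()
        return self.ctx["log"]

print(BandsOutlineSketch({"relax": True}).run())
```

With `{"relax": True}` the relax step appears in the log; with no relaxation settings it is skipped, exactly as in the spoken outline.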
So, on the left, this is the relaxation workflow. We start from a crystal structure and run SeeK-path; SeeK-path is a tool that reduces your conventional cell to a primitive cell and finds the standard k-path for the band-structure calculation. After SeeK-path runs, we run a Quantum ESPRESSO relaxation. If it is not converged, we adapt its input and prepare a restart; if it is converged, we run SeeK-path again to find the primitive cell and generate the band-structure k-point path. And finally, we run a self-consistent calculation to get the self-consistent charge density. All these steps are implemented in the PwRelaxWorkChain in the aiida-quantumespresso plugin. We also have the PwBandsWorkChain to automatically calculate the band structure, and also a PhBase workchain to calculate phonon properties. So actually it's very easy to wrap any executable into a corresponding AiiDA workflow.

Apart from aiida-quantumespresso, one can implement various kinds of plugins: for calculations, for data, for parsers, for transports and schedulers, for workflows, and also importers and exporters. The latest number is nearly 80: we have nearly 80 AiiDA plugin packages to integrate different codes into the AiiDA ecosystem. As you can see from this figure, the number of AiiDA plugin packages is steadily increasing with time, so we are building up a community of contributors integrating various codes like Quantum ESPRESSO, VASP, or all-electron codes like WIEN2k. Also, thanks to the abstractions of AiiDA, we now have a common workflow interface: you only need to change several lines of Python, and then you can submit calculations to compute the equation of state using different codes, and compare the results between various flavors of DFT and know their accuracy. So, in short, AiiDA is an infrastructure for calculations.
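The "adapt the input and prepare a restart" logic mentioned above can be sketched as a simple retry loop. Everything here is invented for illustration (the failure condition, the `electron_maxstep` tweak, and the function names); real AiiDA workchains implement this with registered error handlers on the workchain class:

```python
# Toy sketch of restart-with-adapted-inputs error handling
# (NOT the real AiiDA error-handler machinery).
def run_relaxation(params):
    """Stand-in for a QE relaxation: it 'converges' only when enough
    electronic iterations are allowed (an invented failure mode)."""
    return {"converged": params["electron_maxstep"] >= 200}

def relax_with_restarts(params, max_restarts=5):
    for attempt in range(1, max_restarts + 1):
        result = run_relaxation(params)
        if result["converged"]:
            return attempt, params
        # error handler: adapt the input and prepare a restart
        params = dict(params, electron_maxstep=params["electron_maxstep"] * 2)
    raise RuntimeError("relaxation did not converge")

attempts, final = relax_with_restarts({"electron_maxstep": 50})
print(attempts, final["electron_maxstep"])  # 3 200
```

Starting from 50 allowed iterations, the handler doubles the limit on each failure and succeeds on the third attempt; this is the kind of unattended recovery that makes the engine robust for high-throughput runs.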
It has these main features. Workflow automation: we can define complex workflows with advanced error handling, and AiiDA provides an automated, robust, and scalable engine to run the workflows. Data management: AiiDA helps us manage our data and automatically generates the provenance; the stored data is interoperable and portable, and full reproducibility is ensured by storing the provenance graph.

A little bit of behind-the-scenes. As a user, we have three ways to interact with AiiDA: you can use the interactive Python shell, use the verdi command-line interface, or write a Python script to submit your workflow. As a plugin developer, you can write AiiDA plugins for Quantum ESPRESSO or for all kinds of codes. AiiDA itself provides the rest of the stack: it has a daemon to manage your submitted workflows and submit them to the remote cluster, and it provides storage, saving your nodes and their relationships into an SQL database and storing your input and output files in a file repository.

Once you have finished your calculations with AiiDA and you're satisfied with your results, you may want to share your data with others. With AiiDA, you can directly export your AiiDA database and import it on a different computer, or you can share your database on Materials Cloud, an online platform. Each submission to Materials Cloud gets a designated permanent DOI, so you can easily cite your submission. Also, if you are using AiiDA to generate a database, there is a direct link from the Materials Cloud web page into the AiiDA provenance graph. And even if you are not using AiiDA, you can directly submit your inputs and outputs, and they are guaranteed to be stored for at least 10 years after your submission.
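The export/import idea above is, conceptually, serializing the node-and-link graph to a portable file. Here is a toy sketch with an invented two-node database; it is not AiiDA's real archive format (in practice the verdi command-line interface handles exporting and importing for you):

```python
# Toy sketch of exporting/importing a provenance record as a portable
# file (NOT AiiDA's real archive format).
import json

database = {
    "nodes": {"structure": {"type": "data"}, "scf": {"type": "calculation"}},
    "links": [["structure", "scf", "input"]],
}

# "export" on one computer ...
archive = json.dumps(database)

# ... and "import" on another: the graph round-trips intact
restored = json.loads(archive)
print(len(restored["nodes"]), "nodes,", len(restored["links"]), "links")
```

The point is that both the data and the relationships travel together, so the collaborator who imports the archive gets the full provenance, not just a pile of files.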
So, for example, for the material in this paper, from the band structure you can click this AiiDA icon, and it will lead you to the provenance graph. Here we have a small video clip to show you the power of this provenance. We start from the band structure, and we can find its parent Quantum ESPRESSO calculation, and we can find all its inputs and outputs. For example, here we can find the input file, and this is the output file. We can also find the input structure, visualize the crystal structure, and download it. So Materials Cloud provides a platform for you to share your data.

As a summary, there are two core infrastructures. The first is AiiDA, which is like an operating system to manage, automate, and store your simulations and results, while Materials Cloud is an open-science dissemination portal and cloud simulation platform. AiiDA is like git, to manage your source code, while Materials Cloud is like GitHub, for you to share your code repository, but for the computational materials science community. So that's this gentle introduction to AiiDA. I'm very happy to take any questions.

Okay, so there was a question in the chat, but I think Giovanni already answered that in full detail. Yes. So I guess we can take questions from participants present here. Please, again, raise your hand if you want to ask a question. Any other questions, either on Zoom or here? Maybe just, Giovanni, hi, there was just one comment. I don't know if people there can read the question on Zoom; if there are no more questions, I'm happy to read it again and comment on it. It's up to you. No, there was just a question from Andre about the difference between a calcfunction and a workfunction. Maybe you can read aloud what you answered, and that's fine. Okay, I'll go ahead. Yeah. So this is the answer from Giovanni.
The calcfunction and the workfunction are both simple wrappers around Python functions to make them automatically tracked by AiiDA in the provenance graph. The difference is that the first one is a calculation, while the second one, the workfunction, is a workflow. This is a clear distinction in AiiDA: calculations can generate new data; they typically wrap external executables such as Quantum ESPRESSO or Wannier90. Workflows are just orchestrators: they cannot generate new data, but they can call other sub-workflows or calculations, and they return, rather than create, the data generated by the calculations they call. I guess it's better if we look at some concrete examples in the next hour, so we can better understand these concepts.

Okay, so. Yeah. So please, if you could just come here, or just give me the microphone and I'll go there. It's just a short question; the microphone comes to you. I was just wondering if there is any user-friendly debugging procedure, because there are so many steps that are intricately linked with one another. If something goes wrong, for instance if we are Wannierizing and we chose the wrong parameters for the SCDM, is there any way to go back, correct it, and track the changes? Yeah, of course: you can go to any calculation that was submitted by the engine to your remote cluster, go to the scratch folder, and look at all the generated inputs and outputs to debug your calculation, then rerun. It's so easy to set up an instance that one can also use it to develop something into a workflow, and once you're done you throw away the whole database; you know, you can just create a sandbox. Yeah, maybe one additional comment: I fully agree with what both Antimo and Junfeng said. One more thing to explain is that AiiDA automatically tracks all the inputs and all the outputs of a calculation, so it's very easy to go into a workflow you submitted.
You can look at the graph of everything that was called, all the sub-workflows and calculations. So if we submit a Quantum ESPRESSO workflow, one pw.x, one pw2wannier90, one wannier90.x, you can go to each one. And there is a functionality called the report, which tells you all the information: whether there are any messages or warnings from the calculation. You can either go directly to the supercomputer, as Junfeng said, or, without even going there, you can directly ask AiiDA: can you show me the input, can you show me the output of the code, either the raw files you are used to looking at or the parsed ones. All this information is there, and if you share it with collaborators, everything will be there, and if you put it on Materials Cloud, everybody can check it. Okay, any other questions? If not, I think we can stop recording. Yeah.