So I'm not able to present in person. I had an issue with my visa, so I couldn't attend, but thanks to everyone who has joined, both in person and virtually. First, I'll go through the AI profile. Can you go to the next slide, please? Thank you.

Traceability and accountability have been at the forefront of a lot of new AI development, especially with the rise of large language models and AI taking center stage. Many of the big AI models, including Copilot and ChatGPT, are being sued for copyright infringement, and there seem to be a lot of ethical violations coming to light. Much of this can be attributed to the fact that, for many of the models being made public, the information needed to assess these issues remains private. Can you go to the next slide, please?

A recent article by CISA says that software should be secure by design, and artificial intelligence is, at bottom, software. So there needs to be something like a software bill of materials even for AI software. There are two things to note here. AI models are often not standalone: in a complex system, an AI model acts as one component of a larger piece of software, which means we have to be able to describe the whole software while also being able to describe the AI-specific properties. This is what has even the US military starting to think about AI BOMs. The problem is that many software bills of materials, by themselves, don't have all the capabilities needed to describe software that has an AI component, or some details of the AI component cannot be captured. So we came up with the AI and dataset profiles, which I'll explain in a bit, so that all the components, the ethical implications, the datasets used, and the AI-specific details can be described, and the associated compliance issues and vulnerabilities can be highlighted. Can you go to the next slide, please?

There is another aspect of SBOMs that is often overlooked when it comes to the AI space. There are several documentation methods, like model cards, though they are not standards. If we want to do something like identifying vulnerabilities or ensuring compliance at scale, things need to remain machine readable; only then can much of this be done automatically. For instance, the slide I'm showing is a study done at Stanford, where they took many of the current state-of-the-art large language models and assessed them for compliance across several aspects highlighted by the EU AI regulation. They did the study manually, by analyzing the models' model cards. With a standard like SPDX able to describe an AI profile, we would be able to do a lot of this automatically, and that is eventually the goal we are working towards. The key point I want to convey is that we need a standard that captures all the data required for such compliance analysis, and it has to be machine readable.
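To make that concrete, here is a minimal sketch of the kind of check that becomes trivial once this metadata is structured rather than prose. The field names are illustrative, loosely following the AI profile fields discussed later in this talk rather than the exact SPDX 3.0 property spellings:

```python
# Illustrative field names; the exact SPDX 3.0 AI-profile property
# spellings may differ from these.
COMPLIANCE_RELEVANT_FIELDS = [
    "energyConsumption",                # environmental impact
    "standardCompliance",               # standards the model claims to meet
    "safetyRiskAssessment",             # e.g. EU AI Act risk category
    "limitation",                       # documented limitations
    "useSensitivePersonalInformation",  # privacy-relevant flag
]

def missing_compliance_fields(ai_package: dict) -> list[str]:
    """Return the compliance-relevant fields absent from an AI package record."""
    return [field for field in COMPLIANCE_RELEVANT_FIELDS
            if not ai_package.get(field)]

# A record with gaps that a manual model-card review would otherwise
# have to discover by reading prose:
record = {"name": "example-model", "standardCompliance": ["ISO/IEC 42001"]}
print(missing_compliance_fields(record))
# ['energyConsumption', 'safetyRiskAssessment', 'limitation',
#  'useSensitivePersonalInformation']
```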
Can you please go to the next slide? As Rose was describing earlier, a lot of these details have been added as profiles, and we have two working groups: one covers AI and datasets, which actually meet together, and first I'll talk about the AI profile. Can you please go to the next slide?

Okay, so just a quick description of what the AI profile is supposed to do. The AI profile adds on top of the core and software profiles, as I was saying earlier: AI components usually live alongside software components, so this profile builds on the core profile and even borrows some of its elements. The goal of the profile is to enable traceability and transparency for all the components of AI software. One other important thing to note, particularly with AI, is that we also need to capture the process used to build these components. In the final software, a deployed AI model embodies a lot of earlier decisions: for example, which dataset it was trained on, what preprocessing steps were applied, how bias was mitigated, how noise was addressed. When we describe just the software, these details are missed. So one of the core ideas in this profile is to capture some of these processes too, along with the risks and uncertainties. In other words, we want the bill of materials to describe not only the components but also the process, so that end-to-end compliance and any vulnerability arising from the pipeline itself can be captured and analyzed. Go to the next slide, please.

This applies to both the AI and dataset profiles, but for now, the AI profile has four key parts. First, the AI profile has properties specific to itself, which describe just the AI software. Second, there are external property restrictions: we want to reuse as much as possible from existing profiles rather than reinvent the wheel, so these describe the properties borrowed from other SPDX profiles. Third, there are relationships, which connect profiles together, primarily the AI and dataset profiles, and also the AI and licensing profiles; wherever elements of other profiles need to be described, we use a relationship. Fourth, the profile has both required and optional fields: the required fields are the minimal set needed to describe an AI component in an SPDX SBOM, whereas the optional fields add further detail. Please go to the next slide.

Before I explain the individual fields of the profile, one key question might be: why not model cards? Model cards have been used to describe many AI models on platforms like Hugging Face and other marketplaces. But the key reason a model card is not enough, and an SBOM is needed, is that model cards have limited scope: they describe only the AI model and not the surrounding software, and they often oversimplify the software aspect. They do not capture interdependencies across components, they have no dedicated fields to express security, compliance, or versioning, and the way they capture environmental impact is minimal. And as for the ethical and social implications, while they have fields for intended use and some limitations, they don't have fields that cover all the relevant aspects.
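As a contrast with a model card, here is a schematic sketch of the four-part structure just described: an AI package that borrows core properties, adds AI-specific ones, and points at dataset and licensing elements through relationships. This is illustrative pseudo-serialization in Python, not the normative SPDX 3.0 JSON-LD, and the relationship type names are assumptions based on the talk:

```python
# Illustrative pseudo-serialization, not normative SPDX 3.0 JSON-LD.
ai_package = {
    "spdxId": "urn:example:ai-model-1",
    # (2) external property restrictions: fields borrowed from the
    # core/software profiles
    "name": "example-model",
    "packageVersion": "1.0",
    "downloadLocation": "https://example.org/model",
    # (1) AI-profile-specific properties; the required ones form the
    # minimal set for an SPDX SBOM, the rest are optional (4)
    "typeOfModel": "transformer",
    "limitation": "Not for safety-critical decision making.",
}

# (3) relationships connect this element to other profiles; the type
# names here are assumptions based on the talk's description.
relationships = [
    {"from": "urn:example:ai-model-1", "type": "trainedOn",
     "to": "urn:example:dataset-1"},     # AI profile -> dataset profile
    {"from": "urn:example:ai-model-1", "type": "hasDeclaredLicense",
     "to": "urn:example:license-1"},     # AI profile -> licensing profile
]
```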
Can we go to the next slide, please? To be more specific, among the properties in the AI profile, we borrow suppliedBy, downloadLocation, packageVersion, primaryPurpose, and releaseTime from the core and software profiles. Then, specific to the AI profile, we capture the energy consumption, the standards compliance (which captures the standards the model complies with), the limitations, and the type of model. We also capture information about the training and the application of the model, along with the metrics and the metric decision thresholds; these go back to the point I was making earlier, that we capture not only the details of the software component itself but also the process associated with it. Further on the properties side, we capture the hyperparameters, the metrics, and the decision thresholds used on those metrics. And to support compliance with regulations like the EU AI Act, we capture whether sensitive personal information is used and what the criticality level is, and we have fields like safety risk assessment, autonomy type, and whether the model is explainable, which is one key aspect of the EU AI Act. These are the fields we use to describe the AI properties. Can you go to the next slide, please?

Here is a worked example with the famous large language model Llama 2. The problem currently is that describing an AI model requires a lot of details that are not readily available for many existing models, but as you can see here, we are able to describe an AI model when the details are available, and these details can reasonably be expected to be captured. The dataset it was trained on is expressed through a relationship to a dataset-profile element, and the license can likewise be expressed as a relationship.
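For a rough sense of how such a worked example reads as data, here is a sketch along the lines of the Llama 2 slide. The values are placeholders or public approximations, not the authoritative figures from the slide, and the property names follow the talk's wording rather than the exact SPDX 3.0 spelling:

```python
# Placeholder values throughout; property names follow the talk's wording.
llama2_record = {
    # borrowed from the core/software profiles
    "name": "Llama-2",
    "packageVersion": "2",
    "primaryPurpose": "model",
    "suppliedBy": "Meta",
    "releaseTime": "2023-07-18",
    "downloadLocation": "https://ai.meta.com/llama/",  # assumed URL
    # AI-profile-specific properties
    "typeOfModel": "transformer",
    "hyperparameter": {"contextLength": "4096"},   # illustrative subset
    "metric": {"accuracy": "..."},                 # left elided
    "metricDecisionThreshold": {"accuracy": "..."},
    "safetyRiskAssessment": "...",                 # placeholder
    "autonomyType": "no",
    "informationAboutTraining": "Pretrained on publicly available data.",
}
# The training dataset and the declared license are not inlined: they are
# expressed as relationships to dataset-profile and licensing-profile
# elements, as in the earlier structural sketch.
```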
Can you go to the next slide, please? At this point, I want to bring our attention to some of the challenges and next steps we want to focus on. One key gap right now is that we don't have a fully worked-out, integrated SBOM in which the AI profile sits on top of the software profile and the whole system is described end to end; this is something we as a community would welcome contributions around. Another key problem, which is close to the model cards problem, is that many of our fields, since they also capture processes, are very textual, and we need better machine-readable representations; any contributions along those lines would be very welcome. And since the profile is fairly new, we don't yet have much of the tooling support required for automatic comprehension and analysis, so support from the tooling group would be amazing here. Finally, we don't have a lot of user feedback yet, and any steps towards uptake would really help. Can you go to the next slide, please? Okay, that was about the AI profile.

Next, I'll describe the dataset profile, and then my colleague Gaukan will highlight some community examples. The dataset profile, much like the AI profile, is used to describe datasets and to highlight the problems and challenges associated with them. One of the key things we do here, which is not specific to the AI profile, is capture the lineage and provenance of a dataset. A dataset used to train an AI model is typically composed of data from multiple sources, and sometimes these datasets have a long lineage: one dataset is derived from several others, which were in turn assembled from many different data sources. It is important for this whole chain to be traced and described, with the bias at every step noted and the vulnerabilities at every step noted. In addition, these datasets can exist without being linked to any AI model, so we have a separate profile that captures all of this. Can you go to the next slide, please?

I won't go too deep into this slide: it shows the structure, with dataset-specific properties and fields borrowed from the core profile, similar to the AI profile; we kept a similar structure to stay consistent. Can you go to the next slide, please?

Here too, there is another documentation approach: datasheets. The key reason we don't treat datasheets as the be-all and end-all is that they lack detailed metadata. They don't capture the full lineage, that is, all the datasets the given data was derived from; although they mention the data collection process, they don't capture the full provenance. They pay no specific attention to versioning details or dependencies, and privacy and security information is not fully captured. Finally, although they describe some biases that may exist in the data, they have no specific fields that explicitly capture all the bias or noise information. So what we actually did, for both the AI and dataset profiles, was go through the existing approaches, model cards, datasheets, fact sheets, and as a community we sat down and went through all of their fields to see which ones needed to be incorporated into the bill of materials, and then analyzed which fields were still missing and included those in the profiles. Can you go to the next slide, please?

These are the properties we ended up with. On the left, I show the properties borrowed from other profiles, and on the right, as you can see, we have the intended use, the size of the dataset, and the noise that is or might be associated with it. If the data was collected automatically, we capture the sensors involved. For the process, we capture the data preprocessing steps and the dataset collection process. And for privacy, we capture whether sensitive personal information is used, the confidentiality level of the dataset, and the anonymization method used. We capture all these different dimensions so that transparency, auditability, and traceability can be ensured.
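As a sketch of what that lineage capture buys us, here is how a dataset's derivation chain might be walked back to its original sources once each derivation step is recorded as a relationship between dataset elements. The names and fields are illustrative, following the talk's description:

```python
# Illustrative dataset elements; field names follow the talk's wording.
datasets = {
    "urn:example:web-crawl": {"name": "web-crawl",
                              "knownBias": "unreviewed"},
    "urn:example:cleaned":   {"name": "cleaned",
                              "dataPreprocessing": "dedup, filtering"},
    "urn:example:train-set": {"name": "train-set",
                              "intendedUse": "LLM pretraining"},
}

# Each entry records one derivation step: derived dataset -> source.
derived_from = {
    "urn:example:cleaned": "urn:example:web-crawl",
    "urn:example:train-set": "urn:example:cleaned",
}

def lineage(dataset_id: str) -> list[str]:
    """Walk the derivation chain back to the original source dataset."""
    chain = [dataset_id]
    while chain[-1] in derived_from:
        chain.append(derived_from[chain[-1]])
    return chain

print(lineage("urn:example:train-set"))
# ['urn:example:train-set', 'urn:example:cleaned', 'urn:example:web-crawl']
```

With the chain explicit, the bias and vulnerability notes recorded at every step can be audited together rather than only at the final dataset.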
Can you go to the next slide, please? On the right, the slide shows a worked example of a dataset. Through OpenDataology, an open source LF AI project, we have now captured metadata for over 37,000 datasets using the SPDX AI and dataset profiles, and it is publicly accessible, so these dataset profiles can readily be used for compliance analysis and for checking other problems associated with the data. Now my colleague Gaukan will explain some of the... oh, sorry, I forgot about this slide. Similarly, the next steps and challenges we want to address here are these: we still don't have a fully integrated SBOM that works out the AI and dataset components together; as with the AI profile, many of the fields are textual right now, and we need to move beyond free text so they become more machine readable; we need support from the tooling group to be able to consume these profiles and integrate them as part of the SBOM; and we need more feedback. And now, over to Gaukan.