Okay, good morning, everyone. My name is Bayou. I'm a faculty member at the iSchool, and my research area is natural language processing. This semester I'm teaching the text mining course. I think this ties in nicely with what Jeff was saying about text analysis, because a lot of the examples I'm going to give also come from the natural language processing area. Today's topic is metadata and trustworthy AI. This is actually collaborative research with Professor Jin Chen, who is an expert in metadata.

First of all, I would like to mention that the concept of trustworthiness did not start with AI. It is a long-standing concern for any kind of computer system. However, because of the new possibilities opened up by generative AI technology, there has suddenly been a huge increase in public interest in AI. People in these research areas might feel: we've been doing this for such a long time, why is it suddenly so popular? But this is the new reality, and we have to look at trustworthiness, and its importance, through a new lens.

Okay, I'm going to go low-tech here. Yes, okay. So we have this slide. We're at a turning point with AI, particularly with respect to trustworthiness. There's a quote on trustworthy AI along these lines: we may be able to build a perfectly safe AI system, but how do we prevent somebody from using it for evil purposes?

Now let's look at the concept of trustworthiness from several different perspectives: how academic researchers think about trustworthy AI, and how governments, such as the European Union and the US government, view trustworthiness, because regulation will be an important piece.

This is from a paper published by Jeannette Wing, a leading computer scientist, in which she laid out principles for trustworthy AI. I'm going to use ChatGPT as an example to illustrate those concepts.

The first one is reliability: the AI system does what you want it to do, and does it correctly. For example, if you ask ChatGPT to output citations, as Jeff mentioned, a citation could be real or it could be made up. There's no guarantee that the output is reliable.

The next one is safety: does it do any harm? Have you heard the news story about ChatGPT encouraging someone toward suicide? That is a genuine harm.

Then security. Security is also not a new concept for computer systems, but AI increases its importance. What if the system is under attack, and how would that affect our lives and society?

Then privacy concerns. The more I use ChatGPT, the more I wonder how my data will be used in the model. I checked OpenAI's website, and they claim they will not use user input submitted through the API to train their models. However, have you heard of the memorization problem of large language models? Some of these models can memorize long sequences of tokens. Whether that matters depends on what the sequence is. If it's a quote from Shakespeare, that's okay. But if it's your social security number that somehow ended up in the training data, that's a problem. So privacy is a real concern for these models.

And availability. The world is uneven, and these models are not available to everyone in the same way.
ChatGPT is trained mostly on high-resource languages like English, but there are also low-resource languages in this world, languages spoken by small numbers of people, and it is hard to train language models on them. So in the NLP world, researchers are studying how to use the limited data from low-resource languages to make these models available and useful for them.

There's also the money question. If you have more money, you can pay for better service: GPT-4 is more expensive than ChatGPT. In my limited testing, I think GPT-4 is a little less likely to output fabricated citations than ChatGPT. So you can pay more to get better results, but there are fairness and equity issues there, right?

And lastly, usability: can everybody use it easily? There are people with different kinds of impairments and limitations, and they need accessibility too. Can everybody use these tools in equally effective ways? That's another question.

So these are the issues on the minds of computer scientists, the leading researchers. Now let's see how governments view it. Roughly, governments consider the legal concerns, the whole society cares about the ethical concerns, and we all care about the robustness of the system. Let's start with the guidelines from the European Union; on a later slide I'll show the principles from the US government.

The first requirement is human agency and oversight. This means we want humans to be able to control AI systems rather than letting them run on their own: human in command, human in the loop.

The next one is technical robustness and safety. Robustness here means we want the system to be accurate, reliable, and reproducible. How many of you have encountered a reproducibility problem when using ChatGPT? Today you enter your prompt and get a result; tomorrow you enter the same prompt and the result changes. I've been using ChatGPT to do some annotation for me, and sometimes it really wavers on certain answers. So there is a reproducibility question.

The next principle is privacy and data governance: how do we manage the data and make sure user privacy is protected in AI usage?

The next one is transparency. How do you know how the model came to its decisions or answers? This matters especially when these models are used in critical environments, for example hospitals. If a model is used to help doctors decide who should be treated next when resources are limited, it's a matter of life and death. If the model is opaque to its users, to the doctors, then there is a big ethical question.

Then there's diversity, non-discrimination, and fairness. We have probably all heard about bias in models: bias inherent in the training data can be captured by the model and reflected in its behavior. How do we detect it? Correcting bias is a big question.

The next one is societal and environmental well-being. This includes human well-being. For example, OpenAI has been using annotators around the world, especially in some developing countries, because of the low cost. However, those annotators may have to label sensitive content that can harm their mental health. How are they taken care of? That's a question. Also, building AI models requires a lot of computing power, which means electricity. So there is a huge sustainability question when it comes to AI modeling.
The last one is accountability. Even though we say the AI made a decision, ultimately that decision traces back to humans and institutions, so who should be responsible for those decisions? Think of the case of a self-driving car facing a difficult situation: there's a dog, there's a cat, there's a human being. If it has to pick one, which should it pick? There are a lot of questions about accountability, and these will drive ethical discussions in society as well as legal conversations.

Now let's look at the United States. The US government actually has a pretty similar view on these principles, even though it summarizes them into five major principles.

The first one is safe and effective systems. You could think of the safety part as: do we have a fallback when the AI system breaks? Do we have anything to fall back on, and does it do no harm?

The second one is algorithmic discrimination protections. You have heard of image recognition systems that cannot recognize some faces: the performance is better for some populations and worse for others.

The next one is data privacy again.

Notice and explanation is mostly about models being explainable, especially for critical decisions. The user has the right to know how those decisions were made by the AI, under what principles, and whether they are reasonable or not. One example I think of is companies using algorithms to score the productivity of employees; those scores can be used to demote or even fire someone. There are a lot of questions about how such decisions are reached.

The last one is human alternatives, consideration, and fallback. I think safe and effective systems and fallback connect to each other, depending on how you define those dimensions. But these two frameworks, from the European Union and the US, and the academic one basically share a lot in common, just in different groupings. A lot of the critical concepts are shared.

Now I would like to do a quick poll of everyone in this room. How do you feel about the trustworthiness of AI, given all of this? If you feel very positive, raise your hand. Anyone? Okay, great. And how about people on the other end of the spectrum, who are very much worried? A third of this room? Yeah, about a third of the room feels very worried. How about the rest: maybe you're somewhere in between, but more on the positive side? Okay, quite a few. And how about those on the negative side, but not so pessimistic? A few, but not a lot.

So there are quite a few people who are very worried, probably a third of the room, but I also see a lot of people who are on the positive side, with reservations. I guess I share the sentiment of this crowd. I think there are a lot of efforts underway on the development of trustworthy AI systems. Identifying the problems is part of the solution, even if reading the news about them sometimes makes you feel like you're having a heart attack. This is part of our trajectory toward finding solutions. So now let's look at some ideas about how to make AI trustworthy.
This is a framework we can think with: we can place all of the issues and all of the solutions within it to get a systematic view. This diagram corresponds to the AI Risk Management Framework. It has four components that map onto the AI development cycle.

On the top left is the application context. Before we start to build an AI model, we need to think about its application context. This is the time to think about accountability, data governance, privacy issues, transparency, and so on. It is important to think about all of these trustworthiness questions even before you start to build the model.

The next step: once we decide what we want to work on, we need data. In the data and input part, we have a lot of data quality questions, garbage in, garbage out. At the data stage, how do we manage data quality, privacy concerns, and so on?

With the data, we can move on to the third stage, which is to build the AI models. When we build the AI models, we want to make them explainable, verifiable, and fair, and I'm going to talk a bit more about what that means.

After you build the model, you're ready to deploy it in the field. This is also the time to check safety in the actual use scenario. You may want to run some user studies to make sure the system is easy to use and available, and to ask what the alternatives and fallbacks are. So those are the key trustworthiness concepts that need attention at different stages of an AI project.

Let's look a little further into the first part, the application context. We want to adopt a human-centered approach; for example, you have all heard of human in the loop. We want human involvement in the preparation, when you develop the system, and when you train and test these models. We also need to ask what the ethical restrictions on developing the model are. In the case of ChatGPT and also Bard, the companies actually put a lot of guardrails into the design, and they use human feedback to train these models. Sometimes you encounter situations where the model refuses to answer your question, and you also see people trying to jailbreak it. Those guardrails reflect the companies' concerns about how their systems will be used, and users' concerns too. These things should be planned for when you start to build an AI system.
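To make the guardrail idea concrete, here is a minimal sketch of a design-time check, with a hypothetical blocked-topic list and a stand-in generation function; real systems such as ChatGPT or Bard use trained safety classifiers and human feedback, not a keyword list like this.

```python
# Minimal sketch of a design-time guardrail: screen the prompt before the
# model sees it, and refuse with a fixed message if it matches a blocked
# topic. The topic list and the stand-in model below are hypothetical.

BLOCKED_TOPICS = ["build a weapon", "self-harm instructions"]

def guarded_generate(prompt: str, generate) -> str:
    """Call `generate` (any text-generation function) only if the prompt passes the check."""
    lowered = prompt.lower()
    if any(topic in lowered for topic in BLOCKED_TOPICS):
        return "I can't help with that request."  # the refusal behavior users encounter
    return generate(prompt)

if __name__ == "__main__":
    echo_model = lambda p: f"[model answer to: {p}]"
    print(guarded_generate("How do I bake bread?", echo_model))             # answered
    print(guarded_generate("Give me self-harm instructions.", echo_model))  # refused
```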
I would also like to emphasize the concept of respect for human autonomy. How often do you feel like you're being manipulated by an algorithm? Yeah, can you give an example of when you feel that way? [Audience:] Like when you're shopping on Amazon: when you go to buy something, you end up buying more than you wanted, just because they showed it to you. You weren't thinking about it before, and then there it is. Yeah, the recommender system. I mean, I might like both products, but I didn't set out to buy both, right? So there are all kinds of recommender systems, on social media, news websites, shopping websites, trying to influence you. And the question is: how much choice do you have to turn that influence off? Those are the choices at stake.

One project that I did was detecting health advice in research papers. My goal was to help average people like me: even though I don't have a lot of medical knowledge, I want to know what researchers are saying about, say, a particular treatment. I want to know more about that. But when we went to publish it, we found that the research community now has ethical review: they want to see whether a project or any potential product could cause harm, so you have to think ahead. Honestly, I had not thought about it that much. I thought these were all research papers, publicly available, so what harm could there be? But the review actually helped me a lot to understand things I had never thought about. For example, some advice might be outdated because it comes from old papers; after 20 years we may have a much better understanding of a disease or a drug or a certain treatment, so the advice needs updating. But when we just retrieve advice from individual papers, people like me cannot tell which advice is up to date and which is not. So how do you give users the ability to differentiate, or at least warn them that they should not apply this advice to their health condition right away, and should consult a doctor first? These things have to be considered. And also, do you allow the authors of those papers to take their advice out of your system if they think it is outdated? Do you give them that option? That is part of the design of the system, not of the model itself, but of the whole system around it. So I think in data science training we should push our students to think more about the application context before they even start to develop, for example, a course project.

Okay. The next important piece in the cycle is data quality. When you gather data to build AI models, there are a lot of data quality concerns, and the quality of the model largely depends on the quality of your data. If you have messy data, say you're training a model to identify birds or other animals and some birds are labeled as helicopters, the model will learn those errors. And there are systematic studies showing that if you train a model on high-quality data, its performance is way better.

Here I'd like to give a quick example about sentiment analysis, because many of you do that in your course projects, right? You probably want to ask how the training data was obtained: who gave the labels? You can check the data description to see how they were obtained. Some were obtained through Amazon Mechanical Turk, by just asking people to label. But people have different opinions on sentiment: some annotators are very conservative and label almost everything neutral, while others differ on the level of positivity or negativity. And sometimes dataset builders just use convenient proxy labels like emojis: a tweet with a happy-face emoji is labeled positive, one with a sad face negative, et cetera. So there are actually a lot of data quality questions behind each of these datasets. My recommendation is not to take the data at face value: read the data description, the metadata for the data. You want to know these things before you use the data, or at least be aware of its limitations.
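Since I keep saying to check who gave the labels, here is one concrete check you can run when a dataset ships labels from multiple annotators: inter-annotator agreement. This is a small sketch with made-up toy labels; `cohen_kappa_score` comes from scikit-learn.

```python
# Sketch: quantify how much two crowdworkers agree on sentiment labels.
# Low agreement is a data-quality warning sign before you train on the labels.
from sklearn.metrics import cohen_kappa_score

# Toy labels for the same 8 tweets from two hypothetical annotators
annotator_a = ["pos", "neg", "neu", "pos", "neg", "pos", "neu", "neg"]
annotator_b = ["pos", "neg", "pos", "pos", "neu", "pos", "neu", "neg"]

kappa = cohen_kappa_score(annotator_a, annotator_b)
print(f"Cohen's kappa: {kappa:.2f}")  # 1.0 = perfect agreement, 0 = chance level
```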
The next step is to build the model, and when we build the model there are a lot of details that are probably not known to the users, for example the parameters: the big models can each have millions or even billions of parameters to tune. There is also the connection to the data: how the data was input and used, and how the algorithms were trained. So in the machine learning community there is now a push to build what we might call a scorecard, or model card, a description specifically for the model. Each model gets an ID, along with the volume of its training data, where it was tested, and how it was trained, so that this information is transparent to the outside, particularly to downstream users.
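To illustrate the scorecard idea, here is a minimal sketch of a machine-readable model card; the fields mirror the items I just listed (training data, testing, training procedure, limitations), and the concrete values are invented for the example.

```python
# Sketch of a minimal "model card": machine-readable documentation that
# travels with the model so downstream users can judge whether to trust it.
from dataclasses import dataclass, field, asdict
import json

@dataclass
class ModelCard:
    model_id: str
    intended_use: str
    training_data: str                              # what data, and how much
    evaluation: dict = field(default_factory=dict)  # where and how it was tested
    known_limitations: list = field(default_factory=list)

card = ModelCard(
    model_id="sentiment-clf-v1",  # hypothetical model
    intended_use="English product-review sentiment; not for clinical text",
    training_data="50k crowd-labeled reviews (Mechanical Turk, 3 labels each)",
    evaluation={"test_set": "held-out 5k reviews", "accuracy": 0.88},
    known_limitations=["weaker on sarcasm", "English only"],
)
print(json.dumps(asdict(card), indent=2))  # short enough that analysts will actually read it
```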
And lastly, when we deploy the system in the field, we want to make sure that it is safe and that there is an alternative to turn to when the system breaks down. So these are the points along the life cycle for trustworthy AI.

Going back to the connection between trustworthy AI and metadata: it is mostly about two pieces, the data and the model. For metadata, the goal is to develop principles and practices so that we can document everything that is important about the data and also about the model, so that this information can be used for evaluating and developing trustworthy AI systems. A lot of frameworks are being proposed, for example, for how to document AI models and how to document the quality of data. And a lot of the time these are very difficult tasks: there are so many models out there, and data come in all kinds of varieties, so describing them becomes a question of how to consolidate and extract the key properties. Also, we don't want those scorecards, for either data or models, to be too long. The modelers and the data analysts would not have time to record everything; when you ask people to do too much, you know what's going to happen: they give up. So how to balance the workload against the usefulness is an open question, and it is ours.

[Audience question, online:] Some corporations have repeatedly violated regulations affecting privacy, and it is difficult for outsiders to observe and judge compliance. Even if good laws and regulations are established, why should citizens and consumers believe they will be effectively enforced? Effectively what? Enforced. Yeah, I think that's a real question, and one I cannot answer, as I'm not a legal expert. But I share that concern, because in May 2023 there was a congressional hearing; I don't know if any of you watched the video, where OpenAI's CEO, Sam Altman, testified. My takeaway from that hearing is actually concerning, because on the one hand the industry came forward to the legislature saying, you need to regulate this industry, while on the other hand the senators were telling the industry: we didn't do a good job regulating social media, and we don't really know how to regulate AI. They are trying to learn more about AI to figure out what kind of regulation is feasible or reasonable. So there are a lot of open questions there. But I think if we have more people working on these problems, we will make progress; right now I can only believe in the efforts underway.

[Audience:] Yeah, about that concern with OpenAI and regulation, I think there is also a regulatory-capture angle: if you are the leader in a market, you may want regulation that stops others from competing against you. So we cannot take everything he says at face value, because what he may actually want, in pushing for regulation, is an advantage for his own company. Yes, I agree. I think we have to gather voices from all parts of society, not just industry, not just academia, but also government, nonprofit and grassroots organizations, and individuals, to have this open discussion on these topics. I think that is important.

[Audience:] I have another question. We say there is a problem with replicability in generative AI, but what if we set a seed for the model, so that we get the same output every time? When we use something like ChatGPT we do not have the opportunity to set a seed, but that could be possible. So you're asking: if we set the seed for the random number generation, so that it is the same every time, would we get the same output every time? I see. Yeah, for some AI models that works: if you use traditional machine learning methods and develop a linear classifier, you get the same result every time you run that test. But generative AI is a probability-based model, and at the same time there are just so many parameters, so when you retrain it, the result will change. Also, GPT in particular is autoregressive: it predicts the next probable token, and that adds to the instability of the model. So I think it is a legitimate concern that you will not always get the same answer. And especially when you give different prompts, the prompt will affect the output.
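To connect this back to the seed question: with the raw API, as opposed to the ChatGPT web interface, you can reduce (though not eliminate) the variation. Here is a sketch assuming the official `openai` Python client; note that the `seed` parameter is documented as best-effort rather than a guarantee, and the model name is just for illustration.

```python
# Sketch: pin down as much nondeterminism as the API allows.
# temperature=0 makes decoding greedy instead of sampled; seed requests
# repeatable sampling, but OpenAI documents it as best-effort only.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def ask(prompt: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4",  # model name chosen for the example
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
        seed=42,
    )
    return resp.choices[0].message.content

# Running this twice should usually, but not provably, return the same text.
print(ask("Label the sentiment of: 'The battery died after one day.'"))
```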
Right now, in my text mining course, I'm adding new content about prompt engineering strategies. But I have to tell my students that my lecture title is really "Prompt engineering: science or art?" I cannot say it is all proven knowledge; it is people trying to understand. People have run all kinds of tests, and we only partially understand what is going on inside these AI models. So we basically run tests on these models, and the more tasks we run, the more understanding we have. For example, a lot of the tests are about reasoning. Especially as we move from narrow AI toward strong AI, we want AI to be increasingly human-like, so how do we know it is reasoning correctly? When it solves a math problem, is it going through the steps correctly? Recently there was a new paper about ChatGPT's inability to do a certain kind of knowledge inference, what we might call reverse retrieval. If the model can answer a question like "who is this famous person's mother?", it gets it right. And if you know the relationship between a parent and a child, you should be able to retrieve it in both directions. But if you ask GPT the reverse question, "who is that mother's child?", it fails. Basically, that means it does not have actual knowledge of the relationship between the two people; it just spits out whatever pattern is in its training data, where typically a famous person's name comes first, followed by the relative's name. The more of these probes we do, and the more systematic they are, the better we can pinpoint the limitations in the reasoning process, and then we can start to figure out how to improve it.

[Audience:] About large language models, there is a limitation: not everyone speaks English. So I'm thinking, could you embed a translator in front of the model? Every time someone enters any language, it translates into English first, then the model answers, and then it translates back. So you're talking about a kind of divide and conquer for low-resource languages. I think the issue is that machine translation is also trained with a similar architecture and similar kinds of data, so if there is a data limitation for a low-resource language, dividing the problem into two steps may not give you a large boost. But it might help to first try to understand what knowledge is shared between different languages. Every language has its grammatical structures, and some of them are common across languages. If you can identify that and then transfer knowledge from one language to another, that could be a boost. (For the curious, a sketch of the two-step pipeline from the question follows below.) We're out of time. Thank you so much.
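Here is the promised sketch of the two-step pipeline from that last question: translate into English, answer there, translate back. Both helper functions are hypothetical stand-ins, not real libraries, and as noted in the answer, both stages would inherit the same low-resource data limitation.

```python
# Sketch of the two-step pipeline from the question: translate the input
# into English, answer there, translate back. Both helpers are hypothetical.
from typing import Callable

def pivot_answer(
    text: str,
    src_lang: str,
    translate: Callable[[str, str, str], str],  # (text, from_lang, to_lang) -> text
    answer_in_english: Callable[[str], str],    # English-only model
) -> str:
    english_question = translate(text, src_lang, "en")
    english_answer = answer_in_english(english_question)
    return translate(english_answer, "en", src_lang)

# Usage with toy stand-ins:
toy_translate = lambda t, a, b: f"[{a}->{b}] {t}"
toy_model = lambda q: f"Answer to ({q})"
print(pivot_answer("¿Qué es la IA?", "es", toy_translate, toy_model))
```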