My name is Nira Sitola. I'm a PhD student at the iSchool. As a brief introduction: I completed my master's in computer science at Syracuse University, and I'm working on ethics in AI and data science as my PhD research topic. Today I'll be talking about some of the problems, approaches, and future directions related to ethics in data science, with some perspective on jobs as well. If you have any questions, please feel free to stop me.

To start with the motivation: demand for data science jobs is growing rapidly. According to the US Bureau of Labor Statistics, an estimated 11.5 million new data science jobs will exist by 2026. A typical data science job consists of processing data to find patterns that help the organization make profitable business decisions. But with the growing use of data science, machine learning, and AI across different domains, there has also been a rise in ethical concerns, and industries are paying close attention: data science job descriptions now include responsibilities related to ethics. This is not limited to one industry; I saw many diverse industries including ethics-related responsibilities in their data science job descriptions. I did a quick search on job advertisement sites like LinkedIn and indeed.com for postings with "data scientist" in the title and ethics in the description, and I'll quickly go through some of the ones I found. These postings have "data scientist" in the title and also include AI ethics among the responsibilities or tasks the position would handle.
The first one is a Distinguished Data Scientist, Ethical Artificial Intelligence position at Verizon, the technology and communications company (mobile networks and internet). They ask for knowledge of AI ethics, including fairness, accountability, and transparency, and previous work related to machine learning audits. Another is a data scientist position at Stitch Fix, a personalized styling, shopping, and clothing company; the posting says something like, "if you think about ethics in AI and the impact of machine learning on society, and want to bring that to bear in our work here." Another is at Microsoft, which provides software, electronics, PCs, and other services; their data scientist posting asks for contributions toward ethics and privacy policies for collecting and preparing data, and for providing updates and suggestions around internal best practices. So even here we can see three different industries: Verizon is technology and communications, Stitch Fix is personalized clothing, and Microsoft is software, electronics, and PCs. Other industries are including ethics in their job descriptions as well: one was a scientist-engineer position in data analytics that asks about testing machine learning models, such as cross-validation and A/B testing, for bias and fairness. Another data scientist posting, at a company with a data-science-as-a-service platform, also asks for knowledge of fairness, accountability, and transparency in machine learning.
Similarly, Walmart, the retail department and grocery store, has a data scientist position that includes responsibilities related to ethics and bias in AI and machine learning, where it talks about measuring, diagnosing, and mitigating bias in their AI and ML solutions. The list keeps growing: another posting, from an insurance company, asks about cross-validation and bias and fairness in machine learning models, and an automotive company's data scientist position also includes bias and fairness. Similarly, Amazon Web Services, which is primarily cloud computing, requires an in-depth understanding of technical and scientific issues related to machine learning fairness and explainability, and developing new solutions to ML fairness and explainability problems. The list could keep going. The point I wanted to make is that it's not only one particular industry taking ethics more seriously; many diverse industries have started including it in job descriptions, so when they hire data scientists they also look for some knowledge of AI ethics.

Now I want to give some background: how the problems have been growing, and which problems have gotten the most attention. Some of them already came up at the end of the previous talk by Professor Daniela Kunal, with examples of how machine learning or deep learning models can be biased. Various other cases have been found, and I'll discuss some of them. One was YouTube auto-captioning, where a researcher found a higher error rate for women speakers compared to men.
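An audit like that captioning study can be sketched as a simple per-group error-rate comparison. Everything below is illustrative: the group labels and numbers are invented, not taken from the actual study.

```python
# Minimal sketch of a per-group error audit, in the spirit of the
# captioning study: compare error rates across speaker groups.
# All records below are invented for illustration.

def group_error_rates(records):
    """records: iterable of (group, correct) pairs -> error rate per group."""
    totals, errors = {}, {}
    for group, correct in records:
        totals[group] = totals.get(group, 0) + 1
        if not correct:
            errors[group] = errors.get(group, 0) + 1
    return {g: errors.get(g, 0) / totals[g] for g in totals}

# Hypothetical caption results: True = caption judged correct.
records = ([("women", False)] * 3 + [("women", True)] * 7
           + [("men", False)] * 1 + [("men", True)] * 9)
rates = group_error_rates(records)
gap = max(rates.values()) - min(rates.values())  # disparity to flag
```

A real audit would use word error rate on transcripts rather than a binary correct/incorrect judgment, but the group-wise comparison works the same way.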
That was published around 2017. Another case is word embeddings. Many of us may be familiar with word-embedding analogies, where relationships between words are predicted: for example, "man is to king as woman is to queen." Using similar embeddings, researchers found the models could be quite biased: they found "man is to computer programmer as woman is to homemaker." That bias was found in word embeddings from word2vec, the popular model everybody used for word embeddings.

Data collection itself can also be biased. Back in 2013, collecting data to detect potholes using smartphones in the Boston area was potentially biased, because smartphones may not be accessible to everyone. That can bias the data collection, and when that biased data is eventually used to build models or make decisions, the result may not be a fair decision. Another issue was COMPAS, software that was found to be biased because it produced higher risk scores for Black defendants compared to white defendants.

Other problems relate to privacy. Back in 2012, Target, the retail store, was able to predict a teenager's pregnancy and sent advertisements for baby products when even the teenager's family was not yet aware of the pregnancy. Another privacy problem was found when researchers combined two anonymized datasets, from Netflix and the Internet Movie Database: even though both datasets were anonymized separately, when combined they could reveal the identities of individuals in the records. Another problem: Staples.com charged different prices based on customers' estimated ZIP codes.
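Going back to the word-embedding example for a moment: the analogy arithmetic behind those findings is just vector addition, taking vec("king") − vec("man") + vec("woman") and looking for the nearest word vector. Here is a minimal sketch with tiny hand-made 3-dimensional vectors; real word2vec embeddings are learned from large corpora and typically 100-300 dimensional, and that is where the biased analogies were observed.

```python
import math

# Toy 3-d "embeddings" chosen by hand so the classic analogy works;
# real word2vec vectors are learned, not hand-crafted.
vecs = {
    "king":  [0.9, 0.8, 0.1],
    "queen": [0.9, 0.1, 0.8],
    "man":   [0.1, 0.9, 0.1],
    "woman": [0.1, 0.1, 0.9],
    "apple": [0.2, 0.2, 0.2],   # unrelated distractor word
}

def cosine(u, v):
    dot = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(x * x for x in v))
    return dot / (nu * nv)

def analogy(a, b, c):
    """a is to b as c is to ? -- nearest word to vec(b) - vec(a) + vec(c)."""
    target = [vb - va + vc for va, vb, vc in zip(vecs[a], vecs[b], vecs[c])]
    candidates = (w for w in vecs if w not in (a, b, c))
    return max(candidates, key=lambda w: cosine(vecs[w], target))
```

With these toy vectors, `analogy("man", "king", "woman")` returns `"queen"`; with real learned embeddings, the same arithmetic produced "homemaker" for "man is to computer programmer as woman is to ...", which is the reported bias.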
That pricing could be biased because, based on the ZIP code, the system was estimating people's spending habits, incomes, and other attributes, which it then used to present different prices to different customers. Similarly, e-commerce and travel sites like Home Depot displayed different prices depending on whether the user was on mobile or desktop. So systems were using privacy-related data we may not even be aware of to make recommendations, which is problematic.

And as we saw in the earlier talk, there are problems with image applications as well. Google made headlines in 2015 with stories like "Google Photos labeled Black people as gorillas." There was another case where AI-powered facial detection software could not detect a masked face. And there are other serious issues: in one research project, researchers tried to predict whether someone was a criminal using images of real persons, and published a paper claiming the model could predict whether someone would be a criminal with 89.51% accuracy. That drew a lot of negative attention, and the researchers had to justify their work. Another work that drew attention was a model built to predict whether a person is gay, which raised serious ethical concerns. That project, I think, used around 75,000 images found on a dating website. It drew a lot of attention to how data available on the web is being used, and how it can be used by anyone. The concern was: just because some data is available, can we use it or not, and how does that affect other people's privacy?
There have been a lot of approaches to handling these problems, some of them being the introduction of new policies and rules by governments and organizations. For example, California introduced a new privacy law giving users the right to see, delete, or stop the sale of their personal details by any tech company operating in the state. San Francisco banned the use of facial recognition technology by police and other agencies. Another example: after the fatal accident of an Uber autonomous vehicle in Arizona, Uber was banned from further testing there. Researchers, universities, and industries are also collaborating across multiple disciplines to determine what new policies and rules are needed to handle these problems better. And perhaps the best-known policy is the GDPR, implemented in Europe.

Another approach is algorithmic transparency. Understanding machine learning results can become really hard as the data grows: with large data and a lot of features, it becomes difficult to know exactly what's going on inside and how these models are making decisions. That makes them black-box models, and these models can turn someone's life upside down. Algorithms use statistics and produce probability scores that are difficult to interpret, yet they are still used to make decisions, such as whether someone would be a bad hire. If I apply for a job and my application has some feature that gets it rejected, that can happen without me ever knowing why. Or I could be labeled a risky borrower: if I apply for a bank loan, it could be rejected based on features the machine learning model has learned and related to my application.
Those are some of the negative impacts. So algorithmic transparency has become a major research focus: making it more transparent what is going on behind the scenes when machine learning models make decisions on any task. One line of work investigates the effect of removing a training point from the model, to get insight into how that point influences training or the model's predictions. Another is visual analytic systems, whose goal is to bring humans into the loop rather than relying only on machine learning models for decisions, so there is more understanding of the system's predicted outcomes. Some deep learning packages like TensorFlow also provide visual analytics displaying how the data is transformed inside the model.

Besides that, different computational methods are emerging that try to handle bias and fairness issues. One is training separate models for different groups, so the result is fair for each group, rather than combining everyone together. There have also been different metrics of fairness: in the computational field, there are formulations, equations for fairness itself, which a model can try to optimize so that it avoids making biased decisions. The goal would be to maximize fairness, in the sense of similar treatment across groups and individuals.

Beyond that, universities have introduced different courses on ethics to make students more aware of these ethical concerns in AI and data science. Some of these courses are data-oriented.
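One of the simplest of those fairness formulations is demographic parity: the rate of positive predictions should be (near) equal across groups. A minimal sketch, with invented group names and predictions:

```python
# Demographic parity check: compare the rate of positive predictions
# across groups. Group names and predictions below are made up.

def positive_rate(preds):
    """Fraction of predictions that are positive (1)."""
    return sum(preds) / len(preds)

def demographic_parity_gap(preds_by_group):
    """Largest difference in positive-prediction rate between any two groups."""
    rates = [positive_rate(p) for p in preds_by_group.values()]
    return max(rates) - min(rates)

preds_by_group = {
    "group_a": [1, 1, 0, 1, 0],  # 60% predicted positive
    "group_b": [1, 0, 0, 0, 0],  # 20% predicted positive
}
gap = demographic_parity_gap(preds_by_group)  # ~0.4: a large disparity
```

Other formulations, such as equalized odds, instead compare error rates (false positives and false negatives) per group; which metric is appropriate depends on the application, and in general they cannot all be satisfied simultaneously.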
In one example I saw, students learn from their own data: what can the data reveal about privacy issues, and what can predictive analytics reveal from the data? That gives them hands-on experience with the ethical concerns around a particular dataset and how to think about them. There are also courses that include theories and philosophies related to ethics, because the term "ethics" itself may not be clear to many students; different philosophies are introduced to make students more aware of ethical issues and how they are handled in different fields and domains, which can then be carried into AI. Another approach is to introduce case studies: the problem examples I discussed are being taught in different courses so that students are aware of past problems caused by the use of AI that led to biased or unfair decisions.

Some of the future work I see is in interdisciplinary approaches. Instead of only computer science focusing on AI and data science, more current work has started including experts from other fields, such as social science, journalism, and domain experts, which gives more understanding of the possible bias and fairness issues in algorithms than thinking from a computational perspective alone. Other future work I see is on cross-cultural differences in case studies and research: even with all these AI ethics, fairness, and bias issues, most of the focus has been on the Western world, and especially on US society. So there is more work to be done applying these case studies in different societies.
How would this play out in, say, an Asian context, or in other societies? There have been problems with simply importing algorithms from one society to another; it may not work. For example, we can adopt the technology, because the computational systems may be similar across societies, but ethics may differ between societies, which makes it complicated. So understanding different cultures and backgrounds may be required to develop ethics and mitigate bias issues related to AI in different societies.

There are also challenges in implementation itself, because there is a lack of a unified regulatory framework, and the principles are vague to apply in practice. From a computational perspective, we can just write some code or do some mathematical computation to check that everything is correct; but from an ethical perspective, there may not be a rule that applies to every case, because it needs more analytical thinking and understanding. So it becomes really difficult to apply in practice even when policies and regulations exist, and, as I said, there is a lack of proven methods to translate AI ethics principles into practice, which is another challenge and a direction for future work.

In the context of the iSchool, I see various opportunities that would help with this future work. One is that the school gives a broader understanding of data. From my personal experience: I come from a computer science background, and when I was in computer science, most of my focus was on getting the data into the model and increasing the model's performance, but I was not very aware of how the whole process of data collection and labeling works.
That is something I learned at the iSchool, which gives a broader understanding of how research questions and problems are defined, how colleagues collect the data, how we label the data, and what problems may occur in labeling. This gave me a broad understanding of the possible issues when I simply take data and build a model. The iSchool is also diverse: we have experts from different domains, so learning about data from different domains helps you understand that what is a problem in one domain may not be in another, and how to think about domain-specific problems. And as I said, there are diverse faculty with expertise in different kinds of data and methods. One easy way for me was to reach out to different professors, talk about research, and volunteer on research projects, which gives an idea of how data is handled, what problems experts think about, and what approaches to take to address them when processing or applying data. Beyond that, there is also diverse student collaboration: students from different departments take different iSchool courses such as data analytics, natural language processing, data science, data visualization, human-computer interaction, and others. So there are a lot of opportunities to collaborate with different students, which gives an understanding of how others view data. Something I learned coming into the iSchool is that I have different kinds of biases I may not be aware of; when I work with others, I recognize those biases and understand where I can think more critically when I'm working with data and making decisions.
So here are the references. Maybe we have time for any questions? Thank you very much.

Q: I think we have time for a couple of questions. The first: you talked about job descriptions, job postings if you will, where ethics was a key part. In your searching, did you find any university data science positions where ethics was discussed?

A: Most of what I found related to universities were teaching jobs related to ethics; I did not see ethics in university data science positions beyond that. But since different universities are collaborating and creating labs on ethics in data science, I think there will definitely be opportunities for students who want to do research or work in that area. But I did not specifically look for university data science jobs.

Q: Maybe time for one last question: what's one piece of advice you could give either current iSchool students or recent graduates starting out in data science, who weren't hired into a role that specifically mentions ethics? They're doing data science, but what's the one thing you would suggest to help them be more ethical?

A: For those in the iSchool, I would suggest getting very involved in data science projects with different faculty, because when we're involved in different faculty members' research projects, we can see how the data is being handled, and different questions come up, not only from ourselves but from other students, faculty, and researchers asking about ethical concerns. That builds awareness of what to think about.
That's something I would say. For recent graduates, I would say to keep following the news and the research, because I keep seeing new kinds of ethical issues come up; so for recent graduates as well, follow up on the research and the ethical issues. And those who are in the iSchool, I think, have opportunities to even work on projects related to AI ethics and data science.