Welcome to the Learning Analytics Tools course. In this lecture, we will talk about ethics and privacy in data collection. In the digital age, that is, Web 2.0, we can collect a lot of data ourselves, and we also have access to a lot of data on the internet. We can use this data for analysis, to provide recommendations, and so on.
Similarly, in education, we can collect data from technology-enhanced learning environments such as MEttLE, which we discussed; from sensors and biosensors, such as GSR or eye-gaze detection; or from survey questionnaires, for example the pre-test, the post-test, a performance survey, or system survey questions about how the system looks. You can collect a lot of data about the students' perceptions as well. Apart from this, we can conduct interviews before or after the interaction with your system, or simply conduct interviews to understand the learning process. These interviews can be recorded, manually transcribed and coded, and used for data analysis. So you can collect a lot of data from different channels and different sources, and you have seen various types of data collection methods and the types of analysis that can be performed on the data.
Consider that you are a researcher and you want to start collecting data from a particular learning environment. What are the primary steps you will take to protect the ethics and data privacy of the learner? What are the ethics in education? How do you protect the data privacy of the learner? Please pause this video, write down your answers, and resume the video to continue.
Ethics and data privacy mean, first, that students should be informed about data collection: you should tell students that you are collecting data. You also have to describe the data privacy policies to the students: who has access to the data, why you keep the data, and so on.
You also have to get permission from the head of the institute, the school, the district or the teacher, depending on your data collection and your research question. Before you run your study, you have to get permission from them. You also need to know what data will be reported to stakeholders. Do you want to report all the data to all the stakeholders in the education setting? You have to think about that.
In general, data privacy and ethics primarily include these four things. The first is obtaining consent. As I mentioned previously, you have to inform the participants, and through the consent form you get their consent that they are ready to give the data for research purposes, or that we can use the data for research purposes. So the participant has to consent.
Second, you have to anonymize the data. It is strongly recommended that, immediately after collecting the data from the research study, you remove all identifiable information and anonymize it. You have to keep the master key that maps student names to the anonymized IDs, but it should stay only with you and be kept highly secured. Use only the anonymized data for research and for sharing with your collaborators.
Third, you have to classify the data into identifiable and non-identifiable. For example, as I mentioned, you can collect the students' facial expressions using a web camera, but you cannot anonymize that. If you hide the student's face with a mask or something similar, you will not be able to understand the facial expressions. So you have to classify the data as anonymizable or not, and identifiable or non-identifiable. You can also classify the data by type, such as videos and surveys; different kinds of classification can be done.
The last thing is storage. Where do you keep the data? Is the storage space secured and free from hacking? Is it a properly secured place, key-protected with some software? Everything has to be checked.
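The anonymization and classification steps described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the field names (`name`, `email`, `phone`, `webcam_video`, `pretest_score`, `posttest_score`), the `P-` code format and the helper functions are hypothetical, and the lecture does not prescribe any particular tool or scheme.

```python
import secrets

# Hypothetical classification of fields; the names are illustrative only.
IDENTIFIABLE = {"name", "email", "phone"}            # replaced by a code
NOT_ANONYMIZABLE = {"webcam_video"}                  # masking it destroys its value
NON_IDENTIFIABLE = {"pretest_score", "posttest_score"}


def classify(field):
    """Return the privacy class of a field name."""
    if field in IDENTIFIABLE:
        return "identifiable"
    if field in NOT_ANONYMIZABLE:
        return "identifiable, not anonymizable"
    if field in NON_IDENTIFIABLE:
        return "non-identifiable"
    return "unclassified"  # unknown fields are never shared in this sketch


def anonymize(records):
    """Replace identifiable fields with a random participant code.

    Returns (shareable_records, master_key). The master key maps each code
    back to the original identity and must be stored separately and securely.
    """
    master_key = {}  # code -> identifying fields
    code_for = {}    # identity -> code, so one student always gets one code
    shareable = []
    for record in records:
        identity = tuple(sorted(
            (k, v) for k, v in record.items() if k in IDENTIFIABLE
        ))
        if identity not in code_for:
            code = "P-" + secrets.token_hex(4)
            while code in master_key:  # extremely unlikely collision
                code = "P-" + secrets.token_hex(4)
            code_for[identity] = code
            master_key[code] = dict(identity)
        # Only fields explicitly marked non-identifiable go into the shared copy.
        row = {k: v for k, v in record.items()
               if classify(k) == "non-identifiable"}
        row["participant_id"] = code_for[identity]
        shareable.append(row)
    return shareable, master_key
```

In this sketch only `shareable` would go to collaborators, while `master_key` stays with the researcher. Data that cannot be anonymized, such as the webcam video, is simply left out of the shared copy here; how to handle it in a real study is a decision for the ethics committee.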
So, let us talk about the consent form. We have to get this consent form. Usually, in academic institutes, there is an Institute Ethics Committee (IEC) or Institutional Review Board (IRB) that issues the consent form. If an institute does not have one, I strongly recommend that you talk to your institute administrators and create a committee of researchers who already have experience in conducting research, and come up with a set of rules so that anybody can apply to the Institute Ethics Committee to get the consent form. Usually, an institute that has humanities departments will definitely have one. An institute that conducts animal, biological or human studies will also definitely have an Institute Ethics Committee. In industry, the HR and legal departments know very well what to do, because they do a lot of data sharing with clients and colleagues, and they share data with researchers. They have large legal and HR departments with big disclaimer forms and so on, so they know what to write. Talk to them and get the consent form ready for your research.
So, what should be in the consent form? I am going to talk about the consent form for learning analytics purposes. This is not applicable to studies on biological samples or studies in management science; it is basically for the education sector. The consent form first introduces who you are: your name, what you are doing and why you are collecting the data. Then comes the purpose of the project and the study design: for what purpose you are collecting data, and what the study design is, such as how long the students have to interact with the system, whether there is a pre-test, whether there is a post-test, and any other information the students should know to participate in this study, because the students should know what they are agreeing to participate in. It should also explain why you selected a particular site or type of student.
For example, if you want to know about the background knowledge of students who are joining an MCA program, you might select only first-year MCA students. That is a very good time to collect the data. So, based on your research question, you have already filtered a particular type or group of students. That participation criterion should be explained in the consent form.
Next, what are the risks involved in participating or not participating in this study? For example, the risk in participating in your research study might be that students lose a lot of time, because they have to spend additional time on it, and how much depends on your research study design. Or their performance in this particular study might be used for grading; that may be a risk of participating in your study. Also, students who are not participating in the study, who are not interested in giving their data to you, should not lose out on something their peers are learning. For example, for the students who participate in the study, you might introduce a high-end simulator to teach a database course, so they understand everything well. But the students who are not interested in participating should be given an equivalent amount of content. You should not force them by saying that only students who participate will get that particular content about databases and that otherwise they will not learn it; that is not correct. This involves the teacher as well: the teacher can say, I will make sure the other students also get content equivalent to what the participating students get in the course.
And, importantly, data confidentiality: where you will store the data and who will have access to it should be stated in the consent form. A student should know who will have access to his or her data, so that they can give consent to you.
Then the contact information. This is not your contact information; it is the contact information of the Institute Ethics Committee, or of the legal person in industry, so that participants can contact them to report a problem with your study or to withdraw their study data. They should have that contact information.
Now, the consent form for minors. If the participant is a minor, get consent from the parent as well as the participant. For example, if the participant is a class 6 student, you need to write the consent form in language they understand; do not use too many technical words that they will not understand. What if the participants are class 2 students, say six or seven years old? You cannot write a consent form that they can read and understand by themselves, so the teacher might need to explain the consent form to the students and also to the parents. Then they will understand your study design and can give consent, saying, I am interested in participating in your study and you can collect my data. What if the children are infants or toddlers? You cannot conduct the study without the parents or guardians. That is very important: you have to get their consent, and they should also be around when you are conducting the study.
The second thing is anonymizing the data. As I mentioned, the data should be anonymized so that nobody is able to identify which user it belongs to. Before anonymizing, I recommend that you encrypt the data by converting all names, email IDs and phone numbers into unreadable codes, that is, de-identified data. If you are using an email ID or mobile number, all such information has to be converted into some unreadable code, a unique code or a user ID. Encrypting the data also helps protect researchers from bias. And if the data is lost, encrypted data cannot be accessed without the particular keyword or password.
By encrypt, I mean encrypting the data with a particular secure password or keyword. It is also highly recommended to keep all your data, not just research data but your personal and sensitive data too, in encrypted format, so that if it gets lost, not everybody will have access to it.
Coming to data storage: data is essential in educational data analytics, or in data analytics in general. Therefore we must store the data in a secured place and make sure that it is backed up frequently. The network drives maintained by the institute are a good place to store the data. Why? Because you can access the data wherever and whenever you need it. For example, if you need the data when you are on campus, you can go to the particular network and get it; if you are off campus, you can connect remotely to the server, to the encrypted space, and then collect or use the data. It is also backed up regularly: if you keep the data on a server maintained by the institute, the institute usually backs up the data once a week or once in two weeks, so the risk of data loss is lower. It also prevents unauthorized use: only users who have access to the institute network will be able to reach it, so third parties, people who are not part of the institute, cannot access the data at all. Even members of the institute who have access to the network will not be able to access your data, because it is kept in your own secure drive.
So, what is data security? We have talked about data storage and so on. Simply put, data security means ensuring that your research data is kept safe from corruption and that access is limited to a few people. That means backing up regularly, keeping the data on a server that is protected from viruses by antivirus software, and securing it so that very few people have access. That is it.
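To make the two parts of this definition concrete, safety from corruption and limited access, here is a minimal Python sketch of a backup step. It is an illustration only: the checksum comparison, the function names `sha256_of` and `backup_and_verify`, and the owner-only file permission are assumptions of this sketch, not something the lecture specifies, and an institute-maintained server would normally handle backups for you.

```python
import hashlib
import os
import shutil


def sha256_of(path):
    """Return the SHA-256 checksum of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_and_verify(source, backup_dir):
    """Copy a data file to backup_dir and check that the copy is not corrupted."""
    os.makedirs(backup_dir, exist_ok=True)
    target = os.path.join(backup_dir, os.path.basename(source))
    shutil.copy2(source, target)
    # A differing checksum means the backup was damaged while copying.
    if sha256_of(source) != sha256_of(target):
        raise OSError("backup does not match the original file")
    # Limit access to the file owner (on Windows this only affects the
    # read-only flag, so other access controls would be needed there).
    os.chmod(target, 0o600)
    return target
```

Running such a step on a schedule, for example weekly, would roughly mirror the regular backups described above.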
Now that you have seen different types of data collection methods, can you name the key stakeholders in research data analysis and management? We talked about stakeholders in week one. Think about who the stakeholders are and what kind of access they have to the data. Please pause this video and write down your answers. Resume the video once you have finished.
Please recollect who the stakeholders are and what kind of access they have to the data. There are four stakeholders when it comes to data collection and data analysis: researchers, funders, secondary users and institutions. Not all the stakeholders have the same type of access to the data you collect; each one has a different type of access. So what data should be reported to whom? That is the main question.
For example, if you are the researcher, you are the one designing and running the study and collecting the data, so you might have full access to the data. You know what data to collect and how to analyze it, and you can think about what conclusions can be drawn from it.
If you are part of an institution, educational institutions have ethics committees, so they have their own set of rules on how to use the data and what data can be published in the media and to the outside world. The institute might also provide data to researchers sometimes. For example, if you are a researcher who gets data from the institute's LMS, you do not have all the data; the institute decides what kind of data you can access so that you can make some recommendations. So sometimes the institute acts as a source of data, and it also acts as a receiver of the analysis you perform.
If there is a funder who has funded you to run this particular project, this is very important. Funders have their own strict declaration statements saying what kind of data can be shared and what data can be used in publications.
Most of the time, companies do not allow you to publish a study you conducted using their data. So you have to be very careful when you are running funded research studies.
Secondary users are the ones who read the research paper you published, or the staff in the institute, or the government officials you report to. They do not have access to all the data you collected. They may not even know the details of the data you collected, but they know, at an abstract level, what kind of analysis you did, what data you collected and what the inferences are. That is enough for them. So you should be careful about what data you give to secondary users; in particular, compared to all other stakeholders, please do not give any user-related information to secondary users.
So, what is data ethics? We talked about data privacy and how to keep the data secure and safe. What is ethics? Data ethics is a new branch of ethics that studies and evaluates moral problems related to data (data collection, data generation, data analysis and so on), to algorithms (such as ML algorithms, AI or whatever algorithms you use) and to the corresponding practices (such as why you use these algorithms to invent something new), in order to formulate and support morally good solutions. So data ethics defines the practices you have to follow in collecting data, in choosing the right algorithm and in creating something new.
When it comes to ethics in LA, these are the main points. In LA, we collect and analyze the learners' data to develop a model and to optimize the environment they work in. The purpose is to improve the students' knowledge and to help them achieve something in that particular environment. So the learner should know what data is collected and where it will be used. This is very important, and it is stated in the consent form.
Another important thing is that the learner can opt out at any time during the study. Although they might have given consent to be part of your study, after some time they can say, I want to get out of the study. Or, within some time after giving the data, they can say, I want to opt out of the study, please delete all my data. You should obey that and delete all their data. So the learner has the freedom to opt out at any time during the study.
The method or algorithm you use to predict the learner's outcome should be reviewed and corrected if required. For example, you might predict that a student is confused, based on their interaction or behavior. But that outcome should be reviewed: the student might say, no, I was not confused, I was actively engaged and I understood everything. So the student should be able to give feedback on the state you predicted, and your method or algorithm should be able to take that feedback, review it and correct itself if required.
Also, the predictive model should not hinder the student's progress. Say a student is having trouble solving a particular complex problem, and you think, let us give an easier solution or an easier method to solve it. The student might say, no, I do not want the easier path; I want to solve the problem the way I want to. So whenever you recommend something, give the student the option to accept it or not. You should not hinder their progress, and you should not give too much feedback or too many interventions in a lecture.
So, from the previous slide, what are the learners' rights on data in LA?
List three important points; the answer can come directly from the previous slides. Please pause this video and write down your answers. After completing this activity, resume the video to continue.
What are the learners' rights on the data in LA? First, the learner can opt out at any time during the study. Second, they should have access to the data: after completing the study, the learner can ask, can I have a look at my data, what data was collected? You may not need to provide all the biosensor data and everything they provided, but you can give them an abstract view. Third, the learner should be allowed to be part of the algorithm, which means learners should be able to give feedback on the predictions you made, and that feedback should be used to correct your algorithm. Also, learners should be able to accept or reject the feedback provided. When you give feedback or a new recommendation, do not force it on the students; they should have the option to accept or reject it.
In short, this is the checklist for trusted LA: eight action points for decision makers. It starts with determination: why do you want to apply learning analytics? Explain it to the students: what data is collected and where it is stored should all be explained in the consent form. Be legitimate: why are you allowed to collect the data, why are you the one who can come and collect personal information about some students or some group of people? Involve the stakeholders: inform them what data is collected and what data can be reported to them, and let them know the process. Make sure you get the consent form from the subjects, that is, the participants. Anonymize the data after collecting it. And on the technical side, where to store the data, how it is backed up and so on should all be in place.
And externally: if you want to collaborate and share your data with researchers in another institute or another country, make sure you follow the other institute's principles, their set of rules. If you are forwarding the data to a collaborator in another country, make sure that the treaty between your country and the other country allows it, so that you can share the data between the two parties without any problem. In general, with education data sets we do not face these issues; it is mainly with defense or military documents that there are very strong treaties not to share data with anyone else. In education setups, yes, we can share the data with researchers from other countries or continents.
So please consider ethical and privacy issues when you plan and go out to collect data in your environment, such as a classroom, a MOOC or an intelligent tutoring system. This checklist is from the paper cited here.
In this lecture, we talked about educational data in the digital world: what data is collected in the digital world and in educational settings, that is, data can be collected from TELEs or from different sensors. We talked about ethics and privacy: how to secure the data and what ethics should be followed when you collect data. We talked about what a consent form is, what should be in it, how to get one, and how to get consent for minors. We also talked about learners' rights; the most important is that the learner has the right to opt out at any time during your study.
In summary, this is from a LAK 2016 paper, which shows a word cloud of all this content on ethics and privacy. It says ownership and control are the most important: who has ownership of the data and who has control over the data. The data should be secured, and the learner has ownership and control of their own data. Thank you.