Good afternoon, and welcome back from lunch. For the next hour, we want to talk about ethics and integrity in data use and management. You may say, well, this seems a little far out there, but I think it's very important to look at a very brief history of ethics in research and in data management, from both a research and a clinical perspective; to look at the pertinent ethical principles that we try to apply in any use of data; to discuss the idea of data integrity and ethics within our data management practices; and then to look at the applications for data management personnel in the ethical use of data. So we'll start with some definitions, look at those ethical principles, briefly discuss some guidelines and regulations that you should be aware of, discuss the concept of data integrity and what it entails, and then look at applications for our work with data.

In Webster's New Riverside Dictionary, integrity is defined as strict adherence to a standard of value or conduct, and it also refers to personal honesty and independence. And the sense we often think of, completeness, unity, or, in definition number four, soundness, certainly applies to data integrity. Ethic or ethics: an ethic is a principle of right or good conduct, or a system of moral values. When we use it in the plural, ethics is the branch of philosophy dealing with the rules of right conduct. Those rules apply to any branch of work; every branch of work has certain codes or principles of ethics that should be applied.

When we talk about data, and human subjects in particular, we think of three main ethical principles. The first is beneficence, which basically means do no harm, but in addition to not doing harm, it also entails trying to get the most benefit for our patients or our research participants. So we try to minimize harm and maximize benefits whenever possible. That one's easy to remember: beneficence, think benefits, okay? And not negative benefits.
The second is respect for persons, also known as autonomy, which comes from the idea that each person has the right to choose for themselves. This involves the concept of voluntary informed consent, not just informed consent but voluntary informed consent, so that the person makes a choice about how their data are used and what types of research or activities they're involved in. This principle also covers vulnerable subjects: people who cannot easily make decisions for themselves must be protected in some way. This is where we have protections for children, for prisoners, and for the mentally ill. We include pregnant women and fetuses in this group, and students, because they sometimes don't feel empowered to say yes or no. And I think I'm forgetting one group. Did I mention pregnant women? Yeah, okay. So I've covered the majority there. You'll see this most often with pediatric populations, who cannot consent for themselves and rely on a guardian to consent for them. They must be protected to be sure that their rights are properly represented.

The third is the principle of justice, which you could think of basically as fairness: treating people in an equal and fair manner. It also involves equity, which is not exactly the same as equality, but again means fairness. That means fairness in treating all patients, or all research participants, the same. In research, it means making sure that one group is not carrying all the risk while another group gets all the benefits. It means fairness in dealing with the individual with regard to their rights and the benefits that come to them. And it means fairness between institutions, so that one institution is not getting a huge benefit while the other does all the work. So justice applies on multiple levels, in clinical practice and in research.

Clinical data versus research data:
Are the ethics any different? Any opinions? Are the ethics different, or are they the same? I hear multiple people saying the same. Yes, ethics are ethics. The principles are the same. How they are applied may be a little bit different, and the regulations or guidelines that govern them may be a little bit different, but the ethical principles we just discussed are the same whether you're talking about clinical data or research data. I can't stress that enough. Don't think that ethics apply to research but that in clinical practice we don't have to think about those things, because ethical principles are ethical principles, and we should use them in all parts of daily life. This has applications for us in the privacy and confidentiality of records; in informed consent versus implied consent when using patient or participant data; in data integrity or data quality, which may depend less on the patient and more on us but is still very important for both clinical and research data; and in data security and storage, which are really part of data integrity, although we often talk about them separately.

For research we have a large number of ethical guidelines, but I would like to remind the group that we should also be thinking of these guidelines and practices when we use clinical data. One of the best known is the Declaration of Helsinki, and I did include it in your reading packet. I don't expect that everybody will have read it, but at some point you should read and understand it. It's not hard to read, and quite a number of things in that declaration apply to data and the use of data.
So any data manager who is going to work with clinical or research data needs to be aware of these principles. The Declaration is also an ethical standard used by the International Committee of Medical Journal Editors, so anybody working on a publication needs to have read it at some point to understand the commonly accepted ethical principles that govern medical research. The Council for International Organizations of Medical Sciences, which we call CIOMS, has also developed guidelines in collaboration with the WHO, and many international ethics review boards, or IRBs, use the CIOMS guidelines as well as the Declaration of Helsinki. I don't believe that's in your instructional packet, but be aware that there are different guidelines, although most of them are quite similar in content overall. Within the U.S. we often refer to the Belmont Report, which also gives guidelines for the review and use of research data. Kenya itself has national guidelines put out by the NCST, the National Council for Science and Technology, and within Kenya we need to be aware of what those guidelines are. I don't think I supplied them for the reading packet, but I will get a copy and supply it to ETA, because I think it's important for all of you here in Kenya to have a copy of those guidelines and to understand them. They apply specifically to research, but if you look at them closely, they have a lot of broad applications to clinical practice.

Within the United States we have a number of regulations. These are not just guidelines; they are actual laws that govern how research is done and, in some cases, how data can be used. You may ask what U.S. regulations have to do with us in Kenya, or with any international research or international clinical data management. Well, the fact is that, at least in this setting, we get a lot of money from the United States government to conduct research.
And while our clinical data do not fall under these particular federal research regulations, most of our clinical work here is funded by the U.S. government through PEPFAR, so we need to understand when and how U.S. regulations apply to our work even though we work in Kenya. People working in other parts of the world who receive U.S. government money may also find that some of these regulations apply to them. They include what we call 45 CFR 46, a specific federal regulation that governs the ethical conduct of human subjects research, which we'll discuss a little more. For trials involving drugs, and sometimes devices, the FDA has specific rules, so FDA regulations may also apply. For projects funded by the NIH, the NIH may have its own guidelines on how data are used. Why do we have to pay attention to them? Because whoever pays for the work has the right to make rules about it, so the funder often has a say in the principles guiding our work. Finally, there is HIPAA, which we will discuss again. This is a U.S. law, designed mainly for clinical records but also affecting research, that guards the privacy of records. Whether it applies depends on our setup, but here we need to know about it.

The U.S. Code of Federal Regulations, Title 45, Part 46, covers human subjects protections. It also covers the requirements for the operation of institutional review boards (IRBs), which are supposed to review all research, and the protection of vulnerable subjects, which we discussed under the principle of autonomy. A human subject under this regulation is defined as a living individual about whom an investigator, whether professional or student, conducting research obtains either data through intervention or interaction with the individual, or identifiable private information.
Now, while this applies to research, you can see it's also very applicable to our clinical setting, because by that definition every patient or research participant we come in contact with is a human subject.

Privacy and confidentiality rules, as I mentioned, apply to both clinical and research data. This is a major concern for patients when it comes to electronic records. If you look through the readings I provided, patients are very concerned about whether their electronic data will be kept private; in fact, one of those articles reports that patients sometimes avoid places that have electronic medical records because they fear for the privacy and confidentiality of their data. So knowing how we protect these things, and being able to educate participants and patients about it, is very important, since electronic records are growing and provide a lot of benefits for improving care. Privacy and confidentiality must be safeguarded by every member of the team, from the clinician or healthcare worker taking down the information, up through the data manager and the data management group, all the way to the medical records staff who store and archive the data. At every step, privacy and confidentiality must be maintained.

In the United States we have a very specific law called the Health Insurance Portability and Accountability Act of 1996, which we know as HIPAA, and under it regulations were issued entitled Standards for Privacy of Individually Identifiable Health Information. For most entities covered by this law, compliance with these regulations has been required since 2003. Now, some of you here will know this: are we covered under HIPAA here in Kenya? Do we need to pay attention to HIPAA here? I hear a couple of no's. Does everybody agree with that? HIPAA does not apply in Kenya if all the data are collected and analyzed in Kenya.
But our setting is rather unusual, and many academic settings internationally will run into the same situation: sometimes data managers in the United States, working for another university, manipulate or analyze these data. That means we are accountable to this regulation, because those data managers work for a covered entity that must apply all the rules of this law. Some of you will recall that we had a huge discussion, when creating data sets for our research, about whether we had to pay attention to this law. The answer from our IREC and the IRB was yes, you do, because Indiana University is a covered entity, and if any data manager at IU is going to look at personally identifiable information, we have to pay attention to HIPAA. So unfortunately the answer is yes, we have to pay attention to it, except where all data are collected here and analyzed here, in which case HIPAA does not apply.

The Privacy Rule, issued under Title 45 of the CFR, establishes a category of health information that we call protected health information, or PHI, and a covered entity may use or disclose PHI to others only in certain situations. It usually requires that an individual provide consent or an authorization before this kind of information can be used or disclosed. That need for authorization can be waived by an IRB when it judges that the risk to confidentiality or privacy is minimal. The rule lists 18 types of identifiers; what I'm showing here is the most important subset, the things that could lead to the identification of an individual if they were included in a data set. Name is pretty obvious: if you release somebody's name in a data set, that could identify the individual, which would be a loss of privacy or confidentiality. All elements of any date except the year are considered PHI.
A Social Security number, which for Kenya's purposes would be the national ID number. A driver's license number. Any geographic subdivision smaller than a state. That's a pretty tough one, because most of our patients come from villages, but the village, location, or sublocation is considered PHI because it is a subdivision smaller than a state. URLs or IP addresses for any individual. Vehicle identifiers or license plate numbers, which can be traced back to an individual. And phone numbers. These are some of the more commonly used PHI. Please take a look at that list, because you might see it again. Just a hint. Although I hear all of your quizzes are open book, so you know exactly where to go for that.

Health information only becomes PHI when it is created or received by a covered entity. So when it's sitting in our database here, it's not PHI. When you put it in a data set and send it to a data manager in the U.S. who works for a university or a health plan there, now it's PHI. A covered entity is a U.S. health plan, a U.S. healthcare clearinghouse, or a U.S. healthcare provider that transmits health information electronically. So in some instances this will apply to us in Kenya, if we transmit data to one of these groups in the United States. I certainly don't expect people to understand everything about HIPAA, but you should understand where to learn about it and when it applies in the international setting, because it is an important regulation. A researcher is not a covered entity unless they are also a provider within a covered entity. In my own case, I work for Indiana University, which is a covered entity, so my research comes under this law even though I am working in Kenya. And because we collaborate with many different international schools, we will often be in this situation when we use data in research.
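To make the PHI list concrete, here is a minimal Python sketch of stripping identifiers from a record before it is sent to a covered entity. It is an illustration only: the field names (`name`, `national_id`, `village`, and so on) are hypothetical, it covers only the identifiers discussed above rather than the full list of 18 in the Privacy Rule, and it does not handle other de-identification details such as ages over 89. The `waived` parameter is a hypothetical way to keep fields that an IRB has explicitly waived, such as birth dates or locations.

```python
from datetime import date

# Hypothetical field names, covering only the identifiers discussed in this
# lecture; the HIPAA Privacy Rule's full list has 18 identifier types.
DIRECT_IDENTIFIERS = {
    "name", "national_id", "drivers_license", "village", "sublocation",
    "ip_address", "url", "vehicle_plate", "phone",
}
DATE_FIELDS = {"birth_date", "visit_date"}


def deidentify(record: dict, waived: frozenset = frozenset()) -> dict:
    """Return a copy of `record` without direct identifiers and with dates cut to the year.

    `waived` (hypothetical) names fields an IRB has explicitly allowed to be kept.
    """
    cleaned = {}
    for field, value in record.items():
        if field in waived:
            cleaned[field] = value  # kept only because of an explicit IRB waiver
        elif field in DIRECT_IDENTIFIERS:
            continue  # drop the identifier entirely
        elif field in DATE_FIELDS:
            # Every element of a date except the year counts as PHI.
            if isinstance(value, date):
                cleaned[field + "_year"] = value.year
            # Anything else in a date field is dropped rather than risk leaking a full date.
        else:
            cleaned[field] = value
    return cleaned


# Made-up example record.
patient = {
    "name": "Jane Doe",
    "national_id": "12345678",
    "village": "Village A",
    "birth_date": date(1985, 3, 14),
    "cd4_count": 350,
}
print(deidentify(patient))
# {'birth_date_year': 1985, 'cd4_count': 350}
print(deidentify(patient, waived=frozenset({"village", "birth_date"})))
# {'village': 'Village A', 'birth_date': datetime.date(1985, 3, 14), 'cd4_count': 350}
```

The default is to drop a field when in doubt; keeping anything identifying requires naming it explicitly, which mirrors the idea that a waiver covers specific items rather than the whole data set.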
Research is governed by HIPAA if the data are obtained from a covered entity, which is not the case for our data as collected here, and, as I mentioned, an IRB may on occasion waive the HIPAA restrictions on the use of PHI. We have obtained a waiver here at Moi University for the use of specific items in our data sets, including birth dates, patient locations, and GPS data, because we were able to argue successfully to the IRB in the U.S. that it would be virtually impossible for any data manager in the U.S. to identify a Kenyan patient from those small pieces of information. That waiver makes our job here much easier, but not all IRBs will grant such waivers, so you may find yourself in a situation where you have to manipulate data quite a bit to make sure you are not using any protected health information.

So how does all this translate into practice? We've talked about some ethical principles, some guidelines, and some regulations; why does it all matter? Electronic health information, whether it's used clinically or for research, has a lot of important uses that are relevant to ethics, and also some misuses that are very relevant to ethics. We know that electronic health information can improve the quality and safety of medical care. That is beneficence: maximizing benefits. We know that electronic medical records can be key to showing accurate outcomes in research on health or health practices; that again is beneficence, but the benefit now extends to all of society and not just to the individual. If we analyze data from a lot of people and use the results in public health practice or evidence-based medicine, that again is beneficence. We are maximizing the benefits of that information to improve care for many.
Confidentiality and privacy, though, can be lost if we do not properly safeguard the security of our data. That would be a breach of autonomy, the personal choice about the use of information, and it could be a breach of beneficence if it causes harm to an individual. Information sometimes is, or can be, used without a patient's knowledge or consent. That is a breach of autonomy, or respect for persons. We'll talk about that a little later, because there are often questions about whom the data belong to. Do they belong to the patient, the institution, or the research project? That is an open question. So when can data be used without the patient's knowledge or consent, or is consent always required? Either way, this definitely involves the principle of respect for persons. And finally, depending on how we design our databases or collect our data, we could end up excluding some very important populations. In so doing we may change the analysis and the way outcomes are reported, and if we do that improperly, we have breached justice: we have not designed the work in a way that treats all populations fairly.

Data integrity. I'm sure everybody working in data has heard a lot about data integrity in your studies, and if you go online and look for definitions, you will find many different ones. These are just four of about ten that I found. First, the assurance that data are accurate, correct, and valid. Second, the accuracy and consistency of stored data, indicated by an absence of any alteration in data between two updates of a data record; this is imposed within a database at the design stage through standardized rules and procedures, and it also involves error checking and validation. Third, the exact duplication of sent data at the receiving end, which doesn't say much about the collection end.
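The second definition mentions rules imposed at the design stage, along with error checking and validation. A minimal Python sketch of what such entry rules might look like is below; the field names and limits are hypothetical placeholders, and real limits would come from the study protocol or the clinical form.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class Rule:
    field: str
    check: Callable[[Any], bool]  # returns True when the value is acceptable
    message: str


# Hypothetical rules; real fields and limits come from the protocol or form design.
RULES = [
    Rule("patient_id", lambda v: isinstance(v, str) and v.strip() != "",
         "patient_id is required"),
    Rule("sex", lambda v: v in {"M", "F"}, "sex must be M or F"),
    Rule("weight_kg", lambda v: isinstance(v, (int, float)) and 0.5 <= v <= 250,
         "weight_kg must be between 0.5 and 250"),
]


def validate(record: dict) -> list:
    """Return the message for every rule the record breaks; empty means it passed."""
    return [rule.message for rule in RULES if not rule.check(record.get(rule.field))]


print(validate({"patient_id": "A-001", "sex": "F", "weight_kg": 62}))
# []
print(validate({"patient_id": "", "sex": "X", "weight_kg": 620}))
# ['patient_id is required', 'sex must be M or F', 'weight_kg must be between 0.5 and 250']
```

Rules like these catch missing or impossible values at the moment of entry. They cannot catch a value that is plausible but wrong, so they complement, rather than replace, careful collection.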
And a final one: the assurance that data are unchanged from creation to reception. I actually like this last one, because it covers every point, from the person marking things down on a data collection form all the way through to the person storing, backing up, or archiving that file. It includes every step of the process. But you'll see that the common thread through all of these definitions is accuracy and completeness of data, and the idea that this is a process that starts at the first point of data collection and runs to the very final point of data storage.

So think of data integrity as a process. It depends first on collection, which means an accurate representation of the data. If a patient tells me something in the clinic, and I think, well, that's not right, and I mark down something else, I have compromised data integrity; I have not recorded accurately what the patient told me. Next is data transfer. Any time data are transferred from one point to another, it should be done in a way that ensures the record is sent in its entirety and received in its entirety, without change. Storage and security also matter: preventing loss of data is just as important to data integrity as making sure the data are accurate when they are received, because losing data also compromises the record. Data integrity also involves the sharing of data, meaning how much we share and who can see it, and then the analysis of data, meaning the proper representation of the data when we start discussing what they mean. These are all important in data integrity. Data fabrication and falsification are very serious breaches of data integrity, and they are an important challenge to data integrity in research.
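For the transfer step, one common way to confirm that data are unchanged from creation to reception is to compare a cryptographic checksum computed before sending with one computed after receiving. The sketch below uses SHA-256 from Python's standard `hashlib`; the workflow, in which the sender shares the digest alongside the file, is an assumption for illustration. A matching checksum only shows that the file was not altered in transit; like the third definition, it says nothing about whether the data were recorded correctly at the collection end.

```python
import hashlib
import io
from typing import BinaryIO


def sha256_of_stream(stream: BinaryIO, chunk_size: int = 1 << 16) -> str:
    """Hash a binary stream in chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def sha256_of_file(path: str) -> str:
    """Convenience wrapper for a file on disk."""
    with open(path, "rb") as handle:
        return sha256_of_stream(handle)


# Sender computes the digest before transfer (in-memory bytes stand in for a file).
sent = io.BytesIO(b"patient_id,lft\nA-001,40\n")
expected = sha256_of_stream(sent)

# Receiver recomputes it on what arrived and compares.
intact = io.BytesIO(b"patient_id,lft\nA-001,40\n")
altered = io.BytesIO(b"patient_id,lft\nA-001,4.0\n")
print(sha256_of_stream(intact) == expected)   # True
print(sha256_of_stream(altered) == expected)  # False
```

Even a one-character change produces a completely different digest, which is why a mismatch is a reliable signal to re-send the file rather than to trust it.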
There have been highly publicized research cases where data were fabricated or falsified in order to make a certain research project or analysis look better, and that is a serious breach of ethical integrity. However, as I showed you earlier, that is only one aspect of data integrity, although it is the one we hear the most about. Human error also contributes to loss of data integrity. If I mark down that somebody's liver function test is 40, and the person entering the data types 4.0 into the database, data integrity has been compromised. Is that a breach of ethics? I see one person shaking their head. How many people think that's a breach? How many people think that's just human error and should be expected? Good. It's human error. While it is a breach of data integrity, we would not consider it unethical behavior, because human error is an accepted part of human practice. We try to minimize those errors as much as possible to protect the integrity of our data, but we also recognize that unintentional human error will happen, and that is not a breach of ethics, whereas purposefully fabricating or falsifying data is a terrible breach of ethics, and I hope nobody disagrees with that statement. Some things compromise data integrity without being unethical, while other things that compromise data integrity are terribly unethical. Try to keep that balance in mind: there is unethical behavior, and there is human error, which is an accepted part of human practice, although we try to minimize it.
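One widely used safeguard against transcription errors like the 40 versus 4.0 example is double data entry: two people key in the same paper form independently, and the computer flags every field where they disagree so the discrepancy can be resolved against the original form. A minimal Python sketch, with hypothetical field names:

```python
def compare_double_entry(first: dict, second: dict) -> list:
    """Return (field, first_value, second_value) for every field where the two entries differ."""
    discrepancies = []
    # Union of keys, so a field missing from one entry is also flagged.
    for field in sorted(first.keys() | second.keys()):
        a, b = first.get(field), second.get(field)
        if a != b:
            discrepancies.append((field, a, b))
    return discrepancies


# Hypothetical field names; the same paper form keyed in by two different clerks.
entry_1 = {"patient_id": "A-001", "lft_result": "40", "weight_kg": "62"}
entry_2 = {"patient_id": "A-001", "lft_result": "4.0", "weight_kg": "62"}
print(compare_double_entry(entry_1, entry_2))
# [('lft_result', '40', '4.0')]
```

The comparison does not decide which value is right; it only points a person back to the source document, which keeps the correction honest.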
There has been quite a bit of concern about research misconduct because, as I mentioned, there are some highly publicized cases where people have fabricated or falsified data, whether clinical data or research data, to make the results look better. So the U.S. Department of Health and Human Services sponsored a conference in 1990 on data management, specifically to summarize the ways in which the conduct of research depends on the responsible use of data and responsible data management. Out of that conference came a number of very important principles and guidelines. One of them is that responsible research begins with experimental design and protocol approval. Does that have anything to do with data? How many people here have been involved in writing a research protocol? No one's holding up their hand, but I know at least three people in here who have been. So yes, the design of a protocol has a lot to do with data, because every protocol should have a data management plan, a data sampling plan, and a data analysis plan. Much of research design is about data, which means the ethics of data integrity begin with the design of the study, not just its conduct. Studies should be designed so that record keeping ensures the accuracy and integrity of the data and avoids or minimizes bias whenever possible. The design sets the criteria for including or excluding data from the statistical analysis, and it should assign responsibility for collection, use, and sharing: who is going to see the data, who is going to analyze them, and how they are going to be used. I cannot stress this next point enough, and you will probably hear it a couple more times. It applies to everybody in research, and I should expand that to everyone who deals with data from a patient or research participant, so let's expand it to both clinical and research settings.
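Designing record keeping for integrity usually includes an audit trail: a log of what was done, by whom, and when, such as who entered, viewed, or changed a value. The toy Python class below is a hypothetical illustration of that idea, not how the AMRS or any particular system implements it; a real system would also protect the log itself from modification.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any, Optional


@dataclass(frozen=True)
class AuditEntry:
    timestamp: datetime
    user: str
    action: str                      # "update" or "view" in this sketch
    record_id: str
    field_name: Optional[str] = None
    old_value: Any = None
    new_value: Any = None


@dataclass
class AuditedStore:
    """Toy in-memory record store that logs every write and every read."""
    records: dict = field(default_factory=dict)
    log: list = field(default_factory=list)

    def _record(self, user: str, action: str, record_id: str,
                field_name: Optional[str] = None, old: Any = None, new: Any = None) -> None:
        self.log.append(AuditEntry(datetime.now(timezone.utc), user, action,
                                   record_id, field_name, old, new))

    def update(self, user: str, record_id: str, field_name: str, value: Any) -> None:
        record = self.records.setdefault(record_id, {})
        old = record.get(field_name)
        record[field_name] = value
        self._record(user, "update", record_id, field_name, old, value)

    def view(self, user: str, record_id: str) -> dict:
        self._record(user, "view", record_id)
        # A shallow copy, so callers cannot add or replace fields without going through update().
        return dict(self.records.get(record_id, {}))


# Hypothetical user names.
store = AuditedStore()
store.update("clerk_01", "A-001", "weight_kg", 62)
store.update("dm_02", "A-001", "weight_kg", 61.5)
store.view("clinician_07", "A-001")
for entry in store.log:
    print(entry.user, entry.action, entry.field_name, entry.old_value, entry.new_value)
# clerk_01 update weight_kg None 62
# dm_02 update weight_kg 62 61.5
# clinician_07 view None None None
```

Keeping the old value next to the new one means a correction never erases the original entry, so anyone reviewing the data later can see exactly what changed and who changed it.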
Everyone in a project or a clinical care system has a responsibility to ensure the integrity of data: the person collecting the data, the person picking up the paper forms and carrying them to data entry, the data entry operator, the data manager, and the people filing, archiving, and storing those forms. Everyone has a role and a responsibility to make sure those data are maintained and kept confidential. The ultimate responsibility in research does sit with the principal investigator, but anybody who helps plan the study, collect data, analyze or interpret the data, publish the data, or maintain records is responsible for helping to maintain data integrity. It is a team effort. It is not something the data manager alone has to grapple with, but we often rely on the data manager to educate the rest of the team about it. So as a data manager you need to be an expert in data integrity, know how everyone's behavior affects it, and know how to help educate others to protect it whenever possible.

Data collection can be very repetitious, very time consuming, and quite boring, and there is often a temptation to underestimate how important it is. We've seen this on our own forms: if people consistently miss fields, or don't understand what a field means and mark down the wrong type of information or incorrect information, this severely compromises data quality and data integrity. So it is a very important part of the process. Those who are responsible for collecting data, which in our case means the clinicians and all of the care workers in the other programs within our clinical setting, must be adequately trained and motivated to collect data in the most conscientious, even obsessive-compulsive, way possible. We are planning to do quite a number of trainings here
throughout the care system to improve data quality and to educate our clinicians and workers in the different clinical settings about its importance. Data collection plays a role in that: we should always use methods that limit or eliminate the effect of bias whenever possible, and we should always keep records of what was done, by whom, and when. This is very important in the design of a database. For those of you who work with the AMRS, from a data management perspective we want to know who entered the data and who viewed or changed the data, because keeping records of what was done, by whom, and when is very important when we look at data quality and ensure data integrity.

And finally, analysis and selection of data. (Oh good, we'll have a bit of time for questions.) Data management, and the ethics of data management, do not stop with the creation of a data set; they extend all the way to reporting. How data are used in analysis is also governed by ethical principles. We do not change analyses to get the outcome we want. We analyze all the data, and we report what we find in an honest fashion. However, because we often can't report everything that was done, we have to choose which results we think are most important, and that's where judgment comes in. We need to make decisions that are not just favorable to the outcomes we were hoping for, but that make sense with regard to representing the data accurately. So we have to critically evaluate our reasons for including or excluding data from a data set, talk about those reasons, and tell people what we did or did not include and why. We have to consider carefully how those choices might bias the outcomes of the analysis, and we have to clearly document at each step of the process what we did and why, so that if somebody wants to do the analysis again, they
could replicate it. And if we used an analysis technique that is a little controversial, say a test that some other statisticians wouldn't use, we should explain why we used it, justify it, and open ourselves to comment from others on whether we chose the correct analysis.

Retention of data: what should we retain? It might be impractical to store large amounts of data, particularly on paper, so what do we choose to retain? At a minimum, enough data should always be retained to reconstruct anything that was done for an analysis. For a patient record, from a clinical standpoint, we should retain everything, for as long as is required by the law of the country we're working in. This will differ from country to country and from institution to institution. I honestly don't know whether there is a law in Kenya, or what it says, about how long Moi University or Moi Teaching and Referral Hospital has to maintain patient records. For research data, the retention period may be determined by the funder, by the country you're in, or by the country that funded the study. For many of our studies here, we must retain data for five to seven years, depending on the U.S. state law that applies to the project when it is funded from the United States. But for clinical records, it will depend on the institution and on the country's law.

Sharing of data: who gets to see the data? This is important for responsible research and also for responsible clinical practice. De-identified data should be shared; this is again a conclusion from the conference we discussed earlier. Data should be shared so that others can verify your conclusions or analysis. We don't publish an analysis and then decide we're not going to let anybody see what we did; that is not considered ethical. The analysis should be verifiable by others. But we do not share personal information on participants or patients;
we only share de-identified information or aggregate data. Personal information should never be shared, because of confidentiality and privacy concerns.

Data security involves two processes, and I hope this is second nature to most of you by now, but it's a good review: limiting access to data, and backing up data. We don't do backups just because we worry; we do them because they're part of data integrity and data security. Loss of data compromises data integrity, so we do not want to lose data, and because we know that computer systems fail and data can easily be lost, we do regular backups so that we can rebuild data sets when necessary. Limiting access can mean locking up all paper records, password-protecting all computerized information, restricting access to any records to people who have defined privileges to see them, defining access privileges for different levels of users in an electronic system, and preventing outside access so that people can't hack into private information.

Ownership of data. I'm not going to give you an answer on this one, but I want to make you aware that it is an important ethical concern, whether you are talking about clinical data or research data, although it comes up most often with research. Who owns the data? Does the patient own them, or does the hospital? I hear you say the patient, and I agree with you. I think the patient owns the data, and I should not do anything without that patient's consent, unless a hospital policy says the data will be used in that way and the patient has accepted care at that facility, as is the case here. I personally, as a physician, feel that the data belong to the patient, not to me. However, you will find administrators at MTRH, or at any hospital in the world, who will tell you that the data belong to the institution, that they are part of its records and its mandate. So there is sometimes disagreement over who owns patient-level data in clinical care. In research, does the researcher own the data? Does the funder? And when you publish, who owns the copyright? Does the publisher? We often sign away our copyright to the published work. There are a lot of ethical concerns around ownership of data, and there are no right or wrong answers; these questions are hotly debated all across the globe. But you should be aware that ownership is an ethical question in the use of data: who owns the data, and who has the final say on how they should be used.

And finally, ethics in publication. We've covered many of these points already, but because many of you will be involved in creating data sets or analyzing data for journals, you should understand that the ethics of publication include a lot of guidelines that are very pertinent to data and data management. Research should always try to answer a specific question. We don't just go out, collect masses of information, and start doing willy-nilly analyses in the hope of finding some question we can answer. Research should always have a specific question in mind, and we should try to answer that question through proper collection and use of the data we gather. Statistical issues such as sample size and the choice of statistical tests are a very important part of design, to ensure that the data are very likely to answer the question we want answered. I may design a study with a great research question, but if I choose a strategy for collecting and analyzing data that has no hope of answering that question, the research is now unethical, because I have not created a plan that maintains any kind of data integrity. IRB approval is always required when using human subjects, human tissues, or medical records, so there must be oversight by an ethics review board for the use of human subjects data in whatever form. For research to be considered ethical, we don't just go out and start collecting and reporting data from human subjects without
some type of ethical oversight data must be appropriately analyzed I shouldn't say should be it must be okay now there may be questions about what the appropriate analysis is biostatisticians don't always agree on the same test but inappropriate analysis are not necessarily ethical misconduct again this comes with the idea of human error or human judgment if somebody is doing a master's project and they do an analysis and they do an analysis with every intention of doing a good one and they analyze to the best of their ability they may report things that were not necessarily as well reported as they could have been okay they might even be wrong because perhaps they didn't know to use the right statistical tests is that ethical misconduct no that's human error again okay would have been better if they had the advice of a biostatistician or a statistician who could help them analyze the data but that is not ethical misconduct that's human error sometimes biostatisticians at the very top of their game disagree on which test is the best and so the analysis there may be disagreements about the ethical misconduct there is sometimes disagreement on how to analyze the data in the best possible way so that is not ethical misconduct however as I said before fabrication or falsification of data is always considered ethical misconduct finally sources and methods of attaining and processing data must be disclosed exclusions should be explained in full if I decide to remove something from the data set I need to say so and methods used to analyze the data should be explained in detail it's okay it's considered fine to go ahead and do other analyses from the data once your initial question you've made an attempt to answer that but again we don't just go collecting data and then start saying okay now what in here is there to analyze okay and finally bias which always exists should be discussed in detail in all publications or in all data analysis
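The requirement to explain exclusions in full lends itself to a small illustration. What follows is a minimal Python sketch, not a prescribed method, assuming records are plain dictionaries and each exclusion criterion is a named rule. The field names (`consented`, `outcome`), the rule names, and the helper `apply_exclusions` are all hypothetical. The sketch removes records rule by rule and keeps a count for each rule, so every exclusion can be reported alongside the results.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical types: a record is a dict; a rule pairs a readable name
# with a function that returns True when a record should be excluded.
Record = dict
Rule = tuple[str, Callable[[Record], bool]]


@dataclass
class ExclusionReport:
    """Counts of records removed by each named rule, in the order applied."""
    starting_n: int
    removed: dict[str, int] = field(default_factory=dict)
    final_n: int = 0

    def summary(self) -> str:
        lines = [f"Records at start: {self.starting_n}"]
        for name, count in self.removed.items():
            lines.append(f"Excluded ({name}): {count}")
        lines.append(f"Records analyzed: {self.final_n}")
        return "\n".join(lines)


def apply_exclusions(records: list[Record], rules: list[Rule]) -> tuple[list[Record], ExclusionReport]:
    """Apply each rule in turn, recording how many records it removed."""
    report = ExclusionReport(starting_n=len(records))
    remaining = list(records)
    for name, should_exclude in rules:
        kept = [r for r in remaining if not should_exclude(r)]
        report.removed[name] = len(remaining) - len(kept)
        remaining = kept
    report.final_n = len(remaining)
    return remaining, report


# Hypothetical example data and rules; field names are assumptions.
records = [
    {"id": 1, "consented": True, "outcome": 5.2},
    {"id": 2, "consented": False, "outcome": 4.8},
    {"id": 3, "consented": True, "outcome": None},
    {"id": 4, "consented": True, "outcome": 6.1},
]
rules = [
    ("no documented consent", lambda r: not r["consented"]),
    ("missing outcome", lambda r: r["outcome"] is None),
]

kept, report = apply_exclusions(records, rules)
print(report.summary())
```

Because the rules are applied in order, a record that meets more than one criterion is counted only under the first rule it meets; stating that order in the write-up is part of explaining exclusions in full.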