Thank you. Can everybody hear me all the way to the back if I talk like this? No protests. Okay, I guess. My name is Philip, and in the first half of this talk I'm going to talk about crowdsourcing genome-wide association studies. Here's an overview: first me, up until the privacy implications, and after that Basti.

So first of all, what are genome-wide association studies? The purpose of genome-wide association studies is to link genetic variants, single nucleotide polymorphisms, to certain traits like eye or hair color, or even to the probability of developing diseases like diabetes or certain types of cancer.

What is a single nucleotide polymorphism? What you see here is the DNA of two individuals, and a single nucleotide polymorphism means that a single nucleotide is switched between the two. The top one is a GC and the bottom one is a TA. So it's just at one position. In the lab, if you want to analyze SNPs, you have to use microarrays, which, if you recall high school, are based on the principle that DNA strands bind to their complementary strands.
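The binding principle behind microarrays can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the probe sequences are made up, and binding is modeled as an exact reverse-complement match, whereas real hybridization is a matter of signal intensity rather than a yes/no answer.

```python
# Watson-Crick base pairing: A pairs with T, G pairs with C.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(sequence):
    """Return the strand that would bind perfectly to `sequence`."""
    return "".join(COMPLEMENT[base] for base in reversed(sequence.upper()))


def binds_fully(probe, sample):
    """Idealized binding: the sample must be the exact reverse complement."""
    return reverse_complement(probe) == sample.upper()


def call_allele(probes, sample):
    """Return the alleles whose probe binds the sample strand completely."""
    return [allele for allele, probe in probes.items() if binds_fully(probe, sample)]


# Hypothetical probes that differ only at the middle position (G versus T).
probes = {"G": "ACTGGATCA", "T": "ACTGTATCA"}

# A sample strand that carries the T variant.
sample = reverse_complement("ACTGTATCA")

print(call_allele(probes, sample))  # ['T']
```

Only the probe that matches the sample at every position, including the SNP position, counts as bound, which is why a single switched nucleotide is enough to tell the two variants apart.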
So what we have here are fixed probes, and in the middle we have a fully complementary strand binding. Because these probes are labeled with fluorescence, you can see the light in the lab.

Genome-wide association studies work by basically taking a whole population and splitting it into two groups. Here we have a healthy person and a carrier of a disease. In this example, let's say it's prostate cancer. So if we split them up, we have the healthy people on the left and the prostate cancer people on the right. Then we check certain SNPs. SNPs have unique names, always starting with "rs" followed by a number, and because we are diploid we have two nucleotides at that position. So on the left you see TT, TG, GG, and on the right you see GG, GG, TG. GWAS work by comparing the frequencies. You can see here that our healthy population has 33% TT and 50% TG, and our prostate cancer population has 60% GG. So that's a big difference, and we can link the GG genotype to prostate cancer.

A couple of real-life examples. Sladek et al. identified four variants linked to a heightened risk of developing type 2 diabetes. There's a slightly weird study by Kogan et al. which linked one SNP to pro-social behavior; it's a bit, how do you say, controversial. And a big study, a good study, is from the Wellcome Trust, which linked 24 of these SNPs to seven major diseases.

There are always a couple of problems with genome-wide association studies: we have to have a large enough sample size, we have to correct for multiple testing, and of course correlation is not always causation.

Nowadays GWAS can be used by the private customer through direct-to-consumer genetic testing companies, for example 23andMe or deCODEme. They analyze about a million SNPs and put together a summary of the disease risks along with ancestry, and it's just 200 bucks. It's not much.
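The frequency comparison and the multiple-testing problem described above can be sketched as follows. The counts are assumptions: the talk only gives 33% TT and 50% TG for the healthy group and 60% GG for the cancer group, so the remaining counts and the group size of 100 each are invented for illustration. The test is a plain Pearson chi-square on the genotype table, and the correction is a simple Bonferroni threshold for one million tested SNPs; real studies typically use more elaborate models.

```python
import math


def genotype_chi_square(controls, cases):
    """Pearson chi-square statistic for a 2 x k table of genotype counts."""
    genotypes = sorted(set(controls) | set(cases))
    n_controls = sum(controls.values())
    n_cases = sum(cases.values())
    total = n_controls + n_cases
    statistic = 0.0
    for genotype in genotypes:
        column_total = controls.get(genotype, 0) + cases.get(genotype, 0)
        for observed, row_total in (
            (controls.get(genotype, 0), n_controls),
            (cases.get(genotype, 0), n_cases),
        ):
            expected = row_total * column_total / total
            statistic += (observed - expected) ** 2 / expected
    return statistic, len(genotypes) - 1


def p_value_two_degrees(statistic):
    """Upper-tail p-value of a chi-square statistic; valid for 2 degrees of freedom only."""
    return math.exp(-statistic / 2.0)


# Assumed counts per 100 people; only some of the percentages come from the talk.
controls = {"TT": 33, "TG": 50, "GG": 17}
cases = {"TT": 10, "TG": 30, "GG": 60}

statistic, degrees_of_freedom = genotype_chi_square(controls, cases)
p_value = p_value_two_degrees(statistic)

# Bonferroni correction: with one million SNPs tested, each single test
# has to pass a far stricter threshold than the usual 0.05.
threshold = 0.05 / 1_000_000

print(f"chi-square = {statistic:.2f}, p = {p_value:.2e}")
print("significant after correction:", p_value < threshold)
```

With a million SNPs on the chip, about 50,000 would pass an uncorrected 0.05 threshold by chance alone, which is why the corrected threshold and a large enough sample size both matter.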
It's nothing. There are a couple of companies doing that, and the best thing is that they send you the raw data. They don't lock it up from you. It looks, for example, like that.

23andMe alone has over a hundred thousand customers. According to their numbers, 76% of their customers agreed to have their data used in research, and 59% of them shared their phenotypic information with the company. Phenotypic information is everything that describes your body, meaning a phenotype could be eye color or the risk of developing diabetes or skin color, all of that.

There's research going on in these company labs. 23andMe published a couple of studies with up to 3,000 participants from among their customers. On the one hand they were able to replicate older studies, which showed that their approach works, and on the other hand they found a couple of new associations for Parkinson's.

The thing is that people are already sharing their raw data from these DTC companies with other researchers, and about one to five percent of all 23andMe customers would be okay with sharing their data. There's one project going on, the Personal Genome Project, which is open data but closed participation.
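For readers who have not seen such a raw data file, here is a minimal parsing sketch. It assumes the tab-separated layout used in 23andMe raw data exports (comment lines starting with `#`, then rsid, chromosome, position and genotype); the sample rows and rs numbers are invented placeholders, not real markers.

```python
import io
from typing import Iterator, NamedTuple, TextIO


class SnpCall(NamedTuple):
    rsid: str
    chromosome: str
    position: int
    genotype: str


def read_raw_calls(handle: TextIO) -> Iterator[SnpCall]:
    """Yield one SnpCall per data line, skipping comments and blank lines."""
    for line in handle:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        rsid, chromosome, position, genotype = line.split("\t")
        yield SnpCall(rsid, chromosome, int(position), genotype)


# Invented example rows; "--" marks a position the chip could not call.
SAMPLE = (
    "# rsid\tchromosome\tposition\tgenotype\n"
    "rs0000001\t1\t12345\tAA\n"
    "rs0000002\t1\t67890\tAG\n"
    "rs0000003\tX\t13579\t--\n"
)

calls = list(read_raw_calls(io.StringIO(SAMPLE)))
for call in calls:
    print(call.rsid, call.chromosome, call.position, call.genotype)
```

A real file has on the order of a million such lines, so reading it lazily line by line, as the generator does, keeps memory use small.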
So not everybody can participate even if they wanted to. We made a small questionnaire; we asked around 100 to 200 people whether they would share their data. 68% would share their data, 26% would share their data but not with everybody, only with the companies, and only 6% would not share their data at all. We also asked what the reasons for sharing the data are, and you can see here that most of them would share because of possible personal benefits, which we will talk about again later, and most of them wanted to help scientists with their research. Now I hand over to Basti.

Okay. So we are really in favor of doing these open genome-wide association studies, which would allow everybody to do studies on their own. You don't need a large lab, and the best thing about it is that it's already paid for: if individuals get genotyped at their own cost and make the data available to the public, we could all use the data for our own research.

There are some implications of making our genotypic information public. Sharing one million markers with the public is problematic for some reasons, but there are also some positive consequences, and we have to warn you: there are possible extremely bad consequences which could arise from it. So you should really make sure you know what you are doing if you really want everybody to know about your data.

On the positive side, first of all, you will gain more knowledge about yourself, because if you make your data available to the public you will find more people who share the same diseases, the same phenotypic information as you. It's like PatientsLikeMe: you get others to share experiences on your diseases or traits. Then it's nice for everybody because it's cheap, open science. If you use openly available data, you can do nice science without spending money on it, and it's not only feasible for trained researchers who have been affiliated with universities for a long time; everybody with internet access could use such genotyping information for their own projects.

The negative consequences: first of all, people might know more about you than you would like them to know. Yesterday there was a talk about forensics, on how DNA fingerprinting is used by law enforcement agencies, so this is one thing you should keep in mind. Another is that your boss or your next boss might get hold of the data and use it in ways you won't like. You might not get a job, or your health insurance company might raise its rates or not insure you at all because of some genetic information saying that you have elevated risk factors.

Another thing is that this knowledge isn't static. Phil said there are one million markers which are tested, but up to now we only know associations for about 30,000 of those markers. So there's still nearly a million left for which we don't know what they are doing, if they are doing anything at all. So even if you're quite informed about what your genetic information says right now, you have no guarantee that two days from now there won't be some publication which links SNP 666 to some kind of deadly disease you will develop five years from now, and everybody knows, because you already published your data.
So make sure you know about this. And your data is not only informative about you but also about your next of kin. So your parents, and especially your own children, will have similar information. I'll illustrate this with my own example. I've published my data already, and I have this nice AA genotype here, which means I have a 1.6 times higher risk for breast cancer. That might not sound like much, but it's nice to know. But this also says that my dad and my mom each have at least one A, and this one A already means that they will have at least a one-point-something times higher risk for breast cancer as well. With some more advanced statistics you can even figure out the frequencies of these alleles in the population and get quite a good view of their risk factors as well. So you should make sure that your next of kin also agree with publishing your data. The same is true if I have a daughter or son someday: they will have at least one A as well. Again, you can't say whether they will develop a disease or not, but by publishing my data I'm publishing part of theirs, so their consent would be needed as well.

So, possible solutions to this. Well, for the future this is hard. So what about laws? There are already some laws in place which try to minimize the impact of public genetic information. The first, in the United States, is the Genetic Information Nondiscrimination Act, and it is aimed mainly at insurance companies, and your future employers.
What it basically says is that insurance companies or others may not discriminate against you because of your genetic information, which may be available to them because they did some hidden testing without your knowledge. But this only covers basic health insurance. Life insurance, for example, could still be denied because of your genetic information.

Here in Germany we have the Gendiagnostikgesetz, which is, I think, a bit stricter, because it basically makes it not quite legal to perform this direct-to-consumer genetic testing at all. They sell it here in Germany, but it's still not really clear whether they are allowed to do so. But if you want to get a test, you still can.

But despite all those problems and possible issues with laws, we built a platform for those who still want to share. If you are not totally freaked out and saying "oh god, this genetic information will kill me": we've built openSNP. What it basically allows you to do is upload your data there, genetic information and phenotypic information, as much as you want. If you are a customer, your benefit is mainly that you get access to a great many publications, primary literature on genetic information, if you're interested in this. For researchers or citizen scientists it's clear: you get the data, and it's a cheap way to get lots of data to use in your own research.

Up to then there was no central repository for open genotypings. There were about 50 people who had already uploaded their data to GitHub or SourceForge, if you remember it, stuff like this, and you had to do a lot of research to find all this data if you wanted to use it. And mostly there was no phenotypic information about diseases, et cetera. So yes, we've created it, and people are already using it.

So what is this? It's really open. It's CC0, so all the data in there is basically public domain. Genetic information should not be owned by anybody, we think.
So it's really open and can be used for everything you would like to do with it. It allows users to annotate phenotypes, and it's completely crowdsourced. We don't make any suggestions about what you should enter or not. People can go there and say, "I would like to know your phenotypic information about SAT scores," and people already do enter their test results from high school, basically, or nicotine dependence, hair color, stuff like this. There are also some disease risks, or whether somebody in the family developed cancer at some time. So people are entering this, and everybody can download everything. The only things that will be kept from everybody are the email addresses and passwords of the users. Apart from this, you can get a complete dump of all the data and do research with it.

So far we've got 81 genotypings and 227 users. As we said, we would need at least 1% of 23andMe's customers, which would be 1,000 users, so we're at 8% so far and the numbers are rising. So we are optimistic that someday we will have enough data to be used in open genome-wide association studies.

To conclude, we think GWAS will be the future of personalized medicine, because the number of customers for direct-to-consumer genetic testing is still rising. It gets cheaper every year. I think 23andMe started with prices of around a thousand US dollars five years ago; now it's 200 euros. So you see the prices are dropping, and more and more people will get genotyped. And more and more people are willing to share their data: as we've seen in our survey, 67% would be willing to share their data. It's in the hands of all of us to make or break the situation. We could make good science out of it, or we could publish total crap like, as Philip said, the study on pro-social behavior.
That was done with 23 people who were genotyped, so the sample size is basically nothing and it shouldn't get published. But somehow they got into the Proceedings of the National Academy of Sciences in the US. So, yeah, we shouldn't do this.

We have the chance to take science into our own hands, so it gets out of the company walls, where 23andMe sits on this large, large amount of data and the public can't make use of it and other researchers can't use it either. So far we have won the Public Library of Science and Mendeley Binary Battle with openSNP, and we've got some funding through the German Wikimedia Foundation. It's only 5,000 euros, but this would mean, I think, 25 more people genotyped whose data would be publicly available. If you're interested in getting genotyped, we will release more details on this in the next year. So check it out if you want to get genotyped and don't have the money.

We are constantly improving the project. Currently we are working on including an API to deliver genetic information using the Distributed Annotation System, which is widely used in genetics for genome browsers and stuff like this, so others can build third-party tools on top of it. And basically, that's already it. So thanks, and if you have questions regarding privacy and stuff like this, go on.

You did not switch speakers? Okay. They said they would do a switch in the speaker. So anyway, now Q&A. We again have a microphone running around here and one Signal Angel sitting in the IRC, and apparently there are already some questions. Before we start with that, again: if you see any Mate bottles, please bring them out. So now we can start with the first question, I would say from the audience, and then we go over to the IRC.

Thank you. Thank you for the interesting talk. It was an issue I didn't know anything about, so very interesting. My question, or what you... You just gave a very short summary.
So I think there are lots of aspects we could talk about, but the most interesting for me would be this: as you said, if I publish my genome, I cannot foresee the effect that will have in the future because of the research which is coming up. Theoretically it could tell you what my intelligence is or whatever. And the much more interesting aspect, I think, is the impact this will have for my children and maybe the children of my children, and so on. We also had this discussion about post-privacy. Do you think that in the future, like in 100 years, everybody will have a public genome and we will know everything about everybody? Is this a possible future, or do you think we will have to go more the way of "okay, nobody can publish this because it's too dangerous"? What is your opinion, in which direction will it go?

Personally, I'd say we're probably landing somewhere in the middle, with people like us publishing it. There's too much information about you in there, so I don't think that everybody's going to release it. It's too important for the insurance companies and for your kids and everybody. I don't think we're going to have open genomes for everybody.

I think I differ on this. Ten years ago the first human genome was published; it cost about three billion US dollars and took ten years. Now you can do it in a week for about 10,000 euros, and this price is still dropping at basically the same rate; Moore's law is slow against it. So we will get to the situation where it's at least affordable; it's not a question of the price. I think we already see that people get their genome fully sequenced if they have some kinds of diseases. So it's in medical applications.
We already do it, and up to now we have not figured out how to deal with all the data. So basically it keeps biologists like Philip alive and in a job, having all this data. And I think we will get there, that all the data is available. If it's cheap enough, you can't stop people from just taking a sample from you; you heard yesterday in the forensics talk how much DNA you are losing every single second. So I think we'll get there.

So on your website, do you ask for people's names? Is it important that you have the identity, or could people be anonymous?

Most of them are anonymous. We're not Google; we don't have a real-name policy.

Okay, so how do you know if the data is real? Do you get it directly from the company somehow?

We have no way to ascertain whether the data is real. It could be that you upload your sister's data or that of some random stranger. We can't tell.

Is this much of a concern, or have you thought of systems where you would have a cooperation with one of the companies?

It is not yet a concern. But even if we were to work together with a company, I don't know, because then we would get the real names, and we don't need them. We don't want them.

Yeah, I think companies will not be willing to give out data directly, just out of liability and privacy concerns. I think they won't do it because it's still a liability risk: they won't want to get sued because somebody checked a box and didn't really know what he did. So they might not do it. As for the authenticity of the data, I think there's no benefit in uploading fake data up to now, so vandalism is not yet a problem. That might come some day, but I don't see the benefit of vandalizing it. It's not even fun to upload a fake genotyping.

How much does acquiring DNA sequencing actually cost today, and can a private person buy it on his own? When will the time come that you can buy it on your own, to have real freedom?
I mean in the sense of open-source hardware for DNA sequencing or whatever.

There are some projects which are trying to build do-it-yourself biology DNA sequencers. They are not really usable for complete genome sequencing yet because they are much too slow, and basically the chemicals are too expensive to use them yet. So we won't get there that fast. But I think at 23andMe you can already buy a fully sequenced exome of yourself, that is, all the protein-coding regions of your genome, for one thousand dollars. So we are getting there, and there are people working on doing it open source.

Just two questions for you. One of them is: have you actually looked at any of the DNA forums that are out there, the forums of FTDNA and 23andMe users? Because they're very, very active and they'd be interested in something like this. They often submit their data to third-party services. The other thing is: I'm a 23andMe and FTDNA customer, and for my personal records I'm keeping all my test results on Google Health at the moment, because there's nowhere else to store them, and there's no place where I can put my data and give controlled access to it. Is there anything like that on the horizon?

What's the last sentence again?

There's no place where I can put my test results and my data and give somebody else controlled access to them. For example, I have results on my supposed susceptibility to side effects of certain drugs and things, and I can't give my doctor access to that. Not easily, anyway.

That's why we have the phenotype system. You can upload your data and enter your phenotype, for example susceptibility to the disease you carry, so that the doctor or other researchers researching that disease can then download your genotype, look at your phenotypes, and link them together.

That is, just to reiterate for the stream.
That's research, not personalized medicine. We're not there yet; maybe soon, in the next couple of years, five to ten years. And yes, we are active on the 23andMe forums and trying to contact users there who are already sharing.

Okay, so I've got a question here. I've got many questions, but here's the question: what sort of accuracy can be expected in the personal genome sequencing tests? Is the data repeatable and verifiable? Hard words.

We have a wonderful Signal Angel at the moment.

That differs from case to case. Sometimes we've seen studies in our database linking SNPs to traits which were just based on 20 Chinese people or five Europeans. So it's very hard, and you always have to look at it case by case. It's not always verifiable.

To answer that question: the example we've seen was 1.6 times the risk for breast cancer, so this is not really high. But there are some markers which are basically yes or no: if you have this variant you will have one phenotype, and if you have the other you will have another phenotype. So there are such hard cases, but they are not frequently found yet.

Another question was: do you plan to offer an openSNP-like platform for genomes other than human, e.g. pigs or mice, for university research? Or are there already such platforms?

We don't plan on doing that, but there are lots of DNA databases, for mice for example. There's everything for everything, from viruses on up. What we could do in the future, as we've already mentioned, is make a similar database once cheap exome sequencing really takes off; then we can make a similar project. But we're not focusing on animals at all.

Yeah, but if you want to do it for other animals, you can: the source code is available online. It's open source, so take it, make a pig openSNP out of it, and go for it.

Another question is: how much data is there for one person? Some GB?
It's 25 to 30 megabytes. Well, this is just the 1 million markers. The fully sequenced human genome would have about 3 billion base pairs instead of 1 million, so it would get bigger: a couple of gigabytes for the full human genome.

Hi. It seems you basically have an upload site and a data sharing policy, and what I'm really interested in is the API. When we have exome data, what are you thinking? Because there you've got the whole problem of novel markers and how you interrogate that data, and that's where it gets interesting. So are you thinking about this problem? What's your plan to get into novel markers, multi-marker associations and this kind of genetic material?

Yeah, in terms of exomes, we are currently implementing the Distributed Annotation System, which allows you to query data for a single chromosome or for a specific region in a chromosome, so you don't have to download all the data anymore. This would be a huge help, and I think it is widely used and it's a standard. So we will go for this.

I think that's a great idea. What would be nice, once you have a lot of users in your database and you have tons of data, is to be able to go in as a researcher and say, "Oh, I found these really interesting people, but I would like to interrogate them further on their phenotype, so I'd like to get them back into the lab." So if there were some way I could contact certain people and say, "We found these really interesting variations, and we'd be interested to take a blood sample or something and compensate you; would you be interested in participating in our study?"
Which could be anything. If there were a way I could contact the people I'm interested in through your website, that would be fantastic.

Yeah, we have a small private messaging system, so you can just use that to ask the people. And once there are more users, we won't let just anybody click a button and send one million users the same spam message. So if you are interested in sending lots of emails through the system as a researcher, just contact us and we can do this.

Thanks for your nice presentation. I have two questions that are not that related to one another. The first one is from a data collection perspective. This is very sensitive data, right? Under the European Commission legislation that has been transposed into national legislation, you need special permissions to host and share sensitive personal data such as this. So do you have any comment on this? How do you handle it? Also, the licensing is kind of weird, because CC0 has a very odd status under European Union legislation. So how do you handle this? I'm just very curious. The second question is: for GWAS, we have very few studies where the statistical findings, like the probability that this SNP causes this cancer, or that this SNP explains 0.1% of cancer cases, are actually verified in the lab. So how reliable is the information we get here, even if people enter their phenotypic data? How reliable is this in the end, since we don't have lab verification? Thanks.

Okay, I'll start with the first question, on privacy and terms of service.
We are currently working on implementing a system called WeConsent. It's by John Wilbanks, who was the Creative Commons science director. The idea is that in the end there should be an IRB-approved, so institutional-review-board-approved, form which you'll have to sign or check, answering questions to show that you've understood all the terms of service. This would be the best legal way, and we are working on it.

As for the second question, that's true. It's really hard to verify that stuff later on. It's not really a death sentence if you see this in your results. It should always be taken with a grain of salt, and what you see now could be completely changed in five or ten years. It's always moving. Right now I wouldn't trust it 100%. I would wait.

Yeah, but I think it's the best we have up to now for a broader audience. Exome or full genome sequencing is still not affordable for everyone, so we can already make use of the data which is available now.

Okay, one more question. I'm just curious: since this is such sensitive information, do you go and educate the users in some way, or do you just rely on the users to educate themselves before they upload the data?

When you sign up for the page, there's a long disclaimer where we educate the user about the possible problems. No, no, it's human-readable, not lawyer-readable. When you upload your data you're going to see the same disclaimer again, and we also have a blog at opensnp.wordpress.com where we go on at a bit more length about the possible problems and risks you take on.

Another question anywhere? We have our wonderful video intro; should I do the front, or... Do you have a microphone, my dear?
Yes. It's late in the evening and I think he's waiting for Jeopardy, but we have so much time, so we can have many questions.

If I want to upload my data and I already know about an association that I do not want to publish, is it feasible to keep this part of the information out?

Not really. You could try to redact single SNPs from it, but as we said, the knowledge isn't static: you might know the association now, but a new association for the same disease might turn up two days from now. So you could try to do so, but it's not guaranteed that it will work.

He's taking over now.

Well, it was said several times, and it's obvious, that this is very sensitive data. How is it protected against being taken, say by brute force, be it legally at some point in time or be it illegally: some people with arms standing in the door and saying "give me your servers", containing all this genotype and phenotype data and users' emails and stuff which may be correlated with clear names? So it's really, really sensitive stuff.

But the data is open anyway; nobody has to break in to get it. To minimize the risk, we don't log IPs. You can use fake email addresses; we don't verify them. The only thing that's not open is the messaging system. That's the only thing we could lose if we got broken into. So if you use a fake email address and you don't use the messaging system, then you should not be identifiable. But the problem is that the data itself is so singular to you that you could be identified again based on that data, and there's nothing we can do against that.

It's not for research, it's for personalized medicine. So you don't have anonymity, right? You have just pseudonymity. You are sending and receiving, back and forth, so you have a two-way communication, and a third party can rely on that, right?
So now the trust is broken at some point in time, two years from now or a hundred years from now, but the data is still valid for my grandson or for whoever. So there's no answer to that problem that I see right now.

Because your grandson is not your clone, his data is probably going to be very, very different from yours.

Yeah, for one quarter.

But we're still going to get mixed-up data. So even if you now lose all your data, you cannot say everything about your grandson.

If my wife is uploading something too, then you know at least half of my grandchildren's data, and so on and so on. It's a never-ending story, right? If you open Pandora's box, you can never close it again.

That's true. It is already open.

So that means it's somebody else's problem then.

One question from the IRC, apparently. So here's the question: is the data format for exchange a free one, and which format is it?

Right now we offer all the data in the same format we get it in, meaning that the 23andMe data we offer is in the format that we got from 23andMe. All the phenotypic data, if you download it, is just a tab-delimited CSV. What we are trying to do soon is to make up our own small format for easier parsing of everything. It's probably going to come sometime next year, early next year.

Just an answer to your question, because I think it's really interesting to talk about the philosophical or moral aspect of it. One thing you said, and I tend to share this opinion, is that in the near future we will lose our genetic information so fast and so often that we cannot avoid losing it anyway. And what we learn from history is that when we lose our information anyway, maybe the best idea is to open it up to everybody, so that at least we as the people have the same opportunity as the government or the companies or whoever has the power to take it away anyway. So maybe it's just the right idea to make it public in this way. Just an idea. I don't have the answer yet, but yeah.

You're referring to 19 what? What would having published this kind of data have meant, if that had been possible then, say in Germany in 1935, when certain laws about genetically related things were made? I personally would prefer if you would not make a remark like that, because otherwise I think the Nazis would have killed me, because I have a genetic defect which is quite severe. So thank you.

What do you see as the main practical way of reconnecting someone's identity with the data that they upload? Because it doesn't seem very straightforward. There could be some third party trying to connect an IP address to an upload or something, but say I upload it through Tor, or we have some sophisticated way of uploading data so that you don't know my IP address. What other ways do you foresee in the future by which someone's identity could be reattached to the data that they share? To me it seems very difficult, or they would have to already have my DNA anyway, in which case they won't be interested in the data that you have.

I think very recent research shows that you need around 70 to 75 SNPs to identify a person with 99.9 percent certainty, and the data set is like a million SNPs. So someone needs just a very few of your SNPs to re-identify you even if you're anonymous. So there could be a problem in the future.

Yeah, I think basically you're right. In the future it will be easier not to intercept your communication with this platform somehow; they will just grab some DNA from you, do it themselves and get the million markers for themselves.
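Why so few SNPs are enough to single out a person can be illustrated with a rough calculation. This is a sketch under strong assumptions that are not from the talk: the SNPs are treated as statistically independent, genotypes follow Hardy-Weinberg proportions, the two people are unrelated, and every SNP has an allele frequency of 0.5. It is not the method of the research the speaker refers to.

```python
def match_probability_per_snp(allele_frequency):
    """Chance that two unrelated people share the same genotype at one SNP."""
    p = allele_frequency
    q = 1.0 - p
    # Genotype frequencies under Hardy-Weinberg: p^2, 2pq, q^2.
    # Two people match if both draw the same genotype.
    return (p * p) ** 2 + (2 * p * q) ** 2 + (q * q) ** 2


def random_match_probability(allele_frequencies):
    """Chance of an identical genotype at every SNP, assuming independence."""
    probability = 1.0
    for frequency in allele_frequencies:
        probability *= match_probability_per_snp(frequency)
    return probability


# 75 hypothetical SNPs, each with an assumed allele frequency of 0.5.
probability = random_match_probability([0.5] * 75)
print(f"per-SNP match probability: {match_probability_per_snp(0.5):.3f}")
print(f"match probability over 75 SNPs: {probability:.1e}")
```

Even with these crude assumptions, the chance of a coincidental match over 75 SNPs is vanishingly small compared with the world population, so a short list of genotypes already acts like a fingerprint. Relatives and correlated SNPs make the real numbers less extreme.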
So it might be easier in the future. More questions, I see.
I'm confused about two things. First of all, one of the commenters said this was not mainly for research, but for personalized medicine. As I understood it, it's mostly for research, and users only get something out of this project because, if more research is done, they can benefit from this information just as much as everybody else who gets genotyped. Maybe you could clarify what the user gets out of participating in openSNP, as opposed to just getting their genotyping from a private company like 23andMe. That's the one thing. The other thing is, I don't understand the problem: if somebody took a hair from me in the street and got my DNA and then could backtrack this to my openSNP data, they wouldn't get anything from it, because they already know who I am and they already have my DNA. So why would it be a problem if they also got my genotyping information from openSNP, which is already included in my DNA? It's no privacy leak, I think.
Okay, let me just answer the first question. When you get your results from the companies, they'll give you a digest of the current research. What we do basically is parse three databases: first of all the Public Library of Science, which is open access research; then the Mendeley literature database, which encompasses most of current research; and SNPedia, which is a wiki-like system where people can enter information on SNPs. So what you get as a user is the very newest, unchanged research: you can look at the papers themselves, you can see what's being done, and you can see what people are thinking about it, other than what
23andMe and other companies are telling you about your data.
Well, on the side of privacy, what you might get is the phenotypic information. If you entered some freakish disease, something which you can't see or can't tell anyone about, then you can connect the genotyping information to this phenotypic information, which wouldn't be possible without it being uploaded.
So I've got a very famous question: can fatherhood be looked up from the data?
Could you just repeat it? Sorry.
Okay. Can fatherhood be looked up from the data?
Fatherhood? You can. People are not doing that because with STRs, which we had in a talk yesterday, short tandem repeats, it's cheaper and quicker. But I think you can do that, yes.
Okay.
It's just not done.
Yeah, you usually do this for wider kinship, so your cousins or uncles or stuff like this, where it's not that easy using STRs. We are not doing this because we're not really interested in genealogy and kinship relations, but you can do so, and I think Family Tree DNA is the main provider of genotyping for such purposes. They do it and have a large database of other people which they can compare you to.
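As an illustration of why fatherhood can in principle be checked from SNP data, here is a minimal sketch of a Mendelian consistency check. It assumes the tab-separated raw-data layout used by 23andMe exports (comment lines starting with `#`, then the columns rsid, chromosome, position, genotype). The function names and the tiny example records with made-up rs IDs are hypothetical, and this is not something openSNP itself provides. A child inherits one allele from each parent, so a SNP where child and alleged father share no allele at all counts against paternity; a handful of such SNPs can also come from genotyping errors, so a real test would need a statistical threshold.

```python
import csv
import io


def parse_raw_genotypes(text: str) -> dict:
    """Parse 23andMe-style raw data into {rsid: genotype}.

    Skips blank lines and '#' comment lines; expects tab-separated
    rsid, chromosome, position, genotype columns.
    """
    lines = (ln for ln in io.StringIO(text)
             if ln.strip() and not ln.startswith("#"))
    genotypes = {}
    for row in csv.reader(lines, delimiter="\t"):
        if len(row) < 4:
            continue  # malformed line, ignore
        rsid, _chromosome, _position, genotype = row[:4]
        genotypes[rsid] = genotype.strip()
    return genotypes


def is_called(genotype: str) -> bool:
    """True for a two-allele call such as 'AG'; False for no-calls like '--'."""
    return len(genotype) == 2 and all(allele in "ACGT" for allele in genotype)


def mendelian_exclusions(child: dict, alleged_father: dict):
    """Return (number of SNPs compared, list of rsids sharing no allele)."""
    compared = 0
    exclusions = []
    for rsid, child_gt in child.items():
        father_gt = alleged_father.get(rsid)
        if father_gt is None or not (is_called(child_gt) and is_called(father_gt)):
            continue
        compared += 1
        if not set(child_gt) & set(father_gt):
            exclusions.append(rsid)
    return compared, exclusions


# Made-up example records; the rs IDs and positions are not real markers.
CHILD_RAW = "# rsid\tchromosome\tposition\tgenotype\n" \
            "rs1\t1\t100\tAG\nrs2\t1\t200\tCC\nrs3\t2\t300\t--\nrs4\t2\t400\tTT\n"
FATHER_RAW = "# rsid\tchromosome\tposition\tgenotype\n" \
             "rs1\t1\t100\tAA\nrs2\t1\t200\tCT\nrs3\t2\t300\tGG\nrs4\t2\t400\tCC\n"

if __name__ == "__main__":
    n_compared, excluded = mendelian_exclusions(
        parse_raw_genotypes(CHILD_RAW), parse_raw_genotypes(FATHER_RAW))
    print(f"{len(excluded)} of {n_compared} compared SNPs are inconsistent: {excluded}")
```

With hundreds of thousands of shared SNPs, a true father should show essentially no inconsistent positions, while an unrelated man would show many, which is why the answer in the talk is "you can", even though STR-based tests are the cheaper standard.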