So, thanks, Anu, for the introduction. In the next 40 minutes or so, I will take you through this rather interesting and evolving area that, I am sure, will play a greater role in our lives as we go forward. Let me start with this picture, which is a collection of cells, the cells that all of us are made up of. For this audience, think of cells as processors: each cell is a processor, it does some computation, and this is a truly distributed system in the sense that every cell is running its own process. What is remarkable about this distributed system is how it differs from the typical one we use from a programming perspective. There, there is a master controller, so to say, which splits up the problem, gives the pieces to the individual nodes, monitors the nodes as they go along to make sure they are doing what they are supposed to do, and then collects back the results. It is very centrally driven. This distributed system, on the other hand, is truly distributed in the sense that there is no central controller at all. Every cell executes its own program and, in a way, does whatever it feels like, and in spite of that, all the cells together somehow cooperatively manage to keep an organism, a human being for instance, functional, coherent and focused. So how does that happen? What is common amongst all these cells, all these processors, that unifies them and makes them cooperate on something together rather than run off in different directions? What is common is that they all execute essentially the same program; other than that, there is nothing that unifies them. They all execute exactly the same program, and somehow the program is such that it forces all of them into discipline and makes all of them work cooperatively. So what is that program? Deep inside the cells, you saw those black regions called the nucleus; let's look inside the nucleus.
If you had a powerful enough microscope, so to say, you could actually see all the way down to what is called the DNA, which we all know looks like this, and somewhere inside that is the program that every one of these cells executes. Over several years of development, we have come to know the alphabet this program is made up of: it is just four letters, A, G, C and T. A computer program would typically be written in the Roman or English alphabet, A to Z plus the digits; here there are just four characters. We know that much, but unlike computer programs, where we are able to look at big chunks and big functions and say what they do, our ability to look at this program at such a high level is still very primitive. This program is exceedingly long: 3 billion characters. Imagine somebody gave you a program, let alone 3 billion characters, just a couple of hundred thousand lines long. I am sure you would struggle to figure out what it is doing, and that is in spite of knowing all the higher-level constructs and how things work. Here we know much less, and that is the challenge we are confronted with today. Now, if there is a program that every one of these cells executes, let us think of it as running: what are the variables in this program? The variables happen to be the molecules that sit inside these cells. There are a great many of them; a good guess would be hundreds of thousands of different molecules hanging around in each cell, and the level, or amount, of each molecule is the variable's value. So the variables are the molecules, and the amounts, say how many copies of glucose or how many copies of ATP there are, are the variable values that this program works with.
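For programmers, one way to make the four-letter alphabet and the 3-billion length concrete: with only four symbols, each character fits in 2 bits, so the whole program packs into roughly 750 MB. This is an illustrative sketch only; the letter-to-bits mapping and the function names `pack` and `unpack` are hypothetical choices, not a standard genome file format.

```python
# Map each of the four DNA letters to a 2-bit code (an arbitrary illustrative choice).
CODE = {"A": 0b00, "C": 0b01, "G": 0b10, "T": 0b11}
LETTER = "ACGT"  # inverse mapping: LETTER[code] gives the letter back


def pack(seq: str) -> bytes:
    """Pack a DNA string into bytes, four letters per byte."""
    out = bytearray((len(seq) + 3) // 4)
    for i, ch in enumerate(seq):
        # Letter i goes into byte i // 4, at bit offset 2 * (i % 4).
        out[i // 4] |= CODE[ch] << (2 * (i % 4))
    return bytes(out)


def unpack(data: bytes, length: int) -> str:
    """Recover the first `length` letters from packed bytes."""
    return "".join(
        LETTER[(data[i // 4] >> (2 * (i % 4))) & 0b11] for i in range(length)
    )


seq = "GATTACA"
packed = pack(seq)
print(len(packed), unpack(packed, len(seq)))  # 2 GATTACA

# The same arithmetic at genome scale: 3 billion letters at 2 bits each.
genome_letters = 3_000_000_000
print(genome_letters * 2 // 8 / 1e6, "MB")  # 750.0 MB
```

The length has to be stored separately because the last byte may be only partly filled.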
So the state of the program is the value of each of these variables, and as the program runs, it changes those values. Why is it changing them? Because it has a goal in mind, a very high-level goal called survival: I need to make sure the cell survives, I need to make sure the organism survives, I need to make sure the species as a whole survives. The computation happens towards that end, and at a low level these variables are constantly being modulated by the program. There are two types of variables, so to say, two types of molecules. Some molecules come in from the outside: you drink water, and that is a molecule that goes into your cells; you breathe oxygen, and that is another molecule that goes in. And there are molecules that are created from within. So what are these molecules that are created from within? This big program I talked about, 3 billion characters long, has certain parts dedicated to describing these molecules, and these parts are called genes. That box, for instance, is a little part of the entire 3 billion, and you can think of what is in the box as a recipe for creating one of the hundred thousand to several hundred thousand molecules that inhabit the cell. The recipe lays out in great detail what exactly that molecule should look like, how it should be assembled, and so on. What the description says is less important to us; the fact that it describes that molecule is all we care about. So this program contains, embedded inside it, descriptions of a hundred thousand molecules that it can start assembling and manufacturing.
Notice that this description is not one contiguous chunk of the program; it is spread over many small pieces. That adds a little complexity, and I just wanted to point it out, but it is not important for this talk. So we have about a hundred thousand molecules whose descriptions are embedded in this program, and then there are many other molecules coming in from the outside. Now, what is the role of these molecules? As I said, the program is geared towards survival, so when molecules come in from the outside, it has to take action: if those are harmful molecules, for instance, their levels should be reduced; if they are useful molecules, maybe their levels are kept up or amplified, and so on. The variables for the molecules that come in from outside are not set by this program; they are set by the environment. What this program can do in response is take one of the molecules whose recipe lies inside the program and increase its level, that is, increase the value of that variable. The molecule synthesized from this recipe will then act on the molecule that came in: maybe it will destroy it, or maybe it will change its nature in some form. That is the computation the cell is running, and that is the computation that ensures the cell survives. Let me give you a very small example of that: what happens when, for instance, you inhale cigarette smoke. The smoke brings in a number of molecules. The core molecule of interest is nicotine, the stimulant, which is not the subject of what I am going to talk about; smoke also brings in all kinds of other things, hydrocarbons of various kinds.
These are the various sorts of molecules in all the paraphernalia that surrounds the nicotine. So this is the molecule coming in from the outside, and its variable value is being increased by the environment, by our habits, so to say. To deal with this, the cell's program constantly keeps the level of a particular molecule, called AHR, at a certain level. The point of keeping it at that level is that when these molecules come in, when their variable values increase, this molecule is the sentinel, the guard: it binds to them, captures them, and drags them deep into the dungeons of the nucleus. Once these invading molecules are captured and brought into the nucleus, the program senses that and says: something bad is happening here, and I now need to manufacture one of the molecules whose recipes sit inside the program. That molecule is called CYP1A1, so the program starts manufacturing more of it, increasing that molecule's variable level. The program is simply watching the variable levels, and that is what programs do: look at a particular state of the variables, replace that set of values with a new set, and continue that evolution. The point of creating more of this molecule is that it knows how to chop the invading molecule into small pieces and push it out of the system. It does this through what is called drug metabolism, which is a fancy name for saying that it chops up these invading molecules and converts them into a form that the body's cleaning machinery can push out, for instance through the urine.
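The "program reads variables, writes new values" view of the smoke example can be sketched as a toy discrete-time update loop. This is a hypothetical illustration, not a biological model: `CellState`, `step`, and every rate constant are made up, and AHR is treated as an always-present sensor rather than modelled as its own variable.

```python
from dataclasses import dataclass


@dataclass
class CellState:
    hydrocarbon: float  # invading molecule: its level is raised by the environment
    cyp1a1: float       # clean-up enzyme: its level is set by the cell's program


def step(s: CellState, inhaled: float, induction: float = 0.5,
         clearance: float = 0.2, decay: float = 0.1) -> CellState:
    """One tick of a toy 'program': read the current variables, write new values.

    The rate constants are illustrative assumptions, not measured biology.
    """
    hydrocarbon = s.hydrocarbon + inhaled  # the environment sets this variable
    # AHR (implicit here) senses captured hydrocarbon; more of it means
    # more CYP1A1 is made from its recipe, while old enzyme slowly decays.
    cyp1a1 = s.cyp1a1 + induction * hydrocarbon - decay * s.cyp1a1
    # CYP1A1 chops up a fraction of the hydrocarbon so it can be pushed out.
    cleared = min(hydrocarbon, clearance * cyp1a1 * hydrocarbon)
    return CellState(hydrocarbon - cleared, cyp1a1)


state = CellState(hydrocarbon=0.0, cyp1a1=0.0)
history = []
for inhaled in [1.0] + [0.0] * 19:  # one puff of smoke, then clean air
    state = step(state, inhaled)
    history.append(state)

for t, s in enumerate(history[:5]):
    print(t, round(s.hydrocarbon, 3), round(s.cyp1a1, 3))
```

In this toy, the hydrocarbon level falls step by step after the puff while the enzyme level first rises and then slowly decays once there is little left to clear.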
So that gives you an example of thinking of this as a program: what is the program doing, what are the variables, which variables come in from outside, and how does the program react by setting its own variables to combat the ones set from outside, so that the variables can be brought back to a state where the cell is happy to live and survive. Now, there are a few catches in this design, and here is the one I want to point out next. I told you at the beginning that every cell executes the same program and that this is what unifies them all. Nothing else stops a cell from running away and saying, I want to do what I want to do and not what everybody else is doing, and that would be a disaster for any sort of cooperative environment. Since every cell executes the same program, discipline is enforced. The program itself is largely read-only: there are recipes for molecules in there, and the program can sense the variable values being forced by the environment and make appropriate changes, but the program itself does not change. Now, that is partly true and partly not true. In a lot of these drug metabolic processes, wherever invading molecules come in from outside and get chopped up, there is a flaw in the design that causes a side effect: things called reactive oxygen species are produced by the reactions that chop up these invading molecules. These have the unfortunate effect of violating the read-only nature of the program: they actually write into it, causing what is called DNA modification. They change the program.
This is an exception rather than the rule; as a rule, the program is read-only. But think of read-only memories: they are read-only, but if you go and hit them with a hammer, obviously things are going to change. Likewise, this is an extra-constitutional way to modify the program, and the program does get modified in bits and pieces. The program, again, has the ability to sense that it is being written into, and that is the amazing thing about evolution: it has given us all of these protective mechanisms. So the program is changing, it can sense that it is changing, and it can then correct for those changes, but all of this has limits. Over a period of years, over one's lifetime, these small writes into the read-only program accumulate, the program starts looking different, and then the cell with the different program decides: I am no longer doing what everybody else is doing, I am doing my own thing. That leads to a bunch of diseases, cancer being the most common one that results from that behavior. But you can see that the system we are dealing with is not very different from what we are used to. We are taught biology completely differently; we are taught that it is something else, to do with frogs and mice and rats. At the end of the day, it is a program, a program that is remarkably interesting as a truly distributed program that still keeps things together, and yet things drift apart once in a while. The other interesting thing about this program relates to this horrendous program I have put up here, and some of you might be able to spot what it is. A long time ago we used to have this challenge question: can you write a program that outputs itself, that replicates itself?
So you write a program, and when you run it, it should print itself out. You can see that this is not easy: if you say print something, that something gets printed, but the print statement itself does not. The output has to be the same as the program itself. So you declare a string variable and print that string variable; the variable gets printed, but the print statement itself is not printed. How do you make the print statement print itself? You have to go through a few tricks to do that, and you'll realize that you have to use loops of some form. Say there are two statements in the program: the second statement will first print the first line and then has to print the second line, but you can't use two print statements, because then you'll have three lines to print. So you need the second statement to be a loop, and so on. So you can see that some complexity of thinking has to go into creating a program that prints itself. And very interestingly, the program that all of our cells run is a program capable of printing itself, capable of reproducing and creating a complete new copy of itself. When it creates that copy, there is again a small violation of the read-only rule: the program creates a complete copy but makes a few changes to it. Those few changes are sometimes bad and sometimes good. They are good in the sense that they lead to the large variety of life forms that inhabit the planet, or even the large variety of races that humankind has. We all look somewhat different: some are dark-skinned, some are light-skinned, some are tall, some are short, and those variations arise from the fact that when this program reproduces itself, it makes a few changes. Those changes are not too many: roughly one in a thousand characters gets changed.
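The self-printing challenge has a well-known short solution in Python. This is one classic construction, not necessarily the one the speaker had in mind, and it relies on self-reference through a string template rather than a loop. The string `s` is a template of the whole program: `%r` splices in the quoted representation of `s` itself, and `%%` becomes a literal `%`. Comments are deliberately left out of the block, because a quine would have to print them too.

```python
s = 's = %r\nprint(s %% s)'
print(s % s)
```

Running these two lines prints exactly these two lines.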
So here there is a character that has changed, and here is another character that has changed; this is roughly to scale, meaning there are about a thousand characters between these two. A change can be just one character turned into another, or a character removed with nothing put back in its place, or a longer chunk removed, maybe 10 characters, and things like that. On average, it is about one in a thousand. So if you take somebody from Africa and somebody from Asia, China, Indonesia, whichever part of the world, their differences are only about one in a thousand. In that sense, humans are remarkably similar; there is barely any difference, you could say. Now, each of our cells runs this program, and the program that you run is slightly different from the program that I run, for the reasons I mentioned. That is what makes you different from me, and because you are different from me, you may look different, you may behave differently, you may have a different set of characteristics. The quest of science is to understand what it is about this program that makes you tall but keeps me short, or that causes a particular person to get a particular disease. Even though this one-in-a-thousand variation looks small, it is big enough for all of us to look very different and behave very differently, for some of us to unfortunately get diseases that others don't, and for some of us to have exceptionally long life spans while others have exceptionally short ones. So these small differences are what one wants to study, to see whether at some point in the future one can correlate them with the various behavioral and so-called phenotypic differences that exist. I'm not sure what the text at the bottom of the slide is, so ignore it.
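The copying-with-changes idea can be mimicked in a few lines. This is a toy sketch, not a model of real mutation: the hypothetical function `copy_with_changes` copies a random four-letter string and, at an assumed rate of one change per thousand characters, applies one of the three kinds of change mentioned above (a substitution, a single-character deletion, or a 10-character deletion), each equally likely, which is an arbitrary assumption.

```python
from __future__ import annotations

import random


def copy_with_changes(program: str, rate: float = 1 / 1000,
                      seed: int = 0) -> tuple[str, int]:
    """Copy a DNA string, making a change at roughly `rate` per character.

    Toy model: the three change types are equally likely and a chunk deletion
    removes 10 characters; both are illustrative assumptions.
    Returns the copy and the number of changes made.
    """
    rng = random.Random(seed)
    out: list[str] = []
    changes = 0
    i = 0
    while i < len(program):
        ch = program[i]
        if rng.random() < rate:
            changes += 1
            kind = rng.choice(["substitute", "delete_one", "delete_chunk"])
            if kind == "substitute":
                # Replace this character with one of the other three letters.
                out.append(rng.choice([b for b in "ACGT" if b != ch]))
                i += 1
            elif kind == "delete_one":
                i += 1   # drop this character and put nothing back
            else:
                i += 10  # drop a 10-character chunk
        else:
            out.append(ch)  # the usual case: copy faithfully
            i += 1
    return "".join(out), changes


original = "".join(random.Random(42).choices("ACGT", k=100_000))
copy, n_changes = copy_with_changes(original)
print(n_changes, "changes;", len(original) - len(copy), "characters lost to deletions")

# At genome scale, one change per thousand characters:
print(3_000_000_000 // 1000, "expected differences")  # 3000000 expected differences
```

With 100,000 characters the sketch makes on the order of 100 changes; scaled to 3 billion characters, the same one-in-a-thousand rate means about 3 million differences.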
So what we need today is some sort of instrument that allows us to see the program inside each of us. That is the first stage: we need to be able to see the program inside us before we can get to the next, more interesting stage of understanding the differences all of us have, correlating those differences with behavioral differences, and eventually getting actionable information, something like: since you have this difference, you are likely to have this sort of disease coming up in the future, and this is what you can do about it proactively. At this point we are talking about the very first, primitive step in this whole journey: can we see that program at an individual level, not at a coarse level, so that we can see the one-in-a-thousand changes that each of us has? The reason I am giving this talk today is that over the last 20 to 30 years, substantial progress has been made towards building that pair of spectacles. It is a complex pair of spectacles, but not too complex, and here is what those glasses look like. You take this little test tube, literally spit into it, and send it to a lab. In that lab, the saliva is run through a few steps and loaded into this instrument, and all in all, at today's rates, it costs about $5,000. Out comes some data from which you can literally see the individual changes you have compared with everybody else. $5,000 is tantalizingly close to being low enough, the rate at which this cost is decreasing is substantial, and the expectation is that it will definitely be $1,000 in a couple of years from now. That is tantalizingly close to the point where you could say the expense is no longer an
issue. Now maybe we can really identify differences in lots and lots of people, run this at population scale, and get to the more interesting level of understanding which of these differences actually translate into disease and which don't. So now I come to the computational part, the algorithms part: what comes out of this machine, what does that have to do with big data, and what sorts of algorithms are needed to complete the pair of spectacles? That is, take what comes out of this machine and at the end output: you gave your saliva, and here are the places, roughly one in a thousand, where you are different, and here is how you are different. The machine on the previous slide works as follows. Your saliva has cells inside it, each of the cells is executing the same program we talked about, and the machine somehow extracts that program from the saliva. Of course, it doesn't take just one copy of the program; it takes many, many copies, because it is an inherently error-prone process, so you want fault tolerance in some form by taking many copies and reading all of them simultaneously. So it takes many copies of the same program. You don't know any of the characters in this program yet; all you know is that the program is sitting somewhere inside the test tube. The machine does a number of crazy things to it, and it is actually mind-boggling to look at it at the end and see that all of this comes together and accurately gives you exactly the places where you differ from everybody else. What it does first is chop these many copies of the program into small pieces. It just chops them up randomly, as if you closed your eyes and cut. You cut them because the molecule is so huge that it is very hard to manipulate as a whole; you have to chop it up into small pieces so that they are manageable. And then what you
do is read them, meaning that you take each of these pieces and read a little bit from this end and potentially a little bit from the other end, about 100 characters from each end. That is all the machine is capable of doing today. Ideally it would read the whole piece, but today it can only read about 100 characters from this end and 100 from that end before errors start piling up beyond the realm of recovery. So what you get as a result is a lot of these small snippets: you had the whole program to begin with, many copies were taken and chopped up into pieces, and each piece was read about 100 characters at a time. So you have lots and lots of these small snippets, and that is what the machine spits out. If you spend that $5,000 and give your saliva, you get back a file containing about a billion of these snippets, each of length 100, and that's where the connection to big data comes in. It is not conventional big data, but it is a large amount of data on which you now have to run the right types of algorithms to take these pieces and conclude: here are the places where my program is different from everybody else's. So now we are stepping out of the realm of biology, or of carbon-based programs as opposed to silicon-based programs, if you want to call it that, since as I said everything is a program, and maybe that will be the future nomenclature that narrows the distance between these two fields. We are stepping into the world of algorithms and computing: how do you take all of those small pieces and bring them together into what we want? One way is to start what is called assembling them. You take one piece and another piece, and if the first piece has a long suffix that is identical, or close to identical, to a prefix of the second piece, it's a good guess to
say that these two pieces actually came from the same place in the program and that you can piece them together. You could then run this procedure repeatedly to keep piecing things together, and if you're lucky, you'll get the whole program back at the end. So our problem is that the program was a huge, 3-billion-long stretch, what the machine gave us was all these small chopped-up pieces, and we're trying to put it back together; this is one way to do it. The challenge is that very soon you run into a combinatorial explosion of possibilities. This piece not only fits nicely, jigsaw-like, into this piece; it also fits into 10 other pieces, and each of those might fit into 10 other pieces, but only one of them is the right piece that came from the right place in the program, and you don't know which one. So you have to explore this search space, and that explodes very fast. Very soon you're in a situation where you're running out of time and memory, and at some level it's completely infeasible to actually recover the entire sequence this way. So how do we proceed? What happened 10 years ago was a big step in allowing us to proceed. Ten years ago there was a major effort, the culmination of several years of work, called the sequencing of the human genome, the Human Genome Project. The goal of that project was not the goal I've laid out today, which is to assess on an individual basis what differences you have relative to me. The goal of that project was to ask, on a species basis, for the Homo sapiens species: what does our program look like? So not your program or my program or anybody's program, but a rough draft of what all of our programs look like on average. They took maybe five people and said, we'll pull all of their programs
together and then we'll read it. They had to go through the piecing process I talked about on the previous slide, and it took several years and literally hundreds of millions, even billions, of dollars. It took lots of time and lots of combinations of experimental tricks to limit the combinatorial explosion in the computation, combined with effective computation, and they eventually managed, over several years, to piece together one copy of the human program, one representative copy, which, as I said, pools some five people's copies. But this is not a procedure that can be run on a regular basis if you were to walk into a lab and say, I want to know my program, because it takes too much time and too much effort; it goes on for years, it needs a big team, and you can see the number of people who worked on this. What it did give us is one representative sequence, one representative program, and as I said, most people differ from it in only about one in a thousand characters. So if I were to measure my genome, I already have what is called a reference genome, which I know is not very different from mine: I know I'm different in about one in a thousand places. Now the task is that I have all these little snippets, but I also have this reference program to go by, which I can use to reduce the computation space. So say this is the reference program; it's called the reference sequence. It's a one-time effort that has been done, and now we have it to use on an ongoing basis, along with all of these little snippets that have come in because you've walked in and want to read your program, your genome. So how do we use this reference sequence to piece together all these little snippets? We convert it into a search problem. You're no longer trying to take these pieces and piece them together; you know,
in the context of the reference genome, all you're trying to do is search for these pieces in the reference genome. These pieces are all strings of length 100, the reference is one long string of length 3 billion, and you're searching for each piece in it. But remember, since you are going to differ from this reference genome in one in a thousand places, the search has to tolerate errors. The differences between you and the reference can also be that some parts are removed or some parts are added, so the search has to tolerate both mismatches and gaps. If you were to search the Oxford English Dictionary, for instance, you would give a word and typically expect an exact search: it should locate exactly that word and give you its meaning. Sometimes, when you're doing a Google search, for instance, you type in something approximate and want it to correct for typos or approximate spellings. This is that sort of search. So basically we've now reduced the computational problem to one that all of us are probably familiar with and use in our daily lives, a standard search problem. Here is one of those little snippets, here's the reference genome, and here is an example of how the snippet may differ slightly from the reference: there might be a G here instead of a C, and in these two places the reference had two T's that have been completely removed from the genome that got measured. Something like this is what you want to search for: you want to say that this string matches here if it matches within a small number of mismatches and what are called gaps. So this is a classic problem of indexing the reference genome for fast approximate searches, much like Google indexes text: it takes all the web pages and converts them into a
certain data structure so that when you come up with a search key, the time taken for the search is typically a function of the length of the query and largely independent of the length of the corpus being searched. If you didn't do the indexing, you would have to go through the corpus, and that would take time at least proportional to the size of the corpus. The goal of indexing is to index the corpus once, so that the time taken for a search is a function of the query as opposed to a function of the corpus. That's exactly what you want to do here, except that the notion of approximation may be a little different from what you typically use in other situations. If you're building a dictionary, you would do the same: take the Oxford English Dictionary and index it so that when you get a word, you can quickly find where that word is and what it means, without literally searching through the whole dictionary. I'll give you a five-minute introduction; I have ten minutes, I think? Five minutes left? Oh, that's it. Okay, then I'm going to skip pretty much to the algorithms, except to say that indexing for approximate matches is usually a little difficult to do. So you end up indexing for exact matches: you take these snippets, look for substrings that match the reference exactly, and once you have that anchor, you try to extend it using what's called dynamic programming, a standard method for finding strings that are close to the string you have. There are a number of very interesting data structures, suffix trees, the Burrows-Wheeler transform and so on, that I wanted to talk about but have no time for. All of that ensures that you can actually do this computation today, in a matter of hours depending on the amount of memory you have: say 40 hours or 15 hours,
depending on whether you have 4 gigabytes or 20 gigabytes of memory. You can then speed that up using what are called graphics processors, which means a single card with literally hundreds of cores on it. If you use those, you can get it down to a few hours, and we are now working to get it down further; the goal is about 3 hours or so, and we feel that's doable. So that's the computational problem. Now let me spend the next 3 or 4 minutes telling you what impact this can have. You walk in, give us a saliva sample, pay $5,000, get all these little snippets, go through this process, and at the end you know, for each snippet, where it arises in the genome. When you take all of these snippets and put them at the places where they aligned in the genome, you naturally start seeing where you differ from the reference: these are the places where the snippets have a different value than the reference, so those are the places where you differ. You've identified them; now what can you do with that? It gives you a lot of information, though the actionability of much of that information is still a big area of research. I want to give you one example in the last couple of minutes. All of us are carriers for various sorts of diseases. Luckily, those diseases don't manifest, because each of us has two copies of the program, one from each parent. We may have one copy that has a bad gene, so to say, and one copy that has a good one, and one good copy is good enough to protect us; but when there are two bad copies, the disease manifests. All of us are carriers in this sense, meaning we have one bad copy, and if we happen to meet
with another person who also has a bad copy, there's a good chance that the offspring will have two bad copies and the disease will manifest. So one application for this is simply to see what we are all carriers for, and therefore what our children are at risk for. Now, this does not happen often; the frequency is relatively rare, something like one in a thousand or one in five thousand, but over a billion people, one in five thousand can be quite a lot. Here is one classic example. It's a 25-base-pair deletion, meaning 25 characters get deleted, and it causes what's called hypertrophic cardiomyopathy. It's very interesting in the sense that this is an Indian mutation: it's known to occur only in India, it's known to have arisen about 30,000 years ago in India, only people in India have it, nobody else in the world has it, and interestingly, 5% of us are carriers for it. As carriers we'll be largely okay, except that as we grow older there will be heart disease manifestations, so even as carriers this is useful information, because it tells us we need to get checked periodically. The disease itself, hypertrophic cardiomyopathy, typically manifests as sudden death; you read about all these footballers dying on the football field, and this is what they die of. So 5% of us are carriers, which means we'll have milder symptoms that show up as we age, but if two carriers have a child, the child could have an acute case, which can manifest as sudden death on the football field. So this is one reason why we need to know what is inside our program. There are many other reasons, but I don't have time, so with that I will end. Thank you very much, and I'll open it up for a few questions.
Thanks for your talk, Mr.
Hariharan. So I have a question for you: clearly this task is probably more than a big data task; it's a complex data task. How would you look at it? Is it typically the size that we've been discussing since yesterday, or is it the complexity in the size?
The big data problems are interesting challenges, but if you look at the scheme of things, they are not challenges that are going to remain challenges for decades, or even for years, I would say. They are challenges for months or a couple of years, and those will get solved. So the big data problem and the algorithmic problems that I talked about will get solved, and once the price of sequencing comes down and sequencing becomes ubiquitous, so that many, many people get it, there'll be the next big data problem. For instance, when most of you talk about big data, you talk about mining retail data or mining financial data; this will be yet another pool of data that comes your way. You'll have a billion people on earth, with all of their variations from everybody else sitting somewhere, along with maybe information on whether they are tall or short, whether they have this disease or that. That will be yet another pool of data that comes your way, and I'm sure there'll be talks on how to mine that data, just as you mine retail data and financial data. So that's one view from a big data perspective. The key problem from a real science perspective is: can we understand how those variations really impact the biology of disease, and then can we figure out ways to avoid or cure disease? That's the hard problem, and I think nobody knows today how long that's going to take. It's the beginning of the journey today.
Ramesh, I have two questions. The first one: when you're doing all this personal genomics and you know that you are a carrier of some
particular gene, how does psychology play a role in that? The second question: how long do you think it will take for doctors to start prescribing medicines based on your genotype?
Psychology is an important issue, but like several other issues, it's a question of getting used to it. It's more an issue of knowledge and education: as awareness builds up, we realize that things are actionable. In particular, being a carrier is really the best piece of information you can get. It doesn't impact you as much; it will potentially impact the next generation, and you have enough time to take action to avoid that impact. So being a carrier is really the simplest of the problems; there are many other things that could be more troubling. But it does play a role, it's a challenge, and over time we'll have to see how that sorts out. Psychological studies are being done today on whether it really has an impact. The only study that I know of concluded that while this knowledge doesn't seem to trouble anybody exceptionally beyond a period of a week or so, it also doesn't force them into taking action. For instance, if you discover that you should be exercising a lot more because you have potential for heart disease, it doesn't really make you start exercising a lot more either. That's the conclusion of the one study that I know of. As for how soon doctors will start doing personalized medicine, I had two examples on my last slide of things that are happening today. Of course, they're not happening in every health center, but in certain thought-leader hospitals and so on, it's starting to happen. I had two very nice examples, which I didn't go over, but the answer is that it has started in small numbers and it will just increase over time.
Do you think sometime in the future people will
start looking at genotypes before selecting life partners?
You tell me. Our job is to put this information out there; how you use it and what you do with it is up to you. Do you look at what the person bought at Walmart before you decide whether you're going to marry them? Do you look at their stock profiles before you decide you're going to marry them? Do you look at whatever else? There'll be a talk later today, I assume, and they'll tell you what they store. You decide whether you look at their genotype as well before you marry.
Ramesh, one last question. One quick question here, this side: in terms of the indexing, I understood the big data going towards the index. Is it finally something akin to a Lucene index? If so, how are you tolerating the jitters?
No, no, not Lucene indices. These are all very specialized indices that one has to build for very specialized problems, so we don't use general-purpose indices.
Do you need to shard the indices, or do they fit in one unit?
They don't necessarily fit in one; it depends on how much RAM you have. The good thing about these indices is that the algorithms are designed with very nice time-space trade-offs: you tell me how much RAM you have, and the same program and the same index will work, but I'll sample the index so that it fits into that much memory, at the expense of increased running time.
Is there enough locality that you can page things in and out of the disk, or do you need it all in memory at one time?
The index has to be in memory, but as I said, you can sample these indices to fit however much memory you have. That's the nice property of these indices: you can reconstruct from the sampled information, and that's what you use primarily. So whatever memory you have, we'll fit into that much memory.
Thank you.
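The indexing answer describes a time-space trade-off: store only a sample of the index and reconstruct the rest at query time. The speaker does not name the index, so the following is an assumption: a well-known structure with exactly this property is the FM-index, built on the Burrows-Wheeler transform (BWT) plus a sampled suffix array. This minimal Python sketch uses hypothetical names (`build_fm_index`, `lf`, `locate_row`, `find`) and naive construction; it illustrates the idea and is not the speaker's implementation.

```python
def build_fm_index(text, sample_rate):
    """Toy FM-index: BWT + occurrence counts + a suffix array sampled every `sample_rate` positions."""
    text += "$"  # unique terminator that sorts before A, C, G, T
    # Naive suffix-array construction: fine for a demo, far too slow for a genome.
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    bwt = "".join(text[i - 1] for i in sa)  # for i == 0, text[-1] is the "$"
    # C[c] = number of characters in the text that sort strictly before c.
    C, total = {}, 0
    for c in sorted(set(text)):
        C[c] = total
        total += text.count(c)
    # occ[c][i] = occurrences of c in bwt[:i]. Real FM-indexes sample this table too.
    occ = {c: [0] for c in C}
    for ch in bwt:
        for c in C:
            occ[c].append(occ[c][-1] + (ch == c))
    # The memory knob: keep suffix-array entries only for text positions divisible by sample_rate.
    sampled = {row: pos for row, pos in enumerate(sa) if pos % sample_rate == 0}
    return bwt, C, occ, sampled


def lf(index, row):
    """LF-mapping: the row of the suffix that starts one position earlier in the text."""
    bwt, C, occ, _ = index
    c = bwt[row]
    return C[c] + occ[c][row]


def locate_row(index, row):
    """Recover a text position by walking LF until a sampled row is reached."""
    sampled = index[3]
    steps = 0
    while row not in sampled:
        row = lf(index, row)
        steps += 1
    return sampled[row] + steps


def find(index, pattern):
    """Backward search for `pattern`; returns its sorted start positions in the text."""
    _, C, occ, _ = index
    lo, hi = 0, len(index[0])
    for c in reversed(pattern):
        if c not in C:
            return []
        lo, hi = C[c] + occ[c][lo], C[c] + occ[c][hi]
        if lo >= hi:
            return []
    return sorted(locate_row(index, r) for r in range(lo, hi))


idx = build_fm_index("ACGTACGTAC", sample_rate=3)
print(find(idx, "ACG"))  # [0, 4]
```

With `sample_rate` set to k, the sampled table holds roughly n/k positions, and each lookup takes at most k - 1 extra LF steps, which is the "same index, less RAM, more time" trade-off described in the answer. Position 0 is always sampled, so the backward walk is guaranteed to stop.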
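Earlier in the talk, once every snippet has been placed where it aligns, the differences from the reference fall out by comparing the stacked snippets position by position. A minimal sketch, assuming ungapped alignments given as hypothetical `(start, sequence)` pairs and illustrative thresholds `min_depth` and `min_fraction`; real variant callers also model base quality, gapped alignments, and sequencing errors. Because each of us has two copies, a site where only about half the snippets differ is still reported, which is how a carrier's single bad copy would show up.

```python
from collections import Counter


def call_variants(reference, aligned_reads, min_depth=2, min_fraction=0.3):
    """Report positions where aligned snippets disagree with the reference.

    reference     -- reference sequence as a string over A, C, G, T
    aligned_reads -- (start, sequence) pairs; start is the 0-based aligned position
                     (ungapped alignments assumed for simplicity)
    min_depth     -- skip positions covered by fewer snippets than this
    min_fraction  -- report a non-reference base if at least this share of snippets shows it
    """
    # Pile up the bases that each snippet contributes at every reference position.
    pileup = [Counter() for _ in reference]
    for start, seq in aligned_reads:
        for offset, base in enumerate(seq):
            pos = start + offset
            if 0 <= pos < len(reference):
                pileup[pos][base] += 1

    variants = []
    for pos, counts in enumerate(pileup):
        depth = sum(counts.values())
        if depth < min_depth:
            continue
        for base, n in sorted(counts.items()):
            # A share near 0.5 suggests one copy differs (e.g. a carrier); near 1.0, both copies.
            if base != reference[pos] and n / depth >= min_fraction:
                variants.append((pos, reference[pos], base, n, depth))
    return variants


reference = "ACGTACGTAC"
reads = [(0, "ACGTACG"), (2, "GTTCGTA"), (3, "TACGTAC"), (4, "TCGTAC")]
print(call_variants(reference, reads))  # [(4, 'A', 'T', 2, 4)]
```

Here two of the four snippets covering position 4 read `T` where the reference has `A`, the pattern expected when one of the two copies carries the change.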
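The carrier argument at the end of the talk is simple Mendelian arithmetic. A minimal sketch, assuming the single-gene model the speaker describes (one good copy protects; two bad copies cause the disease) and, as a labeled assumption, random pairing of partners with respect to carrier status, which real populations need not satisfy. `child_genotypes` is a hypothetical helper; the 5% and one-in-five-thousand figures are the ones quoted in the talk.

```python
from fractions import Fraction
from itertools import product


def child_genotypes(parent1, parent2):
    """Probability of each child genotype under simple Mendelian inheritance.

    Each parent is a pair of alleles (one per copy of the program); the child
    receives one allele from each parent, each chosen with probability 1/2.
    """
    probs = {}
    for a, b in product(parent1, parent2):
        genotype = tuple(sorted((a, b)))
        probs[genotype] = probs.get(genotype, 0) + Fraction(1, 4)
    return probs


carrier = ("good", "bad")
risk_two_carriers = child_genotypes(carrier, carrier)[("bad", "bad")]

# Population-scale arithmetic with the figures quoted in the talk.
affected_per_billion = 10**9 // 5000   # "one in five thousand" over a billion people
carrier_frequency = Fraction(5, 100)   # "5% of us are carriers"
# Hypothetical assumption: partners pair up at random with respect to carrier status.
both_parents_carriers = carrier_frequency ** 2
risk_per_child = both_parents_carriers * risk_two_carriers

print(risk_two_carriers)      # 1/4
print(affected_per_billion)   # 200000
print(both_parents_carriers)  # 1/400
print(risk_per_child)         # 1/1600
```

Under these assumptions, each child of two carriers has a one-in-four chance of inheriting two bad copies, which is what "a good chance" amounts to, and half of their children are expected to be carriers themselves.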