So thank you for the invite to come and talk. I'm going to show you some early work that applies whole genome sequencing, which you've already started to hear about from Dr. Ryan's talk and from what Kean said. But I want to try to develop a hypothesis here where we can actually begin to use this type of data analysis to predict risk, or for risk assessment.
So just to give you a very brief history: DNA has been around for a very long time. It's a very stable molecule. The structure of it was solved in about the late 1950s, early 1960s with the famous Watson and Crick experiments. And then we began to find out more about its analytical potential. For example, if you look at my timeline here and you move on to 1977, you had the first report of a DNA sequencing protocol by Fred Sanger at the University of Cambridge in the UK. And Sanger is a very interesting character. He's one of the very few people who have won two Nobel Prizes, for sequencing different biological molecules: proteins and nucleic acids.
Moving on, then, you had a technological revolution which started around the late 1980s, where the platforms that were being designed and developed to provide DNA sequence information began to improve. And so we were able to produce huge amounts of sequence data over that period of time, leading to this so-called big data, or quantum leap forward, that we have in genomic data. So really now we have oceans of this type of information. The critical piece that we lack at the present time is the ability to analyze it fast enough.
There are different types of DNA sequencing platforms, going from the first revolution, which was Sanger's original enzymatic concept, through to the second revolution, where you see the Illumina sequencing model, which is what Dr. O'Brien alluded to. And this is very much the workhorse of many labs.
And then the third revolution, which is a very interesting one, again also alluded to by Dr. O'Brien, is the use of single-molecule sequencing. So we can actually take one thread of DNA and sequence it to a high degree of accuracy.
In terms of the data outputs that I referred to briefly, these are the kinds of numbers you're getting. From an Illumina platform, you're getting somewhere between 50 and 300 base pairs per read. From a Pacific Biosciences instrument, of which there is none in this country, you get something like 5,000 to 15,000-plus base pair reads. And then there is the Nanopore, which is a very convenient model, developed by Oxford Nanopore Technologies. Here you can get something like 5,000 to 50,000 base pair reads.
So just to give you a comparison of what that actually means, on the left-hand side of my slide here, this is what Illumina sequencing looks like. Illumina sequencing essentially does very short reads very often. So it gives you this 50 to 300 base pair type of analysis, which you then have to stitch together. So with an Illumina platform, you don't actually get a complete genome sequence. You get an almost complete sequence, at about 99% finished. On the other hand, the PacBio instrument gives you very long reads. This is very convenient and suitable for a scenario where you want to close an actual genome, and there may be reasons that you want to do that. In particular, you see the smaller circle here, which represents a plasmid. And this may be particularly interesting to close because of its impact in terms of public health medicine.
As both my previous colleagues have alluded to, the amount of data which is now emerging from whole genome sequencing is rising at an exponential rate. And we are now at the point where we're getting so much data that the speed at which we can analyze it effectively is very much slower.
So to give you a comparator in terms of what we would predict by the year 2025 in terms of data outputs, this just gives you some idea. Twitter would put out about 1.5 petabytes, whereas whole genome sequencing, particularly based on human genomes, would be putting out 40 exabytes of data. And you would have a value probably somewhat smaller than that with respect to bacterial genomes. So there's an enormous amount of information that needs to be analyzed.
So if we bring this back to food safety then, how can we harness this data? Well, there are probably three reasons why we would want to apply whole genome sequencing. One is an obvious one: to prevent problems in the industrial environment. The second one, which is probably more obvious, is to track infections around the world when outbreaks occur. And the third one, which is the one I want to try and focus on, is to use this data to refine risk assessment. So the aim is really to develop a working hypothesis here, so that we can use this type of big genome data to differentiate between a pathogen under regulatory control, such as Listeria monocytogenes or Salmonella or pathogenic E. coli, and one of the same genus that is really not a true pathogen. So how can we make that distinction?
It's always good to have a food safety talk just before lunch, so I can inform you of some of the risks you're about to face. But this is one of the organisms that is of interest, Listeria monocytogenes. It's an organism that we really don't see a huge amount of clinical evidence of here in Ireland, at least in terms of the numbers of cases reported per year. However, having said that, it is the scourge of the food industry. And the food industry has real challenges in terms of trying to control this organism. It's normally associated with ready-to-eat foods: delicatessen meats, cheeses, sandwiches, that type of food matrix.
And it can survive very, very nicely under refrigeration conditions. So even though the food is chilled, that doesn't necessarily prevent an infection associated with Listeria monocytogenes. Where this organism comes from, we're not entirely sure. There's evidence that it's present in the natural environment. And there's also evidence that it's present in some food-producing animals. Irrespective, it does find its way into the food processing environment. And when it does that, it potentially poses the risk of contaminating a final food product. If we as consumers eat that product, and we're susceptible, we may become infected.
So just to show you how this process happens. We would assume that you would consume a contaminated food product of some kind. The organism makes its way down to the small intestine. And then in the small intestine, the bacterium, in red here, meets the host cell, which is us, and becomes internalized. For this, it needs certain genes in order to execute that function. Once the organism is internalized in the host cell, it's then liberated from the particular vacuole where it was captured. It begins to spread throughout the host cell itself, and then finally begins to spread from cell to cell. So as you can see, there are a number of genes that control this process. And those genes are functioning during this virulent stage of the life cycle.
So we decided that we'd take a look at this, again looking at Listeria monocytogenes isolates taken from a food production environment. These included environmental samples, samples from food, and samples from unknown sources. We wanted to study those at a very deep level to try to see if we could develop this hypothesis of risk assessment. So essentially what we did here was take 100 of these isolates.
And then, for every single isolate, we verified its purity, then sequenced it using this MiSeq instrument, which you see here on the top midsection of my slide, whereby we were able to deconstruct the genome, sequence the fragments using these Illumina instruments, and then put the whole thing back together again. And then, as Kean alluded to earlier on, we used complex pipelines of bioinformatics programs to extract data of value.
So without boring you with the detail, this is what we found. Here you're looking at about 18 different clonal complexes. So this is CC1 all the way through to CC451 on the right-hand side. Now, we know that there are certain types of clonal complex that are epidemiologically linked to infection in humans. And those are the ones that I've indicated with the red star. So any of these bars that you see here with the red star indicate particular types of Listeria monocytogenes that have been associated with clinical origins. And those strains we regard as being hypervirulent. So they're likely to cause an infection in a susceptible human host. The other ones, indicated by the blue bars, are more often associated with food origins, either the food matrix itself or the food production environment. And these are the so-called hypovirulent strains. So they may not necessarily be as infective. And that's specifically the issue that we want to test here.
Looking at this another way, we can analyze the genomes of all 100 Listeria monocytogenes isolates by comparing something in the order of 1,748 genes within each strain each time. And this gives rise to two different lineages: lineage one on the right-hand side, which are the types of Listeria monocytogenes strains that are more likely to be associated with human infections, and then lineage two on the left-hand side, which are those strains that are more likely to colonize the food production environment or to be found in food products.
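As a rough illustration of the gene-by-gene comparison just described, here is a minimal Python sketch. It is not the pipeline used in the talk, which is not specified. The isolate names, the allele numbers, and the five-locus profiles are all invented for illustration, whereas the real analysis compared something in the order of 1,748 genes per strain. The sketch assumes each isolate is summarized as a list of allele numbers, one per gene, and counts the genes at which two isolates differ.

```python
from itertools import combinations

# Hypothetical allele profiles: one allele number per gene (locus).
# None marks a gene that could not be called in that isolate.
profiles = {
    "isolate_A": [1, 4, 2, 7, 3],
    "isolate_B": [1, 4, 2, 7, 5],
    "isolate_C": [9, 8, 6, 2, 1],
    "isolate_D": [9, 8, None, 2, 1],
}


def allelic_distance(p, q):
    """Count loci where both isolates have a called allele and the alleles differ."""
    return sum(
        1
        for a, b in zip(p, q)
        if a is not None and b is not None and a != b
    )


def pairwise_distances(profiles):
    """Return {(name_x, name_y): distance} for every unordered pair of isolates."""
    return {
        (x, y): allelic_distance(profiles[x], profiles[y])
        for x, y in combinations(sorted(profiles), 2)
    }


distances = pairwise_distances(profiles)

# Print the closest pairs first; small distances suggest closely related isolates.
for pair, d in sorted(distances.items(), key=lambda kv: kv[1]):
    print(pair, d)
```

In this toy data, isolates A and B sit close together, as do C and D, while the two pairs are far apart, which is the kind of separation that shows up as two lineages when the distances are drawn as a tree.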
When we look deeper into the genome, and in particular go back to that infection model that I showed you a couple of slides back, we can look for evidence of the genes that support that infection model. What you see here is a heat map, which describes the presence, absence, or modification of certain genes extracted from the genomes of those 100 Listeria monocytogenes isolates. Across the top, you'll see LIPI-1, LIPI-3, SSI-1, and LIPI-4. These are pathogenic genes that we know to be associated with the infection process itself. So you can see here that we can clearly cluster these strains using this type of analysis. You can see from the green blocks that, in many cases, these genes are present. In others, indicated in gray, the genes are absent. And for those that are indicated in either red or yellow, the genes are truncated, which perhaps means that they're inactive. There's one isolate towards the bottom of my slide where you see, on the right-hand side, evidence of LIPI-4 being present. All those genes are present. And those genes, if we were to decode them, are responsible for allowing that particular strain of Listeria monocytogenes either to cross the blood-brain barrier and cause meningitis, or to cross the placenta and cause a premature abortion. And this is why medical doctors advise women, when they're pregnant, not to consume unpasteurized dairy products.
So moving on then, can I test this hypothesis? And do I know that what I'm seeing here is actually valid? Well, using a very simple model called a zebrafish model, I can select examples of the genomes that I've sequenced, based on that stratification of hypervirulence versus hypovirulence, infect the oocytes of this fish, and then follow the survival of those oocytes. So on the top here, you see a survival graph where we've selected a number of the so-called hypovirulent strains of Listeria monocytogenes.
And we've compared them against the control, which is the red dotted line that you see here. That's another type of Listeria monocytogenes strain. What you can see here is that in most cases, the oocytes, when they're injected with these types of strains, tend to survive. In other words, none of those curves, apart from the control dotted-line curve in red, are hitting the x-axis. If we contrast that with strains that I regard as being hypervirulent, and I run the same model again, you see a slightly different picture. This time, you see strains which are clearly infecting the oocytes. The oocytes are dying; they're not surviving. And this time, many of those curves are hitting the x-axis, which means the oocytes are being completely wiped out by those particular strains. So now we think we have the early stages of a model which allows us to differentiate strains that are more likely to cause human infections from those that are less likely to cause human infections, based on a careful analysis of the whole genome sequencing data, backed up by some of these phenotypic-type experiments.
So to conclude then, what I would say is that this data is certainly very welcome. Whole genome sequencing data gives us a deep look into the workings, if you like, or the blueprint of an individual bacterium. And we could do this for any organism. When you assess this genotype in combination with the phenotype, it's essential that we are able to prove that connection. So we need to be able to use models like these zebrafish models in order to infer the connection. And finally then, and I think for the food industry going forward, if we're able to translate these genomic insights, we may be able to facilitate a refinement of risk assessment models that might help to improve the food safety measures that people like the Nestlé companies and others around the world are implementing, and also to protect our consumer.
And there's one other possible caveat here, which is that if we can do this, we may be able to use this refined model to feed the world in 2050 when the population doubles. Thank you very much.
Thank you, Shay. Thank you.