Okay. Hi there. How are you guys doing? Getting tired yet? Hang in there. I'm going to try to be as engaging as I can. My goal is to bring up some points to consider as you move forward and to highlight a couple of resources, and then at the end I'm going to get into some analyses and some interesting things that we've been learning in my group.

I did a really crappy introduction of myself at the beginning because I didn't realize that things started at 30, so I showed up very flustered. I'll just mention that my fundamental interest is in coming up with more sustainable approaches for infectious disease control. Genomic epidemiology is one such approach, as is developing anti-infectives. You can ask Venus, who's here from my group, and also ask about anything I mention regarding genomic islands. We've also got Kristin here from my group. I'm pointing to them because unfortunately I have to leave tomorrow, so I'm not going to be around.

I do want to mention that a bunch of us instructors are going for dinner tonight at the Queen and Beaver. I think it wasn't clear, but you guys are welcome; any of you are welcome to join us. I would love to meet more of you. If you're interested, there's a reservation, I think for 7:15, and there was no problem getting a reservation, so I'm sure we'll be able to get something. That is assuming they don't have a big crowd for the showing of the World Cup semifinal, but they took reservations, so I guess it's not that bad.
Anyways, moving on. I'm going to be talking to you about some of my research, but also a little more about some big-picture points at the beginning. One of the key underlying themes is the idea that we really need to move to more open bioinformatics. This is an important thing that's really taking center stage and that affects disease control. Open and organized bioinformatics is what I want to emphasize; I'll talk more about that and expand a little on the ontologies that Will has been talking about.

In case I forget to mention it: congratulations, Will, on your new CIHR grant, because that's one of the few public ones, and it just got announced at lunch today, or at least you learned about it at lunch today. So if you guys know of bright students, postdocs, etc. who are interested in infectious disease control and who really get a little anal about data organization, please point them in Will's direction, because these grants lead to a bunch of hires. In fact, a bunch of us are going to have to do a bunch of hires in the next little while, because we've all got a bunch of grants recently, so we'd certainly love to hear about people who are interested.

So, we have been living through a revolution, a revolution driven by data, and this has gotten huge. More generally, the amounts of data being generated are significant. We've passed a zettabyte now, and next up is a yottabyte, just so you know. I think that's the most awesome one that we're going to get to, but it's going to take a while to get there, since that's a thousand zettabytes. But certainly we are really going crazy with the amount of data, and it's actually turning into some interesting ironies: we may sequence a lot of genomic data, and then some of that information, stored as collections of genomic data, may end up getting encoded back into DNA for storage, because we're having problems with, at
least for archival purposes, coming up with more efficient data storage. Just for reference, I wanted to highlight the Catalog DNA (catalogdna.com) approach, which is sort of interesting. Normally, when you want to encode data in DNA, you take 00 as, say, A and 01 as G (I don't know exactly which is which), then 10 as C and 11 as T, so you've got these bits and all the possible combinations, and you synthesize the DNA. Their process is instead to generate large volumes of a few different molecules and then encode the data by combinations of those molecules. So instead of synthesizing the DNA to encode your data directly, they decouple the two steps: make large volumes of a few different molecules very efficiently, and encode the data via combinations of molecules, which I think is interesting. Look out for this, because something like it is going to have to happen; the amount of data being generated is huge.

Even just in genomic epidemiology, we've got these incredible capacities appearing. I laugh when I say this, because I'm saying this is incredible, but think of what it was like 10 years ago, and then think of what it's going to be like 10 years from now. We're going to be laughing at this: remember when we only had 6,000 gigabytes capacity? But the longest nanopore read reported in the literature is now 2.2 megabases (granted, on bioRxiv, but still), and that's the size of many small bacterial genomes. So we really are getting to this realm of having some really interesting data that's going to allow us to do a lot more public health control and surveillance via genomic sequencing.

This is, again, causing a revolution in public health and leading to the idea of real-time infectious disease outbreak investigations using this genomic data coupled with what they call metadata, a term I'm not a big fan of because it implies that it's somehow less important. I mean, the genomic data
is really important, but the other data is also very important. So I like to think of it as this other data, like lab data and epidemiological and environmental data, that becomes really key. And again, I can't emphasize enough: sequencing is cheap, so that's not the hard part. The hard part is the analysis, and that's why you guys are all here, right? To learn about this analysis that uses shared data, which is really becoming powerful. I stuck in cute animals, and I've actually got even cuter ones, kittens jumping and stuff like that, just to get your endorphins going and make you feel a bit better about my talk.

Anyways, this is also leading to a lot of infrastructure needs. There's been infrastructure developed, for example, in the UK with this microbial bioinformatics cyberinfrastructure. I thought I'd point to a couple of resources just so you know they exist. In particular, I wanted to highlight that in Compute Canada we now have four clusters, including Cedar at SFU, which is named after the official tree of BC, one of these really big trees that last for thousands of years. The other clusters across Canada are Arbutus, Graham, and Niagara, the new Niagara system being set up here at U of T. I wanted to highlight these because they are really cool. For example, this one here is housed entirely under a water tower, purposefully. I usually think of water as an evil thing near computing equipment, but the idea is that the system is entirely water-cooled, and they use the hot water for other purposes. So they have a really impressive PUE, or power usage effectiveness, which is, for every megawatt you use for computing, how much you use in total once cooling and support are included. That total is usually double, since cooling is usually one of your biggest energy requirements, so most facilities have a PUE of about two. So I just thought I'd mention that if you're ever using something like IRIDA at SFU, or whatever you've got, what you're really using is
environmentally friendly computing power, just so you know. In case you're not aware, you can get Compute Canada accounts for free if you're at all associated with an academic group in Canada, with a pretty decent amount of space, and you can ask for an increased allocation pretty easily through a fairly simple application process. So be aware that that's a resource you can use.

There are other resources, too, that have resulted in some bioinformatics platforms. The Genomics Virtual Laboratory is one that's been developed over there in the UK. Another is from DTU, the Technical University of Denmark, which set up its bacterial analysis platform that's also available to use. Another one that I think has really taken off nicely is the pathogen analysis system associated with NCBI, which is being used with the US FDA for the GenomeTrakr network that was mentioned. They're now sequencing about 5,000 isolates a month, and the idea is that they put them on a tree, and every night somebody looks at it to see if clusters are forming; they try to notice in real time whether there are clusters of related isolates. The problem is that the metadata, or other data, is not so good and not well organized, so usually they have to get on the phone with these people and figure out what's going on. That can be okay, but I just want to toot the horn that Will is really leading some efforts with LexMapr, applying it to this data and coming up with more organized data about food source, etc., so that we can query these more robustly. The point is that this is now happening; this is being used. Also, in the UK, for example, they're doing whole genome sequencing of every TB isolate to diagnose and do surveillance for TB. So this has developed really nicely.

But there are some issues with some of these resources: either they are closed source, like the NCBI one, or they are maybe
a web-based application, so you have to upload your data to that site, and there are a lot of public health agencies around the world that have concerns about that. They can't just take their outbreak data and upload it to another country; that's not the best for their security.

I do want to put in a little plug for Microreact, which is one resource that Anna, over there... you were here this morning? I remember seeing you. Okay, yes, you were, sorry. She'll be talking about this more later, so I'm not going to get into it, but it's a really nice visual tool for looking at data.

Then there's the concept of IRIDA, which you're going to be introduced to in this course, though I want to emphasize it's not the only resource I recommend you use. The philosophy of this resource is to make something open source, so you can actually see the code, and freely available, where you can either go to a web version, like the public version we've got set up at SFU, or install it locally. We have a bunch of researchers around the world who have been installing it. The idea is to make the design modular so that we can add more features as it grows, using Galaxy's workflow engine. I think that'll be discussed more as part of your integrated assignment, where you'll be using it, so again I'm not going to go into it too much. But I do want to make you aware that the whole concept is that, unlike some of these other resources, you can actually install a copy of this from GitHub, or you can go to a website to use it. The idea is to make it a user-friendly experience, and hopefully you'll see that after doing some of the command-line work you've been doing today.

I won't go into detail about all this, but I also want to emphasize a key underpinning of this platform. There's a lot still to
be done, and it's very actively being developed with a growing community, but ontologies and data standards are a really important component that we want to increasingly incorporate as part of this, and I'll explain why we care about that.

I just wanted to stick in this acknowledgement slide and say thanks to these people, in particular Gary and Will, and especially Gary, who's really been leading and championing the development of this first version of IRIDA, and to this growing group of researchers who are now all funded, thanks to a bunch of successful grants, to continue this work. IRIDA has been installed in a bunch of places; this picture is actually a bit out of date. But I just want to emphasize that, for example, a colleague of ours in Switzerland was able to install it on a machine without trouble. This is something very actively being developed, and if you do decide you want to install it, don't hesitate to get in touch; there is assistance available to help with that.

Anyways, the point I want to make is that the goal, philosophically, is that we're making something for Canada that's being used actively in Canada's public health labs, but that is also a resource for some of these other, resource-poor countries. You'll have noticed that when I mentioned resources being developed, I was mentioning the UK a lot and the US a lot, because they have a lot of money and they've developed a lot of great resources. But there are all these other countries around the world that each want to have their own ability to do these analyses, and they don't have the millions of dollars to develop these customized bioinformatics resources. So that is the hole we're trying to fill with IRIDA. The concept, though, is really to deal with the fact that, as was mentioned, we're really sharing
microbes globally now, and in fact outbreaks follow flight paths more closely than simple geographic distance. I realized when Will was talking that we shouldn't just talk about human flight paths; we should talk about bird flight paths too. That got me thinking that it's really these flight distances, of whatever kind, that are having a big impact on infectious disease migrations over long distances. They really are having a notable impact. And if there's a zombie outbreak, I recommend heading north from here, or from Vancouver, where I live. The East Coast is not looking so good.

But I do want to say that the kind of data sharing that's required for us to really do global analyses requires two main things: data quality, and data standards and ontologies, so organized data. I don't have to tell you that if you have garbage data, even with the perfect model you're going to get garbage results, and if you have perfect data but a garbage model, you can also get garbage results. So you really need that quality data, in addition to standard operating procedures and accuracy assessment. I really want to encourage you, as you move forward in bioinformatics analysis, to get to know your methods. Avoid the risk of just plugging and playing things; get to know what these methods are doing, and always have controls.

For example, we wrote a paper after we did a metagenomics analysis. We had learned some things when we assessed the accuracy of tools to figure out what to use, so we thought we'd publish a paper, and we actually got a lot of feedback on it. One of the most popular questions was: okay, you mentioned all this complexity of different tools that are good for different reasons, but what should I use? So I would say that you can start with Kraken, which will get
mentioned later; it is good if you have a very well-known microbiome, so you're comparing to, say, the gut microbiome. But if you're doing something like water, you may want to look more at tools like MEGAN that don't force predictions. You're certainly welcome to look at that paper, which has some general comments. In fact, we got so many questions that we ended up writing a little comment note; on the page at the bottom of the paper there's a comment, which got good feedback, with general thoughts about this.

Also, always having controls is really important, particularly for metagenomics analysis, which will be talked about later in the course. Having a positive control and a negative control is really important for detecting whether your analyses are working. I just want to encourage that, because I don't see it with every metagenomics analysis.

And this is going to matter more in genomic epidemiology as we start to do more ecologically based analyses, looking not just at, say, humans with an infection, but also at the poultry that the Salmonella may be coming from, and then the processing plant, and then the environment where the poultry are, and examining the birds. This kind of more ecological approach, or One Health approach, which is taking off, does result in you needing to incorporate metagenomics into genomic epidemiology more. CAMI is an international effort to do Critical Assessment of Metagenome Interpretation, basically critically assessing different methods as they come out, so check that out. There is going to be more activity there in the future; right now they do have some analyses, but they're working on making the results a bit more, shall we say, user-friendly to interpret. But the other thing I wanted
to mention, as I pepper in a few little bits of advice, is to remember the ecological approach, but also to look at more than just bacteria. For example, in this watershed study we did, we were trying to come up with an improved water quality test. As many of you will probably know, coliform counts are really inaccurate for, say, closing beaches. A high coliform count means you have a high number of certain bacteria, but not all coliforms are pathogens, and not all pathogens are coliforms, right? There can be protists, etc. So we were interested in looking at clean water and fecally contaminated water, either agriculturally or residentially contaminated, and coming up with improved water quality tests by doing metagenomics analysis.

In short, this analysis, which was done temporally over a year, revealed some surprising things. We looked at bacteria, viruses, and microeukaryotes, and if we had just looked at bacteria we wouldn't have gotten an adequate picture of what was going on. We saw some interesting, surprising synchrony between DNA and RNA viruses. These are just Mantel r values, shown with the Q value; Q means that the P value has been adjusted for the false discovery rate. In these different watersheds in the Vancouver area, there is a dry season and a rainy season. There are two seasons: it's either wet, very wet, or dry. It was interesting that you could really see the transition between the wet season and the dry season, and really see some of the synchrony. But there were notable differences: within a geographic site over time, viruses were more stable than bacteria, but they were more geographically specific. So we were seeing some notable differences that are actually helpful for future development. But key was to integrate this other data, the data about geography, and we had a
lot of chemical analysis data, etc., to really flesh out which factors might be driving some of these differences versus which were confounding factors. We also were not limiting our microbiome analysis to just bacteria, so that's something to consider in the future. Again, I'll come back to this, but factoring in the ecosystem becomes important.

I like this slide because, with this idea of looking at the other data, for data standards and ontologies we really need the time and resources to do it properly. You have stopgap measures, and people often do this kind of thing, like this doorstop, where you don't even bother to do it properly. Stopgap measures can often be very rough, but we really need to do things properly and find ways to make the process easier. That's what Will was starting to allude to: coming up with tools to help with this, so you can use ontologies.

I did want to mention that ontologies are really a way of structuring information that is for the digital age what dictionaries were for the age of print. Dictionaries were really important in driving things forward in the age of print. Now we've got this digital material that we can connect very, very easily, and this is where we really need to move more to ontologies: standardized, well-defined hierarchies of terms interconnected with logical relationships. So they really are different from just a data dictionary. With an ontology you would have things organized like this: you might have leafy greens, and under that you've got spinach. What's useful is that it can help resolve some of the issues you might have if, say, you have an outbreak of E.
coli, and somebody says it's associated with lettuce while somebody else says it's associated with endive. A computer doesn't know that lettuce and endive are the same kind of thing unless you tell it, so having those relationships allows you to link terms. It deals with issues of granularity, of how specific you are. It deals with issues of taxonomy; for example, in South Africa, spinach is a different thing than in most of the rest of the world. It also lets you share data at different levels of detail. Some jurisdictions might want to share information about their infectious disease outbreak at a very high level, saying just that it's associated with lettuce and not wanting to be more specific, whereas somebody else may want to go down to the particular company, because they're comfortable doing that. An ontology allows you to pull that data together.

The other thing it can deal with is semantic ambiguity. I like this one: "the chicken is ready to eat." Is the chicken ready to eat something, or are we ready to eat the chicken? What's going on there? My pet peeve is the scone, biscuit, cookie conundrum. As somebody who got brought up by Scottish parents, when I talk about biscuits I think of something different from what a biscuit is in North America, where it would be a little more like a scone. This causes real confusion. I've literally gone to get fish and chips in the US and gotten fish with potato chips, like Hostess potato chips, not french fries. I don't even know what term to use, right? Potato crisps, I guess: I've gotten fish with potato crisps as fish and chips before.

So the point is that you can get these things organized, and I can't emphasize enough, as you move forward with genomic epidemiology investigations, that you can do so much by sorting at the source, by doing this kind of information organizing at the beginning. This is what they
do in many industries: they sort at the source, and so you want to do that too. That's why you have recycling with different containers, because it's much more efficient. I also wanted to emphasize that ontologies don't mean you have really long drop-down menus; you can have intelligent querying. I won't go on about this, but if you search for cookies, for example, you could find biscuits as an option.

A key thing is that ontologies allow data harmonization, which lets you bring different people's data together and reduce errors, because you can correct them. For example, in one study where we were looking at medications, I could not believe how many different ways people could spell ibuprofen when they reported their child's medication. I'll mention that CHILD study later, but it was just phenomenal to me. By putting in an ontology, not only could we look at the microbiomes of kids who got ibuprofen versus those who did not, but we could also look at kids who got any anti-inflammatory, or at whether they got an antibiotic, and look at these higher-level categories more easily, because we had this ontology in place.

As was alluded to, Will is really the one leading this genomic epidemiology ontology, and open, community-based development is key for that. I also want to put in a little plug: Andrew is going to be talking more about antimicrobial resistance tomorrow morning and will be mentioning the ARO ontology. That, a mobile elements ontology, and this food ontology are the main foci. If you're at all interested in any of these, certainly get in touch with us. We actually have a consortium of people, because a key point is that, like a language, you have to maintain these things; they are in continual development. So we've engaged a bunch of
researchers around the world who can contribute to this. If you're interested, there's the email address, and you're welcome to join us. At this stage we're really just asking people to sign up and say they'd be interested in getting more information. Most of it would be us coming up with ontologies and then having you maybe review components, or you might want to propose certain components.

Another use case for ontologies, which I thought I'd very briefly mention in one slide, is this CHILD study, in part because I need to hire some people associated with that too. We're looking at diverse data for 3,500 healthy children from birth, at three months, six months, one year, three years, five years, and eight years, and we plan to continue it, looking at genetic and environmental determinants of allergic disease. It's an incredible data set that covers everything, down to how many hours a night the kids' windows are left open; just all this incredibly diverse data, which includes microbial data. As you've probably heard, a big factor in the development of allergic diseases appears to be what kind of microbes you got exposed to early in life, and it appears to all happen within the first three months. So if you know of anybody who's pregnant or expecting a child soon, get them to let their kid crawl around in the dirt a little bit, until we know better what the key microbes are.

A key component of this is data standardization and ontologies, to be able to look at all these factors that are playing a role. Obviously we can look at other factors as well, and look at microbial associations with other diseases that, inevitably, some of these kids are getting. In this case we have been looking at medications and other factors. But I did want to get into one area a little bit more, about
enabling global analyses that have been made possible through data sharing and integration, and I'm going to focus for the last bit on genomic islands. What time did I start? At 5:20, I guess, right? Okay.

Genomic islands are basically clusters of genes of probable horizontal origin in bacterial and archaeal genomes. Using the tools that have been developed to date, they're generally defined as being over 8 kb, but that varies. Most of them are primarily thought to be phage coming in, and anecdotally they were found to commonly contain virulence factors. So there's been a lot of interest for a couple of decades, or at least the last decade, and growing, in looking at these regions and seeing what was there. I think there was a Nature Reviews Microbiology paper that Gary mentioned, but we also have a new paper out on microbial genomic island discovery, a Briefings in Bioinformatics paper on methods for analysis and visualization, which you're welcome to check out and which has more commentary than I can provide here.

It reviews different tools, and one of the tools that we made in our group is IslandViewer, which now integrates four tools. Two of them, IslandPath and SIGI-HMM (it's sort of hard to see here), are sequence composition based methods: they look for unusual sequence composition in the genome, coupled with other genes that might be involved in mobility. Then there's IslandPick, which is a sequence comparative method, and Islander, a method developed by colleagues, which is very precise. When you get a prediction with Islander it's very precise, because it actually looks for the tRNAs that the region might have been inserted into and looks for those sequences. So it finds these regions very precisely, but it doesn't have good recall; it's only finding the subset of islands that it can find. So
this is a visual representation that we made of these different tools, and you can see here that there's clearly something going on there, an island. The idea is that you can very interactively click here, if you go to IslandViewer 4, and view a vertical view of the area or a horizontal view. You can do a two-finger gesture if you've got a Mac, and there are other options on a PC, to zoom in and out of these regions and look at them interactively. So it's actually not a bad tool for just looking at a bacterial genome in general, and we pre-compute and rerun all the bacterial genomes every so often, so they're there.

A key thing, though, was integrating other annotations, like virulence factors and resistance gene predictions from CARD, the Comprehensive Antibiotic Resistance Database, which you'll hear more about, as predicted by the Resistance Gene Identifier that Andrew is developing and that's really widely used. This allowed us to do some more global analyses of genomic islands and their association with different kinds of genes.

Oh, one thing I wanted to mention is that IslandPath recently got updated; there's another paper out from 2018. So if you have used this tool at all, be aware that there's a better version now. But we're actually trying to do better still. Right now, with these tools, the predicted islands are a bit fragmented, and for many islands the boundaries are not predicted very well, so we are developing a better method to identify those. Again, Kristen's going to be taking over some of that work, so you can always bug her with any questions, or maybe for examples of islands. We're also developing this IslandCompare tool, which allows you to take a more population-based view: rather than looking at one genome, you can look at a
bunch of genomes. We're trying to refine the clustering of these islands so you can see similar, related islands together. You look at a tree of these genomes, and it's again another way to visualize genomes: you can click and drag in a region and zoom in all the way to the gene level, or zoom out, or click a node here and look at just these two genomes, or click a node here and look at this subset of genomes. We're also interested in adding other features, including antimicrobial resistance gene predictions from CARD, which will be very valuable. IslandViewer is an example of a tool that's been integrated into the public version of IRIDA that's available at SFU.

But what's really key for us is being able to do some of these analyses. So for the last bit I'm going to talk about some of the insights we've been gaining, and I welcome your thoughts on them. One of the things we did was a very simple analysis of regions of genome plasticity: not just genomic islands, but any regions that are conserved in less than 90% of a set of genomes. We wanted to see how these regions tree out as a feature, or character, versus just a SNV tree. If you take single nucleotide variants, you make a tree like you guys learned with SNVPhyl, and, as was very wisely brought up, there's this issue that what's happening in the core genome can diverge from what's happening in the accessory genome. What we found interesting came when we started to look at this. For example, this is a data set of Pseudomonas aeruginosa isolates, a bug we care about a lot, because it's now among the top three pathogens requiring new therapeutics according to the World Health Organization, since there is such a problem with antimicrobial resistance. It's a really intrinsically resistant bug, and we
are working on new therapeutics but also on better ways to track this. Basically there are three different groups: if you do a SNV-based tree you get these PA14 strain-like isolates, PAO1 strain-like isolates, and then these other deep-branching ones, as indicated by having an outgroup of other closely related species that are sort of ancestral to all of them. And when we do the same thing using the presence or absence of these regions of genome plasticity as a character (when you're doing a phylogenetic analysis, each column of your sequence alignment is basically a character you're looking at, similar to looking at parts of skulls or whatever you might do in classic physical evolutionary analyses of organisms), we find that we get that same kind of tree coming out, but there are differences within those subgroups. What that tells me is that there's actually a different mobile gene pool associated with these different clades, so these clades are treeing out and these mobile gene pools are also treeing out.
It's similar to what we know already about Bacteria, Archaea and Eukarya. We have these three big domains of life, and we think one of the reasons they exist as three domains is that there isn't enough horizontal gene transfer to make them come together. So you basically have bacteria with the bacterial viruses, or phage, archaea with the archaeal viruses, and eukarya with the eukaryotic viruses. We think about this phage gene pool or viral gene pool being much bigger; some estimates are that it's about 10 times bigger than the gene pool in bacteria, and that diversity is awe-inspiring. But maybe we should be thinking of it as there being little subsets of these, and that's why we get these trees, because we don't
really have species by the traditional mating definition in bacteria. Interestingly, a recent paper came out looking at this cut-off we tend to use of 95 percent for species, and it actually seemed to hold true. I'm wondering if the whole bacterial species concept is just a reflection of how easily viruses can get between different bacteria: when you get below 95 percent sequence identity, it's harder for a virus to infect bacteria that are that distantly related, and so you start having a bit of a, quote, species barrier in terms of the accessory genome or mobile genome. So I'm wondering if there are what I'd like to call gene puddles, that's what I'm calling them anyway, associated with these different clades. What I think we really should be doing in genomic epidemiology is paying attention to the phylogeny of some of these accessory, and particularly mobile, sequences versus the ones that are core, and paying attention to how those are the same and different, because they may give insights into what can move around. I'll talk more about that in terms of antimicrobial resistance in particular.
We did notice that virulence factors, when we did a global analysis, are disproportionately associated with genomic islands. Really ridiculously significant p-values no matter how you slice it, 10 to the minus hundreds. And this made sense to us. I should mention that with ontologies we're able to firm up that classification; I should say at this stage we weren't doing ontologies yet. We were able to see that the more defensive virulence factors weren't so associated with genomic islands, like, say, an iron uptake system, which is a more passive system, but the more offensive ones, like a toxin that directly causes harm and virulence, those were the
ones that were really associated with genomic islands. And it makes sense to us, because if you're a bacterium and you have a virulence factor that could kill off your host, say an animal, then your risk in having that gene in there and being too toxic is killing off your host, and you would not survive, right? Anything that kills off its host too much obviously just doesn't survive, and you don't see that. So there's selective pressure for virulence factors to maybe be associated with the mobile gene pool; at least that's what we thought.
But then we started looking at antimicrobial resistance, and the idea was: is there any genomic island association for antimicrobial resistance as well? Thanks to this genomic island predictor and thanks to Andrew McArthur's Resistance Gene Identifier, we could pull those together and do a bit of a global analysis looking at basically all of the NCBI complete microbial genomes. In short, AMR genes are disproportionately found in mobile sequences collectively, particularly plasmids, but note that genomic islands are generally under-predicted, so you've got to keep that in mind.
What we found was that the association depended on the AMR mechanism, for example when we look at things that are depleting the antibiotic in the environment, like an enzyme that goes out and does an inactivation or something. Here the orange is plasmid, the blue is genomic island, and this white at the top is the non-mobile, other chromosome components. If you look at this distribution for all genes, you have roughly 10 percent of all genes in these mobile sequences in all of these genomes collectively. For all of AMR it's a bit higher, but you'll notice that when you start to go to environmental depletion, where you're
actually putting something, an enzyme, out that's changing the antibiotic in the environment, and are more offensive, shall we say, in your approach to getting rid of the antibiotic that threatens you, then you are much more likely to be mobile. We thought, oh, maybe that's like the virulence concept: you don't want to be producing something where there's then selective pressure for some other organism to come back with something else, because you're interfering with their production of antibiotic or something like that. And notably, if you just have an antimicrobial barrier method, reduced permeability to the antibiotic for example, you're very likely to be on the chromosome. These are like the efflux systems, for example, or, you can see efflux here, sorry, that's a different category. But the point is that we didn't see quite the same correlation: there's target protection, which is also sort of a barrier method, that didn't fit. So we decided to look at other things, thanks to talking to some ecologists.
One of the things we did notice is that this was first observed in the species that we looked at in the 1970s; it's a very recent introduction. So maybe we should be factoring in how recently something came in. I think it's important, when we start looking at the genomic epidemiology of antimicrobial resistance, which is a big topic right now, that we keep in mind that when you see something in a pathogen you're looking at, you have to pay attention to when it first came in, because it first came in there but it probably existed in some other organism for many, many years, right? They actually think this particular resistance, this is some of the quinolone stuff, was probably in marine organisms and then came into these organisms. So you do have to pay attention to when stuff came
into something. But the other thing that really caught our attention was looking at generalizability. I apologize, on your slides there's a problem with the PDF conversion; it made this really wonky, so I've tried to show you the right one here. Basically, if you look at the number of drug classes that each mechanism confers resistance to, for these classes it's really one or two drug classes on average that the resistance mechanism is conferring resistance to, so it's very specific, whereas efflux, reduced permeability, target alteration and, to some degree, target protection are showing that trend towards being more generalizable, or involving more drug classes.
So in short, we're still investigating this actively, but it does look like the specialized AMR genes are disproportionately associated with mobile sequences. What we think might be going on is something we could draw upon from some of the ecological and evolutionary theory out there about ecological public goods. In a lot of environments there are many things that organisms do as part of communities, and we could think of bacteria as a community where they share resources. We do it as humans; we share a lot of resources. What we think we might be seeing is this concept of ecological public goods, in that the AMR gene optimally benefits all members of a community if it's present in a subset of that community. For example, take a secreted enzyme with a high fitness cost, something that costs a lot of energy to synthesize, like a cloning vector beta-lactamase. Those of you who've worked in a lab will know that you can get your cloning vector and select for your clones, and you plate on antibiotic-containing media on a
petri dish, and you've got to make sure you keep maintaining it on that, because you'll lose that plasmid quickly if you don't keep it on antibiotic. There's such a high fitness cost to making 500 copies or 300 copies of this plasmid, because you're trying to do some sort of expression system with the cloning vector, that it could lose it very quickly. So really what's happening is that these communities of bacteria are probably benefiting from having some sort of antibiotic resistance mechanism secreted into the environment, say a beta-lactamase, and if it's only present in a subset of the members, that can keep the antibiotic levels down enough that the colony can survive, but it doesn't require everybody to be making it. So you have this shared public goods concept, and I think that might be what's going on. We have to take more of an ecological perspective on how and why some of these things are very mobile and some are not. The ones that are more mobile may really be more shared public goods, versus the ones that are, say, putting up a barrier, where basically every member needs to have that barrier if they're going to survive, like an efflux system.
So I think what we need to do is move towards coming up with more of an equation of what shifts the selective pressure towards being more mobile or non-mobile for antimicrobial resistance, looking at things like fitness cost, the date acquired, whether it's secreted or not. I think there are a lot of factors we can look at. I'm not saying we should definitely come up with a score right now, but I think we could come up with something that could aid risk assessment, AMR mobility risk assessment. What we need, though, is more
prediction, better prediction, where Andrew McArthur's group is leading a large effort to improve antimicrobial resistance gene prediction; they'll talk more about that. And then I think we also need more ecological sampling. I really encourage you, as you do genomic epidemiology analyses, to not just think of tracking in one location: think of the environment these microbes are in. We need to understand some of the ecology around the environments these microbes are in.
So one last thing as I close here. To aid more ecological analysis we're developing this AMRtime, which was inspired by Hammer time, though I guess you could say it's inspired by Thor or something like that. AMRtime, you can picture the music. Basically the idea is to develop an AMR predictor designed for metagenomics data. This is really being led by Rob Beiko's group at Dalhousie, and it's part of this move to incorporate better predictors for metagenomics data, which right now is very problematic.
Okay, so in summary, I can't emphasize enough, and I'm going to say it again: we need more open data sharing and data integration, ensuring we have good quality. So pay attention to what methods you're using, look for whether something is forcing predictions, and look at what kind of performance you want: something with very high precision or high recall. High precision means that when it makes a prediction, it's almost always right; high recall means it's making all of the possible predictions. High precision might be missing some things, and high recall might be making some level of incorrect predictions.
Data organization, ontologies: I don't think this has to be onerous, particularly if you start this sorting at the source. Get your data organized right at the source. But be aware there are tools
being developed, like this LexMapR for example, to basically take text that is unorganized and try to organize it better. In Canada we've got this IRIDA tool being developed, which you'll get introduced to more tomorrow, that really aims to fill a gap in the tools that are available. Again, this is being actively developed, so we welcome feedback, particularly on any usability issues you might have. I also just want to emphasize the fact that we can make these global analyses possible because of all this data. If this data hadn't all been integrated, some of these insights wouldn't be possible.
Lastly, there's this idea that was actually first brought up by Will, so maybe it's proper to close with it. It's just a reminder that we're all linked by these microbes, and these jurisdictions are all linked, so we really do have to coordinate efforts, right? Every one of us needs to try to come up with ways. I would love comments about it as we get to the end here and take any questions as well, but I just wanted to get a show of hands: does anybody feel like in their organization they have barriers to data sharing, like they can't quite share everything? Okay, yeah, we've got one, oh, we've got a few. So at the very end, if we have time, and I don't want to take too much of your time, if you guys have any thoughts I would love to hear more about that, or maybe you guys can collect some information later about what kinds of barriers you have, because these are really important barriers that we need to overcome.
But I will first end by closing. I'd really like to thank Claire and Bev and Justin, who've been involved in that analysis, and Andrew's group for making that AMR analysis possible, and Jeff and my group, who, together with these
great researchers at SFU, were able to make this public version of IRIDA that you'll be using. Oh, and then I just really want to thank this great group of people who are taking a lot of their busy time to do this workshop. I'm just parachuting in for this one keynote, but these guys are doing so much work, so please join me in thanking them too for all they've done. So thank you very much.
Any comments? I really only want to take about five minutes, but does anybody have any comments, in particular on data sharing challenges they have, maybe to start? Or do you have questions? Yeah.
Yeah, I'd say jurisdiction, since they don't want to share because it's a different organization, like for example the federal government and provincial agencies, and that's a really big barrier, I think.
Right, yeah. So can everybody hear what she's saying about these barriers? Okay, because I encourage you to speak up a bit. But does anybody have any other kinds of barriers that they're facing? Yeah.
Yeah, no, it's interesting you bring that up. So just to emphasize, there are a lot of jurisdictional issues, and issues around people who may not totally understand the data having some fears, which might be unfounded, about things happening with the data. I encourage you to point people to what's happening with GenomeTrakr. With GenomeTrakr they're sequencing, you know, 5000 isolates a month and they're just sticking it all up. Yes, they limit the amount of other data associated; they just have, is it food, is it foodborne, is it a human isolate, and generalized localization, in these regions, for example, of the U.S.,
and there is other, richer data associated with it, but it is very generalized. But nobody's been sued, nothing has happened, you know. That said, I do appreciate the issues, and I think they are important ones: you can't just go out and put it out publicly that a bunch of isolates are associated with Maple Leaf Foods, for example, and implicate a particular company, when it turns out that's really not the source of something. This is why ontologies can play an important role. If you put things into ontologies, then you can annotate information at the more detailed level but only release the higher-level information. You might not want to release specifically what type of lettuce it was, because what if that type of lettuce is only produced in one part of the country? Or you might have referred to a certain abattoir, and there's only one abattoir for that particular food product in Quebec or something. So you can basically change the level of granularity of the information you're releasing, right? Yeah.
I was just going to comment, speaking of ontologies, that it would be nice to have almost a permission ontology. There are so many different analyses done, and a lot of times you're worried about stepping on people's toes even though they're not doing that analysis. So it may be nice to have something standardized, almost like a permission ontology, where you can do this, you can publish up to this extent if you do this type of analysis but not that.
Yeah, that's a really interesting point, because we are challenged with that. Unfortunately we still have this currency. So he's talking about this idea of maybe a permission ontology, and the fact that somebody may want to publish some data and do a particular analysis, but they don't mind if people do other analyses. Generally the philosophy has
been that once it's open, it's open, but I can appreciate that. I think what it comes back to is that we have this problem right now in science with this currency: people have to get publications, and they have to get them for tenure and promotions, and for getting grants and more money. So this is an issue that I think it would be nice to address. But the tendency right now, I'm on the board for Genome Canada, and we've been really pushing for more open data, keeping in mind that certain types of data have notable privacy issues, and certainly there are perfectly justified reasons for delays on some data. But generally the idea is to get it out, similar to how GenBank transformed bioinformatics. I mean, if GenBank hadn't been made and we didn't get all this sequence data, think of all the things we couldn't have done, right?
Anyways, I don't want to take up more of your time. I'd certainly encourage you, if you have any questions, feel free. We've got this dinner at the, what's it called again, the Queen and Beaver, kind of love that name. It's actually just down on Elm Street, near the Chelsea Hotel, across the street or whatever. It's pretty nice; it's got decent British food. So I encourage you to do that. Thanks, and if you have any questions you can always ask me too. Okay, thanks very much.
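The precision versus recall trade-off raised in the summary can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the gene identifiers are made up, `precision_recall` is a hypothetical helper name, and none of it is taken from RGI, IslandViewer or any other tool mentioned in the talk.

```python
def precision_recall(predicted, actual):
    """Return (precision, recall) for two collections of gene identifiers."""
    predicted, actual = set(predicted), set(actual)
    true_positives = len(predicted & actual)
    # Precision: of the calls that were made, what fraction are correct?
    precision = true_positives / len(predicted) if predicted else 0.0
    # Recall: of the genes that should be found, what fraction were called?
    recall = true_positives / len(actual) if actual else 0.0
    return precision, recall


# Hypothetical truth set: five genes that really confer resistance.
actual = {"geneA", "geneB", "geneC", "geneD", "geneE"}

# A cautious predictor makes few calls, all of them correct.
cautious = {"geneA", "geneB"}

# A permissive predictor finds everything but adds three wrong calls.
permissive = actual | {"geneX", "geneY", "geneZ"}

cautious_scores = precision_recall(cautious, actual)
permissive_scores = precision_recall(permissive, actual)
print("cautious:", cautious_scores)
print("permissive:", permissive_scores)
```

In this toy case the cautious predictor is never wrong but misses three of the five genes, while the permissive one misses nothing but makes three incorrect calls, which is the choice the summary asks you to make deliberately when picking a method.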