It's my pleasure to introduce you to Humberto Debat, who's going to take us through his talk. I'll just hand it over to you. We'll do 15 minutes, and Humberto has said he's leaving some time for questions, so we'll do that when we get there. This talk is in English, right?
Yeah. Correct.
Go for it.
Okay, thank you very much for the invitation. I'm so happy to be here. These are going to be my slides. It's not very easy to connect these notes, but I'll try to give you the essence of virus discovery using publicly available data. The first slides are going to be a little about viruses: what they are, how many of them there are, and how we classify them. All the other slides are going to be examples of that, so if the presentation gets lengthy, I can cut at any time as my time runs down, because they're just examples.
So the first one is "Are viruses alive?", because I wanted to talk a little bit about what viruses are. The question is whether they are really living organisms. They have a lot of the functions of organisms. They replicate, and they share that with cellular living organisms, but their structure is so simple, and yet fantastic. From a biochemistry point of view, they are just nucleic acids surrounded by proteins and lipids, and they can do all their functions with only that. How do they do it? By relying on their host for essentially all of their functions. They take the host machinery, they use it to make more copies of themselves, and they expand, diversify, evolve, et cetera. So they have a lot of the properties of living things, the first one being the capacity to evolve, but they cannot live by themselves, so they are obligate parasites.
So how do we recognize them? Well, they come in different shapes and forms. For instance, you can see here a few that are of importance for humans.
For instance, the big one on the left here is the Ebola virus, but you can see some with different structures, like this one, which is rabies. And here there's a small one that you have to take care about while you are here in Buenos Aires, because we're having an outbreak of dengue virus. You should use mosquito repellent when you go outside, because we have a lot of cases of that, a lot more than of SARS. This is the typical structure of the coronavirus that caused the pandemic, which has had a very important impact on our lives.
So how do we know that these viruses have these forms? Well, traditionally, we used something called electron microscopy to see them, in 2D representations. Here you have one of the first photos of a coronavirus. It's from the 60s, an electron micrograph made by June Almeida, a Scottish lab worker who made one of the first virus discoveries. So now we talk about coronaviruses thanks to this scientist, who discovered this structure. That was one way to classify them, but as you can see, SARS-CoV-2 has the same structure as the SARS virus that caused the epidemic in 2002. So there can be a lot of difference between viruses even when they share a structure, and there are other ways to classify them.
We're going to speak about classifying, but first, how many viruses are there? This is the 75 million blue whales analogy. That would be the biomass of all the estimated virus particles around the world at any moment. And in terms of size, even though they are so small, if you put all the virus particles that we have on Earth today end to end, they would span around 65 million galaxies. So even though they are small, they are the most prevalent biological entity in the world, one that has been ignored a lot of the time, and you're going to see more about that.
And here, for instance, you have the blue whale, but if you go to Chubut in Argentina to see whales, you're going to see this one, which is the southern right whale. So again, these are organisms with a similar structure that are biologically very different. You need better ways to classify them, ways that have to do with a lot more than morphology. That is where genomics comes in: generating data to classify them based on the primary information, which is in the nucleic acids.
But first of all, I said there are a lot of viral particles. What about the species, when we classify them? Well, for instance, in the world there are a little over 5,000 species of mammals, and every year a new species is identified, or maybe two, or a handful at most. So that number does not grow very rapidly. In the case of viruses, as of today, only 11,000 species have been formally classified and recognized by an organization called the International Committee on Taxonomy of Viruses, which is the one that says this is a species and this is another species. As I said, viruses may not be alive, but they are biological entities, so we can classify them as biological replicators at least.
But this number doesn't represent at all the diversity of viruses there is. There are some ballpark estimates, and you'll see how wide the range is because of how very little we know. (Are we at 10 minutes already, or do I have 10 left? Oh, that's great.) So you'll see this interval to get a glimpse of what we know and what we don't know. The estimates talk about 10 to the 7th to 10 to the 9th distinct virus species. I'm not good at numbers, but I think that is 10 million and 1 billion. And I told you that only about 11,000 are currently classified. So that's one of the great perks of being a virus discovery scientist, because every day we're looking at things that nobody knew existed.
So every day is like a new discovery, something new, because almost everything is unknown. Even though in recent years there has been a lot of effort to understand the diversity of viruses, the classification lags well behind, even though we now really know the importance of viruses and their classification.
So now we're going to start on a personal view of virus discovery. Here, for instance, you're going to learn about mate in Argentina. It's our most iconic drink. It comes from a plant that is native to northeast Argentina, the south of Brazil, and Paraguay, and it is cultivated there, in its place of origin. Argentina generates 90% of its production, and almost 95% of Argentinians drink mate every day. So it's like the way you drink coffee, but mate is even more prevalent here.
About 10 years ago, we were working on trying to understand the genes of this crop, because we didn't know a thing about it. So we generated a catalogue of the genes of this plant. I'm going to tell you a little later how we read the genes, but basically we extract the nucleic acids of the organism, try to read all the nucleic acids that are there, and then try to organize them. After doing that, we generated a catalogue, and we published that there were around 30,000 genes doing different things for this plant.
And there I had one idea which is so obvious today, but was not that obvious 10 years ago. If we are reading all these nucleic acids of the genes of the plant, maybe I can also read the nucleic acid of some virus of this plant. So we went there and described the first viruses associated with this plant. Now we know about five or six of them, which we have classified and which could have an impact. During that year, I was just looking around and I saw this paper about the rabbit genome and its analysis.
So I tried to replicate what I had done with the data from yerba mate, and what I found was interesting. Even the scientists who described this genome didn't know that in one of the samples, from the liver of one of the rabbits they used to generate the genome, there was a hepatitis E virus. So while the scientists were trying to explain what the rabbit genome was, there was also a virus there. From that point on, we started a kind of cycle of virus discovery based on studies that produce genetic data of organisms and didn't look for viruses. It's not very difficult; you just have to look for them to find them. And a few years later, it was found that rabbits not only can be infected by this virus but can also shed it and could be a vector of this virus. So it has relevance.
So again, how do we do this? We extract the nucleic acids, we use them to generate libraries, and then we feed those into this large equipment that can read all that data. Then we process it on a computer to find specific signatures that have to do with genes that do things. In the case of viruses, there are also signatures that can tell us a little about what they do and what they don't do. After that, there are different pipelines to process this. They are basically trying to align or overlap the reads, because these are short reads of nucleic acid, and if you overlap them, you get longer sequences, and then you can understand what they are trying to say: "I'm going to generate this protein, or this structural protein, or this functional protein," et cetera.
So what about the availability of that kind of data, of runs and libraries of sequencing data? Well, it replicates almost everything in science. That is, in the global north there is a lot more publicly available data from organisms, so the map does not look like the geography.
The map represents a biased view of what has been sequenced in the world. The current data is around 30 petabases, that is, 30 million gigabases. Maybe it's not a lot for you, but it's a very large amount of nucleic acid data that is publicly available to analyze.
So how do you analyze it? Well, there is important infrastructure, but down below you can see a representation of my infrastructure for virus discovery. I try to use resources that are available in the cloud. There are wonderful resources, and people who have worked a lot to provide these tools and democratize the analysis of large data, even for scientists who cannot buy and maintain all that large equipment. One of them is a platform called UseGalaxy, which can analyze a very big amount of data without you paying to use it. It has been used a lot for SARS-CoV-2, virus discovery, et cetera.
I told you there were these big, huge machines that can read the nucleic acid, but fairly recently a very small one was developed. This one is called the MinION. It's a technology from the UK, it can do almost the same things that the big machines can do, and it also does a lot to democratize the sequencing infrastructure. That equipment costs around a thousand dollars and it's ready to work, so even we could buy that kind of thing and start sequencing our own samples. How this platform was developed is fantastic; maybe in the questions I can provide some more detail. They are also developing some smaller ones. Here in the picture you can see me with a pipette, which is something that I don't usually do, but we did some of that during the pandemic. This equipment has even been used to sequence things on the International Space Station, so it is democratizing not only in terms of price but also in portability: you can go out and sequence whatever you want to understand.
For instance, these guys took it to the Amazon forest, saw a frog, took a sample of the frog, and identified which frog it was based on the nucleic acid. Related to this, in a study by a colleague from Georgia University on this American green tree frog, where they were studying sex-related genes, we found a virus, and this virus was related to rabies. This is one of our examples. It was the first report of a rabies-like virus in an amphibian. We were a little scared about that, so the people in Georgia started working in a different manner with these frogs. They used to be something very cute and safe, and now they could be dangerous. We're still working on that. Later, we did something very similar with a hepatitis virus in a frog from Tibet, published a few years ago.
Now we're going to get to some other examples. You probably know that measles is one of the most contagious viruses there is. It's a morbillivirus that infects humans. If you are infected and there are 10 susceptible people around you, 9 of them will get infected if they are not vaccinated. It's incredible. With COVID, for instance, that number is around 3, so this is incredibly more contagious. Of course, since the development of the vaccine, deaths have dropped a lot all around the world. But in the sub-Saharan region there are still 140,000 deaths each year from measles, mostly children under five, and that's because of the lack of access to the vaccine.
So this last example, though I could provide a few more, is about mice in the south of Argentina and Chile, which we found carry some viruses that are related to the measles virus. We're working on that, and also with bats from Panama, with flies, and some other examples, and with mosquitoes, because we hate mosquitoes.
Well, thank you for your time. On the last slide, I have my contact details and what we do here.
Thanks so much, Humberto.
That's amazing. Okay, you can take a breath now. Has anyone got any questions for Humberto? I can start you off. I was going to ask you how data analysis techniques are changing what you're doing, and then you started talking about the SmidgION and the MinION. So maybe I'll ask you, what's next? How is this going to change?
Well, it's a great question, because I don't know the answer. But for instance, just today a new pipeline was published. When you try to find viruses, how do you look for them? You look for something that is similar to the ones you already know. You know that rabies is a virus, so you look for something that is similar to it, and you have to use similarity thresholds, because if your threshold is too low, you are going to get false positives: things that are cellular or something else, but are not viruses. So there are a lot of advances in that area in particular, which appears to be simple but is not at all. Today, of course, people are using artificial intelligence algorithms to try to provide better pipelines for virus discovery.
What I think is that even though there are large efforts to describe the diversity of viruses all around us, there is a need for a lot of work by people not only to say "here are 10,000 new viruses", but to look at them specifically, at their potential to emerge, their pandemic characteristics, or the possibility that they could cause disease. I didn't talk about this, but I work in the agricultural sector, so we work a lot with viruses that affect crops, because as I told you, viruses affect all organisms. And you can imagine that in plants, viruses have been studied even less. So yes, I think what is going to be needed is a lot of effort, and maybe better platforms, to advance in this.
Thank you. Any other questions?
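The similarity-threshold point in the answer above can be illustrated with a toy sketch. This is not the speaker's pipeline: real virus discovery tools align sequences with dedicated software, while this example assumes the sequences are already aligned to equal length, and the names `percent_identity`, `viral_candidates`, `REFERENCE`, and `QUERIES`, as well as the sequences themselves, are hypothetical.

```python
def percent_identity(a: str, b: str) -> float:
    """Fraction of matching positions between two equal-length, pre-aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be pre-aligned to the same length")
    if not a:
        return 0.0
    matches = sum(x == y for x, y in zip(a, b))
    return matches / len(a)


def viral_candidates(queries: dict[str, str], reference: str, threshold: float) -> list[str]:
    """Return the names of queries whose identity to the reference meets the threshold."""
    return [
        name
        for name, seq in queries.items()
        if percent_identity(seq, reference) >= threshold
    ]


# Hypothetical toy data: one known viral protein fragment and three query fragments.
REFERENCE = "MKTLLVAGSW"
QUERIES = {
    "close_relative": "MKTLLVAGSF",    # 9 of 10 positions match
    "distant_relative": "MKSLIVAGTF",  # 6 of 10 positions match
    "host_sequence": "QATQEPRGDW",     # 3 of 10 positions match
}

if __name__ == "__main__":
    for threshold in (0.8, 0.5, 0.2):
        print(threshold, viral_candidates(QUERIES, REFERENCE, threshold))
```

With a strict threshold only the close relative is kept, so divergent viruses are missed. With a very permissive one the unrelated host sequence is reported as well, which is the false-positive problem described in the answer.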
It could be in Spanish, pues en español.
I've got a question. What are the trade-offs between a small nano device and the supercomputer? What are the trade-offs in terms of accuracy? It obviously brings more portability, like you can take it to a forest. But what are the trade-offs?
Yeah, of course. Those big machines have better quality in their readings, but you can compensate for that by generating more readings. As I said before, I'm very bad with numbers, but for example, on the large equipment a read has 99% accuracy, which is very common, or 99.9%. The small one started with an accuracy of about 90%. But if you read something like 20 times in overlapping ways, roughly nine out of every ten of those reads are going to say the correct thing at any given position, so it's easy to correct the errors. So you have to generate more data, which is sometimes more expensive, but they are working on the chemistry and they are reaching better quality. All the sequencing infrastructure is moving more towards this portable equipment. That same company also makes something like an array of a lot of MinIONs together, which can be similar to the big machines.
Okay, we've only got 30 seconds left. Very quick, one of you two had a question.
I'm curious where you get your samples. You have so many diverse sequencing projects. Where do the yerba mate and the frogs and all the different things come from?
Yeah, that's a great question, because I didn't explain it at all. Some of them we just sequenced ourselves, for instance the yerba mate. We have partners, and they sequenced the first samples. But there is also a massive amount of libraries publicly available to anyone. One of the most important resources, at NCBI, is called the Sequence Read Archive, and we as scientists have to deposit our reads there.
But again, when we have a specific objective, like our genes of interest, a lot of that data goes unprocessed, and so it is available to look for these viruses.
I hear "open data". Is that what you just said?
Yeah, thank you so much.
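The accuracy trade-off raised in the Q&A, where lower-accuracy reads are compensated for by reading the same region many times, can be sketched as a per-position majority vote. This is only an illustration under simplifying assumptions: the reads are taken to be already aligned and of equal length, and `consensus`, `TRUE_SEQUENCE`, and `NOISY_READS` are hypothetical names and data, not the method used by any real sequencing platform.

```python
from collections import Counter


def consensus(reads: list[str]) -> str:
    """Build a consensus by majority vote at each position of pre-aligned reads.

    Ties are resolved in favor of the base seen first in that column.
    """
    if not reads:
        raise ValueError("at least one read is required")
    length = len(reads[0])
    if any(len(read) != length for read in reads):
        raise ValueError("reads must be pre-aligned to the same length")
    return "".join(
        Counter(column).most_common(1)[0][0]
        for column in zip(*reads)
    )


# Hypothetical toy data: five reads of the same 10-base region,
# four of them carrying a single error at a different position.
TRUE_SEQUENCE = "ACGTACGTAC"
NOISY_READS = [
    "ACGTACGTAC",  # no error
    "ACCTACGTAC",  # error at position 2
    "ACGTACGAAC",  # error at position 7
    "TCGTACGTAC",  # error at position 0
    "ACGTAGGTAC",  # error at position 5
]

if __name__ == "__main__":
    print(consensus(NOISY_READS) == TRUE_SEQUENCE)
```

Because the errors fall at different positions in different reads, each column still has a clear majority for the correct base. That is why generating more data can make up for lower per-read accuracy, at the cost of more sequencing.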