Welcome to the SIB in silico talks series. I am Attis and I currently work at the bioinformatics platform of the University of Fribourg. Today, I am going to present our online resource, which we named TASmania. In this presentation, you will hear about TASmania, a web interface that compiles more than 2 million putative toxin-antitoxin (TA) loci from more than 41,000 bacterial genome assemblies. It is a web interface that allows the discovery of non-canonical TAs and supports your experimental design by defining the TA network in your favorite genome. Let's start with a general introduction to TAs. Simply put, a toxin-antitoxin system is made of two short adjacent genes: a toxin, which, as its name suggests, is toxic to the bacterium when expressed, and an antitoxin, which acts as an antidote against the toxin. These two genes are located within the same operon, usually with the antitoxin (A) upstream of the toxin (T), and sometimes the other way around. In some specific TA systems, the antitoxin is encoded in antisense orientation relative to the toxin. TAs are described as key players in various bacterial processes, such as plasmid maintenance. This is a mechanism in which daughter cells that do not inherit the plasmid bearing the antitoxin die because of the toxin's effects. This is how TA systems were originally discovered, in the early 1980s, and since then they have been used for their potential in biotechnology. TAs also play a major role in cell death mechanisms that abort phage infection, thus protecting the whole population from the spread of the virus. TAs are also said to be involved in bacterial persistence, which corresponds to the metabolic shutdown of a small subpopulation. TAs can target different biological processes, such as different stages of translation, which is the most studied TA target. For example, MazF is a toxin that cleaves messenger RNAs, whether or not they are in a ribosomal context.
Some other toxins, like VapC, target tRNAs and ribosomal RNAs. Other TAs target DNA replication, the cell membrane, or the cytoskeleton. Targeting these key machineries leads to a metabolic shutdown or even cell death. TAs are classified into different types based on the molecule type of the toxin and antitoxin and on how the two interact. The most studied group is type II, in which the antitoxin protein binds directly to the toxin protein and inhibits its action. Toxin and antitoxin can also interact at the RNA level. For instance, in type I, the antitoxin is a small non-coding RNA that binds to the messenger RNA of the toxin. I can already tell you that TASmania covers any TA type that has at least one protein-coding locus. Finally, a key concept in TA regulation is the difference in stability or stoichiometry between the toxin and antitoxin molecules. Now, why TASmania? The target audience is wet-lab microbiologists. It is based on a very simple web interface that I will show you in a bit. It contains a very large list of assemblies, retrieved from Ensembl Bacteria, at a scale that has never been proposed so far. We chose to make TASmania a discovery-oriented tool, which comes at the cost of a certain number of false positives. The TAs proposed here could be of any type, as long as there is a protein-coding cognate. Thanks to a relaxed model and an objective annotation, TASmania provides deeper insight into TA cluster combinations. Finally, one can use TASmania to characterize the TA landscape of a given genome when designing, for instance, TA mutants and deletions. I now move to the description of the model and pipeline that we used to build TASmania. The main characteristic of TASmania is its pseudo-operon TA model. If you remember, I told you at the beginning that the canonical model defines a TA as an operon of two short genes.
We instead designed a model that makes no assumption about the organization of the operon hosting the TA. For example, a TA could be located within a pseudo-polycistronic operon structure such as XATXX, with X being a gene not annotated as TA. Second important point: there is no assumption about the length of the TA genes either. What you should be aware of when using TASmania is that it is based on Ensembl Bacteria, a repository of annotated assemblies from different sources. The genomes in the database belong mainly to four phyla: Proteobacteria, Firmicutes, Actinobacteria, and Bacteroidetes. This might have an impact on the TA families that can be discovered. In this slide, I show you how we built TASmania. We retrieve the UniProtKB hits corresponding to the keywords toxin and antitoxin, and we extract their InterPro identifiers that match a TA description. We use these toxin-antitoxin InterPro identifiers to collect all the matching genes from the assemblies in a local Ensembl Bacteria database. At this stage, we obtain about 100,000 unique toxin or antitoxin protein sequences. Next, we cluster these unique protein sequences with MMseqs2. For each cluster, we build a multiple sequence alignment and then an HMM profile with HMMER3. We also use PRC and Cytoscape to cluster these HMM profiles in order to highlight their relationships. Finally, we use these HMM profiles to scan the Ensembl Bacteria proteomes with HMMER3, and thus identify more putative TA loci. Thanks to a pseudo-operon annotation, we can also include the neighboring X genes on top of the putative TA candidates. In summary, TASmania contains the TA loci based on the starting InterPro identifiers, the new TA hits inferred by HMMER3, and the neighboring X genes found by guilt-by-association. Now let's have a look at the TASmania web interface itself. Here is a snapshot of the Shiny web interface of TASmania, which is hosted under the link shown here.
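The pseudo-operon idea can be illustrated with a minimal Python sketch. It assumes genes from a single contig with start/end coordinates, a strand, and a T/A/X label; the `Gene` class, the function names and the 100 bp `max_gap` cutoff are hypothetical and do not describe TASmania's actual operon-calling rule. Co-directional genes separated by at most `max_gap` bases are merged into one pseudo-operon, and every pseudo-operon containing at least one T or A gene is kept, with no constraint on gene order, operon size or gene length.

```python
from dataclasses import dataclass


@dataclass
class Gene:
    gene_id: str
    start: int   # start coordinate on the contig
    end: int     # end coordinate (end >= start)
    strand: str  # "+" or "-"
    label: str   # "T" (toxin), "A" (antitoxin) or "X" (not annotated as TA)


def pseudo_operons(genes, max_gap=100):
    """Group co-directional, closely spaced genes from one contig.

    max_gap is a hypothetical intergenic-distance cutoff in bp, not a
    documented TASmania parameter. Overlapping co-directional genes give
    a negative gap and are therefore merged.
    """
    groups = []
    for gene in sorted(genes, key=lambda g: g.start):
        if groups:
            prev = groups[-1][-1]
            if gene.strand == prev.strand and gene.start - prev.end <= max_gap:
                groups[-1].append(gene)
                continue
        groups.append([gene])
    return groups


def pattern(group):
    """Label string in transcription order, e.g. 'XATXX'."""
    labels = [g.label for g in group]
    if group[0].strand == "-":
        labels.reverse()  # minus-strand operons are read from high to low coordinates
    return "".join(labels)


def ta_patterns(genes, max_gap=100):
    """Patterns of the pseudo-operons that hold at least one T or A gene."""
    return [
        pattern(group)
        for group in pseudo_operons(genes, max_gap)
        if any(g.label in ("T", "A") for g in group)
    ]


genes = [
    Gene("g1", 100, 400, "+", "X"),
    Gene("g2", 450, 700, "+", "A"),
    Gene("g3", 690, 1000, "+", "T"),   # overlaps g2 by a few bases
    Gene("g4", 1050, 1500, "+", "X"),
    Gene("g5", 1560, 1800, "+", "X"),
    Gene("g6", 3000, 3300, "-", "X"),  # far away: separate, TA-free group
]
print(ta_patterns(genes))  # ['XATXX']
```

Unlike a strict two-gene rule, which would only accept an isolated A-T pair, this relaxed grouping also reports the surrounding X genes as part of the locus.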
As you can see, it is a very intuitive resource. You select the first letter of your assembly's name, and you can then choose it directly from a drop-down menu. If your favorite species is not visible in the drop-down list, just start typing its name in this field. You don't need to write the whole species name: typing a keyword, for instance H37Rv, is enough to restrict the drop-down list to the assemblies matching this string. The full name of the selected species then appears at the top of the pane. You can apply an E-value filter here and obtain the corresponding output table. The full dataset is always accessible under the tab called All. You can download either the full dataset or the E-value-filtered one by clicking your choice here. Finally, a keyword search field here lets you focus directly on the rows of interest. For example, you could type VapC and see all the rows containing the word VapC. Don't hesitate to drop me an email if you have any kind of issue with your favorite model organism. You can also give me feedback about extra functionalities that you would like to see added to this website. Because TASmania compiles a very large list of assemblies and applies a relaxed TA model, as I showed you earlier, it gives users the opportunity to work on uncharacterized TA systems. Here are a few such examples. There is, for example, a putative AbiEii-like toxin, which goes beyond the usual published upper limit of 300 or 500 residues. Here you can see an orphan toxin, M-K. As you can imagine, this one really challenges the canonical two-gene model of TAs. Notice also that the examples shown here are specific to TASmania. TASmania outputs not only the HMM hits but also their neighboring X genes, as in X-T or T-X operons. You can apply a guilt-by-association strategy to infer putative new TA families.
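The guilt-by-association strategy can be sketched on a downloaded output table, assuming it has been loaded as a list of rows. The column names (`operon_id`, `role`, `family`, `evalue`), the 1e-5 E-value cutoff and the `DUF_*` family labels are hypothetical placeholders, not TASmania's actual output format. The sketch keeps the pseudo-operons with a confident hit to a chosen toxin family, mirroring the E-value filter of the web interface, and tallies the families of the X genes found next to those hits.

```python
from collections import Counter, defaultdict


def guilt_by_association(rows, toxin_family, max_evalue=1e-5):
    """Tally X-gene families that share a pseudo-operon with a confident toxin hit.

    rows: dicts with hypothetical keys 'operon_id', 'role' ('T', 'A' or 'X'),
    'family' and 'evalue' (None for X genes, which have no HMM hit).
    """
    by_operon = defaultdict(list)
    for row in rows:
        by_operon[row["operon_id"]].append(row)

    counts = Counter()
    for members in by_operon.values():
        confident_toxin = any(
            m["role"] == "T"
            and m["family"] == toxin_family
            and m["evalue"] is not None
            and m["evalue"] <= max_evalue  # same idea as the web E-value filter
            for m in members
        )
        if confident_toxin:
            counts.update(m["family"] for m in members if m["role"] == "X")
    return counts


rows = [
    {"operon_id": "op1", "role": "T", "family": "VapC", "evalue": 1e-20},
    {"operon_id": "op1", "role": "X", "family": "DUF_a", "evalue": None},
    {"operon_id": "op2", "role": "X", "family": "DUF_a", "evalue": None},
    {"operon_id": "op2", "role": "T", "family": "VapC", "evalue": 1e-12},
    {"operon_id": "op2", "role": "X", "family": "DUF_b", "evalue": None},
    {"operon_id": "op3", "role": "T", "family": "VapC", "evalue": 0.5},  # weak hit, ignored
    {"operon_id": "op3", "role": "X", "family": "DUF_c", "evalue": None},
]
print(guilt_by_association(rows, "VapC").most_common())
# [('DUF_a', 2), ('DUF_b', 1)]
```

An X family that recurs next to many independent toxin hits, like `DUF_a` here, is a natural candidate for a new cognate family; a single co-occurrence is much weaker evidence.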
By doing so, we ourselves identified some promising clusters of putative antitoxins, like this one here, which corresponds to the V-HUGS-like family and family codes. Basically, the message is this: don't overlook the X genes given in the TASmania output tables. If you get a reliable toxin hit, dig deeper into the neighboring X genes; you might find some interesting results. I would like to finish with the concept of cluster modularity. You can see here that some clusters present a higher degree of connectivity, like this PIN group of clusters, while others seem more restricted, like this CbtA one. I would also like to insist on the fact that in TASmania, the clusters are objectively annotated, independently of the family of their toxin or antitoxin cognates. Because of that, the pairings go beyond the canonical combinations shown here in black dotted lines. For instance, VapB antitoxins have as classical cognates the VapC toxins, which correspond to this PIN group, but also other types of toxins, like this PemK. Let me summarize this presentation of TASmania by highlighting the following points. TASmania stands out by its size, which has never been proposed so far. It covers most of the known toxins and antitoxins. It goes beyond the canonical TAs, as the few examples I showed you illustrate. Basically, it is ready to be used as a discovery tool to study new families or TA networks. I would like to thank our group leader, Dr. Laurent Falquet, an expert in bacterial genomics, who secured the Sinergia project from the SNSF that has been funding me; Ensembl Bacteria, for providing the core database; and the HPC cluster in Bern, which hosts our Shiny web server. Finally, please stay tuned: an update is coming soon, based on release 44 of Ensembl Bacteria. It will also include a new functionality called TASer, which will allow you to upload your own genomic sequence and scan it.
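The cluster pairings and their connectivity can be sketched as a small network, assuming each pseudo-operon has been reduced to a list of (role, cluster) pairs. The cluster labels below reuse family names purely for readability (in TASmania the clusters are annotated independently of their cognates), and the function names are hypothetical. Each antitoxin-toxin cluster pair found in the same pseudo-operon becomes an edge weighted by the number of operons, and a cluster's degree, its number of distinct partners, gives a rough measure of how connected or restricted it is.

```python
from collections import defaultdict
from itertools import product


def pairing_network(operons):
    """Weighted antitoxin-toxin cluster pairs.

    operons: one list of (role, cluster) tuples per pseudo-operon, with role
    'T', 'A' or 'X'. Returns {(antitoxin_cluster, toxin_cluster): n_operons}.
    """
    edges = defaultdict(int)
    for members in operons:
        antitoxins = sorted({c for role, c in members if role == "A"})
        toxins = sorted({c for role, c in members if role == "T"})
        for pair in product(antitoxins, toxins):
            edges[pair] += 1  # each operon counts once per cluster pair
    return dict(edges)


def degree(edges):
    """Number of distinct partner clusters for every cluster in the network."""
    partners = defaultdict(set)
    for antitoxin, toxin in edges:
        partners[antitoxin].add(toxin)
        partners[toxin].add(antitoxin)
    return {cluster: len(p) for cluster, p in partners.items()}


operons = [
    [("A", "VapB"), ("T", "VapC")],
    [("A", "VapB"), ("T", "VapC")],
    [("X", "x1"), ("A", "VapB"), ("T", "PemK")],  # non-canonical pairing
    [("A", "CbeA"), ("T", "CbtA")],
]
edges = pairing_network(operons)
print(edges)          # {('VapB', 'VapC'): 2, ('VapB', 'PemK'): 1, ('CbeA', 'CbtA'): 1}
print(degree(edges))  # {'VapB': 2, 'VapC': 1, 'PemK': 1, 'CbeA': 1, 'CbtA': 1}
```

In this toy input, the VapB cluster pairs with two toxin clusters while the CbeA/CbtA pair stays isolated; the resulting edge list could also be exported to a graph tool such as Cytoscape for visualization.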