Hi everybody. I'd like to welcome you to the Computational Biology Seminar series. Today we have the pleasure of having Alan Bridge, who is a biologist by training with research experience in molecular and cell biology. He now works as a scientific database curator at the Swiss Institute of Bioinformatics in Geneva. Briefly, Alan joined the Swiss-Prot group at SIB in Geneva in 2004, where he worked as a biocurator in the quality assurance department. In 2008, he became jointly responsible for the integration of all manually curated UniProtKB/Swiss-Prot entries. And in 2009, he was made head of the Department of Automation and Enhancements, which groups all the automatic annotation programs of the Swiss-Prot group. So today, he's going to talk about one project, SwissLipids, and about the vital role of biocuration.
So I'd like to thank you all for coming. I'll tell you today about a fairly new project that we have running in the Swiss-Prot group, which is called SwissLipids. I don't just want to describe this resource to you; I'd also like to talk about how we built it. So as Diana said, I'm working in the Swiss-Prot group. Our focus, really, is on knowledge representation, in contrast to our sister groups, if you like, here in Lausanne, which are much more, if you like, classic bioinformatics groups, working on things such as infrastructure and development. So the project I'm going to talk about today is really one of the first resources that we developed. So what's biocuration? The problem is very simple, if you like. There is an ever-increasing body of published data about biology in the life sciences; if you look on PubMed, there are more than 20 million papers. The issue with scientific publications is that they are still the main medium of communication, if you like, and this is a form of communication which is, if you like, free text or prose.
So it's very nice for people to read. It provides a nice story about a piece of research. But it's very hard to extract knowledge from. So biocuration is really about getting the knowledge out of the literature and making it rediscoverable, if you like. What biocurators do is synthesize and structure the knowledge that experts put into prose. They tag it with identifiers, for instance for proteins, and with ontologies and controlled vocabularies. And really, the idea is to make published knowledge much easier to discover and reuse. The project that I'm going to talk to you about has this as one of its overriding goals: to make knowledge easier to discover and to reuse. In the context of this project, it's knowledge and data from a SystemsX project called LipidX. So, the biocuration activities that we have at Swiss-Prot: historically, we've been involved in a lot of resource development for the life science community at large. Diana alluded to this resource, which is UniProt. This is a resource of protein sequences and functional annotations. Around UniProt, we also develop a number of complementary or specialized resources, such as HAMAP, and we use these resources to annotate protein sequences within UniProt. Another resource which we developed is a specialized resource called ViralZone, which is a little bit like UniProt, but with a specific twist for viruses. Historically, we really focused on protein sequences. More recently, we started to get more into chemistry and biochemistry. One of the resources I'll talk about today in the context of this project is a resource called Rhea. This is a database of biochemical reactions, and we've actually used this resource, Rhea, to build the resource I'm describing today. Then I'll talk about how we leverage our existing activities and resources and the people involved. So, as I said, it's a collaboration between ourselves and the people in this project.
Within LipidX, there's a variety of different groups and different approaches to studying lipids. Some of these groups do experiments; there are also some groups doing computational analysis of lipids or modeling of lipids, so we have metabolic modeling. These groups study a variety of lipids with a variety of roles. So we have, obviously, membrane formation; this is a classic membrane-forming lipid. I'll come back to this structure quite a lot during this talk. You may not like chemical structures, but I want to point out some of the main features of these kinds of molecules. For forming membranes, we basically have two-domain molecules, with a polar head and a fairly hydrophobic domain. Lipids also have roles as an energy source. Obviously, there's great interest in studying lipids in this role; the effects are apparent when you walk down the street. And lipids also have roles in signaling. This particular molecule here actually controls aging, so the dosage of this particular molecule in this organism, which is C. elegans, controls it. So the groups study all these different kinds of lipids in a variety of species. What I want to do in this talk is to focus on how lipids are studied: the methodologies that people use to study lipids and how they annotate their data. I want to focus on some of the difficulties in annotating the data, and also the difficulties in integrating this kind of data with, for instance, metabolic models and pathways. To explain how lipids are studied, I use this kind of cartoon representation of the inner workings of this kind of machine, a mass spectrometer. The idea is that we'd like to study a biological sample and describe it in terms of its lipids. We inject the sample into the machine. We ionize it. And then the machine contains three different cells which allow us to either scan and detect the masses of the ions or fragment them.
So this gives us a kind of characteristic spectrum where we have what's called a precursor ion, which is the intact lipid, and we have a variety of product ions. Based on what the spectrum looks like, well, there's a lot of inference that goes on, and there are a lot of assumptions in the way the data is annotated. So this gives you a kind of high-level view of the lipidome within an organism in general. But people like this because you can study dozens, if not hundreds, of lipids in a single biological sample using these kinds of technologies. Most lipidologists estimate that there are probably tens, if not hundreds, of thousands of lipids in a eukaryote. Using these kinds of high-level surveys is now very popular within SystemsX, but also within a lot of other projects. We started to work with these people in 2013. So what I'll do is first show you how the data is annotated and the problems with the annotations. Then I'll show you how we tried to address those. So imagine a biological sample that has three different lipid structures in it, which are shown here. Here we have, if you like, the head on this side, and here we have the tails on the other side. You can see that these three molecules all have the same head. That means they're all of the same class, with the same backbone. So they're all PCs, but they differ in their fatty acids. This notation shows you what the fatty acids are. The first of these figures is the number of carbons in the fatty acid. The second of these figures tells you the number of double bonds; I'll tell you why that's important in a minute. And when you have this kind of notation, it tells you where the double bond is and how it's oriented. This is kind of interesting and very important to the people who study these lipids. Different degrees of unsaturation and different bond positions will affect the physical and chemical properties of these lipids, and that will have an impact on the membranes of the cells.
I also told you that lipids have roles in signaling, and these fatty acids themselves can be signaling molecules. So the membrane can also be a source of signaling molecules. This particular lipid here has a 16-carbon fatty acid at position one. The double bond is at position nine, and the Z means that the two groups are pointing the same way. The effect that has on the molecule is that it is kind of kinked, and that affects the membrane which contains this molecule. So this notation is telling you what the fatty acids are, how many double bonds they have, and where they're positioned. You can see these two lipids: they have the same fatty acids, but they're flipped around. This one has different fatty acids, so it has little in common with the other lipids. So let's imagine we have these three lipids in our biological sample. I told you that we can fragment these molecules and look at the fragments. Depending on how these molecules fragment, we can get more or less structural information about the underlying lipids. If we were to fragment off the head, that would tell us that we have a phosphatidylcholine, because we lose this part. And all we'd know really is the sum composition of what's on the left. So we have a very high-level view of these lipids. Okay, and this is the kind of level that LipidX is normally looking at. So your data annotation would look something like this. We would seem to have one lipid in the sample, and we'd lose the information that there are three molecules. If we followed a different experimental protocol, with a different form of ionization or a different form of collision, we might get fragments corresponding to the fatty acids. Then, by a process of elimination or subtraction, if you have a 16:0 and you know the total, you know what the other fatty acid must be, but you don't know how they're really distributed on the backbone. This gives you a little bit more detail.
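The subtraction step described here can be sketched in a few lines of Python. This is only an illustration: the function name `remaining_chain` and the (carbons, double bonds) tuple representation are assumptions made for the example, not part of any LipidX tooling.

```python
def remaining_chain(total, known):
    """Infer the second fatty acid from the sum composition and one observed chain.

    Both arguments are (carbons, double_bonds) tuples, e.g. (34, 1) for PC 34:1.
    """
    carbons = total[0] - known[0]
    double_bonds = total[1] - known[1]
    if carbons <= 0 or double_bonds < 0:
        raise ValueError("observed chain does not fit the sum composition")
    return carbons, double_bonds


# PC 34:1 with an observed 16:0 fragment leaves an 18:1 chain.
other = remaining_chain((34, 1), (16, 0))
print(f"{other[0]}:{other[1]}")  # 18:1
```

Note that the result says nothing about which position on the backbone each chain occupies, which is exactly the limitation described in the talk.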
So now you know that these two different structures correspond to one annotation, and we can now distinguish those two from this one. So we think we have two lipids in our sample. And if we can fragment them in such a way that we can determine the positioning, so that we know the actual position of the fatty acids on the glycerol backbone, then we can get a more detailed annotation, which tells us the composition of the fatty acids and where they actually are. So we know now that we have three different lipids, but we still don't know exactly what the fatty acids are, because we're still missing the information about where the double bonds are. We know these three are now different, but we still don't know precisely what they are. An interesting thing about lipids is that the people who study them often have years of training in biochemistry, and they can look at these kinds of annotations and tell you straight away what the fatty acids are likely to be. So if you say PC 34:1 to somebody in LipidX, they'll say, well, yeah, I know, it's probably 16:0 and 18:1(9Z), because they have the history in biochemistry to tell them that. This is not something that is obvious to a non-specialist, and it's also not something a machine can tell you. There's no database which is really providing possible structures for these kinds of annotations. So what we tried to do is make this a little bit more explicit. To summarize what I just said: if we have a biological sample with three different lipid structures, and three different kinds of fragmentation protocol, these will make the three lipids appear as one, two, or sometimes even three different molecules, but we can't get the full resolution; we can't get the full structural information. For that we'd have to use other technologies like NMR. This kind of notation, of what we call lipid species, or fatty acid scan species, or subspecies, was actually published during this project.
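The collapsing of three structures into one, two, or three annotations can be made concrete with a small sketch. The three lipids below are illustrative assumptions (they are not read off the slides), and the `_` and `/` separators are borrowed from common lipid shorthand rather than taken from the talk.

```python
from collections import namedtuple

# A chain is (carbons, double_bonds, bond_positions); the lipids below are
# illustrative examples, not data from the talk.
Chain = namedtuple("Chain", "carbons double_bonds positions")
Lipid = namedtuple("Lipid", "head sn1 sn2")

def short(chain):
    return f"{chain.carbons}:{chain.double_bonds}"

def species(lipid):
    """Head group plus sum composition, e.g. 'PC 34:1'."""
    c = lipid.sn1.carbons + lipid.sn2.carbons
    d = lipid.sn1.double_bonds + lipid.sn2.double_bonds
    return f"{lipid.head} {c}:{d}"

def fatty_acid_scan(lipid):
    """Chains known, positions unknown: chains are sorted and joined with '_'."""
    chains = sorted([lipid.sn1, lipid.sn2], key=lambda ch: (ch.carbons, ch.double_bonds))
    return f"{lipid.head} " + "_".join(short(ch) for ch in chains)

def subspecies(lipid):
    """Chains and positions known, double bond locations still unknown."""
    return f"{lipid.head} {short(lipid.sn1)}/{short(lipid.sn2)}"

palmitic = Chain(16, 0, "")
oleic = Chain(18, 1, "9Z")
palmitoleic = Chain(16, 1, "9Z")
stearic = Chain(18, 0, "")

sample = [
    Lipid("PC", palmitic, oleic),
    Lipid("PC", oleic, palmitic),
    Lipid("PC", palmitoleic, stearic),
]

# Count how many distinct annotations the same sample yields at each level.
for level in (species, fatty_acid_scan, subspecies):
    labels = sorted({level(lipid) for lipid in sample})
    print(level.__name__, len(labels), labels)
```

With these inputs the sample appears as one lipid at the species level, two at the fatty acid scan level, and three at the subspecies level, while the double bond positions stay hidden at all three levels.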
We have this kind of collapsing of data into a single annotation. As I said, lipid biologists often don't consider this a problem. For them, with PC 34:1, they know that in their biological sample they can figure out what the structure is immediately, but it's not immediately apparent to somebody who's not familiar with the data, and it's also not apparent to a machine. There's also a related problem, which is similar to this kind of hiding of structures under a single annotation: apparently different lipids can share components. As I was saying, if you show PC 34:1 to somebody in LipidX, they'll immediately tell you what the structure is. Also, if you show them these kinds of annotations, they'll probably tell you what the structures are for those as well. But again, there's some kind of hidden information, because if you look at the most common structures for these, you can often find common elements. So if you consider these three lipids, the next three on the list, they have three different classes, three different numbers of carbons, three different numbers of double bonds. In principle, they don't have anything in common, but if you look at the most common structures which would correspond to these annotations, you can see, for instance, that these two share a fatty acid: they share the 16-carbon fatty acid. These two share the 18:1(9Z). These two share a head group. These two share a backbone, and so on. So there are a lot of hidden links in this data. As I said, people doing lipidomics analysis often have this kind of natural reflex of knowing this kind of stuff. But when you look at published data, or you try to map it to models and so on, this isn't explicitly captured. So the issue with the kind of lipidomics that people are doing, and the kind of technologies they're using, is that it provides a data annotation where the notation used requires an expert to interpret it. Figuring out what the links might be between different data points requires that expertise too.
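Once likely structures are written down explicitly, the hidden links described above become a simple set intersection. The sketch below assumes illustrative structure assignments for three annotations; they are made up for the example and are not curated data.

```python
from itertools import combinations

# Most likely structures behind three annotations. These assignments are
# illustrative assumptions, not curated data.
likely_structure = {
    "PC 34:1": {"phosphocholine head", "glycerol backbone", "16:0", "18:1(9Z)"},
    "PE 36:2": {"phosphoethanolamine head", "glycerol backbone", "18:1(9Z)"},
    "SM 34:1": {"phosphocholine head", "sphingosine backbone", "16:0"},
}

def shared_components(structures):
    """Return {(a, b): shared component set} for every pair with overlap."""
    links = {}
    for a, b in combinations(sorted(structures), 2):
        common = structures[a] & structures[b]
        if common:
            links[(a, b)] = common
    return links

for pair, common in shared_components(likely_structure).items():
    print(pair, sorted(common))
```

The annotation strings alone ("PC 34:1", "PE 36:2", "SM 34:1") share nothing, whereas the component sets expose a shared fatty acid, head group, or backbone between pairs.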
And at least when we started this project, we didn't know the chemistry, and we found the whole thing extremely confusing. There's also a problem of mapping this kind of data to knowledge bases and models. One thing about LipidX is that there's a computational element, as there is in other SystemsX projects, and for that this is not really apparent. So what does PC 34:1 have to do with a pathway in a database like UniProt or KEGG? It's not very easy to map. There's also an issue of the comprehensiveness of individual knowledge. The way LipidX works is like many lipidomics platforms: they have a reference list of lipids that they study. These are defined by the researchers within the project. These are what they scan for, and these are the terms in which they define their biological systems, by increases or decreases. But it could be that there are other lipids that they should also be looking at. These could be lipids that share common elements with, or are similar to, the others. So these are issues that are not really specific to LipidX; they apply to any lipidomics project. And these are some of the issues which we tried to address in this project. As I'll show you in the resource that we built, we tried to make this kind of notation transparent and to make explicit the relations between these different kinds of concepts. We wanted the mapping of the data to models and pathways to be complete. And as I'll show you, what we effectively did was to reverse engineer the analytical output from knowledge of pathways. And we wanted the resource to be comprehensive. So rather than relying on individual people's knowledge of a few lipids, what we wanted to do was to use our curation power, if you like, to take all the knowledge of lipid structures from the literature and use that to build a resource which would have all possible lipid structures for lipids of interest to people working in LipidX and elsewhere. So the idea is very simple.
As I said, it's basically a kind of reverse engineering. We use the resources that we're working on already, so we leverage the activities that we already have in place. And we start by curating metabolic pathways. For anything that relates to the lipids of interest to people in LipidX, we go back to the primary literature and we curate all the biochemistry that we can find about those lipids. What that gives us is a catalog of building blocks, like the heads and tails I showed you. And it also gives us a set of rules for how these things are known to combine. This was something like 450 papers in total that we read to build the catalog and the rules. We then enumerate all the possible structures of lipids which might exist. This runs into the tens and hundreds of thousands, as you might imagine; we have a lot of combinations of fragments. And we then rebuild the analytical output of the mass spectrometry from those structures. We organize the lipids in a hierarchical structure which corresponds to the kind of analytics that people do in mass spec. So in effect, we rebuild the list of lipids that they study. We do all that programmatically, and that gives us a framework for describing, curating and exploring lipid data. So the curation begins with Rhea. We start by building all the possible reactions and linking them to lipids. That goes into the SwissLipids database, which also contains the rules, all the fragments, and all the resulting structures in the hierarchy. And then once we've built this, we map it onto the lipids of LipidX. We regenerate that hierarchy, and hopefully we regenerate not only all the lipids that LipidX is studying, but a lot of other lipids which perhaps they should be studying. So to explain how we did this, I need to show you a little bit more detail on Rhea. Rhea is a curated resource of biochemical reactions. It's a resource we've been developing for about seven years now.
It is developed in a collaboration between SIB and EBI. It currently contains around seven and a half thousand reactions, which are curated from the literature, from about 3,400 publications. We originally developed Rhea, actually, as a vocabulary for describing biochemistry in UniProt. Currently, if you look in a UniProt record, you have a textual representation of the reaction, so the biochemistry is defined in words. What we'd like to have is a more restrictive representation, which includes the chemistry. I'll show you a little bit more about Rhea in a moment. Another thing I wanted to say about Rhea is that it's also used in the MetaNetX project. This is a project for model reconstruction. The website is run by Marco Pagni, if you'd like to see it. And Rhea actually provides, if you like, a namespace for biochemical reactions within that project. So by using Rhea to build our lipid database, what we have is a way of linking to MetaNetX. About 20% of the content of Rhea was actually generated specifically for this project, so it was funded by this project. To show you how we represent metabolism in Rhea: each reaction has a unique identifier. Each of the components in a reaction has an explicit structural representation. This can be at varying levels of ambiguity. So, a little bit like we have different levels of detail in mass spectrometry data, we often have different levels of detail in biochemical data as well. Sometimes people will do biochemistry without really knowing the exact molecules they're working on. They might know that they're working on a class of lipids and they're converting one class of lipids to another class of lipids, but they don't know the precise structure. In this particular example, you have a lipid which is a 1-acyl something that becomes a 1,2-diacyl something. What that means is there's a fatty acid attached here. We don't know its precise structure, so it's represented by this notation, R.
So we have placeholders for the fatty acid. We know there's a fatty acid there; we don't know exactly what it is. So we can represent this kind of ambiguity in this way. The structures that we use are taken from an ontology called ChEBI. The people we collaborate with on the Rhea database developed this ontology. ChEBI has something like 40,000 different entries, and it's a manually curated resource of chemicals. About 10% of the structures in there were submitted by us. So we can define metabolism using ChEBI: we describe biochemical reactions. As I said, this is being used in a project to describe reactions in metabolic models. One of the things that makes it especially suitable for that is that all the reactions are balanced. We check the atomic balance of the reactions: the number of atoms on the left is the same as on the right, so there are no gains and no losses. We have links to other resources, for example the enzyme classification of IUBMB down here at the bottom, KEGG, MetaCyc and PubMed. Something I wanted to mention about Rhea is, as I said, that we can describe the biochemistry at different levels of ambiguity. ChEBI, as I said, is an ontology, so there are classes of compounds and there are instances. In the preceding example, we were talking about a particular class of reaction; in this particular reaction, we now specify the precise structures, so we can talk about individual molecules changing from one to another as well as classes. So here we have classes, and here we have instances of those classes. It's quite a flexible system for curating information about lipids, or really any other kind of molecule. So this is how we started building this lipid database. We really went back to first principles. We did targeted curation of all the pathways by which these classes of molecules can interconvert. This is a little extract for phospholipids. These are based on glycerol.
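The atomic balance check mentioned above for Rhea reactions amounts to counting atoms on each side. The sketch below is a minimal version under stated assumptions: formulas are simple strings with no brackets or charges, charge balance is ignored, and the example reaction (an ester hydrolysis) is chosen for illustration rather than taken from Rhea. It is not how Rhea itself implements the check.

```python
import re
from collections import Counter

def atoms(formula):
    """Count atoms in a simple formula such as 'C2H4O2' (no brackets or charges)."""
    counts = Counter()
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] += int(number) if number else 1
    return counts

def side_total(formulas):
    """Sum the atom counts of all compounds on one side of a reaction."""
    total = Counter()
    for formula in formulas:
        total += atoms(formula)
    return total

def is_balanced(left, right):
    """True when both sides of the reaction contain the same atoms."""
    return side_total(left) == side_total(right)

# Ester hydrolysis: methyl acetate + water -> acetic acid + methanol.
print(is_balanced(["C3H6O2", "H2O"], ["C2H4O2", "CH4O"]))  # True
# Dropping the water leaves the left side short of two H and one O.
print(is_balanced(["C3H6O2"], ["C2H4O2", "CH4O"]))  # False
```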
There's a small backbone with a phosphate. We can see that heads can interconvert, and fatty acids can be added and removed. So we did systematic curation of all the pathways by which these molecules interconvert. Those are the heads. These are the tails. We can elongate from the two-carbon unit all the way to 38-carbon fatty acids having up to six double bonds. There's something like 80 to 90 fatty acids which are really known in nature. So we curated all the pathways which would allow us to synthesize them and break them down. Just for synthesis, there's something like 200 reactions. We have different classes of fatty acids: if you look at these, the first of the double bonds counting from the omega end is six carbons away, so these are the omega-6 fatty acids. These are the omega-3 fatty acids. And what you often see in nature is that these can interconvert, but only in certain animals. For instance, humans can't do a lot of these interconversions; you have to eat these omega-3 fatty acids. But some people have done interesting experiments where, for instance, they engineer pigs using a C. elegans enzyme to make healthier bacon. So there's some kind of interesting research going on in fatty acid synthesis, which is a little unexpected. So for the LipidX project, we did, as I said, systematic and targeted curation of lipid metabolic pathways. This gives us a very large catalogue of building blocks. Some of them are shown here. For instance, if we consider glycerol, it has three positions at which you can attach other groups. These can be fatty acids or fatty alcohols or head groups. The numbers tell you how many fatty acids or fatty alcohols are used with glycerol for glycerolipids, or for glycerophospholipids with one of these head groups. So we have specific rules for how these things can combine. For instance, you can add an alcohol at the first position, but not at the others. That's what seems to be the case in nature.
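The omega classes mentioned earlier in this passage follow from simple arithmetic on the usual notation. As a minimal sketch (the function name `omega_class` is an assumption for the example): double bond positions are normally written counting from the carboxyl end, so the omega number is the chain length minus the position of the last double bond.

```python
def omega_class(carbons, double_bond_positions):
    """Return n for an omega-n fatty acid, or None if it is saturated.

    Positions are counted from the carboxyl end, as in 18:2(9Z,12Z); the
    omega number is the distance of the last double bond from the methyl end.
    """
    if not double_bond_positions:
        return None
    return carbons - max(double_bond_positions)

print(omega_class(18, [9, 12]))      # 6, linoleic acid
print(omega_class(18, [9, 12, 15]))  # 3, alpha-linolenic acid
print(omega_class(16, []))           # None, palmitic acid
```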
We've limited the number of fatty acids in glycerolipids because these are generally storage lipids. So we removed all the very long-chain polyunsaturated fatty acids, because they're generally found only in restricted numbers. So we basically built rules which take into account not only the known chemistry but also the known biology. And then what we did is very simple: we just enumerated all the possible structures. We have a library of fragments, which are all mapped to ChEBI. These are the tails and these are the heads. By combining them according to the rules we defined, we could generate all possible structures. We did that computationally, and that gave us around 240,000 structures. What we then did was annotate them. So we can say, for a given molecule: how does it map to ChEBI? What's its class, and what are its components? We can generate a name; there's a well-defined nomenclature scheme for lipids, so if you know these three things, you can derive a name automatically. We can provide cheminformatic descriptors like linear representations of structure. Actually, that's how we build the things. One of these is called the SMILES representation. Another one is InChI, which was developed by IUPAC, and the InChIKey. These are extremely useful ways to represent structures, actually. From the SMILES, for instance, we can generate other representations of the chemical structure; this kind of two-dimensional picture can be generated from it. It's very flexible and easy to work with. And in effect, the way we built this library is that we had catalogs of linear parts. The InChIKey is an extremely useful representation. These are pretty much unique for a given structure. What these allow you to do is compare your structures to other people's databases. You can generate InChIKeys for every structure in your database, and then you can compare them to other people's databases and see how they overlap. This is quite easy to do and it's quite handy.
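The enumeration step described above is, at its core, a constrained Cartesian product. The sketch below uses a tiny made-up catalog (two heads, two fatty acids, one fatty alcohol) and applies the one rule quoted in the talk, that a fatty alcohol may only occupy the first position. The catalog entries, the function name, and the naming pattern are assumptions for illustration; the real library and rule set are much larger.

```python
from itertools import product

# Tiny illustrative catalog; the real one is far larger and these entries
# are assumptions made for the example.
heads = ["PC", "PE"]
fatty_acids = ["16:0", "18:1(9Z)"]
fatty_alcohols = ["O-16:0"]

def enumerate_structures(heads, fatty_acids, fatty_alcohols):
    """Combine one head with a tail at sn-1 and a tail at sn-2.

    Rule taken from the talk: a fatty alcohol may sit at position 1 only.
    """
    sn1_options = fatty_acids + fatty_alcohols
    sn2_options = fatty_acids
    return [
        f"{head}({sn1}/{sn2})"
        for head, sn1, sn2 in product(heads, sn1_options, sn2_options)
    ]

structures = enumerate_structures(heads, fatty_acids, fatty_alcohols)
print(len(structures))  # 12
print(structures[:3])
```

Even this toy catalog yields 2 x 3 x 2 = 12 structures, which gives a feel for how a few dozen heads and around ninety fatty acids can reach hundreds of thousands of combinations.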
So one of these, this database here, is called LIPID MAPS, which is on the list. What they provide is a database of structures very much like this, which they say are all the lipids that occur in nature. So what we then do is this: we have all our structures, which form the base of the pyramid, if you like, and we use that to infer all the mass spectrometry outputs that you would expect to find. So for this particular lipid, which we just built, we can infer what its corresponding subspecies would look like, that is, how it would appear in mass spectrometry, what its fatty acid scan species would look like, and what its species would look like. So 244,000 structures would correspond to about 5,000 species. Those are combinations of a particular class, a number of carbons, and a number of double bonds. What we can now do is map data onto this structure. We have a hierarchy. It's not, strictly speaking, an ontology, but it's very similar to an ontology. So we can map, for instance, data from LipidX onto this level, or we can map in vitro biochemical experiments with defined structures onto this level, and mass spectrometry in between. It allows us, for instance, to relate observations of where a lipid has been seen in the body using mass spec to the enzymes which are known to act on the corresponding structures. So we use this as a framework for mapping LipidX. For this particular class of lipids, which are phospholipids, we mapped 797 of the lipids in LipidX to species. So we have a lot more species than are being looked at in the LipidX pipeline at the current time. What's interesting about this is that 635 of these could be mapped to structures which map to LIPID MAPS. So we can use this hierarchy to map from LipidX down and across to the LIPID MAPS database. LIPID MAPS could explain about 635 of the outputs.
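The pyramid described above, with many structures rolling up into far fewer species, can be sketched as a grouping by class, total carbons, and total double bonds. The five structures below are illustrative values only, and the tuple layout and function name are assumptions for the example.

```python
from collections import defaultdict

# Structures as (lipid class, sn-1 chain, sn-2 chain); chains are
# (carbons, double_bonds). Illustrative values only.
structures = [
    ("PC", (16, 0), (18, 1)),
    ("PC", (18, 1), (16, 0)),
    ("PC", (16, 1), (18, 0)),
    ("PC", (18, 1), (18, 1)),
    ("PE", (16, 0), (18, 1)),
]

def species_index(structures):
    """Group structures under their species: class, total carbons, total double bonds."""
    index = defaultdict(list)
    for lipid_class, sn1, sn2 in structures:
        key = (lipid_class, sn1[0] + sn2[0], sn1[1] + sn2[1])
        index[key].append((lipid_class, sn1, sn2))
    return dict(index)

index = species_index(structures)
print(len(structures), "structures ->", len(index), "species")
# Mapping an observed species such as PC 34:1 back down to candidate structures:
for candidate in index[("PC", 34, 1)]:
    print(candidate)
```

The same index works in both directions: it collapses enumerated structures into species, and it lets a species-level observation be expanded into the candidate structures that could explain it.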
But doing this brute-force enumeration of structures is quite useful to explain, for instance, the metabolic pathways which might result in their synthesis or breakdown. LIPID MAPS is just a database of structures, with no annotations, no pathways, no enzymes. But as we enumerate all the possible structures and then map them back to LIPID MAPS, we get a mapping of their data onto our pathways. We can also annotate the occurrences of these structures or other species in the literature. The people who produce LIPID MAPS say these structures all exist. The head of LIPID MAPS is actually a person I met in Singapore a couple of years ago at a lipid meeting, and he basically sells LIPID MAPS as the reference for everything that's real, kind of like Swiss-Prot. But actually it seems that if you look at the structural data in there and you look for evidence in the literature, you find very little evidence for those structures. And if you look at the number of lipids that LipidX is looking at and that we see evidence for, there must be at least 800 structures, but nobody has ever really characterized them. So if you think about it, there's a lot of biochemistry to be done in really figuring out the stereochemistry and what connects them. So the number of species with evidence and the number of items which LIPID MAPS is claiming are quite different. I think there's a lot of scope for really explaining what is in LIPID MAPS. They are real experts. Most of these guys are in their 70s; they've been doing biochemistry since the 60s and 70s, so they really know this stuff. They built this database computationally as all the real lipids, but it's not really proven. What we did, and we tried to be very clear about it, is take every possible combination. We build every possible combination, and then we can see, for instance, whether there is evidence in LipidX for structures for which there's no evidence in the literature. The answer is yes, there's a lot of evidence.
But there are a lot of structures to be figured out. I think this is something that's really a common theme, to me at least, in systems biology: people are trying to describe the biological system, but we still don't really know what the parts are yet. There are a lot of enzymes to be characterized. There's a lot of basic work to be done before we can really build metabolic models and have things which represent reality. So that's where we are with the annotation. What we now have is a framework where, let's say, LipidX has tissue-specific data on a bunch of species; we can now map that onto our hierarchy and tell them what the corresponding structures are. I think this might not look great, as the beamer is not that high-res, so I apologize if these slides are a bit hard to read. We put all this data into a database and a website. Some of the people who built this are sitting in the room, so I hope I won't... I don't use a Mac normally. So we have an interface that allows you to browse this hierarchy. You can specify what class of lipid you're interested in, how many carbons, how many double bonds, and so on. So you can start at the top and work down. If you specify the class and the number of carbons, and I don't know if this is visible to you, then for this particular class and this particular number of carbons you're limited, according to the known structures, to having between 0 and 7 double bonds. It actually gives you the number of fatty acid scan species, subspecies, or the actual number of structures which correspond to that output. What you can then do is start to drill down by clicking on these bars, and you can see what the annotation is for these things. There's a really nice display here. You have the name of the lipid and the number of carbons and double bonds. So there are seven species at this level.
We have these little icons which tell you what kind of annotation there is for those lipids. You can see that two of these seven have some annotations that the others don't, and this one actually shows locations. So these are lipids which have been seen by mass spec. So of the seven possible species that exist according to known biochemistry, you can click through and see these things. This might be a little tricky to read. This is quite a high-level description of a lipid. We have a definition which is generated computationally. It tells you that this is a phosphatidylcholine. It's got a fatty alcohol. So what does that really mean? There are 16 carbons, how it's linked, and so on and so forth. There's a little graphic here which might be a little bit difficult to see. We have the parents in the hierarchy, what class it is in ChEBI, and you can see the fragments, the components. So you can go and look for lipids which have the same components. Or if you're interested in the metabolism of the components, you can go and look at that. You can see there are some annotations for this particular lipid here, and you can see that it's been annotated as being found in the monocyte. For this kind of annotation we use an ontology of cell lines, so it's been seen in the human monocyte. There's a little link here that allows you to see the evidence for that annotation. Everything that's in there is based on experimental data and papers. And you can see a little extract here from the paper from which this annotation is derived, showing why we made the annotation, with the evidence described using the Evidence Code Ontology. So if that's interesting to you, you can then flip back to the browse function, and if you're interested in this particular lipid, you might want to go and look at the structures that correspond to that particular output. You can now click on this little icon to get only the children of that one.
If you do that, you get this kind of collapsed view of the hierarchy, which shows you that you have two possible structures for this lipid. If you look at these little icons here, you can see there's a little colored dot, and that's metabolism. So there is information on metabolism for this lipid and not the other one. You can then click out and look at this particular lipid. Now you have an actual defined structure for it. It's been mapped to LIPID MAPS and ChEBI. You can see the cheminformatic representations, InChI and SMILES, formula, charge, and so on. If you look at the metabolism, you can see that it's been annotated as being linked to this Rhea reaction and this particular protein. Again we have the possibility to see the evidence. What this is telling us is that there's a known reaction. If you find these products interesting, you can click on them, and we can see that it's liberating one of the fatty acids. You may know that this polyunsaturated fatty acid is involved in cell signaling, and we know that this lipid can be found in a monocyte, so it's involved in some signaling. You can then click out and go and look at the metabolism of this particular lipid as well. What's interesting about this one is that there are several cytochrome P450s which actually metabolize it, some of which are known to have roles in inflammation. What we also provide is links to activated forms of lipids. For the metabolism of fatty acids, some of it occurs on isolated fatty acids and some of it occurs on activated fatty acid species, like the ones I mentioned. We provide links for these forms, so you can click out, go to those, and see what's known about their metabolism. This is all data we've created specifically for this project; it's only in the lipid database, you can't find it elsewhere.
Here you can actually see this activated form of this lipid being used by a variety of enzymes to make a variety of membrane lipids. That's kind of a whistle-stop tour of how the database works. We have this browse function which allows you to go in at a very high level, drill down between the different levels, and see what kind of annotation there is. You can go all the way down to the structures if you like, and then look at the metabolism and the interactions. You can also search by name. So that's where we are with the SwissLipids project now. What we built, or tried to build, is really a framework which allows you to precisely describe information about lipids, which was very hard to find at the beginning. Effectively, what we did was take the LipidX data, then go to the pathway and reaction databases that we work on, and kind of reverse engineer that. So you can now ask questions like: where does my lipid occur, what enzymes might exist for it, and what are the reactions? So that's what we built and how we built it.
I wanted to emphasize the role of biocuration: it's really a kind of collective knowledge building. It's been quite a long job; it involves reading a lot of papers and annotating a lot of proteins and a lot of enzymes. What's been really great about the project is that it's not just Vital-IT and Swiss-Prot; it's been bringing these people together, and it's also leveraging the expertise of the people on the lipidomics platform. This is a project we would never have started on our own. It's not a problem we could have attacked without the input of the people working on the lipidomics platform.
So I think it's been a really nice example of the synergy between bioinformatics groups and biocuration groups. We built a resource which is fully mapped to LipidX: once we rebuilt the hierarchy, we mapped it back onto all their lipids. It also maps to other public databases like LIPID MAPS, so we can now say what the corresponding structure in LIPID MAPS is for a LipidX lipid, and what the corresponding reaction is. It also links to MetaNetX. MetaNetX uses Rhea, and the whole thing is built on Rhea, so we have a mapping of the lipidomics data onto MetaNetX as well.
So, what we're planning on doing next. We're still working on finishing the last few lipid classes, and we expect the database to approximately double in size. What we then plan to do is move on to other targeted projects with other systems. If anybody is working on lipids and wants to collaborate, we're interested. One thing we're starting is a collaboration with people who are studying the interaction of Mycobacterium with the host. This is kind of an interesting beast, because it's extremely lipid rich compared to other bacteria: a large portion of its genome is devoted to making lipids or metabolising lipids. Its cell wall is extremely complex and has a lot of roles in pathogenicity, so the lipids are extremely important there. Historically, the lipid composition of the bacterium has been used as a means of typing the bacterium, so there's a whole branch of lipid biology in mycobacteria based on the lipids they have. They have some really extremely long and quite bizarre lipids.
There's still, I think, a need for doing some kind of LipidX-style rebuilding of the lipidomic knowledge of this organism. The current state of the art is really two databases, which are really Excel files, built by two groups in the US, essentially, I think, using Excel macros to build masses based on addition. So these are guys who built databases of up to 5000 theoretically predicted masses; they're things you might find in Mycobacterium. The best database at the current time is one called LipidBank in Japan, and it has about 10% of the possible structures for these. We think there are about 2000 of these in a group of bacteria which includes Mycobacterium; public databases usually have between 20 and 200.
So that's all I'd like to say about the lipids. What I'd like to do now is really thank all the people who played a part. The curators: Lucila, Nabila and Anne; Lucila actually designed the logo, so thanks. I'd really like to thank Rob, who's been there from the beginning. He did an incredible amount of work on developing the database, which also meant developing tools for doing generation, building the website, and so on. Then there's Mike Liu, who really put into place this kind of site browsing. Dimitri helped a lot in engineering the database, which suddenly got extremely big. And Anne is the person who built the hierarchy: all the software which takes all the structures and regenerates these kinds of elements. I'd like to thank Kizu, Howard and Bartilli from LipidX, and also other people in the LipidX community, especially Thomas and Ursula, and also the people in the Rhea team.
The funding for this project initially came from SystemsX and SIB. This is what they call a Special Opportunities Project for SystemsX. Most of the projects in SystemsX are research, technology and development projects, or RTDs. This is something called a Special Opportunities Project, so it's kind of a companion for SystemsX. The aim was really to facilitate the reuse and the preservation of the data in SystemsX. So that was SystemsX, and without them we couldn't have got funding from SIB. So I'd like to thank you for your attention and your time, and if you have any questions.