Can you see? Yes, we see and hear. That's wonderful. Perfect, okay. Good evening. It's almost three o'clock in the morning here in Singapore. This talk is a nice continuation from transcriptomics: we skip one step and end up with the metabolites. So the package, or the talk, will be about small molecules. Small molecules, just as a quick overview, are of course the metabolites, which include the lipids and the signaling molecules; increasingly relevant are also the natural products, the secondary metabolites from plants and the microbiome. And then there are the drugs, which are often measured with the same approach.

In the first row up there you see clinically measured compounds, especially lipids, because our lab works mostly on lipids: for example total cholesterol and triglycerides. If you go to the doctor and he does a lipid panel, he will measure these two compounds, and these are total compositions. But if we look in more detail, these are the triglycerides measured on a common mass spectrometry platform: we find around 30, or depending on the method even up to 100, different triglycerides. The same holds for cholesterol, which is a mix of esterified and non-esterified cholesterol. And the question is whether the total cholesterol gives you more value than the individual compounds, or maybe a subset of the individual lipid species. The clinical application, or the translational research that has been done in metabolomics and lipidomics, basically focuses on screening for new biomarkers, biomarker discovery. Ideally, in the future, we will have multi-metabolite or multi-analyte panels, from a few up to tens or hundreds of analytes, that give you a fingerprint of a health or disease status.
What these omics or metabolomics studies also reveal are new diseases, or mechanisms in health and disease. Here you see a map of the lipids measured in a single mass spectrometry run. Depending on the approach we can measure around 500 species, and even that does not by far reflect reality: we measure sum compositions, and in reality there is much more behind them. You can already see the complexity.

Just a quick overview of what mass spectrometry is, or how we do that. One keyword here is that we try to reduce the complexity of the sample: we take plasma, we extract the lipids, and get rid of some of the water-soluble compounds. Then we do a chromatography, separating the compounds by some of their properties, and then it goes into the mass spectrometer. Basically we have two mass filters, which can select molecules with a specific mass, and these go into a collision cell, where they collide with nitrogen and break into fragments. The fragments are characteristic for the molecule we want to measure, and we can measure the abundance via the counts. In the targeted approach used here, MRM, this gives us raw data in the form of a chromatogram. The big challenge is how to process such data: which peak is which, and how to fit this automatically. And from these peak areas there is another step, the calculation of concentrations. This is exactly the part I am talking about: the process from the raw data to intermediate data, like peak areas, that still has to be further processed.

To give you an overview of our lab, how we work, and what our challenges are: we are situated in a university but also work closely together with hospitals in Singapore and around the world.
We measure sample cohorts, clinical trials, model studies, and also mouse models, sometimes even plants and virus samples. So we have a big diversity of projects, and that is also reflected in the scales. We have panels where we measure only four specific lipids, but also panels where we can quantify around 600 lipids. The studies range from rare patient cases of a disease, where we may work with five samples, up to large cohorts. Sorry, there is one zero too much on the slide, but maybe that is the real future: currently we have cohorts that go to 5,000 or 10,000, or actually even close to 30,000 to 50,000 samples. We have various tissues, mostly plasma, and we use different methods: established methods, new methods still in development that we may already be applying to some studies, and published or novel methods that we implement completely from scratch. So there are different analytical platforms, which differ in the way the data look, and different tools, open-source and vendor-specific, to process the data and do exploratory analysis. For the data processing that I will talk about, there is a big diversity in the lab: Excel and some Shiny apps that we have developed.

Now, coming to the challenges in our lab. We have many people, and our lab is called an incubator because we also host many visitors and students. That gives quite a big diversity in terms of background and experience, from biologists to real analytical chemists. And the number of bioinformaticians and statisticians in our group is relatively small, which we feel more and more.
What we see is that the communication between the biologists and biochemists, the analysts who really do the work in the lab and see what the instrument is doing, the bioinformaticians, and the statisticians, and the workflow around it, becomes more and more of a challenge. This challenge shows up in data quality, reproducibility, and consistency. What is especially lacking is when data are passed from an analyst to a bioinformatician and there is almost no communication between them. We often miss the chance to learn from the data, for example when something goes wrong with an analysis or something looks a bit weird. Of course we can apply batch and drift corrections, and that is fine, but if we really look carefully at the data, and there is constant feedback, we may be able to see artifacts, system issues, or sample processing problems that we can improve. There is also the issue that many analysts and biochemists, because of the large amount and complexity of the data, hundreds of thousands or millions of data points, really depend on having someone who analyzes the data for them. And then the communication starts again: can you do this plot for me, can you look at that?

So we asked ourselves how we can improve this process. One strategy is of course to develop software tools, Shiny apps that people can use: they upload their data and get something back. That is one possibility, but it has limits, especially due to the diversity of projects. Some people might have a special case, or they want a particular plot, and suddenly a fixed, programmed Shiny app cannot do this; the workaround is then often to go back to Excel, or to start the data analysis from scratch for those specific cases.
So we are now trying to train people, biochemists and students, a little bit in R. On the right side you see material from a conference workshop that I organized together with Hyungwon Choi. We really go through a published paper and analyze the data, without any external packages beyond mostly the tidyverse and, for the statistical part, standard packages. We try to take people all the way from peak areas to the final results of the publication. But when we run these workshops and then sit together with our people in the lab, we see that there are limits to this: they are just not able to keep up with learning R beside their normal work, and then they maybe have no time for a month to look at their data, and when they come back the procedure starts again.

So the idea is to create a package that helps me, my colleagues, and the bioinformaticians in the lab to analyze data: a common platform where we can handle and share the data, and that can easily be enhanced and improved by us. It is online at the moment, but I have to say it is not yet ready for production; it is mainly online for sharing and for getting some feedback. The idea of this package is to collect data and manage metadata; especially the metadata is quite complex. The internal version of this package will be able to access our lab information system, so we can retrieve sample-associated metadata, on the analytical side but also on the subject side, directly from the LIMS. It will be flexible in the way we can analyze the data, so we can apply different batch corrections, normalization strategies, and filtering, but we do not want these applied in a black-box way: we really want to see what happens when I do a batch correction, whether the data really look better.
The opposite can actually happen; for example, when we use normalization with internal standards, how do the data look afterwards? And we want it to be traceable. This package should also help us in reporting and sharing the data, and not just the data itself: as I said, it should also record the processing that led to the data, so that this is not just in a notebook somewhere, but actually part of the data file that can be shared. This library will then be used by people in the lab and by the bioinformaticians among us, and will also serve as a base for possible Shiny apps or other applications.

For this talk I actually prepared a Quarto reveal.js presentation, but at the last moment it did not really work online, so I have a PDF here. So, we use R, and the package creates an S4 object, a little bit like a Bioconductor package. Unlike most existing packages in this field, which do not really focus on the actual preprocessing of the data, it handles a lot of metadata and offers this flexibility. We have an S4 class into which we can, for example, import the direct raw outputs from the instrument computers, so no manual reformatting is needed. That is really important for us: in case we have to go back to the instrument, we can just export again and reanalyze. We can also import metadata; metadata can be plain text files, but keeping the integrity of the metadata, so that it really matches the data, is a surprisingly massive challenge.
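To make the idea concrete, here is a minimal sketch in base R of such a container: one object that carries the measurements, the sample metadata, and a log of processing steps. The class name, slots, and helper functions are hypothetical illustrations, not midar's actual definitions.

```r
library(methods)

# Hypothetical container class illustrating the idea (NOT midar's actual
# class): data, metadata, and the processing record travel together.
setClass("MetabolomicsExperiment", slots = c(
  data       = "data.frame",   # feature intensities per analysis
  metadata   = "data.frame",   # sample-associated annotations
  processing = "character"     # ordered log of applied processing steps
))

# Constructor enforcing metadata integrity: every analysis in the data
# must have a matching metadata row (a common point of failure).
newExperiment <- function(data, metadata) {
  missing_ids <- setdiff(data$analysis_id, metadata$analysis_id)
  if (length(missing_ids) > 0)
    stop("No metadata for analyses: ", paste(missing_ids, collapse = ", "))
  new("MetabolomicsExperiment", data = data, metadata = metadata,
      processing = character(0))
}

# Each processing function returns the object with the step logged,
# which is what makes the final object self-documenting.
logStep <- function(exp, step) {
  exp@processing <- c(exp@processing, step)
  exp
}

exp1 <- newExperiment(
  data     = data.frame(analysis_id = c("a1", "a2"),
                        feature = "TG 48:1", intensity = c(1.0e5, 1.1e5)),
  metadata = data.frame(analysis_id = c("a1", "a2"),
                        qc_type = c("SPL", "BQC"))
)
exp1 <- logStep(exp1, "imported raw data")
```

The key design choice this mirrors is that an import fails loudly when data and metadata disagree, instead of silently producing mismatched rows.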
For that we have two systems. We have the LIMS, which we start to use more and more for certain things as a kind of lab notebook, and a former colleague of mine made a macro-based Excel template that helps people annotate the data, based on their notes in Excel or even in the notebook, in a consistent way that keeps the integrity.

If we quickly look at the print method of this object, we get an overview of what has been imported and what not; it is unfortunately cut off here, but below we also get an overview of which processing steps have been applied. We can then continue with this object and apply, for example, normalization with internal standards, which are deuterated or otherwise heavy-labeled standards that we spike into the sample as early as possible. So we normalize with them and quantify based on the known amount of internal standard, and maybe sample amounts or protein amounts. This immediately gives us the unit; it is always a surprise how confusing units can be for people in concentration calculations. Then we can calculate QC metrics, which allow us to filter the features we measure, lipid species in this case, based for example on the analytical CV, where I can set a threshold of 20% CV, on the signal-to-blank ratio, and on the linearity of the measurements against dilution or response curves. Every step gives you back a summary, an idea of what happened to your data, so that you are not blindly following something. And here we see that of 400 measured features we end up with 95 features, so only 25%; that is not good. So we can now do a batch correction. First we correct for drifts; we have the option to do drift corrections, and there will be different algorithms for this.
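As a small numeric sketch of the two steps just described, here is what internal-standard normalization and CV-based feature filtering look like in plain base R. All values are made up for illustration, and this is not midar's API.

```r
# Intensities of one lipid and of its deuterated internal standard (ISTD)
# across three repeated QC injections (made-up example values):
feature_intensity <- c(run1 = 4.9e5, run2 = 5.2e5, run3 = 5.0e5)
istd_intensity    <- c(run1 = 9.8e5, run2 = 1.02e6, run3 = 1.0e6)
istd_conc_uM      <- 2.5   # known spiked-in ISTD concentration

# Single-point calibration against the ISTD immediately yields a unit:
conc_uM <- feature_intensity / istd_intensity * istd_conc_uM

# Analytical CV (%) over the repeated injections; keep the feature only
# if it passes the 20% threshold mentioned in the talk:
cv_pct <- sd(conc_uM) / mean(conc_uM) * 100
keep   <- cv_pct <= 20
```

In a real pipeline the same calculation runs per feature across all QC injections, and the `keep` flags drive the filtering step that reduces the feature list.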
Here it will also give you a summary, and it will warn you of any issues you might want to follow up. Then we do the batch correction, filter again, and now we see that we have recovered 333 species. So this happens in quite a short time; when this is done in Excel it is usually only possible up to a certain stage, and if it is done by scripts written by bioinformaticians, it still has to go through the whole process again. Here it can really be done by the people in the lab themselves, and it is being done now; they have enough knowledge of RStudio to run such simple things.

But it is still kind of a black box: we have 333 species, but we do not really know how the data look. So a big focus of this package is also on creating plots, and there is a function to create quality control plots that help us understand what is in the data. Here is an example: you plot intensity over the run time, with 200 samples and the different QC types in different colors. The yellow bar on top is a way to cap samples that are too high and would skew the whole plot, so that we can focus on the most important thing, which is the QC analysis, and look for drifts. And indeed, this is uncorrected data, and we see some batch effects in the first, second, and third batch. With these plots in a Quarto presentation you can actually scroll through very nicely on the same page with scrollbars; here we have them on different pages, and they can also be exported as a PDF. We are also working on a Shiny app to visualize this in an efficient way, which is still quite a challenge, but we are making progress there too. The idea is also that you can use these objects, for example the ggplot objects, for your own purposes and customize them, so you have a base and you have the data you need.
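One common way to implement a drift correction like the one described is to fit a smooth trend through the pooled-QC injections over run order and divide it out. The base-R sketch below uses loess on simulated data; it illustrates the principle and is not necessarily the algorithm the package uses.

```r
# Simulate a run of 60 injections with a steady downward drift:
set.seed(1)
n         <- 60
run_order <- seq_len(n)
is_qc     <- run_order %% 6 == 1        # every 6th injection is a pooled QC
intensity <- 1000 - 3 * run_order + rnorm(n, sd = 15)

# Fit the trend on QC injections only; surface = "direct" also allows
# predicting slightly outside the QC positions.
qc_df <- data.frame(x = run_order[is_qc], y = intensity[is_qc])
fit   <- loess(y ~ x, data = qc_df, span = 1,
               control = loess.control(surface = "direct"))
trend <- predict(fit, newdata = data.frame(x = run_order))

# Divide out the trend and rescale to the median QC level:
corrected <- intensity / trend * median(intensity[is_qc])

# The residual trend should now be much smaller than the raw drift:
slope_raw  <- coef(lm(intensity ~ run_order))[2]
slope_corr <- coef(lm(corrected ~ run_order))[2]
```

Plotting `intensity` and `corrected` against `run_order`, colored by QC type, is essentially the quality control plot described above: the drift is visible before correction and flat afterwards.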
You can continue with your own annotations of the plots, make them ready for publication, or do your own analyses for your specific case; here we see the data before and after batch correction. You can plot it, store it in a variable, and then just continue with it: it is not possible here because this PDF is not interactive, but you can pass this variable to the ggplotly function and you already get an interactive plot where you can see which samples are maybe outliers.

The final stage is the writing of a report, and there are different options. One is the Excel report, which is active at the moment; there is also a PowerPoint version, because in the end PowerPoint is still accepted by many people and can easily be shared, with a few QC plots and everything. Here is the Excel report, which contains different sheets: an information sheet recording who analyzed the data, when, with which package version, and all the QC parameters and steps that have been applied; then the different QC metrics; and then the final data sets that can be used by internal groups or shared with others. As a second option you can save this S4 object as a serialized file and send it to someone else. It is possible to open it even without the package, and you see the different data sets; it still needs a little work, but it is quite readable. And then you can actually continue with midar to process and inspect the data. Since all the information, from metadata to data, is stored and all the processing steps are recorded, we basically have a fully reproducible workflow stored in one data object file. The next step is really to keep working on it, get some user feedback, and bring the plots and functions we currently use in the lab into the public version.
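The serialization step described above can be sketched with generic base R (the package may wrap this in its own export helpers). Because the object carries data, metadata, and the processing log together, a single `saveRDS()` call produces one file that a collaborator can restore with the full history intact. The object contents below are made up for illustration.

```r
# A stand-in for the analysis object: data, metadata, and processing log
# bundled together (illustrative, not midar's actual structure).
results <- list(
  data       = data.frame(feature = c("TG 48:1", "CE 18:2"),
                          conc_uM = c(1.25, 0.80)),
  metadata   = data.frame(sample = c("s1", "s2"),
                          group  = c("ctrl", "case")),
  processing = c("imported", "istd_normalized", "drift_corrected", "filtered")
)

path <- tempfile(fileext = ".rds")
saveRDS(results, path)        # serialize everything into one shareable file

restored <- readRDS(path)     # the recipient restores data plus history
```

Since `.rds` files are plain R serialization, the recipient only needs R itself to open the file and inspect its contents, matching the point that the object can be read even without the package installed.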
And the Shiny app is also in development, as time allows. So I will stop here with a quick thanks to our lab, which is headed by Professor Markus Wenk, and to my colleagues, who constantly help each other to optimize these workflows in the lab. Also a big thank you to this great conference; it is the first time I have attended it, and it is really amazing. Thank you very much, and I am open for any questions.

Thank you, Dr Burla. If you have any questions, please drop them into the chat. I particularly love that this kind of talk began with the details of the science about what we are about to do; as someone who does not work with this type of data very often, I appreciated that a lot. And I also love that at the end you get an encapsulated object with the entire workflow and process outlined in it; I think that is fantastic.

Yeah, I would really like to have feedback on this; it is a big endeavor.

All right, we have a question from Luca: do you plan an extension to untargeted metabolomics? I am not great at pronouncing words I have never said out loud before.

Yes, sure. We also do untargeted lipidomics and metabolomics, and we especially focus a little bit on ensuring that the lipid or metabolite identification and annotation is correct. So yes, definitely; applying it to those data sets, our lab is also working on this.

That is fantastic, and we have some other nice comments in the chat about the package from people who have used it. So, again, thank you very much, Dr Burla; I hope you can get some rest, it is pretty early.