Ugh, it's really cold outside still. Okay, so welcome back, also to the people on Moodle. So like I told people already, first we will do some DNA barcoding or metabarcoding. So as far as I understood it, right? Because I've been reading the Wikipedia page and then I read through all of the different pages below. And DNA barcoding is a way to identify species, right? So you take a sample, for example, you go to a lake, you take a cup of water and you want to know what species are in the cup of water. And it is done by sequencing a very short fragment of a specific gene and then comparing this gene sequence that you get to a database to identify the species. So what you do is you have all kinds of species in your sample, like here. You extract the DNA, from the DNA you PCR out a very specific gene, and then you make a library which you send in for sequencing, and when you have sequenced you get back all these sequences, and based on these sequences you can define the barcode. How exactly they define the barcode was a little bit unclear to me, but that's because I hadn't had enough time to really look into the details. But I think it's just based on SNPs within the sequence that they sequenced. So why would you want to do DNA barcoding? Well, according to Wikipedia, and I haven't checked all of these claims and I'm not really certain that I agree with all of them, these are the reasons that Wikipedia gives you. So it increases your taxonomic resolution, which might be true, right?
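To make the comparison step concrete: this is a minimal, purely illustrative sketch of matching a sequenced fragment against a reference database. The species names and sequences are made up; real pipelines use alignment tools like BLAST and identity thresholds rather than a plain mismatch count.

```python
# Illustrative sketch: assign a species to a sequenced fragment by
# finding the reference barcode with the fewest mismatches.
# Species names and sequences are invented for this example.

def hamming(a: str, b: str) -> int:
    """Count mismatching positions between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))

REFERENCE_DB = {
    "Species A": "ATGGCATTAGGCTTAACCGT",
    "Species B": "ATGGCATTAGGCTTGACCGT",
    "Species C": "ATGGTATTACGCTTAACAGT",
}

def assign_species(query: str) -> tuple:
    """Return (species, mismatches) for the closest reference barcode."""
    return min(
        ((name, hamming(query, seq)) for name, seq in REFERENCE_DB.items()),
        key=lambda pair: pair[1],
    )

print(assign_species("ATGGCATTAGGCTTGACCGT"))  # ('Species B', 0)
```

In practice you would also require the best hit to pass some similarity cutoff before assigning a species at all.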
Because if you would just look in a microscope, so you take a drop of the water that you got from the lake and you look at that drop under a microscope, then you can only identify so many species, and of course many species are very similar. Under a microscope, if you see a little animal with a couple of legs, then hey, you have to be really good to say that this is a completely different species than something else which looks very similar, right? And of course with DNA you don't have that problem. DNA can help you decide if two species are the same or if they are different. So the next step is that you can relate this to environmental factors. So hey, you could go to a lake which has a very high salt concentration, you take a cup of water there, compare it to the cup of water that you took from a normal lake, and then the idea is that you can find out if there's a relationship between the salinity of the water and some of the animals living in the water, which I totally agree with. This is something that you can do, especially if you do this in a gradient, right? So if you would have five or six lakes with increasing water salinity, or some other factor in the water which differs, then this will give you a very good picture of which animals are in the water and whether a certain species occurs more when the water salinity goes up. Although there is already a little bit of an issue with DNA metabarcoding or barcoding: as far as I understood it, it's not a quantitative method, and this is because of the PCR step. So there are some drawbacks to determining how many individuals of a certain animal are in the volume of water that you collected. As far as I understood it, the method is very good when you are describing things qualitatively. Like, is this animal here, yes or no?
But to determine how many of a certain species there are relative to another species, that is where some bias comes in, and this bias is due to the fact that some primers will work better, but we will get to the drawbacks of the method. So, increased comparability among regions. If you have water from six different lakes in six different countries, of course DNA barcoding will help you compare them much better than looking through a microscope. Inclusion of early life stages and fragmented species: that was a little bit unclear to me, also what they meant with fragmented species. I'm not really sure about that, but the inclusion of early life stages is definitely true. If you would just look under a microscope and you see a bunch of little eggs, single-cell things floating around, of course you cannot really do a taxonomic classification when you have only a single cell. Taxonomic classification of an animal is only possible when you have a fully grown animal and you can see, oh, it has six legs and a stripe on the back and these kinds of things. So DNA barcoding definitely helps when you want to include the very early life stages at which you cannot yet do an optical determination of which species it is. And it allows the detection of cryptic slash rare species. This is true, right? If you extract your DNA and do your sequencing of this little gene that you are interested in, then of course if you find a barcode which is not in the database, you can say, well, there's a new species here. So it allows you to do that, more or less. It increases the number of samples which can be processed. That's definitely true, because all of this is relatively high throughput. You could just take water and have some PhD students do it instead of having to have a fully qualified person that knows all the taxonomy and looks at water or soil or these things. Then the next one I actually don't really agree with.
Saying that it's non-invasive: I don't see how it's more or less non-invasive than taking the water and looking at it under a microscope. Because you're taking the water out, right? So in theory you could just be unlucky and scoop the last individual of a very rare species out of your lake and leave the lake without it. And DNA barcoding does not really change that. So I don't see how DNA barcoding is more non-invasive than doing the same thing but then looking at it under a microscope. So I don't know what that claim is based on, but the claim was made. I just wanted to show you guys that according to Wikipedia it's a non-invasive method, which you can kind of agree with: you're not scooping up half of the lake or something like that, you're just taking a little sample from the lake, but of course this might be invasive in itself. So I found this a really weird claim, especially compared to the other methods that are used for determining which species are in your sample. So the way that it works, how to DNA barcode: well, it's a four-step procedure. You do your sample collection, and from your sample collection you do DNA extraction. For fish it is less invasive, because the alternatives are electrofishing and gill nets, but you're still scooping fish out of the water, right? You can extract DNA directly from the water. And that's the thing that puzzled me, because you still need to get the fish to take the DNA sample from the fish. So you can't just take a cup of water and then look at the DNA in the water, because, well, maybe when fish are spawning, but normally fish don't leave traces of DNA floating around in the water. And of course it's less invasive than electrofishing or gill nets. But in the end you have to take the fish out of the water and get the DNA from the fish. So that was the thing that confused me a little bit.
But it might be that, because if fish are spawning, then of course the DNA of the fish is in the water, because they're releasing sperm and eggs into the water. [Student: you can do it from just the water.] Yes, well, I would like to see that, because are fish really losing that much DNA into the water? That would mean that if you have a fish which lives at the bottom of the lake, in the mud, then just scooping up some water from the top, I don't know if you get a very accurate quantification of how many fish there are. Maybe traces of urine? Well, urine doesn't really have a lot of DNA in it; urine is relatively DNA-free. And poop also doesn't work well, because there's a lot of bacteria and mucus in excrement. Yeah, there are a lot of things floating in the water, but does that really represent how many fish there were? If you have one fish which poops 100 kilograms into the water and another fish which only poops one kilogram into the water, then you would say, well, there's 100 of the one fish and only one of the other fish, right? The quantification there is really difficult. So for me, when I read it, it seemed like a very good way to tell if there is one of these fish in the water. [Student: you only need a little, thanks to PCR.] Yeah, but that's the big bias that you bring in, right? And that's the thing that was a little bit strange; we will come back to that. The way DNA barcoding works is like this: you have, for example, your gene of interest, and then you do DNA extraction and PCR amplification, and for PCR amplification you need primers, right? So what they do is they look at the variability across the gene and then design primers in regions of the gene which are not highly variable. But this is a very big drawback, because there might be thousands of fish which have a mutation here which makes your primer completely non-functional for this species, right?
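The primer-design step just described, scanning the gene for low-variability regions, can be sketched like this. The alignment, sequences and window size are made up for illustration; real primer design also considers melting temperature, GC content and so on.

```python
# Toy sketch: find the least-variable window in a small alignment,
# the kind of region where you would try to place a primer.
# The alignment and the window width are invented for this example.

ALIGNMENT = [
    "ATGGCATTAGGCTTAACCGT",
    "ATGGCATTAGGCTTGACCGT",
    "ATGGTATTAGGCTTAACAGT",
    "ATGGCATTAGGCTAAACCGT",
]

def column_variants(alignment, i):
    """Number of distinct bases observed in column i."""
    return len({seq[i] for seq in alignment})

def most_conserved_window(alignment, width=8):
    """Start index of the window with the fewest variable columns."""
    length = len(alignment[0])
    def variable_cols(start):
        return sum(column_variants(alignment, i) > 1
                   for i in range(start, start + width))
    return min(range(length - width + 1), key=variable_cols)

start = most_conserved_window(ALIGNMENT)
print(start, ALIGNMENT[0][start:start + 8])  # 5 ATTAGGCT
```

In this toy alignment the window starting at position 5 happens to have no variable columns at all, which is exactly the (possibly false) "zero variability" assumption the primer then depends on.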
So you might miss the crocodile in the water, so to speak, right? You take your water sample, you do your PCR, but you never amplify the crocodile DNA, because the crocodile that's in the water just has two or three SNPs in this region, right? So this region does not have zero variability as you estimated beforehand, but has a little bit of variability, and this little bit of variability means that the primers cannot really bind there, because primers need to be almost an exact match. And the thing is that to do proper DNA barcoding you need a gene which is relatively variable, right? You can't take a gene which has no variability, because then you cannot decide which species are there; but by taking a gene which is variable, you directly run the risk of excluding a whole part of the tree of life. And I found that really weird, because then you're testing your water and saying, okay, these are the species that are there, but you could just miss a whole whale or a crocodile which is in the water. So for assessing biodiversity you need to know for sure that your primers are going to work on all of the species in the water. But one of the things that they said is that it allows you to detect cryptic and rare species, yet if you've never seen these species before, you cannot take their sequences into account when you are designing your primers. So it's kind of a catch-22: you either only look at species that you know you can look at, because you have their sequence already, or you look at species that you don't know, but then you are limited in the genes which you can take. So I found this a little bit weird, and the PCR step is dangerous, so to speak, because of course if primers are not a perfect match but have a single-nucleotide difference, then the efficiency of the primer will be different for different species.
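The "missed crocodile" scenario can be illustrated with a toy primer-binding check. The primer, the template sequences and the one-mismatch tolerance are all made up; real primer annealing depends on the position and thermodynamics of the mismatches, not just their count.

```python
# Toy sketch of the "missed crocodile" problem: a primer that needs a
# near-exact match simply skips templates with a few SNPs in the
# binding site. All sequences here are invented for illustration.

def mismatches(primer: str, site: str) -> int:
    return sum(a != b for a, b in zip(primer, site))

def binds(primer: str, template: str, max_mismatch: int = 1) -> bool:
    """Does the primer bind anywhere on the template (forward strand only)?"""
    w = len(primer)
    return any(
        mismatches(primer, template[i:i + w]) <= max_mismatch
        for i in range(len(template) - w + 1)
    )

PRIMER = "GGCTTAAC"
TEMPLATES = {
    "fish":      "ATGGCATTAGGCTTAACCGT",   # perfect binding site
    "crocodile": "ATGGCATTAGACTGAACCGT",   # two SNPs in the site, so missed
}

for species, seq in TEMPLATES.items():
    print(species, binds(PRIMER, seq))
```

The fish is amplified, the crocodile never shows up in the data, and nothing in the output warns you that a template was skipped.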
So some species will be amplified every round of PCR, but other species which have a SNP will only be amplified, say, every other cycle. So in the end you get a massive skew towards the species which has a perfect match to your primer compared to species which have one or two SNPs there. And of course beforehand you can try and figure out which primers would work the best, right? By putting them in regions which are highly conserved across many, many species. But, well, it is biology, so there's always one of these white crows there, a rare exception: a species which does have SNPs in this highly conserved region. And that's the thing that I was wondering about with this method: how do you make sure that you don't miss something? And if you make sure that you don't miss something, how do you then make sure that you are not biasing your analysis towards something that you want to see, or towards something which is relatively uncommon but which you make very common because your primers are a perfect match to it? So I think everything in this method is very sensitive to the primers that you choose, and choosing slightly different primers will give you a completely different quantification, because now you might be biased towards crocodiles instead of fish, or towards whales instead of fish. So that was the thing that I really wondered about. But the method itself is relatively easy. It's a four-step method: DNA extraction, PCR amplification, sequencing, and then data analysis. Data analysis is nothing more than looking at the sequences that you have, comparing them to the database, and then saying, okay, I have a sequence here which matches a species. So which genes do they generally look at?
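The exponential skew from a small efficiency difference is easy to put in numbers. This is a toy calculation with assumed per-cycle efficiencies (1.0 versus 0.9) and an assumed 30 cycles:

```python
# Toy sketch of why a small per-cycle efficiency drop ruins quantification:
# amplification is exponential, so the mismatch penalty compounds.
# The efficiencies, copy numbers and cycle count are assumptions.

def amplify(copies: float, efficiency: float, cycles: int = 30) -> float:
    """Copies after PCR: each cycle multiplies by (1 + efficiency)."""
    return copies * (1 + efficiency) ** cycles

perfect = amplify(100, 1.0)    # primer is a perfect match
mismatch = amplify(100, 0.9)   # one SNP, 10% lower efficiency

print(perfect / mismatch)  # roughly 4.7-fold skew
```

So two species that were equally abundant at the start look almost five-fold different after the PCR, which is the quantification problem in a nutshell.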
And that I also found a little bit strange, because they use 16S and 12S and COI and cytochrome b for things like animals; for plants they have different markers, for bacteria they have different markers, for fungi they have different markers, and then for protists they have different markers as well. And of course there's not a lot of variability in some parts of the 16S gene, but I would bet that if you would just go to Ensembl and dump out all known animal 16S sequences, then literally every base pair would show variability if you look across enough species, because in the end it's biology, and we know that every base pair in the genome can mutate, and because it can mutate, it will mutate. And about the PCR step: yeah, you only need a little bit, but the problem is how much of each animal is really in the sample? And that's something that I found a little bit strange. So these are the genes that are used commonly, and of course if you look at a certain gene you also have to have a database, right, which uses this. I found that there are three major databases which are mentioned on Wikipedia and also cited a lot in the papers that I read about DNA barcoding. That is the Barcode of Life Data System, also called BOLD, and this is a database mainly for animals, and it is based on the COI genetic marker. So only if you use the COI marker and only if you look at animals can you use this BOLD database. You have UNITE, which is the reference database for molecular identification of fungal species, and they use the internal transcribed spacer genetic marker region, so they have this ITS marker. And then you have diat.barcode. This has data for two different genetic markers, and they don't really have a focus on any particular group; they are more or less multi-taxon. So they look at fungi and animals and diatoms and these kinds of things.
But you can see that if, for example, I would take the cpn60 marker in bacteria, right, then is there a database for that? There doesn't seem to be one single database which spans all of these different markers and has barcodes for all of these different groups of organisms. And of course when you're looking at biodiversity, then taking a sample and first doing PCR using 12S to detect all the animals, then using the ITS marker for plants, then cpn60 for bacteria and, for example, RPB2 for fungi and then COI for protists: there will be a lot of overlap between these things. So I'm a little bit uncertain about how well that will work. But at least there are some of these databases where you can look up your barcode and get a species assigned to it. Again, also here, people make mistakes, right? Every entry in the database is human work, so there will be mistakes. Some of the sequences or barcodes in the database will be coupled to the wrong species. So again you have a source of error here. And depending on which database you use, you could get a slightly different answer, because the barcodes might not be properly annotated, or the annotation might vary from database to database. Of course you can circumvent that by using multiple databases and cross-checking that way. But I found it a little bit strange. So when they talk about DNA metabarcoding, it's the same as DNA barcoding, but now they do simultaneous identification of many taxa. And the thing is that for DNA barcoding I'm 100% sure that it will work, because you're only looking at a single species, right? You're looking at a certain barcode and checking if it's there. So you can identify whether there is a certain mushroom or a certain fish in the water or not. But then you start to do this in multiplex, right?
And that's one of the things that bugs me the most about the method: it shouldn't be called DNA metabarcoding, it should have been called multiplex DNA barcoding. Like, there's a way of naming things in science, right? So it's a method for taxonomic identification, but the method's name is taxonomically wrong, which is just a little bit strange. The idea is that barcoding is for a single species; metabarcoding is when you look at your whole sample in one go. So it should have been called multiplex DNA barcoding, but for some reason they decided that metabarcoding was a more buzzword-y name, so they went with metabarcoding. But that's the only difference, as far as I could find. And of course it doesn't really matter: sequencers nowadays produce such a high amount of sequencing data that you can sequence millions of reads, and each of these reads might belong to a different species, so throughput is not the problem. So the applications here are: monitoring of biodiversity, which I agree with. You can monitor biodiversity. You can say that a species is absent or present, but biodiversity is not just whether a species is there or not. If you monitor biodiversity, you also want to keep an eye on population sizes. And that is something that you definitely cannot do with metabarcoding. You cannot say there are a hundred fish or a thousand fish in the lake. You could just say, well, there are three different types of fish, and it seems that this type is the most common fish, but even that is a very bold statement to make, because your PCR step might skew your results towards one of the three species. They also use it for paleontology and ancient ecosystems, which, again, yes, you can do. Plant-pollinator interactions: yeah, I didn't really see how you would do that non-invasively, right? Because you need to catch the plant and the pollinator.
So you just have to take a flower of a tree or a plant somewhere and then mash it up and get the DNA out of it, because insects do not leave massive amounts of DNA in the environment. Diet analysis: yeah, definitely. If you just take a poop sample from a certain fish, then you can see what the fish ate, and that is definitely a very good method. You can't see what the quantities were, but you can say, well, this fish predominantly eats insects and little algae or something like that. And they use it for food safety. Again, here, you might miss a massively pathogenic bacterium in your food which just has SNPs in the region that you're trying to amplify with your primers, right? Because if you have SNPs in your primer region, then this method just doesn't work at all. So the choice of primers is critical when you do DNA barcoding. So these are the five limitations that Wikipedia listed. I made a little file for this with what I thought would be there, because I agree with a lot of these. So the physical parameters: the physical parameters are very difficult to define, at least for me, right? Because one of the things is that it hangs together with the database being wrongly annotated. If the database is wrongly annotated, then of course you get wrong data out, right? If the database just made a mistake in annotating a certain species to a certain barcode, then from now on you will always detect the wrong species, and this is not something that is easily fixed. And the same thing holds for just dumping sequences out of GenBank or some other database: also there, a lot of sequences have been deposited but never annotated to a species, or some of them are just wrongly annotated. And they say that the best thing would be to limit yourself to a number of species which you know to be correct. But then the problem is one of the claimed advantages: the cryptic species, right? Or the rare species. You won't be able to find those.
The technological bias is something which I really, really worry about. Primers cannot be designed to be universal across all branches of the tree of life. Every animal will have SNPs at positions where you do not expect them to be. And this kind of fits in with the physical parameters: if you use a certain primer, then this primer will not work for some animals. And that is the main problem that I have with the method. There's a lack of standardization, but that is something that people are working on, because depending on which PCR protocol you use and which primers you use, you get slightly different barcodes. And of course you can only use a method like this for monitoring biodiversity if you have fixed standards: you say, we use these primers, this is the protocol, this is the way that we do our sequencing, and this is the way that we go from sequencing to barcodes. So there are some projects; I read up on DNAqua-Net, which has been done in Europe. So there it makes sense. So, yeah, standardization is very important, and without standardization this method is not going to take off. And then you still have the mismatches between morphological and DNA-barcode-based identification, for which Wikipedia gave a couple of really interesting examples. Plus, of course, the fact that if you want to monitor biodiversity, the problem with barcode-based identification is that dead animals still have DNA. So if everything in your lake is dead, you will still identify species in the lake, because the DNA will still be there. And that was something where I thought, hmm, yeah, that's not something that you really want, right? You don't want to have an empty cup of water with all kinds of dead little animals floating around in it and still conclude that there's a massive amount of biodiversity, because you could, right?
But if all of the animals in your sample are dead, then DNA barcoding will still pick up their DNA and still tell you that they're there, while conventional methods, just looking through a microscope, will tell you that there are no animals living in your sample, so the biodiversity is zero. So there have been instances where the estimations of biodiversity richness are completely different between looking through a microscope and using DNA barcoding. So these were some very serious shortcomings that are kind of inherent to the method, and they all come from where the method comes from. So some remarks that I wrote down: species can and will exchange DNA. If I think about seaweed, seaweed has a propensity to exchange DNA with its neighbors, and the same thing holds for coral reefs, right? If you have two corals and they have separate DNA, and then they grow, and when they touch each other they will merge their DNA, so you get a hybrid of the two. So all of a sudden you have a completely new species growing from where they touched. And if the target gene that you are using gets exchanged, divergent taxa will seem close together, or actually the same, while they're actually not. And that's also one of these drawbacks, right? It works very well for species that have one or two copies of their genome. But for things like onions, which have like 30 copies of each chromosome, it will not be a very good method, because onions also like to exchange DNA with their environment. And bacteria, the same thing: bacteria conjugate with each other and just exchange genetic material. So you can have an E. coli bacterium carrying the 16S rRNA of a Bacillus subtilis, and then using DNA barcoding you would say, well, this is a Bacillus subtilis. But when you would look through a microscope you would definitely say, no, this is an E.
coli based on the way that it looks. The method is also very, very susceptible to the gene chosen. And of course this is something that you can kind of circumvent by looking at four or five different genes in tandem: you could get an estimation based on each of these genes, and then the overlap between them is the thing that you want to trust. But again, there are issues there, because the genes don't always agree with each other. You have to decide which barcode belongs to which species, and it could be that you get a completely different description of what is in your sample based on the gene that you're looking at. The thing that I read in none of the papers, but which I had to think about, was this: you are doing DNA sequencing, but when we were talking about proteins and DNA and the tRNAs and these kinds of things, we already found that protein sequences are much, much more conserved over evolution, and the DNA sequence is not, right? The third base of a codon can often change anyway. So, just hypothetically, say you sequence three pieces of DNA. The first sequence and the second sequence have one single-nucleotide polymorphism between them, so you would say, okay, these might be two different species. But then the third one has like 15 single-nucleotide polymorphisms compared to the first one and 16 compared to the second one. Yet if the protein encoded by the third one is identical to the first one, then it's actually more or less the same species, because the DNA is much, much less stable than the proteins are. If you take a protein which is 100 amino acids long, it is encoded by 300 base pairs, and a good number of those third-codon positions can change without changing the protein. And of course within a species this happens; mutations occur. And mutations are only selected for or against when they change the protein structure.
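That wobble-base argument can be shown with a tiny translation example. The codon table below is just the handful of codons needed here (a real table has all 64), and the sequences are invented:

```python
# Toy sketch of the wobble-base argument: DNA-level SNPs that leave the
# protein unchanged. Only the codons used in this example are included;
# the standard genetic code has all 64.

CODON_TABLE = {
    "ATG": "M", "GCT": "A", "GCC": "A", "GCA": "A",
    "TTA": "L", "TTG": "L", "AAA": "K", "AAG": "K",
}

def translate(dna: str) -> str:
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))

seq1 = "ATGGCTTTAAAA"   # reference
seq2 = "ATGGCCTTGAAG"   # three SNPs, all in third codon positions

print(translate(seq1), translate(seq2))  # MALK MALK
```

Three DNA-level differences, three "barcode" mismatches, yet an identical protein, so the barcode distance does not necessarily reflect a functional or species-level difference.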
So when it happens in the wobble base, then your barcode changes, but the species hasn't changed. You can have a lot of differences at the DNA level resulting in no differences at the protein level, which would then not lead to different species. Primers can and will fail. As you need to target a gene which is variable, there will be variability in the region of your primers too. And since there is variability in the region of your primers, your primers will be massively biased towards species where they fit perfectly compared to species where there are one or two SNPs. And this is massive: even a 10% drop in PCR efficiency will, because it's an exponential method, in the end make your quantification impossible. So the whole method is only qualitative, not quantitative. In the end, you still need to look through a microscope and see how many of species A are there and how many of species B, because DNA barcoding will never be able to tell you that. All right, then I had a couple more remarks. It seemed to me that this whole method is partly there to save money and be a bit lazy, in a way, right? You just take your sample, you throw it in the sequencer, and then the sequencer will tell you what's going on. But this is not a good way to keep track of biodiversity on its own. The main issue that I had is that the method is definitely not novel. It is just a new label on a 30-year-old established technique, and the original technique is called 16S rRNA sequencing, used to identify bacteria. And this was already published in the 1990s. So it's a 31-year-old method on which, around 2015 or '16, they just put a new label and then started selling it as a new method.
But it is not; it's just the same 16S rRNA sequencing method. And for 16S rRNA there is a very, very good reason to use it in bacteria, because the ribosomal RNA is kind of fixed in certain areas, right? There is some variability in 16S rRNA, but there are regions where SNPs essentially cannot occur, because they would disrupt the structure of the ribosome or the binding with the transfer RNAs. So there are definitely regions in 16S rRNA which are very stable across thousands of species, just because of the way that the coding of life works. But for the other genes that they mentioned, like the COI gene, I don't see how you can find a stable region in this gene which is preserved across 100,000 different species. So those are the main drawbacks that I had with the method. It's not a bad method; it's a good method to quickly screen which species might be in your sample, right? They might all be dead, and you might still pick up DNA which is floating around from fish that died ten days ago. So if there would be an oil spill in the lake that you're studying, then probably only after a week or so will you find a drop in the biodiversity, because stuff will start dying off, and only after a certain amount of time has passed do you find that back in your sample. So yeah, it's a good method, but I would not say that we should replace the standard method with it. The standard method of doing diversity analysis hangs on determining how many species are living in an ecosystem, but also what their population sizes are. Yeah, because you get no idea whether a species in your DNA barcoding sample is really rare or really common. And those are my remarks. So, is there anyone who has any comments, or thinks that I'm completely wrong and it's the best method ever and something completely different from 16S rRNA sequencing?
And of course 16S rRNA sequencing is still used, right? They still have 16S for bacteria and also for animals. But yeah, I think it's a good method, and the problem is that there's no easy way out of the shortcomings. So you have to continue doing the classical thing, looking at your sample and counting how many of each species there are. You still have to do that in the end if you want data which is good and reliable. All right, so if there are no comments or questions, then let's just continue with the literature management section. And I did download an example data set, by the way. So you can play with the example data that's there, which is good, I like that. I like it when people have a little example data set that you can play with: go to a database, figure out what's in the example data set. So it does work; it's just that I don't like the lack of quantification, and that's because I am a quantitative geneticist, so I don't really like qualitative methods. But that's just my opinion. All right, so if there are no further questions, then we will continue with bioinformatics, literature management, and I have to hurry up a little bit because we're already at 37 minutes of recording. So this is the overview for today: PubMed, citations, Web of Science and these kinds of things. I think everyone knows what PubMed is; PubMed is the main database for biomedical research. All right, Jan Haga. [Student: I think it's all about study design and the questions. If you're interested in migration and biodiversity in stream-inhabiting fish, it can be quite nice.] Yeah, of course it can be a very, very good method, because you still get a lot of information, right? You get the information whether a certain fish is there, and you can sample over a spatial and temporal range. Yeah, no, that's definitely true. It has a lot of advantages, right? It's just that if this would be your only method to monitor biodiversity, then the shortcomings would be humongous, right?
Because then you would get the wrong idea of what's going on, and that's the thing that I don't really like. So there's still the need for the old method of just looking at the sample and quantifying it. But you're right: especially for things like fish, if they really shed a lot of their DNA into the water and you can easily test it from the water samples, then definitely it will work. And you can keep track and you can see: well, I have a river, and I haven't found salmon DNA in the river in the last 10 years, and now all of a sudden I find salmon DNA. So that would say, well, it might be that salmon are migrating back into the river at a certain point in time. So it has its advantages, and it has its place in looking at biodiversity. It's just that it should not be the only way, and you should not get rid of the old way of doing it, because the old way of doing it was quantitative and this method is really qualitative. It just gives you a yes/no answer, which is still better than not having data. And since it's relatively cheap, you can do a lot of samples. Yeah, you should combine it with tried-and-true methods, especially if you use it in things like paleontology. I think for paleontology, it really makes sense to do something like this. If you're interested in, for example, early human diet, then a method like this can tell you if a certain plant was eaten, yes or no, right? So that's very useful. "This method is already being used in the Amazonas to track species diversity." Yeah, yeah, yeah. And you can use it to a certain extent. It's just that the weird Amazonian fish which is highly endangered and only seen two times in the last 20 years, this fish you cannot track that way, right? Because you would have to have DNA from this fish and have it barcoded in your database, and only then will you be able to follow it. Especially if this fish has mutations in the primer regions that you are using.
But by using many genes and by using different primers, these things would work. So I think it's an interesting method, and I'm definitely certain that it has its place in science, and I like the high-throughput nature of it, right? That's one of the advantages. You can just go somewhere, scoop up a little bit of water all along a river, and then do the analysis when you get home, instead of having to dredge up the whole river to figure out what's there. So in that sense, it is more non-invasive, good. So yeah, it's interesting. And of course, it's not a novel method; it's 30 years old at the minimum. At least that's how far I could go back and find reviews of 16S rRNA sequencing. All right, so let's quickly talk about PubMed, right? And leave the DNA barcoding behind. We can talk about DNA barcoding more at five if you guys are still up for discussion. So, PubMed contains 26-plus million citations for biomedical literature; it's not so much the citations, but it indexes a whole bunch of databases, right? And this is the main starting point for scientific literature. So the nice thing about PubMed is that you have this button which allows you to export a certain paper, or the citation to a paper, into your favorite reference manager. It is a free resource. I love free resources. It is developed and maintained by the NCBI, the NLM, and the NIH. And it provides you with a whole bunch of advanced search options, which is really, really nice. So the advanced search options in PubMed: hey, you can search by keyword, by author, by journal name, by year of publication. Keywords allow you to identify key concepts in your search. You enter the term in the search box, and the nice thing about PubMed is that it will suggest things while you are typing. So if you're like me and a little bit dyslexic, when you start typing, it will say, well, did you actually mean pig with an I instead of an A? So hey, it's very good.
And then you just click search and it will give you results. So if you wanna find out what role pain has in sleeping disorders, then of course you use keywords like pain, sleep, and disorders, and it will give you literature. But you can do a lot more. You can use quotes: if you don't want papers which just have sleep and disorder somewhere in them, because that's almost every paper ever written, but you only want "sleep disorders" as a phrase, you have to quote it, so that's important. You can use the standard Boolean operators AND, OR, and NOT. So sleep disorders AND pain will give you papers which talk about both sleep disorders and pain, while OR will give you papers which talk about sleep disorders or papers that talk about pain. And you can construct very specific queries like pain AND sleep disorders AND apnea. And when you use parentheses, you can group terms together. You can also search specific fields, and fields are enclosed by square brackets. So if you wanna search for all QTL papers published by me, then it's QTL AND Arends, and Arends of course has to be tagged as the author, so you don't find papers where they just talk about an Arends. And you can also restrict things like the language, which is really handy. These are all the available fields, and I just wanted to mention that. So if you're interested in all publications from a certain grant, or certain ISBNs, then you can just search for that, which is nice. You have the query builder, which I like a lot. If I have to do a whole bunch of very specific queries, especially when you do a literature review, right? Then you want to have all available literature from a certain time range, published in a certain field, in a certain language. And then the query builder helps you make a pre-selection of all the papers that are important. And then you still have to go through them manually to see if the papers are really about what you want.
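As a sketch of how the quoting, Boolean operators, and field tags fit together, here is a minimal Python helper. The function names `field` and `build_query` are made up for illustration; only the `[au]`/`[la]` field tags and the AND/OR/NOT syntax come from PubMed itself.

```python
# Hypothetical helpers for assembling PubMed-style query strings.

def field(term, tag=None):
    """Quote multi-word phrases and append an optional [field] tag."""
    quoted = f'"{term}"' if " " in term else term
    return f"{quoted}[{tag}]" if tag else quoted

def build_query(*parts, op="AND"):
    """Join already-formatted parts with a Boolean operator."""
    return f" {op} ".join(f"({p})" for p in parts)

# All QTL papers by a given author, restricted to English:
query = build_query(field("QTL"), field("Arends", "au"), field("english", "la"))
print(query)  # (QTL) AND (Arends[au]) AND (english[la])
```

Pasting the resulting string into the PubMed search box gives the same result as clicking the pieces together in the query builder.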
But if you want to see all of the QTL papers that I published in PLOS ONE in the English language, then the query builder can help you get a very short, very specific list. If you search by date, you have to be aware that the date is in the format year/month/day, which is not the regular way that we write dates. And the structure that you can use is from-to, and you have to tag each date as a publication date, right? Because that is one of the fields that you can search by. So if you want to search publications between 2001 and 2015, then this is the way you write it down: 2001 tagged as date of publication, colon, 2015 tagged as date of publication. If you ever have to do a literature review, using some of these advanced search options will be really handy. All right, a little bit about citations, because citations are the bread and butter of your scientific career. If you want to have a career in science, then citations are more or less the thing that helps you stay in science. But there are other reasons as well why citations are there and why they are important. A citation is a reference to a published or unpublished source. So you can also cite things which are not publicly available or which haven't even been published yet, and that's something a lot of people forget. If you are talking to another scientist and they tell you something which is not published yet, then you can still use that in your paper. You can still write a citation to this conversation that you had with a certain scientist or a certain politician. Citations are there to uphold intellectual honesty and avoid plagiarism, to attribute prior work and ideas to the correct sources, and to allow the reader to determine independently whether the referenced material supports the author's argumentation.
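A small sketch of building such a date-range query, assuming the `[dp]` tag for Date - Publication and PubMed's year/month/day ordering; the helper name `date_range_query` is hypothetical.

```python
from datetime import date

def date_range_query(start, end):
    """Format a PubMed publication-date range: ("YYYY/MM/DD"[dp] : "YYYY/MM/DD"[dp])."""
    fmt = "%Y/%m/%d"
    return f'("{start.strftime(fmt)}"[dp] : "{end.strftime(fmt)}"[dp])'

# Publications between 2001 and 2015:
print(date_range_query(date(2001, 1, 1), date(2015, 12, 31)))
# ("2001/01/01"[dp] : "2015/12/31"[dp])
```

Using `datetime.date` objects here means the year/month/day ordering is enforced by the formatter, so you cannot accidentally write the date the "regular" way.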
"Citation needed" is one of these phrases that you see everywhere, but citations are the thing that make science, science. Without citations, science would not be science; it would just be a collection of stories. Because all of these things build on each other, science is more or less a big graph in a way, right? Every paper points to other papers, which point to other papers again, going back all the way to when we started using citations. A citation index is an indexing service of citations between publications, and citation indexes are really, really old. The first citation index is a 12th-century index of the Hebrew religious literature. This was a kind of handwritten, more or less, database, right, because they didn't have computers yet. But in the Jewish religion they have a lot of these rules, and these rules are based on other rules, and those rules are again based on other rules. So all of these things go back; they are all building on each other. It's like a judge delivering a verdict, and then this verdict is used by other judges again. And this is also where the origin of citations comes from and why it is so important: because you are building on previous work, right? In the 18th century, you had the legal citation indexes, but the first real scientific citation service, or citation index, was founded in the 1960s, which is the Institute for Scientific Information. And then in 1997, the first automated citation indexing service was provided by CiteSeer. Nowadays a lot of people use Google Scholar or Elsevier's Scopus, but there are drawbacks to using these.
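That "big graph" of papers pointing at papers is exactly what a citation index stores. As a minimal sketch with made-up paper IDs, an index is just the reverse of the reference lists: instead of "paper, what does it cite?" it answers "paper, who cites it?".

```python
# Forward direction: each paper's reference list (made-up IDs).
references = {
    "paper_C": ["paper_A", "paper_B"],
    "paper_B": ["paper_A"],
    "paper_A": [],
}

# Reverse direction: the citation index (paper -> papers citing it).
cited_by = {}
for paper, refs in references.items():
    for ref in refs:
        cited_by.setdefault(ref, []).append(paper)

print(cited_by.get("paper_A", []))  # paper_A is cited by paper_B and paper_C
print(cited_by.get("paper_C", []))  # nothing cites paper_C yet
```

Counting the entries in `cited_by` per paper is already enough to compute "times cited", which is what the author-level metrics later in this lecture are built on.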
So citations go back a long time, and their origin is in religious, more or less, laws and rules, which was then taken up by the legal field, which made citation indexes of rulings that judges did, so that you keep it fair: so that someone doesn't get punished with 10 years for stealing a horse while another guy gets the death penalty, right? You build your sentence based on what people got before. All right, so there are four major citation indexing services currently available. There's the Web of Science by Clarivate Analytics, which was previously owned by Thomson Reuters. You have Scopus by Elsevier, which is nice because it's available online. You have CiteSeer, which is the oldest automated citation index, and you have Google Scholar, which is more or less the youngest citation index, but it's run by Google. That's why it's relatively popular: the search functions are just amazing compared to the other ones. Web of Science is an online subscription-based scientific indexing service, right? It provides a comprehensive database, it searches across many databases, and it's very cross-disciplinary. So it's not just for biology, and not just for, like, economics or other things. Web of Science has index coverage from around 1900 to the present day. The big issue that I always have with Web of Science, especially since I've been working in Germany for some time now, is that titles of foreign-language publications are translated into English, and you cannot find them in the original language. So the really nice paper that we wrote for the German diabetes journal, which is in German, you cannot find by its German title in Web of Science. You have to translate the title into English and then search for that, but there's often a mismatch. So it's really hard to find non-English literature. But Web of Science itself is really good, and it consists of more or less seven different databases.
So there's the Conference Proceedings Citation Index, which covers more than 160,000 conference titles, and this ranges from the 1990s onward. So if you're interested in the Wannsee Conference, that won't be in there; that's way too old. Then you have the Science Citation Index Expanded, which is 8,500 notable journals encompassing 150 disciplines, and this ranges back to around 1900. You have the Social Sciences Citation Index, which is 3,000 journals in the social sciences, also 1900 to the modern day. And the Arts & Humanities Citation Index is relatively new; it starts in 1975 and covers arts and humanities in major scientific and social-science journals. The database that is most important for chemists is the Index Chemicus, and that lists more than 2.6 million compounds with their inventors, which matters because if you're a chemist and you make a new substance, then you wanna know: am I the first one to make this, and can I write a nice publication about it? This goes back to around 1993. Older compounds like penicillin you can also find, but before 1993 it's relatively sparse and relatively hard. Still, you can get a good idea there. There's also Current Chemical Reactions, which goes back to 1986, and some of the older archives have also been indexed; chemical reactions are also something that is citable in literature. And you have of course the Book Citation Index, which covers more than 60,000 selected books; that is the newest addition, because books in Web of Science only go back to around 2005. Why am I talking so much about Web of Science? That is because if it is not in Web of Science, it either does not exist or it is not science. So if you are publishing in a journal, and this journal is not indexed by Web of Science, then officially this is not a peer-reviewed paper.
It can be peer-reviewed, but it's not scientific, because it's only science when it is in Web of Science. So that is something to remember, especially with a lot of these predatory journals which have a name that looks very much like well-known journals. The key is: if it is not in Web of Science, it is not science, or it does not exist. So also when you do citations, make sure that the things you are citing are in Web of Science, and don't build your citations on just random journals that you find online. I told you that it's subscription-based, but you can get free access to the database. The only thing you have to do is use ResearcherID. If you make an account on ResearcherID, then you can sneakily search the Web of Science database via ResearcherID. So that's a tip for you guys: if you ever wanna look into Web of Science and you don't wanna pay for it, or you're working at a university that doesn't have a subscription, go to ResearcherID, make a personal account there, and that will allow you to search through the database. Officially it would only allow you to search for your own papers, but you can also search for other people's papers. So it is a good kind of backdoor into the Web of Science database. All right, so this is my Web of Science information. This is already very, very old, because the last update was 2017; this slide hasn't really been updated that much. You can see the number of articles that are in my publication list, the articles which have citation data, which have been cited at least once. Then you get the sum of times cited, which in this case would be 344. And then you have the average citations per article, then my h-index, and then the last-updated date. So here you can see that on average, when I write a publication, it is cited around 20 times, which is not bad in a way.
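These author-level metrics are easy to compute yourself. A minimal sketch, assuming you have a list of per-article citation counts (the numbers below are made up): the h-index is the largest h such that h of your articles each have at least h citations, and the i10-index (used by Google Scholar) counts articles with at least 10 citations.

```python
def h_index(citations):
    """Largest h such that h articles each have at least h citations."""
    counts = sorted(citations, reverse=True)
    return sum(1 for rank, c in enumerate(counts, start=1) if c >= rank)

def i10_index(citations):
    """Number of articles with at least 10 citations."""
    return sum(1 for c in citations if c >= 10)

citations = [45, 33, 20, 12, 8, 5, 2, 0]   # made-up citation counts
print(h_index(citations))    # 5  (five articles have >= 5 citations each)
print(i10_index(citations))  # 4  (four articles have >= 10 citations)
```

The average citations per article is then just `sum(citations) / len(citations)`, which is the "cited around 20 times" number from the slide.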
And for the average citations per article, of course older articles rack up more citations; that's just normal. But this is the information that you would normally give in a job application for a professor position or a postdoc position. All right, so citation indices are also used to build journal-level metrics, such as the journal impact factor. And the ones that I just showed you, those are author-level metrics. I am an author, so I have author-level metrics, and that is, for example, the average number of citations per article, or my h-index, or my i10-index. And I've been recording again for 57 minutes, so we will do a break, and then we will continue talking about the journal impact factor and the drawbacks and the advantages of having those. All right, so let's stop the re...