So welcome back, everyone. Literature management today. If you're watching this on Twitch, thank you for still being here. If you're watching it on YouTube, then thank you for watching it on YouTube.
Good, so literature management. Today I wanted to talk about a couple of things. I want to talk about citations. I want to talk about Web of Science, Google Scholar and ResearchGate. I want to introduce you to different measurements, like the H-index and the i10-index. They are not directly related to bioinformatics; in a way they are more meta. They relate to science and to how we as a scientific community rate the output of scientists.
I also wanted to talk about scientific reference managers, because I think it's one of those topics that often gets overlooked, and it's very important for writing papers efficiently. For example, if you plan to submit your paper to Nature and Nature rejects it, then you want to go to a different journal, because you do want to have your work published. Then you have to redo all of the citations, because every journal has its own citation style. By using a reference manager, you can do this without a massive investment of time. So that's why I wanted to talk about it.
And I wanted to talk a little bit more about version control. I introduced version control in lecture number one, I think, which was way, way too early. Like I told you guys before, I wanted to use it during the assignments, but in the end I just did the assignments the old way. So that's what I wanted to talk about today.
So let's just start. I think everyone more or less knows what a citation is: a citation is a reference to a published or an unpublished source. But there are three reasons why we use them. We don't use citations just because they feel good. We use them because we want to uphold intellectual honesty and avoid plagiarism.
If I make a statement in my paper saying that, based on research by these people, they found this, then the reader of my paper needs to be able to go to that citation and see whether that is really what those authors found, right?
Question from chat: "I have a folder named Science." Yeah, I probably have a folder named Science as well. You mean where you store all of your publications, or where you store all of your citations? Yeah, well, we'll get to that. There are better ways of storing these kinds of data.
But citations are there for three main reasons. The first is to uphold intellectual honesty: if I write my introduction and make a certain claim, then I need to add a citation so that people can verify the claim. The second is to attribute work and ideas to the correct people. The third is to allow the reader to determine independently whether the referenced material supports the author's argument. Because I could write down anything and then just cite any paper, right? I need to make sure that the statement I'm making, the thing I'm using to build my hypothesis, is actually a conclusion from the work that the other people did. So those are the three reasons why we have citations.
The idea of citations is very old. Citation indexes, so databases where papers and citations are registered, go back to the 12th century. The first citation index is in Hebrew religious literature. When the Jews had their first diaspora across Europe, they had all kinds of different laws, and these laws were slightly different in different areas. And in law you want to be fair and you want to be just, right?
You don't want to give someone a punishment for a crime that is different from what someone else got in a similar situation for a similar crime. You don't want to all of a sudden give 10 times the amount of prison time. The religious literature of the Jewish community contains a lot of laws, rules and regulations. Every time a new person came along with their own interpretation, they would cite other people. They would give credit, saying: no, this is not my idea, it's the idea of this old rabbi who said this.
In the 18th century we get the first real legal citation index. That's when, more or less across Europe and the US, rulings were standardized. When you committed a crime you would go to a judge, the judge would sentence you, and the sentence together with the crime would be registered. The next time someone else committed a very similar crime, they could look up what had been given in the past, so that there could be a kind of uniform judging system.
In science we've been using citations for a long time, but the first citation index, the first database holding publications and their citations to other publications, was more or less founded in the 1960s with the Institute for Scientific Information. In the 1960s this was manual: someone would read Nature, see a new paper, and put it in the database. Then in 1997 we get automated citation indexing, and the oldest automated citation index is CiteSeer. Nowadays we also have Google Scholar, which is relatively new, and Elsevier's Scopus, which is also a relatively new database that tracks citations.
So there are a few major citation indexes in use nowadays. We have Web of Science, which is owned by Clarivate Analytics and was previously owned by Thomson Reuters. We have Scopus by Elsevier. We have CiteSeer, which like I told you guys is more or less the oldest automated one. And of course we have Google Scholar, which I think everyone knows: if you search for scientific literature, then instead of going to google.com you go to scholar.google.com, just so that you don't get all of the BuzzFeed articles. BuzzFeed is not science. Web of Science is more or less the main one. Scopus and Google Scholar are both available online only.
Comment from chat: "It has a citation button too, very useful." Google Scholar? No, that's not very useful. We'll get to that, but never, ever type citations by hand. It makes no sense. There's an example later of how to use a reference manager to do this properly. You only set it up once, and once you've used a reference manager you'll never want to use anything else.
Good, so a little more background on Web of Science, which I think is the biggest one. It's an online, subscription-based scientific citation index. It provides a comprehensive search, it has access to multiple databases, and it's cross-disciplinary: it's not only for biology or medical sciences, it's also for chemistry and physics. The Web of Science index covers publications from 1900 onwards. So when they started with Web of Science, in 1981 I think, they manually went back in time and entered something like 81 years of data.
The biggest issue with Web of Science is that titles of foreign-language publications are translated into English, and you cannot find them in the original language. For example, one publication I was involved in is a German article in a German journal. I can find it in Web of Science, but I have to translate the title before I can find it. It's about Fettleibigkeit in the Berlin Fettmaus, and then of course you have to translate things like "Fettmaus", and there's no one-to-one translation. And of course, if you write things in German, the direct translation into English might not make that much sense, but that's the way Web of Science works. They translate the title (not with Google Translate, but they translate it) and put the translated title into the database, not the original one.
Web of Science itself consists of seven different databases. The first is the Conference Proceedings Citation Index, which follows conferences. There are a lot of scientific conferences out there. Generally when you give a talk at a conference, you submit your abstract, and this abstract gets published as a kind of supplementary material to a real journal. So titles of conference talks are indexed in Web of Science as well, because of course you can cite a talk. If you hear a talk by another scientist and they claim something during the talk, that is citable, and the way to cite it is to cite the presentation they gave at the conference.
We have the Science Citation Index Expanded, which covers the 8,500 most renowned journals across 150 disciplines, from 1900 to the present day. We have the Social Sciences Citation Index, which is the top 3,000 journals in the social sciences. So that's not biology, that's social sciences.
So those are different journals from the ones we generally publish in as biologists, but if you are in the social sciences, Web of Science covers them as well. There's the Arts & Humanities Citation Index, because of course the arts and humanities are also science. This one doesn't go all the way back to 1900; it starts at around 1975. So articles published in humanities journals in the 1950s are generally not readily accessible in Web of Science.
We have Index Chemicus, which is for people who do chemistry. If I come up with a new chemical, I can publish it: I can write a paper about the chemical, how I synthesized it, and what kind of compound it is. Since 1993, every newly synthesized chemical is more or less deposited in Index Chemicus, so it holds about 2.6 million different chemical compounds. If you use a chemical compound, you can always use Web of Science to figure out who first synthesized it, who is more or less the inventor of the compound. You don't really invent compounds, but you get what I mean.
Then we have Current Chemical Reactions, which is very similar to Index Chemicus but for reactions. So it's not a single compound that you synthesize, it's a reaction. For example, if I combine water and alcohol, they mix and I get a mixture of water and alcohol. If I take gunpowder and add fire, I get an explosion. So, chemical reactions.
And then of course we also have the Book Citation Index: more than 60,000 editorially selected books from around 2005 to the present are inside Web of Science.
So why do I want to talk so much about Web of Science? Because in science there's this rule: if it is not in Web of Science, it does not exist, or it is not science. So if you ever want to publish in a journal, make sure that the journal is indexed by ISI Web of Science.
There are a lot of predatory journals out there, like the "Hindalu Indian Journal of Advanced Experimental Medicine". That's just something I came up with, like the "Romanian Journal of Vampirology" or the "Journal for Yeti Research", or whatever Bigfoot kind of creature you want to research. So there are a lot of journals out there which look like scientific journals. But if they are not indexed in Web of Science, they do not exist; it is not science. A BuzzFeed article that you read online is not indexed in Web of Science, so it's not science.
Web of Science is subscription-based: you have to pay to get access. Fortunately there's a sneaky backdoor you can use, and that is to make an account on ResearcherID. With that account you get the ability to search through Web of Science. They provide this so you can search for your own articles, but you can also search for other people's articles, which is not the idea, but you can. So if you want to check whether something is in Web of Science, go to ResearcherID, make an account, and with this account you can search through the whole Web of Science for free.
Chat: "Sounds legit." What sounds legit? The Romanian Journal of Vampirology or the other one? Yeah. It's really fun to come up with journal titles, but journals like this actually exist. Just search for predatory journals on Google and you get a whole list of journals that are, for example, "Science" written with a 3 at the end instead of an E. The reason these journals exist is just to make money. They pretend to be a big, international, well-known journal, but they're not. They just want you to pay them money to publish your articles.
So remember: if it's not in Web of Science, it does not exist, or it is not science.
Good, so I wanted to show you my Web of Science citation info, just so you have an idea of how it looks. In total I have 51 publications in Web of Science. I've written more publications, but some of them are in journals that are not indexed in Web of Science, and some are, for example, on blogs, so they don't get indexed. If I look at Google Scholar, it will say I have 75 publications. For example ... oh no, my thesis nowadays is indexed in Web of Science. Anyway, you can get a ResearcherID, search for my name to see which publications are in Web of Science, and then look at my Google Scholar profile and see the difference.
According to Web of Science I have 71 articles with citation data. That doesn't seem right. I think this is a mistake from when I looked it up; I think it's 47 instead of 71, but I had to type it in because I couldn't copy-paste it.
The sum of the times cited: according to Web of Science, I've been cited 944 times. We will get to Google Scholar and then you can see the difference. So 944 publications that are also in Web of Science cite my work. If someone writes a master's thesis, they could cite one of my papers as well. Google Scholar counts this as a citation; Web of Science doesn't, because master's and bachelor's theses generally don't get indexed in Web of Science. So each article that I publish gets about 18 and a half citations on average, and my H-index is 13. They also provide this nice map of where you have been cited. Of course the numbers on the map are much bigger than the 944, because every author on the author list gets counted.
So if a paper with 50 authors cites me, it adds 50 counts to the map: one of the authors might live in Northern Europe and another in North America, and then both regions go up by one. But it gives you an idea of where most of your citations come from. Most of mine come from Europe, so Northern Europe, and from North America. A couple of people in South America cite me, some people in South Africa, some in China, but not a lot from Oceania, for example.
All right, so we have these citation indices. Why do we have them? We have them because we want to judge scientific output, not just for authors but also for journals. A journal can be a good journal because a lot of scientists refer to it. And for you as an author it matters too: you have this publish-or-perish paradigm in science. If you don't publish, you're not a good scientist; that's the idea behind it.
So there are two types. There are journal-level metrics, so citation scores for journals. Nature has a very high citation score while, for example, the Romanian Journal of Vampirology does not. And there are author-level metrics: things like the average citations per article, which like I just showed you is 18 and a half for me; the H-index, where mine is 13; and the i10-index. I will talk more about these later.
But first I want to talk about the journal-level metrics. These are generally summarized in the journal impact factor: how much scientific weight a certain journal has. The journal impact factor of an academic journal is a measurement reflecting the yearly average number of citations to recent publications, right?
So it is the number of citations received in a given year by articles published in that journal during the two preceding years, divided by the total number of articles published in the journal during those two years. You have to normalize for the total number of articles they publish. This means that if you start a new journal, you have to wait two years to get an official ISI Web of Science impact factor: the citations received in year three to articles published in years one and two give you your first impact factor.
So it's more or less just an average. If people tell you a journal has an impact factor of 30, that means the articles published in the two preceding years were cited 30 times on average. There's a huge range in that, it could go from zero to 500, but on average every article published in the journal gets 30 citations. And of course that is much higher than an impact factor of five, where every article only gets five citations on average.
There is a lot of controversy about the journal impact factor. The main consensus within the scientific community is that it is not really a reliable instrument for comparing journals. If you make a list on the internet, people will try to get to the top of that list, and the same holds for journals. Journals are not people, but they're groups of people, and they will try to game the system. They will try to make themselves look better than they really are, because the higher your impact factor, the more money you can charge for publishing articles. If I submit a paper to Nature or one of the big Nature group journals, I'm paying something like $10,000 just to get my article published.
Whereas if I submit it to, for example, the International Journal of Obesity, which has a much lower impact factor (still a very good one for obesity research), I would pay around $2,000. And if I submit to a journal with an impact factor of one or two, then in general I pay much less. So because they make money out of this, journals will try to puff themselves up, to make themselves look more important than they really are.
One way is that journals may publish a large percentage of review articles, which are generally cited more than research reports. If you do original research, only people who rely on your research will cite the paper. But if you write an article that just describes everything published in the last 15 years about DNA sequencing, you're summarizing hundreds of original reports in one paper. Such papers generally get cited much more than the original work, so big journals tend to favor review articles.
It also means that a journal is very likely to decline an article if they think it will not be cited much. If I am working on an animal that is not widely researched, Nature will not take my publication, just because they think: this guy is working on some little bug that lives in the tropics, which no one researches, so no one will cite his work. I might have done top scientific research, and all of the results and conclusions might be rock solid, but Nature will never take it. So when you read Nature and Science, you see articles on hot topics. Nature and Science like to publish things about chocolate. Why? Because everyone loves chocolate. Or about coffee, because everyone loves coffee, right?
So these things tend to end up there. Although the science might not be as good, since the topic is popular and a lot of people are interested in chocolate, they will publish it more readily than the work of someone studying an antioxidant that is actually proven to be effective for X. If I write a paper saying that eating chocolate makes you immortal, then Nature says: wow, we want that. Even though the finding might not be as solid as I claim it to be, Nature will say: that's something people will cite. And not just if it's true. Even untrue claims work for them, because if I claim chocolate makes you immortal and 500 other scientists write papers saying this guy is wrong, they still cite the original article. So journals largely decline to publish articles that are unlikely to be cited.
Another thing journals do is this. I submit my paper in March. It goes into review, which takes a couple of months. In July everything is fine: all of the reviewers say this is a good paper, you should publish it, it's very high impact and will probably change the field. Then Nature will say: yeah, but if we publish this in July, it only has one and a half years to rack up citations. Let's wait. Let's publish this article on the first of January. So for six months they keep the paper hidden. Why? Because the impact factor is calculated over the last two whole calendar years, which means the window starts on January 1st. A paper published halfway through the year only has half of that first year to contribute to the impact factor of the journal. So a lot of high-impact publications are more or less held back for a time until they say: ooh, now we can publish it.
And that's generally January or February, because then the paper still has the rest of the year to get its initial citations.
Then there's also this practice. You write your article and you claim something, saying that based on scientific work published in the International Journal of Obesity, they found this. And then you submit your paper to Science or Nature. I use them as examples because they're really high-impact journals and everyone knows them. I'm not saying that they do bad work, definitely not. I'm also writing a paper for Nature Genetics at the moment, so I'm not saying that they are doing this, but it happens, especially in the higher-impact journals. So I write something and I cite a relatively unknown journal for it, because the original authors published in a relatively small journal or a journal with a relatively low impact factor. Then the editor might come to me and say: you know this citation that you use, why don't you cite this nice review paper that we published a couple of months ago? Because that helps them. Now you're not citing the little unknown journal, you're citing Nature or Science.
So as an exercise for you guys: read a couple of Nature and Science papers in the coming week and look at the citations. Look at who they cite. You can definitely see that the higher the impact factor of the journal, the more citations go to that same journal, a kind of journal self-citation. This is because an editor forces the author to add these citations. They're saying: if you don't cite our journal, we are not going to publish your article, and you can go to one of the lower-impact journals. And that is of course very bad, because it corrupts the scientific process.
Good. So that's what I wanted to say about the journal impact factor. It's a difficult topic, right?
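To make the two-year impact factor calculation described above concrete, here is a minimal Python sketch. The function name and all of the numbers are made up for illustration; they are not taken from any real journal, and this is not how Web of Science itself implements the calculation.

```python
def impact_factor(citations_in_year: int, articles_prev_two_years: int) -> float:
    """Two-year journal impact factor for a given year.

    citations_in_year: citations received in that year by articles the
        journal published during the two preceding calendar years.
    articles_prev_two_years: number of articles the journal published
        during those two preceding years.
    """
    if articles_prev_two_years <= 0:
        raise ValueError("the journal needs articles in the two preceding years")
    return citations_in_year / articles_prev_two_years


# Hypothetical journal: 120 articles in year one and 80 in year two.
articles = 120 + 80
# Hypothetical count of citations received in year three to those 200 articles.
citations = 6000

print(impact_factor(citations, articles))  # 30.0
```

Because the result is a plain average, it says nothing about the spread: the same value of 30 could come from every article being cited 30 times, or from a few heavily cited reviews and many uncited papers.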
No one in science really likes to talk about these practices, but I think it's very important that you as an author are aware that you can do very, very good scientific work, your findings can be very relevant to the field, and still your article can be rejected by a whole bunch of higher-impact journals for these reasons. Or it's the wrong time of year, and publishing it now would have it count for only one and a half years. A paper published in October only has three months plus one year to count towards the impact factor of the journal, so they will generally want to wait for January.
Good. I will take a short break. (If you're watching this on YouTube, no: I'm going to cut this out of the recording.) Let me actually look: my moderator put a link in the chat. Right, "potential predatory scholarly open access publishers". Here you see a whole bunch of really interesting ones. I had never looked at that list, but there are bound to be a whole bunch of predatory journals on it. We'll get back to predatory journals as well, but let's take a break now and let it sink in, and then we will talk about author-level metrics and how these affect us as scientists. I'll be back in about 10 minutes, and you will get animated GIFs in the meantime.
I also want to talk about author-level metrics, because as an author it's important that you look good in the scientific community. One that is used a lot is the average citations per article. Why? Because it's very simple to calculate. Of course the problem with averages is that if you have 100 papers, 99 of them have no citations and one of them has 10,000 citations, then the average will look pretty good. The average is very sensitive to outliers.
To address that, Jorge Hirsch published a paper in PNAS in 2005 that introduced something called the H-index. The H-index is an attempt to measure both the productivity and the citation impact of the publications of a single scientist. How does it work? You have your papers, and each paper has a number of citations. You order the papers with the most-cited one first, then the second most cited, the third, and so forth. The H-index is defined as the point where these two cross. So in this case the H-index would be 1, 2, 3, 4, 5: there are five articles with at least five citations each. The point where the number of citations meets the number of papers, that is the H-index.
That is a fairer measure because you're not just taking the average, so a single outlier can't skew the whole thing. If you had 99 publications with no citations and one publication with 10,000 citations, your H-index would still be 1. You just rank all of your papers by number of citations and look for the threshold. So if you have 10 papers, where the first is cited 20 times, the next five times, and the next three times, then your H-index is three, because the next one has two citations or fewer: you ordered them from high to low.
So it's a really nice, interesting measurement. Alongside it we also have the i10-index, because the H-index of course depends very much on which field you are in. If you publish in a very small field of mathematics, it's much harder to get a high H-index than when you publish in a massive field such as cancer research.
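The ranking procedure for the H-index described above can be sketched in a few lines of Python. This is only an illustration: the function names are my own, and the citation counts are the made-up examples from the lecture (one outlier paper among 99 uncited ones, and ten papers starting at 20, 5, 3 citations), with the remaining counts filled in as assumptions.

```python
def h_index(citations: list[int]) -> int:
    """Largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)  # most-cited paper first
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank   # the paper at this rank still has enough citations
        else:
            break      # citations dropped below the rank: threshold reached
    return h


def mean_citations(citations: list[int]) -> float:
    """Average citations per article (0.0 for an empty list)."""
    return sum(citations) / len(citations) if citations else 0.0


# One paper with 10,000 citations and 99 papers with none.
outlier_profile = [10_000] + [0] * 99
print(mean_citations(outlier_profile))  # 100.0
print(h_index(outlier_profile))         # 1

# Ten papers: 20, 5 and 3 citations, then (assumed) 2 or fewer.
ten_papers = [20, 5, 3, 2, 1, 1, 0, 0, 0, 0]
print(h_index(ten_papers))              # 3
```

The first profile shows the contrast: the average suggests 100 citations per article, while the H-index stays at 1 because only one paper carries all of the citations.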
Google Scholar uses the i10-index, which is the number of publications with at least 10 citations. Google decided: if a publication is cited 10 times, it's probably a relatively okay publication. It's simple and straightforward to calculate, because you don't have to know all of the publications of an author; you only need the ones with 10 or more citations. The drawback of the i10-index is that it's only used by Google Scholar.
So, a little bit about Google Scholar, because I wanted to talk about different citation indexes and we already did ISI Web of Science. One of the nice things is that it uses the Google search engine, so it's really good at searching for literature: it will really find what you're looking for. You can explore related work and citations. You can click on the authors; they have author profiles. You can also upload complete publications to your library. If you make an account on Google Scholar and you publish an article, you can upload the text of the article if you're allowed to, so if it's open access. Then people can read the article directly in one place; they don't have to go to external websites, they can just stay in Google Scholar. It keeps you up to date with recent developments in any area. You can see who is citing your publications, and also who is citing whom. And the nice thing is that you can create a public author profile.
So let me just show you guys my profile. There might be a slide for that as well, but this way you can see how it works. We go to scholar.google.com and then to My library ... no, not My library, I want My profile. Right, so here you see my Google Scholar profile.
This is me, with a very old photo, so I should update that. You can see that the best article I was involved in is this one, which has been cited 339 times and was published in 2011. If I click on it, I can see all of the authors, the description (more or less the abstract), where it was published, and how it has been cited over time. You can see that last year we racked up 21 citations for this paper, but in 2015 we had almost double that. You can go and look at the whole list of publications. I have a list of co-authors, so people I have published with in the past. It also gives you an overview of how many of your articles are available under an open access license and how many are not available according to Google. And you can fix this, because if you published in an open access journal, then of course the article should be available as well. So it's a relatively good tool. You also have a follow button, so you can follow scientists you like and get updates when they publish new articles. Pretty fun, pretty useful.
But the problem with Google Scholar is that it includes a lot of dirty data. I told you guys: if it's not in Web of Science, it's not science. But Google Scholar indexes everything, so Google Scholar profiles include a lot of dirty data. And of course they may not last. That's one of the issues with Google: it's a company that tries a lot of things, and if something doesn't make them enough money, then at a certain point it's just gone. Remember Google Wave, or Google+ with its Circles, the whole social media platform that Google had. If it doesn't make them enough money they will pull the plug, and it will be gone from one day to the next. So that's one reason why we shouldn't rely on Google Scholar always being there.
And it only measures a narrow kind of scholarly impact, right? It only measures the citations and that's it. And there's this really nice article, which was published in 2013, in which a couple of authors actually explained how they became the highest cited authors on Google Scholar. So: manipulating Google Scholar citations and Google Scholar metrics, simple, easy and tempting. If you have your own website you can just upload PDFs on there with nothing but citations to yourself. And Google will see your PDF and think, oh, this PDF is a scientific publication. And it will just add the citations to all of your publications, right? I showed you guys my impact according to Web of Science, right? So 944 citations. If we go to my Google Scholar page it actually says that I've racked up 1,626 citations. So there's around 700 citations to articles that I wrote which Google Scholar thinks are real, reliable citations, while Web of Science says, no, these were published in other journals that we don't index, so it's not science, right? So there's a big difference between Google Scholar and Web of Science. And Google Scholar is really easy to trick. If you want to become the most cited author on Google Scholar, it will take you an afternoon of work just generating PDFs with citations to yourself. You just upload them to your own website, or you put them on Facebook even. Google will see the PDF, will assume it's a scientific publication and just add the citations to your profile. They are working on cleaning up their data, but it's not a priority for Google, right? They're a search engine, so they don't really care. So, all right. Then one of the other things that I wanted to talk about, which used to be very unknown but nowadays is more or less the place to go to, is ResearchGate. So ResearchGate is more or less a social networking site for scientists and researchers.
It was founded in 2008 by Ijad Madisch and it has very interesting features, right? You can upload things like papers but also data and book chapters. You can upload negative results, patents, research proposals, methods, presentations, software source code. Even if you did a poster presentation at a conference they will allow you to upload the poster. And the nice thing about ResearchGate is that you have this ask and answer forum, right? It's really a social networking site. So you can find people that you work with, but you can also ask questions on an open forum, saying: well, I'm trying to do this thing and I have no idea how. And then other scientists can respond to that. And ResearchGate uses its own kind of score, so every scientist on the platform is assigned a ResearchGate score, which is not a citation score. It's kind of a weighted score of all of the things that you do on the platform, combined with all of the things that you published. So there's a lot of criticism about ResearchGate. I do love the platform itself. It's a really nice tool to get answers to things that you're working on from people in the field. But especially in the beginning there were a lot of unsolicited email invitations sent to the co-authors of a user. So I would join ResearchGate, and then ResearchGate would send emails in my name to the people that I published with. And the people that I published with would think, oh, he's sending me an email, but actually it was ResearchGate just sending them an email to sign up with their website. So that was a big controversy, especially in the beginning. They don't do that anymore, I think. The big issue with the ResearchGate score is that no one knows exactly how it's calculated. I can show you guys my ResearchGate. Let me show you the Firefox window. So this is my ResearchGate profile. So let's go to my profile. And according to ResearchGate, I have a score of 35.15. No idea how they calculate that.
Right, I can go to scores and then I can see that most of my score comes from my publications. But I also asked questions on the platform in the past, which again contributes to my ResearchGate score. I gave some answers, which are voted up or down by other people, and based on the people that follow me on ResearchGate I also get points added to my ResearchGate score, which of course has nothing to do with my scientific output. Like, the number of people following me on a website should not have an impact on how good of a scientist I am. But it compares you to all of the other people on the platform. So apparently I'm doing pretty well, because only 7.5% of people have a better score than me while 92% of people have a worse score. It does mention my h-index, twice: the standard one and then the one excluding self-citations. Best cited paper. And then you can boost your score by adding texts that are missing, and by getting more followers or asking more questions or answering more questions. So all of these things have nothing really to do with science, but they do influence your ResearchGate score. So the big issue here is that the score that they give you is completely non-transparent: you cannot reproduce it and you don't even know what goes into it. There's also a whole bunch of profiles on ResearchGate that are actually not owned by people. They are created by ResearchGate because these people published in the past. Right? So you might think, oh, scientist X is on ResearchGate, let's send them a message, and you never get a message back. And that's just because this author never made a profile there, and the profile is just a placeholder that ResearchGate created. And there's heavy criticism that they are failing to provide safeguards against the dark side of academic writing. So the dark side of academic writing means that a lot of fake publishers are advertised on their platform. There's a lot of these ghost journals.
So journals that don't really exist. And they allow publishers with predatory publication fees on their platform, and there are fake impact ratings. But that just holds for all platforms. Right? As soon as you have a list on the internet where you rank people or groups, people will start trying to get to the top of the list no matter what. So it's an interesting website. I have a ResearchGate profile, of course, but it's something that you should be careful about. Like, don't judge people by their ResearchGate score, because someone could have a score of like 50 and never have published before, because they just ask a lot of questions and they answer a lot of questions and they have a bunch of followers on the platform. But of course that doesn't make them good scientists. Good. So that's kind of what I wanted to say about references and citations and these kinds of things. So how do we now do our own citations? Right? So scientific reference managers are the way to do citations. Misha, pay attention, because I'm going to explain this and you will never, ever, ever write your own citations anymore. So a scientific reference manager has four goals. It manages publications and reviews. It allows you to bookmark articles and put notes on them. Right? So it's there because collecting, reading and integrating references into a manuscript is a time-consuming process, right? Because if you write an article, then keeping track of all your citations manually is work that you don't want to do, and it's time wasted. Every time that you type the name of an author in your publication, you are wasting time. You should not do that, especially because every journal, or almost every journal, has a different citation style. So scientific reference managers to the rescue, right? Because you don't want your references to look like this, done manually, where you have to organize everything yourself. You want to have it nice and clean. So you want to have citations, right?
So citations always come with unique identifiers, right? If I want to cite a book, I write down the ISBN number of the book, because the ISBN number uniquely identifies the book that I'm trying to cite. If I want to cite things like specific volumes, articles or other identifiable parts of a periodical, so something which is published periodically, then it gets a Serial Item and Contribution Identifier, so a SICI number. Nowadays, almost all things published in scientific journals get a Digital Object Identifier, a DOI. But not just articles: source code can also get a DOI. Data can get a DOI as well. And using this, you can cite data, for example. So you can say, I use data from blah, blah, blah et al., and then you cite it and you just write down the DOI. And then everyone who has the DOI can find the source where the data is stored. And biomedical research articles get a PubMed identifier, so if you're in the biomedical field, then generally all of your articles get a PubMed ID as well as a DOI. So a reference manager supports researchers in performing three basic steps in research. They help you search for relevant literature. They store the literature that you are interested in. Generally you have a library that you can compose yourself, so it allows you to store papers and their bibliographic metadata for later retrieval. And it allows you to write text with citations. So it allows you to insert citations and references in a chosen citation style when writing a text. And it allows you to quickly edit them as well, and change them. So one of these citation managers is EndNote. EndNote is a commercial reference manager. You have to pay for it, or your university has to pay for it. You can use it on Windows and Mac. It's not available for Linux. And the way that EndNote works is that it groups references into libraries. And these libraries have an .enl extension, and each .enl library comes with a data folder.
So for example fishresearch.enl has a fishresearch.Data folder. And then references can be added manually, or you can export them from the web, or you can import them, or you can copy them from another EndNote library. And the nice thing is that nowadays you also have this cloud service from EndNote, which is called EndNote Web. It is a web-based implementation and it offers integration with the Web of Science, or Web of Knowledge. So besides EndNote you have Mendeley. Mendeley is free software. And it's also an academic social network, very similar to ResearchGate but slightly different. The nice thing about Mendeley is that it's available for Windows, Mac and Linux. So if you're a Linux user then Mendeley is more or less the only option that you have. It allows you to back up and synchronize across multiple computers, and you have an online account. So you can highlight things and you can make notes and say, oh, this is important. And if you're somewhere where you don't have your computer with you, but you still want to know which note you put on a certain paper, then you can also look that up when you're in a library somewhere. It has an integrated PDF viewer where you can do sticky notes, text highlighting and full screen search, and there's an app if you want to use it on your phone or your iPad. And it tracks things like usage-based readership statistics about papers, authors and publications. So you can see how many people viewed your article in the last year and how many people cited it and these kinds of things. So a very, very useful tool. So I wanted to give you guys a quick example of how you can use Mendeley in day-to-day research, right? So the first thing that you have to do, of course, when you want to use Mendeley, is go to Mendeley, so you just type it in Google, and you create an account. And then you download the software, because it's just a software tool for your phone or for your computer. So that's step number one, right?
When you start the software you have to log into your account, and then there's this plugin that you can install which integrates with Microsoft Word. I think it also integrates with LibreOffice or OpenOffice. And there's this option to add a bookmark to your browser, and when you click the bookmark it will make a citation of the article that you're currently looking at. That is very useful if you're reading stuff online and you think, oh, I want to save this quickly in my library because I might cite it later. You just click the bookmark and it will make a note in your library. So how does this work? Well, after you've installed Mendeley you start it, and then you log in, and then this is the screen that you see, right? If you've never used it before, you have nothing in there. I've used it before, so I have different folders, right? I have for example a C. elegans folder for C. elegans research. I have a chaos folder for chaos research. I have a historical folder for older publications, like the stuff from Mendel, but also publications from the 60s, 70s and 80s that I sometimes use. And I have a couple of other ones, like HU Berlin Mouse and HU Berlin Pig for the mouse and the pig work. But the most important feature of Mendeley is the search feature. When you click literature search, then you can search, right? For example, you can search for cow genetics, and then you get a list of papers which more or less match what you're looking for, right?
So we can then simply select a paper. You can double click it and it will put it into your library. Very easy. For example, if we click this one: this is the whole-genome assembly of the domestic cow, so that's the first build of the cow genome, which was published in 2009, and we might want to cite that if we're working on cow genetics. So if you double click it, then you can say save reference, and it also shows you the abstract of the paper, right? So that you can read it and see if it's really what you mean. You can also click this button to see if there's a PDF available, to read the whole PDF. So how do we now use it, right? For example, imagine that I typed the sentence: we used the whole genome assembly of Bos taurus cows, right? So the paper that we just found. Then in Word, when I have installed the plugin, I get this references tab and I can just click insert citation, right? So I go to where I want to have the citation, I click insert citation, and then a little window opens up. This window allows me to search through my own library, not through all of the literature but just through my own library, right? So I say for example cow genome, and then I see, okay, there's two which kind of match. I want to cite this one, right? So I click it and then I say okay. What happens then is that it inserts the citation as a special field which I cannot edit directly, and the format is based on the format that I want my citation to be in, right? So I now have my sentence, I want to cite these people, so I'm just putting in this special field. And then I can add the bibliography. So I select the journal style. Which style do I want to have? For example American Psychological Association, or BMC Bioinformatics, or Nature Genetics. So from the drop-down list I just select the style that I want, and then I say insert bibliography, and it will insert the whole reference for you in the style of the journal. Nothing else needs to be done. So it's easy
to select different styles of citation. References and citations are updated automatically. So if I go from the citation style that we just had, which was the American Psychological Association one, and we want to cite it in the way that Genome Biology wants me to cite it, I can just click the drop-down button and select Genome Biology. You see that Genome Biology doesn't use the structure Zimin et al. 2009, no, it just uses numbers. And then here you see the reference in the way that Genome Biology wants it, right? So if I have my paper written using a citation manager, and I submit it to Nature Genetics, and Nature Genetics says, we reject your paper, please send it to PLOS Genetics, and I send it to PLOS Genetics, then the only thing that I have to do to update all of my citations is select a different citation style from my drop-down list. And that's it. It will reformat everything automatically in the way that it needs to be. All right. Of course, when you click the citation style there are a couple of styles which are predetermined, but you can just click more styles, and then there's literally, I think, almost 1500 different journals in there. The styles of 1500 journals are included in Mendeley, for example. Good, so that's how you do citations. Very easy. Never write it, never copy, never paste, never make mistakes or spelling errors in the authors' names. No, just use Mendeley or just use EndNote, and there's a couple of other citation managers out there. For me it doesn't matter which one you use, as long as you use one. Because as a postdoc I read the papers of a lot of our PhD students, right? And if you don't use a citation manager, I am not going to read your paper, because it's too much work. Because if I move a paragraph to a different place and the paper uses a one, two, three numbering style, I have to go through the whole paper and adjust all of the numbers, and then I have to go through the citations and
manually adjust them one by one. And I'm not crazy. That just costs way too much time, and I'm not going to do that. So use a citation manager to do that for you. Good, so that's what I wanted to say about citation managers. I like Mendeley a lot, but there's a lot of people that use EndNote, and there's a couple of other ones, I think Zotero and a couple of others. But when you write an article, always make sure that you use a reference manager for your citations. Good, so final part, because we talked about version control, right? So version control is something that especially relates to software, so source code, and not so much to papers or publications. Version control comes in two different flavors. It comes in a centralized flavor, which is one of the ways that you can do version control, where you have a central repository, you have different people working on different computers, and everyone gets a copy of the repository. When they change something, they have to commit their changes to the repository, and if they want to get changes from other people, they have to update. We also have something which is distributed version control, like Git, right? SVN is a centralized version control system while Git is a distributed version control system. That means that there is a central repository. There doesn't have to be, but generally there is a central server which has the whole repository, and when you clone or pull the repository you actually get the whole repository on your local computer. So the biggest difference between centralized and distributed version control is the fact that if you're using centralized version control you only have a working copy of the current code, while if you have a distributed version control system you have the whole repository and a working copy. So your computer more or less functions as the server for
you, and then you can have a remote server. So why do you want to use version control? Well, there's two very good reasons, or actually four, but the main reason is that if you have multiple people working on a single project you need to use version control, because otherwise projects start running parallel to each other, right? You can't email source code from yourself to a collaborator and then assume that all of the changes they made, in the time that you were making your own updates, will just integrate, right? And version control allows you to work together on big projects like the Linux kernel, with hundreds of people at the same time. Version control is also very useful when you are the only one working on a project. For example, I use version control even for my private projects. And why is that? Because I use multiple computers. I have a computer at work, I have a computer at home, I have a laptop, and on all of these I want to have the same version of the code. So if I make changes at home, then I want these changes to also be available when I come to work, right? And I also want them to be available when I'm working in a library and I'm using a library computer. So version control also allows a single person to work on multiple computers. So the nice thing about version control is that it integrates work done simultaneously by different team members. So if I am working on file A and my friend is working on file B and someone else is working on file C, then when we put our changes into the repository, the version control system will merge all of these changes as long as there are no conflicts, right? If I'm touching line number 117 and my friend also changes line 117, then of course it cannot automatically resolve this. But when I'm working on line 10 and the other guy is working on line 100, then it automatically knows how to integrate the changes with each other. Version control gives you access to historical
versions of your project, and this is especially important in science, because in science we're looking for reproducible research. That means that I have to be able to go back in time sometimes, right? Sometimes I need to do the analysis as if it were 2018, because I want to redo a paper which was published in 2018. And of course code changes. If you look at many of the big software packages for the analysis of, for example, DNA sequencing data, these packages are updated like every two or three months. But to redo an analysis which I did last year, I have to go back in time and have the software as it was at the time that I did the original analysis, right? And that you can easily do using version control. It allows you to go back and forth in the history and run the code as if it were 2012. So some terminology. A repository is a database containing all of the changes, right? So the repository is the database of edits and different versions. A working copy is the personal copy that you have of all of the files, and a lot of people call this a checkout of the repository. So it's the current state of a repository, right? It's just the code the way that it looks now, but without the ability to go back and forth. The repository is a database, so it can go back and forth in time, because we can say: database, go five steps back, or five commits back. So a commit is a collection of edits on a working copy, right? From a repository I get a working copy, then I make changes, and these changes together are called a commit, which I can then give to the repository, and the repository will look through my commit and apply these changes to the current version of the database. And then we have update, or pull, and that is a collection of edits which come from the repository. These are changes made either by other people, or by yourself on a different computer, which were committed to the repository but
which are not yet on your local computer. So like I told you, there's two versions, centralized and distributed. In centralized version control there's only one repository. Subversion, SVN, is one example of centralized version control. Nowadays almost everyone uses distributed version control. There's still a lot of people that use SVN, especially in companies, because then there's only one company and they just have one server where the database is, and every person working at the company just makes changes there. But in academic software development we generally use a distributed system, because people are from all over the world and you don't have a single server that everyone has access to. So the distributed part makes it more modern. It runs a lot faster, because the repository is on your local computer and not on some server miles and miles away. And it is less prone to errors, and that is because everyone has a copy of the repository. In centralized version control, if you are able to corrupt the server, right, send in a commit which kind of breaks the code, then the code will break for everyone. But in a distributed system you're only breaking your own local repository, right, and not the upstream repository of everyone else. But distributed means that it's somewhat more complex to understand, because there are multiple repositories, right? You have your own repository on computer A, your own repository on your home computer, and then some guy that you're working with also has his version of the repository. So it creates a bigger chance of conflicts, because everyone can work independently of each other. Distributed also has this nice thing: because your repository is on your own local machine, you can still work on code while you're in an airplane. And that is very hard when you use centralized version control, because
then if you're in an airplane you don't have access to the server, so you can't push your changes. You can do that once you have internet again. But in a distributed system you can just make commits and update the current version. So if you're eight hours in an airplane, using distributed you could make like 10 different commits, while with centralized all of the changes that you make during your eight hour flight are bundled into a single big commit, which is then pushed to the central repository. But that of course has a higher chance of creating conflicts, because you touch more files. So centralized version control, we already saw the picture: one central repository, and each user gets their own working copy. As soon as a user commits, others can see the changes. So when I make a commit, the repository gets updated and everyone sees that the repository was updated. That means that when you commit, they have to update, right? Because their working copy needs to be in sync with the repository. So when I do a commit, everyone else has to update and cannot do a commit at that point. They have to stash their changes, do an update, then apply their changes again, and then commit to the repository as well. That works well in relatively small teams, and when you have a server locally, then it's not the biggest deal in the world. But distributed version control works in a different way, because each user has their own repository and working copy. So I make changes to my working copy and I make a commit to my local repository, and then no one sees the changes. Only when I push my changes from my local repository to the remote repository do other people see that there are changes. But they don't have to update at that point. They can wait, because they have their own repository, so their own changes and their own commits are still perfectly in sync. So when you update you do not get other people's changes unless you have first pulled those changes into
your repository. So you're more independent in a way, because you're working with your own little server which is on your own computer, and if you want to get changes from other people, you choose and you plan the time. You say: well, Friday 4 p.m. I'm going to pull in all the changes that all the other people made last week, and then I'm going to fix the conflicts that are created Friday afternoon, or I do this Monday morning. With SVN you have to do it at the moment that the other people pushed their changes, right? So you write a commit, you push your commit to the central repository, and then they can continue working and doing all these kinds of things until they decide, I want to have your changes. Then they pull in your changes, and then they have to update their working copy with your changes. So it's an additional step that people have to take. So a little note about distributed version control systems: commit and update are local only. They move changes between a working copy and a local repository, right? So nothing happens to the outside world, and you do not affect any of the other repositories or any of the code that other people are working on. Only the push and pull commands move changes between the local repository and the central repository, and they do not affect your current working copy. And this is a really nice system, right? Because you split the making of commits from pushing the commits, or publishing the commits to other people. So it provides an extra tier. A note, because people use Git, and that's the version control system that I had you guys use in the first lecture: a git pull command actually is a pull and an update. So if you say git pull, it pulls in the repository from the server to your local machine, but in Git it also directly updates your working copy, which kind of breaks this
nice separation of making changes and publishing changes, which is a little bit of a shame. So version control lets multiple users simultaneously edit their own copies of the project. So when a version control system gets a commit, it checks each line of the commit, and there can be changes. For each line, the new line is the original line if neither user edited it, right? If there's two people working on the same file, then there's a possibility that no one touched line 100. If I touched line 100 but my friend did not, then of course there's no conflict, because the new line that I created is then the newest line, right? So my friend just has to live with the fact that I updated line 100. However, if I update line 100 and my friend also updates line 100, then we have a conflict. So a conflict occurs when two different users simultaneously edit the same line, right? And simultaneous here can be a very broad thing, especially in the distributed case, because I can work and make local changes to my local repository for a week and only push my results on Friday, right? So then the whole week, every time that I touch line 100, I can change it for myself, but as soon as my friend also touched line 100 within that period, you get one of these conflicts. And of course we need manual intervention to resolve conflicts, because a version control system cannot figure out if my line 100 is the correct one or if the one from my friend is the correct one. So merging changes. If you have a centralized system, the update changes the working copy by applying any edits that appear in the repository but have not yet been applied to the working copy. So in a centralized version control system you can update at any moment, even if you have locally uncommitted changes, and this may force you to resolve conflicts, right? So as soon as someone pushes a change to the server, I can update, but if I update, I have
to fix any conflicts that are now between my working copy and the new working copy that I get from the server. In a distributed system it is different. It's harder, because in a distributed version control system, if you have uncommitted changes then you cannot run update, right? So if I have changes in my working copy, then I cannot update, because I first need to commit these changes. I first need to record a change relative to my repository, and only then can I update using the remote repository, right? So before you are allowed to update, you must commit any changes. After this I can run update, which then may create conflicts, and then I have to merge the two sets of edits and make a commit saying that my line 100 is the one, not my friend's, or my friend's line 100 is the one that I want, not mine. And this again is a commit, right? So this is again a change. So you have to merge the sets of edits and then commit the result when there are conflicts. So how does this work in practice? Here we go with a typical workflow. The first two things you do only once, right? I have to get the remote repository and I have to make a local repository, and this is called a clone operation. So I can say git clone, right, and clone this repository, and then I can go into it. This gives me a folder on my hard drive, called Dano in this case, which is my web server. From now on I can work in my own copy, and the best thing to do is to first get the changes made by others, right? So I do the clone once, and then five days later I come back to my computer and I say, okay, now I start working and I want to add a new feature. So the first thing that I do is a git pull, to get the changes that were made in the meantime, because my current repository is now five days old and other people might have committed. So I have to pull in the changes that other people made first. Then I start working, and I repeat the following steps, right? So I make local changes, for
example, I add a feature, so I add a new file or I change a file, right? Then I first examine my changes. So I say git status, and I see, oh, there is one file which has changed. Then I can do a git diff, and I see what changed in the file. It just shows me the difference: in green I see lines which were added, and in red I see lines which were deleted, or parts of lines which were added or deleted. Then I add the files that I want to commit. So I say git add file1.R, for example, if file1.R is the one that shows changes. Then I have to create a meaningful commit message, saying git commit -m "I added feature X", right? Then I update my version with changes pushed by others in the time that I was working. Because between the moment I start making my changes and the moment I create a commit, this might be half a working day. I might have been working for four hours, and other people in the US might have been working as well in these four hours, so I'm not sure that my current repository is up to date. So I do a git pull, right? This can create conflicts, so at this point I have to fix my conflicts. If there are no conflicts, I can just push my changes to the server, so from my local machine to the remote machine. And then I start all over again: I make some local changes, I do a status, I do a diff, I add the file, I commit with a message, I pull the changes from other people, and then I push my changes. And you just go through this cycle over and over and over again, every time adding a new feature. And this is how version control works.
So, some best practices for version control. Use descriptive commit messages, right? The commit message goes into the history and is searchable. So don't say "bug fix". Say something like "This commit fixes the issue where uploading a file would crash the server, issue #15", right? So write a good commit message. Say what you did and why you did it, right?
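
As a short aside on the conflicts that a git pull can raise in the cycle above: they come from the line-by-line rule described earlier. Below is a minimal Python sketch of that rule, not how Git itself is implemented. The function name `merge_lines` and the example lines are made up for illustration, and the sketch assumes that all three versions have the same number of lines (real tools also handle inserted and deleted lines). It also assumes that two identical edits to the same line do not count as a conflict.

```python
def merge_lines(base, mine, theirs):
    """Line-by-line three-way merge (illustrative sketch only).

    base   -- the lines of the common original version
    mine   -- my edited copy
    theirs -- the other user's edited copy
    Returns (merged, conflicts): merged holds None where a human has to
    decide, conflicts holds the 1-based numbers of those lines.
    """
    if not (len(base) == len(mine) == len(theirs)):
        raise ValueError("this sketch only handles in-place line edits")

    merged, conflicts = [], []
    lines = zip(base, mine, theirs)
    for number, (original, my_line, their_line) in enumerate(lines, start=1):
        if my_line == their_line:
            # Nobody touched the line, or both made the identical edit.
            merged.append(my_line)
        elif my_line == original:
            # Only the other user edited this line: take their version.
            merged.append(their_line)
        elif their_line == original:
            # Only I edited this line: take my version.
            merged.append(my_line)
        else:
            # Both edited the line differently: manual intervention needed.
            merged.append(None)
            conflicts.append(number)
    return merged, conflicts


base = ["x <- 1", "y <- 2", "z <- 3"]
mine = ["x <- 1", "y <- 20", "z <- 30"]
theirs = ["x <- 10", "y <- 2", "z <- 300"]

merged, conflicts = merge_lines(base, mine, theirs)
print(merged)     # ['x <- 10', 'y <- 20', None]
print(conflicts)  # [3]
```

Lines 1 and 2 were each edited by only one person, so they merge automatically. Line 3 was edited by both, so the sketch leaves it open and reports it, which is the point where a person has to pick a version and commit the result.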
So it's useful to someone examining the change, because you have to write down what the purpose was: why did you change the code? Make sure that each of the commits is a logical unit, right? So if you make changes, only do one thing: add a new feature, or fix a bug. Don't do fixing a bug, fixing another bug, and adding a new feature in one commit. That's the way that it works. So make each commit a logical unit. Avoid indiscriminate commits, right? Do not commit all changes at once. So don't say, oh, I've been working on my local repository now for seven days, I changed 50 files, and then just do git add all, so git add . to add all of the files, and push the changes. That doesn't make sense, because commits need to be independent. Also make sure that you incorporate other people's changes frequently. When you start working on a new feature, or when you start fixing a bug, make sure that you pull in all the changes that other people made. Also the other way around: share your changes frequently, right? If I fix a bug and I don't push these changes to the central server, then no one will know that I fixed the bug. So if I wait for like two weeks and then say, oh, okay, now I have time and I'm pushing my changes, then in the meantime someone else could have fixed the bug as well, creating conflicts. So incorporate changes from other people often, and share your changes with other people frequently as well. Sorry, I took a sip. So one of the things is, if you work with version control, coordinate with your co-workers, right? If you are working on a large project, and I've worked on projects where like 25, 30 people were working on a single software project, just walking into the office and asking people what are you going to do today prevents a lot of these merge conflicts. If my friend says, oh, I'm going to work on bug 15, then I shouldn't be working on bug 15, right? So just asking people what are you doing, what are your
intentions, can solve a lot of these conflicts. Version control tools are line-based. They look at text files; they compare text files line by line to integrate these changes. So never, ever, ever add binary files to a repository. So no Word documents, no PowerPoints, no zip files, no binaries, no DLLs, no EXEs. These do not belong under version control. Version control is for plain text files containing source code. Also never commit generated files, right? So files which are generated, for example, by a code documentation tool should not be under version control, because they change very frequently. Also don't write excessively long lines. There's still a lot of people who use terminals, so logging in to a server via SSH, and terminals have a maximum width. I say here 80 characters; I think the recommendation currently is to not have more than like 100 to 120 characters on a single line. And this is of course because if you are working in a command window, let me actually show you guys a command window, like this, right, then there's only a limited amount of space. And a lot of people still use things like this, right? Oh, that's the wrong one that I'm dragging now. They still use command windows, especially when working on remote servers: you log into a server and then you view your code, but you also edit your code on the server. So make sure that lines are not excessively long, because that will just be really annoying for other people who don't have a widescreen monitor, or who have a widescreen monitor but are editing code on a server. So one of these best practices is also to tell your version control system to ignore certain files, right? In many version control systems, like Git, you can have a .gitignore file, and in this file you can say: ignore files with the extension .pdf or .class or .exe or .dll, right? So by putting them into the ignore file, you
cannot even accidentally commit them. You have to explicitly give a command saying, I really, really want to add this file to the repository, right? So you can exclude files, saying, no, never put PDFs, which are binary, into the system. And then if you do a git add of all of the files, it will never add PDF or class files. And never force it, right? If a version control system refuses to do a particular action, for example a push, so if I have changes in my local repository and I want to give them to the server, I say push and then it gives me an error, then figure out what's wrong. Never force push your local version over a remote version, especially in distributed version control, because from that point on the version that everyone else is working on is no longer your version, right? So you're kind of screwing over everyone else. So never, ever force push. There's a Git command, git push -f. Never, ever use that, because you're screwing over everyone else. Only use it when you're very, very advanced in using version control. And from that point on, everyone else first has to update before they can start contributing again. So never force push.
Good, so that was what I wanted to tell you guys today. I told you guys about DNA metabarcoding. We took almost an hour going through the assignments, which is perfectly fine; we coded them live, which is nice. I didn't tell you about PubMed and Medline; I don't know why this continues to be there. But I told you about Web of Science, about Google Scholar, about ResearchGate, about H indexes, I indexes, and citations: why we use them, and why they are so important for us scientists. I also told you about scientific reference managers, which allow you to make citations in publications and change them very easily to a different style, because you want to submit to a different journal, or you got rejected at your journal of choice and have to go to your
second choice. So use things like EndNote or Mendeley. And besides that, use version control. If you are ever going to write software in the future, you have to start becoming familiar with version control. I think almost all universities and scientific research groups where software is being developed have some kind of a version control system, be it SVN, be it Mercurial, be it Git. So just read up on the documentation. Also, for Git there are very, very good manuals out there, which I definitely would advise. So for me, that's it for today. Oh, I didn't put in a question slide. Doesn't matter. So if you're watching this on YouTube, thank you so much for watching. If you're watching it on Twitch, then I'm just going to end the recording now, but I'm not going to stop the stream. So I will see you guys next time.