OK, so these are the three things I'm going to talk to you about. The first is around data, then new authorship ideas, and then metrics. What all of these things are related to, really the key underpinning of them, I would say, is the fact that we're getting much better with technology in publishing now. And that has changed some of the quite intractable issues that there were around publishing in the past. And as I said, I'm talking about this from the point of view of one open access publisher, which is PLOS, the Public Library of Science, the organisation that I work for. And I'll give you a little bit of the history of PLOS just so that everybody is on the same page, as it were. So when I'm talking about the future of publishing, I feel like we need to go back to 2005. James Boyle, who has written a great deal on the future of the internet, wrote very eloquently back then about the fact that the internet is something that, if we tried to invent it today, we probably wouldn't do anywhere near as well as it is right now. We'd almost certainly impose restrictions on it. In fact, as he noted, we might even declare it illegal.
And I think the challenge for publishing right now, and for everybody who's associated with it, and that includes organisations like ANDS and many other organisations who have a vested interest in scientific dissemination, is that we're not really as yet exploiting the full power of the internet. So where is technology making a particular difference? The first area, I would say, is open access to publications. I'll talk briefly about that, but it's not the primary focus of this talk. Open access to data is one of the next frontiers, and of course that's something where ANDS is the absolute cornerstone. And interestingly enough, for those of you who may not know a huge amount about publishing and data in other countries, ANDS is actually quite far ahead of many other countries. The next area is new ideas on authorship, which I think are really fascinating, and I'll talk about a couple of things there. I'm not going to talk about correction of the literature or post-publication peer review, although they are fascinating topics for another time. But I am going to finish by talking about new ways of measuring impact. So open access is the revolutionary idea that the web really enabled and allowed all of this to happen. It is just one enabler of change, but it is an absolutely crucial one, so I think it's worth spending a couple of minutes understanding what it means. The first thing is that open is greater than free. What does that really mean? Open access, which I've defined here (well, actually I haven't defined it; this was defined back in 2003), is free, immediate access online, of course, but it is also unrestricted distribution and reuse, and that is associated with a licence that makes this very explicit, in both human readable form and machine readable form.
The author retains the right to attribution, and papers are deposited in a public online archive; and you can imagine that if this is the case for papers themselves, then it is obviously also the case for data. So where does PLOS fit into this? PLOS is a not-for-profit organisation. We were launched back in 2000, actually as an advocacy organisation. Since then, we now have seven journals, which range from ones that have a very specific focus through to ones that publish across the entire spectrum of medicine and biology. Our model is that we are funded through publication charges, which vary according to the journal. We have a publication fee assistance programme for people who can't afford those fees, we also have some other sources of revenue, and we are now in surplus. We're one of many open access publishers, and we weren't even the first open access publisher, but our approach has allowed us to build on the momentum of starting with selective journals through to journals that are much less selective. Here are the journals that we have. The first one was PLOS Biology, back in 2003. Then PLOS Medicine, the journal that I started along with two other editors; then we have four of what we call community journals, which are all aimed at specific areas; and then we have PLOS ONE, which publishes across all of medicine and science. All PLOS articles are open access, and what does that mean? It means that all the articles can essentially be used in any way, provided that the original author is properly cited. So, just to hold on this slide for a moment, what does that mean?
Well, it means that if you want to pass a paper on to your students you can do that; if you want to put it in course packs you can do that; if you want to reproduce the figures you can do that; and critically, you can use the data from the papers in text mining or in other ways. That is why open access is really the cornerstone of what we're getting on to, which is the next stage: how we move on and discuss our data. If you look at this in the context of open access, there is a continuum of openness, and one of the things that PLOS has done, in collaboration with a number of other organisations, is to produce an online (and also paper) tool that you can use to look at how open journals and articles are according to a number of different criteria, starting from reader rights on the left hand side through to machine readability of both the article and the data itself. What you can see here is how the PLOS journals come out if you look at them in this way, and what you find is that we're not fully there with the machine readability on the right hand side. This is about the article being fully available for text mining, but also the data that's associated with it. And why is that such an issue? The fact is that much of the technology is not quite there yet to have data available in a form that is really usable in a transparent and seamless way, and that's why the types of things that ANDS is doing, by providing ways that data can be stored and passed on and shared, are absolutely critical, because this is the type of issue that no one publisher can solve on their own, even if they have a commitment to it, as we very much do. Okay, so let's move on to data, which is the first of the topics I'd like to cover.
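Before moving on: the openness spectrum just described can be pictured as a simple rubric. This is only a toy sketch in Python; the dimension names loosely echo the published guide, and the scores are illustrative stand-ins, not PLOS's official ratings.

```python
# Toy sketch of a "How Open Is It?"-style rubric: score a journal along
# several openness dimensions (0 = closed, 4 = fully open) and report
# which dimensions are not yet fully open. Names and scores are
# illustrative, not the official guide's wording.

DIMENSIONS = ["reader rights", "reuse rights", "copyrights",
              "author posting rights", "machine readability"]

def not_fully_open(scores):
    """Return the dimensions, in rubric order, scoring below the maximum."""
    return [d for d in DIMENSIONS if scores.get(d, 0) < 4]

# A PLOS-like profile: open on rights, still partial on machine readability
plos_like = {"reader rights": 4, "reuse rights": 4, "copyrights": 4,
             "author posting rights": 4, "machine readability": 2}
print(not_fully_open(plos_like))  # ['machine readability']
```

A fully open journal would return an empty list; the point is that "open" is a profile across several dimensions, not a single yes/no.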
We all have this ideal of what a wonderful research cycle should look like, where we have complete integration of all our data, starting with the very early collection, through the analysis, through the narrative description; where it's all stored in an accessible format, linked to the publication, and made available post publication. This is the ideal that we would like. The truth, we know, is rather different. There are some very specific things that cause this to be problematic. First of all, as a scientific publishing industry we are still rather fond of the PDF as the primary method of dissemination, and as we know, PDFs are a pretty terrible way of sharing information. It can be very hard to share the data even if you want to, because it's in a form that's not extractable. It may not be possible to share because of patient privacy concerns, or because the data is just so huge, or because perhaps you got the data from somewhere else. And another very key issue is that often there is no good metadata associated with the data. So there is a large body of work that needs to be done before we can really get to a position where we can share data properly. And the reason, of course, that we want to do this is because data availability allows all of these things to happen, from early replication and validation of studies through to the very serious questions that we're all addressing as a scientific community now, which are around the reproducibility of research. And that comes down to the really fundamental public trust that everybody needs to have in science publishing. Arguably, one of the problems we have now is that we don't have that trust, because quite often it's simply not possible to assess the data associated with papers and then assess how reproducible they are.
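The missing-metadata problem just mentioned can be made concrete with a small sketch. The required fields below are an illustrative, loosely Dublin Core-flavoured set, not any real repository's schema.

```python
# Minimal sketch: checking that a dataset record carries the basic
# metadata needed for reuse. Field names are illustrative only.

REQUIRED_FIELDS = ("title", "creator", "date", "format", "identifier", "licence")

def missing_metadata(record):
    """Return the required fields that are absent or empty in a record."""
    return [f for f in REQUIRED_FIELDS if not record.get(f)]

dataset = {
    "title": "Field survey measurements",
    "creator": "A. Researcher",
    "date": "2013-11-02",
    "format": "text/csv",   # extractable, unlike a table locked in a PDF
    "identifier": "",       # no persistent identifier minted yet
    "licence": "CC0",
}
print(missing_metadata(dataset))  # ['identifier']
```

A check like this is the kind of thing a repository can run automatically at deposit time, which is exactly what a narrative PDF cannot support.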
So this is why everybody cares about this rather passionately: because it is really fundamental to the research process. And this was shown very eloquently, and rather unfortunately, at the Peer Review Congress in 2013. Vines and colleagues looked at the availability of data after publication, and they found that once you get out to even 10 years post publication, and certainly once you're out at 20 years post publication, the availability of data is vanishingly small. This is a real shame, I think, and is something that needs to be addressed as a community. What do we do at PLOS about this? Again, I would just like to highlight that we're absolutely not the first publisher to do this. The British Medical Journal, for example, has led the way here, and other publishers, such as some of the medical journals, have also had data availability statements for a while. But we wanted to take an approach that crossed all disciplines as far as possible, and that made it very explicit what we were trying to address. We've always had a fairly strong data access policy at PLOS, and we have in the past declined to publish papers that were based on proprietary data which the authors were not willing to make available, because essentially it's not possible for us to verify whether or not the papers themselves are sound. This was our starting point. This was a project that began more than a year ago now. It was led by Theo Bloom, who was our biology director of publishing and now works at the BMJ, and a very dedicated group of staff within PLOS, who spent a long time not only thinking about individual policies but also consulting about them; prior to launching the policy on the 1st of March, we had a couple of public consultations. We came from a starting point of having been quite strong in the past.
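Coming back to the Vines and colleagues finding for a moment: a back-of-envelope calculation shows why the decay is so stark. The numbers below (90% initial availability, a constant 17% proportional decline per year) are illustrative stand-ins for the kind of rates reported, not the study's exact estimates.

```python
# Back-of-envelope sketch of how the availability of research data
# erodes with article age. A constant proportional decline per year is
# a deliberate simplification of the reported pattern.

def data_availability(p_start, annual_decline, years):
    """Probability the data are still obtainable after `years`."""
    return p_start * (1 - annual_decline) ** years

for years in (0, 10, 20):
    p = data_availability(p_start=0.9, annual_decline=0.17, years=years)
    print(f"{years:2d} years post publication: {p:.1%} of datasets obtainable")
```

Even a modest annual loss compounds to almost nothing being obtainable at 20 years, which matches the shape of the published result.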
That was our previous policy. What were the aims of the new policy? Well, basically this: we wanted to turn data from being somehow peripheral to publishing into being intrinsically part of the publishing process. This requires us to look at all steps, starting with authoring right through to publication and post publication, capturing data and metadata and making sure that they are presented in the optimal human and machine readable formats. Most of all, we wanted to provide clarity to authors about what we were looking for, and make it clear that this was part of their responsibility when they published. This is a long-term plan that we are moving towards, and the first step was really around trying to change the mindset of how people think about data, and also to make it clear what the author's obligations are here. So, to be really clear about what we're trying to do: the PLOS data policy does not aim to say anything new about what data types, forms and amounts should be shared, and I'm using these words very carefully. These are the exact words that are on our website, and you're very welcome to go and look at them and come back and check with us. What it does aim to do is to make it very apparent and transparent where the data can be found, and to make it clear that it's not acceptable for the data just to be in some place that only the author has access to, for example the author's hard drive or a USB stick. It's much more about clarification at the very beginning, and enabling authors to work with third parties to make their data available if that's what it takes. Now, I'm not going to go through this in huge detail; I am more than happy to discuss it offline. I'll just bring up two specific issues. The first is to do with privacy concerns, and this is a very big concern for many of our authors, particularly those who work on clinical data.
It can also be an issue around the handling of sensitive ecological data, for example. What we have asked authors to do is to work within their applicable local and national laws, to take advice from anyone in their area, or to use the accepted norms in their area, and to make it clear when they are submitting these types of data how they worked with their funder, or whoever else was involved in the study, to make sure that if data could be considered sensitive, the participants' privacy was preserved or the data was de-identified. Again, we're getting a lot of questions about this, and the short answer is that there is a way to handle it, but we absolutely understand this is a very difficult area for many authors. A second very specific issue is what you do if the data is just too huge, and there are specific communities where this is a very big issue, for example around brain mapping, or around geospatial data; data collected from large-scale physics experiments would also fall into this. We are committed to working with institutions as far as this is possible; we're committed to working with organisations such as ANDS; and increasingly we are finding that there are places where these data can be stored. Just taking a slight step back: this is not about PLOS or any other publisher saying, you know, we want your data, we're going to hold it. What we're saying is that we want to enable you to share your data, and we'll do that in any way that is possible, but most of all you have to be clear to us about where the data are and how you propose to share them. To be really clear, there are restrictions that we don't feel are acceptable. We don't feel it's appropriate for individuals to say, we're not going to share this because of some theoretical future publication.
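The heart of the policy is that requirement to state plainly where the data live. As a sketch of what that amounts to in practice, here is a hypothetical helper that composes a data availability statement from structured fields; the function, the schema and the DOI suffix are invented for illustration, not PLOS's actual submission machinery.

```python
# Illustrative sketch: composing a data availability statement (DAS)
# from structured fields, so the statement carries an explicit,
# resolvable pointer to the data rather than "available on request".

def data_availability_statement(location, identifier=""):
    """Compose a one-sentence statement naming the data's public home."""
    statement = f"All relevant data are available from {location}"
    if identifier:
        statement += f" (DOI: {identifier})"
    return statement + "."

# Hypothetical deposit: Dryad's DOI prefix is real, the suffix invented.
print(data_availability_statement("the Dryad digital repository",
                                  "10.5061/dryad.xxxx"))
```

Keeping the location and identifier as structured fields, rather than free text, is what makes the statement hyperlinkable and machine-checkable downstream.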
Of course, we understand that there are sensitivities around getting credit for data, and I'll talk about that in a moment, but that on its own is not a good enough reason to not share the data for a specific paper. If your analysis relies on proprietary data, then it's very important that those are not the only data used to substantiate the conclusions; and again, this is not a new requirement, this has been part of PLOS's requirements for many years now. To give you an idea of what a data availability statement looks like: this was a paper that was submitted at the end of December and published in May, and I think this gives, in a snapshot, a sense of how potentially useful these statements are, in that you can actually hyperlink back to the original data. Since we implemented this policy at the beginning of March, we've had more than sixteen thousand papers submitted with data availability statements. We haven't published sixteen thousand; we have those going through the process. We have a very active group at PLOS which is helping authors figure out how to handle these issues. At the moment we're getting around 10 queries a week, so it's not overwhelming; we're certainly not getting thousands of queries a week, and we feel confident that authors are beginning to understand this requirement and beginning to respond to it. But we're absolutely committed to working with anyone who has specific questions, so please, if there's another message to take away from this webinar, it is: if you have questions, please ask us. Please don't just assume that it's going to be problematic for you. If you don't have a place to store your data available at your institution or nationally, there are organisations such as Dryad which are working to provide places for data to be stored. And these are the big questions that I think we all need to address as a scientific community; many of them can't actually be addressed just by publishers. The third one I would particularly highlight is giving direct credit for data reuse and data sharing. This is really important, and it is something that has absolutely not yet been worked out, by either institutions or funders, and I think it will go a long way towards helping authors feel much more comfortable about sharing their data. I'm sure anyone here could come up with many more questions; I'll just leave those there as something for you to think about, and we can perhaps come back to them at the end of the webinar. Okay, so I'm now going to turn to author identity. I'm sure, like everybody on this call, or at least many of you on this call, you might have nicknames associated with your names; if you're lucky, people will get your name right most of the time. That doesn't always happen to me. If I'm thinking about my identity as an author, these are just three identities that I've been listed under in various places and various publications, and that's before I even contemplate using my married name, which actually I don't use, for one of these reasons: a long time ago I decided that it was important to retain some continuity, so I didn't change my name. I have a name that's not hugely uncommon, but it's certainly not as common as many. If, for example, you do a quick search on PubMed of a name that is relatively common, say somebody with the surname Wang, what do you find? You find that the first four papers come up with four completely different authors. It's clear that it's not the same author, because the person would be going from cell biology and mitochondria through to urology; these are clearly not the same individuals. But it gets even worse when you start to look at, for example, high-energy physics. So this is another paper which includes an author with the same name, and as often happens with physics
papers, it has hundreds of authors (I didn't actually count them all), and somewhere buried right down in the middle are two authors with the same name, and they're just one of many. It's an extreme example, but not an uncommon one, of why authorship is so problematic nowadays. What can we do about this? This is really a plea for everyone to think about the need to identify themselves uniquely. I like to think of ORCID as a DOI for people. ORCID has a great tagline, and I think it goes to the heart of what they're trying to do, which is connecting researchers to their research in a way that is completely identifiable, completely trackable, and permanent. As the scientific literature expands, this is becoming a critical problem. It's not just a problem for authors; it's a problem for institutions, who want to be able to identify their researchers; it's certainly a problem for publishers; and it's a problem for funders, because they want to know what their academics are doing. And in the end, one can imagine a future where we are looking at our identity online with multiple sources of information, including blogs for example; it may even become important there. So this is the ORCID site. It's very easy to use; I just took a snapshot from it recently. Registration is very quick, it is very easy to add your papers, and it is something that is increasingly being used. You'll have seen at the beginning of my presentation that I had my ORCID ID there; I'm beginning to see them increasingly on email signatures; and we are now encouraging individuals at PLOS to submit them when they submit their publications. It may well be that in the not too distant future we start to require these, but at this point we are encouraging them. So what are the core functions that I would say ORCID does well? These are what they describe as being their
core functions: the first is the registry itself, and the second, which is even more important, is having APIs that support system-to-system communication. This is where the power of having a unique identifier lies, and this is where it will, I think, potentially provide the most value. So this is what my ORCID record looks like. It goes right back to papers that I published 10 or 15 years ago, through to more up-to-date ones; there's an ability to put keywords in, for example, and you can import from multiple data sources. If you're thinking about what the advantages of this are, I think they're very clear. ORCID is a not-for-profit organisation; it is funded by a coalition of publishers and other organisations, and it may well be that in the future it gets funding from, for example, institutions. The importance of this really can't be overstated at this point, particularly as the literature begins to expand. So if you feel after this talk that it's worth going and having a look at ORCID, I would really encourage that. The next part of authorship is about really understanding who did what, and this again is where ORCID is a building block towards a larger issue. Perhaps one theme that I'd like to develop is that right at the beginning I said that technology allows us to do publishing in a very innovative way, and to do that you have to have building blocks underlying it; one of those is unique IDs. So, just to illustrate the complexities of who does what on papers: this is a paper that I picked very randomly from PLOS Genetics. It was published a couple of weeks ago, it's got 33 authors, and it's a fairly complex paper. Who did what on it? This is what the author contributions look like, and although this is obviously completely accurate with regard to what they did, it doesn't give me as a
reader a very clear idea of what each of these individuals did. And so a number of groups are thinking about different ways of working with authorship, and moving towards something that is closer to a proper contributorship. Now, contributorship is something that has been thought about and discussed for quite some time. It originally started more than 20 years ago with Drummond Rennie, who was one of the deputy editors at the Journal of the American Medical Association, suggesting that instead of just having authorship according to three or four specific criteria, you actually ask authors to describe what they did. The difficulty, as you see in complex papers like this, is that this is what you end up with, and it's really hard to actually know what people did; and of course it's not structured, let alone tracked or attached to anything electronically. So there is now a movement to try and improve on contributorship. This has been led by a group out of the Wellcome Trust in the UK, along with academics from Harvard University, and PLOS has also been involved at various stages. The idea is that we should really, as a publishing industry and as authors, think about what we're trying to achieve when we think about authorship, because of course a name on a paper is the currency of academia nowadays, and messing with it, or changing it, is something that has to be done very carefully. So this group has got together to come up with a tool that essentially allows easy entry of contributions; most importantly, it provides a consistent language to describe contributions across a number of different specialties, and it then automatically generates a contribution statement. And because that's electronic, and because, as one goes forward, one can imagine that it's linked to ORCID IDs, etc., all of a sudden you're in a position to really be able to understand what
individuals did on individual papers. So expect to see more in this area in the future. I'm now going to move on to the last part, article level metrics, and then, as I say, I'll leave some time for questions at the end. I don't think many people would argue that journal-level metrics, in this day and age, are a very good way of assessing how appropriate an article is, how good an article is, whether it's relevant to your field, etc. When I talk to authors about how they search for papers they want to read nowadays, the number of people who go to a journal's table of contents and religiously read through it all is vanishingly small; a large number of people don't even start with specialist search engines, but start with, for example, Google. So we clearly have to do a much better job of allowing articles to speak for themselves, and that idea was really what generated the PLOS article level metrics programme, which has now been going for about five or six years and was one of the first programmes to systematically generate article level metrics. What are we particularly trying to do here? Well, this is a set of statistics that always gives me pause when I look at it. This is a slightly old slide now, from 2012, but it was then essentially the entire corpus of PLOS papers published up to that point: 63,000 papers. If you want to look at the activity around papers and you're only looking at citations, you're only looking at about 0.3% of the activity: 300,000 citations, and that's on Crossref (the same would be true for Scopus, for Web of Science, etc.), versus 124 million page views. So you can see there is a large possibility there for actually looking at the individual activity around papers beyond just citations. And if you look at slightly more granular data, this is more recent; this is up to
January 2014. If you look at the activity that happens around papers, what's quite interesting, across the entire corpus of papers, which is 100,000 papers now, is this: as you'd expect, 100% of them have had views at PLOS, which is kind of a relief; 98% of them have been viewed at PubMed Central (remember that PubMed Central is the place where the papers are archived); a high proportion of papers, 90%, have been shared on sites such as Mendeley; a quarter of them have been shared on Twitter; 29% on Facebook. A lot of activity. Some of them are more relevant to some communities than others, and for some there's really only very specialist interest; so, for example, the Reddit number at the top is quite interesting, and in fact would probably not be the case now. It shows as nought percent, but that's only because it's rounded down, as it were. So what we're seeing is that if you want to capture the range of ways that papers are used and accessed, focusing simply on citations is a rather narrow way to do it. This is what we've done at PLOS: we have a tool called ALM Reports, which anyone is able to use. Just to give you a highlight of what these look like: this is a screenshot I did recently, taken from a selection of papers from one Australian university, and I'll just walk you through what it means. The colours relate to which journal the papers are published in; the size of the bubble is the number of citations; along the bottom we have the age of the paper in months; up the side we have the number of total views. What you start to see when you look at papers like this is that very interesting patterns emerge. So, for example, the one at the top, which has had the largest number of views, was published in PLOS ONE, which is the orange colour. It's had relatively few citations, only three on Scopus; in this little box we don't share the full range of all the citations we have, we just give a snapshot
of it. But as you can see, it is actually a paper that is likely to get high readership not just from the scientific community but also from people who are involved in, perhaps, conservation but who are not academics, and also from the general public: it's looking at issues around conservation, the Red List of endangered species and ecosystems. So, very interesting, and likely to be of high public interest. If, for example, you look at the green bubble, which is the bubble with the biggest number of citations: this is a paper published in PLOS Genetics, and it's a genome-wide association study of a fairly specific area, one where you might imagine that it's going to be read by individuals who are in that area. The citations will reflect how much it gets incorporated into future use, but there's probably not too much public interest. So what does this actually look like on the papers themselves? I'll use here one of my favourite examples, which is a paper that we published back in 2005 with, as we're quite aware, a rather catchy title, which the author came up with, but which has generated a huge amount of debate that is relevant, I think, to many of the things I'm talking about in this webinar, around how much we can rely on the scientific literature. It's the first of our papers to pass more than a million views, and it is one where not only has it generated a large amount of scientific interest, but it has also generated a large amount of public interest. Just to show you what the metrics on this illustrate: you can see that the number of views, which is in this graph at the bottom, starting at one month and going up to around 100 months after publication, has gradually increased through time. You can also see here an indication of the proportion of article views that led to PDF downloads, and that's quite a useful thing for us to look at, because we tend to find that once that percentage gets above 12 to 15%, it indicates there's higher academic usage rather than just purely popular usage, because that's academics taking a paper and downloading it onto their computer with the intention of using it later. Then on the side here (I've actually put this together; this is not exactly how it appears when you look at a paper, but I've put it here so you've got it all on screen) is the number of citations, and as you can see we present them across a number of different sources; where the paper's been saved, on CiteULike and Mendeley; and discussions that are ongoing, and these range from Wikipedia through to Twitter through to Reddit through to various other places. It gives, I think, a very interesting snapshot of the activity of a paper. That's one thing that you can do with the metrics. But you can also use the metrics, if you set up searches within our system, to give you an idea, if you're looking for something in a particular area, of which papers you should be paying attention to as an academic. The example here is one around biodiversity: you might want to say, okay, as an academic I'm going to choose to look at the papers which have had more than a certain number of citations and more than a certain number of views, and use that to help filter your reading. And if you're particularly interested in those metrics for a particular university, you can also use them to filter papers that way; this is a search that was done on papers from the University of Oklahoma. Then the final thing that we're using metrics for at the moment is expanding post publication coverage. We have recently developed a media curation application, which means that not only do we as editors or internal PLOS staff look for coverage, but we can also encourage anyone who's reading a paper to let us know about any coverage that has happened
and we will link to it on the paper itself, and that, we feel, is a tremendously useful way of capturing post publication activity that isn't captured in the other, more traditional, metrics that I've been showing you. Okay. So, as I mentioned, this is the difference between the metrics when you're talking about a scholarly user versus the broader impact. What we have taken a very strong position on at PLOS is that we don't believe there is one number that you can use to tie this all together. We find it useful to present the entire suite of metrics and allow readers and users to decide for themselves what they actually want to take away from it. It's also possible to download the entire data set should you wish; please ask us, because it's rather large, and we'll happily provide it to you. Just after we launched this programme back in 2007, rather a lot of people did actually download it, and one of my favourite tweets from that time was this one, from an author who said, rather delightfully, that the metrics we provided allowed him to quantify his insignificance. I think he's actually rather a significant author, but it was great to see the ways that people were using these data. And this is now a thriving industry: we are delighted that this has taken off, many different individuals and organisations are using metrics to generate all sorts of interesting data and analyses, and increasingly we're seeing this incorporated, for example, into the repositories that institutions are using as well. So pick your favourite type of article-level metric, and I'm sure somebody will be doing it. Okay, so that's the end of my talk. I hope that was useful, and in the meantime I'll leave you with the kangaroo on George Street in Brisbane, who has hopefully also been rather alert. Thanks very much.