Welcome everyone to the webinar series on licensing and research data. This is the fourth in our series. My name is Adrian Burton, I'm Director of Services at ANDS. Today we have with us, as usual, Baden Appleyard, the National Program Director of AusGOAL. How are you, Baden? Can I hear from you too? Yeah. And we'll be hearing from Baden a little bit later as part of this discussion, and then we'll get our regular update on AusGOAL as well. Today we have a special guest, Dr Matthew Todd. Are you with us, Matthew? Hello there, yes. Hi, Matthew. Matthew's from the University of Sydney, an organic chemist, is that right? Or is that a chemist in general? That's organic chemist, yes. Organic, yes. Please, not what you think. All right, that's not like organic foods and things like that, is that right? That's right, no, no, not. And you're with the School of Chemistry, that's right, at the University of Sydney. But I see here, we've done our extensive research on Google on you. You've also been at the University of London as well as Cambridge and Berkeley, is that right? That's right, yes. Sounds like an interesting career, and the publications seem to be just filling pages here, so I won't even go into those. I see that you're part of the OpenWetWare organization, which sounds very interesting. Is that something you might be talking about a little bit later as well? Definitely, we'll be covering that, yes. That sounds fascinating. I know about open software and I'm fascinated to hear about OpenWetWare. The thing which took my attention was this statement where you were saying we're trying to make the right molecule in the right place at the right time. A very modern approach to things. That's what we're trying to do with information, to get the right information to the right place at the right time. And I'm assuming that some of those problems with information will lead on here. So the reason we've got Matthew here is to talk about his research in general and his approach. 
So that's something about the organic chemistry that's behind all of this. But also in particular, we're interested in his approach to research and this whole idea of open science and how that's changing the way we do research. And in particular, because it's part of our licensing webinar, how the free flow of information and the access to information and the clarity around reuse arrangements, as exemplified in licenses and things like that, how that is affecting, positively or negatively, the world of research. So you have a presentation for us, is that right, Matthew? I do, yes. Now, by the miracles of modern science, modern information science, we can switch over to your desktop. Okay, well, thank you for that intro and thank you for the chance to come and talk to you about some of the stuff that we're doing, which is based in a scientific research project. I'll talk about two different projects which are united in the way in which they try and handle information and in the way in which they try and do science in a slightly different way. One started many years ago but, as I'll describe, gathered pace more recently, and the other project is about a year old today. So I'll talk a little bit about the background to why we wanted to do these projects, which is an important bit of context. And then there'll be a little bit of chemistry, but not much, just enough so that you understand what it is we're trying to do. And then I will spend some time, I guess throughout, but more in the second half, talking about how it is we're doing some of these projects. So the sort of technical things that we've adopted to try and make this research happen. So the main question that I had many years ago, back in about 2005, 2006, was that I was fascinated by the differences in how we do things in science versus what was happening in software. 
So I was a post-doc back in about 2000, 1999, 2000, when the web really was starting to take off, I guess, amazing things were happening and then I was an academic for a few years and then I left the UK and came here and at the time I had a scientific problem, which I'll get to in a minute, which made me think about the way in which we do science and do research more generally. And I read around a little bit and I came across this analogy from software of the cathedral and the bazaar as a model of describing how it is we do work, how we try to achieve things. So with the top model being this cathedral model where we tend to work in a closed team and build beautiful things and we're all very highly trained and everything's very expensive, but we achieve amazing outcomes versus the bazaar model, which is maybe where we don't control participation and we allow anyone to show up and we don't really plan it and there's a very minimal overhead involved but something sort of emerges from kind of collective action with a minimum amount of planning. And both of these things, they're very different things, but both of them can be extremely effective at doing certain things, at achieving certain things. And I found this very interesting from the point of view of software and it made me think about the way in which we do research. So a lot of things are being highlighted in the last few years, I think a lot of things have been highlighted about limitations in the way that we do research. For various reasons. You can point to problems in incentivization of research that people are trying to get a lot of papers out very quickly or trying to demonstrate a result and get a patent and generate money. A lot of these kind of motivations are relevant and you think well, is the process of research working and is it working as well as it could? So these are two separate questions I guess. I think most people would think that research is working but one has to be very careful. 
And there are some pretty shocking cases where the process of research isn't really working very well, and this was one of the most shocking from last year, I think. Actually, I read this earlier this year, I think it was published last year. It was a study by a couple of scientists at Amgen, one of the best known biotechnology companies in the States, in which they set out to reproduce the research findings of 53 papers in the area of oncology. These are pre-clinical findings. So this is the kind of research that you do in academia or early-stage industry, which comes before clinical trials for a drug, but it's the work that you build on to make a drug discovery program. And in trying to reproduce this work, they found that only six of the 53, or 11% of the cases, could be reproduced, which is pretty shocking, given that a very promising study in pre-clinical oncology could set a line of research going for many years and involve the investment of many millions. And there are other studies like this where people are questioning the validity, really, or reproducibility of important research findings. And in biomedical research it's very important that we have things right, that they can be reproduced and that we're building on a very solid foundation. But it does make you wonder about the reproducibility of science more generally, and in general terms we don't really worry too much about reproducibility, in the sense that we don't tend to often reproduce published work, partly because such work doesn't result in publications. So there's very little incentive to duplicate or try and reproduce known research findings. So there's a potential issue with research in terms of checking for validity and checking how reliable procedures are. And of course one of the reasons why it's a problem is that all the data associated with a research project are not necessarily available. 
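As a quick check on the figure quoted above, here is a minimal Python sketch of the arithmetic behind "six of the 53 or 11%". The only numbers taken from the talk are 6 and 53. The Wilson interval is an addition for illustration, not something the speaker or the Amgen study reported; it simply shows how wide the uncertainty is for a sample of this size.

```python
import math


def wilson_interval(successes, trials, z=1.96):
    """Approximate 95% Wilson score interval for a proportion.

    Illustrative helper only; this interval is not quoted in the talk.
    """
    p = successes / trials
    denom = 1 + z**2 / trials
    centre = (p + z**2 / (2 * trials)) / denom
    half_width = z * math.sqrt(p * (1 - p) / trials + z**2 / (4 * trials**2)) / denom
    return centre - half_width, centre + half_width


# Figures as stated in the talk: 6 of 53 papers could be reproduced.
reproduced, attempted = 6, 53
rate = reproduced / attempted
low, high = wilson_interval(reproduced, attempted)

print(f"{reproduced}/{attempted} reproduced = {rate:.1%}")
print(f"approximate 95% interval: {low:.1%} to {high:.1%}")
```

The point estimate rounds to the 11% mentioned in the talk. The interval is wide because 53 papers is a small sample, so the exact percentage should be read loosely.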
An academic paper or a patent is rather like a press release, in the sense that we choose the data we'd like to release to the public and release those data. We don't necessarily release the other 95% of the data, which indicated that the proposed method or research finding wasn't quite as smooth and neat as is described in the publication. There's a big problem with drug discovery as well at the moment, in the sense that what used to be an extremely profitable and promising industry (it still is profitable), the major pharmaceutical industries in the world, are undergoing rather a downturn for a number of reasons. Some are economic, some are scientific. But there are significant issues, which means that the drug industry that I knew when I was a PhD student is no longer really there. It's not quite as big and as profitable as it used to be and there are significant challenges to that industry. A lot of fantastic work is still being done, obviously, but there is less adventurous R&D, perhaps, and there are major economic challenges to the industry. So big research sites are being closed. The one in Sandwich in Kent, the Pfizer research site where Viagra was discovered, has closed along with lots of others. So drug discovery, in particular, is undergoing a real downturn at the moment. So there are these issues. So I became interested in thinking about how we do science in the age of the web. And on the left here, there's a schematic of the typical process that's involved when you do research in science, where someone does some work and then publishes that, and then someone else reads it and then responds to it, either by doing some work in the lab or applying for a grant or one of these things. And those blue arrows indicate a lot of time delay between those events. So it's a very serial process with lots of delays of grant review and paper review and so on. 
And what I was wondering, taking a leaf out of the software development book, was whether we could do science in a different way, where instead of the web being an information resource, we actually used it to work together, which is not a radical philosophy from the point of view of software, but it is from the point of view of science, where we tend to do this. So I was very inspired by these things I was reading about, both in information, sorry, Wikipedia, but also in software like Firefox and Linux and all these things. And I was amazed at the speed with which really reliable products were being developed. I mean, the ascent of Wikipedia is just unbelievable: how rapidly this thing was put together in many different languages, how many times bigger it is now than Britannica and other things, and what a great resource it is. It apparently emerged with minimal coordination compared to the size of the project. So I was very inspired by that and thought, well, can we apply some of these principles of a network which is not necessarily strongly coordinated and which relies upon the inputs of many? Can we use that in science? Now there are a few things that have been done, or were being done, that are along these lines. So we've recently seen science being done on the web using citizen science, where you have a project which requires the inputs, even small inputs, of a large number of people. So the Galaxy Zoo and Foldit projects are spectacular projects which resulted in meaningful, high-impact scientific outcomes. Foldit is about protein folding and Galaxy Zoo is about classification of galaxies. These involve thousands of people, and applications have been written on the web which allow people to input. So citizens, untrained scientists in many cases, have contributed to these projects, which is really tremendous. 
The Polymath Project, which started around the same time as the project I'm about to tell you about, was started by the Fields Medalist Tim Gowers at Cambridge University, and the idea was to solve a problem using a blog, and to see whether that could be done. So again, without restricting who could take part, you just open this problem up and say, okay, well, who wants to chip in? It's a very simple idea really. But against much of the kind of cathedral model of academia, where someone is meant to be seen to be in charge and the project structure is meant to be quite restricted and well elucidated, an open project on the web is much more chaotic-seeming. So there are a bunch of different things here and I haven't got time to talk about all these in detail, but there's a range, from things like citizen science, where you have participants from the public, to smaller projects where you're expecting the participants to have greater expertise in the underlying science, like the Polymath Project and like the thing that I'm going to tell you about now. So the idea, and I like this quote that I found recently, is that instead of the usual academic model of us working in secret and competing without knowing what each other is doing, instead of that kind of feral competition, you embrace this idea of conscious cooperation, which means that you know what everyone else is doing, which is the difference. So you don't keep secrets. I mean, it's the difference between a 100-meter sprint at the Olympics, where everyone is on a level playing field and you can see how fast everyone's going, versus this idea that you run in a kind of tunnel and you don't see what everyone else is doing and you just do your best, but you never really see what else is going on around you. 
And I think a sort of open competition, which ironically results in conscious cooperation where everything is shared, was an interesting idea and one we wanted to try and apply. I like this paper I found recently too, which implies that if you can develop a network of people who interact in a kind of light way, with dynamic links between people, not too strong, not too weak, the kind that you see, I think, in a lot of software projects, then you can encourage people to cooperate. I like this because this kind of network feels like it exists behind the two open science projects that we've run, in that people can come and leave depending on the project need and depending on that person's enthusiasm for the project. So it's very dynamic. Okay, so there are a number of things about sharing data which are happening in science. There are lots of initiatives in trying to share data, and the idea of course is to get more eyeballs on a problem, so you try and unlock data so that anyone can look at the data and can develop ways of searching the data and spotting patterns. And these are extremely important. A few of these are listed here, but this is a lot of drug discovery stuff, and of course there are lots of other initiatives around the world for promoting open data, which are all very important. But just to be clear, open data is very important but it doesn't necessitate anybody actually working with anybody else. So you can take the data and download the data and then chew on it on your own and then do something in secret, and you don't have to release any of that. You don't have to work with anybody. 
And I guess the thing I was interested in was more specifying a problem, or specifying something you need to be done, and then actively working with people in an open arena where you don't keep any secrets, which is a different thing. It's actually a collaboration which is open, rather than just having data. You have to build on data, but it's not enough for a collaborative project. So the project we started, and this was back in 2006, was on this disease called schistosomiasis. You don't need to know about this, but it's a neglected tropical disease which is very significant in sub-Saharan Africa and elsewhere, and it affects 400 million people worldwide, but we don't tend to see it very much in Australia because it's not in Australia, it's mainly in the most impoverished nations. It's a very nasty parasite that lives in your blood and lays eggs in your internal organs and does serious damage. It frequently doesn't kill you on its own, which is one of the reasons why it's neglected, but it makes you very sick and you don't develop correctly, and it's a massive economic and social burden on various societies. So there's a terrible disease there, and thankfully there's this very good drug which you can use to treat it, which has this structure here. It's been known since the 70s, it's off patent and it's a very nice compound. It's given at the moment to about 100 million people and needs to be given to more than that; the aim is to give this drug to whole populations in various countries in sub-Saharan Africa. Now there are a couple of issues with this drug. One is that it's the only one that's available to treat this disease, so we have to use it very carefully so that we don't develop resistance in the parasite, otherwise that would be catastrophic. 
The other is that it tastes really terrible, and that might seem like a cosmetic thing but it's not. If you imagine trying to give this drug to people, if it has a really bitter taste, which it does, it's a dreadfully bitter taste, then a lot of people just don't want to take it, and so they either don't take it or they don't take enough of it, and you can actually enhance resistance by doing that, by giving sub-lethal doses of medicine. So particularly for kids this is a problem, if they don't take this drug. It turns out that the molecule has a certain feature: it exists in two forms, like your hands. So the molecule that's given is right there, and that's actually two separate molecules, shown beneath, and these things are mirror images. So like your hands, they look the same but they're actually not, they're mirror images. The one on the bottom left there is the one that works and kills the parasite very effectively, and the one on the right, which is the mirror image of the molecule, which is also in the pill, tastes really awful and is rather bad for you and doesn't kill the parasite. So the World Health Organization, who we were talking to a few years ago, said it would be great if, instead of making the one on the top there, which is a really easy thing to make (it has this sort of symmetrical feature which most inexpensive things have, like bowls and mugs, they're symmetric and so they're kind of cheap to make), we could make just the molecule on the bottom left, the one that is great at killing the parasite and doesn't taste bad, and exclude the one on the bottom right, so get rid of the taste of this thing, and then market the drug like that. It would be great because, A, it wouldn't taste bad and, B, it would be a smaller pill, actually less molecule to take, so your drug burden goes down and the pill is easier to take and so on. Now that's a very difficult problem in chemistry. I mean, a lot of organic chemists I know spend 
their professional lives trying to solve problems like this. And it's difficult because, of course, as soon as you start doing research on something you increase the price of it, and the price has come down so much through generics manufacture and competition that you can give someone a pill of this drug for about 10 cents. If you start doing research on it the price is going to go up, and that means you're not going to be able to give it to millions of people. So this was interesting from my perspective because it was a chemical problem, an important chemical problem, and one that I didn't see that we could solve in academia, because in academia we don't normally worry about the price of things, we normally just try and discover new things. And of course industry didn't want to solve it because there's no profit margin. So we had a problem here which I thought was important to solve and which we couldn't solve with traditional research mechanisms. So I was stewing on this and thinking, because I didn't know how to do this, wouldn't it be great if, instead of me working on this in secret and occasionally releasing things that I've done, I put this out on the web and say, well, who's got a good idea here and who wants to help out experimentally? That was the idea, and that was the problem that we started with. So to do that we had to get things online, and to do that we had to start sharing data. Like I said, it's important to have open data, and I had experience with this previously in a drug repurposing project, shown there, where a bunch of work was done by a group and all the data were deposited online, this is at the Tropical Disease Initiative, but there was no project that followed up online. So again it was open data. And this thing on the bottom about Glaxo depositing anti-malarial drug data I'll come back to in a minute, but again it's open data being deposited and it doesn't actually involve any collaboration that's 
happening online. So we had to get a collaboration going, a project where people contributed, and the Synaptic Leap website was something that we started up in about 2005-2006. I met someone called Ginger Taylor who had just started this up, and my project went up as one of the first projects that was there. So we posted this scientific problem as a thing that people could help us with, or could contribute to, for a solution of how we might prepare this drug as this single mirror-image form without a price gain. And people kind of helped when they knew that the problem was there, and there's a suggestion there from a colleague of mine in Queensland who suggested something very good, which is actually something we're trying. But the problem was, of course, that there's no real incentive for people to participate if we're not really driving the project ourselves. I had no one in the lab who was on this project and so we weren't really seen as being an active kernel of the project, and that was an issue. So eventually I realized this and I got some funding from the World Health Organization as a linkage partner with the ARC. We got this funded about 2008 and then it took about a year to sign the contract, so we started in the lab in January 2010. And the first thing that we had to do was to make sure that the work that we were doing resulted in data being deposited in the public domain, and that meant that our lab book, which traditionally speaking is a paper thing that sits on your desk, had to become electronic. More than that, it had to be a lab notebook that was on the web. So we had to use an openly available electronic lab notebook to describe what we were doing. Now a lot of people around the world use electronic lab notebooks, a lot of industry have their own, and that's good, but none of them are on the web. So ours was one of the first that was kind of sitting there on the web, so every experiment is there and all the data are associated with an experiment there. Additionally 
we wanted to make sure that the platform itself was open source, because if we wanted people to work with us they had to be able to do that with a zero overhead, essentially. So we didn't want to require that people who wanted to work on the project had to buy software. In this case this software called LabTrove, which was developed by the University of Southampton in the UK, was perfect. It's a pretty simple blogging-type platform which has been adapted as an electronic lab notebook, which is pretty useful. I'll show you a page of it in a moment, a live page. So we started in the lab and got going, and you start posting data and then people realise you're a bit more serious. And you have to start publicising a little bit before you do the project, so rather than publishing the paper and then talking about it, you describe it as it's going on, before it's done, and these things all help. So I tried to publicise it by doing a tour, Google and Nature picked us up, and a popular blog called In the Pipeline picked us up, and every time that happens you get a spike of activity where people find out what you're doing and then come along. You have to spend a lot of time trying to make it clear to people what's required in the project and what the next steps are, but as long as you do that, people are interested in suggesting things, which is very useful. So the thing which was amazing was when we hit a roadblock. We hit a very significant scientific roadblock in this project, which essentially was that we didn't have the right piece of equipment in the lab here at Sydney and we didn't know what chemical to add into our chemical reaction to help us extract this mirror-image form that we wanted. We had advice from a lot of company guys that we should be doing something called a resolution approach, which is very old and well-established chemistry that Louis Pasteur would have used many years ago, and that we should be using that rather than the kind of highfalutin 
approaches that we were initially taking, and so we changed the direction of the project to account for that. And then we hit this roadblock where we didn't have the right equipment and we didn't know what to add into this chemical reaction to make this mirror image crash out as a nice sort of salt. So I put out a request for help on various online fora, including LinkedIn, which is the networking website. It has groups there of many people with shared interests, one of which is about a thousand people who specialize in this area of chemistry, process chemistry, which is large-scale chemistry and crystallization. We got a lot of very valuable advice back, including some offers of help, and from those offers of help we selected one company that offered to help us and sent them material in the mail. A rather low-tech approach to collaborating, but we just mailed them this material, and they did some experiments and shared the data with us, and we posted that on the website. That helped us get over this roadblock extremely quickly, because these guys were specialists and they really showed, in a very public arena, how good they were at solving this problem. So with that lead we then took that and optimized it, changed it a little bit in the lab for the rest of 2010, and we ended up getting a process which was pretty good. So the top line here is the chemical version of what we did. The interesting thing about this was that, on the bottom, a professional contract research organization were looking into the same problem at around the same time, and they came up with the solution on the bottom, starting with some intermediate molecule which we didn't know existed but which you can buy on a large scale from China. Even if you're not chemically minded, these two solutions obviously look quite similar. So the end result, of course, was that the open source approach found something that benchmarked pretty well against something that was discovered by a professional outfit. 
I'm not sure about the time and the money involved in both of those processes, but these solutions end up being pretty similar. So one key result of this, of course, from our point of view as academics, but also as a way of marking a milestone, is to make sure that all the stuff that you've done is actually published. Even though all of the data and all the lab books and all the discussions were already available on the web, they weren't summarized anywhere in the form of a paper. So we wrote all of this up and published it in PLoS Neglected Tropical Diseases, with a description of how the project worked in Nature Chemistry, and that was an important proof of principle that journals such as these and others will take papers that consist of science that has already appeared in the public domain. That was an important feature for us. So releasing everything on the web before publication does not in any way stop you publishing the work subsequently. It does in certain journals, but it doesn't stop you publishing the paper outright. So, just to quickly talk about some of the nice things about the way this project felt when we were doing it. One was that it was very nice to have all the work being very transparent. It's not just the data that we share, it's also the discussions of where the project's going and what we're going to do, and when we don't know how to do something we talk about that. So the whole process is transparent. And, I don't know, I sense that the public are getting wise to the fact that they're funding a bunch of science which is happening behind closed doors. They might actually be quite interested in seeing what's going on. We don't really have the right to do it in secret if we're being funded by the public. So it felt very nice that I could tell anyone who was interested that they could go and look at all the experiments and see what it was that we were planning on doing. There's obvious educational advantages which we 
can't talk about, but if I was a kid, watching one of these projects would be great, to see the project change every day and just follow it. Another issue is that the project doesn't rely on me either, it's non-PI-dependent. So if something bad happened to me or to the team or to the grant funding, it wouldn't really matter as far as the project goes, because everything is there in the public domain, so anyone can just pick it up and run with it. It also felt extremely fast. That was another feature of the project, that we were really helped by people and that accelerated the process of research. No question, the research was faster because it was open, because people found us and came to the project. But we realized another lesson, which I think anyone in software would know, which is that you don't just sit back and expect people to help out. You have to be pretty active in driving the project, and every project should be the same: you have to have someone who's driving it. It doesn't have to be us, it was us because we were funded, but you need someone who's pushing things forward. As we were doing this you come up against a lot of problems with data, some issues with how you share data and how you interact with people, but also with the web, and one of the nice things about it was that, because you're working in the open, you can work with anybody. The project has benefited a great deal from a lot of technical inputs by an undergraduate here at Sydney who's modified the electronic lab notebook and built extensions to it and has shared those on the web, including ways that we can Google molecules, for example. So when you load up Google there's a little search box there, but you can't draw things in that box, you have to put words in. So if you want to search for a molecule you can't just draw a molecule, unfortunately, which is the way we work as chemists. And so Mike developed a way where molecules in our lab notebooks can be recognized by a bit of software and converted into a 
text string, or SMILES string, which is a very common way of representing molecules with letters, and then you can search on that. So he's developed a way in which our pictures of molecules can be inserted as text into our lab notebook entries, which means that Google will pick them up. Very nice, I mean it's a beautiful little extension to the lab notebook which makes it much more usable, and one of the nice things about working in the open is you can just work with people like this without worrying about confidentiality or patenting and so on. We also experimented with writing papers in various ways. So we have nearly finished writing a review on a certain area of chemistry using a wiki, where anybody can help us out with writing this. This is written on OpenWetWare, which is a wiki-based community for biomedical research. We're using it for various things like writing papers, and one of the nice things about a wiki, of course, is you can write with anybody and allow anybody to come along and contribute and help. Particularly with a review that's quite useful, because a lot of work goes into a review, and the more input from people in the area the better. So in this case Michael Tarselli here is a guy that we've never met but we've been working on this review with, and he's been contributing, and when we finally finish this thing off he will obviously be an author. And a particularly nice thing about a wiki, of course, is that you have a record of all the revisions and you can monitor who's done what, and it's very easy to do quality control because you can see who's authored each piece, and then you can keep track of who has read what and who's checked what, because of course with every wiki page there is an about page and you can write about the nature of writing the paper as well as the paper itself. For data for other projects that we're doing, I'll get into the drug discovery one just at the end in a couple of minutes, but we're using other sites that already exist for 
the sharing of data, so we're not really building a lot of sites for data management. We're using a lot of things that are already out there. GitHub is the thing we've just started playing with as a way of sharing things like Excel sheets and Word documents, which again keeps the version histories, which is very useful. GitHub is meant for software but is useful for files too. We've also been using something called ChEMBL, which is an online database of a lot of biological activity data, and PubChem itself, which is a wonderful resource of freely available chemical information. So we don't have to build anything; we can use stuff that's already there and interface with those sites, which is very powerful. So we nearly have everything. Starting from the top there, we've got an electronic lab notebook where we share everything. Then to the left we've got this thing called The Synaptic Leap, which is a blog where we describe what's going on. On the bottom left we've got somewhere we can write together. We've got journals on the bottom right, where we can publish the work. And then we've got other things, like other blogs which talk about what we're doing, so again we get commentary on what we're doing in an informal setting. So it's a very useful sort of circle of how to do science on the web in a very interactive way, and it's nearly right, but we're missing something very crucial, which I'll get back to right at the end. In the last few slides I just want to mention the project that's going on now, which is drug discovery. That is much more complicated because there's IP involved, in the sense that we're discovering new molecules now. Traditionally, if you ask anybody, they will say that you need to have a patent to discover a drug. This is not true, but it's the overwhelming opinion of anyone you speak to who is involved in discovering drugs that a patent will be necessary. We decided to try and show that that wasn't the case, and that just means taking a compound which is in the public
domain, testing it, evaluating it and trying to optimize it as a compound that could be used to treat somebody for a disease. Because we had wonderful support from the Medicines for Malaria Venture in Geneva, we started a malaria project. The starting point is an extraordinary paper from 2010 from GlaxoSmithKline, who put in the public domain thousands of chemical starting points which are active against malaria: really tremendous compounds which kill the parasite inside red blood cells. So we took some of those leads (some of the molecules are shown) and you start modifying them and evaluating them. We started out with a set of principles about how the project would work, and those are shown now. Some of these are crucial: all data and ideas must be shared, and no flame wars. Keeping the discussion in online forums, ideally, is quite important, trying to avoid email at all costs. And then the sixth law is very crucial, which is that the project exists as a thing and is not ours. It's something that we are contributing to and leading at the moment, but the idea is that it's meant to be beyond us, and anybody can contribute to the project. They don't have to join our team; they join the project. So this is a chemical slide, but the point is that you make variations, and we have collaborating labs around the world who are doing that, and then you send those to biologists to test them on the malaria parasite and we see if we can make improvements. We've already found very potent compounds this way in the last year or so, which unfortunately were metabolized, so they're not perfect, but the process is working in the sense of sharing data very effectively, and people are offering advice and getting involved. That's the basic message. Again, it's not just the data. We have consultation sessions which are streamed, rather like we're streaming this session now, and then we record those and we put them on YouTube, so again people can see where the project is going. It's not where it's been, it's where it's going. And we're at the
moment in the middle of a consultation about which compounds to make, and we're trying to get those compounds made by multiple people. I'll show you an update about that in a minute. We use informal things for dialogue in this project, so Google Plus has proven to be actually quite intuitive for having conversations about results and data and other things. It's a good way of being found, it's very searchable, but it's also very intuitive. The dialogue is very good because it's essentially peer to peer. So this is an example of a conversation which happened between an undergraduate who was working in my lab, Zoe, and a professor in Melbourne who has been very supportive of the project and who's an expert in medicinal chemistry. This is a conversation based on a point of science between the two of them, without me getting involved. So again, one of the nice things about a blog like this is that it's a level playing field, and people can interact with each other without any hierarchy getting in the way, which is what science should be all about. So it's very nice when you see things like this happening. But other people can get involved too. At the top there is a very long and involved blog post by a pharmaceutical expert which came out the other day, which was full of useful information and suggestions and criticisms, which is also very important. And below is a lab notebook that was started up on our site by a chemist from Lucknow in India who has been making molecules as part of the project. So it's a very easy way to collaborate with someone overseas, where you share all the raw data, which we never got. So, one last point about the thing that's missing. We nearly have everything, but actually we don't have everything that we need to do real quality open science, because we're the only ones doing this, and unfortunately the rest of the data universe in chemistry is not open. So what we really want, of course, which would be transformative, is if everybody
started doing this and sharing data, and if chemistry had its human genome moment, where we decided to put all of the chemical reactions that had ever been run into the public domain. It doesn't happen at the moment because the databases of chemical information are expensive and proprietary, and so we can't search the data and we can't devise algorithms to effectively search chemical data, which would be great for us. So our data can be searched, but of course we don't benefit from searching everyone else's lab notebooks, because those lab notebooks are either secret or they are on paper, unfortunately. There are some people I want to thank there, and there are lots of people, of course, in this project, because lots of people get involved. The people who pay the bills are shown there, and the people who have done a lot of the work are shown there too. In the last couple of minutes I wanted to show you the live versions of a couple of things, instead of having a slide, and then I'll stop talking. So this is an example of one of the lab notebook pages. It's a regular page: there's a picture of a chemical reaction, there's a table there of what we're working with, there's a hazard assessment so that you can see that the relevant safety precautions have been taken, and then there's a bunch of descriptions and pictures, and there will be some data. This reaction is from the 26th of November, so we're waiting for the data there, and when it's acquired it will be uploaded. It's very nice because it doesn't require any sign-in. You can comment on things, search by date and other things, so it's very intuitive and useful, and it's open source and it's free to use. The blog that we've got running on The Synaptic Leap, again, is quite attractive. You can post pictures here, so this is an update that was done yesterday about where we are with getting the last few molecules that we need for this series in malaria, and it's
very easy to post comments at the end, so again you can interact in this nice way of reading and thinking about something and then posting your thoughts. And then the last thing is just to share the Google Plus site, which is very intuitive. I wanted to show you this because we need three more molecules for the completion of our current set, and one of the postdocs who's worked on the project, Alex Williamson, just posted this this morning, which I think is fantastic. It has the three molecules that we need mocked up in a wanted poster, and it's already getting people forwarding it, because it's a really good idea. It says "wanted, active or inactive", preferably active though, which is a really great thing to do. So this is a way in which you can go from the lab notebook, which is formal data, to something which is much less formal and more usable for people, to bring attention to what we need for the project. So it's a range of things, and we will use whatever sites are intuitive for people and popular in which to do the job.
Yeah, thanks Matthew, that's an absolutely fascinating overview. I hope you're not going to ask us any questions on those little SMILES diagrams, is that right?
No, that's fine, because I did study for a long time the difference between the two symmetrical opposites; I'm sure I could answer even the difference between the two. A question that occurred to me: you brought up the link between data and collaboration, and you sort of said that open data was necessary for collaboration but not sufficient. There were other things that were required for collaboration, and you went through a lot of the social and technical tools there for the collaboration. I just wanted to ask you about the link between the data and collaboration, in the sense of the openness of data and how much that leads to more collaboration. If I have a data set and I'm making it available, is that likely to lead to more collaborative opportunities? And I suppose as a follow-up to that, I'm assuming that collaboration is good, so what is the link between collaboration and success, in the sense of getting the right outcomes or getting the right applications? So: the link between the data and collaboration, and the link between collaboration and the final goals that you're trying to achieve. Yeah, absolutely. I take the long view here, that in 500 years' time we are not going to be doing science like we're doing it at the moment. The world is going to be a very different place, and the activities of everybody on the planet are going to be linked in a way that we can't conceive of right now. It's going to be very easy to find anyone in the world who is working on what you're currently working on, and to collaborate seamlessly with the best and most active people. I think in the future it's going to be absurd to keep research secret, basically. So I think the answer to your question is that if you share data you may stimulate activity; the Human Genome Project stimulated activity. But if you share your activity itself, you share what it is you're doing, I think that tends to stimulate activity
more, because it's a human instinct to get involved with something that is interesting and which is going on right now. If I walk along the street and I see someone building something, I'm more likely to think "what's going on there, and what can I do?" than if I come to something which either is abandoned or which has already been built. It's the same if you play with Lego: people tend to join in. I'm talking about my son. So I think it tends to bring out the best in people if people can think, "well, I can actually get involved here". And I think that's a very different thing from sharing data. If you share data, it's never quite clear if the project is active, if it's still going on, if there's something that you can do which can make a difference. Whereas if something is being shared every day, so there's a drip feed of activity every day, I think people will be more interested to see what's going on there. It's going on right now, so if I suggest something now and if I do something now, I will actually make a difference. So I think it's a human nature thing, which is very important. I think it's a very important point. What about the acknowledgement, your link with the success? Is it necessarily diluted by openness, because the team is bigger? Sure, I think so, yes. I mean, at the moment I guess we have a luxury in that we're the only project doing this kind of thing, with open source discovery, and putting out small molecules every day in the public domain. So we have the luxury that if we find a drug, if that actually happens, then that's going to change everything, and so that would be very high impact. Of course, if everybody did it, which is what's going to happen, then the impact will be less, because it's not going to be the first time it's happened. It's still significant because you're helping people, but it's not going to have the same sort of academic significance. So I think that component would then reduce. But then you think, well, why do
people contribute to open source software in such huge numbers? There are literally millions of people who are on SourceForge posting things and doing things. I think there's a natural human instinct just to help solve problems, and you can do that in the public domain and demonstrate how good you are at doing something in the public domain with an open project. So there's an incentive, because it's kind of on a live stage. And some of what you're talking about is a social hierarchy as well. You were talking about science without hierarchies, and science with people from all over the globe, and science with people from developing countries, et cetera, who may not have had the opportunity before. A lot of this is a change in social attitude as well. Yes, exactly. So it's meritocratic, it's genuinely meritocratic, because you just need a web page and a connection and you don't have to be called professor. You can just chip in based on your expertise, and if you make a genuinely useful point, that will be acknowledged. It's how it should be. It should be essentially blind, in that we don't really care who you are, we just listen to the contribution. I think that's one of the real strengths of this, and one of the things that's most refreshing about it is that the contributions don't have to come up a chain of command; they can be given directly. The educational point about that, which is also extremely important, is that students in different countries can educate each other, both ways, in a scalable way that doesn't rely on everything going through a small number of academics. So with the access to information, do you have friction from not knowing whether you can use information that seems to be in the public domain, or where you're not 100% sure about the terms and conditions of being able to use something, or hidden IP, or things like that? How is it working? All of the data that we generate in the project, unless otherwise stated, is governed by CC BY 3.0, so you can use anything in the
project that you want, with attribution. The data are open for free reuse. So that's all of our stuff. If we use something from someone else, then we obviously can't take papers from journals that are subscription-only and post those on web pages. We can't do that, so we have to be a little careful of that. The only stumbling block is usually contributions that I get by email: questions or comments where I then always have to go back and check with the original person whether I can use the comment and put it in a public place. It's one of the reasons why I try to sway people at all times away from using email. I want people to be able to contribute in the public domain completely, and it's pretty rare that there's a reason why the public domain is no good and why email is required. On a day-to-day basis we have no issues. It's only with the occasional email, or if we need to share a paper in some way, that we have to be creative in the way that we describe it without infringing any of the copyright. There's a question from Catherine. We might try and transfer Catherine to the mic here. Catherine, can you hear me? Or I'll read Catherine's question: have you seen a spin-off effect with folk outside your research team but within the same science community, for example other research teams now taking a similar approach? Not yet, no, we haven't. I mean, there are massive barriers in place, mostly incentives and metrics that would dissuade people from doing this. The metrics of science, academic and industrial science, at the moment encourage competition and secrecy. We need to publish in certain journals, and those journals often don't accept what's in the public domain, and there is a competitive advantage at the moment in publishing something at the expense of other people, so there is not too much incentive to share. I think those are changing gradually.
There are significant changes coming from the top down which are going to make a big difference, in terms of the requirements we are seeing now being mandated for open access publishing but also the sharing of data. I think those are going to make a big difference to the way people work. But at the moment, not really, no. There is an Open Source Drug Discovery consortium in India which has been very active in the annotation of a TB genome, but which has not so far put a lot of information about drugs in the public domain. So as far as I can tell we're alone in doing this small-molecule drug discovery in the public domain. I think Jean had a similar kind of question too. She was asking: is there resistance from institutions about their researchers being involved? So what does the University of Sydney think about you being involved, for example? Let's not make it personal, but that's the kind of question. Sure, yeah. The University of Sydney was very good about forgoing IP on the two projects that I was talking about, of course, because they wanted the projects to happen. So built into the contracts there is an unusual IP clause which says that basically there isn't any, because it's going to be open, it's going to be open source. There's no delay, really, between the experiment being done and the data being released, so there's no way you can patent anything on that even if you wanted to. So they've been very good about that. Now, I don't know if that can be broadened out beyond just our projects. That's a very interesting discussion, though, about whether universities are monitoring IP correctly or whether they're being too aggressive in monitoring all of their IP. I don't know the answer to that question; I'm not an expert on that. I would advocate endlessly that if you really want to innovate you need to stop thinking and worrying about intellectual property too much, and if you really want to do things quickly you need to work more quickly and have more eyeballs on the problem. So what you lose in IP control
you gain in speed and innovation. But that's a pretty big discussion that we can have over a beer. Yeah, a very large beer. Let's do that one in Germany.
All right, I might just take the opportunity to cut over to Baden and just check. Well, unless Baden had any other questions on this particular topic, there's an update from the world of AusGOAL, Baden. Thanks, Adrian. I'd just like to say thanks, Matthew, that was a terrific presentation. I do confess to the world at large I had seen it before in Helsinki, or something similar to it, certainly not as up to date as it was now, and that's one of the reasons why I was so happy that Matthew was able to attend today with me to share that all with you. Well then, I thought you should know what those diagrams mean. I've seen it twice, you just beat me to it, I was just about to say, but still I can't get over the chemistry. Well, you'll have to see it three times then, Baden. I think I need to have a really big beer with Matthew for some basic chemistry. Don't worry, no test. But in any event, I thought it was interesting to note some of the principles there that you were talking about on that slide, Matthew. The first law was open, and from where I come from that translates into making sure we've got our house in order. As you've pointed out, copyright and nested copyrights can be a pain in the neck, but if you think about them first they don't necessarily have to be as painful as they perhaps otherwise could be. But aside from that, what's happening in the AusGOAL world? Probably not a great deal to discuss on the academic side. A lot of stuff is happening in government, but on the academic side of things not a great deal. I did note, however, the release yesterday of the NRIP, the National Research Investment Plan, and in fact I counted no fewer than 19 references to open access in that plan that was released yesterday, or the day before, by the minister. So I have high hopes, as Matthew said, that things are changing from the top down, and it
would be very good if we were prepared with licensing as these things commence to change. Now, the only other thing I would point out is on the CC side of things. Version 4 is about to be released mid-December: the version 4 licenses of the Creative Commons. That's the final version, I suppose, of the drafts that have been circulated across the year. The process thereafter is perhaps not well defined in terms of time frames as to when the new version 4 licenses will be ported, as it's referred to, across to the Australian versions, but I have every expectation that won't be too long after that. The other thing is that there is a good conference coming up in Auckland, New Zealand, in February: the New Zealand Australia Open Research, I think it's called the New Zealand Open Research Conference. Matt is part of the organizing committee, I think, and I'm pretty sure I'm also there as well. So if you've got an interest in open access and research, and how open access is deployed in the research field, there it is up on the screen. Thank you for refreshing my memory, Matt. It's the 6th to the 7th of February 2013 in Auckland, so I look forward to seeing some people there, if they happen to be listening to this online or are part of the participant group currently with us. I haven't really got much to add. I think there was a question on notice from Catherine Unsworth. Do the participants get to see those questions, or how would you like me to attend to that? The participants don't see those questions, and I can't see the one you're talking about, so why don't you address it. Okay. Well, let me just briefly go to my email for what Catherine asked, and I'll try to be as brief with the question as I possibly can. The query related to the Creative Commons Attribution non-derivative license, which as you may recall allows you to freely copy, distribute, display or perform the material, and you can make commercial reuse of the material, on the proviso that you attribute,
as you would normally with all the other CC licenses, and also, in particular, that you do not make a derivative of the material. That is, you do not alter or transform the work (another word might be adapt), and you may not build upon it, with the exception under the license that it may be incorporated into a collection. So for example, if a paper was written and somebody wanted to make it a chapter in a book, you could include the paper in the book provided that it wasn't interfered with in any way, shape or form, modified or otherwise. Catherine wrote to us saying that the scenario is this: a data visualization tool is being developed that pulls in, at this point, data collections from four different data sources, each of which has aggregated data deposited by members of a number of scientific communities. Two of the input data sources have licensed their data under the Creative Commons non-derivative license. The visualization tool does not in any way modify the data; instead it combines all four data sources into what might be considered a summary or compilation, to display an overview of the current evidence supporting the identification of various gene products across chromosomes, such as protein expression, modification and disease association, with the ability to drill down to the original data. And she writes: does this constitute building upon the work? Would you need to request more explicit permissions to use the data in this way, or do the definition of the term derivative in the legal code and the exception for collections allow us to continue using this data in this way, even under the Creative Commons non-derivative license? She goes on and very helpfully outlines the definition of what a derivative work is, which I won't go into, and she says "I hope this makes sense". Famous last words, as they say. Look, it does make sense, and effectively what we have here is four data sources being incorporated into one repository, with some things that can be done to that data. I confess I
rang Catherine, because she left an open invitation to contact her. Not that it didn't make sense, I think she expressed it very well, but I needed some further information. As a general rule, I don't think the non-derivative license is an appropriate license for data. You simply can't do much with it. The test here is: is this visualization tool effectively creating a collection? If it is, then that is a complete defense against infringement of the non-derivative license. If it's not, it may well infringe. I think it's sailing close to the wind. My preference would be for Catherine to go to the people supplying the material under the non-derivative license and invite them to make an alternative licensing decision, which they can do. In fact this happens all the time, and one of the features and strengths of Creative Commons is that the licenses are non-exclusive and the licensor can reconsider their position at any time. They often do with the non-commercial licenses, for example, where material is made available under a non-commercial license but then a commercial re-user comes along and says "I think that's something I could really do with", and they contact the licensor and say "can we negotiate an appropriate arrangement for commercial reuse?" So in a nutshell, that's where I think that one's at. Unfortunately I don't know enough about the data visualization tool. I'd very much like to have a look at it one day, and perhaps I can make a more refined response based upon that observation. But until then I would much prefer that Catherine and others in this predicament either renegotiate the licensing of the material or indeed encourage people in the research space not to apply the non-derivative license. It's very, very restrictive. In fact, that's one of the comments that's been made about AusGOAL: they say our acronym says Open Access and Licensing Framework, but most of our licenses are highly restrictive. And that's true, but we certainly prefer the least restrictive ones,
and over time we are actively reconsidering the licenses that we have, and in fact I'm looking at a couple of other licenses right now. That's good. I think that's good advice, given that it's like being a doctor being asked to diagnose over the phone: you can give, I suppose, general guidance. I should remind everyone that there's a partnership between AusGOAL and ANDS for research organizations, specifically if you're looking at selection policy or these kind of tricky questions about what kind of materials can be used in research integration kind of scenarios. Baden works with ANDS on these kind of things, and we've got a couple of ANDS staff who work on these kind of questions. If any of the people in the Australian research community would like to look more carefully at the way they're doing things and want some advice and guidance, then feel free to contact us, and we're happy to work with you on some more targeted questions. That's both at the research group level and also at the organizational level, with policies and selection policies and things like that. And I'm certainly happy to travel to you. Yeah, that's great, yes, and Baden is the sort of license-pollinating bee that goes around from jurisdiction to jurisdiction, seeing as he's the national program director, so he's probably at a capital city near you sometime soon. So we'd be very keen to work in partnership with some of the research organizations to help them make some significant steps. All right, so now, Baden, how would people follow you or get in contact? What's the best way? Sure. Well, the AusGOAL website, www.ausgoal.gov.au: if you hit the Contact Us link you've got links to my number, my email, my LinkedIn. More than happy to hear from anyone, anytime. As well as that, you can follow us on Twitter, and we also have a LinkedIn group. As Matthew said, LinkedIn groups are really good spaces to get people who are working on similar issues together and sharing information, so
we've got that. We've also got Google Plus pages and other things like that up and running. They take a long time sometimes, all these different channels, to keep maintained, but I guess Twitter is my tool of choice at the moment. But yeah, feel free to give me a call on the phone or contact me via email and I can give you a call. Well, those channels are certainly open access in practice, and, you know, good luck if you can keep up with Baden's Twitter feed, is the second piece of advice there. You'll get a lot of very interesting things coming through there. From the ANDS point of view, the ANDS website has a lot of material about licensing, and you can always contact ANDS through that website. Again, we're very keen to work in partnership with research groups and research organizations. Matthew, if people are interested in working with you or following your kind of thoughts, how would they do that? Well, I think there's a Twitter account for the malaria project. Probably Google Plus is a good way of doing this, so I'm on there just under my name, and there is an OSDD Malaria account as well. That would probably be the easiest way of getting in touch and following what we're doing. Also, it's very easy to sign up as a member of The Synaptic Leap community, where a lot of the updates are posted. So any of those kind of places would be fine. If it's still not clear what to do, then email is always a possibility, and then I can forward some ways of doing it. Yeah, you'll forward that email on to your blog, is that right? Not automatically. Okay, good, good. Well, thank you very much. A really absolutely fascinating presentation today, and some really good discussion on the panel. Thank you, Baden. Thank you very much, Matthew, for making time. It's good for us to see how the information principles that we work with, access to information and access to data and collaboration around information, actually bed down in real research. So thank you very much for that, and thank you all for
participating. Thanks to the people who sent in questions. We will have another item in our licensing webinar series early next year, so just watch the ANDS website. There are a number of other interesting series too, for example our research data management series of webinars. If you're interested, again, just have a look at the ANDS website to check out the upcoming sessions. Thanks to all for that, and we'll see you soon. Thank you, and thanks for technical production from Alex Hayes. We'll see you soon.