[unintelligible]
[unintelligible]
Here's another one: EU clinical trials. There are 400,000 clinical trials published in government repositories, of which the EU clinical trials register is one. Ben Goldacre in the UK has been pushing for trials to be open. Many clinical trials are not open, and they're not even published. This is what we're pushing for, because if you only have the positive trial about a drug [unintelligible]
[unintelligible]
The next thing is to scrape the papers, and we do this off the publishers' websites. Now, they say we are not allowed to do it because it's violating something, but we are doing it. It's the primary publication. We then index it, and you'll see that in a minute. We search and analyze papers, and we carry out complex transformations on them. And this is what in the US is known as transformative [unintelligible]
But most of the rest of the paper is diagrams, or mathematical equations, or maps, or tables, and that is often the most valuable bit in the paper. And that's why I use the term content mining rather than text and data mining, because if I don't, publishers will claim that because this is a creative graphic work it belongs to them, and I claim that [unintelligible]
I've put several years of my life into building this, and my young colleagues have put in the same amount again. I'm not going to go through this in detail, but just trust me, this is a complete infrastructure.
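The stages named above (index, search, transform) can be pictured with a toy sketch. This is not ContentMine's code: every name here (`PAPERS`, `SPECIES`, `build_index`, `search`, `find_species`, `mine`) is hypothetical, the scraping stage is replaced by an in-memory dictionary, and the "transformation" is just a lookup against a two-entry word list.

```python
import re
from collections import defaultdict

# Stand-in for scraped papers (hypothetical data; no network access here).
PAPERS = {
    "p1": "We cultured Escherichia coli overnight.",
    "p2": "Ursus maritimus was observed in Greenland.",
    "p3": "The compound was dissolved in water.",
}

# Hypothetical lexicon used by the final, swappable extraction stage.
SPECIES = ["Escherichia coli", "Ursus maritimus"]


def build_index(papers):
    """Index stage: map each lowercase word to the ids of papers containing it."""
    index = defaultdict(set)
    for paper_id, text in papers.items():
        for word in re.findall(r"[a-z]+", text.lower()):
            index[word].add(paper_id)
    return index


def search(index, term):
    """Search stage: return the sorted ids of papers containing the term."""
    return sorted(index.get(term.lower(), set()))


def find_species(text):
    """Extraction stage: report lexicon entries that occur in the text."""
    return [name for name in SPECIES if name in text]


def mine(papers, term, extractor):
    """Run index -> search -> extract and collect results per paper."""
    index = build_index(papers)
    return {pid: extractor(papers[pid]) for pid in search(index, term)}


if __name__ == "__main__":
    print(mine(PAPERS, "coli", find_species))
```

Passing the extractor in as a function is only one way to keep the last stage replaceable; a real system would need far more robust parsing than a word list.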
We've deployed it. It will do everything you need in mining, and it will allow you to add bits on, particularly at the end. So if you want to mine for species, you add a special tool at the bottom, where it says AMI here, and that will mine for species. If you wanted to mine the proceedings of the European Parliament, the tool would also work for that. It's not stuck on science; it will do really any subject that you want.
I'm going to give you a demo now. This is the only interactive demo. Look at that: that's chemistry. Most of you won't understand it, but it's a recipe; it's like baking a cake. We added this to this, we heated it, we stirred it, we boiled it and so on, just as you do in the kitchen. Now, that's quite difficult to understand, but I'm now going to go to Cambridge, so this is a real live demo. We're going to put the same thing into Cambridge, and you can do this at home. We've sent it off to Cambridge, and that is how quick mining is when you're allowed to do it. It's almost instantaneous. Not only has it worked out all the phrases (here's the wash phrase, here's the dry phrase, and here's the dissolve phrase), but it's also worked out what the chemical compounds are. It did all of that while you were watching. That's what the power of machines is, if you are allowed to use it.
Back to the slides. We've done this for patents, for half a million patents, and we get a very high degree of accuracy. But we can't do it on the chemical literature, because the chemical publishers claim that it's their copyright. Elsevier's licence, if you mine their material, specifically says that you must not compete with their products in the marketplace. One of their products is called Reaxys, which is a chemical database, so if I signed their permission, which I am not going to, I would be debarred from publishing any chemistry because it might impact on their product in the marketplace.
And this, in my view, is totally unreasonable, and it is why I've challenged their copyright and their text and data mining licences.
We're doing endangered species. I'm not going to spend much time on this, but you can see here the need to put a fact in context. This is a fact: it's a ribbon seal, and you'll see that the text around it is critical to understanding it. We mine facts every day, and this is today's fact, which is E. coli.
Very quickly, I'm going to show you what we can do about aggregating information. You may be familiar with the tree of life. Here's us animals, okay, and here is an endangered species, that's a Danish polar bear in Greenland, and here is an experimental animal from one of the labs that we're working with; I won't give names. We want to build a tree of bacteria, so we've taken 5,000 papers, we've read those from the literature, and we've extracted the image out of each of them. We've done some very advanced mining of this (I'm quite proud of it), because this is all pixels, as you can see. You don't have to understand the details, but the point is that we can transform it automatically into something here. It goes through lots of stages, and ultimately we end up with a tree of all bacteria that have been published in the literature. And here's a rather pretty picture of it. It has pulled all those papers together, but technically, during that, I have almost certainly violated copyright. I cannot do this work without violating copyright; that's the point. And there we are: we can't do reproducible mining without it.
The Hargreaves legislation is very important because it gives us the moral authority to go ahead, but it still gives us very little power to publish. We can't tell you what we did, because if we published it, we would publish material where copyright was claimed. That's the real problem for scientists: they actually have to say, well, we can't tell you how we did it, because otherwise we would be breaking copyright.
And that's a terrible thing for a scientist to have to do: not publish what they feel they want to publish.
I'm sorry that the publishing industry has been so unhelpful over this. After the legislation in the UK came out, they spent their effort trying to produce licences for Europe and trying to discredit everything that was done. There's not a single major publisher who's done anything positive here. And I simply use this: if you trust Microsoft, then trust Elsevier. If you trust Facebook, trust Mendeley. It's as simple as that. I don't think any major sector can be trusted implicitly without serious checks on it. And there are no checks. There's no governance of scientific publishing.
So, some of the things they're doing. We get masses of FUD and disinformation. The publishers may say that they're helping you, but I actually spent five years debating with Elsevier. It wasted five years of my life and got nowhere other than FUD. There are monopolies on the infrastructure. I think it is very dangerous: you will end up with something similar to Apple's universal infrastructure if Macmillan and the APIs from the publishers are allowed to happen. And libraries are persuaded into signing restrictive contracts. If the university libraries, and here I would commend RLUK and LIBER and other organisations, can make their members say, no, we are not going to sign these restrictive clauses in the contracts, because they don't have to, then we would be a long way forward.
Here's a typical example. Wiley, to be "helpful" to researchers, has introduced a CAPTCHA. So every 25 papers you have to type in a CAPTCHA, and I have a limit of 100 papers a day. Imagine we were going through 100,000 papers for clinical trials, as you've seen. That's three years to go through Wiley, because you would only be allowed to mine 100 a day. That's the issue. And of course they say, well, we'd be terribly helpful and give you exceptions.
But everybody who has dealt with publishers knows that it takes years for them to respond to individuals asking for things. And there are hundreds of publishers.
So, my last slide. We see libraries as one of the major solutions here, but libraries have got to stand up and fight for our rights. We're working in Cambridge, where we've got some very forward-looking libraries. We're working with Cochrane next week on systematic trials. We're very grateful to LIBER, which has done a wonderful job in pushing this forward, and we are part of their FutureTDM programme and H2020 programme. That's wonderful. ContentMine also provides workshops and training in this area, so if any of you are interested in having internal or public training, we'd be delighted to do it. And thank you very much for the invitation.
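The rate-limit figures quoted in the talk (100 papers a day, a CAPTCHA every 25 papers, 100,000 papers to process) can be checked with a short calculation. The function and parameter names below are made up for illustration; only the three numbers come from the talk.

```python
import math


def throttled_mining_cost(total_papers, papers_per_day, papers_per_captcha):
    """Return (days, years, captchas) needed under a daily cap and a CAPTCHA interval."""
    days = math.ceil(total_papers / papers_per_day)
    years = days / 365
    captchas = total_papers // papers_per_captcha
    return days, years, captchas


if __name__ == "__main__":
    days, years, captchas = throttled_mining_cost(100_000, 100, 25)
    # 1000 days is about 2.7 years, i.e. the "three years" quoted in the talk.
    print(f"{days} days, about {years:.1f} years, {captchas} CAPTCHAs")
```

On these figures a single researcher would need 1,000 days and 4,000 manually typed CAPTCHAs for one publisher alone.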