[Inaudible] at the University of Birmingham and, starting in 1998, did his PhD there with Jeremy White, again in artificial intelligence, graduating in 2004. He won the BCS Distinguished Dissertation Award for it, which was a sign of things to come. He arrived here in 2004 as a temporary Java lecturer, and then got a Career Development Fellowship in 2005, a Lectureship in 2010, a Senior Lectureship in 2012, a Readership in 2015 and a Professorship in 2017. So a rather splendid progression, I think. So, during his time as an academic here, Gavin has thrown himself with full spirit into teaching; he is a very enthusiastic teacher, working with undergraduates. He's established a really nice culture in this school of undergraduates entering competitions, and this has now become an essentially self-sustaining culture where our students self-organise and enter national and international coding and hackathon competitions, serially winning them as well, and that culture is all down to Gavin. Also, he established our current approach to how we recruit new undergraduates, and has repeatedly refined it.
Research-wise, a stream of PhD students; notably, one of his students also won the BCS Distinguished Dissertation Award, making Gavin the only person to have both won the award and supervised a winning student, which, again, I'd call rather splendid. Over his 20 years in machine learning he has done a lot of work on unifying various fields, particularly getting a unified understanding of the feature selection side of machine learning; unifying the whole of machine learning would be rather good if he could just pull that off next week. But he's also branching out from looking at fundamentals of machine learning into application areas, particularly recently with AstraZeneca, looking at some quite impactful stuff around clinical trials, and even more recently working with the School of Law, looking at predicting and understanding domestic violence reports. So, you know, from fundamentals to impact, it's all exceedingly good. So today Gavin's going to give us his inaugural lecture on new ways of thinking. Thank you very much, Gavin. Thank you, Robert. I'm just going to turn the lights down a little bit, if nobody minds. Can you see the slides? How's that? Wow. Thank you all for coming. You've come from all sorts of places all over the country, so I really appreciate it. OK, yeah, 20 years; it seems like it was just yesterday, right here in Manchester. So, when I was young, I used to really enjoy this game of looking up into the summer sky, seeing the clouds, and seeing shapes in the clouds; you play that game of trying to see if you could see a shape like a car or an elephant or your favourite superhero or something like that. And if you look really closely, and you have the right amount of imagination, and you think in the right way, you might see something really exciting in amongst the clouds. But two people won't see the same cloud in the same way. You might see something else if you have a different background and a different perspective.
And I think the same thing applies to scientific ideas. So, if you look up into the sky, out into the academic literature, and you see one scientific idea, maybe your favourite scientific artefact that you've been working on recently. If you look at it from one perspective, you might see something familiar, you might see something you've seen before. If you see two ideas, and you think about them in the right way, ignoring some details and just looking at the big picture, you might be able to build some bridges between them, focusing on the similarities between the ideas instead of the differences. And from there, you might build bridges going on to even more interesting places. So, this talk is inspired by my favourite quote, one of my two favourite quotes; the second one will come at the end. The most important thing in science is not so much to obtain new facts, but to discover new ways of thinking about them. William Henry Bragg, Nobel Prize in Physics, 1915. So, I have structured this talk around three ways of thinking, three strategies that me and my team, who are on the front row here this evening, which is very nice, have focused on over the past 20 years or so: opening black boxes, similarities versus differences, and building bridges. And I'll come on to what I mean by them in due course. But first of all, I have to give you some background on the area where I've been working for the last 20 years, really longer than that: artificial intelligence. So, this is an area that has become so embroiled in science fiction and the media that it deserves this rather ominous glowing white font on a black background, this scary idea that everyone thinks is this Terminator future. So, what is the subject? What is it really about? Well, before we get into what artificial intelligence is, let's just focus on what intelligence might be. Now, you know what that is, or at least you can characterise it from your own perspective.
Someone is intelligent if they can do one or more of these things. They can see, hear, or, if you're lucky, both. They can think and reason about the world around them. They can listen, and they can move around intelligently without bumping into things. The field of artificial intelligence can be characterised by taking each of these things and noting that somewhere in the world right now, there's a group of scientists and engineers trying to give computers the ability to do them. The idea of computers seeing the world around them is called computer vision, and there are a load of computer vision researchers in the audience here today. Automated reasoning is a field where people try to prove things about the way in which the world works, reasoning about interaction with the world. Understanding language is natural language processing. Learning, the idea of computers learning from examples that you might provide to them, is machine learning, and that's the area where I've focused. So I could go into all sorts of interesting technical details about this, but you have to admit there's still a few of us thinking like this, still thinking that Gavin's got a Terminator hidden behind the lectern. It's impossible to separate these two, because Hollywood likes making disaster movies around AI. And me, as a child of the 80s, I grew up on shows like this, you know, Star Trek and Quantum Leap, and movies like WarGames and Short Circuit, with science-fiction artificial intelligence characters. And I thought it was fascinating, but the reality that has come about in the 21st century, and that is going to continue, is equally interesting, just not quite as scary. The most likely place you'll find AI applications nowadays is here, on your phone. You have phones organising your photos for you, organising your news, suggesting news stories, suggesting modifications to your behaviour in order to get healthier. This is quite intelligent behaviour coming from your phone.
There's a future as well in, potentially, self-driving cars. These are in the press a lot at the moment; they're a few years away still. But those are the most visible aspects of AI out there right now. There are less visible things that are equally important, like AI for personalised medicine. That's an area where I have focused over the past few years, where we have AI systems that suggest, predict, what is the right medicine for the right person, to help them get better in the fastest, best possible way. The particular area where I focus, as I said, is this topic called machine learning. Before I get into some details about my ways of thinking, I want to go into a couple of details about what this subject is and why I find it interesting. Machine learning starts with a computer, obviously. We start with examples of what we want this computer to do. This is the first big assumption, the first myth about AI: computers and AI systems don't wake up in the morning and decide what they're going to do with their day. You provide this thing with examples of what you want it to do, and then it kind of does that sort of thing. Computers don't wake up and go, you know, I'm not going to be a self-driving car today, I think I'll be a Terminator and take over the world, or something like that. You train a system: you provide examples of what you want it to do, and then it adapts itself towards what you want it to do. Inside this computer is a thing that we might call a mathematical model. Or, to go into a bit more technical detail, we call it a big fat equation. That's pretty much all it is. Probabilities, algebra, calculus, things you will know from school: this is the make-up of artificial intelligence, and machine learning in particular, now. It's just matrix algebra, calculus, all that sort of machinery working away. Inside this mathematical model, the thing can adapt itself in response to human feedback.
A human supervisor will interact with this and provide a thumbs up or a thumbs down, saying, you're doing well, you're not doing so well, that's correct, that's not so correct. This might come from historical data from humans, but that's the basic mechanism. Once the human is satisfied that the computer is doing what they want it to do, the system can spit out a mathematical model and deploy it onto your favourite mobile phone. That's pretty much the mechanism behind most machine learning ideas, but it's only one of many sub-fields of artificial intelligence. The whole field is just ballooning at the moment. It's absolutely crazy how much it's in the press. You have games that previously required absolute genius, expert-level players, now being beaten by computers. You have the major companies, Google, Facebook, et cetera, all investing absolutely millions into this topic. I think for academics, the most visible part is this: the number of attendees at conferences is almost doubling every single year at the moment. 8,000 or 9,000 people at the last gathering, whereas five years ago it was under 1,000. It's crazy. With that as a brief overview, I want to go into my three ways of thinking. The very first one is opening black boxes. I had the good fortune to start my career at the University of Birmingham. I started my undergraduate in 1995, went on to my PhD in 1998, and met lots of amazing people who really changed the course of my life and influenced me in different ways. Some of them are here today, some of them not, unfortunately, so I thank you all. But chief amongst them was one person. Not this person, this is Data from Star Trek. But when I rocked up at his office in 1998 saying, oh, robots, AI, oh yeah, it's amazing, brilliant, did you see that TV show last night? Oh, I'm going to build that. He said this: think harder, open the black box.
Jeremy was my PhD supervisor; he's sat over there, do torment him later on if you want. And he really set me this challenge of getting over my own self-obsession with AI and this magical thing. And he set me these questions: why, when, how, where? Why does the thing work? Why are you fascinated with this thing? Instead of looking at the surface, the behavioural properties of this thing, delve into the guts of it. And this is really something every good academic does. It's not special to my particular career; every good academic delves into the details. So I want to tell you about the first black box that I encountered. So we start with this mathematical model, and we just look inside it. Let's imagine we set a challenge for this mathematical model of answering a very difficult question, such as some sort of advanced medical diagnosis, or some sort of self-driving car system; set a really hard challenge for it. So one option here is to make one very clever model. I'm using the analogy of one very clever person, Stephen here. Unfortunately, people like Stephen Hawking are very rare, and it's very difficult to build really impressive, big mathematical models. So an alternative is, if we keep with the human analogy, to build a committee of models, a committee that would vote on what they think is the right answer, the right direction to go. And so overall you might hope that you'll end up going in the right direction with this group of people. But the thing about a committee is that it's only as good as the people that make it up, only as good as the components. And so we know that we don't want all the components to be identical. We know we don't want the components to be like this: a committee of Boris Johnsons is a rather terrifying prospect, and it's likely a wrong decision will be taken. There's no benefit in having multiple things that are all the same. We also know we don't want differences just for the sake of differences; there's no point in those differences.
The right amount of diversity, as it were, comes from good communication within a group. And this metaphor, this way of thinking about committees as communicating, is effectively what we focused on during my PhD time. So obviously we didn't make committees of people like this; we instead focused on neural networks, or, if you were born from 2000 onwards, deep learning. This fascinating new field everyone thinks is amazing and new, and in fact it's been around for quite a while. But before it was cool, we were playing with it, and we made these groups of learners who interacted with each other. So, going a little bit technical now, the algorithm that we were working with was this negative correlation learning thing. And it had this loss function, this cost function, that defined the way it worked, and it has another component. So this is the prediction of the first network, and the target it's aiming to approximate, and then the average prediction of the group here. And there are two components to this algorithm; forgive the technical stuff for now. The system tries to minimise this first term and also this second term. By minimising the first term, it tries to get the individual network to be as accurate as possible, as close as possible to the target d. And by maximising the second term, because we've got a minus sign there, it tries to force the individual components of the committee to be as different from each other as possible. So what we were left with was this algorithm that worked very, very well, but we didn't know why. And the people that had published it, one of whom was actually one of my supervisors at Birmingham, didn't know why the thing worked either. So Jeremy set me the challenge of why does this thing work.
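The loss being described can be written down. This is a sketch in one common notation for negative correlation learning; the published versions differ in constants and in exactly how the penalty strength is parameterised:

```latex
% Negative correlation learning loss for ensemble member i (a sketch;
% published forms differ in constants and parameterisation).
%   f_i     : prediction of the i-th network
%   d       : the target it is aiming to approximate
%   \bar{f} : the average prediction of the committee
e_i \;=\; \tfrac{1}{2}\bigl(f_i - d\bigr)^2 \;-\; \lambda\,\bigl(f_i - \bar{f}\bigr)^2,
\qquad \lambda \ge 0 .
```

Minimising the first term drives each member towards the target; because of the minus sign on the second term, minimising the whole expression pushes each member's prediction away from the committee average, which is the "as different as possible" force described above.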
So we opened the black box, we dug inside, and we did lots of really lovely stuff linking it to previous statistical theories that were out there in the community, including, weirdly, work from 1952 that went on to win the Nobel Prize in Economics; we managed to get a citation to that in. Bounds on the parameters, explanations of the way it works, and new theories generalising it. And, as Robert said, we won the 2004 Distinguished Dissertation Award from the British Computer Society, which was very nice. We went down to the Royal Society and had this nice party in 2004. But shortly after this, I had the good fortune to move to Manchester, recruited by Graham and a few other people to be a temporary lecturer in Java programming. It was fun. But then shortly after that, I was further recruited to become permanent; I was only here about two months before Ian Watson, over there, started prompting me, saying, has anyone spoken to you about a permanent position? And then Steve and some other people got me to stay on permanently, which was very nice. Coming here turned out to position me very well. I met lots of amazing people who, again, really changed the course of my life. So here are some of you, thank you very much. And here's a whole bunch of us at a graduation ceremony; it's very nice. Also, as a result of coming here, I hit my all-time career high when I was interviewed on Children's BBC. That was a serious event; that was so much fun. Being asked to judge children's pictures of robots, and to say whether I thought they were plausible or not, was genuinely fun. So, yeah, that was great. But another thing about coming to Manchester is the fantastic PhD students. So I had some great ones. Here are some who have gone in years gone by, who helped develop new methods on this same line of thought: really deep new insights, which fuelled later work. Manuela here produced some work that fuelled stuff that I'll come on to later on.
Richard here, who does something very impressive, he's around here somewhere, do torment him later as well. He took this concept that I had been assuming was one thing, and I thought I had answered all the questions about it with my PhD, and he opened the black box even further and showed that it was in fact equivalent to something else: margin theory, which had appeared in the literature many years before. So there was even deeper stuff, going deeper into that black box. I've also got some PhDs in progress following up on these ideas further. So Henry, who's around here somewhere as well, who I've just donated to the University of Birmingham, so I feel I'm repaying my debts by sending him back there; he's got a fellowship there starting next week. And Georgiana and Charlie, who are around here somewhere, over there, doing fantastic new work looking at biological plausibility for these ideas, and new directions in modular learning, which is what Charlie's doing. All in the context of some nice funding which the EPSRC decided to throw us last year. We're also hiring, so if anyone wants a job, come and join Andrew Webb, who's over there, and develop some new stuff with us. But this brings me on to my second way of thinking. We've talked about this idea of delving deep and opening black boxes. So now I want to talk about this second point: similarities versus differences. And what I mean by this can really be characterised by a phenomenon every academic in the audience will know, this thing of publish or perish. There's this overwhelming variety of papers, huge numbers of papers that just keep appearing, and it's really difficult to keep up. And it's very intimidating, especially for new PhD students trying to find their place amongst this jungle. It's very, very hard to distinguish yourself in some sort of way. So one approach to doing this is to look for so-called differences from some existing paper that you find out there.
And you could ask yourself: what's the minimum possible change that we could make to this scientific idea that would get us a new paper? A tactic sometimes known as salami slicing, or the minimum publishable unit. It's useful for getting started, but it doesn't often yield the deepest of insights. It can sometimes fuel radical new directions, but I guess those are quite rare; they require quite special people. I feel, at least, that it's more rewarding to look deeper and think about similarities between ideas. So I like to take two seemingly disparate parts of the literature, and instead of trying to extend this one, I like to see what are the bridges between them, what are the links, what are the similarities between them, how can we interpolate between these ideas. And my first exposure to this concept kind of came in a meta-level fashion when, in 2005 or 2006, I can't remember exactly when, I was encouraged, forced, to share an office with this chap here, who's around here somewhere, where are you? There he is, hiding. So thank you to Hilary Karm and Steve Herber, again, for putting us in an office together. It was a long shot at the time, encouraging new staff to share offices. But he used to call me his second wife, because he saw me more often than he saw his own wife. We were new academics, really trying to make our mark, and we spent many, many hours in that office together. But he was, and is, a specialist in this area called multi-core computing, high-performance computing systems. And he used to sit on the other side of the office saying stuff like multi-core, speculative threading, cache coherence protocols to me. And I'd sit on my side of the office going Bayes' theorem, gradient descent, feature selection, machine learning words. And we really had to focus on our similarities rather than our differences. We really had to think about how we could work together rather than hating each other. But it turned out quite well.
We worked together for a while, we applied, and the EPSRC was kind enough to shower us with money. So we got a number of grants, including a very early grant that Steve led, the so-called platform grant. And that gave us a really good platform, a launch pad into our careers, a kind of security. So thanks again to Steve for that. But this project here that we started on, I want to go into a little more detail about. It covered a number of different areas, but there was one topic in particular we ended up focusing on, and it can be summarised with this metaphor. So, say I asked you to guess the price of a car. What's relevant in that particular task? What matters? You might be able to say quite straightforwardly what matters in guessing the price of a car: the engine size, the make or the model of the thing, the top speed of it. Lots of different things are relevant. What's irrelevant? What doesn't matter? You might think the colour, whether it has some furry dice in the window, some stickers; all those things are probably irrelevant. But there's a third class of things around this task. If you take the age of a car and the mileage on the clock, it's probably the case that you don't need both at the same time. You can probably guess one from the other. So they're kind of contextual, or redundant in the context of each other. If you have the age of the car, you can probably guess what the mileage would be, and in turn guess what the price of the car would be. If you have the mileage, you probably don't need the age: you can probably guess what the other one would be, and what the price would be. So with these three classes of features of a car, you can probably pick out the relevant ones, pick out one from this contextual set, and tell me what your guess would be for the price of the car. Maybe there's one on the left. But you can do this because you know what a car is.
You know how cars work, you know the system. What about if I ask you what is relevant, irrelevant, and contextual or redundant for diagnosing the recurrence of a lung cancer? That's going to require a lot of medical training and expertise, to pick out, from this mass of measurements that could be taken, the small set of relevant things that really make a difference in diagnosing this. Which bits of your genes and lifestyle matter the most? What's relevant, irrelevant, contextual? The particular task that Michele was interested in was a bit more technical: speculative threading, this word coming back. Basically, given a big complicated computer, what matters for this task of speculative threading? You don't need to know what it is, but it's something to do with multi-core computing, and what matters, what is relevant, what is irrelevant; obviously, if you're not an expert in this particular area, you just won't know, and there's an overwhelming variety of things you could measure, which makes it more complicated. I had a student, Adam here, who worked on this topic between the two of us. We used to enjoy tormenting him: we used to pin him down from opposite sides of the lab, coming in and asking him questions on both topics, machine learning and computer systems. But the topic we focused on was this generic scientific problem of what's relevant, what's irrelevant, and what's contextual. Let's give each one of these features a score to answer this question. We could characterise the score of a feature, that is, how useful it's going to be to us, as how relevant it is: measure some sort of relevancy, and then subtract away this redundancy, this contextual bit. Therefore, if the contextual, redundancy bit is really big, it's going to subtract off from this, and the overall score is going to be quite low.
You could do it another way, and say relevancy divided by redundancy: if this denominator is really big, then overall the number is going to be really small, so if something is highly redundant, it's going to make the score very low. A third way is that you could plug together bits and pieces like this and think about the compatibility of features. So you could plug together different equations that come up with scores for features, which overall would tell you which ones you need, which ones are relevant. Obviously, we didn't just think about the high-level stuff. We used a mathematical framework known as information theory to quantify this. That was fun to get into, and it has turned out to be an incredibly fruitful area in my career, looking at this type of theory. But it turns out that we weren't the first people to start thinking about how to plug these things together. It turns out that over the last 20 or 30 years, loads of people have suggested different scoring measures for this problem, and people have done it in all sorts of machine learning conferences throughout the early 2000s. They've proposed new scoring measures in network security, in electronics, in bioinformatics, in medicine. We were left with this question: which of these should we believe? Which ones are useful in which context? How do they relate to each other? There was very little cross-comparison at the time. So what we did was write them on a whiteboard. This is our whiteboard in the office in July 2009, and that was pretty much it: me thinking very hard about these equations. That really is what we did, wrote them on the whiteboard. Michele will recall me sitting staring at that whiteboard for hours on end, just looking at them. And what we were trying to do was identify, from all of these equations up on the board, what was the commonality? What were the similarities between them? What was the root method that could explain all of these methods that were out there?
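One member of this family of scores, the "relevancy minus redundancy" shape described above, can be sketched concretely. This is hypothetical Python purely for illustration, in the style of mRMR-type criteria: the function names and the toy data are invented, and the mutual-information estimator is the naive plug-in one for discrete values.

```python
# Illustrative "relevancy minus redundancy" feature scoring.
# All names and data here are invented for the sketch.
import math
from collections import Counter

def mutual_information(xs, ys):
    """Estimate I(X;Y) in bits from paired samples of discrete values."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum((c / n) * math.log2((c / n) / ((px[x] / n) * (py[y] / n)))
               for (x, y), c in pxy.items())

def score(candidate, selected, target, data):
    """Relevancy to the target, minus average redundancy with the
    features already selected (an mRMR-style criterion)."""
    relevancy = mutual_information(data[candidate], data[target])
    if not selected:
        return relevancy
    redundancy = sum(mutual_information(data[candidate], data[f])
                     for f in selected) / len(selected)
    return relevancy - redundancy

# Toy 'car' data: mileage is an exact copy of age, so it is perfectly
# relevant on its own but perfectly redundant once age is selected.
data = {
    "age":     [0, 0, 1, 1, 2, 2, 3, 3],
    "mileage": [0, 0, 1, 1, 2, 2, 3, 3],
    "price":   [3, 3, 2, 2, 1, 1, 0, 0],
}
print(score("mileage", [], "price", data))       # 2.0 bits: very relevant
print(score("mileage", ["age"], "price", data))  # 0.0: redundant given age
```

The "divided by" variant from the passage would simply return `relevancy / redundancy` instead of the subtraction; the point of the whiteboard exercise was that dozens of such variants exist.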
Eventually, after a little while, we managed to do it. It turns out that this thing, a probabilistic model as it's called, together with a new proof, explained exactly where all these measures came from, and we published a nice paper, eventually, after a long round of corrections, in 2012. So that was fun, and indeed Adam won that same Distinguished Dissertation Award in 2013, which was very nice. Another trip to the Royal Society. But all of this came because we were thinking about the similarity between the ideas, looking at what was out there, and thinking how we could build upon that by linking ideas. And this turned out to be a very, very good foundation. It inspired a whole new generation of research; Kostas here has generated a tsunami of new papers. He's down here, so do torment him later. He's been extending this in lots of different directions, and in particular, he's worked over the last couple of years with AstraZeneca, and he's got himself a Data Science Fellowship funded by them. We're working with Jim Weatherall, the head of advanced analytics at AZ, and we are looking at the challenge of extending this whole framework to personalised medicine, but in particular looking at two types of features: ones that are useful for assigning treatment, saying what pill you should get, and ones that are useful for assigning care protocols. Never mind what treatments you give; how should the person be looked after long term, what's their prognosis, what's going to happen even without treatment? And this in turn has fuelled new collaborations. Caroline Jay, who unfortunately couldn't come along, and Alan Davies, and Kostas, a few rows down there somewhere, are all working on really fascinating areas around clinical trials, working very closely with AstraZeneca. It's very, very fruitful; I like it a lot. So this was kind of one arm of the research I've been doing for the last few years, and I'll carry on doing it.
But then there's been another arm of research, one that concluded very recently, with this same strategy of looking for the similarities. Nikos, who's down there, has been working on an idea called boosting. It's a mathematical framework within machine learning. You'll recall earlier on we talked about having some examples of what we want this machine learning system to do, and providing them to this mathematical model, where it adapts itself. Now, the idea of boosting recognises that this mathematical model is not perfect, so it will make some mistakes. So we can check where this system makes mistakes, and the logical thing to do next is to build another model that corrects the mistakes of the one that came before. And you can continue this chain, each model correcting the mistakes of the one that comes before it, and end up with a set of models. And then you can use those as a committee, combine their predictions, and go further from there. And you can see that obviously the first model there might make some mistakes, which will be corrected by these later ones. But what was left out of this framework, in its original form, was the idea of so-called costly mistakes. So imagine we deploy some sort of artificial intelligence system in the future, and it makes a prediction about the world in front of it; maybe it predicts that this thing is a pussycat. I don't know if you can see it from back there, but that's in fact a very, very furry dog. And so what are the consequences of that? Well, you know, if it's a nice dog, you might go, hey, no problem, man, chill out, no problem. That's not a problem. However, if you made that prediction here, the consequences are going to be a bit bigger: the robot getting its head bitten off. This might be a jokey way to explain the idea of costly mistakes, but they happen in real life.
If you have a medical scenario and you predict a disease is going to come back, and you are wrong, well, there will be some treatments, and the person might have to spend a period of time in hospital, and it will cost a lot of money; fair enough. But if you predict a disease will not come back, and you are wrong, that's going to cost a lot more. It could cost the person their life, potentially. So you want to avoid this second kind of mistake. And so the question arose as to how you should approach this challenge of cost-sensitive boosting, how you should build this chain to do this. And it turned out that there were a lot of people who had suggested modifications to the original algorithm. They had attacked different parts of this pipeline, modifying it in different ways, saying that we should change this bit and this bit, and each had their own suggestions, their own direction from which they approached it. But what was quite surprising is that there was very little, if any, cross-comparison between the methods published from 1999 until 2015 or 2016 on this. And it turned out, by looking deeper, that several of these were exactly equivalent to each other. I mean, not even just that they look similar: they are the same algorithm, just written in different notation, and nobody had realised. By looking deeper, we saw that. So the question arises again: which of these should we trust, in what situation? How do they relate to each other? Now, the really nice thing about boosting as a framework is that it has a very, very elegant, deep, rich theory, and it has a lot of explanations as to why it works. Each of these mathematical frameworks around the edge comes from a different area of maths and machine learning, and you can start from any one of these frameworks, independently of the others, and arrive at the same place, this in the middle, which is the set of equations that define the boosting algorithm.
So you can start from any direction you want, follow the rules of that framework, and you end up in the same place; all roads lead to Rome, as it were. That's really surprising, but it is one of the most powerful things about this algorithm. So what we thought is: given that each of these frameworks can be used to derive the algorithm, if we're going to modify this thing in the middle, if we're going to change it such that it is not the original boosting but some kind of cost-sensitive thing, we should maintain consistency between the frameworks. Now, that was just a theory; we didn't know whether it would be true or not. So what we did is check whether each of the theoretical frameworks still held for a given new algorithm: if we modify it here, will it still hold for these four frameworks? We expect to get four ticks in the boxes. We checked the original algorithm, and, well, yes, of course it satisfies its own properties. But then we looked at the rest of the literature and found that the original was the only one that stuck to the rules, that still stuck to the frameworks that explain it; all the others, given their modifications, effectively broke consistency with these theories.
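To make concrete the kind of modification being checked: many of the proposals changed the reweighting step of the chain. Below, the original update sits next to one hypothetical cost-scaled variant. The variant is not any specific published method; where the cost enters and the cost values themselves are illustrative assumptions, shown only to indicate the sort of change whose consistency with the underlying frameworks had to be verified.

```python
import math

def update_original(w, alpha, y, h):
    """Original boosting reweighting: each example is scaled by
    exp(-alpha * y_i * h_i), so mistakes (y_i * h_i < 0) gain weight."""
    new = [wi * math.exp(-alpha * yi * hi) for wi, yi, hi in zip(w, y, h)]
    total = sum(new)
    return [wi / total for wi in new]

def update_cost_scaled(w, alpha, y, h, c_pos=5.0, c_neg=1.0):
    """A hypothetical cost-sensitive variant: mistakes on the costly
    class (y_i = +1) get an extra class-dependent factor. Illustrative
    only; the published variants each placed the cost differently."""
    new = []
    for wi, yi, hi in zip(w, y, h):
        cost = c_pos if yi > 0 else c_neg
        factor = cost if yi * hi < 0 else 1.0   # extra weight only on mistakes
        new.append(wi * factor * math.exp(-alpha * yi * hi))
    total = sum(new)
    return [wi / total for wi in new]
```

With uniform weights, labels y = [1, 1, -1, -1], predictions h = [1, -1, -1, 1] and alpha = 0.5, both updates upweight the two mistakes, but the cost-scaled one pushes far more of the weight onto the missed positive. It is exactly this kind of plausible-looking change that turned out to break one or more of the four theoretical frameworks.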
Now, that's not necessarily a bad thing; it might be that there's some deeper theory behind it that explains both the cost-sensitive thing and the non-cost-sensitive thing. We didn't know, but we took this as a hypothesis and did a whole bunch of experiments, testing all these algorithms on loads of different data sets, and found that the winner overall was the original 1997 algorithm. So it was just a theory, but it turned out to be quite a nice one, and we published this rather controversial-sounding paper, 'Cost-Sensitive Boosting Algorithms: Do We Really Need Them?'; the editors actually encouraged us to use that title. The spoiler, if you haven't already figured it out, is: no. But that was quite a nice conclusion to come to, and Nikos got a nice PhD out of it. We worked on that with Peter, Meelis and Nara here: Nara, a former postdoc, and Peter from the University of Bristol. And then came Sarah, down here. Same sort of tactic, completely different area: this challenge of feature selection that we talked about, figuring out, given a whole pile of data, what matters and what doesn't matter, what's relevant and what's irrelevant for a given problem. So suppose I have, let's say, 27 patient health records. This is quite a small number, but health records are difficult to get hold of; people don't often donate them. If you have such a small sample and you ask the question, what's relevant for predicting a particular disease, and then you run your algorithm, whatever it may be, and it says this thing, this particular measurement, is relevant and all this other stuff is irrelevant, that's quite a strong conclusion to draw from such a small sample. So let's say we perturb the original sample a little bit. We hypothesise an alternative universe in which only 26 patients turned up on the day: we just delete one person from the pile, or maybe we perturb them in other ways. Now we run the whole system again, and maybe it says, oh no, wait a minute, that other thing is now relevant; it changes its mind completely. Now, that could tell us something quite important, in that
maybe that one patient was critical and so should be investigated further. But it could also tell us something about the algorithm in the middle that's detecting what matters. If we repeat this many, many times and it keeps changing its mind no matter what, if you keep perturbing this original sample, dropping people out, changing it in subtle ways that really shouldn't matter, and it keeps changing its mind about what's relevant, all the time, then it's probably not very reliable. This is the area that Sarah focused on: how should you measure, how should you quantify, this stability? It turned out that a lot of people had proposed measurements to be used in this area, from lots of different perspectives, with some conflicting opinions on what was useful in what scenarios, but a very interesting literature to delve into. And there was one paper in particular that stood out from the crowd, this one by Kuncheva in 2007, and Lucy is up there somewhere. For us, looking at the literature, this was really a breath of fresh air in the way she approached the problem; very elegant. She posed the question: should we invent yet another arbitrary new measure of stability? No. Instead, we should state the desirable properties, axioms, that this thing should follow, and derive one. And that's what she did: state some properties, and derive whatever followed from them. Very nice. But there were some cases where this didn't always work, so we took it, analysed it very deeply, and found a very simple case, one which always arises with certain algorithms, where some of these properties turn out to conflict a little: rare but important cases. It was a fantastic platform for Sarah to start from, and she met Lucy at a conference very early in her career; I can say that she was definitely inspired by that meeting. Lucy was kind enough to come over last year and be the external examiner. In the thesis, Sarah proposed a new set of properties which generalised Lucy's
work out to cope with this variation in the number of features. We checked the literature again for which measurements had these properties, and found that nothing had everything we wanted. So we derived a new one, which had everything we needed; it turned out to be a generalisation, with some new and interesting properties, and we have a nice paper on it that you'll be able to find on the interwebs next month. So this brings me, with that thought, to the last of my three ways of thinking. The first, opening black boxes, is really something that every good academic does: take something complicated that amazes you on the surface, the shiny magic thing, and rip it to pieces, delve inside, ask why it works. Everybody should be doing that; everybody does that. The second is similarities versus differences: everything's connected in some way or another, and if you squint and look at things in just the right way, you can build bridges between areas. And so here's my final one, building bridges, introduced by my second favourite quote: 'opportunity favours the prepared mind', by Louis Pasteur. I really like this quote, and here are two opportunities that came along because my team was prepared. One is a good collaboration that is just about to launch, already happening, really. There's this really nice piece of work that Michele's team have done recently on memory caching policies, effectively making your computer memory more efficient: the computer predicts what you want from the memory before you even realise you want it yourself. I like to think about it as a super-efficient library. Imagine you go to the library and the librarian has already got the book you want in hand before you walk in the door; that's effectively what this does. And it did it in a very particular way, reducing the hardware overhead, that is, the amount of electronics you have to build into the thing, by five times, which is very impressive. But again, they
didn't know exactly why it was so successful; there was some real gut instinct and engineering intuition that went into it, and it works amazingly well. Sometime towards the end of Mohsen's PhD, they asked us: why does this thing work? So what we've started doing recently is building bridges to the framework that Nikos was working on, asking why this thing works given that other framework. Nikos has been developing a new theory, from a machine learning perspective, for why this method, called Happy, works so well, and it seems to explain it so far, and it's suggesting some new directions, which is where it's going. Another opportunity came along while I was over there: we've got a very nice funded ESRC project working with Greater Manchester Police, and the challenge here is a really quite serious one. I never imagined that I would be working with a School of Law, but it turns out that these sorts of chances, opportunities, do come up. It turns out that there have been around 63,000 domestic abuse incidents in Greater Manchester every single year for the past five or six years. This is a very big problem, and obviously that's just Manchester as well. The police and social services have a number of processes for risk assessment, where they try to say: is there a risk in this household, is there a risk in that household, and on the basis of that they make social service interventions or something like that, to perhaps offset potential problems further down the line. One of the mechanisms they use for this is a thing called the DASH questionnaire, for Domestic Abuse, Stalking and Honour-based violence, and a number of questions are asked of the victim and of the perpetrator, on site, where the incident has occurred. Obviously some of these questions, 27 questions, are very, very serious, and these are very distressing situations. It's not always possible to know the truth about, perhaps, someone's background, and it's not always perhaps desirable for a person in a
distressed situation to tell the truth. So there might be an incentive not to tell the police officer the truth about someone you love, and that's an awful situation to be in. But when this data is recorded, questions like this one become what are called under-reported questions: the true incidence of the answer 'yes' for this particular question is likely to be higher than is actually reported in the data. So what we started doing is looking at the data. Let's say we have a data set, and we measure the correlation between 'has this person ever hurt the children?' and whether there is a recurrence of violence in that particular household. We measure that correlation, and the answer that comes out of the data is no; but this is against other evidence that may be available. So what can you do? Go back and ask the victims, did you tell the truth? No, you can't do that. So what we've done is build bridges out towards work that Costas has been doing recently, developed for the medical scenario, the biomedical work with AstraZeneca, and that in turn was fuelled by all the work on feature selection, which came out of the work with Michele many years ago. This has built bridges, and we're asking: how can we correct the data without ever going back to individuals and asking them another question, correcting at the population level, incorporating knowledge from other sources? It turns out it's possible to do that, and if we ask that same question of the data again, we get something that is more in line with expectations. On the basis of this, perhaps, policy decisions might be taken on the way resources are distributed within police forces; so it's a very important problem to solve. So this brings me to the end, my conclusions, and my three ways of thinking. Black boxes: ask why that thing works; instead of just admiring it, nice and shiny, delve into it. Similarities and differences: everything's connected. And the last thing, building bridges: opportunity favours the prepared mind.
Think deeply enough about the first two things, and we're really finding the opportunities keep coming along. And the important thing in science, in my last two or three minutes, is not so much to obtain new facts but to discover new ways of thinking about them, and to thank those who help you do it. I've said a few thank-yous along the way, because you've been sitting patiently for the last 55 minutes before your champagne upstairs, but there is a team that I've mentioned a few times that I want to say a special thank you to. I've had some amazing PhD students here. A few of the very first ones I had, Manuela, Richard and Adam; the others couldn't be here, but Richard is down here. Some later ones, the postdocs, Nara, Costas, Sarah and Nikos, are all here, as you can see. I've got into a habit of taking photos of them in distressing situations in front of whiteboards. I like this one in particular; it's like all that maths is just pouring out of her brain. And yeah, I really do take photos of people in the most distressing situations, often trying to prove something on a whiteboard. It's fun. There's also the current generation; I haven't managed to capture them in front of a whiteboard yet, but I'm sure I will do some time quite soon. And of course we are part of the wider machine learning group here: here we are at one of our Christmas dos, and here's another one; we have a lot of fun together. I want to thank my wider colleagues around the world, who've got me into some interesting situations over the years. So finally, I want to say one final thank you. Since it's my inaugural, I get to make a fool of myself, and I want to present to you my career heroes: five people who taught me five lessons, and without whom I wouldn't be here. First, my parents. Thank you. My mum said to me years ago, do what you love and I'll support you unconditionally, and that made a difference; that set me on this path. The second person is James; he's around here somewhere. I spent many years, when I
was about 16, 17 years old, walking around the school playground with him (we've known each other that long), talking about neural networks at that age, and that was fine. The third person you've met already: Jeremy. He taught me that very simple lesson: think harder; open that black box. The fourth is really two people, Lucy and Fabio. These two taught me something very important, something I hope every young PhD student gets a chance to learn about: collegiality, having someone who will open doors for you, encourage you and support you. The final person you've met already too: Michele. He taught me something very simple: a ruthless, almost blind optimism. Yes we can. The end.