Right, thanks for coming to the talk. First of all, can everyone hear me? Okay, give me a wave at the back, you know. Cool. Okay, so I'm Emma, I am the lead Python developer at Cambridge Medical Robotics. We're a start-up working on developing the next generation in robotics for keyhole surgery. That is pretty much as cool a job as it sounds, to be honest, so if you are interested, we are hiring; I'll put a link up at the end. So, how do we get from that to the topic of this talk? I'm also one of the organisers of the Cambridge Programmers' Study Group. We meet up about once a week to study topics in computer science. About a year and a half ago we were looking at machine learning, and in particular genetic algorithms and how we could use those to evolve some code. Having managed to get that to work, I pretty much instantly became fairly concerned about my job security.
So, if we think about it over the course of history, the way we've lived our lives and the kind of work we've done has massively changed as a result of technology. So, for instance, in agriculture we used to have a large majority of the population working in the fields, sowing seeds, harvesting crops. We developed things like this and now we've freed up most of us to work on other things. A more recent example would be walking into a supermarket. I'm sure you've all seen rows of these self-checkouts where you might once have had lots of people working at tills. Now you can just have a single person staffing these. You might think, okay, well, that's quite repetitive work, things that are easy to automate, but increasingly, with artificial intelligence, we're seeing even higher-skilled work coming into the firing line. So this is IBM's Watson machine. If you've not heard of this before, it was developed a few years ago as a supercomputer to compete on the US quiz show Jeopardy.
This is a contest in which the contestants are given the answers and they have to respond with what the question was. So figuring this out involves a mixture of natural language processing, having a big store of general knowledge that you retrieve data from, and also applying some machine learning techniques to connect all this together. So this has definitely won at Jeopardy; it defeated the human champions at that. And it's now been applied to some other areas as well. So we have things like medical diagnostics, where there's a huge volume of information: new clinical studies being published all the time, journal entries, the patient's own medical history. This is a lot of data for a doctor to deal with when diagnosing a patient. This machine can instantly have all the latest journal articles. It can read them all in a time that a doctor simply wouldn't have, and have all that information available to it. So it's potentially able to make more accurate diagnoses, particularly for uncommon conditions that a doctor may not have come across before in their working life. You can also apply it to things like legal work. So preparing a case for a trial, you have lots of prior judgments to trawl through, lots of legal studies, that kind of thing. It can also be used as a teaching assistant in the classroom, because of its natural language processing abilities. It can understand questions that students are asking and come back to them with answers. So now we have people like doctors, lawyers and teachers also having their jobs coming into the firing line from this rising technology.
So what about us software developers, then? You might think, okay, hopefully we're okay, hopefully we're needed to write the code that makes this work. But as I said, I was looking at ways that you could generate that code rather than write it yourself. So how might you go about that?
So one approach you could take is something called a genetic algorithm. This is a type of guided random search algorithm. That's as opposed to a systematic search, where you've got your data structure and you look through it, starting at one end and going through until you find what you're looking for. Or a random search, where you might say, okay, I'm going to look over there, now over here, maybe over there again, have I found it yet? Nope. A guided random search is where you look in a few places and then, based on what you find there, you make a decision as to where you should look next. So this makes some assumptions about the data being in some way sorted or continuous around the points of interest. Okay. And in particular, a genetic algorithm is a guided random search algorithm inspired by biology, particularly by evolution, where we have this idea of natural selection: if you have some characteristic that makes you more likely to survive, say you're quicker at running away from tigers or something like this, then with survival of the fittest you're more likely to pass on your genes to the next generation and those features will get propagated.
So how do we apply that in a genetic algorithm? Let's take the example of evolving a string, so not programs yet, just a regular text string. Say we want to evolve one containing the title of this talk. Okay. So what we're going to do first is generate some initial population of strings, so 100 of them, 1000 of them, just completely random strings of different lengths, with different characters. We're then going to evaluate those and see how much they look like the target we're trying to get to. So this is where our analogy with evolution has somewhat gone out of the window, because we now have an end goal in mind and we're going to construct our fitness function so that we are steering this algorithm towards reaching that. Okay.
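As a rough illustration of those first two steps (random initial population, then a fitness function that measures closeness to a target), here is a minimal Python sketch. It is not the code from the talk: the target string, the character set, the length limit and the scoring rule are all assumptions chosen to keep the example short.

```python
import random
import string

# Assumed for illustration: a placeholder target (not the talk's title)
# and a character set of letters plus space.
ALPHABET = string.ascii_letters + " "
TARGET = "Hello World"


def random_string(max_len=20):
    """Build one random individual of random length."""
    length = random.randint(1, max_len)
    return "".join(random.choice(ALPHABET) for _ in range(length))


def fitness(candidate, target=TARGET):
    """Higher is better: one point per character that matches the target
    in the same position, minus one point per character of length difference."""
    matches = sum(a == b for a, b in zip(candidate, target))
    return matches - abs(len(candidate) - len(target))


# Initial population of completely random strings.
population = [random_string() for _ in range(100)]
best = max(population, key=fitness)
```

With this scoring rule a string of about the right length beats most random strings, which is consistent with what the talk later observes in the first generations.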
So having evaluated the population, we're going to select which members of it should survive and pass on their genes. So which initial strings looked the most like the target? Okay, and naively we might just say we're going to take the half with the best fitness score, just the top half, and throw the rest away. We're then going to cross them over, so this step is going to be analogous to you getting half your chromosomes from each parent. So again, as a simple solution, we might just say: I'll take the first half of one string and the second half of the other. There is then a mutation step, so in the same way as some genes get randomly mutated during reproduction to introduce new characteristics, we're going to pick a random character in this string and then just change it to a different one. Okay, so we've now got a population consisting half of the fittest individuals from the previous round and half of their children, as it were. We now want to check: have we reached our target? If so, great, we can stop. If not, we're going to loop round again. Okay, and that's the general flow of it.
So let's try and run that then. Okay, so what we're going to see when we run this is that for each generation we print out some statistics about the fitness scores that we've got, and we print out the best string that we've got in that generation. To start with, that is probably going to be something that's about the right length, because in a collection of random strings that would be the best. Okay, so as this goes we should see some features start to emerge: the spaces go in the right places, the characters get correct, and indeed we've got to the talk title.
Okay, so that's all well and good for strings, but now we want to evolve programs. So there are some extra challenges here. With string evolution I could just dump some random characters into a string and it's a valid string. It might not be very close to what I want, but it is a string.
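The string-evolution loop described above (keep the top half, cross over halves of two parents, mutate one character, repeat until the target appears) could be sketched roughly as follows. This is a minimal illustration under assumptions, not the talk's actual code: the target, alphabet, population size and generation limit are placeholders, and the fitness function is one simple choice among many.

```python
import random
import string

ALPHABET = string.ascii_letters + " "   # assumed character set
TARGET = "Hello World"                   # placeholder target, not the talk's title


def fitness(candidate, target=TARGET):
    """Matching characters in position, minus the difference in length."""
    matches = sum(a == b for a, b in zip(candidate, target))
    return matches - abs(len(candidate) - len(target))


def crossover(mum, dad):
    """First half of one parent followed by the second half of the other."""
    return mum[: len(mum) // 2] + dad[len(dad) // 2:]


def mutate(individual):
    """Replace one randomly chosen character with a different one."""
    if not individual:
        return individual
    i = random.randrange(len(individual))
    replacement = random.choice([c for c in ALPHABET if c != individual[i]])
    return individual[:i] + replacement + individual[i + 1:]


def next_generation(population):
    """Keep the fitter half and refill the rest with their mutated children."""
    ranked = sorted(population, key=fitness, reverse=True)
    survivors = ranked[: len(ranked) // 2]
    n_children = len(population) - len(survivors)
    children = [mutate(crossover(*random.sample(survivors, 2)))
                for _ in range(n_children)]
    return survivors + children


def evolve(pop_size=200, max_generations=2000):
    """Return (best string found, generations used). Needs pop_size >= 4."""
    population = ["".join(random.choice(ALPHABET)
                          for _ in range(random.randint(1, 20)))
                  for _ in range(pop_size)]
    for generation in range(max_generations):
        best = max(population, key=fitness)
        if best == TARGET:
            return best, generation
        population = next_generation(population)
    return max(population, key=fitness), max_generations
```

Calling `evolve()` returns the best string found and the number of generations used. Because the search is random, there is no guarantee that it reaches the target within the generation limit.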
With program evolution, if I just put a load of random characters into a file, the odds that it's going to be a valid Python program, a valid C program or what have you are pretty slim. So there are some rules about syntax and things that we're going to need to conform to. So you might think, okay, well, I'll have some templates: this thing looks vaguely like a for block, and I'll have some blanks and I can fill those in, and I'll have templates for if statements and that sort of thing, and I'll combine them like that. And fair enough, your initial population will have lots of valid programs in it and that'll be fine. But when you come to cross those over and to mutate them, you'll very quickly get back to invalid stuff again. So you might say, okay, well, to make crossover easy I'll have my own language with a load of symbols, and you'll be able to put them in any order you like. Okay, and that might work. The problem you're going to have there is that we want to be able to program arbitrary things in this language that we're going to be evolving; we don't want to limit what we can create. So what we need is for the language to be something called Turing complete, and a language that you naively make up may not be so.
So what do I mean by something being Turing complete? Okay, so Alan Turing did a lot of work on the theory of computation, and in particular the idea of reprogrammable machines that could be used to compute arbitrary stuff. And so he came up with this concept of a Turing machine as a model to think about this, to help him mathematically reason about them. All it is is an infinitely long piece of tape divided up into a series of cells. You have an arrow that points at the cell that you're currently looking at, and you can move that arrow backwards and forwards. In each cell you can put a symbol, you can read out the current symbol, or you can change the value of that symbol. That's all it is.
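To make the tape-and-arrow picture concrete, here is a small Python sketch of just that part of the model. The class and method names are made up for illustration, and the "infinite" tape is approximated with a dictionary that hands back a blank symbol for any cell not yet written. A full Turing machine also has a finite table of states and rules driving the head, which this sketch leaves out.

```python
from collections import defaultdict


class Tape:
    """An unbounded tape of cells with a movable head (hypothetical helper)."""

    def __init__(self, blank="_"):
        # Cells are created on demand, so the tape is unbounded in both directions.
        self.cells = defaultdict(lambda: blank)
        self.head = 0

    def move_left(self):
        self.head -= 1

    def move_right(self):
        self.head += 1

    def read(self):
        """Return the symbol in the cell under the head."""
        return self.cells[self.head]

    def write(self, symbol):
        """Put a symbol into the cell under the head."""
        self.cells[self.head] = symbol


tape = Tape()
tape.write("1")       # put a symbol in the current cell
tape.move_right()     # move the arrow forwards
unwritten = tape.read()
tape.move_left()      # and back again
written = tape.read()
```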
The Turing machine is a very simple device, but it can be shown mathematically that anything that is computable (and not everything is; there are certain things that are mathematically impossible to compute) can be computed on this device. Okay, and when we say a language is Turing complete, all we mean is that it can be used to simulate any single-tape Turing machine, a.k.a. anything that's computable you can compute in that language. So it enables us to program whatever we like, basically. So those are our requirements, then: we're looking for a language that's syntactically very simple and that is Turing complete.
So at this point I thought, well, that probably already exists, let's not reinvent the wheel. So like any good developer I googled it and came across this site, Kory Becker's site, primaryobjects.com. This is a great site; she's got lots of articles on machine learning, and one of them was a project where she'd been working on pretty much exactly this problem, so evolving code rather than writing it. So what is the language that was used? It's got only eight characters and it's pretty much a model of a Turing machine. In the interest of not getting myself thrown out of the conference for swearing repeatedly for the rest of the talk, I've left the name of it slightly as an exercise for the reader, but I'm sure you can guess. So the first six characters are pretty much just a Turing machine: moving the pointer backwards and forwards, changing the value at the cell, writing it out and reading a value in. The last two characters, the square braces, are where things get interesting. This is what gives us our control flow: the ability to say, based on the value I've got here, I'm going to make a decision; I'm going to keep looping or I'm going to break out of that loop. So that's what gives this language its power. So what does a program in brain F look like? This, for example, is Hello World, so your classic first program that just prints Hello World to the screen.
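For readers who have not met the language, the eight commands are `>` and `<` (move the pointer), `+` and `-` (change the current cell), `.` and `,` (write out and read in a character), and `[` and `]` (loop while the current cell is non-zero). A minimal interpreter sketch in Python follows. It is an illustration rather than the interpreter used in the talk: the function name, the tape length and the 8-bit wrap-around cells are assumptions, and it expects the square braces to be correctly matched.

```python
def run_brainf(program, input_text="", tape_len=30000):
    """Run a brain F program and return its output as a string.

    Assumes matched square braces; cells wrap around at 256 (an assumption).
    """
    # Pre-compute where each brace's partner is, so jumps are O(1).
    jumps, stack = {}, []
    for pos, ch in enumerate(program):
        if ch == "[":
            stack.append(pos)
        elif ch == "]":
            start = stack.pop()
            jumps[start], jumps[pos] = pos, start

    tape = [0] * tape_len
    ptr = 0                      # the "arrow" pointing at the current cell
    pc = 0                       # position in the program string
    inputs = iter(input_text)
    output = []

    while pc < len(program):
        ch = program[pc]
        if ch == ">":
            ptr += 1
        elif ch == "<":
            ptr -= 1
        elif ch == "+":
            tape[ptr] = (tape[ptr] + 1) % 256
        elif ch == "-":
            tape[ptr] = (tape[ptr] - 1) % 256
        elif ch == ".":
            output.append(chr(tape[ptr]))
        elif ch == ",":
            # Read one character; use 0 when the input is exhausted.
            tape[ptr] = ord(next(inputs, "\0")) % 256
        elif ch == "[" and tape[ptr] == 0:
            pc = jumps[pc]       # skip the loop body
        elif ch == "]" and tape[ptr] != 0:
            pc = jumps[pc]       # go back to the start of the loop
        pc += 1
    return "".join(output)


# 8 * 9 = 72 is "H"; adding 33 more gives 105, which is "i".
HI_PROGRAM = "++++++++[>+++++++++<-]>." + "+" * 33 + "."
greeting = run_brainf(HI_PROGRAM)
```

Any other character in the program string is simply ignored, which is part of why random strings are so often runnable programs in this language.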
You can see this is a fairly esoteric language. You wouldn't really want to write anything complicated in it; particularly when you get past just printing output, it gets rapidly quite complicated keeping track of what you're doing. You certainly wouldn't want to try and debug or maintain any code written in this. But to a computer this is just as intelligible as, say, Python code or C code; it's fine with this. From our point of view, we already know how to generate strings, we already know how to cross them over and mutate them, and this is just a string, so we can apply all the same techniques.
So let's try then and evolve a program in this. I was going to do Hello World, but I realised there's probably not time in this talk for that to converge on a solution. So we're just going to go for a slightly abbreviated one; it's just going to say "Hi". Okay. So what we're going to see is that for each generation we print out the output of the best program, and we also print out the program string, so we're going to have the actual brain F code itself for the best individual in that generation. Okay.
So we have some new issues to contend with here. We now have the potential to have an invalid program: while brain F has very few syntactical requirements, you do need to match up those square braces, so if those aren't matched the program's invalid. We also have the potential to time out: we can have a valid program that gets stuck in an infinite loop, so we need some cutoff point after which we judge that as an error as well. Also, because real machines don't have an infinite amount of memory, we have to use a finite-length tape rather than the model's infinite one, so our program might go off the end of that memory, in which case again we need to notice that. In all of these cases, when we spot them we give them a very low fitness score, so we try and weed them out so they're not selected for future generations.
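The three failure cases above (unmatched braces, a timeout, running off a finite tape) can each be turned into an error that the fitness function converts into a very low score. The sketch below shows one way to do that in Python. It is not the talk's implementation: the names, the step limit, the tape length, the penalty value and the scoring formula are all assumptions, and the `,` input command is ignored because the "Hi" task needs no input.

```python
class ProgramError(Exception):
    """Raised for an invalid program, a timeout, or leaving the tape."""


def match_braces(program):
    """Return a jump table, or raise ProgramError if braces do not match."""
    jumps, stack = {}, []
    for pos, ch in enumerate(program):
        if ch == "[":
            stack.append(pos)
        elif ch == "]":
            if not stack:
                raise ProgramError("unmatched ]")
            start = stack.pop()
            jumps[start], jumps[pos] = pos, start
    if stack:
        raise ProgramError("unmatched [")
    return jumps


def run_guarded(program, tape_len=100, max_steps=10_000):
    """Run a program on a finite tape with a cutoff on executed steps."""
    jumps = match_braces(program)
    tape, ptr, pc, steps, output = [0] * tape_len, 0, 0, 0, []
    while pc < len(program):
        steps += 1
        if steps > max_steps:
            raise ProgramError("timed out")      # probably an infinite loop
        ch = program[pc]
        if ch == ">":
            ptr += 1
        elif ch == "<":
            ptr -= 1
        elif ch == "+":
            tape[ptr] = (tape[ptr] + 1) % 256
        elif ch == "-":
            tape[ptr] = (tape[ptr] - 1) % 256
        elif ch == ".":
            output.append(chr(tape[ptr]))
        elif ch == "[" and tape[ptr] == 0:
            pc = jumps[pc]
        elif ch == "]" and tape[ptr] != 0:
            pc = jumps[pc]
        if not 0 <= ptr < tape_len:
            raise ProgramError("ran off the tape")
        pc += 1
    return "".join(output)


PENALTY = -10**9  # assumed value: lower than any score a valid program can get


def fitness(program, target="Hi"):
    """0 is a perfect score; failed programs get the penalty."""
    try:
        output = run_guarded(program)
    except ProgramError:
        return PENALTY
    # Distance between character codes, plus a cost per missing or extra character.
    score = -sum(abs(ord(a) - ord(b)) for a, b in zip(output, target))
    return score - 256 * abs(len(output) - len(target))
```

Scoring by the distance between character codes, rather than by exact matches, gives the search a gradient to follow: an output of "Hj" scores better than "Hz".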
The next thing to note is that we get quite close to the target output and then don't seem to get very much further. With the string evolution, the thing we're evolving and the thing we're measuring the fitness of were one and the same. Here we're measuring the fitness of the output, but we're actually evolving the program string. So while the output might be very close, it might require quite a big jump in the program code to get to where we want; it might be stuck in a slightly awkward loop or require quite a big mutation. To try and help with that, we've increased the number of types of mutation we allow, so rather than just having replacement you can now also insert characters or delete them. And in terms of the replacement one, rather than it just being a single character that we might flip, we're going to go over the whole program string and maybe change any of the characters.
We also need to change our selection metric. If we just selected the top half, that would be a bit too elitist: we'd be very narrowly driving towards something that looked like the output to start with, and cutting off avenues that didn't look so promising originally but would come back round. So what we want is some chance of including individuals with a lower fitness score, so that we can increase our chance of getting out of these local minima that we're stuck in. So what we're going to do, rather than selecting the top half, is have something like a roulette wheel spinner. Imagine a pie chart where the size of each wedge is bigger if that individual has a higher fitness score, and then we spin it and select one. So lower-fitness individuals may still be selected, but with a lower probability, and so hopefully we will then actually get to the solution.
Running a random demo is always a bit of fun in a presentation. One thing you will notice is that some of these programs are quite suboptimal, so you'll see lots of pluses
in a row followed by lots of minuses, so clearly there are things that we could do to make that more efficient. So we have "Hi", so we have our program. It's worth saying that this can be taken further. On Kory Becker's site, she took this and evolved programs that generated the first few Fibonacci numbers, or that took user input and added those numbers up, so more interesting things can be done. I think it might also be quite interesting, rather than evolving brain F programs, to say take the abstract syntax tree of Python and try to evolve that and see if you can get that to work. I certainly did some stuff with function evolution, modelling those as trees, and that did work, so I think that could work. So if you're interested, all the code for this talk is up on GitHub. I hope you found it interesting. I'd like to thank again Cambridge Medical Robotics for enabling me to be here and give you this talk. As I said, we are hiring, so if you too would like to work with knife-wielding robots, it does make you sound quite interesting at parties, and in all seriousness it is very rewarding work. We're interested in Python developers and software test engineers, and if there are any embedded developers in the room, or if you could do any of those, it would be great; more than one, even better. But yeah.
All right, thank you very much, Emma. Do we have any questions?
Hey, thanks. So you're generating lots of programs and then running them. Is there any danger of this doing anything weird on your computer? I don't know what's included in this brain F library in terms of APIs, what it can do on your system.
Yes, it can do arbitrary things, so it could potentially do anything. Yeah, that would be bad.
Questions?
Hi, thank you for your talk. So what I got from this is that you sort of define a search space and you set the software to find it. Now, in the "Hi" version it probably wasn't very efficient, but performance-wise, would you sort of go for this in order to also evolve programs, or go
further? I mean, do you have proof that these genetic algorithms really are worth going deeper into?
So I didn't quite hear that.
Have you actually tried it somewhere?
Was that a question about how you could make the performance better? I didn't quite hear.
No, it's like, do you have any actual scenario where you have tried this and you got a better result than on your own?
Sorry, I still can't quite hear you. Have I tried it?
Do you have an actual scenario where you have used this?
Oh, in real life? Probably not. It was mainly for the fun and the educational value of trying it out. I guess if you could improve the performance it would work better, then yeah, why not. Certainly if for some reason you wanted to write brain F programs, you don't want to have to write those by hand; this can certainly write them quicker than I can. So for dealing with something like that, then yes, this would be better, but why would you do that normally? I think it's certainly interesting more from an academic point of view, of what you could do if you pushed more out of this, rather than for any practical applications right now.
Hi, great talk. Are you using genetic algorithms for robotics development?
No, I'm not using them at work; this was just a side project. Yeah, I think when it comes to surgical robots we probably want them to be slightly more deterministic. But yeah, thank you.
Have you ever experimented with evolving the syntax tree of the program instead of the text representation? Maybe this is much more efficient, I don't know.
Yeah, so I haven't tried that. It's one of the questions that came up when I gave this talk at PyCon UK last year, and certainly I think that would be a really interesting thing to do with this. I did look at function evolution, so evolving polynomial functions, which you model as trees in a similar way to how you would have a syntax tree, and that did work for evolving them. So I think you could definitely do it; quite tempted to try.
Any more questions?
Oh, hi. Any specific reason that you used a genetic algorithm instead of any other optimisation technique?
Not particularly. We were just studying a range of different topics in machine learning, and this happened to be the one that I was doing at the time, which is why I used this. I'm not, for instance, saying this is necessarily the most optimal way of doing this, but it was just interesting.
Good. Right, if there are no more questions... oh, yeah.
Why wouldn't you use genetic algorithms, or any kind of these, for the robots? Because, I mean, we have self-driving cars that kind of go in that direction. I understand that you want deterministic behaviour, but I guess you could achieve good results.
So why would you use them in general? I think they're a useful search algorithm. So if you have, like I said, data that is in some way sorted, so that you can tell from looking in one area whether the thing next to it might be useful, it can be a more efficient way to search a large space. I'm not aware, anyway, of any applications for self-driving cars; I think it's more in searching large amounts of data that it might come into play.
Okay. Right, it's lunchtime, everybody. Thank you very much.