Check, check. Hey, there we are. Welcome, everyone. I hope you're all as excited as I am to hear about Claude Shannon. Yeah, that's what I like to hear. We are still looking for volunteers, and sometimes people like me just get roped into doing things, but if you want to actually choose the job that you do to help out at EMF, you can go to the volunteer tent and help out. I would like to introduce Matthew Ireland, speaking on 100 Years of Shannon. Give a warm welcome to Matthew.

Thanks, Mike. So we're nearing the end of what's been a fascinating weekend of really wide-ranging and hugely interesting content, and I'd like to spend these 30 minutes trying to persuade you that the origins of much of what we've seen over the course of these three days can be traced back to this man, Claude Shannon, who was born 100 years ago this year. He was a mathematician and engineer and worked in a huge number of fields over his lifetime. Mathematician and engineer aren't necessarily two words that you expect to hear together, certainly not at the same time, but I think it's the combination of these two pursuits that really means his work has survived, been so influential over a huge number of fields, and also been practically applicable right over the last 80, 90 years.

Think Shannon and you think information theory; think information theory and you think Shannon, and so you should. He really founded the field, and all the language and all the concepts that we use today are his work. But he also formed, shaped and contributed to a huge number of other fields, and I don't know whether publication count really is significant on its own, but it gives a rough picture of how long he spent working on each area. Computers and circuits he did a huge amount of work in, as we'll come on to see. Games and artificial intelligence, I hadn't realized just how much he'd done in this field until I attended a talk on game theory just after the last EMF, and all the concepts in that could be traced back to Shannon. Then during the war he did a lot of work on improving the infrastructure for Churchill and Roosevelt to talk to each other, and his doctoral dissertation developed an algebra for theoretical genetics, so just as information theory has since, he directly touched biology as well. And really, as a result of the huge number of topics that he worked on, if this talk feels as if it's degenerating into an arbitrary selection of topics, then that's kind of the point: to give a picture of just how much he did.

And the first area I'd like to touch on is relay and switching circuits, and the reason I'd like to cover this is because it gives a really good picture of both his mathematical insight and his practical engineering ability; it's a really good example of that combination. The relay can be traced back to nearly 100 years before Shannon came along to work on it, and it was developed as an amplifying device for telegraph networks. There's quite a pleasing parallel there, I think, between the transistor, the valve and the relay, all originally developed as amplifiers and later going on to be switching logic devices. But even 100 years later, in Shannon's day, people weren't building hugely complicated systems out of these, and the design methods were ad hoc. They were very widely deployed, particularly in telephone exchanges, but building complicated circuits was difficult.
For example, the first one-bit adder had been built in 1937, and engineers generally thought it was prohibitively expensive to build this kind of adding circuit up to any significant number of inputs. But Shannon drew on the earlier work of George Boole, which he'd come across in his undergraduate mathematics degree alongside a huge number of more complicated algebras, and he developed the theoretical underpinnings that meant relays could start to be used in these complex switching systems, and really laid the foundations of the computing circuits that we see today. This work was all done in his master's thesis, published when he was 21 years old, and what he said in that thesis was that if one day we can develop intelligent computing devices, then they'll be strung together out of a series of switching devices and we'll analyze them using this algebra of George Boole. That's an incredible piece of work for a 21-year-old.

What he did was take a closed relay and an open relay and say that, well, this looks a lot like the two states that exist in this algebra, the zero and the one. He developed mappings between doing things with the relays, for example putting them in series, which looks a lot like the ORing operation in this algebra, and putting them in parallel, which looks a lot like the ANDing operation. He developed this series of links between what we today call propositional logic and its interpretation in relay circuits, links that could really be used in the synthesis and analysis of very complicated mechanisms.

We'll tackle simplification first. If we take this circuit, which has 13 relays in it, then using his correspondence between the physical devices and the algebra we can write down a mathematical description of the circuit, and then apply operations that were well established in the algebra to gradually reduce its size until we end up with something that's really quite small. If we translate this back into a circuit, it looks like this, using only six relays. So we've got roughly a 50% reduction in both area and cost as a result of five lines of mathematics. Now you may wonder why this is so influential, because thank goodness we don't use relays in computers today, but we do use logic gates, and exactly the same theory is still applicable. Here we've got a fairly complicated circuit; well, it's complicated to analyze by hand, it's got five inputs, and some fairly large logic gates that we wouldn't like to implement. We can write down a mathematical description of this circuit, and a few simple lines later we're left with something that looks a lot nicer to implement. So again, an example of the mathematical brilliance of generality coupled with the engineering brilliance of application.

So that's analysis. Now let's look at synthesis of electronic circuits. Let's take the problem of an electronic lock, something we want to abide by a formal specification, where we want to know exactly how it will behave. We can take an English description of this circuit, and from the English description write down a set of equations that describe how it should work. We can then minimize these equations and draw out the circuit from that, and these diagrams are directly from Shannon's master's thesis.
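To make that algebraic simplification step concrete, here's a minimal sketch in Python using sympy. The expression is an illustrative, deliberately redundant switching function, not the actual 13-relay circuit from the thesis.

```python
# A minimal sketch of Boolean simplification, in the spirit of the relay
# reductions described above (illustrative expression, not Shannon's circuit).
from sympy import symbols
from sympy.logic.boolalg import And, Or, Not, simplify_logic

a, b, c = symbols("a b c")

# Three overlapping branches that all depend on relay `a`.
circuit = Or(And(a, b), And(a, Not(b)), And(a, c))

print(simplify_logic(circuit))  # prints: a   (one relay does the same job)
```

However you map the algebra onto contacts, fewer terms and literals in the simplified form means fewer relays in the physical circuit.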
The final thing I'd like to explore from this amazing thesis, which moves away from Boolean algebra but has been more influential if anything, is the electronic adder: he describes a four-bit adder. This is something that's found in every single computing device, from your mobile telephone to your desktop computer to a supercomputer today. But really, this was 10 years before anyone had started using valves for computation. It was 10 years before EDSAC. It was at least five years before Colossus, even, and people weren't familiar with these standard building blocks that we know today; this was the first time anyone had written down this adder. So here's a simpler form, and you can chain these to add an arbitrary number of bits, in the way we would today (there's a minimal sketch of that chaining at the end of this passage).

Right. So, moving on from relay and switching circuits to what everyone really associates Shannon with, which is information theory. Wikipedia defines information theory today as the study of how to quantify, store and transmit information. Really, the two words that best describe it are effectively and efficiently: those are the two goals of what Shannon was trying to accomplish, although they aren't his words. At the time, this was 10 years or so after his master's thesis and after the war, and he'd gone to work at Bell Labs, so he was working on telephone circuits. The problem at the time was that the telephone circuits in use were analog, so as you increase the distance between a transmitter and a receiver, you also increase the noise that the line picks up. And if you amplify the signal so that it will work over larger distances, then you also amplify the noise. So if you try to communicate over very large distances, you end up with a signal which sounds very hissy, and you'll almost certainly be familiar with this sort of phenomenon. That's how to communicate effectively: get the signal through while suppressing the noise. But efficiency matters just as much for telephone circuits, because you want to have as many people as possible communicating over the same set of wires. And that's pretty much it, really: you want to get the best use out of the hardware that you can. As an example of just how influential this work was, he published the paper first in '48, and then it was republished in book form a year later in '49, and between the two publications it changed its name from A Mathematical Theory of Communication to The Mathematical Theory of Communication. That's really quite a good description of how influential it's been.

So before we can really examine in detail what Shannon did, we need to answer the question of what communication is, and there are a number of levels at which we can address this problem. The first thing we could think about is: if I'm trying to communicate, then how effectively do you understand, and act in the way that I'm trying to get you to act, as a result of my communicating with you? Or we can analyze how effectively my meaning is transmitted in the symbols that I actually send on the wire. Or we can throw away the semantics and just analyze the engineering problem: maybe communication is just, if I put a set of symbols into my communication system, are those exactly the set of symbols that I get out? And Shannon said that this is really the problem that we need to be solving; it's a necessary problem to solve if we're to understand communication at the higher levels. The higher levels also don't influence the engineering problem at all; they're irrelevant to it, although the converse certainly isn't true. In fact Warren Weaver, who co-authored the later book form of Shannon's work, stated that analyzing communication in this way, by setting aside the semantic problem and the effectiveness problem, has really cleared the air for a true theory of meaning.
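Before carrying on with information theory, here's the chaining idea mentioned above as a minimal sketch: full adders built from Boolean operations and strung together so the carry ripples along. It illustrates the standard construction rather than reproducing the diagram from Shannon's thesis.

```python
# A minimal ripple-carry adder sketch: full adders built from Boolean
# operations, chained so the carry propagates bit by bit.
def full_adder(a: int, b: int, carry_in: int) -> tuple[int, int]:
    """Add three bits, returning (sum_bit, carry_out)."""
    sum_bit = a ^ b ^ carry_in
    carry_out = (a & b) | (carry_in & (a ^ b))
    return sum_bit, carry_out

def ripple_carry_add(x_bits: list[int], y_bits: list[int]) -> list[int]:
    """Add two numbers given as equal-length bit lists, least significant first."""
    carry = 0
    out = []
    for a, b in zip(x_bits, y_bits):
        s, carry = full_adder(a, b, carry)
        out.append(s)
    out.append(carry)  # the final carry becomes the top bit
    return out

# 6 (0110) + 7 (0111) = 13 (1101), bits listed least significant first
print(ripple_carry_add([0, 1, 1, 0], [1, 1, 1, 0]))  # [1, 0, 1, 1, 0]
```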
So, taking this Level A communication, the physical problem: what is it? I'm trying to get a message from a source to a destination. So let's say we set off with a message at the source. Shannon said we should first compress this signal to the smallest possible size it can take, so when the signal comes out of the compressor it should look like a random message, because if it didn't look random, then there would exist statistical correlations in the message that we could exploit to make it smaller. Once we've got the message at its smallest possible compressed size, the random-looking message, we should then add some redundancy to it. The channel adds noise, so what we get out of it isn't exactly what we put in, but thanks to that redundancy we can decode the signal back into what we actually sent. Then the decompressor undoes the action of the compressor, so that at the destination we receive exactly the message that we put in at the source.

Let's just have a look at what was happening in the field at the time Shannon came along. The first hints of information theory can really be traced back to Clausius in 1856: through his statement of the second law of thermodynamics, he identified the concept of entropy, and later on in physics, entropy was identified with missing information. This link between information and entropy helped solve quite a number of really deep problems in physics, for example Maxwell's demon. But actually, this work didn't influence Shannon directly. The biggest influence on Shannon was the work of Norbert Wiener, who was primarily interested in the biological applications of information theory and who developed the underlying probability theory that's absolutely central to Shannon's work. In quite a pleasing loop, Wiener later said that it was actually Shannon's earlier work on relay and switching circuits, the stuff we were looking at before, that inspired his own work on probability theory and information theory. So there's quite a pleasing cycle there. Also influencing Shannon, who was by this time working at Bell Labs, were two other people who had at least at some time worked at Bell Labs: Nyquist, who through his statement of the sampling theorem came up with the first bridge between analog and digital signals, and Hartley, who identified that the information content of a band-limited signal is itself limited: if I give you a signal whose frequencies are bounded above, then there's a bound on how much information that signal can carry. They both influenced Shannon quite profoundly as well.

So I mentioned that Wiener had developed various probabilistic results that fed into Shannon's work, and I just want to take a moment to examine why probability might be a useful thing to look at when analyzing information. Suppose I'm taking the role of the transmitter, the source that we saw in the earlier diagram, and I'm transmitting a message to you, and I'll tell you that the message is just in English.
The first letter of the message is Q. So what's the next letter going to be? U. Yeah. What's the next letter? E. Yeah. Next one? I heard S; that's one incorrect guess. R? No, two incorrect guesses. Say again? E. Yeah. And the final letter? Yeah. Exactly. So actually, all I needed to send to you, because you could use your knowledge of the English language, and perhaps some a priori knowledge that we're also in England, to identify the message I was sending, was the first letter and the fact that on a couple of the later letters there'd be an incorrect guess or two before you got the correct one. That's all the information you need to reconstruct the message. And really, this is how most modern lossless compression systems work: by making a series of guesses, predictions, and reconstructing the message in this way.

And actually, we can be a bit more mathematical about this analysis, if you'll forgive me for taking a frequentist notion of probability. When I transmit Q, there are 444 possibilities for what could come next; this was a very long evening with the dictionary. If I tell you that the next letter is U, then that only reduces the number to 399 possibilities. However, if I'd instead told you that the next letter were A, then there'd only be 12 possibilities. So somehow, if I were sending QU, it would be insane to use the same number of bits to transmit the U as to transmit the Q, because clearly the U carries a lot less information than the A would, for example. So perhaps I should use fewer bits to transmit it. And if I then go on and transmit an N, that reduces the number of possibilities for what this word could be to two, and if I next send an A, there's only one possibility for what could come next. So it's absolutely certain what the rest of this word is, and there's absolutely no need to transmit it. (In fact, this is a word that's made it into English through the Persian language; it's a kind of irrigation system used in the Middle East.)

So when Shannon came along, people knew that probabilities were related to information, and they also knew that the correct measure of information would be some sort of logarithmic measure. And this makes sense. If I have a single outcome and I want to measure its information content, which in modern parlance is known as the surprisal of that outcome, then taking two independent events together means multiplying their probabilities, and thanks to the laws of logarithms that means adding the logarithms. So somehow, across successive independent events, I can add together their information content. It also means that if I have one switching device, be it a relay, a normal switch, a transistor or a valve, then that can represent two possibilities, zero or one. If I take three switches, then they can represent eight possibilities: 000, 001, and so on through to 111. And if I use this logarithmic measure, then the log of eight is three, so three switches convey three times the information of one switch. So it's quite nice.
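To put that logarithmic measure in the standard modern notation (the usual textbook form, not a quotation from the talk):

\[ I(x) = -\log_2 p(x) \ \text{bits} \qquad \text{(the surprisal of outcome } x\text{)} \]

\[ p(x, y) = p(x)\,p(y) \implies I(x, y) = I(x) + I(y) \qquad \text{for independent events.} \]

And three two-state switches give \( \log_2 8 = 3 \) bits, three times the \( \log_2 2 = 1 \) bit of a single switch.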
But at the time Shannon came along, it wasn't well established what the base of this logarithm should be. People knew it was related to the number of signal levels on the wire, but this wasn't well standardized. And this is an excellent example of Shannon's engineering brilliance: he realized the correspondence between the binary switching levels of the relays he was familiar with and this measure of information, and the fact that he standardized on base two really corresponds to the realization that all information and all communication is essentially digital.

Shannon then took this logarithmic measure for a single outcome and used it to define how much information is coming from a source. So if a source is emitting a string of outcomes, he came up with a measure for the information content of that source, and he defined it as the average of the surprisals: the sum of the surprisals weighted by their probabilities. Actually, he didn't approach it from the averaging point of view, though; he proved that, given a set of conditions that our information measure should satisfy, this is the unique family of functions that satisfies those requirements. The K in here just corresponds to the units that we measure information in, so if we use base-two logarithms and measure it in bits, then the K disappears. And just while we've got this form: anyone familiar with either the statistical or the Boltzmann entropy will recognize it as very similar to that concept in physics, and although Shannon didn't point this out in the original paper version of his work, he did later when it was republished in book form.

So just before we move on to Shannon's two major theorems, which still underpin communication today, I want to wrap up this entropy discussion. Shannon's work brought to an end a struggle, going back the best part of a century, by physicists to pin down what entropy really was. There was this theme at the beginning of the 20th century of operationalism, which states that unless we can say how to measure a concept, temperature for example, then we don't really understand that concept. And of course we can measure temperature using a gas thermometer, but nobody really understood entropy in these operational terms. It was Shannon who came up with the operational definition of entropy as the minimum possible size to which you can compress a string of outcomes from an information source, as we'll see next in the source coding theorem. This really brought that struggle to an end, and it also went back and influenced physics: there's been a lot of later work, if you look at Jaynes and maximum entropy methods, that has sprung from Shannon's work in information theory. A pleasing contribution back to physics there.

Right, so: Shannon's two major theorems. Firstly, the source coding theorem. How many people have come across the source coding theorem? Yeah. Excellent. This states that if we've got a source of information, then we can compress the string of information coming from that source to arbitrarily close to the entropy times the length of the string. And this makes sense: we're saying that we can compress the information coming from a source to arbitrarily close to the average rate at which that information is being emitted. This establishes a fundamental limit on how well we can compress data, so if we wanted to design a new compression algorithm, for example, we could compare how well our algorithm performs against the entropy of the source.
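For reference, the measure described above is usually written in the following standard form (the general statement, not a transcription of the slide):

\[ H = -K \sum_{i} p_i \log p_i . \]

With base-two logarithms and K = 1, H is measured in bits per symbol, and the source coding theorem then says that a string of N outcomes from the source can on average be compressed to arbitrarily close to N H bits, but no further without losing information.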
Also, if our algorithm happened to compress to smaller than the entropy of the source, then we'd be certain that loss had occurred: it would be a lossy compression algorithm. So that solves how to do communication efficiently. And although Shannon gives us some hints as to how to achieve the entropy limit for compression, it wasn't until quite recently that people think they have developed codes that actually achieve it, and even then they haven't proved that they do.

So let's go on to examine how we can communicate effectively over a noisy channel, and this is the noisy channel coding theorem. Here's a plot; I'm not going to state the theorem mathematically, it's a little involved for our purposes here, but we can understand it from the plot. On the x-axis I've got some measure of how much redundancy I'm having to add to the message, with more redundancy as you move along it, and on the y-axis I've got the probability that there's still an error after I try to do the error correction on whatever I get out of the channel. Each of these crosses represents a particular algorithm for doing this noisy channel coding: a particular way of encoding my random stream of information before it goes into the channel. So from these five codes, five algorithms for adding redundancy, which do you think would be our favourite if we could use it? Which is the most useful of these? Just yell it out. Louder? E, yes, E. E really achieves a decent trade-off: a low probability of error and also a decent codeword size, so we're not adding more redundancy than we need to. More useful codes sit down and to the right.

But clearly there's going to be some boundary between codes that we can achieve and codes that we can't, because we can't get something for nothing; we can't just put our random stream into the channel and expect to be able to decode it correctly, because the channel is going to flip some of our bits. Before Shannon came along, people thought that, yes, sure, we can achieve error-free communication, but we'll also have to add infinite redundancy in order to do so: we can reduce the probability of error to zero, but we'd have to increase the size of what we send towards infinity. And Shannon came along and said, no, actually, that's not true. We can achieve error-free communication at any rate up to some finite, non-zero number known as the channel capacity, and he gave a mathematical equation by which this channel capacity can be calculated. He said that we can transmit without error at any rate up to this channel capacity, which is really quite profound: we can achieve error-free communication without sending an infinitely redundant stream, and we can guarantee that what we receive from the channel is correct. Although Shannon gave an outline proof, it wasn't until several years later that this was actually definitively proven.
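As a concrete example of what such a capacity formula looks like (this is the standard result for a binary symmetric channel, given here for illustration rather than quoted from the talk):

\[ C = 1 - H_2(p), \qquad H_2(p) = -p \log_2 p - (1-p) \log_2 (1-p), \]

where p is the probability that the channel flips a bit. The theorem says reliable communication is possible at any rate below C bits per channel use and impossible above it; for p = 0.1, C is about 0.53, so in principle a little under two channel bits per message bit suffice.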
Right, so to sum up information theory: Shannon's key insights were to split the stages of communication into the source coding, which achieves the efficient communication, and the channel coding, which achieves the effective communication. If you want an analogy for these two stages, the one Shannon came up with is that, in the same way that for an electronic circuit we maximise power transfer by impedance-matching the transmitter to the receiver, in information theory we maximise the information transmission between source and receiver by matching the source to the channel in the way he describes. Another example of his practical nature: he decided we should use the source statistics in the compression, and the source and channel statistics in the channel coding. And really, this has formed the theoretical basis for every modern communication system that we have, and has applications throughout everything, really.

Right, so the last area that I want to give my very brief whistle-stop tour of is game theory. Shannon was really interested in studying this from the theoretical side: he wanted to understand the nature of what sorts of things computers can do and, ultimately, how to make computers think. But he also had a lighter side, and at the weekends he enjoyed going off to Vegas for gambling. In fact, in this context, he co-invented the first wearable computer, for cheating at roulette: kind of the earliest form of Google Glass, as it were. It was developed so that he could time the roulette wheels and decide when to place his bets.

He also wrote one of the first papers on computerised chess. He was interested in studying chess, I should stress, more generally as a good example of the kinds of problems we might want to apply computers to; he didn't just want to have a good game of chess against a computer, although it would be nice if he could. And he used the minimax algorithm, which anyone familiar with basic artificial intelligence will recognise, because it's an algorithm we still use today, although it's not quite the textbook minimax algorithm: it has a slight optimisation for the computers available at the time. Indeed, he gives some code in his paper to show how the algorithm could be implemented. The essential idea is to start from a position, compute every possible move that could happen and what the board will look like at that point, and then apply a heuristic function to evaluate the utility of the board at that point (there's a minimal sketch of this idea at the end of this passage). The other thing I want to stress is just how much chess literature he surveyed for this. The literature on computer game-playing didn't exist at this time, but the literature on chess did, because humans were very interested in playing chess, and the sheer number of papers that he'd read in order to design his evaluation functions for the game is absolutely incredible.

This basic approach has some problems, which he identified, and he then went on to describe a better strategy, which is very similar to alpha-beta pruning if you know any artificial intelligence. It involves better evaluation functions, and it doesn't apply the heuristic function in positions where there's an exchange in progress: obviously, if one move further on someone's going to take your queen, it doesn't make sense to apply an evaluation function which counts that queen. The other thing I should say is that this type B strategy, as he calls it, allows you to explore the game tree to a greater depth, just as humans do. Humans don't exhaustively evaluate every possible move; they quickly discount some, and that's exactly what the type B strategy he described does. And if you think about just how many papers have been published on computerised chess since, it's absolutely incredible.
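As a rough illustration of that basic idea (not Shannon's own formulation, and without his type B refinements), a depth-limited minimax search with a heuristic evaluation at the leaves looks something like this sketch. The `game` object here is a hypothetical interface standing in for whatever board representation you use.

```python
# A minimal depth-limited minimax sketch. `game` is a hypothetical interface
# with moves(state), apply(state, move), is_terminal(state) and
# evaluate(state); it stands in for an actual board representation.
def minimax(game, state, depth, maximising):
    """Return the heuristic value of `state`, searching `depth` plies ahead."""
    if depth == 0 or game.is_terminal(state):
        return game.evaluate(state)  # heuristic utility of the board

    values = (
        minimax(game, game.apply(state, move), depth - 1, not maximising)
        for move in game.moves(state)
    )
    # The maximising player picks the best child; the minimising player is
    # assumed to pick the worst one for us.
    return max(values) if maximising else min(values)
```

Choosing a move is then just a matter of applying each legal move and picking the one whose resulting position has the best minimax value; alpha-beta pruning and Shannon's type B strategy both cut down how much of that tree you actually have to visit.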
And also the effect that they've had on our computing devices, and how they've influenced research in machine learning, is incredible, and we can trace it all back to Shannon.

So, just to quickly conclude, and to finally demonstrate his practical nature: there was one famous interview where an interviewer wanted to talk to Shannon about information theory, and Shannon said, yeah, sure, we could talk about that, but wouldn't you prefer to go into my basement and see all my toys? One example of these toys is this first computerised learning device, a maze-solving mouse: you can place the mouse anywhere in the maze and it will find its way to its objective.

So, I hope I've managed to convince you that Shannon really developed the foundational science for our age and has influenced so many fields over his time, and the world would be a very different place if he hadn't done the work. Sure, someone would have done it at some point, but would it have had the same quality, and how different would the world be now? I think 100 years after his birth is a good time to reflect on that. So, thank you very much for listening, and I'll probably be over in the Robot Arms if anyone would like to discuss further. Thank you very much.