Thank you. Good afternoon. This is a talk about machine learning. How many people have been following machine learning at all? Okay, great. So this talk is going to be about what's going on, why it should be important to you, a little bit about how you might dive in yourself and why you might want to, how it's applicable to networking, and importantly, what to watch out for. Because what I will say is that we're at the beginning of this for networking, and as you'll see, I have some editorial comments on this, but the hype cycle is pretty intense right now.

So how many people have seen this picture, or a picture like this? What happened here is that somebody built what's called a generative model. By the way, I'll show you where the code is for almost everything that's here, and you can have the PowerPoint or the PDF, so you don't need to write anything down. If you don't find it somewhere on the website for these guys, just drop me an email and I'll show you where it all is. But the pointer is right there. Anyway, a generative model is a machine learning model that tries to learn the core structure of whatever it's being trained on and then generate more instances that are somehow the same. This one was trained on Van Gogh's Starry Night and other paintings, and then the author took a picture of the Golden Gate Bridge and said, make me something that looks like this. So this is the Golden Gate Bridge rendered as a Van Gogh. It gives you an idea of what you might be able to do with a generative model. Before I dive in, I'll just tell you that we built generative models that model the protocol sequences that TCP and other protocols produce, and had them generate slight variants on those sequences. Some of them worked, some of them didn't.

This talk isn't really about Siri or anything like that, but I did want to say that speech recognition has really hit a milestone, which is roughly 95% accuracy or better. That's why you have Amazon Echo and Google Home and things like that. And this is Andrew Ng. He was at Baidu for a while; he's a professor at Stanford and had been at Google. This is a picture of their bidirectional recurrent neural network. If you want to know what that is, we can talk about it later; that's a whole other hour of material.

And this is how Google Translate works. How many people use Google Translate? I use it all the time; it's great. It's an instance of neural machine translation, a special kind of neural network that has some memory, a bit like a tape. And it has this idea of zero-shot learning. Some of these language pairs it was never trained on. It actually created an internal representation itself, which they're calling an interlingua, of what the language means. So as long as you have that representation for one side or the other, you can just connect them up. That's zero-shot, meaning zero training instances for that pair. And here's one other one: sequence-to-sequence learning. This isn't actually a Google product, but it was done at Google. It learns sequences and can then decode them from one language to another. It's pretty impressive.

Okay, so this talk is not about any of that. Before we dive in, here's what this talk is not about: AGI, artificial general intelligence. How many people have heard of it? This is the Terminator. We're not building Terminators here, right?
We're not building sentient beings. We're not building any of that stuff. This is sometimes called strong AI. It's really interesting; there are open source projects, many of them, actually. This is Ben Goertzel's OpenCog. You might take a look at it; it's kind of interesting. It sits at the nexus of neuroscience, machine learning, AI, and other things. What we are going to talk about is machine learning, which is a methodology for solving engineering problems. Now, it might be used by AGIs, because the model you might use to solve some of these problems might look like some kind of biological or neuroscience-style model, but that's not the goal. So, for example, a goal here would be image recognition. Sometimes you'll hear this referred to as weak or narrow AI. That's what we're going to talk about.

So, goals: briefly review what this is all about, talk about what we can do in networking, what's practical today, what the challenges are, both sociological and technological, because there are lots of both, and what our opportunities are. I'll try to go easy on the math, although it really is all about math. If you want to learn more about that, look on my website; I've written up a lot of material for my team there. And there's so much material out on the net that you'll never get through all of it.

I just gave a few quotes here. A year ago at ONUG, Bill Coughran of Sequoia Capital offered this on a panel I was on about storage. He said, you might be surprised to know that what's going to drive innovation in enterprise and public cloud is machine learning. A year ago he just offered that; I found that interesting. And then last year, I think it was at the Red Hat Summit, Chris and Dave Ward were on theCUBE, and Chris said this. So what does all this mean? What is this all about?

Okay, I'll just warn you right now: you want to be careful when people start talking about how great their machine learning stuff is, because we're near the top of the hype cycle. I'm going to use the Gartner terminology here: you are there. And if you look at that URL on the bottom, on your left, machine learning, and AI in particular, has failed spectacularly three times already. So just be careful, be skeptical. In this game, what it comes down to is: show me the math, show me the code, show me the data, and let me reproduce it. Otherwise, it didn't happen, right? So be skeptical.

Okay, what's happening? This is all happening at warp speed, really fast. And this picture is showing you our version of "up and to the right." How many people have heard that? All during the late 90s and early 2000s we lamented the fact that the routing table was growing so fast and we couldn't do anything about it, and the curves rose up and to the right. That's what's happening here; it's all up and to the right. So here's the observation I want you to walk away with: machine learning has a long history of openness that predates the open source movements we've been seeing in networking, and it's everywhere. Everything's open sourced.
This is just a picture of DeepMind Lab, or OpenAI Universe; DeepMind open-sourced their lab, and all of this stuff is being open sourced. So we have open data, we have open source code, we have open models, we have open science, really. And all of these things are getting integrated together in this idea of democratizing AI. Now, that's a ways off, but the idea of openness is really there. So when you see breakthroughs in machine learning, what you'll notice is that there are published descriptions and results, at least on arXiv. How many people know about arXiv? It's a place where people usually publish scientific papers before they go into a journal or other peer-reviewed publication. And get this: the X in arXiv is actually the Greek letter chi, so it's pronounced "archive." Get it? That's arXiv.org, and you'll see the papers there. You'll also see a diverse set of open source applications, growing communities just like we have in networking, all kinds of them, and public-private partnerships. This is the Vector Institute in Toronto.

Now, importantly, if you don't see this stuff, ask some questions, because it's probably not real. Here's an example of where it went wrong. Apple wasn't allowed — and we're not just talking about Siri — Apple's people weren't allowed to publish their research or get involved with any of these communities. As a result, Siri really fell behind. Then last year at NIPS, the Neural Information Processing Systems conference, they announced that they were going to let their people publish their research, because they realized that without that, this is not a job for a single entity. So it's interesting to see what's going on. But be aware: if you're not seeing this, and you hear things like "our network thing uses AI" or "uses machine learning," ask the question: that's cool, how does it work? And if people can't answer how it works, either you're talking to the wrong people or it's something else, and you can guess what I think the something else is.

So all of this is going unbelievably fast in machine learning. I was around at the beginning of the internet in the late 80s and early 90s, and this is way faster. It's really amazing what's happening here. We're behind in networking; I'll explain why in a second. These are some graphs of different things you might look at, like the number of papers published in conferences, or the number of papers that refer to different kinds of models, and again, they're all up and to the right.

So, how am I doing on time? By the way, if you have questions or comments or anything else, just interrupt me. I don't know if we have — oh yeah, we have microphones. So just get up and interrupt me; it's fine. Okay, so here's the agenda I'm going to try to get through. I have way too many slides, by the way, so I'm going to pick and choose a little bit. But: what can we do today? What are our challenges? What's the state of the art of machine learning for networking? And I put this at the bottom: there's code and other material there if you're interested in learning more.

So here's one thing we did. We tried to combine SDN and machine learning. This was done with a bunch of my colleagues from Cisco, UPC, and other places. Basically, we took data out of an SDN controller and did inference on it.
And we built a system that looks like this, which is, by the way, just the typical machine learning loop. What we were able to show is that you can do quite a bit of inference on things that are important to you, like predicting when loads on a server are going to be out of the zone that you want for some VNF, or other things like that. So this kind of model is what you're probably going to see in one form or another. There's a feedback loop that's not actually shown here, and I'll show you that in a second; we didn't draw it in because the feedback loop is in development and it would have made the picture really hard to understand. There's hard-coded logic in here too, by the way, for the time being, and that's the SDN controller itself. Why is that an issue? Because you might be able to learn what the SDN controller is supposed to do. In fact, Jeff Dean — Jeff Dean is the Google guy who was responsible for a lot of the stuff Google runs at scale — said, I wouldn't write code for that anymore, I would have it learned. So a lot of the stuff that we write code for now is going to be learned in the future.

Okay, so we're here talking about this because there's been this massive amount of success in the machine learning space. But where? First off, perceptual tasks. I kind of made this term up, because it's how it feels to me. These are things like vision, natural language processing, robotics, but also newer domains like medicine, finance, and so on. This thing on the right is a convolutional neural network; variants of this are where you get state-of-the-art performance on image and object recognition. These are a few of the things. By the way, I put a lot more in here than we can possibly talk about, but I wanted it to be there in case you want to look at it later. There's been a lot of work on statistical approaches. This is Bayes' theorem, or Bayes' rule, kind of the basic way we understand the world around us; we use it a lot in machine learning. How many people have heard of anomaly detection? Okay, good. It's basically a technique for doing binary classification: good, bad.

So why? Why has this all happened? Remember, I said AI has failed many times before. Well, the main reason is that these three fellows — that's Geoff Hinton on the left; Yann and Yoshua were his students or post-docs at one point in his lab — made breakthroughs around 2006 that allowed for training of deeper networks, so deep learning became possible. The second thing is that compute is pretty ubiquitous in the cloud these days. Back in the 80s, when these people were starting to work on this, compute was literally millions of times too slow. The largest models I've seen right now have a trillion parameters — a trillion parameters — and they get trained on 8K GPUs. Not 8K cores, 8K GPUs. So: compute. The third thing is data. The advent of the hyperscalers, the web-scale companies, has provided an unprecedented source of data, and these deep neural networks are really data hungry; they need a lot of data. And the fourth thing, which is a little newer but is now becoming a more dominant factor, is that talent pipelines are starting to fill up.
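Since Bayes' rule came up a moment ago, here it is written out just for reference. This is the standard textbook form, not anything taken from the slides:

$$
P(H \mid D) = \frac{P(D \mid H)\, P(H)}{P(D)}
$$

In words: the posterior probability of a hypothesis H given data D is the likelihood of the data under that hypothesis, times your prior belief in the hypothesis, normalized by the overall probability of the data. In the anomaly detection setting, for example, H might be "this flow is anomalous" and D the counters you observed.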
In other words, there are people getting the education who can actually do this stuff. Before there were very few; now there are more. And this is a picture of Jürgen Schmidhuber. If you ever get a chance to hear him talk, definitely do; he's hilarious, and he has a different view than the three fellows on the right of how all this evolved. So there's a nice little bit of controversy there. Excuse me.

Okay, but what about the networking space? How many people here have done machine learning for networking? Okay, two. That's a lot, actually. So why have we seen so little progress? I'll focus on the data aspects of this and on talent pipelines, and I'll talk a little bit about anomaly detection too.

First, on data. The first thing is that standardized data sets — and I don't mean standardized like the IETF; I mean data sets that are available to everybody and have known properties — have been crucial to the development and success of machine learning. As Andrew Ng said, data is the rocket fuel of machine learning. Standardized data sets allow for direct comparison of learning and inference algorithms. For example, if we both train on the same data set and I get, I don't know, 10% error and you get 5% error, you did better than I did, and I can compare that, right? Right now we don't have this in networking. The result of standardized data sets has been a steady ratcheting down of error to superhuman levels on a lot of tasks. Superhuman levels — what does that mean? If I give you a set of images and ask what's in each image, a human will typically get an error rate of about 5% to 7%. These systems are below 5%, so they're superhuman.

We have nothing like this for networking, and there are a lot of reasons for that. Many or most of the data sets are kind of toy, meaning small and not representative. They're noisy and unnormalized; network data is just dirty, and I'm sure anybody who's worked with network data can commiserate with that. Many data sets are proprietary: carriers think their data is a big part of their value proposition, and so do the hyperscalers. A lot of the data is not IID, independent and identically distributed. Most of our data sources are also representations of network data that are hard to use for machine learning. And there's another thing: there's no standard way of combining them. Say I have logs, or maybe syslogs, and I have flow data. How do I combine those in a way that you could replicate, or that you could compare? I've given you a URL where there's a bunch of data sets, so if you're interested, take a look. There are various kinds there, not all of it network data.

Talent pipelines. When I talk to network people about this one, there's a little bit of defensiveness, so if you feel that, raise your hand so I'll know. But essentially, machine learning is math. It's a form of applied mathematics, and it's a multidisciplinary empirical science, which means you draw on a lot of different domains and you try stuff. Network engineers like us typically don't have a background that includes the kind of mathematical training and experience that's really necessary here. This is just a picture of the things you'd expect: linear algebra, probability theory, differential calculus, and so on. That's what all of this is built on.
So if you're going to get a network person to do machine learning, it requires those skills and others, and the most important one is the ability to rapidly learn new concepts, because the whole machine learning space is moving so fast. Here's a list of some of those things, but the main point I want to leave you with is that there's a serious skills gap between what we do in our daily lives and what we need to know if we want to be successful applying this stuff. So think about that a little bit. The URL at the bottom is an interview Andrew Ng gave to Harvard Business Review where he talked about this; I just cut this quote out of it. Again, you see that data and talent are the gating factors. So we have a challenge. Said another way, it's all about math — I said that already. This particular material is from Facebook; there's a really great set of videos there that describe how this all works in non-technical ways.

Do you have a question? Go for it. Do you have a mic? Yeah, great.

[Audience] Test, test. Before you move on, since you said to interrupt you, I have a question about both data and talent. On the data side, do you think you get the most bang for your buck right now going with supervised learning and trying to tackle very specific individual problems, or going with something like a deep neural net and trying to just take raw data and have it generate results? And the second one, on talent: do you think it's the same as it was, say, five years ago, when you needed to understand Gauss and linear algebra and things like that? Now that we have scikit-learn, Theano, and all of that, can you get people to be consumers of tools that do all the math in the back end for them? Do you think we're at a point now where, even if you're not a number cruncher, you could still make some amazing things happen?

Yeah, great question. Okay, let me answer the first one first. I think you're asking about supervised versus unsupervised learning, right? Supervised learning means the data has the form (x, y): x is the input, y is the answer. I'll show you in a second, but in the case of, say, image data, I'd say here's an image, and the label — that's the y — says cat, or whatever it is. What you do in supervised learning is make a prediction about what's in the picture and compare it to what the label tells you it is. So I make some prediction that says dog, but it's really cat. Dog and cat are represented as vectors, so I can do arithmetic on them, calculate the error, and use that error to improve my model. That's the supervised part of it. Unsupervised learning doesn't have those labels, and unsupervised learning in the neural network space is cutting edge; it's a lot harder. Now, there are kinds of unsupervised learning that have been around for decades, like k-means and things like that, which are very useful, and I'll show you a little of that later. But what I would say is that the vast majority of the fantastic results you've seen from machine learning have been in supervised learning. Does that answer that? Yeah? Okay, the second one: can we just be consumers of this stuff?
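To make that supervised-versus-unsupervised distinction concrete, here's a minimal sketch using scikit-learn on synthetic data. This is just my own toy illustration, not anything from the slides; the data set and model choices are arbitrary.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.linear_model import LogisticRegression
from sklearn.cluster import KMeans

# Synthetic data: x is the input, y is the label ("cat"/"dog" -> 0/1).
X, y = make_blobs(n_samples=500, centers=2, n_features=4, random_state=0)

# Supervised: we use the labels, predict on held-out data, and measure the error.
clf = LogisticRegression().fit(X[:400], y[:400])
pred = clf.predict(X[400:])
error = np.mean(pred != y[400:])
print(f"supervised test error: {error:.2%}")

# Unsupervised: same inputs, no labels -- k-means just groups the points.
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print("cluster sizes:", np.bincount(km.labels_))
```

The supervised model is told the right answer and learns from its mistakes; k-means is only ever shown the inputs and has to discover the structure on its own, which is exactly why it's handy for exploring unlabeled network data.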
Right now the answer to that is: yes, if you don't want to do anything at scale. But the only thing that really matters is being able to do it at scale; if you don't have to do it at scale, it's easy. So right now it still takes these skills. To build a big inference model that can work at scale still takes all of this skill; you still have to know how this stuff works. I will say that there are things like TensorFlow and scikit-learn, and by the way, there's a new framework every week for this stuff. One of the jokes in the whole area is, well, how long is it going to take you to convert your code into the new framework? Because you do it every week, right? But it still requires those skills. I'm trying to watch the time a little bit. So yes, we will become consumers of it, and that's what this idea of democratizing AI is about, by the way: making it consumable in the way you're describing. It's just not there yet.

Okay, so in the real world, this is essentially the way it works. You have some real-world system, it generates some data, we gather that data, we might visualize it with k-means or something like that, we build a model, and we hand the model to whoever's going to evaluate it. They take a new piece of data and run it through the model — that's called inference. We see what it does, and we loop. If it does well, we're done. If it doesn't do well, we refine the model. And here's the scientific method applied to it. Essentially, you've got a use case: say we want to be able to detect anomalies in the way VNFs are being deployed. You form a hypothesis: I can collect this data and use an autoencoder — I'll show you what that is in a second. Then you design the experiment, you run the experiment, you do the inference, you see how it works, and then you loop around. And you actually need automation inside this loop if you want to do it effectively. This loop right here needs to be automated. We did it with StackStorm, but you could do it with any other workflow automation. You just have to be able to automate it, and it has to go fast. Thomas J. Watson said it: basically, fail fast. Fail a lot and fail fast.

Let's see. I'm going to go through this quickly, but I want to mention it: a lot of this is data plumbing; a lot of your time is spent building systems. Now, this has a serious implication, and I hope I have a slide on this. Yeah. Here's a picture of a typical ML stack — this is Spark — and you can see it has all the typical functionality you need to get this done, and Spark's nice. I rearranged all of this because I wanted to look at how much of it was actually ML code, and it's this much. It's not that much. It's very dense, but it's not much code compared to the rest of the system. What that means is that these systems accumulate technical debt at the system level really quickly, so there are other problems in just maintaining a system that does this. I'll give you an example. We had some containers, and we wanted to run Jupyter inside the containers, but because of the way Jupyter is implemented, it's hard to use GPUs from inside Jupyter inside a container. So we hacked around that, right?
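Just to sketch what that automated loop can look like, here's a hypothetical, minimal version in plain Python with scikit-learn. It is not the StackStorm workflow we actually used; the data, model, and acceptance threshold are all stand-ins. The point is only the shape of it: run the experiment, do inference on held-out data, and keep refining until the error is acceptable or the hypothesis is rejected.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier

# Hypothetical stand-in for "gather data from the real-world system".
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

TARGET_ERROR = 0.05                 # acceptance criterion for the hypothesis
candidate_depths = [2, 4, 8, 16]    # the "refine the model" knob we turn each pass

for depth in candidate_depths:
    model = RandomForestClassifier(max_depth=depth, random_state=0)
    model.fit(X_train, y_train)                  # run the experiment
    error = 1.0 - model.score(X_test, y_test)    # inference + evaluation
    print(f"max_depth={depth}: test error {error:.2%}")
    if error <= TARGET_ERROR:
        print("good enough -- stop and deploy")  # loop exits when the hypothesis holds
        break
else:
    print("hypothesis rejected -- rethink the data or the model")
```

In a real deployment each pass of this loop is a workflow job, not a for-loop in one script, which is why the automation piece matters so much.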
Now, how many people are using that container, and do they know that code is there? Probably not very many.

Okay, let me see what kind of time I have left. Let's talk briefly about what anomaly detection is, because it's kind of a core thing. An anomaly might be, like I said, a VNF behaving unusually, or maybe it's a security thing: there are unusual flows or unusual packets or something like that. What it's really all about is grouping everything that's normal into one group and then learning the anomalies. On the right, there's a binary classification: okay, there's a group of things, and now I've got these other points — how far away from all the other ones are they? That gives me the anomaly. You could split that again and have multiple classes, but it's easier to think about it as just one class, without loss of generality. The approach you take when you're building this is to construct a nominal model of the baseline — where all of those points inside the circle are; think about how you might do that — and then you assume that data that's far from the baseline is generated by a different process, the anomaly. And there's a hitch here, right? The hitch is that the evaluation of "far" is something you have to define outside of what you learn. You've got to tune that; it's a tunable parameter. Now, there are lots of ways to do this. Clustering — I mentioned k-means; that picture there is what you get out of k-means, and it shows you, oh, my data has three main groupings, which can be very useful when you're exploring. Principal components analysis. Autoencoders — I'll show you one in a second. There are lots of ways to do this. Oh, and I already talked about this, but basically you have a host and you want to be able to do all of this with it.

So I want to show you this. And thanks to whoever fixed this for me: somewhere between the TensorFlow version I was using and TensorFlow 1.0, they changed the names of subtract and multiply or something, and somebody fixed it because I hadn't looked at it. But check this out; it's an interesting way to think about this. MNIST is one of those standard data sets I was talking about: 28 by 28 grayscale handwritten digit images, 55,000 of them, in a few different configurations. The labels are what you'd use for supervised learning, but I'm going to use an autoencoder, which is unsupervised. And by the way, we can analyze network data in the same way. What I did here was flatten each image into a long vector where each element is kind of like a feature. You could do the same with a flow record — that's what's below there. I'm using MNIST mostly because you can easily visualize what's going on. In our case, instead of the input being these vectors of pixel values, it would be a vector of counters or various KPIs or things like that, depending on how you design your experiment. So here's a way to recognize it. What we want to do is recognize whether or not a 28 by 28 image is part of the MNIST data set. We're not going to try to classify which digit it is; that takes a somewhat more complex network called a convolutional network. But here's the basic network. You have the input — that's the image.
You squish it down into a smaller number of units in the hidden layer, and then you blow it back up. Then you compare what you blew it back up into with the input, and that defines the error, and then you try to minimize that error. That's all you're doing, right? So this is a form of unsupervised learning. Here's what the numbers look like. The input is 784 — that's 28 by 28; I just flattened it with NumPy, a simple thing to do. I made the hidden layer 100 units wide; I just picked that number out of the air. Then I blew it back up to the original size and computed the error. You know what, I'm going to pass on the details because we won't have enough time, but just notice that even this simple network has 784 times 100 plus 100 times 784 parameters, about 156,800 parameters. Where are you going to run that in your network? Think about that. And this is tiny, right? The big networks have a trillion; this one is on the order of 150,000. We're going to skip this — all kinds of cool stuff. This is how it really works, and this is the code you need to do it in TensorFlow. That's it. There's a Jupyter notebook at that URL, and you can look at the code and mess with the parameters and see what it does. It's kind of interesting.

But rather than talk about the code, let's look at what it does. Here's the output from this thing. If I give it one example, the reconstruction — the thing it blew back up — looks like that noise on your TV. Why do you think it looks like that? Anybody have an idea? It's because the weights the network uses are initialized randomly, so right now the output is basically random numbers. Then if I give it 10 examples — on the right there, I guess that's on your left — it starts looking like something, right? And finally, with a thousand examples on the right, it starts looking like the real digits. So now, if you give this thing a 28 by 28 bitmap that's not one of these handwritten digits, it'll have high reconstruction error — that little equation on the left — and you'll know it's not one of them. So that did anomaly detection, right? It told me that this is not one of those; it's somehow different. It's an anomaly. And that's really all you have to do.

So, I'm running out of time. How much of this has been applied to networking? As the slide says: not too much. Anybody have any ideas why not? Let's see if we can figure it out, because it seems obvious that we'd want this capability in networking, but we don't really have it yet. The first reason is the community challenges I was talking about: a lack of understanding of how progress is made in machine learning, the hype cycle, which is crippling right now, skill sets, proprietary data sets — those kinds of community challenges. Then we have all these diverse types of network data and no obvious or consistent way to combine them, incomplete data sets, all kinds of things like that, so the data's not so good. Skill sets, we talked about. Different types of models for different data — that's still an area of active investigation. And here's one I like to call out: there's no real useful theory of what a network is.
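The Jupyter notebook at that URL is the real reference, but if you just want to see the shape of the thing, here's a minimal sketch in modern TensorFlow/Keras. The API calls differ from the TF 1.x code on the slide, and the 99th-percentile threshold is just an illustrative choice for "how far is far" — that's the tunable parameter I mentioned.

```python
import numpy as np
import tensorflow as tf

# Load MNIST and flatten each 28x28 image into a 784-long vector, scaled to [0, 1].
(x_train, _), (x_test, _) = tf.keras.datasets.mnist.load_data()
x_train = x_train.reshape(-1, 784).astype("float32") / 255.0
x_test = x_test.reshape(-1, 784).astype("float32") / 255.0

# 784 -> 100 -> 784 autoencoder: squish down, blow back up.
# Weight count is roughly 784*100 + 100*784 (~156,800), plus biases.
inputs = tf.keras.Input(shape=(784,))
hidden = tf.keras.layers.Dense(100, activation="relu")(inputs)
outputs = tf.keras.layers.Dense(784, activation="sigmoid")(hidden)
autoencoder = tf.keras.Model(inputs, outputs)

# Unsupervised: the target is the input itself; minimize reconstruction error.
autoencoder.compile(optimizer="adam", loss="mse")
autoencoder.fit(x_train, x_train, epochs=5, batch_size=256, verbose=0)

# Anomaly detection: score new vectors by reconstruction error and compare
# against a threshold taken from the nominal data (99th percentile, arbitrary).
def reconstruction_error(x):
    return np.mean((autoencoder.predict(x, verbose=0) - x) ** 2, axis=1)

threshold = np.percentile(reconstruction_error(x_train), 99)
noise = np.random.rand(5, 784).astype("float32")       # clearly not MNIST
print("MNIST errors:", reconstruction_error(x_test[:5]).round(4))
print("noise errors:", reconstruction_error(noise).round(4))
print("noise flagged as anomalous:", reconstruction_error(noise) > threshold)
```

Notice that nothing here ever touches the labels; the model only sees the inputs. That's exactly why the same recipe applies to a vector of counters or KPIs pulled out of a network instead of pixels.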
And I used the word useful because what we have isn't useful for machine learning. There are all kinds of network theories — queuing theory and all different kinds of things — but they're not useful for us here. If you consider a problem like object recognition, it's been so successful because there's a theory of how vision works, and that theory is essentially what's implemented in these convolutional nets.

One other thing I wanted to mention here is transfer learning. Transfer learning means I train a model on one data set and then I can add more things to it and have the union of both. We need this for networking, because if you can't get transfer learning, then every network is a one-off, and that's going to be crippling. We want general-purpose infrastructure for networking.

Let's see. Oh, I talked about this on Packet Pushers if you're interested. Okay, I want to get to the end of this so we can have some Q&A. This stuff is happening in networking all over the place. So why the slow uptake? I mentioned the community challenges. Most ML systems today use very old, decades-old technology — not all, but most. A lot of them use unsupervised learning, things like k-means and other kinds of clustering. There are all kinds of ways of doing this, and no usable theory of networking. Editorial comment: everything's ML today. It's kind of like cloud, or SDN — remember when SDN started, everything got called SDN whether it was or not. And as I mentioned, if there are real breakthroughs, things that really work and are new, you'll see the evidence. Most systems today are big data platforms — there are many of those around — plus ad hoc statistical techniques, sometimes some hardware, like packet brokers and things like that, and a healthy dose of marketing and hype. I'm concerned about that, because I think it's slowing us down.

How many people remember "open washing"? Ever heard that term? In the early days of OpenDaylight and things like that, when I was working at the Linux Foundation, I learned about this. Open washing just means you call your thing open source and then it gets all the good connotations of open source, right? Well, now there's ML washing: you just call it ML, whatever it is. And we also need to learn from our environment. I'm not going to go through this because I want to get to the last page, if I can find it.

Okay, here's a summary. There's lots of network data, right? It's not like we don't have data; the network throws off tons of it. But it's not standardized, it's not labeled, it's proprietary — all these things that make it hard to use. There are millions of open source frameworks, cloud APIs, all of that; I didn't even list all of them, and every week there's a new one. But we're still dominated by ML washing, platforms, clustering, hype, this kind of thing. The skills gap persists. As I said to the gentleman over there, it still takes experience to build these deep neural networks and find the architectures and models. It still takes skill to prevent overfitting — I can explain what that is. And it still takes these kinds of skills to make this work at scale.
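Picking up the transfer-learning point from a moment ago: here's a hypothetical sketch of the usual recipe in Keras — take a model trained on one data set, freeze what it learned, and train only a small new head on the second data set. None of this comes from the slides; the layer sizes and the commented-out fit calls are placeholders.

```python
import tensorflow as tf

# Pretend "base" was trained earlier on data set A (say, traffic from network A).
base = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu", input_shape=(20,)),
    tf.keras.layers.Dense(32, activation="relu"),
])
# base.fit(x_a, y_a) would have happened at that point.

base.trainable = False                      # freeze what was learned on A

# Reuse the frozen features and train only a small new head on data set B.
model = tf.keras.Sequential([
    base,
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
# model.fit(x_b, y_b, epochs=5)             # only the new head's weights update
```

The networking angle is exactly the one-off problem: if the features learned on one network transfer, you only retrain the small head per site instead of rebuilding the whole model from scratch.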
But even with all of that, you're going to see this in every facet of networking. In fact, a lot of the things we do today are going to be done, as Jeff Dean said, by learning. So what do we need to do? We need to find a useful theory of networking — if you want to know more about that, talk to me. We need public data sets. We need to hone our skill sets. We need to apply state-of-the-art machine learning algorithms, so we're not stuck back in the 80s, 90s, and 2000s. We need explainable end-to-end systems; a number of people talked to me today about the fact that with these neural nets, you don't really know what they're doing, and that's a problem, because network operators aren't going to like that. We need to learn control, not just prediction. And we have to learn from our environment in an ongoing way. There are all kinds of issues there.

Okay, so here's the thing I wanted to get to. If you want to jump into this right now, it's a great time, because first off, there's so much material available to help you learn, and so much infrastructure you can use, so you don't have to reinvent the wheel, like the gentleman over there was saying, for everything. And there are so many mysteries, so many things you might like to try to understand, and push the whole thing forward, whether in networking or otherwise.

The first one, I'll just mention briefly, is backpropagation. This is the way we train these networks. It's a simple algorithm; I could explain it to you in less than an hour and you would totally understand it. The question is: how can this simple thing create these powerful models? It's a mystery. On top of that, there are adversarial examples, where, in the image case, you can perturb an image in such a way that you can't, as a human, detect the perturbation, but the network will deterministically misclassify it. What is going on there? Here's a picture: the network thinks it's a panda before the perturbation; now it thinks it's something else, but you can't tell the difference. What is going on there? Then there are these multi-player game setups where you take two neural networks and have them play against each other and train one another. This is incredibly powerful stuff happening right now. By the way, I did this for networking: I built what's called a generative adversarial network for our network data, to see if I could have it learn what a transport protocol was, and it did, right away. It learned TCP Reno right away.

I mentioned this: neural networks — what exactly are they doing? It turns out that this picture is a visualization of the activations of the first layer of a convolutional neural network. These are called Gabor filters; they just do edge detection and orientation. But above that layer, it's hard to know. Here's another interesting one. These folks trained a network on sentiment data from Yahoo, and they found that one of the neurons could tell whether the sentiment was positive or negative. How did that happen? There's a pointer there; this is the sentiment neuron. How does this even work? Remember, in our case the 28 by 28 grayscale image gives 784 inputs. Suppose there are eight bits of grayscale — sometimes there are more. That would be 256 to the 784th possible images of that size, some gigantic number. The model I showed you had about 156,800 parameters.
So if you divide 156,800 by 256 to the 784th, that's zero for all intents and purposes. So how does that simple code find these representations? It's kind of a mystery. And there are so many others. But this is a great time to jump into this. The networking area is going to explode with this stuff as soon as we crack some of these problems, and everybody here is in the right place to take advantage of all of it and look at what the next generation of automation is going to be about. Let's see what else I've got. I think I'm out of time. I'm out of time, right? By the way, if anybody wants these slides or has questions on any of this, just find me or drop me an email. You're welcome to the slides; you're welcome to use them for whatever purpose you have.