Hello and welcome. My name is Shannon Kemp and I'm the Chief Digital Manager of DataVersity. We'd like to thank you for joining the current installment of the monthly DataVersity Smart Data Webinar Series with Adrienne Bowles. Today, Adrienne will discuss machine learning from discovery to understanding. Just a couple of points to get us started. Due to the large number of people attending these sessions, you will be muted during the webinar. For questions, we will be collecting them via the Q&A in the bottom right-hand corner of your screen, or if you'd like to tweet, we encourage you to share or highlight your questions on Twitter using the hashtag #smartdata. If you'd like to chat with us and with each other, we certainly encourage you to do so; just click the icon in the upper right-hand corner for that feature. And as always, we will send a follow-up email within two business days containing links to the slides, the recording of the session, and additional information requested throughout the webinar.

Now, let me introduce our series speaker for today, Adrienne Bowles. Adrienne is an industry analyst and recovering academic providing research and advisory services for buyers, sellers, and investors in emerging technology markets. His coverage areas include cognitive computing, big data analytics, the Internet of Things, and cloud computing. Adrienne co-authored Cognitive Computing and Big Data Analytics (Wiley, 2015) and is currently writing a book on the business and societal impact of these emerging technologies. Adrienne earned his BA in Psychology and MS in Computer Science from SUNY Binghamton and his PhD in Computer Science from Northwestern University. And with that, I will give the floor to Adrienne to get today's webinar started.

Thank you, Shannon. As always, it's great to be here. And for those of you who were on a few minutes ago, it's just great to be out of Atlanta, much as I love that city. Last week was a nightmare, but no need to rehash that; everybody's seen the news about the airlines. So let's get right into it. Today we're going to talk about machine learning, and the subtext is "from discovery to understanding." One of the things I struggled with a little in preparing my notes and slides for today was, frankly, just how much math to include. The possibilities ranged all the way from no math, to a little linear algebra and a little calculus, to doing the whole thing in equations. Normally I would take the middle ground, but I decided to try something extreme: we're going to do this with virtually no math. The idea is to help people get an appreciation, and not music appreciation, as in "ooh, that's a good deep learning algorithm, I can dance to it," but to get us thinking in terms of what problems we are trying to solve and how this relates to human problem solving, without worrying as much about the actual numerical underpinnings. I'll point to some things, and if people want more detail, as always we'll give you an email address and make it easy to follow up. What I wanted to do today was really look at this critical area of machine learning and see where we're going without getting too bogged down in the minutiae. So my agenda is pretty straightforward. I'm going to go over a few basic concepts: learning, reasoning, understanding, et cetera.
Then I'm going to talk about the two fundamentally competing, if you will, approaches to artificial intelligence in general. Then we're going to go right into machine learning and deep learning basics, sort of a mini tutorial, and then talk about trends. So without further ado, let's put some context in here. The title, as I said, is machine learning from discovery to understanding, and I think it's important that we start out with a common understanding of what we mean by some of these terms. When I think about understanding in the context of artificial intelligence and machine learning systems, it's pretty straightforward conceptually: you understand something if you have an awareness of its meaning. But one of the limitations of a webinar is that you can't see how often I do air quotes around "understanding." So I'll tell you right now that as we get into this, we'll see that the machines will help us understand a lot of things, but they don't really understand much at all themselves. So the key phrases are here: learning is acquiring this understanding; understanding is the awareness of meaning. But the systems we're most concerned with throughout the webinar series are the ones that can reason, that can apply some logic or some process to create value, if you will, from the data.

With that in mind, I've used this slide in a couple of different contexts in the past. This is a diagram that I borrowed from Loop AI Labs, and the reason I use it so frequently is that it's a nice graphical representation of the output of a system that has analyzed a volume of text, actually from Al Jazeera, in Arabic, and looked for concepts and the relationships between them. In this diagram, even if you don't know Arabic, as I don't, but you're a traveler, as I am, you may see that there's a theme here and start to get some idea of what it was picking up. We've got 727, 737, 767, which to those of us who spend too much time in airports are all Boeing designations, and A300, an Airbus. And as it turns out, the Arabic phrase in the highlighted area near the lower left represents "airplane"; I would not have known that, except that at one of our conferences someone spoke up and told us. All of these other words and phrases represent aspects of air travel or flight.

The reason I use this slide so frequently is that it's a great example of a machine learning system that has gone through text. The system itself has no awareness whatsoever of the meaning of the text. It's looking for statistical relationships. So there's no understanding there, but there is this discovery of relationships: somehow all of these things are statistically related to each other frequently enough that they're pulled out and shown to the user, kind of augmenting the user's intelligence, to say, these are concepts that are related; I don't understand what they are or how they're related, but you can drill down here. I use this because so much of what we do in machine learning may appear to be magical, but it's really more mathematical. It's looking for associations. It's representing data in a digital form; obviously everything ultimately gets represented digitally.
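To make "statistical relationships, not understanding" concrete, here is a minimal, hypothetical sketch of co-occurrence counting, which is the flavor of association such a system discovers. The documents, terms, and threshold are all invented for illustration; this is not Loop AI's actual method.

```python
from collections import Counter
from itertools import combinations

# Toy "corpus": each document reduced to a set of extracted terms.
docs = [
    {"727", "737", "767", "airplane"},
    {"A300", "airbus", "airplane", "flight"},
    {"737", "airbus", "airport", "flight"},
    {"767", "airplane", "airport"},
]

# Count how often each pair of terms appears in the same document.
pair_counts = Counter()
for terms in docs:
    for a, b in combinations(sorted(terms), 2):
        pair_counts[(a, b)] += 1

# "Related" here just means co-occurring more than once: a purely
# statistical notion with no awareness of what the terms mean.
related = [pair for pair, n in pair_counts.items() if n > 1]
print(related)  # [('767', 'airplane'), ('airbus', 'flight')]
```

Nothing in those counts knows what a 737 is; the related pairs simply fall out of the arithmetic.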
It then looks for relationships between pieces of data that can be mathematically modeled, and perhaps, with the assistance of a human, if we're talking about augmented intelligence, we then ascribe meaning to them, or we attempt to take it to another level and place a label on these relationships based, again, on observation and analysis of how things are used. But the system itself at this point doesn't understand, and that's important.

So let me go from that to the way humans do things. We're going to look at how today's machine learning is inspired by human cognition, but it doesn't really mimic it; it doesn't model it the way we do it. Oddly enough, some people are now looking at human problem solving in terms of machine learning, so it's giving them a different perspective, but I don't want anybody to think that when we talk about something that's machine learning, it's doing things the way people do. The important part here is looking at the scope and the scale. When we're having a conversation and dealing with perception, perhaps it's via text: you're looking at it through your eyes or glasses, with photoreceptors, 120 million rod cells and 6 million cone cells. Perhaps, in the case of a webinar, you're just listening, and you've got all these cells converting the analog signal of my voice, which somewhere along the way was digitized in the middle. But to actually learn from what I'm saying and to understand it, now we get into the neurons and synapses, and at this point the scale just goes crazy: you've got 100 billion neurons and 100 to 500 trillion synapses. We're not going to build a system that does the same thing, but we're going to be inspired by this and start to look at how we can change the architecture, hardware and software, to model the way small, relatively independent processing units within the brain communicate based on electrical signals. And that's where we are right now.

So I'm going to take that background and say something that I hope you'll remember, something I've been thinking about for a long time, and sometimes it's a heresy to talk in these terms. I think this is probably the most important thing: it's possible, maybe likely, that we're going to build a deep learning system pretty soon that knows everything there is to know, but understands nothing. I think it will be important and useful. But the point is that you can't assume that a system exhibiting the behavior of a biological system that does understand actually understands anything.

So let's look at the major approaches. There are two ends to the spectrum for building these systems, and we'll start with the knowledge-centric approach, because historically this is where AI began. I think in the first webinar this year I talked about artificial intelligence as a discipline having roots going back to the 1950s. The approach taken early on was to try to leverage expertise to create a representation of the problem area being studied, and to use the human expertise captured in the system to map between that representation and some problem domain. And this is something that obviously requires a lot of effort.
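To give a feel for that knowledge-centric style, here is a minimal, hypothetical rule-based diagnosis sketch; the rules and symptoms are invented for illustration, not taken from any real expert system. Notice that it keeps a trace of which rules fired, which is exactly the traceability I'm about to describe.

```python
# Hypothetical rules: (name, required symptoms, conclusion).
RULES = [
    ("R1", {"fever", "cough"}, "possible flu"),
    ("R2", {"fever", "rash"}, "possible measles"),
    ("R3", {"cough", "wheezing"}, "possible asthma"),
]

def diagnose(symptoms):
    """Apply every rule whose conditions hold; keep an audit trail."""
    conclusions, trace = [], []
    for name, required, conclusion in RULES:
        if required <= symptoms:  # all required symptoms are present
            conclusions.append(conclusion)
            trace.append(f"{name}: {sorted(required)} -> {conclusion}")
    return conclusions, trace

conclusions, trace = diagnose({"fever", "cough", "wheezing"})
print(conclusions)  # ['possible flu', 'possible asthma']
print(trace)        # every step is visible and can be interrogated
```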
Building systems this way requires smart people who understand the problems we're talking about, whether the problems are tax law or medical diagnosis or, to be topical, weather interpretation. Up front, the people building the systems have to develop algorithms, or they have to develop or capture the rules that experts use to solve the problems without automation. The advantage of this type of system, and this is certainly the basis of things like rule-based systems, knowledge-based systems, and expert systems, is that they follow generally established practices from logic, such as first-order predicate logic. By doing so, even though it takes more effort up front to build a system like this, you're capturing all that knowledge from humans separately from the data you're going to analyze. Then, when the system makes a recommendation or gives you an answer to a question, you can completely trace back and have complete visibility into what assumptions were made along the way to get to that answer. There's no ambiguity. There may be some probabilities associated: you may have a system where, let's say, a patient presents with certain symptoms, and your system goes through all the rules, how other doctors handle this, how it's viewed; everything has been digitized, so it can look at all the records. Maybe it says, I think there's a high probability that you have condition X. You would be able to interrogate that system and ask: what part of the data led you to that conclusion? Where did you get that confidence? Everything is evidence-based, and it's perfectly visible.

We're going to contrast that with what we have today as the leading approach, which is the data-centric approach to building systems, with deep learning. In this case, rather than using human expertise to do all of the mapping and create the representations, you're depending on having more data available to you. As I say here, you let the data drive the process rather than have the process interpret the data; the data actually creates the process around itself. The danger is that these systems fairly quickly turn into a black box. You can't go back and ask: well, how did you arrive at this conclusion? In some cases that's okay, as long as the answers are in general better than, or at least sufficient for, the decisions you need to make. Maybe you don't need to go back into it. But there are going to be disciplines, in medicine, in finance, in accounting, where, if you're giving somebody advice and they make a bad trade, you need to be able to show how you got there.

So what I want to do is look at how we got here, going from being completely focused on human knowledge to automating the process, if you will, with an approach that's more statistically based than knowledge-based, more based on attributes of the data than attributes of the designer. To do that, we need to spend a few minutes on the idea of representation. Here I've got a taxonomy that should be relatively familiar to anybody who's played 20 questions. It's loosely based on an early view of the hierarchy in the world of nature: something is animal, mineral, or vegetable. And the point of using this simple example is that a lot of what we have in terms of human knowledge can be represented in a graph, in a hierarchical form.
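Before we walk down the tree, here is one minimal way such a hierarchy might be encoded; the attributes are invented for illustration. The point is simply that each level stores only what distinguishes it and inherits everything else from above.

```python
# Each node: (parent, attributes unique to this level). Hypothetical values.
TAXONOMY = {
    "nature":  (None,      set()),
    "animal":  ("nature",  {"moves", "metabolizes"}),
    "mammal":  ("animal",  {"fur", "warm-blooded"}),
    "primate": ("mammal",  {"grasping hands"}),
    "human":   ("primate", {"language"}),
}

def attributes(node):
    """Collect a node's attributes by walking up the tree: inheritance."""
    attrs = set()
    while node is not None:
        parent, own = TAXONOMY[node]
        attrs |= own
        node = parent
    return attrs

print(attributes("human"))
# inherits everything above: language, grasping hands, fur, ...
```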
And the thing to keep in mind here is that if we think of this as capturing knowledge and the relationships between concepts, then as you go from nature down to animal, down to mammals, down to primates, each time you go lower, you're going from something more abstract to something less abstract, until you get to something that's absolutely unique. So if we start by following the red line: an animal is something in nature, distinct from a mineral. It's very important that in a strict taxonomy like this, you can't have something that's in more than one category at the same level. You can't be both an animal and a mineral, for example. So within animals we're going to look at mammals; within mammals, primates; then Hominidae, down to humans versus chimps. At each level, you share with your peers all of the attributes and behaviors you inherit from above, but you have something unique, which is why you get split off. And then we get down to a specific example: statistically speaking, somewhere on the call today, and I haven't seen the list, we have someone named Bob, and that's where you are. That's how we got there. So we're going from something very abstract down to something relatively concrete, and finally to something very specific and individual. If you think in terms of object-oriented programming, for example, we've got a class of objects called humans and an instance of that class called Bob, who has uniquely identifiable characteristics. It all comes from the top down. We can obviously represent this in a data structure; we don't need to do that today, but if you have any questions on that, certainly follow up with me. I'm just going to assume we're putting this in there somewhere, that we've got some storage for primates as a whole with all of their attributes, and that everything at the lower levels links up and inherits.

In one of the talks I gave on natural language processing, I used an example like this. The reason I'm using it here is, again, that it's about representation. We can have the same words used in different contexts, and if we're trying to understand them as humans, we need to know how they're being used. Obviously we also get into things like analogies and sarcasm, but for the moment let's take language at face value. Words have meaning and words have structure: the syntax is the structure, and the semantics is what they mean. Say we're trying to build a system that can learn from usage how to correct or complete what you're typing. Maybe you're typing a text and your cell phone is doing a look-ahead, trying to predict what you're going to type next. That's based on a representation that holds the words you've used, or the words known to the system, and the probability, based on what it has seen, of what it's going to see next. A lot of that is pretty well understood. But the issue is that we don't have a standard way of representing this, because we don't think about things that way. I don't think about where in my brain I'm storing the word "boy" versus "buoy" versus "bye," but there is a neuro-synaptic relationship, if you will. So what we have to do in building these systems to learn is to have a way of representing different concepts, different words, different phrases that may be logically related, like the Al Jazeera example earlier.
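That cell-phone look-ahead can be sketched as a toy bigram model: count what followed each word in past usage and suggest the most frequent follower. The training text here is invented, and real keyboards are far more sophisticated; this is just the statistical flavor of it.

```python
from collections import Counter, defaultdict

# Toy usage history to "train" on.
history = "the flight was late the flight was full the gate was crowded".split()

# Count which word followed each word.
following = defaultdict(Counter)
for prev, nxt in zip(history, history[1:]):
    following[prev][nxt] += 1

def suggest(word):
    """Suggest the statistically most likely next word, if any."""
    counts = following[word]
    return counts.most_common(1)[0][0] if counts else None

print(suggest("the"))     # 'flight' (seen twice, versus 'gate' once)
print(suggest("flight"))  # 'was'
```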
What we need, then, is a mathematical representation that puts related items closer in proximity, in a way we can describe as a mathematical function. So I can look at two words and see if they're close, or look at three words, say, and see which two are closer in terms of the probability that I should pick one and suggest it as the next step. And that's about as close as I'm going to get to showing how we do this mathematically.

What I want to look at next is the trend in applying this type of reasoning to building smarter systems. This is a diagram I was actually working on in Atlanta, and it's a typical consulting diagram in that I haven't labeled the axes, but I'm going to walk you through it because I'm going to use it twice. The idea is that over time, going from left to right, in the effort or investment we've put into any given AI system using machine learning, the relative importance of algorithms and rules versus data has been steadily shifting. It's a simplification, and certainly not linear. But the idea is that in the beginning, the beginning being the 50s, the 60s, the 70s, the main focus was on getting the algorithms and the rules and the knowledge from people, and being able to build and operate systems with relatively little data. Where we are today, to the right of the dividing line, most of the investment in these machine learning systems is really taking advantage of the availability of very large data sets, and of the architecture and infrastructure to process them, which we didn't have in the 60s, 70s, or even the 90s. It's a relatively recent phenomenon that we have huge data sets available to us and the raw power to process them, which allows us to start to build systems that can look at the data and make inferences from the data about how to process that data going forward. We had to do more planning, if you will, up front when we were dealing with much smaller data sets and much slower computers. And if you take a historical look at this, a lot of the algorithms being used today have been around for decades. There have certainly been some advances, and we'll talk about those, but much of the algorithmic knowledge about how we want to process this data has been around for decades and is only now practical to use. I don't want to call this an inexorable trend, and we'll see why I think we're reaching the limits of how much will be done with data driving the decisions versus knowledge driving the interpretation. But it is definitely a trend you can map out; we can look at products and where it's going.

So with all of that as backdrop, I'm going to do a quick walkthrough of machine learning terminology, then take a somewhat deeper look at deep learning, and then look at how we put all this together to get to real discovery. When we talk about learning in people, we think of it as involving different types of reasoning: inference, deduction, et cetera. The learning we do with machines, again, is sort of in air quotes, because it's not learning in the sense of understanding concepts at a level removed from fact. But that's the machine learning we're looking at.
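As one concrete illustration of "closeness as a mathematical function" from a moment ago: here is a sketch using cosine similarity, assuming NumPy. The three-dimensional word vectors are invented for illustration; real systems learn vectors with hundreds of dimensions from usage.

```python
import numpy as np

# Hypothetical 3-dimensional word vectors (real embeddings are learned).
vec = {
    "plane":   np.array([0.9, 0.1, 0.0]),
    "airport": np.array([0.8, 0.3, 0.1]),
    "banana":  np.array([0.0, 0.2, 0.9]),
}

def similarity(a, b):
    """Cosine similarity: near 1.0 means same direction, near 0 unrelated."""
    va, vb = vec[a], vec[b]
    return float(va @ vb / (np.linalg.norm(va) * np.linalg.norm(vb)))

print(similarity("plane", "airport"))  # high: roughly 0.96
print(similarity("plane", "banana"))   # low: roughly 0.02
```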
The simplest way to look at that, and probably the most powerful way, is to define performance on a specific task. Maybe it's identifying something in images; maybe it's a system to do differential equations; maybe it's a system to recommend music based on what it knows about you, your previous purchases, and your circle of friends. Whatever the performance criterion you define, we say that a system learns if it can improve its performance based on experience with new data, not reprogramming. The improvement has to come from the system changing how it interprets data based on experience with the data, rather than from you running it, finding it's generally off by 10 percent or whatever, and going in and changing the code. No code changes when we're talking about this aspect of machine learning.

Within machine learning there are two major categories and a subcategory: supervised learning versus unsupervised, with reinforcement learning inside the supervised sphere. In supervised learning, we teach the system to either detect or match patterns based on training data; training data is an important key phrase here. We have to have data available to provide this training, and that's one of the things that has changed a lot since the beginning. Systems today typically do training with, on the order of, 5,000 or more examples of how a system should respond to input. That's kind of the minimum threshold; you may get up to hundreds of thousands of training examples. But that's supervised: you learn by example. You build the system so that it evaluates these samples and then tries to generalize from the specific. You give it a set of stimulus-response pairs and say, in the case where X is equal to one value and Y is equal to another value, this is the decision we make. And the more of those examples the system sees, the more accurate it's going to get.

Reinforcement learning is part of the supervised sphere, if you will. Here the system learns or develops its strategies, in mathematical terms, how it's going to operate, based on feedback you give it on its performance. That's the supervision. It's like training someone to do a job: you give them feedback and say, yes, this is good, this is bad. That's the reinforcement, and it's exactly as it sounds from behavioral psychology: you reinforce by providing a positive input for things done properly, and a negative input, a negative stimulus, punishment, if you will, in the laboratory, for things that are incorrect.

Unsupervised learning is where many things get more interesting. The system has to discover patterns based on experience, so it's always looking for something novel. And this is where today's systems simply weren't possible before; it's not just that they're orders of magnitude better or faster or more accurate. You couldn't do this if you didn't have enough data to allow the system to, I don't want to say think outside the box, but to experience enough different data points to discover a pattern. So those are the fundamentals. Just a few examples to put them in context. With supervised learning, you've got the training set with the stimulus-response pairs.
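In code, those stimulus-response pairs look something like this minimal sketch, assuming scikit-learn; the five training examples and labels are invented, and as I said, real systems want thousands. The point is that nothing diagnostic is programmed in; the mapping is generalized from the pairs.

```python
from sklearn.tree import DecisionTreeClassifier

# Stimulus-response pairs: [fever, cough] -> diagnosis label (toy data).
X = [[1, 1], [1, 0], [0, 1], [0, 0], [1, 1]]
y = ["flu", "other", "cold", "healthy", "flu"]

model = DecisionTreeClassifier()
model.fit(X, y)  # "training": generalize from the example pairs

# A new, unseen case: no reprogramming, just learned associations.
print(model.predict([[1, 1]]))  # ['flu']
```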
This is the kind of thing where we've had some very good success early on in diagnosis, because there are so many documented cases for looking at the relationship between symptoms and conditions. Take something like the Diagnostic and Statistical Manual of the psychiatric association, the desk reference, if you will, or the PDR, the Physicians' Desk Reference, for medical rather than behavioral conditions. You're creating an environment where the system can learn from those associations.

Reinforcement: the best use for reinforcement learning that I've seen is when you have a situation with just too many possibilities. If you're building an autonomous helicopter, for example, that has to operate in three dimensions under all sorts of conditions, it's easier, let's say you're doing it as a simulation before you get off the ground, to provide feedback when the system does something inappropriate or something that's going to cause a crash than it is to try to come up with all the rules in advance. Each of these approaches has a class of domains where it's most appropriate.

Unsupervised is generally for cases where you don't know to begin with what the possibilities are. My example here is network intrusion: you know what every sensor along the system's edge, let's say we're talking about a power grid, has recorded over every period of time, as historical values. Now, if you find something that's an anomaly, that gets reported. That's where you get into augmented intelligence versus artificial intelligence, and now you want that feedback. So unsupervised learning can find the anomaly, something unexpected, something that hasn't been seen before. And then there's the combination; almost everything is a hybrid. You can either act on the anomaly or not, but either way you can provide feedback, which actually makes this look more like reinforcement learning, because now you're providing that reinforcement.

So what we want to do is look at how these things come together and what today's technologies are that allow us to execute. And the last definition: deep learning, which is what we're going to look at in more detail right now. Deep learning can be used for supervised, reinforcement, or unsupervised learning. The common thread is that the architecture, which, again, may be entirely in software, is biologically inspired, in that the processing units, the processing elements, are loosely based on the idea of neurons that fire and connect with each other based on some association. We could take it to the extreme and say we're trying to model human information processing, but for the most part today, that's not what we're doing. So we have processing elements, and you can think of them as, let's say, your entire desk covered by tiny processing units that communicate with each other. They fire, exhibiting some electrical response, when they're stimulated. The paths those responses take are based, maybe loosely, on fundamental assumptions about how the human brain works, but there's no dependency there. So if we're dealing with a model, a neural network, we've got all these little processing elements, and each one by itself is pretty simple. We're not talking about a grid full of CPUs or a grid of personal computers.
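Each processing element really is that simple. Here is a sketch of one artificial "neuron," assuming NumPy, with invented weights: a weighted sum of its inputs, then a squashing function that determines how strongly it fires.

```python
import numpy as np

def neuron(inputs, weights, bias):
    """One processing element: weighted sum, then sigmoid activation."""
    z = np.dot(inputs, weights) + bias
    return 1.0 / (1.0 + np.exp(-z))  # output between 0 and 1

# Hypothetical values; in a real network these weights are learned.
x = np.array([0.2, 0.9, 0.4])   # signals arriving from other units
w = np.array([0.5, -1.2, 2.0])  # connection strengths
print(neuron(x, w, bias=0.1))   # how strongly this one unit "fires"
```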
Each node, each element in the grid, is the simplest possible computing platform, if you will. It's only when we start to put them together, combine them, and have them talk to each other that we get meaningful results. And when we get into deep learning, the key thing that separates deep learning from a simple neural network is that we have multiple layers of these processing units successively dealing with the problem.

So let me give you a picture. In this one, we've got one neural network; that's the orange surface with a bunch of dots, and each dot represents a neural processor. Each of the little dots is connected to the others. We get some input, and the input is in a mathematical, digital form: a set of numbers that represent the state. We're going to do mostly image processing today, so say the input is a picture. Then each of these units may be looking at a single pixel to begin with, or a collection of pixels, but it's looking at some digital representation, and based on the mathematical values it sees, it does a simple process and produces an output by itself. For one layer of these looking at a typical frame from a TV screen today, at 1920 by 1080, that's how many pixels you would have. Those could either be individually assigned to processing units, in which case you're going to need a lot of them, or maybe each takes a group of 10 by 10 and looks at the values. What colors do they represent? Every pixel can be digitized so you can see what color it is; you can tell the RGB values, or a hex value for the color, and the intensity. And then you start to go, hmm, okay: by itself, looking at it once like that, it's hard to get any meaning out of it. The real meaning comes when we start to make multiple passes, if you will, through these things. If we take the digital signal coming in, do something with it, and then process the resulting, more abstract signal, that starts to get interesting. And if you're scratching your head right now, don't worry, because the next slide is going to clear everything up. The idea here is just that we're dealing with multiple levels, and in some cases we need to be able to preserve state, information about what we had before, so that at each step we're building on what we had in the past.

Setting deep learning aside for the moment, let's say I give you a picture. You have it in analog and digital form; you can take a look at it. And I ask you to just do discovery: tell me what's in the picture. Well, you might think about what things you know of that you could recognize in the picture in the abstract. Maybe at the first level you look for a concept: is it animal, mineral, or vegetable? What am I going to look for to make that determination? Maybe some shape; maybe some shading. If you think of this as a flowchart, and I'm leaving the lines out for the moment, you go from one step to the next to the next, making decisions. Maybe you can evaluate a decision, and maybe you have to look at more than one thing in parallel. But basically you're stepping through a sequence of decisions, and at each point you're doing some processing.
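That sequence of passes is just function composition: each layer transforms the previous layer's output into something a bit more abstract. Here is a toy forward pass through three layers, assuming NumPy; the sizes and random weights are invented, and the "edges/shapes" labels are only an analogy.

```python
import numpy as np

def layer(x, W, b):
    """One layer: linear transform, then a nonlinearity, feeding the next."""
    return np.tanh(W @ x + b)

rng = np.random.default_rng(0)
x = rng.random(8)  # stand-in for raw pixel values

# Each successive layer works on a smaller, more abstract representation.
h1 = layer(x,  rng.normal(size=(6, 8)), np.zeros(6))   # analogous to "edges"
h2 = layer(h1, rng.normal(size=(4, 6)), np.zeros(4))   # analogous to "shapes"
out = layer(h2, rng.normal(size=(2, 4)), np.zeros(2))  # "cat or not," say
print(out)
```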
At the end, you make a decision and say: okay, what you showed me was a picture of a cat, because the cat seems to be the universal answer here. So how does this relate to machine learning, rather than having a person do it? If we're doing this with the knowledge-based approach rather than the machine learning approach, then again it's up to the designer, the creator of the system, to put the knowledge into the logic of the programming, to look for specific things. And if you've ever tried to work with vision processing or audio processing or anything that's not very highly structured, you realize there's a lot of stuff to look at and a lot of stuff you have to discard. We do it very easily as humans; I'm going to show you some real pictures shortly so you can see the difference. But for a machine, it's really hard, and writing instructions for a machine to do it is really hard.

The reason a deep learning approach is so powerful is that we can take something like this flowchart and convert it into multiple levels of neural networks that communicate with each other. At the top, we have what's called a visible layer: a neural network that takes in, as input, the digital representation of the observable variables. That's a horribly cumbersome way of saying it, but basically you start out with all the pixels. If the purpose of the first layer is just to find the edges, so that we can start to figure out what's there, the problem becomes a little simpler. Now we can ask: what methods can we use to find the edges within a picture? If you think about how you would trace out edges on a diagram, what you're typically looking for is a change in brightness, or contrast, or color between pixels and their surrounding area; for the moment we'll assume we're looking at still pictures. So in this first step, if we get rid of everything else and look only at the edges, we've made a lot of progress, but we still don't know what we're looking at. The next layer in this example could be: okay, now I've got edges, let me find shapes. And to find shapes, instead of looking at brightness and contrast, maybe now I'm using the rules of geometry, trying to figure out whether this little doodle is actually an ellipse, or close enough to a circle that maybe it's supposed to be a circle. Is it a face? Something like that? If you know what you're looking for, if you built this knowing you're looking for people, you're going to hand the instructions, if you will, to each of the layers differently. But whether you're doing discovery or detection, the idea is pretty much the same: at each level through the architecture, you're refining what you're looking for. You can only have one visible layer, but you can have any number of hidden layers; obviously it gets more expensive and more complicated. And after the last hidden layer comes the output layer, where you say: okay, this is a human being. If we got into facial recognition using the same approach, you would just have something deeper. This kind of flips the world on its head: it goes from the very concrete, the very specific, the bit-level representation of the pixels, to the abstract: find me a person in this picture. And that's how it steps through.
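That first "find the edges" layer really is just looking for changes in brightness between neighboring pixels, which you can sketch in a few lines of NumPy on an invented toy image:

```python
import numpy as np

# Toy 5x5 grayscale image: a bright square on a dark background.
img = np.zeros((5, 5))
img[1:4, 1:4] = 1.0

# Edge strength = how much brightness changes between adjacent pixels.
dy = np.abs(np.diff(img, axis=0))  # vertical changes
dx = np.abs(np.diff(img, axis=1))  # horizontal changes

print(dx)  # nonzero exactly where the square's left and right edges are
```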
I'm going to take this diagram and try to declutter it a little. Once you've found the edges, and again, that's generally pretty simple, what sort of things do you look for to determine gender? Can you figure out regional origin, where somebody comes from? Can you figure out their emotional state? That's the kind of thing we've been talking about recently in some of the webinars in terms of looking for gestures. It's a lot harder to figure out what a person is trying to express than to figure out that they are a person. There's an example I was reading about recently that I thought was kind of interesting, using a deep learning system trained to find barbells, gym equipment. The unexpected result was that it couldn't find a barbell without a hand or a forearm also in the image; it thought those were inseparable, because that's what all the training examples showed. So there's a lot that goes into it. And what I always find interesting here is that, unlike the flowchart, where you can trace back, once you have the system create its own models and figure out how it's going to do this, you get a lot more power, the power from the data, but you lose visibility into what was used for decision making.

Now I'm going to give you two examples, and these are highly personal. Looking for features: here's a picture of three adorable kids. I used to have kids that looked like this. The question is: which one is not like the others? The downside of a webinar is that we can't have immediate feedback, but usually when I show this picture, people say, okay, I can see where the edges are, the lines; I can figure out that it's three people. And what's different? In this case, you would generally pick the center person, because of the dark hair; otherwise they're all pretty similar. So you're looking for the distinguishing features. Now, same three kids, different setting. It's all about the context. Which ones are different? Now you can't see enough of the hair to know which one has dark hair. But if you happen to be a sports fan, you would know there's an issue here, because we have two of one kind and one of another. If you were to put this in as a discovery problem and just say, find the one that's different, this would be a much, much more difficult problem for a machine vision system using deep learning. They all have hats, similar skin tones, similar features. Yet anyone on the street in the northeast U.S. would immediately recognize what separates them. So context is the most important thing. I just put this one in for Shannon's benefit: when we're looking at pictures, a lot of noise makes things more difficult. If you're looking at this and saying, find the tree, find the ferris wheel, you have a higher probability of success than if you're just asking, from discovery, what can we learn from this picture? It's so context-driven.

I'm going to go through this quickly. The reason I wanted to do this topic this month is that there's a lot going on. Here's an article that came from the MIT Technology Review just this week, "The Dark Secret at the Heart of AI," and it's a good article; a couple of slides from now, I'll give you the full reference.
The issue is that once we get into deep learning, as you go from one layer to the next, the system itself is learning from the data, and a person trying to understand the results can't possibly read through the data, given the volume. We've gotten to the point where the results being recommended, from these large, high-volume data sets, through multiple layers of deep learning, have really changed the definition, if you will, of what an algorithm is. When you're learning to write programs, an algorithm is a sequence of steps to achieve some goal, and you develop it before you apply it to the data; it's algorithm-driven. If instead you're creating the algorithm, for all intents and purposes, on the fly from the data, in response to what the system is seeing, then the typical system isn't tracking that. You have no historical record, no audit trail to go back and figure it out. So the system can make a recommendation, but unlike with one of the knowledge-based systems, you can't just ask: how did you arrive at that? So there's a lot of work being done today in an area called explainability, for lack of a better term; it should probably be auditability or visibility, but that's the term being used. So this is one area I recommend watching. As to whether it's important: obviously, I think if you have a system using deep learning to make a medical diagnosis and inform a physician, but it can't show how the conclusion was reached, that's an issue. I don't think it would be approved as a medical instrument.

And here's one from finance. This is last year's financial statement from JP Morgan Chase, talking to their investors about how much they're now using machine learning to predict outcomes and actually help them make investment decisions. I've been involved with Wall Street firms off and on for a couple of decades. I've seen systems like this installed, not with deep learning, but using very high-level programming languages like Smalltalk in the object-oriented world, where a trader could quickly create a financial instrument, run it, and make a buy-sell decision based on it. But if the state of that system wasn't preserved, then when the auditors come back three days, three months, three years later and try to deconstruct the logic behind a recommendation, it's just not there. And that's the issue we have right now with deep learning systems that can't be audited.

So I'm going to wrap up today with another look at the trends, mapped to this diagram. Where we are today: the availability of big data, from things like social media and the IoT, and from groups putting behavioral data into the public domain as open data, is just going to continue to increase. We covered this last year in one of the webinars; Yahoo has put out a lot of behavioral data. It remains to be seen what will happen as a result of the recent congressional decision allowing ISPs to sell personal behavioral data, but no matter how that works out, we're going to see more and more data available, along with the horsepower to process it. So we're going to continue to see this trend. Deep learning is now practical.
The success we've had with deep learning, particularly in recognition, the work Google has done with Google Translate since it switched from a more knowledge-based approach to a deep learning approach, I think about a year and a half ago, has shown just phenomenal increases in performance. IBM just announced a dramatic improvement in recognition in natural language processing based on deep learning. At some point you have to say: well, we may not be able to understand how a decision was arrived at, but we're going to live with it based on performance. All of this success is spurring more and more innovation and investment, and the investment is now leading us to look for newer types of domains to apply this to. That's where the title, going from discovery, comes in: we're now seeing deep learning applied, sometimes in a hybrid mode with algorithms and rules, to looking for things in an unsupervised way and trying to put them in context. There's certainly a caution where transparency is critical. Sometimes that's regulated, and sometimes it's not regulated but it's still common sense that you need to know how decisions were made. So now there's a movement in this explainability research, and I think that for the next few years, as we get to the point where a system knows everything but understands nothing, we're really going to be focused on better hybrid solutions that augment intelligence for these critical applications. And with that, as always I'm running into the last five minutes, and while I have a few more things, I would rather hear some questions. So Shannon, can we open it up to questions?

Absolutely. We have a few great questions coming in. And to answer the most commonly asked question: just a reminder, I will be sending out a follow-up email by end of day Monday with links to the slides, the recording, and anything else requested. So diving right in, Adrienne: when we say a machine will not understand, do we have an adequate understanding of, quote, unquote, "understand" to make that statement? I've also heard that machines produce conclusions that humans don't understand. Are these statements in conflict with each other?

Can I throw up my hands and say I didn't understand that? Could you repeat it? Because I think I have an answer to it, and I want to make sure that I do.

Do we have an understanding of, quote, unquote, "understand" to say that machines will not understand? And to add to that, I've also heard that machines will produce conclusions that humans don't understand. Are those two statements in conflict with each other?

I don't think they're in conflict at all, and let me tell you why. The first part was: do we have enough understanding of understanding to say that a machine can't do it? I believe we absolutely do. It's hard to have this conversation without overusing the word "understanding," but if you try to determine somehow objectively whether someone understands, the way we do that in school, which may be antiquated, is we give them a test, right? And as every student who's ever had a poor grade on a test knows: ooh, the test asked what I don't know, not what I know.
So if you look at something more like an interview process, how you assess what someone really does know, it's generally by asking them to put data or information in a different context and use things like analogies. Now, we've talked in this series, and we will again in a couple of months, about natural language generation, and the difference between natural language understanding, again in air quotes, and generation. I think what disqualifies systems from the level of understanding I would attribute to humans is that ability to use something in a novel context.

For the second part of the question: can a system make a recommendation, I think that's how it was worded, that a human can't understand? Absolutely, and that's really one of the issues I'm getting at. The system may make a perfectly logical or valid recommendation. It may also make a mistake. And at this point we can't tell the difference unless the system can explain itself, as you could with a knowledge-based system, an expert system, where you could say which rules fired and go backwards through the chaining of the rules. Say I'm doing underwriting for a bank's loan system, and it says: okay, I'm going to make a recommendation on this person based on a Facebook comment they made and a tweet they made; I have determined their emotional state, and it's a bad loan. That can be acceptable, as long as you can go back and explain what data was used and how it was weighted. And that's something we didn't get into today: every time data goes from one layer down into another in deep learning, as you go from the concrete to the abstract, there are ways of doing things like backward chaining, changing the weights and how things are interpreted, as you're training the system. Without that, I have no problem saying that, yes, a system is going to be able to create a good recommendation that I will not understand. It comes down to trust and transparency.

But on the first part, I would love to have that conversation with the person, in terms of whether a machine can understand, because honestly, I took the exact opposite position with someone well known in the field, who was very upset that IBM was running Watson ads saying that Watson had read all the lyrics of Bob Dylan, understood them, and came up with some conceptual themes from the lyrics. The person's objection was: no, it was all about protest, it was all about war, and since those words weren't in the lyrics, the system could never figure that out. And my point was: well, it performed as well as a college freshman who also didn't know that Bob Dylan was writing about those things. So it can create the illusion of understanding, and for many purposes that may be good enough to let it slide, whether or not it truly understands. It may capture the associations or the relationships well enough that, for all practical purposes, you will trust it, if there's transparency.
I don't want people to think I'm saying this stuff is worthless because it doesn't understand. The way it "understands" doesn't have to be the same as the way a human understands. The way we have concepts stored in our gray matter, based on relationships and reinforced neural paths, can be completely irrelevant to the way a natural language system or an AI system using deep learning stores and finds those relationships. If those relationships are useful for producing answers and results, then call it learning; I'm okay with that.

Sure, and that was quite an in-depth question, too. I'm afraid that's all we have time for, but if you have more questions, go ahead and submit them. Adrienne, if you don't mind, I'll get you the questions we didn't have time for, and maybe we can include those in the follow-up on Monday as well.

I'll be happy to.

Thank you, Adrienne, for another great presentation; we really appreciate it, especially after the travel woes getting back from the conference. We're glad you're home safe, and thanks to all of our attendees for being so engaged in everything we do. We really appreciate your insights and the questions coming in. I hope everyone has a great day. Thanks, Adrienne.

Thanks, Shannon. Thanks, everyone.