So, to start with, let me just introduce our dean, Meng Chen, and he is going to introduce our distinguished lecturer. Thank you, Stanley. It is indeed overwhelming to see so many of you here at an early talk with the title "Machine Learning: Dynamic, Economic and Stochastic Perspectives." I never would have thought that if the word "stochastic" is in the title of a lecture we'd have standing room only, with the talk also being streamed live and archived as a recording, and people still streaming in. It must be because of the outstanding lecturer I'll be introducing in just a minute, but it is also one indication that Purdue Engineering is the largest among the top ten engineering schools in the United States. Now, as we get toward the end of the semester and this academic year, we're delighted to host one more talk in the distinguished lecture series from Purdue Engineering, and there's one more still to come at the end of the month. We started this series about a year ago, and each year we bring in about eight of the most outstanding speakers and scholars from around the world, in different disciplines, with the hope of inspiring all of us toward the pinnacle of excellence at scale. In the case of data science, machine learning, and artificial intelligence, these mean different things, but let's say we bundle them close to each other for now; there's a lot that we can be doing here at Purdue. I'm proud to say that in applications of data science, in areas from infrastructure monitoring to imaging, and from digital agriculture to advanced manufacturing, Purdue Engineering has tremendous talent and success in applying data to these domains. And in the foundations of data science, including machine learning, we are growing with tremendous speed and momentum.
For example, I'm glad to brag about our work in the computation and visualization of data, including the hardware-software systems for brain-inspired computing led by ECE faculty Kaushik Roy and Anand Raghunathan, which won the $40 million, five-year SRC/DARPA national center on brain-inspired computing, C-BRIC. And today's distinguished lecturer is somebody who has reinvented the field from so many different perspectives: computational, stochastic and statistical, cognitive and biological. I met Michael Jordan about a year ago at the 60th birthday party for my former co-advisor Stephen Boyd, and I listened to Dr. Jordan's talk contrasting the philosophical differences between the derivative view and the integration view in the control community versus the optimization community. It was fascinating, and I took the courage to ask if Michael would be interested in visiting Purdue the next year. Thank you so much for taking the time. I'll abbreviate Michael's outstanding biography to just a few sentences. Dr. Jordan is a member of both the NAS and the NAE, and a member of the American Academy of Arts and Sciences. He has received numerous awards from diverse intellectual communities, from the IEEE and ACM to SIAM and the IMS. Notably, just last year Dr. Jordan was a plenary speaker at the International Congress of Mathematicians; I think 2018 was the most recent one. We're all so excited that you are here today, Michael, to talk about your perspectives on machine learning. Thank you so much. All right, thank you. It's a great crowd; it's great to be here at Purdue. Do you have one more little announcement? Yes, please. Gentlemen, can you move to the overflow room? We're not allowed to have more people than this room can hold, although there are quite a few seats down here if some of you want to quickly grab them. We do have an overflow room; can you just walk with Maria? We're going to bring you to the other room.
Okay, we have live streaming, so don't worry about that. That's fine. I'll give them a couple of seconds here to diffuse. In fact, let me take off my jacket; so, this is for somebody next to the dean: I don't bike in the afternoon. Yeah, you're very brave. Alright. So, this talk is for the young people in the audience. I'm glad to see a lot of students; this is for you. What things should you be thinking about studying? What are the opportunities, and so on? I'm going to bounce back and forth a bit between philosophical, conceptual issues, but all along the way I'll try to show what research problems come from these considerations. So I'll have a bit of a blend of academics and industry. One thing I've liked about being in the Bay Area is all the industry around us; it's influenced me a lot. I'm also going to China a lot these days and seeing the development of the IT industry there, and it affects my thinking quite a bit, as you'll come to see. Okay, so let me talk about this field. Let's call it machine learning for now. If you were at the panel this morning, you'll know that I think of this as statistical decision-making under uncertainty. That's the real field we're all talking about, but here's the buzzword: of course it is being called AI these days, and if you were there this morning you saw that I kind of resent that. I don't think that's what this is, at this point in time; at the very least, we don't know what intelligence is. But anyway, like most things it's not new, and it's not even new in terms of success stories. Very briefly, I have companies like Amazon.com in mind here. Already in the 1990s they were taking in vast amounts of data and using it for business purposes, in particular fraud detection. If you're building an online commerce system, you can't have fraud rates like the credit card industry's 3% fraud rate. So they used machine learning: they took in large amounts of data about fraud and not-fraud and learned the distinction.
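As a toy illustration of that kind of fraud/not-fraud classifier, here is a minimal sketch: logistic regression trained by stochastic gradient descent on synthetic data. Everything here (the two features, the shift, the learning rate) is invented for the example and reflects nothing about any real system.

```python
import math
import random

random.seed(0)

# Synthetic "fraud / not fraud" transactions with two made-up features
# (say, an amount z-score and an account-mismatch score); fraudulent
# ones are shifted upward. Purely illustrative data.
def make_data(n):
    data = []
    for _ in range(n):
        fraud = random.random() < 0.5
        shift = 2.0 if fraud else 0.0
        x = (random.gauss(shift, 1.0), random.gauss(shift, 1.0))
        data.append((x, 1 if fraud else 0))
    return data

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train(data, steps=5000, lr=0.1):
    """Logistic regression fit by stochastic gradient descent."""
    w, b = [0.0, 0.0], 0.0
    for _ in range(steps):
        x, y = random.choice(data)        # one random example per step
        p = sigmoid(w[0] * x[0] + w[1] * x[1] + b)
        g = p - y                         # gradient of log-loss w.r.t. the logit
        w[0] -= lr * g * x[0]
        w[1] -= lr * g * x[1]
        b -= lr * g
    return w, b

def accuracy(data, w, b):
    hits = sum((sigmoid(w[0] * x[0] + w[1] * x[1] + b) > 0.5) == (y == 1)
               for x, y in data)
    return hits / len(data)

train_set, test_set = make_data(2000), make_data(500)
w, b = train(train_set)
print(round(accuracy(test_set, w, b), 2))  # well above chance on this toy data
```

The point of the sketch is only that the 1990s-era machinery was this simple in spirit: a differentiable score, a loss, and gradient steps on data labeled fraud or not-fraud.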
They got really good at that. Even more important, perhaps, is supply chain management. This sounds dull, maybe, like a business-school thing, but it's not dull, and it's broader than business schools. It's all about: I've got a billion products, and they've got to be assembled and brought to the right place at the right time to meet certain customer needs, in different seasons, and all that. In the old days, if you had a thousand products and a million customers, that was sort of classical; you could build it by hand. Eventually, no: it all had to be done with data. So companies like that, already in the 1990s, were modeling the probability that a ship would get stuck in the Indian Ocean so you wouldn't get certain parts, and doing that across not just a billion products but all the parts that go into those products. And they continue to this day to have hundreds of people working on that, at companies like Amazon, Alibaba, and so on, having built the systems that support it, and doing lots of other things; A/B testing was quite important in that era. Now, they had the computers behind the scenes that could do all this data analysis. So they started to think about other services they could offer, now for people directly, not just the back end; why not just give people new services? So, recommendation systems. You've all seen these at places like Amazon: you go and interact a little bit, you buy some books, and they start to recommend other books. That was critically important for that industry to arise and for Amazon to become what it is today. All done with machine learning, all done with just the kind of gradient-descent algorithms we're talking about today. And in fact the algorithms haven't really changed all that much.
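To make that "recommendations via gradient descent" point concrete, here is a minimal matrix-factorization recommender fit by gradient descent. The ratings matrix, latent dimension, learning rate, and regularizer are all invented for the sketch; this is the textbook technique, not anything Amazon actually runs.

```python
import random

random.seed(1)

# Tiny user x item rating matrix (0 = unobserved), invented for the
# sketch and built so a rank-2 model can explain it: the first two
# users like the first two items, the last two like the last two.
R = [
    [5, 4, 0, 1],
    [5, 4, 1, 0],
    [0, 1, 5, 4],
    [1, 0, 5, 4],
]
K = 2  # latent dimensions

users = [[random.uniform(-0.1, 0.1) for _ in range(K)] for _ in R]
items = [[random.uniform(-0.1, 0.1) for _ in range(K)] for _ in R[0]]

def predict(u, i):
    return sum(users[u][k] * items[i][k] for k in range(K))

lr, reg = 0.02, 0.01
for _ in range(8000):                      # plain gradient-descent sweeps
    for u, row in enumerate(R):
        for i, r in enumerate(row):
            if r == 0:
                continue                   # only fit observed ratings
            err = r - predict(u, i)
            for k in range(K):
                pu, qi = users[u][k], items[i][k]
                users[u][k] += lr * (err * qi - reg * pu)
                items[i][k] += lr * (err * pu - reg * qi)

# User 0 never rated item 2; the factorization fills it in from the
# pattern shared with similar users.
print(round(predict(0, 0), 2), round(predict(0, 2), 2))
```

The learned user vectors are exactly the "feature vector in a black box" described later in the talk: two users with nearby vectors get nearby recommendations.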
Okay, so what changed in this third generation is a focus on pattern recognition problems of the more human variety: computer vision, which is trying to imitate human or animal vision, speech, and so on. And the old algorithms, with the new data sets and computing, are able to do very well on those things. So people got all excited, probably too much so, given that there were already billion-dollar industry implications of the ideas that people didn't talk about much; economically, that earlier work has so far been much more important than bullet three, no comparison really. But bullet three is about human capabilities, and so people are getting both panicky and excited. It's going to be decades before we get anything resembling intelligence in computers, and probably even more than that. So what is it? It's gradient-descent-based data analysis at large scale, and so it can find things and do pattern recognition, and that's all pretty interesting, almost becoming a commodity. Now, is that the end of it? Well, definitely not. I think what's emerging is what I'm going to talk about as the decision side of machine learning. If you think about machine learning, there are two parts to it: one is to take the data, find patterns, and recognize them; the other is to make decisions. Those are definitely not the same thing, even if you're thinking about just one decision, but a consequential one: not where to put an ad placement, or whether there's a giraffe in the image, but rather something like a medical decision.
So you go into your doctor's office, and the doctor takes all the data available at that moment in time, maybe flowing in from all around the world, and they measure all kinds of things about your body, including your genome, your height, your weight, your blood pressure, and many, many other things. They put that into some big machine learning system, and it outputs that you look like you're about to have a heart attack; you need to have a heart operation tomorrow. So, is that a decision? No, that's just a number coming out of the network: the threshold is 0.7, and it output 0.72. Are you just going to accept that as a decision? Well, no, you're going to want to have a dialogue. You're going to say: well, where are your error bars, first of all? And current systems aren't producing much in the way of error bars. Secondly, you're going to have some what-if questions: what if I were to exercise more, what if we get a second opinion? And even more importantly: where did the data come from that led you to that 0.7? If you build a system, it's working right now on today's data, which is what people do: they take an image data set, use that data to train a network, and let it run. In four or five years it's going to be out of date in all kinds of ways. That's called provenance in the database field, but there they're mostly interested in accounting aspects of the data; if you're thinking inferentially, it matters when the data was collected, for whom, and on what machine, to decide whether it's relevant to the current decision. And that's one piece of a much bigger thing. That's just one decision you want to hold a dialogue about, back and forth. But it's rarely one isolated decision; it's almost always a decision in the context of other decision-makers. There are often big batches of decisions: a system like Uber right now in Chicago is making huge numbers of decisions about where to allocate things, commerce systems are doing this, and so on. So that's the
real level at which we want intelligence: not at the individual agent level, replacing a human being, which is really, really not going to happen, but at the level of overall sets of decisions. Okay, so let me dig into this word; this is a little bit from the panel discussion this morning. People are now using this word "intelligence," or "AI," for this data analysis and decision-making problem. So do we really have intelligent systems? Well, I'm going to argue no, and I don't think we're going to have them in any meaningful sense of intelligence. We have things that mimic intelligent behavior from data; that's different. Alright, so suppose you're up on Mars, and you're a Martian computer scientist looking down at the Earth. You have very primitive computers, and you're trying to get inspiration for how to bring more intelligence into your computer. So you look down on Earth: what do you see that's intelligent and worth trying to mimic? Well, most of you will certainly think of brains and minds: maybe we can figure out how those brains are working down there; they seem to be doing something intelligent; let's understand the mind. I would argue we're not remotely close to understanding even a single neuron. It's a cell, it's got all kinds of proteins swimming around in all kinds of structures, it does all this electrical stuff, and it's got this branching tree with tens of thousands of ramifications, and so on; it goes on forever. Even a single synapse is extremely complicated. So it's going to take more lifetimes than I have to give; it's going to take hundreds of years to really start to understand not just a neuron but billions of them put together in this complicated way. We're not there; we're not even close; we don't have the principles. And even in psychology, we don't really understand how the abstraction abilities of humans arise, and the ability to
give a talk and semantically communicate. All those things are really interactions of us with the world at all kinds of levels of abstraction and time scales; it's extremely challenging. So the poor Martian computer scientist says: well, I can't figure all that stuff out; is there something else down on Earth that I could look at and try to make more intelligent? And maybe you already get my setup a little bit; most people don't. They scratch their heads and think: well, maybe animals? No: a squirrel is a remarkable robot, but it's not super intelligent. Alright, so I would argue that many people are just missing what's obvious. If you take an Econ 101 class, you're told to look at a city like Chicago: every restaurant has all the ingredients it needs for every dish it serves, every day; every household has its food. It's not perfect, but it works amazingly well. It works 365 days a year, it has worked for 3,000 years, and it works at all scales. Something is intelligent about that: it's adaptive, it's robust, it works at many scales. Of all the kinds of things people want out of machine learning, it has some of those properties, maybe all of them. So, to me, we know what some of the principles are there, whereas in neuroscience and psychology we don't know what they are yet. In microeconomics we have some ideas of those principles. We don't know everything, and we're going to have to think about new kinds of markets. Now, what kind of markets? Well, markets that bring in data analysis. Alright, so minimally, think about something like a recommendation system. Suppose I have a two-way market: I've got producers and consumers, and they're not just attached to each other in the usual traditional way; rather, they see each other through a recommendation system. Already that's an interesting question, and a billion-dollar question, to do that well. Okay, so I'm going to be
digging into things like that during the talk. Alright, a little more philosophy before I really get going. I wrote a blog post this past year; if you haven't seen it, I'd like to encourage you to read it. I don't usually advertise my work, but this was on Medium, and I think it's had about 400,000 views, in fact by a lot of kind of famous people; I usually get four or five views of my papers. Too many uses of the word "stochastic," but anyway, it's been read by a lot of people, and I do want more people to read it and have this dialogue. It tries to say: look, this buzzword AI, first of all, is not one thing; we shouldn't be lumping everything together, and it's a mistake to do that. There is the classical let's-imitate-the-human idea. There's nothing wrong with that; it's what McCarthy had in mind, and others, for better or for worse. It's what a lot of self-proclaimed AI people still have in mind: this aspiration that we have computers now, there's hardware, and there's software that looks like brain and mind; maybe in our generation we're going to be able to put intelligence inside of a computer. Okay, so that was McCarthy in the '50s; Turing was thinking that way. I'd say in the intervening 50 years we've made very little progress on that, frankly, and I've been in a neuroscience department, I did psychology, and so on, so I've been watching this for a long time. It's great to continue to aspire to it, but it's just not what's happened. What has happened is what some people call intelligence augmentation, which means that the computer itself is not smart, but it organizes information in a way that helps make us smarter. Certainly search engines do that, and all kinds of computing things in our life make us more intelligent, more creative, and that will continue. Is a search engine intelligent? I don't think anyone would argue that it is, but behind it there's a lot of engineering intelligence in the design, and it makes us more
intelligent. Okay. But anyway, what I think is emerging is more interesting than even that: it's intelligent infrastructure. Maybe call it the Internet of Things if you will, but the Internet of Things was more about the prosaic problem of getting IP addresses for all kinds of objects. That's fine, but more interesting is: what if all those objects have data streams associated with them, and those data streams are a little bit incoherent and have to be made coherent so that decisions can be made at scale? That's the bigger problem. And think about doing that not just for factories and cars, but in, say, the medical domain: all kinds of sensors around bodies, in hospitals and so on, with all that data flowing so that people get better and better treatment over time, and the system supports that. That's what I have in mind. Okay, so one last little slide about this. I don't think this human-imitative AI is really the right goal for a lot of these things, because it really isn't about making a smarter computer and replacing a human; it's about making a system that works. Think about self-driving cars: should a car be autonomous? Should we go for autonomy? If you read most people's writing about this, that's what they say: we want autonomy. No, you don't want autonomy; you want the cars to communicate among each other. If a boy just ran out into the street, a car sees that and tells all the other cars around it; every car tells every other car where it's trying to go and what it wants to do. It's more like the air traffic control system. So you don't want autonomy; you want a federated system that trades off things and interacts, and critically, you want the principles to build such a system, and those principles do not emerge by looking at a single car and a driver and trying to replace the human, focusing on that as the core problem. Okay, we're
going to actually solve the problem without putting in a fake driver so much; we're going to have all these sensors and all that sort of thing. It's kind of ridiculous if you think about it. Think about previous engineering disciplines: in this blog post I talked about chemical engineering and civil engineering, which were super exciting in their day. It took a few decades to roll them out, but people really did great things there, and they developed new kinds of principles that didn't exist before. From chemistry to chemical engineering is a big gap, right? And those kinds of principles are what we should be thinking about right now. Can you imagine a chemist saying: we need to create this field called chemical engineering, where we know how to create factories, and the way we're going to do that is to create an artificial chemist, an artificially intelligent entity as smart as a chemist, who will figure out how to build a factory? That's ridiculous; that's not what happened; it's implausible. But if you read the literature from, say, DeepMind or something, that's what they're saying: we're going to figure out how to solve cancer by creating an artificially intelligent agent who then looks at all this data and figures out how to solve cancer. Come on. Alright, so anyway, here's a little formula. When people talk about AI, if you're going to use this term, which again I'm going to push back on as long as I can, it's data plus algorithms plus machines. You think that's all you need, the data and the machine? Well, no: you need it in the context of markets and trade-offs and people and utilities and so on and so forth. Okay, and lastly, one last philosophical slide, which is that the IT companies doing a lot of this machine learning and building these services are really not thinking about a market at all. They're thinking: we're going to build a search box or a social network, and it'll be a service that you like and you use. Alright,
it's all in the virtual world: we're not trying to connect producers and consumers and make a market, we're just trying to provide the service, and we know you won't pay for it because it's not that good. So, since you won't pay for it, we have to create an advertising market and make our money off that. Alright, why not instead think about producer-consumer relationships and make them so strong that people are willing to pay, and you take a cut? This is no mystery; Uber does this. In the world of transportation, it's not perfect, it has all of its issues, but that's what they've created: they don't advertise on the Uber platform; they don't need to. And this advertising model is really broken; a lot of the rollout of IT has really put us in a bad place because we didn't think in a new way, and I think markets are a good starting place. Okay, so now, what do you really work on when you work on this kind of set of problems? Here are some of the challenges in this field, call it II, intelligent infrastructure. These are things I've been working on for the last 10 years, and if you scan your eyes over them: not really pattern recognition, which is in pretty good shape with all the gradient descent there, but real time, cloud and edge, markets, multiple decisions, and so on. So I decided that for the rest of the talk I'm just going to pick two or three of these. I'm going to say a little bit about conceptual issues, especially for decision-making and markets, and then in the last part I'll go through some of the actual mathematical, algorithmic issues that arise, and how some of them can actually be solved; they need a little bit of new mathematics, but not always, and I want to give that flavor as well. If you're interested in the latter, what I'll be doing is giving snapshots of a talk I'll be giving tomorrow, where I'll dig into the actual mathematics more. Okay, so let's talk about multiple decisions.
Okay, so again, classical AI people don't think too much about multiple decisions. They think about: the network outputs a decision and you're done; or maybe a sequence, and that's called reinforcement learning. But often you have lots of federated decisions, so let's think about that. Decisions interact when there's scarcity of resources; that's what econ people talk about, and people in AI haven't been thinking about scarcity very much. So here again is one of the big success stories: recommendation systems. You know what they are: they take data from customers, cluster customers and/or cluster products, and make recommendations between them. They've been used for all kinds of things; movies were an early one, and books. So suppose I build a recommendation system and it makes a recommendation of a certain movie. Now, is it okay to recommend the same movie to everyone? Will this happen? First of all, yes; that's how recommendations work. It's a big black box: they take a lot of data, and when I come into a site like Amazon they make a feature vector out of me, like a 40,000-dimensional feature vector; they put it in this black box, and out pops a list of recommended books or movies. Later, someone else comes into the same site; they featurize him, and maybe it's a nearby feature vector, but whatever, it's a different feature vector; they put him into the same box, and they'll recommend some books. So they can easily recommend the same movie to me and to him, and probably, if it's Amazon, they recommend the same movie to 100,000 people a day; that happens all the time, I'm sure. Is that a problem? Well, no: there's no scarcity here. What about books? If you recommend the same book to 100,000 people and they all buy it, is that a problem? Well, no: there used to be scarcity, but now you can print on demand and get books into the warehouse within a day, so you don't even have scarcity there. But do you ever have scarcity that's
meaningful? Of course you do; in the real world there's scarcity all the time. Alright, so here's something that came from a little bit of travel in China, where I watched people building business models around recommendation systems, because recommendation has become a commodity: you can download software to do a large-scale, really distributed recommendation system. So people are doing that for things other than books and movies. Here's one: how about restaurants? That'd be nice. I arrive in Shanghai, I don't speak the language very well, and I'm by myself; I'd like a recommendation system that knows about me to give me a high-quality recommendation, not just an advertisement. Well, there are companies that have tried to do that: they download some recommendation software, they take some data wherever they can get it, they put that in, and maybe they do okay. But it's not what I want: it's a list of things, maybe some reviews; it's complicated, and I don't want all that complexity. Moreover, if it starts to work for a few people, that's fine, but as soon as, say, half of Shanghai is using it, five million people, you could easily recommend the same restaurant to ten thousand people or more, and they all go there, and there's a big line: you've created congestion. This is not unfamiliar to an econ person, but a lot of these CS people realize it after the fact: it starts to get saturated, and they start to have new problems they didn't think about. Well, come on, folks, you should have thought of that beforehand. In fact, it's not that hard to solve this kind of thing: you create a two-way market. What that means is: I'm in Shanghai, I pull up my cell phone, and I'm ready at 6 p.m.
I'm ready to eat; I'm hungry. I push a button on my phone, the phone geolocates me, my feature vector is somehow formed by a recommendation system, and then it's transmitted to all of the restaurants around me. Their app says: here's a possible client for you; his price point is here; he likes Sichuan cuisine, or whatever. Then some of them will decide to bid on me, and a bidding mechanism will ensue. On my phone I'll get a ping, and I'll see a restaurant, some dishes, a price, and the distance, and I'll say: great, I accept. It's like Uber; it's not that complicated. Once that transaction has happened, that seat in the restaurant is taken, and if someone comes in later, he's too late. And if I don't accept, maybe they'll offer me a better discount; it's a market, and so on. And other restaurants around can see when one is full, and they can make discount offers. That's how it works. And it's not that hard to do data science in support of that; it's just mostly not being done. The IT people think they're going to understand everything about humans, like advertisers do, and then give them what they want. It's silly. Here's another one: what if you build a recommendation system to recommend routes to the airport, or wherever? If very few people are using it, no problem; as soon as half the city is using it, you send everybody down the same street. It's obvious; people know this. But then how do you fix it? The mindset in Silicon Valley is: well, we fix that by doing a super fancy AI. This is Zuckerberg; he uses the word AI, but he doesn't know what he's talking about: our AI systems will figure it out. And what does that mean? Well, they'll understand enough about humans to know what they really want. I hope you can see how silly that is. So how do you do this the right way? Alright, well, if he and I are
being sent down the same street, or 10,000 of us are being sent down that street, well, the system shouldn't have to figure out who gets the street; we should have a bidding mechanism. If I can reveal that I'm not in such a hurry to get to the airport today, I'll take a back street, it'll take five more minutes, I'll pay less, and I'll save the money for a future trip; I'm going to be happy. And he's in a big rush, so he gets to have the street for a little more money; he's happy. That's the right way to do it. And how do you do this? Well, literally every piece of street bids on the people passing over it. Maybe new market mechanisms are needed, but that's the way to think about the problem. And then here's my favorite example; again, this comes partly from being in China. People now have a little bit of money, so grandmother's got a hundred thousand RMB she wants to invest, and she doesn't know what that means. Her son says: hey, I can download an app on your cell phone; it'll invest it for you. And she says: great. So the app recommends buying, say, Alibaba stock, and that's fine if it's a few people, but what if half of China is using it? Well, then Alibaba stock shoots up artificially, and we've destabilized the market. So I hope you get the feeling that data science is needed here. These are data-oriented systems; it's not just classical markets, where I connect producer and consumer with a classical link. It's all about data analysis, but the two together. Technically, it's microeconomics meets statistics in a computing framework: those three fields together. There's power there that has not even started to be talked about, really, or realized, as an academic or as a business person. Here's another example. More people are making music than ever before, because of laptops; my 12-year-old makes pretty amazing little songs
on his laptop. You can drive a taxi during the week and put up music on the weekend, and people will actually listen to it; but you're making no money, because there's no market for you, and that's bad: there's no market for human creativity. So how do you fix this? Well, you don't just stream that stuff to people and then, because they're not willing to pay for it, create an advertising mechanism to monetize it, like Spotify and so on, or a subscription. No: you create a market, and here it's not all that hard. Everybody who's putting music up on SoundCloud, you give them a dashboard of the data that's been flowing; let them see the data. So I learn that I was popular in Peoria last week: 5,000 people listened to me. Now that I know that, I can show that data to the venue owners in Peoria, and they will say: I see, if you come here and we advertise to those people (it's not even advertisement, it's information flow that you're coming), they're going to be excited, they're going to come, we'll fill the venue, and you make ten thousand dollars. Do that three times during the year and you start to have a salary. And that can happen not just for the few superstar singer types that the record companies decide to anoint; that could happen for, like, a million people in a country. So mechanisms like that, as simple as they are, use data together with markets to create jobs. And you could do this not just for entertainment but for information services; that's really what YouTube should be: it's more of an information service than just an entertainment thing where someone put up some entertainment. Okay. That was the economic side of multiple decisions; let's now go back to the statistical side. If you're a statistician, this kind of cartoon will be familiar to you. How do you make decisions in the real world? Well, partly you have context, and that's where markets come in. The other part is you have uncertainty, and you'd better be really clear about that uncertainty.
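One standard way to be clear about that uncertainty is the bootstrap: resample the data and see how much the estimate moves. Here is a minimal sketch with made-up numbers (a 0.72 rate over 100 toy observations, echoing the bare score from the doctor's-office example earlier), showing the kind of error bars the talk says current systems don't produce.

```python
import random

random.seed(2)

# Toy outcome data: 100 cases, 72 "positive". The point estimate 0.72
# by itself says nothing about how uncertain it is; the bootstrap
# attaches error bars to it.
outcomes = [1] * 72 + [0] * 28
n = len(outcomes)

def rate(sample):
    return sum(sample) / len(sample)

# Resample with replacement many times; the spread of the re-estimated
# rate across resamples is the error bar.
B = 5000
boots = sorted(rate([random.choice(outcomes) for _ in range(n)])
               for _ in range(B))
lo, hi = boots[int(0.025 * B)], boots[int(0.975 * B)]
print(f"estimate {rate(outcomes):.2f}, 95% interval [{lo:.2f}, {hi:.2f}]")
```

With only 100 observations the interval is wide, which is exactly the dialogue the talk asks for: a 0.72 with error bars like these is a very different basis for surgery than a bare 0.72.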
So here's a typical decision: "jelly beans cause acne." Someone has this as a hypothesis, some great new idea. It sounds ridiculous, and it probably is, like most great new hypotheses. So the scientist says, "I'm going to investigate," and ideally that means doing an experiment. I take a hundred people, put 50 in the jelly-beans group and 50 in the no-jelly-beans group, and for six months one group eats jelly beans every day and the other eats none. After six months I look at their skin condition, and there will probably be some differences. But if you're a good statistician, you know something like a permutation test: you compute a p-value that says, if there were really no difference, I'd still see differences this large fairly often. The p-value is high, so I say: I get it, it's not real, I'm not going to declare a discovery in this situation. That's the classical setup we've all learned. But that's never the real world. In the real world people say, "I see, my idea wasn't so good, but I don't give up; I'll try some other idea," and they keep trying a whole bunch of them. If you've ever worked in the hedge-fund industry, or been around friends who do, that's all they do all day long: they think of clever little ideas, "if the price of this goes up, then...," they search historical data to see if it works, it usually doesn't, they're smart enough about the uncertainty to know that, and they don't bet on it. Eventually they find one that works and they bet on it. But a lot of fields do this. So the person comes back and says, "I see, it's not jelly beans, it's green jelly beans," or red jelly beans, and they keep trying. And to be really clear about the setup: for every one of these tests, the scientist gets a fresh batch of a hundred people, so there's no overlap problem. But I think you know what will happen.
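To make the permutation-test idea concrete, here is a minimal sketch in Python. The group sizes and the "skin-condition scores" are invented for illustration; a real analysis would use a carefully chosen test statistic and far more care.

```python
import random

def permutation_test(treated, control, n_perm=10_000, seed=0):
    """Two-sample permutation test for a difference in means.

    Returns the fraction of random relabelings whose mean difference is
    at least as extreme as the observed one (an estimated p-value).
    """
    rng = random.Random(seed)
    observed = sum(treated) / len(treated) - sum(control) / len(control)
    pooled = list(treated) + list(control)
    count = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        perm_t = pooled[:len(treated)]
        perm_c = pooled[len(treated):]
        diff = sum(perm_t) / len(perm_t) - sum(perm_c) / len(perm_c)
        if abs(diff) >= abs(observed):
            count += 1
    return count / n_perm

# Hypothetical skin scores for 10 jelly-bean eaters and 10 controls.
treated = [3, 4, 2, 5, 3, 4, 3, 2, 4, 3]
control = [3, 2, 4, 3, 3, 2, 4, 3, 2, 3]
p = permutation_test(treated, control)
```

With data like these, where the two groups come from essentially the same distribution, the estimated p-value comes out large, and we would correctly decline to declare a discovery.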
Finally, at some point, by chance alone, they'll get a batch of a hundred people in which the 50 who already have a bad skin condition end up in the jelly-beans group and the other fifty end up in the control group. After six months, the jelly-bean group has bad skin. You don't know that that's the reason, and you say: I've made a discovery. Now everyone in the laboratory gets excited, we've made a discovery, send it to the journal. The journal is excited because it's an interesting result, so they publish it. And then, even worse, the newspapers, whose job it is to scan the journals and find the interesting results of the year, say "that's interesting," and it's interesting precisely because it's probably false. This is not new, obviously, to me or to the statisticians who work on it, but not many people outside statistics think much in this way, and it really is a multiple-decision-making problem. So, to give you a little structure, let me tell you about the false discovery rate, and then about an economic perspective on it. Here's the setup: say I'm testing nine hypotheses, I've got nine ideas. Suppose that in five of the cases, the gray ones on the left, there's nothing to discover, the typical situation; if I see a difference, it's by chance alone. In the other four cases there actually are discoveries to make. I run some procedure, a neural net or whatever, and say it declares a discovery in the four cases at the bottom, p9, p8, p2, p3, but in fact, as God knows, only two of them, p2 and p3, are real; the other two are false. So the fraction of false discoveries is 2 out of 4: half of my discoveries are false, and that's not too good. The false discovery rate is the expectation of that proportion. So: are there procedures to control the false discovery rate? It's definitely not just the classical thresholding of a neural net's output. To really drive home the difference, let's look at something quantitative. Suppose we're in an industry running 10,000 different A/B tests a day. This is real: Amazon has a website, and you've been there; it looks kind of amazing, and that's not some designer's doing, it's A/B testing. They say, let's try green instead of blue, let's try this instead of that, and every day roughly half the people get one version and half get the other, maybe 10,000 tests a day. Now suppose the industry is a bit mature, so that in 9,900 of the tests there's really nothing to discover: it's simply not true that blue is better than green. Mature science is like that; most of the things you think of are not real. But 100 of them are real discoveries you could make, and make some real money from. Now you apply your fancy machine learning, and you have a really good system: its probability of a type I error, meaning that it declares a discovery when there's nothing to discover, is below 0.05, so of the 9,900 nulls you wrongly declare a discovery in only 495. Similarly, your system has very good power, 0.8: when there is a real discovery to make, you find it 80 percent of the time. So your engineers have designed a great system. But now just multiply it out: 0.05 times 9,900 gives 495 false discoveries, and of the 100 non-nulls you made 80 true discoveries. Your false discovery proportion is 495 out of 575. You go back to the boss at the end of the day, and the boss says, "I gave you a lot of money to run all these tests; how many discoveries did you make today?" You say, "575; here they are." The boss asks, "How many of them are false?" You say, "495." That's bad.
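It's worth doing that multiplication explicitly. A small sketch, using exactly the numbers from the example (9,900 null tests, 100 real effects, per-test level 0.05, power 0.8):

```python
nulls, non_nulls = 9_900, 100    # tests with nothing vs. something to find
alpha, power = 0.05, 0.8         # per-test false-alarm rate and power

false_disc = alpha * nulls       # expected false alarms: 495
true_disc = power * non_nulls    # expected real discoveries: 80
total = false_disc + true_disc   # 575 reported "discoveries"
fdp = false_disc / total         # fraction that are false: about 0.86

print(round(false_disc), round(true_disc), round(fdp, 3))
```

So even with a per-test guarantee of 0.05, roughly 86 percent of what you report is false: controlling the error rate test-by-test is not at all the same as controlling the false discovery rate.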
You're now going to spend a lot more money following up, only to find out they don't really work. So are there mechanisms that control that proportion at the level 0.05, or do you just have to live with this? There are mechanisms. The Benjamini-Hochberg procedure was the first, but it's very batch-oriented: you collect a huge batch of decisions over a few days and then process them in one lump. So we've been working on an online version of this, a more economic version. Instead of running test after test at some fixed level, or at a level that has to be guessed to be really, really small because you'll run more and more tests, which is the classical perspective, we let the level change over time. And here's the key: the false discovery proportion is a ratio, and you can make a ratio small in two ways, by making the numerator small or the denominator big. So if you're someone who's making a lot of discoveries, you're going to earn more and more wealth, because that ratio stays under control for you. And if you're not making many discoveries, your alpha will go down, and you'll start to see that you're not making discoveries, that you're in the wrong field. What does a human do at that point? Not just continue the same fruitless thing over and over until they can't run any more tests. They move to a different field where there are new discoveries to be made, and that's good; that's what the statistics should tell us. Anyway, we've worked on this; Tijana Zrnic has been leading the latest round, and she has a very nice paper doing this in a distributed, asynchronous setting with dependence, a really real-world, industry-ready version. So we have a way of setting these time-varying alphas. Let me show you a picture; it's very economic: at the beginning of your life, I give you a budget of alpha-wealth.
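That alpha-wealth picture can be caricatured in a few lines of Python. To be clear, this is a toy sketch for intuition only: the spending and payout rules here are made up, whereas the real generalized alpha-investing and LORD-style procedures in this line of work choose them carefully so that the false-discovery-rate guarantee actually holds.

```python
def alpha_investing(p_values, w0=0.05, payout=0.05, spend_frac=0.5):
    """Toy alpha-wealth bookkeeping (a caricature of alpha-investing).

    Each test spends a fraction of the current wealth as its level alpha;
    a discovery earns wealth back, while repeated non-discoveries make
    the testing level dwindle toward zero.
    """
    wealth = w0
    decisions = []
    for p in p_values:
        alpha = spend_frac * wealth      # test level funded by current wealth
        discovery = p <= alpha
        wealth -= alpha                  # pay for the test
        if discovery:
            wealth += payout             # a discovery replenishes wealth
        decisions.append(discovery)
    return decisions

decisions = alpha_investing([0.001, 0.8, 0.04])  # → [True, False, False]
```

A stream of failed tests drains the wealth, and with it the level you can test at; discoveries replenish it, which is exactly the economic feedback being described here.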
Every time you run a test and don't make a discovery, you lose some alpha-wealth. If that keeps happening, eventually you dry up and you can't do any more science. But if, as the wealth goes down, down, down, you make a discovery, you suddenly get more wealth. There's a formula for doing this, and the formula correctly gives you the following, pretty strong, result: you can stop me at any time during my lifetime and ask, "How many discoveries have you made up to now?" I'll say, say, 45. "And what fraction of them are false?" Less than 0.05. And you can ask that at any time in my lifetime, or even across a group of people. So this kind of thinking exists, and it should be everywhere; if you take the decision-making part of machine learning seriously, this is a big part of the story. So let me return to that slide. I've talked a little here about multiple decisions and a little about markets; I won't really talk about the other items. In the last part of the talk I'm going to spend some time getting down into the actual algorithmic and mathematical challenges that arise when you start to work on these classes of problems. Now we get a little more technical, with no apologies; if you're a student, these are the kinds of topics you need to be learning about. Most of the problems we work on are nonconvex optimization problems or sampling problems. In nonconvex optimization there are issues of dimension, of saddle points, of dynamics, and so on, and we have to be quantitative about all of them; it's not enough to be metaphorical. In the sampling world we also have nonconvexity, and dynamics that are complicated, and we want to link this to optimization so that we have one overall toolbox. And then we have to bring in market perspectives.
There, we're often interested not in avoiding saddle points when we optimize but in going to saddle points, because those are the equilibria where a trade-off is realized. Again, I have a whole bunch of work on all this, and tomorrow I'll go through some of the material more slowly, but let me give you a few highlights, and someone can tell me when I'm starting to run out of time. This was an early paper for us that really set the tone for a bunch of our projects afterwards. Chi Jin, who was on the job market this year, led several of these, and she has become a world leader in nonconvex optimization through this line of work. Escaping saddle points efficiently is really, really important. Here's a saddle point in three dimensions: you're rolling down the hill, and the saddle point is bad because it slows you down, possibly for a long time. In three dimensions it doesn't look too bad, but in a hundred thousand dimensions there might be only one or two directions out, and it may take you a long, long time to find them and escape. We need to quantify that: is the time exponential in dimension, or polynomial? What is the rate of escape from saddle points? If you watch real, practical learning systems like neural nets, you'll see the error go down really fast, plateau for a good while, then dive again, then plateau, over and over. Those plateaus are saddle points, and if you don't wait long enough, you think you're done. In an online system you might just have to make your decision, but you should know that, no, you're not done, and we should have a theory that supports that kind of inference. So now a little math, and I'll just highlight a few things. First, this result on the left; let me focus on this line right here. This is a classical result due to Yurii Nesterov, for the convex case. We have a bowl shape, and we run gradient descent, which just takes the steepest-descent direction, and we want to get within a ball of size epsilon around the optimum, the single optimum in that world. The question is how many steps it takes to get into that ball. That has a complexity-theoretic flavor, and it's not that old a result, from 1998. The number of steps is one over epsilon squared: a smaller ball takes more steps, growing quadratically. Moreover, out front there's a Lipschitz constant and the initial distance to the optimum of the function f we're optimizing. It's a beautiful result: it's not asymptotic, it's true for every epsilon, there are no hidden constants, and all the constants are nice, natural ones. This is the kind of result you aspire to in this field, and over time results like it have been achieved across lots of areas of optimization; there are lower bounds, and these rates tend to match them. So we asked: if you run gradient descent on a surface that has saddle points, what do you arrive at? We had a paper showing that, asymptotically, you will not arrive at the saddle points; that was known for the continuous flow but not for the discrete algorithm, so we proved it. Then we proved that gradient descent alone can take time exponential in the dimension to get away from all the saddle points; that's bad. Then there's another paper showing that if you add some noise, stochastic versions of gradient descent, you can escape all saddle points in polynomial time. That's a very important result, but it's just polynomial; it could be d to the 45th power or d cubed, not so good. So we studied this for a while and came up with the result at the top: the number of iterations in a nonconvex landscape, to get past all the saddle points and arrive at a local minimum, is again one over epsilon squared, as if you were on a convex problem with stochastic gradient descent. Pretty amazing. There's a Lipschitz constant and an initial distance to the optimum, so again it's one of these pure, beautiful results, except for the little tilde. The tilde is traditionally used to hide the dimension dependence, because nobody was able to analyze it; here we did analyze the dimension dependence, that's the whole point of the paper, and it turned out to be neither polynomial nor exponential but logarithmic, log to the fourth power of the dimension. Our proof technique is based on coupling arguments from probability, using Brownian motions, and that's responsible for the fourth power; I don't think it's really a four, it's probably just a log, but that's what we were able to prove. So that's an early result using probability ideas, some convex and nonconvex geometry, and this simple form of dynamics, showing that you can have very favorable outcomes. We all talk about why these large-scale machine learning systems work well; everyone kind of agrees stochastic gradient is a reasonable thing, and this supports that folk wisdom: it's a theoretical result that shows why it works. The next step is even more critical. Fifteen minutes? Thank you. Even more critical: to understand these things more deeply. Look back at that result; it parallels the convex case, it looks pretty beautiful, but are we done? Is that the best you can possibly do? That's a really important question to ask.
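Here is a caricature of that kind of perturbed gradient method in Python: run gradient descent, and when the gradient is nearly zero (so you may be sitting at a saddle), inject a little uniform noise. The constants and the test function are mine, chosen for illustration; the actual algorithm and analysis in the papers are more refined.

```python
import random

def perturbed_gd(grad, x, step=0.1, g_thresh=1e-3, noise=0.01,
                 iters=500, seed=0):
    """Gradient descent that injects noise when the gradient is tiny,
    i.e. when we may be stuck near a saddle point (a caricature of
    perturbed gradient descent)."""
    rng = random.Random(seed)
    for _ in range(iters):
        g = grad(x)
        if max(abs(gi) for gi in g) < g_thresh:
            # Flat region: perturb randomly to find a descent direction.
            x = [xi + rng.uniform(-noise, noise) for xi in x]
        else:
            x = [xi - step * gi for xi, gi in zip(x, g)]
    return x

# f(x, y) = (x^2 - 1)^2 + y^2 has a saddle at (0, 0), minima at (+-1, 0).
grad = lambda v: [4 * v[0] * (v[0] ** 2 - 1), 2 * v[1]]
x = perturbed_gd(grad, [0.0, 0.0])
```

Started exactly at the saddle, plain gradient descent would never move, since the gradient there is exactly zero; the perturbed version gets kicked off the saddle and slides down to one of the two minima.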
This is now real complexity theory: in some machine model, is there a lower bound, and is this the best you can do? A field is finished when you arrive at a matching lower bound, so mature fields tend to have lots of good lower bounds. Statistics has quite a few, like the Cramér-Rao lower bound you may have heard of, and information-theoretic ones. Computer science, I'd say, doesn't have very many; it has a few, but they're very low, with a big gap between them and the known upper bounds. It's a hard field, but also a newer one. Optimization theory has some very good lower bounds, partly because it's an older field, and a lot of them came from the Russian school of Nemirovski, Nesterov, et al. So here we work on lower bounds; let me just say it in English. In the world of gradient-based methods, suppose I build a machine that has access to a gradient oracle: function values and gradients and nothing else. What's the optimal rate of convergence for that machine? That's the complexity-theoretic question, and Nemirovski answered it: the rate improves from one over epsilon squared not just to one over epsilon, much faster, but all the way to one over the square root of epsilon, faster still. And there's an algorithm that achieves it, discovered afterwards by Nesterov: it takes not just one gradient but two, and makes a clever combination of them. It was a big surprise that this was even possible; that algorithm runs at the faster rate and achieves the lower bound, so in that sense it's the best algorithm. So we worked on this problem.
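The flavor of that speedup is easy to see numerically. Below, plain gradient descent and a standard form of Nesterov's method (the gradient evaluated at a momentum "look-ahead" point) run on a very flat one-dimensional quadratic; the function and step size are illustrative choices of mine, not taken from any particular paper.

```python
def grad(x):
    return 1e-4 * x   # gradient of the flat quadratic f(x) = 0.5e-4 * x**2

def gd(x0, step, iters):
    x = x0
    for _ in range(iters):
        x -= step * grad(x)
    return x

def nesterov(x0, step, iters):
    x = x_prev = x0
    for k in range(1, iters + 1):
        y = x + (k - 1) / (k + 2) * (x - x_prev)  # momentum look-ahead point
        x_prev, x = x, y - step * grad(y)         # gradient step taken from y
    return x

x_gd = gd(1.0, 1.0, 1000)        # crawls: still around 0.90
x_ag = nesterov(1.0, 1.0, 1000)  # accelerated: much closer to the optimum 0
```

Gradient descent contracts by only 1 - 1e-4 per step here, while the momentum term lets the accelerated method cover ground far faster: the gap between the basic and accelerated rates, in miniature.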
That algorithm is still very hard to understand. It's called an accelerated method, but what does it mean to accelerate in the optimization world? You're just hopping along a set of points; what does it mean to go faster on a set of points? It isn't really clear, and that's part of why people haven't been able to develop a good general theory of acceleration: they don't have the right topology to support it. You need a continuum in which you can go faster, so you need to embed the problem in continuous time. We did that, and it gave a huge amount of insight. In continuous time you can turn up a knob and accelerate, right up until something breaks; there's a phase transition in continuous time that doesn't exist in discrete time. So another meta-message here: optimization and computer science almost always focus on discrete-time algorithms, discrete everything, and you're missing insights by doing that; you've got to go to continuous time. So we did, and it turns out that the acceleration algorithms due to Nesterov and a whole bunch of others all come from a single object we called the Bregman Lagrangian; in tomorrow's talk I'll dig into this a fair amount. It's a continuous-time mathematical object, a function of position and velocity, with something called a Bregman divergence in it and a few parameters around it. If you apply the standard calculus of variations, you get out a certain differential equation, and if you specialize the alphas, betas, and gammas to particular choices, you get specific dynamical systems: the Nesterov one, a mirror-descent one, a cubic-regularized Newton one. All these continuous-time dynamics that have been studied over the years fall out of this one master equation. Moreover, the master equation shows that no matter what rate you ask for, you can achieve it in continuous time.
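For reference, the object has roughly the following form (written from memory of the Wibisono-Wilson-Jordan paper, so take the details as a sketch): a Bregman divergence between a look-ahead point and the current point, minus a weighted potential, with time-dependent scalings.

```latex
% Bregman Lagrangian (sketch): h is a convex distance-generating function,
% D_h its Bregman divergence, and alpha_t, beta_t, gamma_t scaling functions.
\mathcal{L}(x,\dot{x},t)
  = e^{\alpha_t+\gamma_t}\Big( D_h\big(x+e^{-\alpha_t}\dot{x},\,x\big)
                               - e^{\beta_t} f(x) \Big),
\qquad
D_h(y,x) = h(y)-h(x)-\langle \nabla h(x),\,y-x\rangle .
```

With Euclidean $h$ and one natural choice of the scalings, the Euler-Lagrange equation reduces, up to constants, to the continuous-time limit of Nesterov's method, $\ddot{X} + (3/t)\dot{X} + \nabla f(X) = 0$; other choices give the other dynamics just mentioned.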
But you will always follow the same path in phase space, so acceleration actually has nothing to do with speed at all; it has to do with the path you follow. It's geometric: acceleration is about geometry, not about speed. Moreover, if you ask to go very fast in continuous time, that's fine; you can go as fast as you want, because you're really just rescaling your clock, and that's not so important. But if you try to go too fast, there's a breaking point at which you can no longer discretize the differential equation. It's mathematically impossible, and that breaking point is where you've discovered an algorithmic transition back in discrete time: a rate you cannot achieve in a certain setting. That's for tomorrow. At the end of all this work we were able to develop some new algorithms, because now we have this Lagrangian: we turn it into a Hamiltonian, we use something called a symplectic integrator, which is a smart, very stable way to integrate differential equations, and we put all of that into the computer, which then optimizes using the symplectic integrator. Earlier there was talk of derivatives versus integration; here is integration used in the optimization setting. It's just as good as Nesterov, and actually better: if you turn up the step size, so we've moved over to the left and we're going faster, Nesterov flies off and goes unstable, whereas this new integrator stays stable. So it's a really good way to get downhill, and in tomorrow's talk I'll say more about the consequences. So now we have some insight into acceleration, and we also know a little about how to deal with nonconvex geometry and saddle points. What if we put the two together? Suppose you're flying down a hill, but it's not convex and there are saddle points down there. Is it good to have acceleration?
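To give a feel for what a symplectic integrator is, here is the simplest one, leapfrog (Störmer-Verlet), applied to a Hamiltonian H(q, p) = f(q) + p²/2. This is the generic textbook scheme, not the specific Bregman-Hamiltonian method from the papers, and the harmonic-oscillator test function is just for illustration.

```python
def leapfrog(grad_f, q, p, step, n_steps):
    """Stormer-Verlet (leapfrog) integration of H(q, p) = f(q) + 0.5 * p**2.

    The kick-drift-kick pattern makes the map symplectic: it preserves
    phase-space volume, which is why the energy stays bounded over long
    runs at step sizes where naive explicit Euler drifts off and blows up.
    """
    for _ in range(n_steps):
        p -= 0.5 * step * grad_f(q)   # half kick (momentum update)
        q += step * p                 # full drift (position update)
        p -= 0.5 * step * grad_f(q)   # half kick (momentum update)
    return q, p

# Harmonic oscillator f(q) = 0.5 * q**2, so grad_f(q) = q.
q, p = leapfrog(lambda q: q, 1.0, 0.0, 0.1, 1000)
energy = 0.5 * q * q + 0.5 * p * p    # started at 0.5 and stays very close
```

That long-run stability is the property being exploited when the step size gets cranked up.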
There have been two different intuitions, and this has been an open problem. Some people say: I'm flying down the hill, I hit a saddle point, I roll back up the other side, and that slows me down. Others say: no, the acceleration somehow lets you blow past the saddle point. It's been intuition. So we worked on it with these two tools, the continuous-time Hamiltonian and symplectic framework and the nonconvex-geometry coupling argument from probability, again with Chi Jin leading, and at the end of the day we got very strong results. There's the result at the very top; without getting into details, the rate went from one over epsilon squared to one over epsilon to the seven-fourths. That's a better rate, faster. So this is a proof that acceleration helps you in the nonconvex setting: you fly past the saddle points more quickly with acceleration. These tools let us get results like that. The next step we've been taking is to put this into the domain of stochastics. Here's a question: if we don't just run gradient descent but stochastic gradient descent, or a diffusion, a Brownian-motion-driven dynamical system, and we're trying to get downhill quickly, is there an optimal way to diffuse? If you ever learned about diffusions or Brownian motion, it was probably in physics or in finance, and in both cases the diffusion is a model of some phenomenon, of how things move. But as engineers we're often interested in making things move in a certain way: I want to go fast down the hill. People in statistics know this from Markov chain Monte Carlo: you design an algorithm that diffuses toward an answer, and they love it when it goes fast, but they haven't had the mathematical tools for that. They talk about mixing times, but they can never really get their hands on them, so it's been unsatisfying. Optimization theory tells you how to get down fast, and we even found that there are optimal ways to optimize; that's what the Bregman Lagrangian tells you. Is there an optimal way to diffuse? That's a brand-new class of problems. I'll just give you a couple of results, but for some of the young students in the audience, this could be decades of work; it's going to be really interesting and challenging, and if a young Kolmogorov were around, he or she would probably start working on it, because it's very pregnant with possibility. So what do you study here? Things like Langevin Markov chain Monte Carlo. It's just gradient descent, except I'm not writing f anymore, I'm writing U, and I add some Brownian motion to give me the stochasticity. This is a classical thing to study; it has a rate, and that rate has been analyzed. Here's a good example of the rate, in green: it's one over epsilon squared, which is surprisingly fast given all the stochasticity, but it has a d in it. The dimension dependence is not logarithmic, it's linear in d, which is kind of bad, and it comes from the stochasticity in all those dimensions. That's a recent result, a very important paper by Durmus and Moulines, and they studied this stochastic differential equation, which is just gradient descent plus noise. Now, the work I've been talking about has two gradients and two equations; it's oscillatory, it has momentum. What if we put momentum into the stochastic framework; will that help? Again, this has been open; people haven't really known how to do it, or at least how to analyze it. Here's how you do it: you write down two equations, not one. You put the gradient and the Brownian motion into the velocity equation, and you integrate the velocity to get the position, so it's second-order dynamics.
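In code, the contrast between the two dynamics looks like this. The target is U(x) = x²/2, a standard Gaussian; the Euler-Maruyama discretization, the step size, and the friction constant gamma are all illustrative choices of mine, not the schemes analyzed in the papers.

```python
import math, random

def overdamped_langevin(grad_U, x0, step, n, rng):
    """First-order dynamics: dX = -grad U(X) dt + sqrt(2) dB."""
    x, xs = x0, []
    for _ in range(n):
        x += -grad_U(x) * step + math.sqrt(2 * step) * rng.gauss(0, 1)
        xs.append(x)
    return xs

def underdamped_langevin(grad_U, x0, step, gamma, n, rng):
    """Second-order dynamics: the noise enters through the velocity.
    dX = V dt;  dV = (-gamma V - grad U(X)) dt + sqrt(2 gamma) dB."""
    x, v, xs = x0, 0.0, []
    for _ in range(n):
        v += (-gamma * v - grad_U(x)) * step
        v += math.sqrt(2 * gamma * step) * rng.gauss(0, 1)
        x += v * step                      # position integrates the velocity
        xs.append(x)
    return xs

rng = random.Random(0)
grad_U = lambda x: x                       # U(x) = x**2 / 2: Gaussian target
xs_od = overdamped_langevin(grad_U, 0.0, 0.05, 20_000, rng)
xs_ud = underdamped_langevin(grad_U, 0.0, 0.05, 1.0, 20_000, rng)

mean_od = sum(xs_od) / len(xs_od)
var_od = sum((s - mean_od) ** 2 for s in xs_od) / len(xs_od)
mean_ud = sum(xs_ud) / len(xs_ud)
var_ud = sum((s - mean_ud) ** 2 for s in xs_ud) / len(xs_ud)
```

Both chains end up with sample mean near 0 and variance near 1, up to discretization bias; the theoretical claim is about how fast they get there as the dimension and the accuracy requirement grow, which is where the momentum version wins.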
So can you analyze this stochastic differential equation? That's the fun mathematics here, and the answer is yes: you use some of the same coupling tools we talked about, a reflection coupling instead of a classical coupling, and Itô calculus instead of ordinary calculus, but nothing particularly hard. After we did the analysis, we got a rate that was not one over epsilon squared but one over epsilon, much faster, and the dimension dependence went from d to the square root of d, which is even more impressive. This algorithm is way better than classical Langevin, which is in turn better than classical MCMC algorithms like Gibbs sampling. We're really starting to get at better algorithms for MCMC: they're non-reversible, they're based on second-order accelerated dynamics, and the inspiration came from optimization theory. I'm going to skip this slide; I've got five minutes left. If you come tomorrow, you'll see more about the comparison of optimization and sampling, and about the question of how fast you can sample. Let me tell you about one more thing, going back to the market design issue. I hope I convinced you earlier that thinking about markets is a nice way to think about lots of emerging IT problems, but what are the algorithmic and mathematical challenges? Market design, mechanism design, is a field of its own. To form a market you have to run some kind of algorithm that moves you through a parameter space, and usually you're finding equilibria, like a Nash equilibrium: he goes down, I go up, and both of us are as happy as we can be. Now, we've become experts on gradient algorithms in high dimensions. What if you run gradient algorithms not to find the bottoms of hills but to find saddle points? This is a classical field of study, and the classical algorithms take one step of going down, then one step of trying to go up. That provably doesn't work: it can oscillate; it's a known failure. So we've been working on this. One problem we've worked on is how to find equilibrium points in high dimensions when you want Nash equilibria, and that's actually different from finding saddle points. A Nash equilibrium is a saddle point, but it's axis-aligned: his axis is one axis, my axis is the other; I want to be going down, he wants to be going up. Take that same saddle point, tilt it, and put it somewhere else: it's still a saddle point, but it's not a Nash equilibrium; he can still make progress along his axis, so we would move off of it. But our gradient-based algorithms don't know the difference. And the classical algorithms for this in economics aren't gradient-based; they're much more complicated, and they don't scale. So, long story short, and I should have introduced the students here: Eric has been working on this with me, and my two other students, Lydia and Horia, have been working on the other problem I want to briefly mention, competitive bandits in two-way markets. Bandits are a beautiful way to think about decision-making in statistics and machine learning: I've got K options, I don't know which is best, I try all of them a little, I start to figure out which looks best, and I start to pick that option more often, while still making sure I cover the things I'm uncertain about. There are algorithms like UCB that do this pretty well, but it usually hasn't been done in the economic context of other decision-makers. So what if two of us are doing this at once? We're both trying to find the best option for ourselves, and the options are the other side of a two-way market.
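The oscillation failure of naive gradient descent-ascent mentioned a moment ago is easy to exhibit on the simplest two-player saddle, f(x, y) = x·y, whose unique equilibrium is (0, 0); the step size below is arbitrary.

```python
def gda(x, y, step, iters):
    """Simultaneous gradient descent on x and ascent on y for f(x, y) = x * y."""
    for _ in range(iters):
        gx, gy = y, x                          # df/dx = y, df/dy = x
        x, y = x - step * gx, y + step * gy    # one goes down, one goes up
    return x, y

x, y = gda(1.0, 1.0, step=0.1, iters=100)
radius = (x * x + y * y) ** 0.5   # started at ~1.41; ends near 2.3
```

Each joint step is a rotation scaled by √(1 + step²) > 1, so the iterates spiral outward from the equilibrium instead of converging to it; plain gradient dynamics really are not enough for finding these equilibria.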
over there and they may have preferences among us so we don't know those preferences so we start pulling these arms and I start to see that I'm liking arm one but I realize he's liking one arm and I'm like the I'm starting to realize the merchant over there prefers him to me so I start to hedge my bets and look at the other arms so there should be a regret bound that reflects that extra exploration needed in the competitive situation so that's with Lydia and Horiya we don't have a paper yet on that that's very active right this moment and we do have a paper with Eric and let me just show you Eric's result this picture is not is my last one I'll finish it's let me parse it just really quickly there are three green crosses there in this particular problem those are Nash Equilibrium see their saddles and they're actually Nash Equilibrium you'd like an algorithm to go find those there's a blue one there which is a saddle point but it's not a Nash Equilibrium it's tilted all right we ran kind of a bunch of different algorithms that are gradient based on this and an example was the black one if it starts those red points it will go down and find the Nash Equilibrium but it also go to that blue point it doesn't know the difference right our new algorithm which is kind of gradient plus a little bit more is the red curves there it goes towards the bad equilibrium but then it moves away and moves to a Nash Equilibrium all right so so first of all it's interesting to analyze this we're still not done with that in particular what is the conversions rate of this algorithm because you're paying an extra cost that you went towards something bad you had to sense that it was bad and move away so it took you longer but you have to measure that somehow okay okay so that's all I want to say let me just have a few concluding remarks that was kind of a whirlwind tour through a bunch of different ideas again this slide I already had earlier just sort of say it more slowly computers are 
currently gathering huge amounts of data for and about humans to be fed into learning algorithms and often the goal has been to use all this to imitate humans to try to make computers smart like us and again I don't think against that goal I still think it's really what's happening and I don't think it's the most interesting thing to be doing either okay it leads you down in the role of the whole point of the computer is to learn about people and provide services to them to understand them and it's a little bit just implausible that you're going to do that even with five people but think about 500 million you really get understand 500 million people from their browsing patterns no all right so we want to provide this in the context of market and so when data flows it's not just to be used for learning algorithms it's used to create value to create markets and if your IT person or if you're an entrepreneur which I hope some of you are the audience I hope you resonate to my message which is now you if you think of it this way you don't have to make money off of advertising which is where Google and Facebook have all gotten stuck that's why they're having so much trouble doing the right thing because that and so if you say no my role is to create connections between producer and consumer how can I do that you've created a market that probably is going to be a more healthy thing for humanity just overall okay so this slide I've been using for about ten years but let me just have it up there at the end this field is coming of age but it's really not it's going to be quite a while until we have really what I would call it engineering discipline we have just people building things out there sometimes trying to do the right thing and build good services for people sometimes just trying to make money sometimes both but what we really need is this engineering discipline where we start to think about what is the problem how do we assemble all the pieces how do we break out of 
our classical boundaries of, you know, CS versus stat versus EE and all that? How do we see that it's all kind of one problem here, and how do we educate a new workforce to solve problems in this way? Thank you very much.

So before we go to the Q&A section, let me just make two comments. One, Michael mentioned that he'll be giving a talk tomorrow morning; that's for the data science foundations workshop. I'm not the person organizing that, but if you're interested, just go to Google and type "data science foundations workshop Purdue" and you can find the information. I believe it's free for undergraduate students, and for grad students I think it's 10 bucks, so I'd just tell you in advance to pay it. So yeah, it's a market, that's right, it's marketing. OK, so that's number one. Number two is that I got a question from our colleagues, and I feel that I have the burden to ask you this question, since we're in Indiana and we are big basketball lovers: what do you think about Larry Bird?

About Larry Bird? Don't know what that is.

So, Larry Bird was a pretty big basketball player during... I'm sorry, never mind.

No, I was in Boston, at MIT, as a young professor, so I definitely know who Larry Bird was.

So how do you think about it?

They're both great.

OK. So, hi, any questions from the floor?

Thank you very much for the fantastic presentation. I also agree that we are very far away from actually smart computers. My question is more about your criteria for what is missing to achieve that: what is the missing link that could allow us to achieve a machine that could learn, or be conscious?

Yeah, great question. I'm kind of going to say I don't know, but one way to try to answer it is to work on problems where it seems that you really need some new abstraction, some more semantics. Semantics is, you know... the fact that I'm talking to you is a semantic relationship among us, right? And there are
semantic networks out there, there are kinds of logical expressions. If you work in an area like natural language processing, you have lots of data and you try to make predictions, like what word comes next, and you do things that neural nets can do pretty well, or translate strings to strings. But going down into a semantic representation, understanding what's being said and then reasoning about that, they're really not doing at all. If you're serious about that field, you try to build in that kind of thing; you try to engineer, I'd say, a semantic network, what's often called an ontology in industry. A lot of industries now have an ontology with 200,000 nodes in it: 200,000 nodes in a graph of, you know, this person is a friend of this person, this person is married to this person, and so on and so forth, or product relationships. And you try to bring those together with the machine learning sort of stuff, and it's a big engineering effort. You can start to get systems that can answer some simple questions or have some very, very simple dialogues. I'd say in ten years you'll have things that do pretty good question answering and even some very simple dialogues in narrow domains, and then they'll break as soon as you get into bigger collections of people and all that. And by the end of our lifetimes, maybe there'll be something online where, you know, you want to find a flight to Paris and you can really interact with the computer and have a real dialogue about it. But it's going to be very slow engineering progress, kind of like going to the moon; that level of big engineering effort could be needed. Now, somewhere along the way maybe something magical will happen, and there'll be a deeper understanding. What kinds of abstractions are we talking about here? Is it logical forms, or is there some other way to think about the representations? How come humans are so fluent at this? And so I think working on those problems is probably the best way to discover that. I don't
think looking at the brain or the mind is, sadly; it's just too complicated. But I think trying to build those systems will probably help. I'm not so sure there will be magic, because of this notion of being able to abstract and have intelligence. Like, right now we're communicating at a very high level, and our computers are left in the dust, right? They're still kind of down at the pixel level, or the edges, and we're up at this very abstract level. Any word in any language is very, very rich. Think about the word "not" in English; think about all the things "not" means, right? Not today, not tomorrow, not you, not this, not that, not all. Every one of those versions of "not" has a subtly different semantics, every one of them, and it depends on the context; the semantics can shift. We all know all that; we don't even think about it, right? The computer has to learn all that, but not just from looking at strings of data; it's got to learn the context in which that sentence was uttered, so that you understand its semantics. Somehow... you know, I don't know what the magic is to get there. Now, what if we found that magic, how great would it be? Well, I'm not so sure. It would probably change lives a lot, but, you know, we'd just have a new human that happens to be artificial. That would be exciting to some; I'm not that excited. We have so many humans; why do we need it? Really, I want more of these services that make human life better, and I think humanity is a bit messed up right now in some ways, and I want things to be better. So I'm emphasizing these markets because I do see more intelligence there, of a different kind, one that's not about the human, that allows us to build better things and better systems, better believable, trustable things. So anyway, yes?

In your online markets example, one concern about this is: you talk about what the consumer wants, and what the restaurant, for
example, wants, but often, you know, there's what the market maker in this sense wants. These, like in the case of Uber, are often kind of natural monopolies; these are things where, you know, the more of a monopoly you have, the better it works, so your goal is to accomplish that monopoly. So if I think, for example, of restaurant recommendations, my goal might be very well served by having everybody show up at a restaurant and not be able to get in, and they think, wow, this is a great recommendation engine; it sent me to the restaurant that everybody loves.

I'd push back against that. I mean, certainly markets do not "solve it," quote-unquote; in fact, you need regulated markets, and part of the whole story will be what regulations are appropriate for these markets. But in your example there, if people are showing up and not getting served, the utility is to eat well, and if no one's eating well, that's broken; no one's going to play in that market anymore. They're going to go to another market where they can eat well, right? There are some natural monopolies, but, you know, if you do microeconomics you learn there are reasons for them; it's not a typical market phenomenon to be a natural monopoly, and you can break them by doing things like, you know, loyalty programs. Why do we still have so many airlines? Why isn't there just one airline? Well, you know, I have my points on United, so I'm not going to go fly Delta. It sounds stupid, but it's really important that there's a little loyalty between a producer and a consumer, and that leads to breaking apart monopolies. And so, I'm not a microeconomics person, but as usual in my academic life, I like that I'm ignorant about a whole field that feels that its way of thinking is real; it doesn't solve all the problems, it has a whole bunch of other ones. That's cool; I like that. And even in the ad world, which I was bashing a lot, I know that when people started
to build online ad markets, they couldn't just use classical Vickrey auctions or whatever from market design; that didn't work, and they had to develop some new ones. Same thing here. All right, but I'm going to push back against people who say, well, no, we know markets don't work; look at all the unhappiness in the world because of markets. That's not what you're saying, but there will be people saying that. And no: across three thousand years of human development from, you know, the sticks, markets are the number one reason why it's happened, right? The ability of people to come in and trade, and economic prosperity follows from that. So there's something very robust and very healthy about that, and, suitably regulated, suitably transparent, with trust mechanisms, it's a path out of our current state that I want us to exploit better.

OK, so we have limited time; can we take one more question?

Thanks for your lecture. I have a question: nowadays there are some new methods, like curiosity-driven and Bayesian methods, which relate to decision-making. Do you think this field can really be combined with non-convex or convex optimization?

You're kind of down in a particular little algorithm there, so let me just say that there is a lot of innovative thinking going on in the neural network world, where people are trying out stuff; a lot of it is reinvention of things, and a lot of it is people narrowed down to this one thing. So, curiosity: what does that really mean? It's kind of a metaphor. For me, it probably means you have some uncertainty and you're going to sample in places where you're a little more uncertain; you're going to favor that. Well, as you may know, there's a whole area of optimal experimental design, there's a whole world of causal analysis, and then there's the bandit literature I was talking about a minute ago: I don't know which of the k arms is the best, and it's not supervised learning. So what if I
pick each one of them ten times, see which one is the highest, and then pick the highest? That's provably a dumb algorithm. A better algorithm is to have error bars around each one of the means that I get, and to pick the one that has the highest upper error bar, OK? Because now, if it's really high because it's good, I'm going to pick it, but if it's really high because I'm uncertain, I'll pick it too. That's curiosity, in a very clean mathematical way, and there's a lot of theory there. So I don't want to, you know, diminish people's cleverness in thinking up new terminology and stuff like that, but in thinking of mechanisms like that in the world of neural nets, you haven't gone outside the whole scope of the area that many people have been working on. And especially for the younger people in the room: don't just focus on neural nets. Again, I love them; it's been great progress, it's been fun to see, but there's this whole broader world of control theory, statistics, and optimization that, if you're a young person, you should be educating yourself in, and then being creative on top of that.

Thank you.

Yeah, he's asking what I would advise students. Well, one thing I didn't talk about today is that we have a data science program at Berkeley, and we actually have a kind of new division and college emerging, and it's been a struggle, with all the deans fighting it and everything; here you have a dean who's not trying to fight it, if you're lucky. But one of the things we've done bottom-up, without any deans helping us whatsoever, is design a bunch of classes for undergrads. The first class is called Data 8 at Berkeley; I was on the team that designed it, and I'm now designing a follow-up class, and we're pretty proud of it. It is a class for freshmen, and you assume they know no math beyond arithmetic, and you assume they know no computer programming. All right, so you're going
to teach them Python, all right, but you're going to teach just enough Python to do something interesting statistically. So, for example, I talked about an A/B test and a permutation test: I've got two columns of numbers, and if they're really the same, I can put them together, I can permute them, and I'm still in the same null distribution, and I can get a p-value from that. I can describe that to you, and you would understand it in about two minutes, with no math, no Greek symbols, no nothing; you get the beauty of it, I think, and students do. Then you can say, how do you do that in Python? Well, I need a list of some kind; you can teach them enough Python to do that. And then you can ask a really interesting conceptual computational question, which is: how do you do a random permutation? I've got n items in a list, and I want to permute them and get a uniform-at-random permutation. How do you do that? I'll leave you to think about it. The naive thing you'll think of is kind of swapping all pairs; that gives you a permutation, but its cost is n squared, and that's not good in the modern world. Is there a faster algorithm? I can tell you the answer is yes, but we make the students think about it, and about half of them figure it out. Then the cool thing is that they put it in Python, and they progress, and then we get some real-world data. A typical example we use is: here's the ethnic composition of juries in Alameda County, and here's the population's ethnic distribution in Alameda County. Those are two columns of numbers; are they the same or different? Of course they're a little bit different, but as a statistician, are they really different? Students love that they can use their tools to actually get a p-value for whether the juries are biased in Alameda County, and I can tell you the juries are biased in Alameda County. They quantify that, and then they can go on to all kinds of other problems. All right, so hopefully that inspires you a little bit. We're doing no math there, but you can see
there are, like, symmetries, there are permutations, there's group theory somehow sitting behind the scenes, there's probability theory. And so then, slowly over the next three years, we introduce a little more math to support that. So what is the math? Well, it's probability and statistics, it's some optimization, it's some, you know, algorithms and computer science and some data structures, but it's the kind of modern stuff that's most useful to us, and then, I think, some econ; you can kind of craft your own thing. But anyway, it's our job as professors to actually interlace these things in a single class. If you look at the classical way of teaching Python, they'll teach the same syntax we do, but when they get to an example, it won't be some statistics or A/B testing problem; it will be, how do you do the Fibonacci series? Well, Fibonacci series are fine; my 12-year-old loves them, but I don't use Fibonacci series in my life, and I never will. Do I use A/B testing? Yeah; I mean, Jeff Bezos uses A/B testing all day long. So we need to teach those kinds of things; they're inferential. People talk about computational thinking taking over; well, no, it's a big part of it, but there's a whole other part, inferential thinking: using algorithms to decide what's behind the data, not just to process the data but to ask where it came from. That's inferential. So if you're going to be in this field, you have to get both styles of thinking: one you get mostly from classical statistics, one you get from computer science, but ideally good universities will actually blend them, and they won't just lump it together as "you have to take all these classes plus all these classes"; each class will have a bit of a blend. Thanks for asking. Yeah, it's great.
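The bandit strategy described in the Q&A (put error bars around each arm's estimated mean and pull the arm whose upper error bar is highest) is, in one standard form, the UCB1 rule. Here is a minimal illustrative sketch in Python; the arm probabilities, horizon, and Bernoulli reward model are made-up assumptions for the demo, not numbers from the talk:

```python
import math
import random

def ucb1(true_means, horizon, seed=0):
    """Play a k-armed Bernoulli bandit for `horizon` rounds with UCB1.

    Each arm's index = empirical mean + sqrt(2 ln t / n_pulls):
    the index is high either because the arm looks good or because
    it is still uncertain -- "curiosity" in a clean mathematical form.
    Returns how many times each arm was pulled.
    """
    rng = random.Random(seed)
    k = len(true_means)
    pulls = [0] * k
    sums = [0.0] * k

    def reward(arm):
        # Bernoulli reward with the arm's true success probability.
        return 1.0 if rng.random() < true_means[arm] else 0.0

    # Pull each arm once so every arm has an error bar to start with.
    for arm in range(k):
        sums[arm] += reward(arm)
        pulls[arm] += 1

    for t in range(k + 1, horizon + 1):
        # Upper confidence bound for each arm at round t.
        ucb = [sums[a] / pulls[a] + math.sqrt(2 * math.log(t) / pulls[a])
               for a in range(k)]
        arm = max(range(k), key=lambda a: ucb[a])
        sums[arm] += reward(arm)
        pulls[arm] += 1
    return pulls

# The best arm ends up with the bulk of the pulls; the others are
# sampled only often enough to shrink their error bars.
pulls = ucb1([0.2, 0.5, 0.8], horizon=5000)
print(pulls.index(max(pulls)))  # arm 2, the one with the highest mean
```

This is exactly why "pull each arm ten times, then commit" is provably worse: UCB1 keeps revisiting an arm only while its uncertainty could still plausibly make it the best, so suboptimal arms get only logarithmically many pulls.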
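The two Data 8 puzzles above can be sketched in a few lines of Python. The linear-time answer to the shuffle question is the standard Fisher-Yates algorithm, and the permutation test is the pool-shuffle-recompute procedure described in the talk. The two data columns below are invented placeholder numbers, not the actual Alameda County figures:

```python
import random

def fisher_yates(items, rng=random):
    """The O(n) answer to the shuffle puzzle: Fisher-Yates.

    One pass; at step i, swap position i with a uniformly chosen
    position in [i, n). Every one of the n! permutations comes out
    equally likely -- no n-squared all-pairs swapping needed.
    """
    items = list(items)
    n = len(items)
    for i in range(n - 1):
        j = rng.randrange(i, n)
        items[i], items[j] = items[j], items[i]
    return items

def permutation_test(a, b, trials=10000, seed=0):
    """Two-sided permutation test for a difference in means.

    Under the null hypothesis the two columns are exchangeable, so we
    pool them, reshuffle, resplit, and count how often a random split
    looks at least as extreme as the observed one. That fraction is
    the p-value.
    """
    rng = random.Random(seed)
    observed = abs(sum(a) / len(a) - sum(b) / len(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(trials):
        shuffled = fisher_yates(pooled, rng)
        pa, pb = shuffled[:len(a)], shuffled[len(a):]
        if abs(sum(pa) / len(pa) - sum(pb) / len(pb)) >= observed:
            hits += 1
    return hits / trials

# Hypothetical columns standing in for the jury-composition example:
sample_a = [4, 5, 6, 5, 4, 6, 5, 5]
sample_b = [1, 2, 1, 2, 2, 1, 1, 2]
print(permutation_test(sample_a, sample_b))  # tiny p-value: really different
```

No Greek symbols anywhere: the null distribution is built by literally doing the permuting, which is the point of teaching it this way.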