So I'm going to talk today about something I don't think I've really discussed before, which is my journey to deep learning. Nowadays I am all deep learning, all the time, and a lot of people seem to assume that anybody doing deep learning jumped straight into it without looking at anything else. But at least for me it was a multi-decade journey to this point, and it's because I've done so many other things, and seen how much easier my life is with deep learning, that I've become such an evangelist for this technology.

I actually started my career at McKinsey and Company, which is a management consulting firm, and quite unusually I started there when I was 18. That was a challenge, because in strategy consulting people generally lean heavily on their expertise and experience, and I didn't have either of those things, so I had to rely on data analysis right from the start. So from the very beginning of my career I was relying heavily on applied data analysis to answer real-world questions. In a consulting context you don't have much time, and you're talking to people with deep domain expertise, so you have to communicate in a way that actually matters to them. I used a variety of techniques in my data analysis. This was just before pivot tables appeared, and when they did appear they were something I used a lot, along with various database tools and so forth. But I did also use machine learning quite a bit, and had a lot of success with it. Most of the machine learning I was doing was based on logistic or linear regression or something like that.

Rather than show you something I did back then, which I can't because it was all proprietary, let me give you an example: the computational pathologist paper from Andy Beck, Daphne Koller, and others at Stanford, from around 2011 or 2012. They developed a five-year survival model for breast cancer, I believe it was, and the inputs to the model were stained histopathology slides. The five-year survival model they built was very significantly better than anything that had come before. The way they went about it, as described in their paper, is what I would nowadays call classic machine learning. They used a regularized logistic regression, and they fed into it, if I remember correctly, thousands of features. Those features were built by a large expert team of pathologists, computer scientists, mathematicians, and so forth, who worked together to think about what kinds of features might be interesting and how to encode them: things like relationships of contiguous epithelial regions with underlying nuclear objects, characteristics of epithelial nuclei and epithelial cytoplasm, characteristics of stromal nuclei and stromal matrix, and so on. It took many people many years to come up with these features, implement them, and test them. The actual modeling process was then fairly straightforward: they took images from patients who stayed alive for five years and images from those who didn't, and used a fairly standard regularized logistic regression to build a classifier.
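To make the shape of that classic recipe concrete, here is a minimal sketch, not the paper's code: the data below is synthetic, standing in for thousands of expert-built image features and a five-year survival label, and the modelling step is just a regularized logistic regression in scikit-learn.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in: rows are patients, columns are hand-engineered features,
# y is 1 if the patient survived five years and 0 otherwise.
X, y = make_classification(n_samples=2000, n_features=500, n_informative=40, random_state=0)
X_train, X_valid, y_train, y_valid = train_test_split(X, y, random_state=0)

model = make_pipeline(
    StandardScaler(),
    LogisticRegression(penalty="l2", C=0.1, max_iter=5000),  # regularised logistic regression
)
model.fit(X_train, y_train)
print("validation accuracy:", model.score(X_valid, y_valid))
```

The point is that nearly all of the hard work goes into building the columns of X; the modelling step itself is only a few lines.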
So basically the regression learns parameters for each of these expert-built features. And to be clear, this approach worked well for that particular case, and it worked well for me for years across many, many projects. It's a perfectly reasonable bread-and-butter technique that you can still use today in much the same way, and I spent a lot of time studying how to get the most out of it.

One nice trick that a lot of people are not as familiar with as they should be is what to do with continuous inputs in these models: how do you transform them so you can handle nonlinearities? A lot of people use polynomials for that, and polynomials are generally a terrible choice. Nearly always the best choice turns out to be something called natural cubic splines. With natural cubic splines you split the domain of your data into sections at a set of knot points and fit a cubic within each section, constructing the basis functions so that neighbouring cubics connect up with each other and their gradients connect up too. One of the interesting things that makes them "natural" splines is that the end segments are linear rather than cubic, which makes them extrapolate outside the input domain really nicely. As you add more and more knots you get more and more opportunities for curvature: with just two knots you start out with a line, and each additional knot adds flexibility. Another cool thing about natural splines, also called restricted cubic splines, is that you don't actually have to think hard about where to put the knot points: there's a standard set of quantiles to put them at, depending only on how many knots you want rather than on the data, and it nearly always works. So that was one nice trick. Another is that if you use regularised regression, and I particularly like L1 regularisation, you often don't have to be that careful about the number of parameters you include, so you can throw in quite a lot of terms, including interactions of natural cubic spline terms (there's a short code sketch of this recipe below).

So that's an approach I used a lot and had a lot of success with. But then, I think it was around 1999 that the first paper appeared, and in the early 2000s it started getting popular: random forests. This picture is from Terence Parr's excellent dtreeviz package. Random forests are ensembles of decision trees, as I'm sure most of you know. As an example of a single decision tree, this is some data from a Kaggle competition on predicting the auction price of heavy industrial equipment. You can see that the tree has first split on a binary variable, coupler system; then, for the machines which I guess don't have a coupler system, it split on the year made; and for those made in earlier years it split again. At the leaves we can immediately see the thing we're trying to predict, the sale price, or actually the log of the sale price. So in just four splits the tree has done a really good job of separating out the log of sale price.
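As a rough illustration of the spline trick described above, here is a small sketch using patsy's natural cubic spline basis (cr, which places the knots at quantiles by default) fed into an L1-regularised logistic regression; the data and variable names are made up.

```python
import numpy as np
import pandas as pd
from patsy import dmatrix
from sklearn.linear_model import LogisticRegression

# Toy data: one continuous predictor with a nonlinear effect on a binary outcome.
rng = np.random.default_rng(0)
df = pd.DataFrame({"age": rng.uniform(20, 80, 2000)})
p = 1 / (1 + np.exp(-(np.sin(df["age"] / 10) + 0.02 * df["age"] - 1.5)))
df["outcome"] = rng.binomial(1, p)

# Natural (restricted) cubic spline basis; knots go at quantiles, so you only choose df.
X = dmatrix("cr(age, df=5) - 1", df, return_type="dataframe")

# L1 regularisation is fairly forgiving about how many spline/interaction terms you include.
model = LogisticRegression(penalty="l1", solver="liblinear", C=1.0)
model.fit(X, df["outcome"])
```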
I actually used single decision trees a little in the early and mid '90s, but they were a nightmare: it was hard to find something that fit adequately but didn't overfit. Then random forests came along, thanks to Leo Breiman, a very interesting guy. He was originally a maths professor at Berkeley, then went out into industry and was basically a consultant for years, and then came back to Berkeley to do statistics. He was incredibly effective at creating really practical algorithms, and the random forest is one that's really been world-changing. It's incredibly simple: you randomly pick a subset of your data, you train a decision tree on that subset, you save it, and then you repeat those steps again and again, creating lots and lots of decision trees on different random subsets of the data. It turns out that if you average the predictions of all these trees, you get predictions that are unbiased, accurate, and don't overfit. It's a really, really cool approach, so as soon as it came out I added it to my arsenal. One of the really nice things about it is how quickly you can implement it; we implemented it in about a day.

This came out when I was running a company called Optimal Decisions, which I built to help insurers come up with better prices, which is the most important thing for insurers to do. One of the interesting things, for me, is that we never actually deployed a random forest. What we did was use random forests to understand the data, and then use that understanding to go back and build more traditional regression models with the particular terms, transformations, and interactions that the random forest found were important. One of the cool things you get out of a random forest is a feature importance plot; this is again the same dataset, the auction price data from Kaggle, and it shows you which are the most important features. The nice thing is that you don't have to do any transformations or think about interactions or nonlinearities: because it's using decision trees behind the scenes, it all just works. So I developed a pretty simple approach: first create a random forest and find which features matter, then use partial dependence plots to look at their shapes, and then go back and, for the continuous variables that matter, create the cubic splines and the interactions and run the regression. This basic trick was incredibly powerful, and I used a variant of it in the early days of Kaggle, among other things, got to number one in the world, and won a number of competitions. Funnily enough, back in 2011 I described my approach to Kaggle competitions at the Melbourne meetup; you can still find that talk on YouTube, and it's pretty much as relevant today as it was then.

So this was 2011, and I became the chief scientist and president at Kaggle; we took it over to the US, got venture capital, and built it into quite a successful business. Something interesting happened while I was chief scientist at Kaggle. I was getting to see all the competitions up close, and seven years ago there was a competition, Dogs vs. Cats, which you can still find on Kaggle.
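Here is a minimal sketch of that random-forest-first workflow in scikit-learn: fit a forest, look at feature importances, then look at partial dependence for the features that matter. The California housing data is just a convenient stand-in for something like the bulldozer auction dataset.

```python
import pandas as pd
from sklearn.datasets import fetch_california_housing
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import PartialDependenceDisplay

data = fetch_california_housing(as_frame=True)
X, y = data.data, data.target

rf = RandomForestRegressor(n_estimators=200, min_samples_leaf=5, n_jobs=-1, random_state=0)
rf.fit(X, y)

# Step 1: which features matter at all?
importances = pd.Series(rf.feature_importances_, index=X.columns).sort_values(ascending=False)
print(importances)

# Step 2: what shape does the relationship have for the top features?
PartialDependenceDisplay.from_estimator(rf, X, features=list(importances.index[:2]))
```

From there, the idea described above is to go back and hand-build splines and interactions for just those variables in a plain regression.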
The Dogs vs. Cats page describes the state-of-the-art approach for recognizing dogs versus cats at the time as being around 80% accuracy, based on the academic papers that had tackled the problem. Then, in this competition that ran for just three months, eight teams reached 98% accuracy and one nearly got to 99%. If you think about it, that's going from roughly a 20% error rate to roughly a 1% error rate, so this competition cut the state-of-the-art error by about 20 times in three months, which is really extraordinary. It's unheard of to see a carefully studied academic state-of-the-art result slashed by 20x by somebody working on the problem for just three months; that's normally something that might take decades or longer, if it's possible at all. So something clearly happened here, and of course what happened was deep learning. Pierre Sermanet had developed one of the early deep learning libraries, and actually even this signal on Kaggle was in some ways a little late: if you look at Pierre's Google Scholar page, you'll see that back in 2011 he and Yann LeCun had already produced a system that was better than human performance at recognizing traffic signs. That was actually the first time I noticed this really extraordinary thing: deep learning being better than humans at very human tasks, like looking at pictures.

So in 2011 I thought, well, that's super interesting, but it was hard to do anything with that information, because there wasn't any open source software, or even any commercial software, available to actually do it. Jürgen Schmidhuber's lab had a library you could buy from them, though they didn't even have a demo; there weren't any online services; and nobody had published the actual recipe book of how on earth you do these things. So that was a huge challenge: it's exciting to see that this is possible, but then, what do I do about it? One of the cool things is that at this exact moment, this Dogs vs. Cats moment, is when two really accessible open source libraries appeared, allowing people to create their own deep learning models for the first time, and critically they were built on top of CUDA, which was a dramatically more convenient way of programming GPUs than had previously existed. So things started to come together, really, about seven years ago.

I had been interested in neural networks since the very start of my career. In fact, in consulting I worked with one of the big Australian banks on implementing a neural network in the early-to-mid '90s to help with targeted marketing, not a very exciting application I'll grant you, but it really struck me at the time that this was a technology which, at some point, would probably take over just about everything else in my area of interest, predictive modeling. We actually had quite a bit of success with it even then, and that's about 30 years ago now. But there were some issues back then. For one thing, we had to buy custom hardware that cost millions of dollars, and we really needed a lot of data, millions of data points; for a retail bank we could do that. But even then, there were things that just weren't quite working as well as we would expect.
As it turned out, the key problem was that back then everybody was relying on a mathematical result called the universal approximation theorem, which says that a neural network can approximate any given computable function to any arbitrary level of accuracy, and that it only needs one hidden layer to do so. This is one of the many, many times in deep learning history where theory has been used in totally inappropriate ways. The problem is that although the theorem is true, in practice a neural network with one hidden layer requires far too many nodes to be useful most of the time; what we actually need is lots of hidden layers, which turns out to be much more efficient.

Anyway, for those twenty years I did feel that at some point neural networks were going to reappear in my life, because of this infinitely flexible function: the fact that they can, in theory, solve any given problem. Along with this infinitely flexible function, we combine it with gradient descent, an all-purpose parameter-fitting algorithm. And again, there was a problem with theory here. I spent many years focused on operations research and optimization, and operations research generally focused on the theoretical question of what can provably find the true maximum or minimum of a function. Gradient descent doesn't do that, particularly stochastic gradient descent, and so a lot of people ignored it. But the question we should be asking is not "what can we prove?" but "what actually works in practice?" The very small number of people who kept working on neural networks and gradient descent throughout the '90s and early 2000s, despite the theory that said it was a terrible idea, were finding that it worked really well. Unfortunately, academia around machine learning has tended to be much more driven by theory than by results; for a long time it was, and I still think it is, too much so. So when people like Hinton and LeCun said, "look, here's a model that's better than anything else in the world at solving this problem; we can't exactly prove why based on theory, but it really works," that work was often not getting published.

Anyway, things gradually began to change, and one of the big changes was that finally, around 2014 or 2015, we started to see software appearing that allowed us to conveniently train these things on GPUs, which meant relatively inexpensive computers could get pretty good results. The theory didn't really change at this point; what changed is that more people could try things out and discover that this was actually, practically, really helpful. To people outside the world of neural networks this all seemed very sudden: a sudden fad around deep learning, with people suddenly saying "this is amazing." People who had seen other fads quite reasonably thought this one would pass too. But the difference with this "fad" is that it had been under development for many, many decades. This was the first neural network to be built, back in 1957, and continually for all those decades there were people working on making neural nets really work in practice.
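As a tiny, self-contained illustration of those two ingredients, a flexible function with several hidden layers fitted by stochastic gradient descent, here is a sketch in PyTorch on a made-up regression problem.

```python
import torch
from torch import nn

# Made-up data: fit a wiggly 1D function.
x = torch.linspace(-3, 3, 512).unsqueeze(1)
y = torch.sin(2 * x) + 0.1 * torch.randn_like(x)

model = nn.Sequential(                 # several modest hidden layers rather than
    nn.Linear(1, 32), nn.ReLU(),       # one enormous one, the point made above
    nn.Linear(32, 32), nn.ReLU(),
    nn.Linear(32, 1))
opt = torch.optim.SGD(model.parameters(), lr=0.05)

for step in range(2000):
    idx = torch.randint(0, len(x), (64,))          # random mini-batch -> "stochastic"
    loss = nn.functional.mse_loss(model(x[idx]), y[idx])
    opt.zero_grad()
    loss.backward()                                # gradients via backpropagation
    opt.step()                                     # follow the gradient downhill
```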
So what was happening in 2015 was not a sudden "here's this new thing everyone is going to flock to"; it was "here's this old thing which we've finally got to the point where it actually really works." It's not a new fad at all, but the result of decades of hard work, solving lots of problems and finally getting to a point where things make sense.

What has happened since about 2015 is that the ability of these infinitely flexible functions has started to become clear even to a layperson, because you can just look at what they're doing and it's mind-blowing. For example, look at OpenAI's DALL·E. This is a model trained on pairs of pictures and captions such that you can now write any arbitrary sentence: if you write "an illustration of a baby daikon radish in a tutu walking a dog," DALL·E will draw pictures of what you described, and here are some actual, non-cherry-picked pictures of exactly that. To be clear, this is all out of domain: DALL·E has never seen illustrations of baby daikon radishes, or radishes in tutus, let alone this combination of things; it's creating these entirely from scratch. By the same token, it's never seen an avocado-shaped chair before, as far as I know, but if you type in "an armchair in the shape of an avocado" it creates these pictures for you from scratch. So it's really cool that we can now simply show what computers can do when you use deep learning. To anybody who grew up in the pre-deep-learning era, this looks like magic: these are not things I believed computers could do, and yet here we are, this theoretically universally capable model actually doing the things we've trained it to do.

In the last few years we've started to see, many times every year, examples of computers doing things which we were told computers wouldn't be able to do in our lifetimes. For example, I was repeatedly told by experts that in my lifetime we would never see a computer beat an expert at Go, and of course we're now at the point where AlphaGo Zero got there in three days of training, and it's so far ahead of the best humans that it makes the world's best experts look like total beginners. One of the really interesting things about AlphaGo Zero is that if you actually look at its source code, the key piece, the network that figures out whether a Go board is a good position or not, fits on one slide. Furthermore, if you've done any deep learning, you'll recognise it as looking almost exactly like a standard computer vision model. One of the things which people who are not themselves deep learning practitioners don't quite realise is that deep learning, on the whole, is not a huge collection of loosely connected tricks. Every deep learning model I build looks almost exactly like every other model I build, with fairly minor differences, and I train them in nearly exactly the same way, with fairly minor differences. Deep learning has become this incredibly flexible skill: if you have it, you can turn your attention to lots of different domain areas and rapidly get incredibly good results.
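To be clear, the following is not the AlphaGo Zero source, just an illustrative sketch of what a small board-evaluation ("value") network can look like, to show that it's built from exactly the same pieces as an ordinary image model: convolutions, batch norm, and a small head.

```python
import torch
from torch import nn

class TinyValueNet(nn.Module):
    """Illustrative only. A stack of conv layers plus a small head that outputs a
    single number saying how good the position looks for the current player."""
    def __init__(self, planes=8, width=64, board=19):   # number of input planes is arbitrary here
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Conv2d(planes, width, 3, padding=1), nn.BatchNorm2d(width), nn.ReLU(),
            nn.Conv2d(width, width, 3, padding=1), nn.BatchNorm2d(width), nn.ReLU(),
        )
        self.value_head = nn.Sequential(
            nn.Conv2d(width, 1, 1), nn.Flatten(),
            nn.Linear(board * board, 64), nn.ReLU(),
            nn.Linear(64, 1), nn.Tanh(),                 # value in [-1, 1]: who is winning
        )

    def forward(self, x):                                # x: [batch, planes, board, board]
        return self.value_head(self.trunk(x))

print(TinyValueNet()(torch.randn(2, 8, 19, 19)).shape)   # torch.Size([2, 1])
```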
So at this point, deep learning is now the best approach in the world for all kinds of applications. I'm not going to read them all, and this is by no means a complete list, it's far longer than this, but these are some examples of the kinds of things deep learning is better at than any other known approach.

So why am I spending so much of my life on deep learning? Because it really feels to me like a very dramatic step change in human capability, like the development of electricity, for example. I would like to think that when I see a very dramatic step change in human capability, I'm going to spend my time figuring out how best to take advantage of it, because there are going to be so many world-changing breakthroughs that come out of it. And particularly as somebody who has built a few companies: the number one thing for an entrepreneur to find, and that investors look for, is whether there's something you can build now that people couldn't build before. With deep learning the answer is yes, across tens or hundreds of thousands of areas, because suddenly there are things we couldn't automate before and now we can, or we can make them hundreds of times more productive, and so forth. To me it's a very obvious thing to want to spend all my time on, and when I talk to students or interested entrepreneurs I always say: this is the thing which is making lots and lots of people extremely rich and solving lots and lots of important societal problems, and we're just seeing the tip of the iceberg.

As soon as I realised this, I decided to start a company to actually do something important with it. I got very excited about the opportunities in medicine, and I created the first deep-learning-in-medicine company, called Enlitic. I didn't know anything about medicine and I didn't know any people in medicine, so I got together a group of three other people plus me, and we decided to hack together a quick deep learning model to see if we could predict the malignancy of nodules in lung CT scans. It turned out that we could; in fact, the algorithm we built at this company had a better false positive rate and a better false negative rate than a panel of four trained radiologists. This was at a time when deep learning in medicine, and deep learning in radiology, was unheard of: there were basically no papers about it, there were certainly no startups about it, and no one was talking about it. So this finding got some attention, and that was really important to me, because my biggest goal with Enlitic was to get deep learning and medicine on the map, because I felt it could save a lot of lives, so I wanted as much attention around it as possible. Very quickly, lots and lots of people were writing about this new company, and as a result deep learning, particularly in radiology, took off: within two years the main radiology conference had a huge stream around AI, with lines out the door, and they created a whole new journal for it. So it was really exciting for me to see how we could help put a technology on the map.
That was great, but in some ways it was also disappointing, because there were so many other areas where deep learning should have been on the map and wasn't, and there's no way I could create companies around every possible area. So instead I thought: what I want to do is make it easy for other people to create companies, products, and solutions using deep learning. Particularly because, at that time, nearly everybody I knew in the deep learning world was a young white man from one of a small number of exclusive universities, and the problem with that is that there are a lot of societally important problems which that group of people just wasn't familiar with; and even if they were familiar with them and interested in them, they didn't know how to find the data, or what the constraints on implementation were, and so forth. So Dr. Rachel Thomas and I decided to create a new organization that would focus on one thing: making deep learning accessible. The idea was to say, okay, if this really is a step change in human capability, which happens from time to time in the history of technology, what can we do to help people use this technology regardless of their background? There were a lot of constraints we had to help remove.

The first thing we did was to try to make what people already knew about building deep learning models as available as possible. At that time there weren't any courses, or any easy way in, to get going with deep learning. We had a theory: we thought you don't need a Stanford PhD to be an effective deep learning practitioner, and you don't need years and years of graduate-level math training. We thought we might be able to build a course that would allow people with just a year of coding background to become effective deep learning practitioners. At that time, this was about 2014 or 2015, I can't quite remember, maybe 2015, nothing like this existed, and this was a really controversial hypothesis. To be clear, we weren't sure we were right, but this was the feeling we had, so we thought: let's give it a go. So the first thing we created was the fast.ai Practical Deep Learning course. One thing we saw immediately, which was thrilling and which we certainly hadn't known would happen, was that it was popular; a lot of people took the course. We made it freely available online, with no ads, to make it as accessible as possible, since that's our mission. I said to that first class: if you create something interesting with deep learning, please tell us about it on our forums; we had created a forum so that students could communicate with each other, and we got thousands of replies. I remember one of the first ones: somebody who tried to recognize cricket pictures versus baseball pictures, and got something like 100%, or maybe 99%, accuracy. One of the really interesting things was that they only used about 30 training images. That was exciting for us: seeing somebody not only building a model, but building it with far less data than people used to think was necessary. And then suddenly we were being flooded with all these cool models people were building.
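For a sense of how little code (and data) a student project like that needs, here is a sketch using the fastai library discussed later in this talk; the "cricket_or_baseball" folder is hypothetical, with one sub-folder of images per class.

```python
from fastai.vision.all import *

path = Path('cricket_or_baseball')   # hypothetical folder: cricket/ and baseball/ sub-folders
dls = ImageDataLoaders.from_folder(path, valid_pct=0.2, seed=42, item_tfms=Resize(224))
learn = cnn_learner(dls, resnet34, metrics=accuracy)   # start from an ImageNet-pretrained model
learn.fine_tune(3)                                     # transfer learning: a few dozen images can be enough
```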
A model classifying two different types of people from Trinidad and Tobago; a zucchini versus cucumber model; and a really interesting one where somebody managed to predict which part of the world a satellite photo was from, across over 110 classes, with 85% accuracy, which is extraordinary. A Panama bus recognizer, a cloth recognizer; and some of the things were clearly going to be genuinely useful in practice, like a disaster-resilience project recognizing the state of buildings in a region of Tanzania. We even saw people breaking state-of-the-art results right at the start of the course. This one is Devanagari character recognition, where a student said, "wow, I just got a new state-of-the-art result," which was really exciting, and we saw people doing similar things, beating state-of-the-art results on audio classification. And then we started to hear from some of these students, in the first year, that they were taking their ideas back to their companies. In one case a software engineer went back to the company he was working at, Splunk, and built a new model which basically took mouse movements and mouse clicks, turned them into pictures, and then classified those pictures, and used this to help detect fraud. We know about this because it was so successful that it ended up in a patented product, and Splunk wrote a blog post about this cool new technology built by a software engineer with no previous background in the area. And we saw startups appearing: for example, a startup called Envision appeared, founded by one of our students, and it's still going strong; I looked it up just before this talk. So it was really cool to see how people from all walks of life were able to get started with deep learning.

These courses got really popular, and we started redoing them every year: we'd build a new course from scratch, because things were moving so fast that within a year there was so much new material to cover. There are many millions of views at this point, and, based on what people tell YouTube, they're loving them, which has been a real pleasure to see. We ended up turning the course into a book as well, along with my friend Sylvain Gugger, and people are really liking the book too.

The next step, after getting people started with what we already knew by putting that into a course, was to push the boundaries beyond what we already knew. One thing that came up was that a lot of students, or potential students, were saying: I don't think it's worth me getting involved in deep learning, because it takes too much compute and too much data; unless you're a Google or a Facebook, you can't do it. This became a particular issue when Google released their TPUs and put out a big PR exercise essentially saying that TPUs are so great that nobody else can do anything useful in deep learning now. So we decided to enter an international competition called DAWNBench, which Google and Intel had entered, to see if we could beat them, that is, be faster than them at training deep learning models. That was April 2018, and we won on many of the axes of the competition: the cheapest, the fastest on a single GPU, the fastest on a single machine.
And then we followed this up with additional results that were actually 40% faster than Google's best TPU results. This was exciting, and here's a picture of us and our students working on the project together; it was a really cool time, because we really wanted to push back against this narrative that you have to be Google. It got a lot of media attention, which was great, and the real finding was that using common sense is more important than using vast amounts of money and compute. It was also cool to see that a lot of the big companies noticed what we were doing and adopted our ideas: when NVIDIA started promoting how good their GPUs were, they started quoting results that used the additional ideas we had developed with our students. So academic research became a critical component of fast.ai's work, and we did similar research to drive breakthroughs in natural language processing, in tabular modeling, and in lots of other areas.

So then the question was: now that we had pushed the boundaries beyond what was already known, and shown we could get better results with less data and less compute, more quickly, how do we put that into everybody's hands so that everybody can use these insights? That's why we decided to build a software library called fastai. Version 1 came out in 2018, and it immediately got a lot of attention and was supported by all the big cloud services. We were able to show that, compared to Keras for example, it was much more accurate, much faster, and needed far fewer lines of code. We really tried to make it as accessible as possible. This is some of the documentation from fastai: not only do you get the normal kind of API documentation, it also has pictures of exactly what's going on, links to the papers it's implementing, and the code for all of the pictures directly there as well. One of the really nice things is that every single page of the documentation has a link that lets you open that page as an interactive notebook, because the entire thing is built from interactive notebooks; you get the exact same content, but now you can experiment with it and see all the source code. We really took the approaches we found worked well in our course, having students do lots of experiments and lots of coding, and made that part of our documentation as well, to let people really play with everything themselves and see how it all works.

Incorporating all of this research into the software was super successful. We started hearing from people saying, "I've just started with fastai, I've started porting across some of my TensorFlow models, and I don't understand why everything is so much better; what's going on here?" People were really noticing that they were getting dramatically better results. This person said the same thing: we used to use TensorFlow, we spent months tweaking our model, we switched to fastai, and within a couple of days we were getting better results. So by combining the research with the software, we were able to provide a library that let people get started more quickly.
And then version 2, which has been out for a bit over a year now, was a very dramatic advance further still; there's a whole academic paper you can read describing all the design approaches we used. One of the really nice things about it is that, basically regardless of what you're trying to do with fastai, you can use almost exactly the same code. For example, here's the code necessary to recognize dogs versus cats; here's the code necessary to build a segmentation model, and it's basically the same lines of code; here's the code to classify the text of movie reviews, almost the same lines of code; here's the code for collaborative filtering, again almost the same lines of code. I said earlier that, under the covers, different deep learning models look more similar than different, and with fastai we really tried to surface that, so that you learn one API and you can use it anywhere.

That's not enough for researchers, or for people who really need to dig deeper, so one of the really nice things is that underneath this applications layer is a tiered, layered API where you can go in and change anything. I'm not going to describe it in too much detail, but for example, part of the mid-tier API is a new two-way callback system, which basically allows you, at any point while training a model, to see exactly what it's doing and to change literally anything it's doing: you can skip parts of the training process, you can change the gradients, you can change the data, and so forth. With this approach we were able to implement, for example, mixup data augmentation (from a paper called "mixup") in just a few lines of code. If you compare that to the original Facebook implementation, not only was it far more lines of code, and this is what it looks like from the research paper without using callbacks, it's also far less flexible, because everything is hard-coded, whereas with the callback approach you can mix and match really easily.

Another example of this layered API is that we built a new approach to creating optimizers using just two concepts, stats and steppers. I won't go into the details, but in short, this is what a particular optimizer called AdamW looks like in PyTorch, and it took about two years between the paper being released and Facebook releasing that AdamW implementation. Our implementation was released within a day of the paper, and it consists of these one, two, three, four, five words, because we're leveraging this layered API for optimizers; it's really easy to combine the components to quickly implement new papers. Here's another example of an optimizer, this one called LAMB, which came from a Google paper, and one of the really cool things you might notice is that there's a very close match between the lines of code in our implementation and the lines of math in the algorithm in the paper.
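To illustrate the "same few lines, different application" point, here are two of the standard fastai v2 quick-start examples side by side, text classification on IMDb and then collaborative filtering on a MovieLens sample; note how similar the shape of the code is.

```python
from fastai.text.all import *
from fastai.collab import *

# Text: classify IMDb movie reviews as positive or negative.
dls = TextDataLoaders.from_folder(untar_data(URLs.IMDB), valid='test')
learn = text_classifier_learner(dls, AWD_LSTM, metrics=accuracy)
learn.fine_tune(4, 1e-2)

# Collaborative filtering: predict movie ratings from (user, movie, rating) rows.
dls = CollabDataLoaders.from_csv(untar_data(URLs.ML_SAMPLE)/'ratings.csv')
learn = collab_learner(dls, y_range=(0.5, 5.5))
learn.fine_tune(10)
```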
So anyway, that's a little summary of both what I'm doing now with fast.ai, and how I got there, and why. I'm happy to take any questions. Thanks.

Thanks, that was a really interesting historical view of where you came from. I'll start with a quick question. You mentioned that in deep learning there are very similar structures and code for solving different problems, but how do you incorporate knowledge about the problem? Obviously the type of architecture would have to come from the context of the problem.

Yeah, that's a great question. There are a number of really interesting ways of incorporating knowledge about the problem, and it's a really important thing to do, because the more of that knowledge you can incorporate, the more extra performance you get and the less data and time you need. One way is certainly to build it directly into the architecture. For example, a very popular architecture for computer vision is the convolutional architecture, and a 2D convolution takes advantage of our domain knowledge that there's generally autocorrelation across pixels in both the x and y dimensions, so we're basically mapping one set of weights across groups of pixels that are all next to each other. There's a really wide range of interesting ways of incorporating all kinds of domain knowledge into architectures; there are lots of geometry-based approaches, and within natural language processing there are lots of autoregressive approaches. That's one area. An area I am extremely fond of is data augmentation, and in particular there's been a huge improvement in the last twelve months or so in how much we can do with a tiny amount of data by using something called self-supervised learning, and in particular something called a contrastive loss. The idea is that you come up with really thoughtful data augmentation approaches. For example, in NLP one approach is to translate each sentence into a different language with a translation model and then translate it back again, so you get a different version of the same sentence that should mean the same thing. Then the contrastive loss adds a term to the loss function saying that those two different sentences should produce the same result in our model. With something called UDA, which basically adds contrastive loss and self-supervised learning to NLP, they were able to get results on movie review classification with just 20 labeled examples that were better than the previous state of the art using 25,000 labeled examples. Anyway, there are lots of ways to incorporate domain knowledge into models, but those are a couple that I like.
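Here is a minimal sketch of the contrastive idea just described (not UDA itself): given embeddings of two "views" of the same examples, say a sentence and its back-translation, the loss pulls matching pairs together and pushes everything else in the batch apart.

```python
import torch
import torch.nn.functional as F

def contrastive_loss(z1, z2, temperature=0.1):
    """z1, z2: [batch, dim] embeddings of two views of the same batch of examples."""
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / temperature                   # pairwise cosine similarities
    targets = torch.arange(len(z1), device=z1.device)    # the i-th pair is the true match
    return F.cross_entropy(logits, targets)

loss = contrastive_loss(torch.randn(16, 128), torch.randn(16, 128))  # toy usage
```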
Yes, and there are a couple of questions about interpretability. One that came up is that it can be hard to explain these models to stakeholders, so how can you convince them that deep learning is worth adopting? Obviously you can show predictive performance, but are there any other ways you can do that?

Sure. My view is that deep learning models are much more interpretable and explainable than most regression models, for example. Generally speaking, the traditional view is that the right way to understand, say, a regression model is to look at its coefficients, and I've always told people not to do that, because in almost any real-world regression problem you've got coefficients representing interactions, coefficients on things that are correlated, and coefficients on various bases of a transformed nonlinear variable; none of the coefficients can be understood independently, because they only make sense in terms of how they combine with all the other related terms. So I genuinely dislike it when people try to explain a regression by looking at coefficients. To me, the right way to understand a model is to do the same thing we would do to understand a person: ask it questions. Whether it's a regression, a random forest, or a deep learning model, you can generally ask questions like "what would happen if I made this variable a little bit bigger or a little bit smaller?", and those are actually much easier to answer in deep learning, because they're just questions about the derivative with respect to the input, so you can get the answers much more quickly and easily. You can also do really interesting things with deep learning around showing which inputs are similar to each other in the learned feature space, and you can build really cool applications for domain experts on top of that, which can give them a lot of comfort: you can say, yes, it's accurate, but I can also show you which parts of the input were particularly important in this case, and which other inputs are similar to this one. Often, for example in the medical space, doctors will say, "wow, that's really clever, the way you recognized that this patient and this patient are similar; a lot of doctors wouldn't have noticed that subtle thing going on."

We're right at eleven o'clock, but maybe one last question that somebody brought up: are there any future research opportunities at the intersection of machine learning and quantum computing that you can think of? It's an interesting question; I don't know if you've thought about it.

No, probably not one I've got any expertise on. It could well be an interesting question, but I'm not the right person to ask.

One thing I do want to mention is that I have just moved back to Australia after ten years in San Francisco, and I am extremely keen to see Australia become an absolute knowledge hub around deep learning. I would particularly love to see our fastai software thrive here: just like when you think about TensorFlow you think of this whole ecosystem around it, around Google, with startups and everything, I would love to see Australia treat fastai as the homegrown library, take it to heart, and help us make it brilliant. It's all open source, and we've got a Discord channel where we all chat about it. Any organizations that are interested in taking advantage of this free, open source library, I would love to support them, and academic institutions too; I'd love to see this become a really successful ecosystem here in Australia.

Great. It does seem like it's going to be quite useful for solving lots of problems, so I think that would be good to do. There are still some questions in the chat; we'll have the chat transcript, and if there are any questions there that might be worth addressing, we can think about posting responses after the fact. So thank you everybody for coming, and thank you, Jeremy, for joining us today.

Thanks so much, it's been great. Bye.