Hi and welcome, everybody, to this course on analyzing software using deep learning. This is a course at the University of Stuttgart in summer 2020. Usually this is not an online course, but because of the current Corona situation, this year it is. The good news is that everybody can enjoy this course, no matter whether you're enrolled at the University of Stuttgart or not.
What we'll do in this course is look at the intersection of two fields, namely program analysis and machine learning, and more specifically deep learning. This intersection emerged as an interesting and exciting topic only a few years ago, so most of what we'll do in this course is based on relatively recent research from the last five or so years. At the same time, a lot of these ideas have already transitioned into industrial practice or are in the process of doing so. So it's a really interesting mix to be in.
Before we start with the course, let me say a few things about myself. My name is Michael. I've been a professor at the University of Stuttgart since fall 2019. Before that, a while ago, I studied at a couple of universities: TU Dresden in Germany, École Centrale Paris in Paris, France, and EPFL in Lausanne, Switzerland. After that I went to ETH Zurich, also in Switzerland, for my PhD, followed by a postdoc at UC Berkeley in California. Then I went back to Germany to become an assistant professor at TU Darmstadt, then back again to California to spend a sabbatical at Facebook in Menlo Park to see how software development works in practice, and then finally I came back to Germany, and now I'm here at the University of Stuttgart. I'm leading a research group called the Software Lab, which has been my group for about six years.
What we do is research that focuses on tools and techniques to help developers build better software. What "better" means can be many things: more reliable software, more secure software, or more efficient software. To do this we use a mix of techniques, including a lot of program analysis and program testing techniques and, more recently, also a lot of learning techniques, in particular deep learning. So what we talk about in this course is right at the core of our research interests, which is why I'm pretty excited about this topic. If you are a student at the University of Stuttgart, we regularly have opportunities for master's theses and also for other kinds of student jobs. So if you're interested in any of the topics that we talk about here, feel free to contact me.
The course is split into different modules, and every module has a couple of parts. This is the very first module, called Introduction, and we are in its first part, which tries to give a bit of motivation for why this is an interesting topic.
As I said, this course combines program analysis and machine learning, so let me start by saying what program analysis actually is. Program analysis is an automated way of reasoning about the behavior of a program. If you have ever written a program, which I hope you have, otherwise this is probably not the right course for you, then you have of course reasoned about programs. What program analysis does is to not do this manually but to automate this process of reasoning by essentially having a tool that does the reasoning for you. Why do you want to do this? Well, for many reasons. One of them is that you want to find programming errors, or bugs, in your code and want to have a tool that tells you where your code might be wrong.
Another reason might be that you want to optimize the performance of your program, or maybe just better understand it so that you can later optimize it manually. Or maybe you want to find security vulnerabilities in order to avoid being attacked by malicious people.
So how does such a program analysis work? Let's first have a look at how a program usually works. Typically a program takes some input and then produces some output based on this input. What a program analysis does is provide you with some additional information about this program. For example, it might tell you that if the program gets this input and produces this output, the output may not be what you want, so there may actually be a bug in the program. An analysis can do this for one input and the one behavior that this input triggers. But of course what you ideally want is to do it not only for one input but for many inputs, and maybe even for all inputs that the program might ever take.
So why do we need all of this? The reason is that program analysis is at the core of many developer tools that software developers use in their daily work. One example is a compiler. If you're compiling your code, then under the hood a program analysis is running that, for example, type checks your code and makes sure that you do not have any type errors. Likewise, if you sometimes get warnings in an IDE that tell you that maybe a particular piece of code is wrong, then the tool that has emitted this warning is a program analysis that tries to reason about the behavior of your code and about what may be right or wrong. Another example of a program analysis that maybe some of you have used is a performance profiler, which looks at the execution of a program and then tells you in which parts of the program a lot of resources, such as CPU time, are spent.
Yet another example, which I'm sure everybody has used in one way or another, is code completion. Very often code completion is built into an IDE and tries to complete your code by helping you choose which tokens to type next. Under the hood this is again a program analysis, because the more the code completion tool understands about your code, the better its suggestions will be. Yet another example is automated testing tools. For example, the Monkey tester for Android is such a tool, which automatically uses your application and tries to make it run in order to expose, for example, bugs. And then there are many other tools, like code documentation and code summarization tools, all of which developers use very regularly and all of which are based on program analysis.
So where do all these program analyses come from? The traditional approach to creating a program analysis is to have an expert in program analysis, for example someone who did a PhD on this topic, sit down, take all the knowledge they have about programs and their behavior, and write an algorithm, called the analysis, that tries to solve a particular problem, for example finding a particular kind of bug. This is a good approach, but it requires significant human effort. One of the reasons is that there are a lot of conceptual challenges, such as how to effectively reason about the possible behaviors of a program while scaling to large programs. Another reason is that even if all these conceptual challenges are solved, there is a significant implementation effort in order to make sense of the code and really understand what's going on. All of these factors are the reason why it takes pretty long to build a new program analysis and why typically only a few experts can really do this.
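To make the traditional approach a bit more concrete, here is a minimal sketch of a hand-written analysis, assuming Python as the analyzed language. It uses the standard `ast` module to flag one specific suspicious pattern, comparing a value to `None` with `==` or `!=`. The checker name `NoneComparisonChecker`, the helper `analyze`, and the example program are hypothetical and chosen only for illustration; real analyses encode far more knowledge than this.

```python
import ast


class NoneComparisonChecker(ast.NodeVisitor):
    """Hypothetical hand-written checker: flags `x == None` and `x != None`."""

    def __init__(self):
        self.warnings = []

    def visit_Compare(self, node):
        # A comparison stores its left operand plus parallel lists of
        # operators and right-hand operands (to support chains like a < b < c).
        operands = [node.left] + node.comparators
        for i, op in enumerate(node.ops):
            if isinstance(op, (ast.Eq, ast.NotEq)):
                pair = (operands[i], operands[i + 1])
                if any(isinstance(o, ast.Constant) and o.value is None for o in pair):
                    self.warnings.append(
                        (node.lineno, "comparison with None should use 'is' or 'is not'")
                    )
        # Keep walking so that nested comparisons are also checked.
        self.generic_visit(node)


def analyze(source):
    """Parse the source code and return a list of (line, message) warnings."""
    checker = NoneComparisonChecker()
    checker.visit(ast.parse(source))
    return checker.warnings


# Assumed example program, written only for this sketch.
EXAMPLE = """\
def lookup(table, key):
    value = table.get(key)
    if value == None:
        return "missing"
    return value
"""

if __name__ == "__main__":
    for line, message in analyze(EXAMPLE):
        print(f"line {line}: {message}")
```

Even this tiny checker needs someone to know how comparisons are represented in the syntax tree, which hints at why writing analyses by hand takes expertise and effort.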
Another property of this traditional approach is that a traditional program analysis analyzes a single program at a time. It's meant for analyzing arbitrary programs, but at any given point in time it's analyzing one specific program.
Now the motivation for this course is that, as an alternative or maybe as an addition to the traditional approach, you can learn a program analysis from existing data. One reason is that there is a huge amount of existing code, sometimes referred to as "big code", as in "big data", which we can learn from. And the reason why learning works pretty well on source code is that programs tend to be pretty regular and repetitive. Of course every developer is an individual and everybody writes different code, but if you look at a lot of code, it turns out that programs tend to look the same over and over again, and there are many interesting properties that are repetitive across programs, even when written by different developers and in different application domains. Machine learning, and in particular deep learning, is really good at extracting such knowledge from large amounts of data, at seeing patterns that may not be obvious if you just look at individual examples but become obvious if you look at a lot of examples, and at applying this knowledge in new contexts.
Using this idea, learned models of code have been able to, for example, complete partial code, so they essentially give you a code completion mechanism. Or they can tell you how to use an API by learning from existing API usage examples. Such approaches can also learn how to find bugs, or maybe even fix bugs, because they know what incorrect code looks like and how to make the code correct. And you can, for example, also use such learning-based approaches to create inputs for testing, where you have an approach that automatically exercises a program by learning how to make best use of the tested program.
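As a rough illustration of how the repetitiveness of code can be exploited, here is a minimal sketch of a learned code completion. Note the assumptions: it uses a simple bigram count model, not deep learning, as a stand-in that is easy to read; the four-snippet `CORPUS`, the crude regex-based `tokenize`, and the function names are all made up for this example.

```python
import re
from collections import Counter, defaultdict


def tokenize(code):
    # Crude lexer (an assumption for this sketch): identifiers and numbers
    # become one token each, every other non-space character is its own token.
    return re.findall(r"\w+|[^\w\s]", code)


def train_bigrams(corpus):
    """Count, for every token, which tokens follow it in the corpus."""
    following = defaultdict(Counter)
    for snippet in corpus:
        tokens = tokenize(snippet)
        for current, nxt in zip(tokens, tokens[1:]):
            following[current][nxt] += 1
    return following


def suggest(model, previous_token, k=1):
    """Return the k tokens seen most often after previous_token."""
    counts = model.get(previous_token, Counter())
    return [token for token, _ in counts.most_common(k)]


# Tiny made-up corpus standing in for "big code".
CORPUS = [
    "for i in range(n): total += i",
    "for item in items: print(item)",
    "for j in range(10): print(j)",
    "if x in seen: return x",
]

if __name__ == "__main__":
    model = train_bigrams(CORPUS)
    print(suggest(model, "in"))     # ['range']
    print(suggest(model, "print"))  # ['(']
```

Nobody told the model that `range` often follows `in`; it picked this up purely from the examples. Learned models of code build on the same principle, with much more data and much more expressive models.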
Out of the many machine learning techniques that exist, this course will mostly focus on deep learning, which is a specific class of machine learning algorithms. Deep learning refers to neural network architectures, and the "deep" refers to the fact that there's not just a single formula that takes some input and transforms it into some output, but usually a sequence of different layers and transformations that transform the input into the output. Another important feature of deep learning is that you do not have to explicitly specify the features that your data, in our case for example your program, has. Instead, the features and also the representation of this data are extracted automatically. So basically what you have to do is give the programs in some suitable format to a model, and then it automatically learns which parts of the programs are most relevant to solve your task.
Deep learning has already revolutionized a couple of other areas. One example is speech recognition, where your phone can now understand what you're saying, basically because there's a deep learning model running somewhere that understands what you say. It has revolutionized image processing and can really precisely tell you what is in an image. And it even works pretty well in game playing, where it can compete with human players at the highest level.
An area that deep learning has only started to revolutionize is how to make software developers more productive, and this is exactly what we look at in this course. We look at the intersection of program analysis, that is, tools that make developers more productive, and deep learning, that is, automatically learned models that in this case will help us solve some of the program analysis challenges. What we'll do in this course is cover some of the basics of these areas. We look into some program representations, for example.
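To give a first impression of what a program representation can be, here is a minimal sketch, assuming Python as the language of the analyzed code. It shows the same one-line program once as a flat token sequence and once as a simplified syntax tree. The helper names `token_view` and `tree_view` and the nested-tuple tree format are choices made only for this illustration, not a format prescribed by the course.

```python
import ast
import io
import tokenize

SOURCE = "total = price * 2\n"


def token_view(source):
    """Flat token sequence, close to the text a developer types."""
    tokens = tokenize.generate_tokens(io.StringIO(source).readline)
    # Drop whitespace-only tokens such as the newline and the end marker.
    return [tok.string for tok in tokens if tok.string.strip()]


def tree_view(node):
    """Simplified abstract syntax tree as nested (label, children) tuples."""
    label = type(node).__name__
    if isinstance(node, ast.Name):
        label += f"({node.id})"
    elif isinstance(node, ast.Constant):
        label += f"({node.value!r})"
    children = [
        tree_view(child)
        for child in ast.iter_child_nodes(node)
        # Skip Load/Store markers to keep the tree small.
        if not isinstance(child, ast.expr_context)
    ]
    return (label, children)


if __name__ == "__main__":
    print(token_view(SOURCE))  # ['total', '=', 'price', '*', '2']
    print(tree_view(ast.parse(SOURCE)))
```

The token view keeps the order in which the code is written, while the tree view makes the nesting explicit, for example that the multiplication is the value being assigned.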
So different ways of representing a program that go beyond what you typically see as a developer when you're coding. And we'll also look at some neural network architectures that are most suitable for this specific task. Most of what we'll cover here is based on recent research results, basically ideas that have only come up in the last five or so years. This means that there's not really a textbook to look all of this up in, but you can of course read about a lot of this material in some of the recent research papers. To also give you some hands-on experience and a really concrete idea of how this intersection of deep learning and program analysis works in practice, the course comes with a coding project in which you will design, implement, and evaluate a learning-based program analysis.
So now you know what this course is about. Let me also tell you what this course is not about. This is not a detailed course on program analysis. This is also not a really detailed course on deep learning or machine learning in general. And despite the fact that there is a course project, this is not a tutorial on any machine learning or deep learning library. The reason is that there are plenty of other resources that cover these different, really interesting aspects. For example, I'm also providing a course on program analysis that is typically taught in the winter semester, but this is not what we'll do here. This course is really about the intersection of program analysis and deep learning.
All right, this is all I have for this very first part of the introduction to analyzing software using deep learning. I hope this sounds interesting and you stick with me for the rest of the course. Thank you very much for listening and see you next time.