Okay everyone, so please welcome Fridolin, who's going to tell us about Thoth, the recommendation engine for Python.

Okay, so hello everyone, welcome to this presentation about Project Thoth, a recommendation engine for Python applications. Before I start, let me introduce myself. My name is Fridolin, I work at Red Hat. I'm not a first-time speaker here; you've probably seen me two years back, when I had a talk about Project Selinon. Right now I work on Project Thoth. I'm one of the developers of Thoth, and I would like to introduce the project to you.

So what is Thoth, and why Thoth? You probably know PyPI, the Python Package Index. That's the index that hosts open source Python projects. When I wrote these slides, I found out that there are something like 200,000 projects available out there, free to use, and about 1.6 million releases. That's quite a huge number, I would say, and it grows each and every day, especially with the popularity the Python ecosystem has.

So let's create an artificial example and use some packages hosted on PyPI. Let's say we would like to use TensorFlow, a machine learning library, and also Flask. We install these dependencies using, for example, pip, and we write our application on top of these two libraries. But if you take a look at this scenario, it's not just your application with these two libraries. By installing them you also, most of the time, introduce transitive dependencies, and you use some Python interpreter that runs these packages. That Python interpreter runs inside some operating system that provides packages such as glibc, which in our case is used by TensorFlow. Then there is the kernel space, where the operating system runs and provides kernel modules. And all of this runs on some hardware. You can imagine that if you have an issue in any layer of this stack, your application simply misbehaves, produces wrong outputs, or doesn't even start.

So let's take a look at the direct and transitive dependencies. When I was writing these slides, there were 33 releases of Flask, and when you do pip install flask, you introduce five additional packages, such as click, and so on. These are all released in different versions. Now suppose we take all the versions of these libraries and ask how many combinations there are to install Flask with its dependencies. I got a huge number. It's an estimation, so the actual number will vary based on the actual resolution, but there are something like 50 million possibilities for how you can install a Flask application with different versions of its dependencies (a back-of-the-envelope version of this estimate is sketched below). I did the same for TensorFlow. At the time I was writing these slides there were 85 releases of TensorFlow, and TensorFlow is a much bigger project than Flask; it introduces something like 30 additional packages. When I estimated how many combinations there are to install the TensorFlow stack, I got something like 10 to the power of 20.

If you take a look at AI approaches today, you probably know AlphaGo, the famous Go AI that beats every Go player; it uses machine learning to find the best possible move in a game. The game of Go has something like 10 to the power of 172 combinations of how you can place stones on the board.
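As an aside, the Flask estimate above can be reproduced with a few lines of Python. The release counts below are illustrative stand-ins, not current PyPI data; the point is that the number of candidate stacks is the product of per-package release counts, which explodes quickly.

```python
from math import prod

# Hypothetical release counts in the spirit of the Flask example: 33 Flask
# releases plus five direct/transitive dependencies.  The dependency counts
# are made up for illustration; real PyPI numbers change daily.
release_counts = {
    "flask": 33,
    "werkzeug": 25,
    "jinja2": 30,
    "click": 20,
    "itsdangerous": 12,
    "markupsafe": 12,
}

# Upper-bound estimate: one version per package, every combination counted.
# A real resolver discards combinations that violate version specifiers,
# so the true number is lower; hence the talk calls it an estimation.
combinations = prod(release_counts.values())
print(f"~{combinations:,} candidate Flask stacks")  # 71,280,000 here
```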
Go's state space is much bigger than our number, but in Project Thoth we took approaches similar to AlphaGo's, and we are trying to solve developers' issues with software stacks.

So that was one layer, with something like 10 to the power of 20 combinations, but there are many more layers. If we take a look at the software layer, there are different Python interpreters in different versions that you can install, and different operating systems in different versions. Then you have packages available on PyPI, such as Flask or TensorFlow, or you can host your own packages on your own index, and we do so in our team: we optimize TensorFlow builds by switching different compiler flags, and we provide a community index that hosts these TensorFlow releases free to use. So if you use TensorFlow, you can use these releases and gain some performance. Now the question is: what should I use out of all these options?

Let's simplify our use case and say that we have two libraries. One is called SimpleLib and the other one is AnotherLib. It's a similar case to Flask and TensorFlow, but here SimpleLib will not introduce any transitive dependencies into our stack, and AnotherLib will not introduce any additional libraries either. In other words, we install just these two libraries when we run our Python application.

Now we would like to find some function that describes how good our software stack is. We would like to install SimpleLib and AnotherLib in different versions and evaluate how good each resulting software stack is. We can create such a function: we install different versions of SimpleLib and different versions of AnotherLib, and we get some overall score. This function is discrete, so we have discrete values; if we interpolate them, what do we get? We get a surface, and we see that in some cases our application scores better, in some cases worse.

What can this score mean? Whatever you can think of. It can be, for example, the number of vulnerabilities present in your software. It can also be the performance of your machine learning model, or a combination of the two, or any other metric you can think of. What are we looking for? We are looking for the spikes, the values with a very high score, assuming the scoring function returns higher values for software that is good for our use case (a toy version of such a scoring function is sketched below).

That's what we are looking for, and that's basically Thoth. In Thoth we aggregate knowledge about packages: whether the application builds correctly, whether it runs correctly on different Python versions and different operating systems, whether it behaves correctly, and what the overall performance is. Then there is a resolver that takes these observations into account and can resolve software stacks that are high performing; if you remember the slide with the function, those are the spikes in the whole state space. So we say that the latest versions are not always the greatest choices, and that's basically the idea behind Thoth.

Thoth is a bigger project. It runs on OpenShift; there are multiple components and different integration points. But the most interesting component for this talk is the adviser. It's a component inside Thoth, and it's basically the implementation of a Python resolver. Thoth is a service that provides endpoints you can consume data from, and it's pure server-side resolution. We had multiple iterations on the implementation.
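Before walking through the implementation history, here is a toy version of the scoring idea above. The libraries, versions and scores are all invented for illustration; in Thoth the score comes from aggregated observations (vulnerabilities, build and runtime behaviour, performance).

```python
import itertools

# Hypothetical version ranges for the two example libraries.
SIMPLELIB_VERSIONS = ["1.0", "1.1", "1.2", "2.0"]
ANOTHERLIB_VERSIONS = ["0.9", "1.0", "1.1"]

def score_stack(simplelib: str, anotherlib: str) -> float:
    """Score one resolved stack; higher means better for our use case."""
    if anotherlib == "0.9":
        return -0.5   # e.g. a known vulnerability in this release
    if (simplelib, anotherlib) == ("1.1", "1.0"):
        return 0.9    # the "spike": the best observed combination
    return 0.1        # no strong observations either way

# With 4 x 3 = 12 combinations we can enumerate exhaustively; with the
# ~10**20 candidate TensorFlow stacks we obviously cannot.
best = max(itertools.product(SIMPLELIB_VERSIONS, ANOTHERLIB_VERSIONS),
           key=lambda pair: score_stack(*pair))
print("best stack:", best)  # ('1.1', '1.0'), not the latest versions
```

Note that the best stack here is not the latest one, which is exactly the "latest is not always greatest" point above.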
The very first implementation was a pure Python implementation. It loaded the whole dependency graph into memory, where we had a graph representation of the packages, and we did transactional operations on it. So we removed, say, three packages from the dependency graph, and that transaction either proceeded or it didn't. It was also possible to score these packages, and the dependency graph was adjusted so that resolution led to higher performing stacks sooner. This implementation was, however, quite memory hungry; the main issue was memory consumption. As you know, objects in Python carry additional metadata, such as reference counts, and for a TensorFlow stack we required something like 32 gigabytes of memory just to resolve a software stack.

So we abandoned that solution and rewrote the core part in C++. We designed a protocol that efficiently serialized the whole dependency graph; we were able to keep nodes of the dependency graph in something like 24 bytes, and we gained memory consumption improvements. However, we reached another bottleneck: the number of queries hitting our knowledge base, our database, just to obtain the dependency graph. There were something like two and a half thousand queries to our database just to obtain TensorFlow's dependency graph, and then there were subsequent queries to score it. That put a lot of pressure on the database; we changed the database twice, so right now we run our third database. Later on we decided to abandon this solution as well because of that number of queries, which is quite huge.

So we looked for solutions we could apply from theoretical informatics and operations research. We implemented different types of resolution, for example hill climbing and simulated annealing. This rethinking led us to split the whole resolver into two parts. One is the resolver, which can lazily resolve a software stack following the Python ecosystem's specification. The other is the predictor, which guides the resolver on which packages, in which versions, should be resolved in order to come up with some software stack (a minimal sketch of this split follows below).

This implementation worked for us, but if you think about it, we were basically randomly sampling the whole state space of software stacks that can be resolved: we randomly picked a stack, scored it, and evaluated the score. That's not nice, so we tried to find other solutions. We know there is this function that describes packages, but what about finding gradients? If we are able to find the gradient of this function, the gradient can lead us to the spikes, and we can find higher scored software stacks much faster. That was the next approach we took. The most interesting paper we evaluated was "Neural Combinatorial Optimization with Reinforcement Learning", published by Google Brain, and we really tried to learn that gradient. Unfortunately, if you think about Thoth as a service that should be responsive to users, you don't want to spend time training a neural network and feeding it inputs; you don't want to spend two hours training a neural network and another 30 minutes querying your database. So we abandoned this solution too, and right now we use gradient-free methods.
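Here is a minimal sketch of the resolver/predictor split with a simulated-annealing predictor, as mentioned above. All names, version data and scores are illustrative stand-ins, not the actual thoth-adviser implementation; a real resolver also expands transitive dependencies and honours version specifiers.

```python
import math
import random

# Hypothetical candidate versions per package.
VERSIONS = {"simplelib": ["1.0", "1.1", "1.2", "2.0"],
            "anotherlib": ["0.9", "1.0", "1.1"]}

def score(stack):
    # Stand-in for aggregated observations (builds, CVEs, performance);
    # deterministic so the annealing has a fixed landscape to climb.
    s = 0.0
    if stack["simplelib"] == "1.1":
        s += 0.5
    if stack["anotherlib"] == "1.0":
        s += 0.4
    return s

def predict_next(stack):
    # Predictor side: propose a neighbouring stack by changing one version.
    package = random.choice(list(stack))
    neighbour = dict(stack)
    neighbour[package] = random.choice(VERSIONS[package])
    return neighbour

def simulated_annealing(steps=500, temperature=1.0, cooling=0.99):
    # Resolver side: start from some random fully resolved stack.
    current = {name: random.choice(vs) for name, vs in VERSIONS.items()}
    best = current
    for _ in range(steps):
        candidate = predict_next(current)
        delta = score(candidate) - score(current)
        # Always accept improvements; accept regressions with a probability
        # that shrinks as the temperature cools, to escape local optima.
        if delta > 0 or random.random() < math.exp(delta / temperature):
            current = candidate
        if score(current) > score(best):
            best = current
        temperature *= cooling
    return best, score(best)

print(simulated_annealing())
# typically ({'simplelib': '1.1', 'anotherlib': '1.0'}, 0.9)
```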
So we implemented temporal-difference methods and Monte Carlo tree search, where Monte Carlo tree search looks like the most promising way to resolve stacks. How does it work? We basically sample that state space, that function over all possible resolved software stacks, and we learn a policy for finding the best software stack, based on the predictor and on the scoring mechanism in the resolver.

The resolver itself is a configurable pipeline, so you can write different pipeline units. I think there are five pipeline unit types in total; these units are dynamically constructed on each request, and they score the actual resolver steps. This way we can, for example, plug in a new pipeline unit for users of convolutional neural networks: a unit specialized just to score stacks for convolutional neural network workloads (a sketch of such a unit follows below).

Then there is a special component in our deployment called the dependency monkey. This component gathers observations for us, such as whether the application runs correctly, whether it behaves correctly, and what its performance is. The dependency monkey can sample the state space of all possible stacks, resolve software stacks that we don't yet have any observations about, and submit them to a service that evaluates how good the given software stack is.

This is how we run the adviser in our stage environment. We have Thoth deployed at Red Hat, and I asked it to resolve a TensorFlow stack together with Flask. When I asked it to resolve the latest software stack, the whole resolution took something like 8 seconds. When I asked it to resolve the best possible software stack, the resolution took something like 2.7 minutes, almost 3 minutes, and it was able to score half a million software stacks. Compare that to Pipenv, the packaging tool recommended by PyPA: when I tried to install TensorFlow and Flask and asked it to resolve their latest versions, the resolution took something like 1 minute, and I had the pip cache turned on, so it was using cached artifacts from PyPI. So we really benefit from the offline resolution inside our resolver: we do not contact PyPI directly, we have precomputed data, and we can resolve software stacks quite fast.

There are also other parts of Thoth that I haven't spoken about. For example, we use bots to automate dependency updates and new releases of Thoth's own components. You can also find our TensorFlow index that hosts the TensorFlow artifacts optimized for performance, and we have integrations with different tools such as OpenShift or Jupyter notebooks. So if you are a data scientist and you use Jupyter notebooks, you can enable our Jupyter notebook extension; it will talk to Thoth, and Thoth will resolve software stacks for you. Then there is a command line interface that is similar to pip or Pipenv. And if you have a GitHub repository with a Python application, we will soon release a developer preview of our GitHub application, called Qeb-Hwt; you can install it, enable it for your repository, and your dependencies will be automatically managed by Thoth. So if you want to give Thoth a try, feel free to install this application and help us make Thoth better. It's a community project. You can find information about Thoth on this site, and we have Twitter, so feel free to follow the ThothStation Twitter handle.
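Before the questions, here is a minimal sketch of what such a scoring pipeline unit could look like. The class and method signatures are illustrative only; they do not reproduce the real thoth-adviser interfaces, and the index URL check is a placeholder.

```python
from typing import Optional

class ScoreCNNStacks:
    """Hypothetical pipeline unit favouring stacks good for CNN workloads."""

    def run(self, package_name: str, package_version: str,
            index_url: str) -> Optional[float]:
        # Nudge the resolver towards performance-optimized TensorFlow builds
        # hosted on a custom index (the URL substring test is a stand-in).
        if package_name == "tensorflow" and "custom-index" in index_url:
            return 0.2    # positive score for this resolution step
        return None       # no observation for this package: stay neutral

# The resolver would construct units like this on each request and combine
# their scores for every candidate resolution step, so domain-specific
# knowledge plugs in without touching the resolver core.
```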
We are open source, so you can find us on GitHub under the Thoth Station organization. If you are interested in the bots, there is a link to them, and if you have any feedback, you can submit it to our feedback form. With that I would like to thank you, and that's it. We have plenty of time for questions.

So the question was whether this is just for PyPI, or whether this solution also works for your local packages. In Thoth, we can register a Python package index as long as it follows the PEP specification of a Python package index; if it is publicly hosted, we are able to analyze its packages and use them in the resolver.

When computing the number of possible configurations, did you take into account the actual version requirements, or was it just based on the name of each package? Sorry, can you repeat? Is that better now? When computing the number of possible configurations, did you take into account the actual version requirements, or was it just based on the name of each package? It was based on the actual version specification. So some packages were really discarded; older versions were discarded from those combinations.

How trustworthy is the output of Thoth? Say it tells you that the best score for your software stack comes from a pretty old version of SimpleLib or AnotherLib; you may be missing some features that you need, but Thoth is telling you that your software stack will have a better score. How do you handle that? If you use features of libraries that were released in recent versions, that should be stated in your requirements file or in your Pipfile, because you really rely on the newer version that has those features.

Can we expect this to come to other languages as well, like Golang or C? Right now we are focused on Python; maybe follow us on Twitter, we will post updates. So yes.

If I have a very custom score function, can I provide it, and will you recompute it for all the packages I need? It sounds very expensive. Recomputing of what? If I have a custom score function. Can you provide your own score function? You can. There are these pipeline units that you can implement; you can basically create your own pipeline unit, say "this is the score I would like to apply", and that pipeline unit is plugged into the resolver.

Well, thank you. OK, thank you very much.