Welcome everyone, the next talk we have coming up is reinforcement learning based dependency resolution by Fridolín Pokorný. Okay, I didn't butcher that. Fridolín is a senior software engineer in the AI Center of Excellence at Red Hat. Please let me know if you notice any issues with the video playback. Just getting started with it. Welcome to this talk about reinforcement learning based dependency resolution. This video was created for DevConf, a virtual event in 2020. I hope you are staying safe. My name is Fridolín, and let's start with a brief introduction to Project Thoth. Project Thoth is a project that was started in the AI Center of Excellence (AI CoE) in the office of the CTO, and all the ideas and things I explain here were invented in this project. The project is about making OpenShift a better platform for running AI/ML workloads. And as Python is the driving force for AI/ML applications, most of the things that I will describe apply to Python. But the ideas that I will show can also be applied to other programming languages and other ecosystems, considering the nuances that differ between these ecosystems. Why are we focusing on this? We know that software stacks are complex and they change over time. So if we develop an application today, it can get old in one year, and things change. So it requires maintenance, and it requires knowing how the application behaves in different runtime environments, considering, in the case of Python, the Python libraries, the Python interpreter, native packages provided by the operating system, as well as kernel modules, and down to hardware, which spans a lot of variation in how you can run your software. We are hosted on GitHub, so feel free to browse our code base. And let's move on to the agenda. I will talk briefly about dependency resolution in Python, but this will be really a brief introduction to get the basic idea and principles that are out there. And then I will show existing solutions and their pros and cons.
And then we will move on and describe why we need another solution, or why we started inventing a new solution. The solution uses Monte Carlo tree search, so I will briefly discuss Monte Carlo tree search and its variation for resolving high-quality software stacks. And then we will plug this algorithm into resolution pipelines, which make the resolver a configurable piece of software. So let's start with dependency resolution in Python. If you have ever used some Python modules, you have probably used this site. It's the Python Package Index, and it's a repository of software that was written mostly in the Python programming language. And as you can see, there are quite a lot of releases, more than two million releases of Python modules available out there, free to use, open source. And you can download these Python distributions and use them in your applications. These Python distributions can also come from other indices or other sources. An example can be TensorFlow. This machine learning library is produced by Google and is published on PyPI. So I picked the version TensorFlow 2.2.0. And we produced another version of TensorFlow: it's another build that is optimized for the AVX2 instruction set. It's more optimized, and you can retrieve it from another source. So PyPI is just one source of these Python distributions, and you can find other sources on the Internet and install software from there. If you install some library, most often it depends on some other packages, and this fact is stated using a version range specification, and these dependencies can also be conditional. So for example, TensorFlow has nearly 23 packages that are dependencies of TensorFlow, and it has, for example, NumPy with a given version range specification. Then there is enum34. It's a backport, to older versions of Python, of the enum module, which is available in the Python standard library starting with version 3.4. So there's no point in installing it for newer Python interpreters.
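The conditional dependency on enum34 is expressed with a PEP 508 environment marker, `python_version < "3.4"`. As a minimal illustration of how a resolver evaluates such a marker, here is a stdlib-only sketch (the function name is ours, not from any real resolver):

```python
import sys


def needs_enum34(python_version=None):
    """Mimic the PEP 508 marker `python_version < "3.4"` used by enum34.

    enum34 backports the stdlib `enum` module (added in Python 3.4),
    so a resolver should only select it for older interpreters.
    """
    if python_version is None:
        python_version = sys.version_info[:2]
    return python_version < (3, 4)


print(needs_enum34((2, 7)))  # True  -> install the backport
print(needs_enum34((3, 8)))  # False -> the stdlib enum module is available
```

Real resolvers evaluate the full marker grammar (OS, implementation, extras, and so on) rather than a single hard-coded condition, but the principle is the same: the dependency enters the graph only when the marker evaluates to true for the target environment.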
So that's why this dependency is introduced conditionally, if the Python interpreter version is older. As you can see, these version range specifications do not imply an index or a URL; they really use just a package name and a version range specification that needs to be satisfied in order to construct the dependency graph and resolve the software stack. So let's create some dependency graphs. I've already mentioned TensorFlow in version 2.2.0 that is published on PyPI, and by analyzing it, we observed that it has 23 dependencies. So here you can see them all stated. You see version range specifications; some dependencies are optional and included only if environment markers evaluate as true. And if we try to construct the dependency graph, we really end up with a lot of packages, a lot of software. So here's an example for TensorFlow 2.2.0. Again, it's the one that is installed from PyPI.org, and it has some dependencies with version range specifications, and these need to be resolved. So for example, absl-py needs to be in a version that is above or equal to 0.7.0, and this needs to be resolved. So the resolver needs to find what versions satisfy this version range specification. To this date, there are, I think, six versions of absl-py published on PyPI that satisfy this version range specification. For another dependency it's easier, because there's just one version that satisfies a version equality. For NumPy, on the other hand, the version range specification for NumPy is satisfied by 21 releases of NumPy, so things are getting worse. And we would continue with all 23 dependencies that are direct dependencies of TensorFlow. These are direct dependencies, and these dependencies can also have, and usually have, dependencies, meaning TensorFlow has some transitive dependencies that need to be installed in order to run the TensorFlow stack.
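The candidate-finding step described above can be sketched in a few lines. The release list below is hypothetical, and the toy parser only handles plain `X.Y.Z` versions (no pre-releases or epochs, which real resolvers must handle):

```python
def parse_version(v):
    # Toy parser: handles plain "X.Y.Z" versions only.
    return tuple(int(part) for part in v.split("."))


# Hypothetical list of absl-py releases available on an index.
available = ["0.5.0", "0.6.1", "0.7.0", "0.7.1", "0.8.0", "0.9.0", "1.0.0"]

# TensorFlow 2.2.0 states absl-py>=0.7.0; the resolver keeps every
# release satisfying the range -- each one is a candidate to explore.
candidates = [v for v in available
              if parse_version(v) >= parse_version("0.7.0")]
print(candidates)  # ['0.7.0', '0.7.1', '0.8.0', '0.9.0', '1.0.0']
```

Each surviving candidate becomes a branch in the dependency graph, which is why a range satisfied by 21 NumPy releases fans out so quickly.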
So for example, SciPy will also have dependencies, NumPy itself will have some dependencies, and we could continue, and we would see a quite large dependency graph. This dependency graph can grow over time. So for example, in the case of absl-py, the stated version range is above or equal to 0.7.0, and this open range can lead to problems: if the maintainers of absl-py, who are unaware that TensorFlow uses absl-py with some version range specification, release new software, for example a new absl-py in version 1.0.0, there might be incompatible API changes that will break all the installations of TensorFlow in version 2.2.0, if the resolver resolves absl-py to the latest released version. This can apply even if there is only a new minor release. In that case, the software is untested with TensorFlow, assuming TensorFlow has a good test suite. So again, we can end up with broken software, and all it took was a new release of one single library in the whole dependency graph. If we consider all the direct dependencies of TensorFlow and compute the number of combinations in which we can install this software, we end up with a number on the order of 10 to the power of 13. This number is quite large, and it can grow pretty rapidly. Also consider that we took only direct dependencies; these direct dependencies have transitive dependencies, so this number grows. And this number is valid only to this date. So if there are new releases, as in the case of absl-py, this number simply grows. We considered only releases that are available on PyPI in this case. There are other package indices that can be used in the resolution process; again, this number grows. These indices can have patched versions of some libraries with bug fixes and stuff like that. We also considered only direct dependencies of TensorFlow. So if you use TensorFlow together with another application or another package, for example Flask, this number grows again.
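The combinatorial blow-up is just a product of candidate counts: one candidate must be picked per dependency, so the counts multiply. A sketch with hypothetical counts (the package names are from the talk, the numbers are illustrative):

```python
from math import prod

# Hypothetical candidate counts for a few of TensorFlow's direct
# dependencies: the number of releases satisfying each version range.
candidate_counts = {
    "absl-py": 6,
    "numpy": 21,
    "six": 14,
    "wheel": 12,
    "protobuf": 30,
}

# Every combination of one candidate per dependency is a distinct way
# to install the stack, so the counts multiply.
combinations = prod(candidate_counts.values())
print(combinations)  # 635040 for just these five dependencies
```

Five dependencies already yield over half a million combinations; with all 23 direct dependencies, plus transitive ones, an order of 10^13 is unsurprising.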
And the dependency graph can be quite extensive and can become quite complex. So if you take a look at dependency resolution in Python, we already know this dependency resolution is dependent on the environment. The example was enum34: there was an environment marker that stated when the given dependency should be installed into the stack. It might be even more complicated, because these dependencies might be stated in setup.py, and the final list of dependencies that should be installed with the software can be created dynamically on package installation. That's why we created a tool that is called Thoth's solver, and this solver can inspect Python packages and check what dependencies a given library has. This solver runs inside specific environments, so we have a solver that is specific to RHEL 8 running Python 3.6 and another solver that is specific to Fedora 32 running Python 3.8. In these environments, some packages can have different dependencies, so we really need to inspect how the given package behaves in these environments. In other words, we invest CPU time and pre-compute dependencies for the dependency graph construction that we will do later on in this talk. I would recommend an article that was published by Dustin Ingram called "Why PyPI Doesn't Know Your Projects' Dependencies"; it discusses this topic. We also created another article in our blog series that basically describes why PyPI doesn't know Python dependencies, but Thoth's solver does. Let's take a look at existing solutions and their pros and cons for resolving software stacks. I took a listing of resolvers from packaging.python.org, and you are probably familiar with pip, which is the recommended tool for installing Python packages. Then there's Pipenv, which also manages the virtual environment and manages the lock file that is kept in sync with the packages and Python modules that you installed.
Then there is pip-tools, which creates a locking mechanism on top of requirements.in and requirements.txt; that is again another format for describing dependencies. There's also a community effort similar to Pipenv that is called Poetry, and Poetry tries to manage not only the virtual environment but also the life cycle of a module that is released to PyPI or another Python package index. Pipenv does not have this assumption that the modules it manages will be released to some Python package index. There's also micropipenv. This tool was developed in Project Thoth, and it does not actually implement any resolver; as its name already states, it provides a lightweight tool that can be used to install dependencies managed by pip-tools, Pipenv, Poetry, or raw requirements.txt files. We use this tool to install dependencies in containerized environments, such as OpenShift S2I builds. Why do we need another solution? Why do we need another resolver that needs to be implemented, and what's the story behind it? All these tools that implement a resolver try to install the latest software, and what if the latest software is not the greatest software? What if that software is not the software that I would like to use? If you recall that example with the hypothetically released absl-py version 1.0.0 that broke our software stack: these things happen, and it's quite an issue, because it breaks the application, and one or two years later you can end up with software that you cannot run, because you don't know what dependencies should be installed in order to run your software. So let's create a tool that installs packages that are actually installable into your environment. That's the first assumption: there are no installation issues, the packages are correctly packaged, so the Python distributions can be installed into your environment.
Then we want to install software that runs in the given environment. So if we are running Python 3, we don't want libraries that are not Python 3 compliant, for example, or if there are some big changes in Python across releases, we want to run software that respects these changes. Then we want to install software that runs correctly, so there are no bugs on the application level. We want to install software that performs well: if we are running TensorFlow and we have the AVX2 instruction set available on our CPU, we want to install a TensorFlow that is optimized for our CPU with the AVX2 instruction set. That's one example. And at the same time, we want to install software that is not prone to vulnerabilities; CVEs are one example. If I'm running some production components, I would like to be sure that they are not vulnerable and are secure to run. Okay, so let's move on and define a way to find high-quality software, and we will discuss the Monte Carlo tree search algorithm. Before we go deeper into that algorithm, I would like to say why we use it. It was not a decision made from one day to another that we would use this algorithm; we experimented with different approaches that ended up with different results, so we had quite an extensive journey before we went with Monte Carlo tree search. The very first effort that we made was performing computations directly on the dependency graph. This idea is described more in the linked talk; basically, we loaded the whole dependency graph and we tried to adjust it in a way that the resolution process finds the best candidate packages for the resulting software stack, and so comes up with high-quality software. However, this approach was not good, because it required a lot of queries to the database, and it was basically the program querying the database. Then we tried to optimize it using numerical optimization with reinforcement learning, where we used a neural network, but again that solution was not scalable
and not nice; it did not resolve anything real-world. Then we came up with adaptive simulated annealing, and that was successful. The solution was lazy, so the queries to the database were really only the queries that were required as the state space was sampled during simulated annealing. This approach is documented and described in the linked talk. And then we came up with the idea of using reinforcement learning. There were two main approaches implemented: the first one was temporal difference learning, and finally we compared it to Monte Carlo tree search, and Monte Carlo tree search is the algorithm that we use as of now for resolving software stacks. Why, I will say in a few minutes. So what is Monte Carlo tree search? Wikipedia says something like: in computer science, Monte Carlo tree search is a heuristic search algorithm for some kinds of decision processes, most notably those employed in software that plays board games; in that context, Monte Carlo tree search is used to solve the game tree. Okay, so we are playing some games, but not really: in the resolution process there is no real opponent when we want to resolve software stacks. So at first the idea looked shady, but with some adjustments to the Monte Carlo tree search algorithm we were able to apply the principles that are in Monte Carlo tree search, and we created a variation of Monte Carlo tree search which uses adaptive simulated annealing for balancing exploration and exploitation. I will talk about this in a minute as well. The opponent that we created is an imaginary opponent, because we are not really playing any board games, and this imaginary opponent is CPU time, or the time needed to satisfy a user request. We wanted to have our recommendation engine as responsive as possible, so if a user comes to us and asks what software he or she should use when running TensorFlow, we wanted to give recommendations in a reasonable time, and that's the opponent we are playing against. We know already that the state space of all the possibilities, all the
packages that are possible or available in the dependency graph, that space is too large, so we need some heuristics for how to browse the dependency graph and how to behave in it. If you are familiar with Monte Carlo tree search or model-free reinforcement learning methods, you probably know the Markov decision process. I will talk about this process, but basically we model the resolution process as an MDP. That means we are looking for software that has high quality, so we want to solve the MDP by accumulating high reward, and the software stacks with the highest possible cumulative reward are the software stacks that we are looking for, the software stacks that have high quality. We will dive into this idea as well. It's worth noting here that Monte Carlo tree search is one type of predictor in our component, in our recommendation engine that is called the adviser, and the predictor is an abstraction that we've created in the implementation. So there are also other abstractions. You see the predictor that is the implementation of Monte Carlo tree search that we have adjusted using adaptive simulated annealing principles, and then we have other predictors for resolving the latest software stacks, a predictor that uses temporal difference learning, or another predictor that uses adaptive simulated annealing. And there are also other predictors that we've experimented with, for example a predictor that always tries to resolve one package to compare the results in the stack. So for example, if you want to test TensorFlow with different versions of NumPy, we can plug in a predictor that can narrow the resolution process to the desired set of software stacks much faster. So this is the idea of the predictor abstraction, and in the slides you will also see other abstractions, such as the resolver, which abstracts the resolution process and complies with Python standards, so the resolution process resolves software stacks that respect version range specifications and also environment markers in the whole resolution process, and then
the predictor acts as a guide who helps the resolver to resolve software stacks. There is also another abstraction, the scoring pipeline, and I will talk about it later on. So, Markov decision process: again we will check a few sentences from Wikipedia, and then we will dive into examples to understand the idea and the adjustments we've made more deeply. In mathematics, a Markov decision process is a discrete-time stochastic control process. It provides a mathematical framework for modeling decision making in situations where outcomes are partly random and partly under the control of a decision maker. MDPs are useful for studying optimization problems solved via dynamic programming and reinforcement learning. So this is the definition that was obtained from Wikipedia, but we will take a look at an MDP that was adjusted to our needs, adjusted so that it could be used in the resolution process. The resolver that is implemented in Thoth is using a list or set of states; this abstraction is called a beam, so it's a beam of states, and these states are partially resolved. They do not state all the dependencies that form a final stack; they keep just a few resolved packages, unresolved packages, and packages that are about to be resolved. So each state has two sets: one corresponds to packages that are resolved, and another set holds unresolved dependencies. If a state holds no unresolved dependencies, it means it's a final state, and all the dependencies that are present in the resolved set form the final stack that is resolved. If the resolved set has no dependencies, that means we have an initial state, from where we start by expanding unresolved dependencies, in which case the unresolved dependencies are the direct dependencies of our application. So the whole Markov decision process is basically resolving the dependency graph. This resolution respects the dependency specifications as stated before, and there are actions that are taken. So I've already mentioned
that we have a resolved set and an unresolved set of dependencies, and if we move a dependency from the unresolved set to the resolved set, then this is called an action in the MDP; in the resolver implementation it's called a resolver step. By doing so, by moving an unresolved dependency to the resolved dependency set, we obtain some reward signal that can be positive or negative, meaning if including a given package in the stack is something we want to do, we have a positive signal, and if it has some drawbacks, such as the given package having vulnerabilities, including it in the software stack can have a negative impact, and in that case the signal is negative. After the resolution process is done, we backpropagate information about the rewards computed and also information about the accumulated reward; that is the case for Monte Carlo tree search. In the case of TD learning, the backpropagation is done immediately after a step is done; that's the only difference between Monte Carlo tree search and TD learning when it comes to implementation. Okay, so let's have a look at some details. Let's consider that we have some state Sn, and this state is formed out of a resolved dependency set and an unresolved dependency set, as stated before. In the resolved dependency set we have Flask in a specific version that is installed from PyPI.org, and in the unresolved dependency set we have dependencies that can be included in our software stack in specific versions. So I have options: if I want to include Click in my software stack, then I have two options, one is to install Click in version 6.7 and the other is to install Click 7.0 from PyPI.org. Which one? That's up to the predictor, which will tell the resolver which version to use. So we have this state Sn, resolved dependencies, unresolved dependencies, and we also have a score that corresponds to this state. And what do we want to do? In this case we are programmers, so we want to maximize this score; if we were mathematicians, we would minimize that score. But we want to maximize it. So this
state Sn corresponds to this node in the graph, and here are the options that we can take. We have this state Sn, and what we do is ask the predictor which dependency the resolver should choose, meaning which dependency should I choose to resolve this software stack. And the predictor, whether it's Monte Carlo tree search, TD learning, or adaptive simulated annealing, says that we should resolve Jinja2 in version 2.10.2 from PyPI.org. So let's do it. If we do it, then we need to adjust the state in a way that we no longer consider Jinja2 an unresolved dependency, and we move it to the resolved dependencies. This move, this action that was taken, gives us an immediate reward signal that is 0.2, so we adjust the score in the newly created state so that it includes the 0.2 reward signal that we obtained. Then, as we moved Jinja2 to the resolved dependencies, we also need to resolve the dependencies of Jinja2. So we ask the solver what the dependencies of Jinja2 are; in this case it says MarkupSafe in a version above or equal to 0.23 and Babel above or equal to 0.8, which need to be resolved in specific versions that are known and available. I skipped one part: the reward signal that we obtain can also be NaN or infinite. This is just an implementation detail: if the given transition would not be valid, we would obtain a reward signal that is NaN, and if the resolution process leads to a final state, then the reward signal would be infinite, and that would be propagated, so the predictor knows that expanding this dependency leads to a final state. So we take the dependencies obtained from resolving Jinja2 in the specific version and add them as new packages to the unresolved dependencies in state Sn+4, and we continue this process until we don't have any packages in the unresolved dependencies set. In that case we have the final state, which states a fully resolved software stack with some score that corresponds to the cumulative
reward signal. So this was one action performed in the Monte Carlo tree search resolution, and in the case of Monte Carlo tree search we continue this process until we reach a leaf, and then we backpropagate information about the score to the parent nodes, meaning the nodes that the final state was created from. So in the case of state Sn+4 that we computed, we propagate information about the score to the parent nodes, so the predictor can learn what the resolution process looks like and how the packages, or how these software stacks, are scored in the whole state space. As I already said, we had to adjust Monte Carlo tree search, and that adjustment lies in balancing exploration and exploitation. These are two terms: we want to observe the state space, how it behaves when we resolve certain packages; that is exploration, where we learn how software stacks should be resolved in order to come up with a set of packages that form the final stack with a very high final score. Then what we do is exploitation: the predictor has learned during the resolution process how to resolve software stacks, so it knows which packages should be resolved in order to maximize the reward signal. It's good to say here that, in contrast to board games, most of the time we don't end up with the same state, the same node, during the gameplay or resolution. So what we do is average how a state behaves when we add a certain package; on average, for example, if we add Jinja2 to our stack in the version that we saw, we obtain a reward of 0.2, and we keep this statistic in the predictor's history. To balance exploration and exploitation we use adaptive simulated annealing principles, in which case we have a temperature that is set to some high number and decreases over time, and the factors that form this adaptive simulated annealing are the number of stacks that are resolved, the number of iterations, the number of rounds done in the resolver, the number of actions taken, and so on. So we really limit the time spent in the resolution to come up with recommendations.
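The state and action described above can be sketched compactly. This is a toy model, not Thoth's implementation: the reward table stands in for the scoring pipeline, and the state shape is simplified:

```python
from dataclasses import dataclass


@dataclass
class State:
    resolved: dict     # name -> pinned version
    unresolved: dict   # name -> list of candidate versions
    score: float = 0.0  # cumulative reward accumulated so far


# Toy reward table; in the real resolver the scoring pipeline computes this.
REWARDS = {("jinja2", "2.10.2"): 0.2}


def take_action(state, name, version, new_dependencies):
    """One MDP action: move a package from unresolved to resolved.

    Returns a new state with the immediate reward added to the score
    and the chosen package's own dependencies queued as unresolved.
    """
    unresolved = {k: v for k, v in state.unresolved.items() if k != name}
    unresolved.update(new_dependencies)
    resolved = dict(state.resolved)
    resolved[name] = version
    reward = REWARDS.get((name, version), 0.0)
    return State(resolved, unresolved, state.score + reward)


s0 = State(resolved={"flask": "1.1.2"},
           unresolved={"jinja2": ["2.10.2", "2.11.1"]})
s1 = take_action(s0, "jinja2", "2.10.2",
                 {"markupsafe": ["1.1.1"], "babel": ["2.8.0"]})
print(s1.score)       # 0.2
print(s1.unresolved)  # MarkupSafe and Babel are now queued for resolution
```

A state whose `unresolved` dict is empty would be a final state; summing rewards along the way gives the cumulative score that the backpropagation step distributes to parent nodes.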
We also designed a random number generator that's called termial random, which helps us to narrow the resolution more closely to the latest software, but not always use the latest. So this was the idea of resolving software stacks in a really large state space, and as you've already seen, there are too many possibilities to be checked, and the number of combinations grows really significantly with each package added to the software stack. So we already know that there is no way to check all the possibilities in any real-world application, and so we created these heuristics to find a high-quality software stack as soon as possible; we found that reinforcement learning is the way to go to resolve high-quality software stacks. The last thing that I would like to discuss is the resolution pipeline. You already know that we are taking some actions in the MDP, and these actions correspond to steps. So we created an abstraction that's called the resolution pipeline, and this pipeline is created out of units. These units have their separate semantics: there is a unit that scores packages based on performance, one based on security, and stuff like that. The idea is that the latest software is not the greatest one, so let's plug in pipeline units that have some knowledge about software and can guide the predictor in the resolution by scoring and sending the reward signal during the resolution. So we have different pipeline units; based on their semantics, they do one thing and they do it properly, so they follow the Unix philosophy, and they form some kind of programmable interface to the resolution process. As we will see, these pipeline units are very straightforward to implement. The user of this recommendation engine, or the programmer of this scoring pipeline, does not need to know about all these MDPs and stuff like that; it's abstracted away. It's also important to state that these pipeline units are plugged into the resolution process dynamically; that means they are asked whether they should be included. I will show you in a minute how it works.
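The termial random generator mentioned above can be sketched as follows. This is our reading of the idea, assuming releases are sorted newest-first: an index is drawn with linearly decreasing probability, so the latest release is the most likely pick but older releases still get sampled:

```python
import random


def termial(n):
    # The "termial" of n: n + (n - 1) + ... + 1 = n * (n + 1) / 2.
    return n * (n + 1) // 2


def index_for(r, n):
    # Map a draw r in [0, termial(n)) to an index in [0, n):
    # the first n values map to index 0, the next n-1 to index 1, and
    # so on, so index i is chosen with probability (n - i) / termial(n).
    i, width = 0, n
    while r >= width:
        r -= width
        width -= 1
        i += 1
    return i


def termial_random(n):
    """Pick an index with linearly decreasing probability.

    With releases sorted newest-first, index 0 (the latest release)
    is the most likely choice, but older releases are still explored.
    """
    return index_for(random.randrange(termial(n)), n)
```

For five candidate releases, the indices 0 through 4 are chosen with weights 5, 4, 3, 2, 1 out of 15, which biases resolution toward recent software without pinning it to the very latest.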
So the resolver accepts a vector of information: for example, the direct dependencies, the software environment, the hardware that is used, libraries that are used together, some source code analysis, like what API calls are performed, and also the recommendation type, so whether a user wants secure or performant software. These are input vectors into the resolution pipeline that is created out of units, and then, together with the resolver and predictor, pinned-down software stacks are created, considering the knowledge that we have about these packages. So let's have a look at an example of a pipeline unit. This pipeline unit is called a step, and it scores TensorFlow builds that are optimized for AVX2-enabled CPU processors if the user uses an AVX2-enabled processor. There are two important methods: one is called run, and the other one is the should_include class method, and there is also a listing of AVX2 CPUs. So let's take a look at the implementation. The listing of CPUs is quite straightforward: it holds tuples of two integers that describe the CPU family and CPU model of processors that support AVX2, so these are basically constants. Then there is the should_include class method, which asks whether the unit should be included in the resolution process. It checks what recommendation type the user requested, and if the recommendation type is performance, or the user wants stable software, we include the pipeline unit. We also include it if and only if the user uses a CPU that is capable of the AVX2 instruction set; that's the only case when this pipeline unit is included. The semantics behind this class method are that if it returns None, the pipeline unit is not included, and if it returns a dictionary, then it is included. The dictionary has a special meaning; I don't want to go into details, but there is a reason behind returning a dictionary. Then there is another method called run, and this corresponds to the action that is performed in the MDP: we are in some state and we want to go into another state, and that other state is created by including
a package version into the current state. So the run method accepts two parameters, the state and the package version to be included in the state. If the package version to be included is TensorFlow and the included TensorFlow supports the AVX2 instruction set, then we propagate a reward signal that is positive, and we also show an informative message to the user. So we are coming to the end. This was a brief introduction to the resolution process that is using reinforcement learning. This was all done in Project Thoth, so if you are interested, feel free to reach out to us at thoth-station.ninja. We are part of the AI CoE in the office of the CTO, so you can also find us on Google Chat. The source code is hosted on GitHub in the thoth-station organization. We also have a Twitter account with a thoth-station handle; the handle has a dash in it, so be careful which account you follow. And we also have a YouTube channel where we regularly publish videos from our scrum sessions and also some interesting talks that we found. If you follow us, you will find more information about distributed environments, for example Argo Workflows, Tekton pipelines, CI/CD, all this cool stuff, like the performance of machine learning libraries and so on. So not just about this reinforcement learning algorithm, but also other things that might be interesting to you. With that, I would like to thank you for your attention, and hopefully see you next time. Thank you. Alright, thanks for that, Fridolín, it was really interesting. Looking at the schedule, we're pretty tight on time right now, so you can hop on video right now in case we have a question or two, but otherwise I'd recommend we move the conversation over to the breakout room. Okay, it seems like we're not really seeing any questions, so again, thanks a lot, Fridolín, have a great day.
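The step pipeline unit walked through in the talk might be sketched roughly as follows. The should_include and run method names follow the talk, but the context, state, and package_version shapes here are simplified assumptions, not thoth-adviser's real interfaces, and the CPU list and index URL check are purely illustrative:

```python
from typing import Optional

# Illustrative (CPU family, model) pairs; a real unit would ship its
# own list of AVX2-capable processors.
AVX2_CPUS = frozenset({(6, 61), (6, 63), (6, 79)})


class TensorFlowAVX2Step:
    """Sketch of a "step" pipeline unit scoring AVX2-optimized TensorFlow."""

    @classmethod
    def should_include(cls, context) -> Optional[dict]:
        # Register the unit only for performance/stable recommendations
        # on an AVX2-capable CPU; returning None keeps the unit out of
        # the pipeline, returning a dict includes it with that config.
        wants = context["recommendation_type"] in ("performance", "stable")
        cpu = (context["cpu_family"], context["cpu_model"])
        if wants and cpu in AVX2_CPUS:
            return {}  # include with the default configuration
        return None

    def run(self, state, package_version):
        # Corresponds to an MDP action: score adding package_version to
        # the (partially resolved) state; state is unused in this sketch.
        name, version, index_url = package_version
        if name == "tensorflow" and "avx2" in index_url:
            # Positive reward signal plus a justification for the user.
            return 0.2, [{"message": "AVX2-optimized TensorFlow build"}]
        return 0.0, None


context = {"recommendation_type": "performance",
           "cpu_family": 6, "cpu_model": 61}
print(TensorFlowAVX2Step.should_include(context))  # {} -> unit is included
```

The unit does one thing: it never sees the MDP machinery, it only answers whether it applies to this resolution and how much reward a candidate package deserves.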