Hello, I'm Chris. Hi, I know you. Do we do this for everybody? I'm going to introduce myself; you're not going to be introduced. This is Francesco Murdaca. Hello, Antonio. How are you? So, yes, up there: Francesco and I work for the AI Center of Excellence, which is part of the Office of the CTO. We are Red Hatters. We are not affiliated with any of our products, but we try to influence these products and we try to come up with a few new ideas. Today we're going to talk about what machine learning might do for developers, for people who are writing actual code, and what we can get out of source code, meaning code that people have actually written. We are not at the point where all these machine learning things generate new code for you, so you are still needed. We will need you in the future, that is what I want to say. I've been at Red Hat for quite some time. I've been in other organizations, which might have bought Red Hat for a few million dollars. I've been an infrastructure guy, I've been a development guy, and I'm pretty high level. That is why I've become a manager: because I don't know how to do stuff, they moved me away. Francesco is very different. Francesco has been at the European Space Agency. He knows how to put stuff in the atmosphere and in outer space. That is why we hired him, for at least some competencies. I have no stickers, so we depend on your participation. No stickers, but participation. What I can offer you is that I stop by and shake your hand if you participate, right? And some people are not allowed to answer. So Dave is allowed; let's figure it out. Frido is also not allowed to answer. We're taking care of software stacks with our group. We want to help developers somehow. I don't know if you have ever written software, but if you have done that,
I'm pretty sure you somehow depend on other software, because software systems are getting more and more complex. If you're in the Node.js universe, you basically include another module to do your hello-world stuff, right? Because it's simply possible. I don't know who came up with that idea, but Node.js is a very complex ecosystem. We originated in the Python ecosystem, so it's not that complex, but still complex. I cannot comment on Java or Golang; they are also very complex. So, who of you is a developer? Very nice. You are our target audience. And who of you has actually used some dependencies, and how did you choose these dependencies? How did you figure out what dependency to use to implement your stuff? Say it again? So you read the README. Don't laugh at it; it's a good approach to figure out what's happening, right? That's what we do, because we want to figure out: is it fitting my needs, is it good, and what about the alternatives? I guess you most often find alternatives for your requirements. Is there a different approach to reading the README? Okay: send it through CI and figure out if it's fitting your needs, right? Good approach, agreed. What if things get faster and faster, so you have more and more READMEs to be read? We are human beings, that's a problem, right? And what if things get regulated? Some of us are very lucky because we work in an open world. Some of us may work in a financial or pharmaceutical company, or in a defense company. They are most often not very open, so things get regulated. So what if all the READMEs you read are good, but you're not allowed to use the software? It's a thing, it's a real thing. What if the software package you use is very critical? I mean, you have sent stuff to space; is that critical? Yes.
Yes, okay, it depends on the point of view. I've refused to work for aviation because it's critical, because people are going to die if you make a mistake. But in the end it is really like this: you research all these frameworks, you read the README, you test all that stuff, you send it through CI and figure out if it's still working, you look at the build logs, and you try to figure out what is happening with this software. I think you said it: what if the project is slowly dying? Is it still maintained? Stuff like that. But I think we can also agree that this is not really a feasible approach, because it's very limited. You need to read all the READMEs, so you're constrained by space and time, and if you research more and have to read more READMEs, I'm pretty sure you either code less or work more than 40 hours per week, which is also not good. You could also do the thing that we do in management: we ask others to do our work, right? Yes. So I scored the slot for the presentation and Francesco did all the work. Manager. But we are talking about software developers. There's a lot of mundane work, right? Keeping stuff up to date: you have read the README, you sent it through CI, you figured out the project is still alive, and maybe in the near future it's still alive, so you've got to keep all your software up to date. There are a lot of updates happening. Again, the Python universe is a good example. I don't know about Java. It's constantly updated, so your beloved package is going to be updated each and every other week. You could ask others to identify drawbacks of the thing that you came up with. If you've chosen some framework, maybe you talk to Francesco because he knows space; maybe he's a good source of information for you. Or you could simply ask a different person to give a little bit of advice on your source code, right?
So you are using a dependency, but you don't know if you're using it in the correct way, and that is exactly what we want to do for developers. We want more cyborgs, more bots that are doing your work, so that you can spend more time on actual coding, right? We want to get rid of all the boring, mundane stuff and just focus on creating fancy algorithms which do fancy things for us. Less mundane work, more coding. That is what we want to achieve with something we call Project Thoth. We really want to create a pretty huge knowledge graph, or knowledge database, or knowledge base, or whatever you call that thing. We want to aggregate a lot of information by reading READMEs, by sending stuff through CI/CD pipelines, by figuring out how the code that we are using behaves. We want to store all these observations that we have on software stacks in a graph. Dammit, it's a database. We want to store all these observations that we have made in a knowledge graph, and we want to enable developers to access that information, either by using bots which actively act on your source code or by querying our database, and give a little bit of good advice, right? So our goals are really... oh, by the way, we can talk for about 345 minutes, so whenever you have a question, just throw it in, because I don't know if we will have time at the end. Okay, go ahead. Yeah, I don't know if it's called Open Hub. There is a service by Black Duck which tries to assess open source products, and which tries to assess especially security-related information and license-related information.
I can't remember if they go in the direction of project health, for example. Yes, exactly. So for us, the problem with websites like these, which provide a lot of information, is the same as with the README. It's a fancy README, but you have to read it, you have to make decisions, and you need to ask others what they think about the information. So it's just pushing the problem a little bit away from you, but from my point of view not really solving the problem of scale. So what are we going to do with Project Thoth? First of all, we want to deliver optimized AI stacks. I said that we want to aggregate all that knowledge about software stacks; we want to use that knowledge to optimize important AI stacks. As you can imagine, we are the AI Center of Excellence, so we are very interested in AI stacks. Our primary ecosystem is Python, because most data scientists nowadays work in Python. I know there's R and stuff like that, but most of them work in Python. We are interested in frameworks like TensorFlow, PyTorch, and so on. Delivering optimized versions of those is one of our primary goals. That is basically applying our knowledge to a few open source software projects we care about. The second thing is that we want to create a prototype which is able to deliver all the information that we have aggregated to developers. We call it an AI-backed guidance system. You can call it a recommendation system; you can have many names for it. The point here is that we want to open the information that we aggregate to developers and integrate it into developers' processes. By the way, I have a clicker, but I don't use it. Maybe you use it.
I like to walk around. Right, the third goal for us is that we really want to automate and advance the analytics we are doing right now, come up with new ideas for this software, push them through the system that we have anyway (the CI/CD pipelines, the observations that we generate), and try to figure out if we can have some bots doing the optimizations for us. When people tried to optimize TensorFlow, it took a human being about a year to increase the performance a little bit, something like five, eight, ten percent, right? So it's very expensive to optimize stuff like that if you use a human being. We really want some AI bots doing that stuff. Good. So what we really want to do is increase the velocity of the developer. We want these guys to work faster, faster, faster by getting rid of all the boring stuff. Yes, you need to deliver stuff. I'm a manager; that's what I say: faster. You can also translate that into more freedom, right? If you don't have to do the boring stuff, like the 15th dependency update today, that's a little bit of freedom you can get from us. So now I'm going to flip through this because I screwed up the transition. You did the slides? Yes, I did the slides. I'm a manager, I do that. Francesco will go a little bit into the architecture and all the technical details of what we do, right? Right. Okay, go ahead. Are you using the clicker? Yeah, it's easier. It's not working. Managers can flip slides; I can do it for you. Sorry. An example of what we do? Yes, I think we come to examples later on, right? We go a little bit into the details of what we do and how we do it, and examples are attached.
So give us about 75 minutes and we're going to... Okay. So this is the architectural overview of Project Thoth. Project Thoth is deployed in different namespaces on OpenShift, and each of these namespaces has specific services that we use to gather different types of observations and to perform the analyses that create these observations. There are different kinds of APIs to talk to Thoth for different services, and these APIs are not used just by humans; most of the time they are used by bots, because we like to automate most of the things we do. I will go a little bit more into the details of the types of observations we have and give a general overview of the recommender system that we created. Actually, Frido created it; he implemented the adviser pipeline. Then I will show you some examples of how we integrate Thoth into the day-to-day life of every developer. So these are some of the observations that we collect. We plan to increase the number of observations, but as you can see, we focus, as we said, on the Python ecosystem, and we consider the whole software stack: starting from the application level, the direct dependencies of that application, the transitive dependencies that come from using these dependencies, and then we go to the Python interpreter, the native dependencies, and the operating system that sits on top of the hardware. In order to collect these different types of observations, we use different services.
So first we have what is called the Thoth solver. The solver takes each Python package from PyPI and basically tries to build and install it on different types of solvers, that is, different Fedora versions and different RHEL versions. In this way we can get all the different dependencies and store them in our database, so later on we don't need to re-download them; we can do the dependency resolution directly offline. We can also find out, for example, if there are errors. When we try to install some package, we can see if the solver fails, and we collect these analyses and logs as well. We are currently also working with one of our interns on classifying these types of errors, so we can also give some kind of recommendation about why the solver failed. Then performance. As we said, we focus on the Python ecosystem, and nowadays Python is used especially for machine learning and the whole AI field. What we want to do is obtain observations on the performance of a specific ML application. For example, we started by analyzing different types of algorithms, and we try to decompose these algorithms in order to identify some specific backbones of each algorithm. We created what we call performance indicators, which are basically scripts that we run in order to collect performance observations that we can then reuse in the system. In order to do that, we have these two components. Dependency Monkey is basically doing a monkey job.
So we start by providing a software stack, for example for one version of TensorFlow with all its transitive dependencies, and Dependency Monkey is going to create all the possible combinations of that software stack. Then it feeds them to the Amun API, which is the one that actually inspects the software stacks and provides us with the performance observations. We don't collect observations only at the level of the application, but also at the level of the Python interpreter. We have two performance indicators there, one using PyBench and one using pyperformance. So we include performance not just at the level of the application; we also go into the different levels of the software stack. Then of course we want to include the issues that could arise from ABI incompatibilities, where you are able to install the stack but then you cannot run the actual application. And of course also security vulnerabilities: we have a specific job that takes all the available CVEs and stores them in our database. So has anybody tried to figure out why PyArrow in the latest version and TensorFlow in version 1.14 do not work together in one installation, in one application? No? Okay, we did. We have bots for that, and it's very tricky, because it comes down to symbols exported by two different versions of a library that end up being used together. So I guess if you run into that situation, you're going to spend a lot of time figuring that out unless you're some kind of hacker or a library developer. It took us some time. Then this is another project that we are doing with one of our interns. As we were saying before, how do we choose a package?
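As a side note on the Dependency Monkey step described above: generating "all the possible combinations" of a stack can be pictured as a Cartesian product over the candidate versions of each package. The following is only a minimal sketch under that assumption, not Thoth's actual implementation; the `candidates` data and the `enumerate_stacks` helper are hypothetical names made up for illustration.

```python
from itertools import product

# Hypothetical candidate versions per package. In the system described in the
# talk, this information would come from the solver results in the database.
candidates = {
    "tensorflow": ["1.14.0"],
    "numpy": ["1.16.4", "1.17.0"],
    "six": ["1.11.0", "1.12.0"],
}


def enumerate_stacks(candidates, limit=None):
    """Yield fully pinned stacks (package -> version), one per combination."""
    names = sorted(candidates)
    combos = product(*(candidates[name] for name in names))
    for count, versions in enumerate(combos):
        if limit is not None and count >= limit:
            return  # stop once the caller-imposed limit is reached
        yield dict(zip(names, versions))


stacks = list(enumerate_stacks(candidates))
for stack in stacks:
    print(stack)
```

Each generated stack would then be handed to an inspection service to be built and measured. The number of combinations grows multiplicatively with every package, which is why a `limit` parameter is included in the sketch.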
So you want to know if it's a popular package and if it's well maintained. You want to know the health of that project in order to say: okay, I'm going to use it, because I'm sure that it's going to be well maintained and it's going to be there for many, many years. In order to do that, we analyze source code metrics from different software development platforms. We started with GitHub: we take a specific package and we collect the pull requests, the issues, and all the interactions between the contributors, and from all this data we try to define some source code meta-information that can be used later on to maybe predict whether the package is going to be popular or whether it's going to be abandoned. We want to provide this kind of recommendation to the user as well. For example, we collect the time to review and the time to close an issue. If someone opens a pull request or a feature request, you want to know how much time it takes to deliver that feature. One example is given by the size of the pull request. This is maybe an expected result, but the bigger the pull request, the longer it takes to review, and this could lead to a longer time to deliver the feature. So we can provide this kind of analytics and use it for other recommendations for the users. And so, as we asked at the beginning: how do we choose this dependency?
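To make the project health metrics above more concrete, here is a small sketch of how a time-to-close figure could be computed and split by pull request size. The records, field names, and the 100-line threshold are all hypothetical; the talk does not say how these metrics are actually computed.

```python
from datetime import datetime
from statistics import median

# Hypothetical pull request records; real data would be collected from a
# platform such as GitHub.
pull_requests = [
    {"lines_changed": 12, "opened": datetime(2020, 1, 6, 9), "closed": datetime(2020, 1, 6, 15)},
    {"lines_changed": 40, "opened": datetime(2020, 1, 7, 9), "closed": datetime(2020, 1, 8, 9)},
    {"lines_changed": 300, "opened": datetime(2020, 1, 8, 9), "closed": datetime(2020, 1, 11, 9)},
    {"lines_changed": 850, "opened": datetime(2020, 1, 13, 9), "closed": datetime(2020, 1, 18, 9)},
]


def hours_to_close(pr):
    """Time between opening and closing a pull request, in hours."""
    return (pr["closed"] - pr["opened"]).total_seconds() / 3600


def median_hours_by_size(prs, threshold=100):
    """Median time to close for small and large pull requests separately."""
    small = [hours_to_close(p) for p in prs if p["lines_changed"] < threshold]
    large = [hours_to_close(p) for p in prs if p["lines_changed"] >= threshold]
    return {
        "small": median(small) if small else None,
        "large": median(large) if large else None,
    }


summary = median_hours_by_size(pull_requests)
print(summary)
```

With this made-up data, the large pull requests take noticeably longer to close than the small ones, which is the kind of relationship mentioned in the talk.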
As you understood, we like bots, so what we're going to do is let the machine decide. We have all this knowledge and we want to use it to recommend which of the stacks should be used, for example if you have performance requirements or if you want a very secure stack without vulnerabilities. We can automate all of that; you don't need to do it, because the bots are going to do it for you. So let's look at a high level at what the recommendation system in Thoth is. The component that actually provides the recommendation is called the adviser, and it's based on reinforcement learning. If you are familiar with the three main classes of machine learning: you have supervised learning, where you have a fixed dataset of examples which have inputs and outputs, so they are labeled, and the system learns from these examples. In other cases you have datasets which do not have any labels, and you want to discover whether there are some patterns, some commonalities within the data. In that case you rely on unsupervised learning. And there's the third family of machine learning problems, which is reinforcement learning. In this case the system makes a sequence of decisions, so it goes through a trial-and-error cycle in order to learn the best policy to solve a specific problem. Just to give you an example: everyone knows Pac-Man, I guess. Reinforcement learning is used a lot in games. Not only in this field, but it is used a lot in this field. The one that decides what the next action is we call the agent; in this case it's Pac-Man.
So Pac-Man lives in an environment, which is the game environment, so you have the position of Pac-Man and the positions of the enemies. The agent takes an action and receives from the environment a new state, so a new position of Pac-Man and of the enemies, and a reward. The goal of the agent is of course to maximize these rewards, not immediately but over the long term. In the end you want to accumulate as much reward as possible in order to finish the game and, well, not die, let's say. Now let's draw a parallel with what we are trying to do. In our case we want to resolve dependencies. All the dependencies that we want to resolve create our space of possibilities. The resolver starts, for example, with one specific version of TensorFlow and looks at all the transitive dependencies that exist in our space. The state in this case gives you the dependencies that have been resolved plus the dependencies that still need to be resolved, and the set of actions is basically which dependency I take next in order to move forward and reach the final state, which is the fully pinned-down software stack. What you want to do is, as I said, maximize the cumulative reward, and each state you pass through also gives you a reward on the way to that final goal. Let's go a little bit more into the details of the resolver. The resolver respects the whole Python ecosystem, so it still follows the pip and Pipenv standards, and it works offline, because we have already collected all the dependencies in the database.
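The state, action, and reward framing described above can be sketched in a few lines. This is a toy model and an assumption-laden illustration, not the adviser's code: the `VERSIONS` and `SCORES` tables are invented stand-ins for real observations, and the `greedy` policy is a simple placeholder for whatever learned policy guides the real resolver.

```python
import random

# Hypothetical data: candidate versions and a made-up score per pinned version,
# standing in for observations such as performance or known issues.
VERSIONS = {
    "tensorflow": ["1.14.0"],
    "numpy": ["1.16.4", "1.17.0"],
    "six": ["1.11.0", "1.12.0"],
}
SCORES = {
    ("tensorflow", "1.14.0"): 1.0,
    ("numpy", "1.16.4"): 0.2,
    ("numpy", "1.17.0"): 0.8,
    ("six", "1.11.0"): -0.5,  # e.g. penalised because of a known problem
    ("six", "1.12.0"): 0.3,
}


def initial_state():
    """State = (packages already pinned, packages still to be resolved)."""
    return {}, list(VERSIONS)


def actions(state):
    """Possible actions: pick one version of the next unresolved package."""
    _, unresolved = state
    if not unresolved:
        return []  # final state: the stack is fully pinned down
    name = unresolved[0]
    return [(name, version) for version in VERSIONS[name]]


def step(state, action):
    """Apply an action and return the new state and its reward."""
    resolved, unresolved = state
    name, version = action
    return ({**resolved, name: version}, unresolved[1:]), SCORES[action]


def rollout(policy):
    """Run one episode and return the pinned stack and cumulative reward."""
    state, total = initial_state(), 0.0
    while actions(state):
        state, reward = step(state, policy(actions(state)))
        total += reward
    return state[0], total


def greedy(available):
    """Placeholder policy: always take the best-scoring action."""
    return max(available, key=SCORES.get)


stack, total = rollout(greedy)
random_stack, random_total = rollout(random.choice)
print(stack, total)
```

Comparing the greedy rollout with a random one shows why a good policy matters: both reach a fully pinned stack, but the cumulative reward differs depending on the choices made along the way.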
So we don't need to re-download them, which lets us provide the result as fast as possible. We also have new types of observations that we can include in the final decision on the software stack, for example performance or whether there are CVEs, so we give new kinds of hints to the developer about the final software stack that is going to be provided. The resolver, or our adviser in general, has a pipeline with different steps and sieves in order to decide which final software stack is to be provided. The resolver, as you can imagine, has to span a very large number of dependencies, so it uses another component called the predictor, which is basically the one that tells it the best policy for reaching the final software stack in the fastest way. The predictor has two main goals. It guides the resolver to obtain, in the fastest way, the final software stack to be provided to the user, and at the same time it can see if there are observations missing in the system and use that to basically trigger Dependency Monkey and Amun to collect those observations. It's currently under development; temporal difference learning is already implemented. If you have more precise questions, Frido is also here to answer. Yes? So you ask if Python is the only ecosystem that we are currently using. The answer is yes, but we don't want to focus only on it in the future. Yes, of course. Python is a very large and popular ecosystem for machine learning, and we are the AI Center of Excellence, so that's a natural choice. The platform, the many fancy boxes you have seen that we have created, is agnostic. If we swapped the solver that we have right now for a JavaScript resolver, npm for the Node.js ecosystem, we could do that.
Maybe we would need to change a few database structures, but yes, we are open to that. Yes, you can decide how many stacks can be, I'm sorry, how many stacks can be spanned by the resolver, so you can impose a limit on how many you want to consider. Other questions? Yes. So you're asking if we could include developer preferences in our decisions. Yes. The advice we give is created by a pipeline, and there are pipeline steps which add possible states and pipeline steps which remove possible states. So I think you are asking if we can have a pipeline that respects a certain set of criteria to remove packages from the resolution. Yeah, I mean, it's a weighted thing, right? You want to say: for whatever reason, I like those people over there at Oracle, and that is why I prefer their XML parsers, right? That is something that could be included. It's nothing that we can do right now, because we have focused on the observations that we made, but in our thinking it would be some kind of observation: ah, this guy loves Oracle. No, I'm making things up: this guy loves IBM for its XML parsers, so we prefer that a little bit more. That is something that we could include, yes, but we are not doing it right now. Right now we are pretty emotionless about your user preferences. We are looking at alternatives, so if we figure out that this is a package for machine learning and there's an alternative, we could take that into consideration. That's basically it, but the pipelines which create the advice are pretty flexible. Talk to Frido. We accept issues on GitHub as feature requests, and we accept pull requests on GitHub as feature implementations and user contributions. You want to know if we respect the licenses of the dependencies? No. Correct, so the question is whether we respect the licenses and compatibility issues between licenses in the complete dependency tree.
No, we are a little bit... There's a different project in Red Hat which is focusing very much on CVEs, so security, and on license compatibility. That is why we have chosen to say no, that's not the topic we are looking at, because some other people do it. We are really looking for observations like performance and ABI compatibility, and at how to resolve these things. But again, looking at the license, or figuring out what license is used in what package, would be an observation in our universe, and that observation could be evaluated by one of the steps in the advice pipeline. So we need to make that observation, to generate it. That is something that seems to be easy, because most often we can look at the license file in the repository and figure out what the license is. Creating that "has a compatibility issue with" or "is compatible with" relation is a little bit more problematic. I think right now I would throw a human being at that, because it's a limited set of licenses right now and we know them; at least we Red Hatters know them pretty well and we know what's happening. Is it something that you need to get solved? Yes. So for us Red Hatters it's most often more like a discussion on the philosophical level, because there are a few people who prefer Apache and a few people who prefer GPL. So it's more like a political discussion than a technical issue that we need to solve. Yes, but as we are in a political discussion, we leave it to the humans to figure all that stuff out. As I said, we are pretty emotionless with all that machine learning. Did I answer your question somehow?
I'm going to give you the... Yeah, we have a slide if you want to find where all the projects are. I'm going to get to it at the end, and then if you want I can also show you. Oh, yes. So in the pipeline we have steps and strides, and a stride basically acts as a filter, as long as we have this information inside the graph database. The CVEs that we store at the moment can be extended with other kinds of CVEs; if we have other sources we can include them, and they will of course be included every time in the resolution. So it's not limited; it's just that we need more observations. As we get more observations, the recommender is going to get smarter and smarter and give you a better recommendation. We are really opinionated about that: we create the pipelines, and we give the weight to these topics. We can create a lot of pipelines. If you think security first, yes, let's have a look at it. CVEs are very important for you. If you mix that with the project health observation, like "the project is slowly being abandoned", and with "well, there's no CVE in there because nobody cares about all that stuff", that is exactly the kind of observation which should eliminate a project, because it's dead anyway, and insecure. Talk to Frido again. We are happy about feedback like that, right?
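The idea of strides acting as filters, and of building different pipelines from them, can be illustrated with a short sketch. Everything here is hypothetical: the package names, the `KNOWN_CVES` and `ABANDONED` observation sets, and the function names are made up for illustration and are not the adviser's actual interfaces.

```python
# Hypothetical observations; in the system described in the talk these would
# be stored in the graph database.
KNOWN_CVES = {("libfoo", "1.0.0")}  # (package, version) pairs with a known CVE
ABANDONED = {"libbar"}              # packages flagged by project health metrics


def cve_stride(stack):
    """Accept a stack only if none of its pinned packages has a known CVE."""
    return not any(item in KNOWN_CVES for item in stack.items())


def project_health_stride(stack):
    """Accept a stack only if it contains no abandoned project."""
    return not any(name in ABANDONED for name in stack)


def run_strides(stacks, strides):
    """Keep the stacks that pass every stride in the pipeline."""
    return [s for s in stacks if all(stride(s) for stride in strides)]


candidate_stacks = [
    {"libfoo": "1.0.0", "libbaz": "2.1.0"},
    {"libfoo": "1.0.1", "libbaz": "2.1.0"},
    {"libfoo": "1.0.1", "libbar": "0.3.0"},
]

# A "security first" pipeline versus one that also considers project health.
security_first = run_strides(candidate_stacks, [cve_stride])
strict = run_strides(candidate_stacks, [cve_stride, project_health_stride])
print(security_first)
print(strict)
```

Because each stride is an independent predicate, adding a new kind of observation (licenses, user preferences) would amount to adding one more function to the list, which matches the flexibility the speakers describe.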
Yeah. Let's keep going and show the examples. So how do we actually help developers? Thoth as a recommendation system can keep your dependencies up to date. We currently have a bot that does that for us: it checks if the dependencies can be resolved and it opens pull requests with the updated dependencies for us. So we completely avoid doing that work ourselves and can spend our time writing other code, or maybe going out. It keeps the software stack secure: if that is what you're interested in, it is going to avoid all the CVEs for which we have observations in the database. If you are looking for performance, it is going to provide you with the most performant software stack for the specific type of application you want to use. We integrate, as I said, other types of source metadata that can give you an idea of whether the package is well maintained or is going to be abandoned, so we want to integrate this kind of recommendation as well. And in general it will be integrated into the day-to-day developer tools. Just to show you the examples we have at the moment: Thoth can be used from the command line. There is Thamos, which is basically the one that talks to Thoth directly, and you can use it from your command line. We have the Kebechet cyborg that creates a pull request when a dependency needs to be updated, so we don't need to look into updating it; the bot will do that for you. Thoth can be included in container builds, or source-to-image. We recently also created a GitHub app. And another one of our team members is working on including Thoth in Jupyter notebooks to help data scientists. Just to show you what has been done so far: there is a UI, an extension for Jupyter notebooks. We call it nbrequirements.
You can find it online. Basically, you choose which packages you want in your project, and you can do it very easily using this UI. On the back-end side, you have Thoth recommending the best software stack to use for your dependencies. And this is a very simple command: if you want to install Thamos, you can pip install thamos and run thamos advise if you want to receive advice. And thank you for all the suggestions that you gave us; if you have others, please just contact us. And if you were asking how to contact us: first of all, this is where we have all the code. On GitHub, look for thoth-station; that is our organization. We have several repositories in there. You can find us on Twitter, and we also have a website that shows what we are doing. And that's it. It seems German, but... sorry, say it again? So the question is: why that name that nobody can pronounce? So, we are not recorded and all that stuff is not going to be published on the internet, right? Somebody gave us the name Thoth; that is because of The Expanse. And we figured out we are going to have a lot of components. If you look at our GitHub repository, sorry, if you look at our GitHub organization, it is like 110 repositories or something. So we have a lot of components, and we need to name all these components. We figured out The Expanse is not giving us enough names, so we switched over to ancient Egyptian history. And I think what I'm most proud of is this one, the last line: the GitHub application called Kebechet, which is basically just a different name for the same old thing, and everybody hates it, but we stay in the picture. So thanks for the question.
It was good to talk about it. Any other questions, any other ideas? As I said before, we are open to all of your ideas. Our primary goal is really to help developers get more things done. Ten minutes? I need to talk for ten minutes now. Antonio. So your question is whether we can do, for example, automated dependency updates for Golang, or have all these evaluations, these observations, for Golang. I thought I answered that question before, so you might not have been listening, Antonio, but here's the answer again. Our whole platform is basically agnostic to the language, the ecosystem, that we are analyzing. There's a component called the solver, where we try to resolve the dependencies and make observations during that solving process. There's a component called the adviser, which gives the advice. These are obviously language specific: I need to know what a Python module version string looks like, and I know that you Go guys do it a little bit differently, that you don't like semantic versioning and all that. So there's obviously a language-specific component in here. As a rough guess, it's a medium-sized effort. It will take you longer than four months, I guess. We need to have a look at the database, which is pretty Python-centric right now. We need to have a look at the solvers and all that stuff. Python is, as I said, the primary focus because of AI, data scientists, and friends. Golang is maybe next, because it's very interesting. You said that Red Hat is investing a lot in Kubernetes and OpenShift; the whole ecosystem around Go is very interesting for us. If you want to join, I'm open. Any other questions? Not in five minutes, but yes, come meet us outside. Bring a coffee.
We're going to have a demo. Right now we have a bunch of documentation, so if you want to run through a Jupyter notebook and see how Thoth works, you can do that. If you look at our GitHub organization, you see a lot of pull requests which have been created by bots. Taking you through all the steps that happen while we are creating advice or a recommendation would take a little bit longer, but I think we can do that, right? So join us outside. Next? No. Yes, so that's a question for the DevConf organizers: are our slides online? Yes, they will be published, right? So if you're on sched.org and look for this talk, I think there will be a link from the talk announcement to the slides. And again... oh damn, I moved it away. Have a look at GitHub. I created a repository for that, so that we have 129 repositories, and we're going to put all the presentations in there. Ping us on Twitter or open a GitHub issue if you need these presentations and I forget about it. Another question, Mr. Neary? Yes. So you're asking if there's a way to query our database. Yes, there's that Thamos utility, the command-line thing mentioned in the first bullet point. That's basically a command-line tool which takes your repository from your local disk and sends it over to us, and we analyze it. That will work if you are inside the Red Hat VPN. If you are outside the Red Hat VPN, I'm really sorry; our IT security is not as fast as I want it to be, so we have no public service for now. But if you leave your name, I'm going to come back to you and you're the first alpha tester. Good, so you're safe. Any other questions? Thanks. Check it out. Let us know what you want.