staying so late for our talk. So we are going to show some interesting use cases we are solving at Red Hat: basically, how we are leveraging AI, or rather deep learning, to enhance developer productivity and confidence, because that is one of our key goals. Slides and code are available; you can check them out on GitHub. We are from the Red Hat Developer Tools team, and I'll let my co-speaker introduce himself before I get started. So yeah, I'm also a data scientist working in the Red Hat Developer Tools analytics team, and we'll cover two interesting use cases here. The overall plan: we'll cover what this dependency landscape is all about, the key objectives, and the business significance. We'll talk about the deep learning models we are using, go through some hands-on code so that you can see the kind of architecture we use and the kind of outputs we get, and then talk about the current results and our next steps. So the main thing here is that everyone here is a developer. Even if you're a data scientist, you are building or developing models, or you have been a developer at some point in your life. And the thing is, we don't code everything from scratch. We don't code neural network models from scratch; we leverage reusable components or libraries. That's the thing about dependencies: there are all these open source frameworks out there where people have worked hard on building these frameworks and making all these cool features available for us, which we can integrate into our own applications, let's say enterprise applications, and then build those applications with ease so that we don't reinvent the wheel. These are examples of dependencies, leveraging Google Cloud, deep learning, and so on. Now, if you look at the dependency landscape, there are so many options out there, and the number of dependencies keeps increasing exponentially year over year. The Node.js ecosystem is the most popular one out there, with JavaScript dependencies increasing exponentially, as you can see; then you have Maven, which is the Java-based ecosystem. All these dependencies increasing exponentially means more and better libraries are available to us, but it also makes it hard to do the due diligence on whether our code is secure and safe, because we are not the ones writing these libraries; we are the ones consuming them. So this is a case study on proactive security and vulnerability dependency analytics. What is the security landscape like? To understand that, we need to understand what these vulnerabilities are, because even I am not from the security domain, to be honest; my domain is data science. And a lot of you may not be security savvy either. So what are security vulnerabilities? They are basically bugs in source code that malicious hackers or attackers can exploit to gain unauthorized access to your system. Commonly they are termed CVEs, Common Vulnerabilities and Exposures. And there is actually an online database maintained by the US government, known as the NVD, the National Vulnerability Database, where CVEs are listed for each and every ecosystem.
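For instance, a quick way to do that kind of lookup programmatically is the NVD REST API. A minimal sketch, assuming the current v2.0 endpoint and response shape (this is our illustration, not something shown in the talk):

```python
# Look up CVEs for a package by keyword via the NVD REST API (v2.0).
import requests

resp = requests.get(
    "https://services.nvd.nist.gov/rest/json/cves/2.0",
    params={"keywordSearch": "tensorflow", "resultsPerPage": 5},
    timeout=30,
)
for item in resp.json().get("vulnerabilities", []):
    cve = item["cve"]
    description = cve["descriptions"][0]["value"]  # English summary first
    print(cve["id"], "-", description[:100])
```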
Like, if you search for TensorFlow there, you will find that there is a CVE ID, because of some bug in the code through which an attacker could get in and exploit it. Similarly, for the different third-party libraries which you are using, let's say, day in and day out, there could be potential vulnerabilities. And the idea is, we don't want these to affect our enterprise applications. If it's a personal project, fine; but if people are consuming our enterprise applications and these kinds of vulnerabilities are there, it becomes a problem. And if you see the trends over the last few years with regard to all the major ecosystems, around Maven, NPM, and Python, the number of security vulnerabilities is also increasing, because the number of dependencies is increasing: there was a 43% increase in vulnerabilities in 2017 and a 33% increase in 2018. And, unfortunately for us, we are working a lot on the Golang ecosystem, which is this one here, where public vulnerabilities are often missed: if people don't find out about a vulnerability and don't disclose it publicly, it will never be published as a CVE. That is what makes things difficult for more complex ecosystems like Golang; it's not as straightforward as the other ecosystems, and this is the reason only about 10% of maintainers, the people who build these repositories or dependencies, file a CVE publicly. And because these vulnerabilities are not disclosed as CVEs, if you are using dependencies where a vulnerability exists but is not reported, there is a real risk: people can hack your code. One of the most common vulnerability classes you all may have heard about is the SQL injection attack, and there are many other types of vulnerabilities which can really affect your applications. One of the other key limitations of existing vulnerability analytics is that the time elapsed between when a vulnerability comes into existence and when it is disclosed is huge; in between, a lot of damage can be done to your enterprise. So what are the key objectives for us here? We focus on Red Hat products and on keeping them safe and secure; that is our go-to thing to focus on. Red Hat has this product called OpenShift, which uses all the goodness from Kubernetes: you can build your applications, deploy them, and monitor, maintain, and scale them with ease, because it uses the power of Kubernetes. Hence, our focus is on the Golang, Kubernetes, and OpenShift ecosystem; that's a total of almost 850-plus direct and transitive dependencies. And if you keep a pulse on security news, you may have seen that recently an almost four-month audit was done on the entire Kubernetes stack, and 34 vulnerabilities cropped up. All this time people were using this, and there were 34 serious vulnerabilities which could have been exploited at any point. So that's the thing here: we want to find these vulnerabilities proactively, even before they are publicly disclosed, so that we can integrate these findings into developer processes and potentially engage with the developer community out there to improve our models. So, our key objectives: what did we do here, and how did we solve this problem?
We built these deep learning models on past historical data, where we knew when vulnerabilities happened versus when they didn't, depending on that kind of data; we'll talk about what the data is shortly. And what we are doing is monitoring all the regular public activity going on in all these 850-plus repositories and dependencies: GitHub events around issues and pull requests, mailing list conversations, bugs being filed online, all those aspects. Using that public information, we try to predict probable vulnerabilities: that based on this kind of activity, maybe there is a probable vulnerability in, let's say, Golang, or in Kubernetes. And obviously we have to validate with the security experts, because we are not security experts ourselves. Coming to the business significance, I already covered this: when a vulnerability is already present, the median time until the fix comes out and it is publicly disclosed (or sometimes they silently just fix it) is close to 886 days. That is really bad; a lot of damage can be done in roughly two years. And the way we leverage our models is by tapping into public data, because most of the time you will see that if, let's say, I am the maintainer of the requests package in Python, I will have so many other things to do, but the millions of people who are using my requests library will know if there is some kind of bug, because they are using it day in and day out. So often people will publicly open issues, file pull requests, or talk about bugs in mailing list conversations, and people are doing code reviews and other things too. We tap into all these public events and conversations around all these dependencies and feed that to our models to make these predictions. To summarize the business significance: the Golang ecosystem is quite complex. As you know, you use pip or Conda for package management in Python; similarly, Java has Maven, and in Node.js we have NPM. In Go you have some package management ecosystems, but they are not as mature as these, and the NVD feed of public CVEs is incomplete. A significant number of vulnerability fixes only get initiated after an issue or PR is filed, and people often will not report these publicly; you saw that graph at the bottom right, where Golang is the lowest compared to the other ecosystems. And we talked about the time lag between an issue being reported and a CVE being publicly filed. So what is the architecture we are using for our vulnerability detection? We focus on the OpenShift source code base, which is around 850-plus repositories, or dependencies, let's say. We pull in GitHub events around issues, pull requests, and commits; we pull in mailing list conversations, Bugzilla bugs being filed, and so on. We have an events collector which pulls in all this data and stores it in a data store. And then we have deep learning models built, let's say, on historical data of past vulnerabilities which were found, versus regular issues, pull requests, mailing list conversations, and so on.
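As a rough illustration of the events-collector idea, here is a hedged sketch using the public GitHub REST API; the repository list and the in-memory "data store" are placeholders, not the production collector:

```python
# Sketch of an events collector: pull recent issues/PRs for each monitored
# repository and keep the text fields the downstream models consume.
import requests

REPOS = ["kubernetes/kubernetes", "openshift/origin"]  # ~850+ in production
events = []

for repo in REPOS:
    resp = requests.get(
        f"https://api.github.com/repos/{repo}/issues",
        params={"state": "all", "per_page": 100},  # PRs appear here too
        timeout=30,
    )
    for issue in resp.json():
        events.append({
            "repo": repo,
            "kind": "pull_request" if "pull_request" in issue else "issue",
            "title": issue.get("title", ""),
            "body": issue.get("body") or "",
        })
```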
And what happens is, all these new issues and pull requests which are being filed go through our first deep learning model. This is a bidirectional gated recurrent unit (GRU) deep learning model with attention, which seems to work pretty well on text data, because we are parsing the text descriptions of issues, pull requests, commits, and conversations: all natural language data. So this is basically a deep learning on NLP problem. We use our first model to go through all the source data and find out which documents, in our case events around pull requests, issues, and so on, are potentially related to the security domain, because people file issues about other things too: some feature not working, something else. What is potentially security-related is what we focus on first, so the first model is a binary classifier: security versus non-security. We filter out all the non-security data and then pass only security-relevant events into our next model, which focuses on this: out of all these security-related issues, what could probably lead to a vulnerability? Because not every security issue, commit, pull request, or conversation leads to a vulnerability. Something could be, okay, I'm putting in a security feature request, or maybe I need to change my authorization technique. So the second model's job is tougher, because everything it sees is related to security, but only a subset of it is vulnerability-related. Our second model takes in all the security data and says what is potentially a vulnerability. And then our final predictions go to the security team; they validate them and find the false positives. Obviously there will be false positives; this is a tough problem. We then feed the triaged results back and retrain our models. That is the pipeline we are following. To summarize: regularly monitoring 850-plus repositories, extracting all the public data, filtering out security issues, using the filtered data to predict events which are about probable vulnerabilities, and then triaging and improving our models. Now, the deep learning model architecture. Like I mentioned, we use pre-trained, well, let's say pre-trained embeddings, not pre-trained models; we are going into those in the future. It is a stacked two-layer bidirectional GRU deep learning model, feeding the GRU hidden states into a global attention layer, and then your regular fully connected dense layers to make the final prediction. With regard to the embedding layer, I think almost everyone here knows about embeddings: you have your text data, you map it to numbers, you start with some random initialization of weights for each word, and then with backpropagation you try to improve all the embeddings. But we are not filling it with random weights; we are initializing with pre-trained embeddings, which many of you may have already used, like GloVe, fastText, Word2Vec, and so on. For us this performed better than random initialization: first we tried random initialization, then pre-trained embeddings, and the latter performed better. And a GRU is basically similar to an LSTM or an RNN model; instead of the forget and input gates of an LSTM, it uses a reset gate and an update gate.
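For reference, the standard GRU equations (the textbook formulation the slide is describing, with update gate $z_t$ and reset gate $r_t$):

```latex
z_t = \sigma(W_z x_t + U_z h_{t-1}), \qquad
r_t = \sigma(W_r x_t + U_r h_{t-1}),
\tilde{h}_t = \tanh\bigl(W_h x_t + U_h (r_t \odot h_{t-1})\bigr), \qquad
h_t = (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t
```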
And what happens here is the update gate acts similar to the forget and input gates of the LSTM: it decides what information to throw away and what new information to add. The reset gate basically focuses on how much past information should be retained. A GRU takes fewer tensor operations and dot products, and is speedier to train than an LSTM. So what's the need for bidirectional GRUs? We could have used just GRUs. The reason is to get better context. If you look at these two sentences, going by the first three words, here "Teddy" means bears, but here it actually refers to the president. So what if we go from front to back and also from back to front? I talked about this yesterday in my NLP workshop: the idea is that if you put in two LSTMs or two GRUs, one training front to back and one back to front, and concatenate the final hidden states, they preserve much better contextual information than just going front to back. And what we do here is, instead of sending out only the last hidden state from the GRU, which is what typically happens (say I have four GRU cells; typically the input goes in, each hidden state feeds into the next cell, and the final hidden state goes to a dense layer to make the prediction), I take all my hidden states and put them through a global attention model. That gives me a context vector with weights, where a higher weight means that word is more important. So we use an attention model where, instead of using the output from the last GRU cell, we send the entire sequence of hidden states to the global attention layer, and we get a final context vector which is a weighted average of the hidden states, weighted by the alphas. These hidden states run over T time steps; in our case, instead of time steps, it's basically the sequence of words, because we are parsing descriptions, conversations, and so on. There are just three simple equations here. The weights and biases are randomly initialized, obviously, and for the attention model they improve with backpropagation. For each time step, or each hidden state in the sequence, you apply a nonlinearity on top of the regular neuron equation, Wx plus b; then a softmax squeezes the scores between zero and one, and these alpha values say which hidden state is more important, so that the model can attend to those words, knowing that if these words occur, this is the likely outcome. That's it. As for other models under development, we are definitely focusing on BERT right now. We have some nice results, but it's still at an experimental stage, so we are not sharing it yet; there seems to be some promise in the transformer-based architectures. Now the hands-on tutorial: we'll go through the code briefly before I move on to the final results we obtained. This code is available in our GitHub; all the code we work on is open sourced. You can check it out later or feel free to reach out to us; we are even looking for people who can contribute and improve it over time. So, we load the necessary dependencies here, around TensorFlow, text preprocessing, and so on. Obviously I'm not sharing the data here because it's huge; we are collecting it day in and day out. We load the events data from GitHub around issues, pull requests, commits, and so on. All of this is text-based data.
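Before stepping through the notebook, here is a compact sketch of the model it builds up to: a pre-trained embedding matrix, two stacked bidirectional GRU layers, the global attention layer, and a dense head. Layer sizes and names are illustrative, and the random embedding_matrix stands in for the fastText/GloVe weights the real notebook loads:

```python
# Minimal sketch of the stacked BiGRU + global attention classifier.
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, Model

MAX_LEN, VOCAB, EMB_DIM = 1000, 50000, 300

class GlobalAttention(layers.Layer):
    """Scores every hidden state and returns their weighted average."""
    def build(self, input_shape):
        self.W = self.add_weight("W", shape=(input_shape[-1], 1))
        self.b = self.add_weight("b", shape=(input_shape[1], 1))
        super().build(input_shape)

    def call(self, h):                                  # h: (batch, T, units)
        e = tf.tanh(tf.matmul(h, self.W) + self.b)      # e_t = tanh(W h_t + b)
        alpha = tf.nn.softmax(e, axis=1)                # attention weights
        return tf.reduce_sum(alpha * h, axis=1)         # context vector

# Placeholder standing in for the pre-trained fastText/GloVe weights.
embedding_matrix = np.random.normal(size=(VOCAB, EMB_DIM)).astype("float32")

inp = layers.Input(shape=(MAX_LEN,))
x = layers.Embedding(VOCAB, EMB_DIM, weights=[embedding_matrix])(inp)
x = layers.Bidirectional(layers.GRU(128, return_sequences=True))(x)
x = layers.Bidirectional(layers.GRU(64, return_sequences=True))(x)  # stacked
x = GlobalAttention()(x)
x = layers.Dense(64, activation="relu")(x)
out = layers.Dense(1, activation="sigmoid")(x)  # security vs. non-security

model = Model(inp, out)
model.compile(optimizer="adam", loss="binary_crossentropy")
```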
We do some text preprocessing here, preprocessing all the documents. And as you can see here, our data is highly imbalanced. This was the last three years' worth of data; for Golang it's much less than for other ecosystems. We have around 22.5k issues, pull requests, and so on which are non-CVE related, and only 671 which are CVE or vulnerability related. So, a huge class imbalance. We do a regular train-test split, and then bring in our pre-trained word embeddings: first we create the word sequences with a regular tokenizer, cap them at a length of 1000, and pad as needed, like you do in regular text processing for deep learning. Then we load our pre-trained embeddings. We looked at fastText, Paragram, and GloVe-based embeddings; I think here we used fastText and Paragram. So we load these 300-dimensional pre-trained vectors, two million of them, get the vectors for our words, and average the embeddings, filling our embedding layer with these pre-trained weights. This kind of code is available online anyway; you don't even need to refer to ours if you have used pre-trained embeddings before. So we just populate our embedding matrix with these pre-trained weights. And like I said, instead of using just the last hidden state from the GRU layer, we send all the hidden states to the attention model. So this is the attention layer; just briefly covering it: e is computed for each hidden state here, the W times h plus b, with a tanh before the softmax, as you can see. Alpha is the softmax: you take an exponent and divide it by the sum of all the exponents. And hence you get the context vector after that. Just simple math at the end of the day. Using this attention layer, we plug it into the bidirectional GRUs: a two-layer stacked GRU, plugged into the attention layer, then regular dense layers, and the model is done. Using this model, we built it on 75% of the data and tested on the 25% test data. As you can see here, this looks great, 99% precision, recall, and F1-score, but that's not really what interests us. Our interest is this number, because we want to catch all the vulnerabilities: out of around 158 potential vulnerabilities, we were able to predict around 109 of them, so we got a recall of close to 70%. That is a nice baseline. Obviously we tried other models; classical machine learning didn't work at all. And this is the model architecture we are currently using day in and day out. And briefly, to wrap it up, the current results: the dataset was GitHub events data for the last three years, the input data on which our models were trained and evaluated. As you saw, it was highly imbalanced, with only 650-plus vulnerabilities. And these were our results: 70% recall of probable vulnerabilities identified. So this brings me to the end of our use case. The next steps, like I said: working on transformer models, doing weekly scans, improving our models, and definitely engaging with the community, so that maybe you can provide us with better models and improve them for everyone. So now we'll move on to the second use case. Okay, thanks for that.
So the second use case that I'm going to talk about: as my co-speaker mentioned, the previous use case around security was aimed more at Red Hat's own products. This use case is the more public-facing one, I would say, and definitely the more ambitious one in its potential to change the way we do this work. This is the developer-productivity use case: dependency recommendations. I'll briefly explain the use case here. If you remember the opening slides of this presentation, we talked about how developers at this point in time don't code everything from scratch themselves. So what does the developer workflow typically look like right now? Let me just switch slides here; okay, this one. Yeah. A developer writing any kind of application starts with some particular objective in mind, and those objectives map to particular dependencies. Let's say you're building a machine learning application and you already know, okay, I have to build a recommendation system: TensorFlow is going in, you're using SciPy for part of it, and NumPy will be in there too, because a couple of these components depend on it. Now, the next thing in this process is that the developer typically wants to add more functionality to their application. So maybe the next thing you want to do is plot your charts, and you want the charts to look good. Those are two perfectly normal requirements. What would you typically do? You would go to Google, search a bunch of different things, look at some dependency and wonder, okay, what does this do, should I use it, should I not? So that is the entire problem right here. We've tried to build a recommendation system that actually tries to understand your intention and, based on your stack, give you the best recommendation for which component you should be using. In this case, the best components would be matplotlib and Seaborn: matplotlib handles the plotting, and Seaborn makes the charts look better. So those are the two things here. Now, initially, when we set out to solve this problem, we were beginning with a very small set of data, I would say. We started with a probabilistic graphical model-based approach. If you know anything about PGMs, you know that at inference time the posterior for every single node is recalculated, which means the inference times are actually higher than your training times. And as the size of your data increases, the recommendation times increase, because the number of nodes in your PGM increases and the number of posteriors you have to calculate increases. So it wasn't really working out. If you work with hosted services, you know there is something called an SLA and something called an SLO; if you take a few seconds for a single recommendation, you are breaching your SLA or SLO, and everything lands back in your hands. So then what did we do?
So, if you remember the graph that we showed you at the beginning, NPM was the fastest-growing ecosystem right now, showing exponential growth compared to everybody else. So we thought, okay, let's try to crack that problem first. Now, when you work with something of this scale, you are dealing with very large data: depending on when you look, there are around 650,000 packages on the registry (a package is nothing but a dependency, of course), with new packages being uploaded there every day, and we have data for around 700,000 stacks for NPM. This size of data was actually both a challenge and an opportunity, in that it is really, really rich data. All of the packages available on the NPM registry have really rich metadata associated with them, so based on those tags you can kind of figure out what a package does. That's what got us exploring different recommendation techniques. The one we settled on: the original research comes from a university in Hong Kong; we link to it in the resources, and you can read it further. But of course we had to adapt the model for our use, and that is what I'm going to talk about; that is exactly what you're seeing right here. So, the system itself. The system is actually a combination of a variational autoencoder with a probabilistic matrix factorization element to it. If you talk about probabilistic matrix factorization, or any kind of matrix factorization in general, it's a purely collaborative approach. A collaborative approach means you're not taking any kind of item similarity into consideration. And when you're not taking item similarity into consideration, what is the point of having all of that rich metadata coming in from NPM? So, to make it a hybrid approach, what this model does is this: we build our vocabulary out of all of the tags that we have got from the NPM registry, and for any dependency we get, we have a way to encode it using that vocabulary to form an X vector. From this X vector we get the latent-space weights, and using those we train our probabilistic matrix factorization model. Once we go through the code, this will be much clearer. Right, so the X vector representation rests on a modeling assumption. What is that assumption about? Let's say there's a dependency, say TensorFlow, and it depends on NumPy. The assumption is that any dependency from the registry, for any kind of ecosystem, will do either the things that it itself advertises that it does, or some additional functionality that comes in by way of its own dependencies. TensorFlow doesn't explicitly advertise, let's say, that it also does linear algebra; but it depends on NumPy, and that is the kind of functionality NumPy exposes, so linear algebra would also be one of the things exposed through TensorFlow. Based on that, we form a representation, though it's not technically a one-hot representation.
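A minimal sketch of that encoding idea, with a toy vocabulary and package table (all names here are illustrative, not the production data model): a package's vector switches on its own tags plus the tags of its dependencies.

```python
# Encode a package as a multi-hot vector over the tag vocabulary,
# inheriting the tags advertised by its dependencies.
import numpy as np

vocab = ["machine-learning", "linear-algebra", "plotting", "http"]
tag_index = {t: i for i, t in enumerate(vocab)}

packages = {
    "numpy":      {"tags": ["linear-algebra"], "deps": []},
    "tensorflow": {"tags": ["machine-learning"], "deps": ["numpy"]},
}

def x_vector(name):
    tags = set(packages[name]["tags"])
    for dep in packages[name]["deps"]:      # inherit dependency tags
        tags |= set(packages[dep]["tags"])
    vec = np.zeros(len(vocab), dtype=np.float32)
    for t in tags:
        vec[tag_index[t]] = 1.0
    return vec

print(x_vector("tensorflow"))  # linear-algebra lights up via numpy
```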
It's an encoding that we feed into our recommender system, to actually pre-train the neural network that we are using. Now, the next thing was: okay, what do you consider a user? If I'm a user who uses TensorFlow to build a machine learning application today, and tomorrow I'm building a front-end application, TensorFlow isn't the correct recommendation there; context is important. So what we actually had to do is treat every single stack that comes into our platform as a different user, because every stack is unique in terms of what it is doing. So, before we go to the architecture and how we train and process all of this, this slide gives you a complete summary of the thinking behind the project. We have the topics and the keywords coming in from the NPM registry, the tags, the rich metadata that we start with. Then there are the packages, the items that actually need to be recommended and evaluated. You do the encoding of the tags and feed that to the variational autoencoder to get the intermediate representation. Of course, once you train the variational autoencoder, if you are familiar with them, you get a mean and a variance parameter based on which you draw from a distribution; that draw gives you the intermediate representation. You use that, along with the user-item matrix, for actually training the probabilistic matrix factorization model. This, in turn, takes it from a pure collaborative approach to a more hybrid approach, where the item metadata is considered as well as the user-item data. And of course, once the model is trained, serving it is fast. So I'll very quickly go over the training architecture for this. We use a combination of Amazon Web Services and OpenShift, of course, to do the training. In terms of the AWS components, we use S3, of course, to store our data, and Amazon's Elastic MapReduce, because we need a lot of compute at certain times. Basically, all of the data we have (I'll come to how the complete platform collects it later) lands in S3 and is loaded into EMR. We go through the steps, which is, of course, pre-training the variational autoencoder and training the collaborative matrix factorization piece; the model gets stored back to EBS, which is attached to the EMR cluster, and we load it back to S3. And once it's in S3, it's served on OpenShift, which is, of course, Kubernetes-based containers, as an online app. So the serving side gets the power of Kubernetes; those are the scaling bits of this presentation, and we'll talk about them later as well. This is one of the sample recommendations, before we talk about evaluation, of course. Express is a very popular JavaScript framework, part of the MEAN stack I think it's called, and Mongoose is an adapter for working with MongoDB. So given a stack like that, quite sensibly, MongoDB comes out as one of the recommendations. And of course, the actual evaluation that we did: we went through around 29,000 different items in the actual experiment, and around 50,000 stacks. 50,000 stacks means 50,000 different users for us, and 29,000 items to recommend. And of course, in the case of recommendations, recall is one of the best metrics to calculate.
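A minimal sketch of that recall-at-M evaluation (M = 350 in the experiments that follow; the function and variable names are ours):

```python
# recall@M: fraction of a stack's held-out packages that appear in the
# top-M recommendations produced by the model.
import numpy as np

def recall_at_m(scores, held_out, m=350):
    """scores: model score per item; held_out: indices of true items."""
    top_m = np.argsort(scores)[::-1][:m]     # best-scored M items
    hits = len(set(top_m) & set(held_out))
    return hits / len(held_out)
```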
So: recall, the number of items that you're recommending correctly. We did the recall calculation at M equal to 350. And to set up a baseline for this particular work, we used the baseline from the original research as well, which is to train just the autoencoder and use the autoencoder alone; that is also the pre-training step for our particular model. And again, of course, there were the previous attempts using the PGMs and everything, and the kind of recall they got. On those methods, the recall was about 0.5, rounding off here. And our recall at 350 is 0.51, which is a 51% hit rate. This is good. Our main concerns, of course, were related to our SLAs. If you can come down from a few seconds for every recommendation to around 300 milliseconds for every recommendation, you are in a good place. And because of the matrix factorization nature of this, the recommendation time doesn't really blow up as we increase the size of the data, which actually enables us to build really good models on this data. At the same time, because we use Amazon, we have access to all the compute and the technology we need, so training takes only about two hours; compared to the PGM approach, it's a much faster process. Let's take a look at the model. Right, I hope the contrast is readable. So, of course, let's explore our data a bit first before doing anything with it. This data was collected with something called an NPM feed watcher: NPM used to have a different way of getting registry data in bulk, but they moved to this feed system, so you actually have to collect it yourself, and it's a really big dataset, so of course it's not shared here. And if you look over here, one of the things we had to do: if you know anything about matrix factorization, you're dealing with a sparse matrix anyway, and even then you should control the sparsity. So anything that had just a single tag, we threw out, because in a vocabulary of, let's say, 29,000, a vector with only one value set to one is not really going to train well. So we cut down on some dependencies. And then, like I showed you earlier, our modeling assumption was that a package advertises functionality, and the functionality of its dependencies also counts. So here what we did is basically check which are all the dependencies where all of their own dependencies are also tagged. Of course, we need the data to be clean, so we had a column that tells you whether that's so or not, and we considered the ones where all of the dependencies of the package we're dealing with advertise what they do. And as you can see over here, we have the dependencies and all their tags. So when you see these tables, you know what the dependency's name is, you know what it depends on, and what it advertises it does. And, of course, the next step is to actually create that X vector, so we create our vocabulary. The vocabulary size, like I said, is around 29,000, built from the dependencies' tags. From that vocabulary, the next step is of course creating the mapping that will help us encode our data into vectors: the X vector I showed you earlier.
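The next phase feeds these X vectors to the variational autoencoder. Here is a hedged sketch of the mean/variance encoder and the reparameterized draw that produces the intermediate representation; the layer sizes are illustrative, not the production model:

```python
# Minimal VAE encoder: map the multi-hot X vector to a latent z via the
# reparameterization trick, then reconstruct the tags.
import tensorflow as tf
from tensorflow.keras import layers, Model

VOCAB, LATENT = 29000, 50

x_in = layers.Input(shape=(VOCAB,))
h = layers.Dense(200, activation="sigmoid")(x_in)
mu = layers.Dense(LATENT)(h)             # mean of q(z|x)
log_var = layers.Dense(LATENT)(h)        # log-variance of q(z|x)

def sample(args):
    mu, log_var = args
    eps = tf.random.normal(tf.shape(mu))
    return mu + tf.exp(0.5 * log_var) * eps   # z = mu + sigma * eps

z = layers.Lambda(sample)([mu, log_var])      # intermediate representation
x_out = layers.Dense(VOCAB, activation="sigmoid")(z)

vae = Model(x_in, x_out)
```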
And then, of course, we move to the training phase. Well, before that we do some more processing, where we encode the dependencies, of course. Then we move to the first phase of training this model, which is the pre-training of the autoencoder. I don't think I have time to go through the source code of that, but basically we have packaged this whole thing as a library inside this module, for anybody who wants to use it; it has the variational autoencoder embedded in it and our recommender code in it. The pre-training of this variational autoencoder is done the way it is always done: layer-wise pre-training, followed by pre-training of the latent layer, and then you train the entire thing end to end. I'll just show that briefly. This is one such layer, this is the layer-wise pre-training, and then you proceed to training the entire thing. For the actual training, you load from the pre-training module itself, and from there you train both the matrix factorization bits and the variational autoencoder bits. Here, I think you can briefly see this as well: for the initialization of the matrix factorization, that's the standard version of it. If I show you the fit function over here, which calls the run functions: the very first thing that we actually do is train layer-wise, so every single layer is trained one after the other. Then we train the latent layer of this autoencoder, the layer which is supposed to give you the mean and the variance pieces. And then we train the whole thing end to end. And similar to this, if you look at the actual model itself, which follows from the pre-trained model: the autoencoder we pre-trained is what we use to get our initial transformations, and once you get those transformations, you feed them to, of course, the probabilistic matrix factorization training (one common way to write its objective down is sketched below). The way that is trained, I'm sure a lot of you might be aware already: you start with a normal prior and then you compute the posterior, like you do for any such problem. So, that's about that. And the last thing that I'll touch upon is the next steps, the kind of work in the pipeline. We are definitely working on a re-training pipeline like I talked about, an automated re-training pipeline. Also, an interesting thing over here is that not a lot of people give us explicit feedback. So, to collect implicit feedback, what we're actually starting to do is this: when people use this inside their IDE, if they act on our recommendations, those recommendations will start getting added to their stacks. And if they are adding them to their stacks, we can monitor their stacks and actually figure out whether they are acting on our recommendations. That way we are also getting implicit feedback on top of our explicit feedback. As for the other ecosystems we've covered: it's not everywhere the case that we get a lot of rich metadata. Maven was one such ecosystem and, of course, Golang was one such ecosystem. There we used a purely collaborative matrix factorization approach, and we did other experiments there as well. So, that is that.
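Stepping back to the model for a moment: one common way to write down this kind of hybrid objective (our reading, based on the collaborative variational autoencoder line of work the talk references, not necessarily the exact production loss) ties each item latent $v_j$ to the VAE encoding $z_j$ of that item's tag vector:

```latex
\min_{U,V}\;\sum_{i,j} c_{ij}\,\bigl(r_{ij} - u_i^{\top} v_j\bigr)^2
\;+\; \lambda_u \sum_i \lVert u_i \rVert^2
\;+\; \lambda_v \sum_j \lVert v_j - z_j \rVert^2
```

The last term is what makes it hybrid: items with similar tags are pulled toward similar latents even when the usage data for them is sparse.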
Now, let's talk about the more interesting bits, I guess, which is the scaling and the platform architecture of this. So, this is the actual platform; it surfaces as an IDE extension. It's from Red Hat and completely open source: you can contribute, you can use it, you can grab it from over there. Basically, we produce this thing called the stack report, and this is the architecture of the platform that makes the stack report. This is more of the client-facing architecture, but all of our components are in here. We have our recommendation model, the service for the license analysis, the CVE work that he was talking about at the beginning; all of them are integrated behind the API server, the backbone of the SaaS, as we call this thing, which handles traffic control for us. All of this runs on OpenShift, and the rest of these are on AWS; all of our data pipelines are on AWS wherever we need a lot of compute. Our central lifeline, I would say, is the graph server that we have: we use DynamoDB in the background with a graph database layered on top of it, which we talk to through a Gremlin server. The vulnerability work from the first use case is being integrated as one of the many feeds that we collect in the platform: all of the external CVEs from feeds such as NVD, and over time the probable CVEs from that work as well, are pushed into this system. Technologies that we use: first, scaling on AWS. Then OpenShift. If you are aware of Kubernetes, Kubernetes is more of a platform, I would say; Red Hat OpenShift is Red Hat's distribution of Kubernetes, built around the idea of a complete application platform. It gives you all the goodness of Kubernetes with a more, I would say, tolerable user interface and everything; it is easier to use. And, of course, containers. Containers are something most people popularly know as Docker, but Docker is not the only kind of container in this world right now. There is this entire Open Container Initiative with a set of specifications, and there are other tools as well which you can use to build containers. Inside Kubernetes and OpenShift right now, CRI-O is, I would say, the container runtime of choice. The advantages Kubernetes gives us are things like load balancing and abstractions for storage. And I guess I have some time left, so the more interesting thing I can show you over here is the actual artifacts that we need to write to get all this running. I'm sure many of you must have already seen a Dockerfile at some point; if you haven't, this is what you use to build a container image, and a minimal example follows below. This might look quite complex, but it's really nothing: it's a bunch of commands that you run, attached to things called Docker directives. All of these together make what's called an image. An image is not quite the same thing as a container (a container is a running instance of an image, but let's not be pedantic about that); an image is basically a completely reproducible copy of everything that you package into that container.
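For illustration, a minimal Dockerfile of the sort being described, a hedged sketch for packaging a small Python scoring service (app.py and requirements.txt are placeholders, not the project's actual image):

```dockerfile
# Each directive (FROM, COPY, RUN, ...) adds a layer to the image.
FROM python:3.7-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
EXPOSE 8080
CMD ["python", "app.py"]
```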
And once you have a container, the question becomes: how do you actually run that container? That is where Kubernetes comes into the picture; everything around running containers is what Kubernetes helps you with. This is a manifest that actually requires quite a bit of background to write, but just to go over it briefly: in our case, what we wanted was multiple replicas. You can create either a pod config or a deployment config; what we created is something called a DeploymentConfig. A DeploymentConfig has a replica count, so you can tell Kubernetes, or OpenShift in this case, okay, I always want this many copies of that container running (there is a sketch of such a manifest at the end of this talk). So that is one of the aspects, and you include it along with all the other details of how to actually run this container. Basically, you create things called config maps, you create secrets; there is a whole bunch of things. And otherwise, yeah, that is that about Kubernetes. There is a lot of theory around this, but this manifest is basically one of the ways to actually access the Kubernetes API. You can access the API directly through HTTP; there are also client libraries available for this. But if your company is using Kubernetes, this YAML manifest approach is the one that you should be using, because it enables you as developers to actually deploy your own model onto OpenShift or Kubernetes or whatever your company is running. It is a very fiddly thing to set up, but once the setup is done, it's very easy to manage your model's deployment. So those are the good parts: in a scenario where you have a very heavy load, you can just scale up the replicas; when you have very little load, you can scale down the replicas. There's an autoscaler that can take care of the scaling for you, there's load balancing, everything. So that is the very interesting work that we do around deployment. And coming back to the presentation for a bit: all of these services that you see have these kinds of templates for them. All of our models that we run on OpenShift have these kinds of templates. So this entire architecture that you see actually has the ability to scale itself under load. Every time we see a surge of traffic coming in, all of this has the capability to start scaling, as long as you have provisioned the necessary infrastructure, which for us is on AWS. And the retraining jobs are something we recently started: we're doing a project where we're automating the training of all our models, so that once a model is ready, it becomes part of this pipeline. Yeah, and I guess that's it for the talk. If you like it and want to contribute, we are looking to build a community around this thing, and of course, at Red Hat we are always open to external contributions. And one final idea: you are all building these machine learning or deep learning models; once you build a web service around one, you can use something like Docker to containerize it so that you can deploy it anywhere, without worrying about the OS, the dependencies, and all that. And you can use Kubernetes to scale up your applications as well. So these are some things to keep in mind as you, let's say, move to bigger datasets and serve more and more customers.
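As a reference for the manifest discussion above, here is a minimal Kubernetes Deployment sketch with placeholder names and image; OpenShift DeploymentConfigs look similar but add a few fields, and this is our illustration, not the project's actual template:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: recommender-model
spec:
  replicas: 2                  # "I always want this many copies running"
  selector:
    matchLabels:
      app: recommender-model
  template:
    metadata:
      labels:
        app: recommender-model
    spec:
      containers:
        - name: model-server
          image: quay.io/example/recommender:latest   # placeholder image
          ports:
            - containerPort: 8080
```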