I've been building machine learning systems for the last three to four years. I was part of a startup called AltaCloud, where I built their AI and machine learning; it was acquired about a year and a half ago. After that I was lead machine learning engineer at a Canadian news-aggregation company, and right now I'm working on my own startup. I'm here to share some of the insights I've learned over time while building machine learning systems, to show you the level of sophistication these systems can reach, and to show you that training is only a small portion of them. So, what this talk is not about: it's not about statistics, it's not about gradients, it's not about how to train machine learning models. It's not about all the great methods for training all sorts of very useful neural networks over data. Instead, it's about the steps and challenges of building AI with all that great technology, the systems that end up becoming your recommendation engines and your robots. Essentially, we'll take the algorithm and the training as given, and look at what happens before the training and what happens after it. So this is not what we're going to talk about; that is the topic of today's seminar. And why don't we need to talk about training itself? Because there are already a lot of people focused on it, and that's exactly why I'm interested in the other two parts: not many people talk about pre-processing data, understanding the data, and then turning your models into a useful set of systems you can deliver your machine learning with. So, without further ado, here is the naive AI diagram: you pre-process your data, you train your model, and then you deploy your model. That's the simple version, and it's how everyone starts thinking about these systems. It looks straightforward, not too complicated. But in reality, some of these systems look like this instead. Before moving forward, one remark on the format. A lot of this talk is going to be boxes that I describe at a certain high level, showing what each one is supposed to do. If I went one level deeper on everything, this would become a four-hour-or-more conversation, so I'll stick to that level; but for anyone interested in any of these boxes, I'm more than willing to go deeper. The goal is simply to show that all of this is a lot more sophisticated than it looks when you read the newspaper or watch your regular YouTube video on how to train a model. Is that fine as the rule of the game for today? Good. So this is the general overview of what I'm going to show you; let's get into it. Why does it get so complicated? Why isn't it just those three boxes; why isn't life that simple? Because in the last five, maybe seven, years, the way we build software has changed dramatically. We don't build monoliths anymore, and we collect data everywhere.
Software used to be an RDBMS hooked up to your .NET server, or something like JavaServer Faces. Now we build microservices, which give you separation of concerns and let you build modular systems that you can plug in and out. We use things like Kubernetes, which lets you schedule containers to run simultaneously in a cloud environment or on your own machines. Kubernetes, together with Docker and the cloud platforms, lets you build pretty much world-scale software in this day and age. And obviously there's a whole variety of application types on top of that; this is only the surface. I haven't even depicted robots and all the other hardware-based applications that talk to the internet. Essentially, anything that talks to the internet is going to interact with one of these pieces. And when I say AWS, I don't mean only AWS; it includes Google's cloud and every other cloud. AWS just has a nice, mighty logo, so I had to put it in. So that's an overview of what we deal with when we build internet software nowadays. Now let's look at data engineering and pre-processing with all of that in mind. If we simplify, if we squeeze the juice out of all of this, we have our applications, and we have our other services, which could be any microservice in your architecture, or anything else in your stack that can do request and response. Those are pretty much the two kinds of entities that generate data, and that data can be of any sort: user data, images, whatever. A couple of my friends, for instance, work with aerial images of fields, pictures taken from airplanes, which they then process. Any source of data; that is what we should understand as this side of the diagram. And this is the general pipeline we'll go deeper into: storage in a data lake, retrieving the data for training, pre-processing, feature vectors, and then the dataset. We will see how even this straightforward-looking thing becomes sophisticated. Let's start from the beginning of data engineering and pre-processing: what does it mean to have data lake storage? How many of you are familiar with that term? Essentially, a data lake is a name for a sophisticated, distributed file system that allows you to read and write pretty fast and to scale the files you store in it almost infinitely. Usually it lives in the cloud, or in your own data center if you're courageous enough to build one, or if you're building the kind of system that genuinely needs its own. That's a good way to think about storage and data lakes.
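As a toy illustration of that "dump it now, figure it out later" idea — the bucket name and key layout here are my own assumptions, not something from the talk — an application might write raw events straight into object storage like this:

```python
import json
import datetime

import boto3  # assumes AWS credentials are already configured

s3 = boto3.client("s3")

def dump_event(event: dict, bucket: str = "my-data-lake") -> None:
    """Dump one raw event into the lake, keyed by date and time so that
    downstream batch jobs can list and retrieve a whole day at once."""
    now = datetime.datetime.utcnow()
    key = f"raw/events/{now:%Y/%m/%d}/{now:%H%M%S%f}.json"
    s3.put_object(Bucket=bucket, Key=key, Body=json.dumps(event).encode("utf-8"))

dump_event({"user_id": 42, "action": "click", "item_id": "a17"})
```

No schema and no table design up front; the only structure lives in the key layout, and making sense of the contents is deferred to the retrieval side.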
So why do we need this? We need it because all sorts of different types of data come our way when we build these systems, granted that we're successful and many people use them. We need a sustainable way of storing that data, including its history, at terabyte and petabyte scale. And storing all of it in relational or otherwise structured databases is often less efficient than just dumping it into some kind of storage and figuring out what to do with it afterwards. That's pretty much the idea this box captures. Okay, now that we have a data lake, what do we do with it? Our original goal was to train a model and see how we can use it. So we build systems that retrieve the data. What does retrieving the data mean? The best examples of such systems are Apache Spark, or Dask, a Python-based data processing tool; they let you do parallel, offline processing of data. And depending on the format you store, for instance Parquet, which is a compressed, columnar file format for tabular data, there are efficient ways of pulling your data back out of storage. This chunk of the system does exactly that, and we'll keep building on top of it.
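As a minimal sketch of that retrieval step — the bucket path and column names are hypothetical — this is roughly what pulling Parquet back out of a lake with Dask looks like:

```python
import dask.dataframe as dd

# Lazily open every Parquet partition under this prefix; nothing is read
# until .compute()/.head() is called, so this scales past local RAM.
users = dd.read_parquet("s3://my-data-lake/processed/users/")

# A typical cleanup pass: drop obviously broken rows, keep the columns we
# care about, then materialize a small sample locally for inspection.
clean = users[users["age"].between(0, 120)][["user_id", "age", "country"]]
print(clean.head())                          # pulls only a few rows
sample = clean.sample(frac=0.01).compute()   # 1% sample as a pandas DataFrame
```

The same shape of code, give or take syntax, is what a PySpark job would do at larger scale.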
The next piece after retrieval is the processing steps: we've retrieved the data, now we have to process it. What does processing the data mean? The data we store in the lake is usually not in the best shape, because you pretty much throw data at your file system and sort it out later. These two blocks are where you bring it into a shape you can actually understand and start to reason about. So after you retrieve the data from the lake, these two blocks process it into good shape, and only now do you start thinking about machine learning. And what is it that we feed to our machine learning algorithms? Feature vectors. Here's an example. Say you have terabytes of saved user data. The process would look like this: some jobs retrieve the data and maybe save it back in a somewhat pre-processed form, and then another step takes that and cleans it up further, into a tabular form or whatever format you prefer. With data saved over a long period, there can be historical backward incompatibilities and things like that, so you need a certain amount of pre-processing before you even think about your feature vectors. And what are feature vectors? They come from various representation techniques, some of which involve machine learning themselves. Word vectors, for instance, word2vec-style embeddings, are trained: you train an internal representation of your textual data. Or you can use autoencoders and things like that. Or even something as simple as one-hot encoding: if you have a field in your user object that can take only three or four values, it's probably not a bad idea to one-hot encode it.
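A minimal sketch of that with pandas — the plan field is an invented example of such a low-cardinality categorical:

```python
import pandas as pd

# A user field that can take only a handful of values...
users = pd.DataFrame({"plan": ["free", "pro", "free", "enterprise"]})

# ...becomes one 0/1 column per possible value.
features = pd.get_dummies(users["plan"], prefix="plan")
print(features.astype(int))
#    plan_enterprise  plan_free  plan_pro
# 0                0          1         0
# 1                0          0         1
# 2                0          1         0
# 3                1          0         0
```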
So essentially, this layer of your system deals with building those features. But what do we do with the features afterwards? One way is to just keep building the dataset; if I go back to my initial diagram, I showed you pre-processing, then feature vectors, and immediately after that, the dataset. But in certain circumstances, when a lot of people need these feature vectors, you have to save them somewhere so everyone can access them when they need to. Another remark about all of this: a lot of these systems are not strictly necessary to build. It would be a tough call to say you have to do everything here to be able to do machine learning properly. My goal is to show you the extent to which the bigger companies go. Companies like Airbnb, Netflix or Facebook have databases that store feature vectors, which means that when a user object comes into the system, it ends up being represented in a number of different ways and stored as part of a feature vector bank. I'll show you why you need this a bit later. None of these systems is absolutely required; this is more to show the steps, so that everyone building these kinds of systems can pick and choose the parts they need based on their problems. So this whole pipeline together encompasses the data engineering side of building machine learning systems, because the very important part of building machine learning is having data in a shape that lets you train your models, or even think about which models to pick. Any questions so far? If I talk too fast, let me know and I'll slow down.

Q: One question. The pipeline that we have depends on the types of models we have. Are we going to change our pipeline dynamically depending on them? And where do we separate — at the feature vectors or somewhere else?

A: Yes, of course; that's a very good question. Say you have five different models to train for your system. You sit down with your colleagues and decide what type of data each of those models needs in order to be trained properly, and very soon you realize: hey, we need five different pipelines. This is what I mean when I say it gets sophisticated very early. One way to think about it, and I've actually implemented such a system myself, is to have some kind of data store where you can have your five different pipelines but save the features they produce. If the goal of each pipeline is to produce features, you can save those features agnostic of your models and think of it that way. Then, when you're building the models, you pick the features each model needs. That also gives you freedom in the future: when you have a model that needs some features concatenated together, you can use this store to do it efficiently. So you don't limit yourself to just those five algorithms, and that's already a good enough incentive to start thinking about building a storage like this. So why do we need it? As that question already walked us toward: you need this feature vector bank to be able to build the dataset for your training. It's a place that contains, for each user, for each of your entities, several different representations and features. In companies like Facebook there are teams dedicated to just building this: people hired to do nothing but build features out of the data. The scientists then pick and choose those features to build the dataset they're going to train on; of course, they also pick the algorithms. And then, after the dataset, you train your model, hopefully. That's the part we're not talking about. Actually, it's not really the full cycle, because at the end of the day you have to evaluate your model somehow and go back to training; there's a bit of a loop here. This is where you explore, experiment and evaluate, and it's pretty much where the data scientists live. They have their interfaces over Python and IPython notebooks, or, with Apache Spark, it would be Zeppelin or Databricks, so they can experiment. They can do whatever they want, because at this stage you've already built them a safe space: your researchers don't need to know everything that led to this feature vector bank. And I'd say that if you follow this practice, it's a pretty efficient way to separate the pure researchers from the more engineering-minded people, because each group gets to focus on what they're really good at, instead of getting their heads into work that would take them longer than it would take an engineer. So essentially, this is the data processing pipeline altogether: your data comes in at the top, propagates down into the feature bank, and models end up being trained from it. Obviously, depending on the size of the company or the product, and on how fast you need to get things done, you can easily omit any of these steps; but this is the extent to which it can go.
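Since this feature vector bank keeps coming up, here is a toy, in-memory sketch of the idea; a real one at Airbnb or Facebook scale sits behind a database or key-value service, and every name below is my own invention:

```python
import numpy as np

class FeatureBank:
    """Toy stand-in for a feature store: several named representations
    per entity, retrievable by whichever model needs them."""

    def __init__(self):
        self._store = {}  # (entity_id, feature_name) -> vector

    def put(self, entity_id, name, vector):
        self._store[(entity_id, name)] = np.asarray(vector, dtype=float)

    def get(self, entity_id, names):
        # Concatenate the requested representations into one feature vector.
        return np.concatenate([self._store[(entity_id, n)] for n in names])

bank = FeatureBank()
bank.put("user42", "plan_onehot", [0, 1, 0])
bank.put("user42", "text_embedding", [0.12, -0.4, 0.9])

# A scientist picks whichever representations their next model needs:
x = bank.get("user42", ["plan_onehot", "text_embedding"])
print(x)  # [ 0.    1.    0.    0.12 -0.4   0.9 ]
```

The point is the interface: pipelines write features in, model builders mix and match on the way out, and neither side needs to know about the other's five pipelines.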
By the way, I haven't said anything about these two arrows, how the data actually gets into your storage in the first place. There are things like Apache Kafka and a bunch of streaming services; that's a whole different discussion, maybe a whole different lecture, so let's just assume it happens. So now we've trained the model, and we need to understand how to bring the model to reality, because training it is only that one part: after you have the model, what do you do with it? Who in this room trains models? And who delivers them in some way, shape or form? Okay, so I'm going to go through a pipeline, and at the end I'll ask how close it is to the way you do it, so we can maybe generate some discussion and get more interactive. At this stage we already have the trained model, which is presumably some kind of file: a pickle, an ONNX file, a TensorFlow checkpoint, whatever. We save that model somewhere, for instance S3, or wherever we feel comfortable. That's the first, most immediate step. After that, we have to figure out the environment we're going to run our model in, and there are several. If you're going to run your model in the cloud, you should probably consider something like Docker. But there are a bunch of other runtimes too: NVIDIA has its own runtime, there's other hardware like Intel and ARM, and Core ML from Apple actually runs a lot of things on the GPU inside the device. Runtimes differ; I'll carry the examples over Docker, because I assume it's the most familiar, but there are many, many different runtimes for running your models. The next step is about the artifacts. What does it mean to deploy? It means producing an artifact that you can then launch in your environment. If your environment is the cloud, then a Docker image is the artifact: it contains your model file, plus maybe some kind of API on top that allows you to serve the model. That's one way of thinking about deploying models; there are a bunch of others, but let's stick with this one so we don't overcomplicate our world. So this is the place where you create the deployment artifacts.
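As a sketch of what that thin API inside the Docker image might look like — the route, the payload shape, the model.pkl filename, and the scikit-learn-style predict call are all my assumptions, one possibility among the many the talk alludes to:

```python
import pickle

from flask import Flask, jsonify, request

app = Flask(__name__)

# Inside the image, the model file sits next to the code; it would have
# been pulled from S3 at build time or at container startup.
with open("model.pkl", "rb") as f:
    model = pickle.load(f)

@app.route("/predict", methods=["POST"])
def predict():
    # Expects a JSON body like {"features": [[0.1, 0.2, ...], ...]}
    payload = request.get_json()
    preds = model.predict(payload["features"])
    return jsonify({"predictions": list(map(float, preds))})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8080)
```

A Dockerfile wraps this code plus the model file, and the resulting image is the deployment artifact.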
And then, after that, you actually deploy it. What does that mean? Take your Docker image and deploy it to Kubernetes, or to somewhere on AWS where you can access it and serve it, or to any other container management service, and then you can scale it. If you think about a lot of deep learning models, convolutional networks for instance, many of them can be treated as black boxes: input in, output out. That makes them stateless systems, so they're not too difficult to scale. There's a different level of scaling, too: if you have a huge model, the artifact itself becomes too big and you have to scale the model in a different way, but that's again another big discussion. Essentially, the way you scale is to distribute it in the cloud across many instances, and Kubernetes is certainly one way to do that. So this wraps up the serving system: after you deploy, you get a REST API, and with that you end up having a system that serves your model, end to end — from the data being processed, to the trained model, to packaging it, all the way to a REST API you can serve. So what do we do with this REST API? Remember the feature vectors we trained on: feature vectors are the result of pre-processing. And there's another aspect to this: when you train those models, the output of the model is not always the output you want to serve to your app or to other systems. One example: a classifier just produces a set of probabilities, and maybe you want to turn that into a single answer, or a top-three classification, things like that. That's a very naive example of post-processing. What I'm trying to say is: any raw data that hits this REST API you deployed with your Docker image needs to get pre-processed in exactly the same way your data was pre-processed when building those feature vectors. Why? Because your machine learning model is designed to take in those feature vectors, not the raw data. We took the raw data and pre-processed it for training, so now we have to do it again every time we apply the model. So all of this together constitutes our ML services; this is what we end up calling the thing our apps get to use. The app hits it with raw data — we've deployed this service in some shape or form, maybe in the cloud — and it gives back the output of the post-processing. Any questions so far? And for those of you who deploy models: how different is your process from this? I know this is a very general picture, but I'm curious to hear from you.

Q: One thing that comes to mind is that the pre-processing for this part and for the training of the models has to share a lot. So if the model changes, the training pre-processing changes, and this one has to change too. They're very intertwined.

A: Exactly. These are the things that keep adding dimensions to how many systems you end up building just to serve these models, and I'm actually going to touch on that in a minute. So let's move forward. These systems together make up the whole second part: the machine learning services, serving the model right after we've trained it.
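To make that shared pre-processing point concrete, here is a minimal sketch under assumed names: one preprocess function that both the training pipeline and the serving API must import (or pin a versioned copy of), plus the naive top-k post-processing mentioned above:

```python
import numpy as np

PLAN_VALUES = ["free", "pro", "enterprise"]  # assumed training-time vocabulary

def preprocess(raw_user: dict) -> np.ndarray:
    """Must match training exactly: the model was fitted on vectors
    produced by this code, so serving has to produce the same vectors."""
    onehot = [1.0 if raw_user["plan"] == v else 0.0 for v in PLAN_VALUES]
    return np.array(onehot + [raw_user["age"] / 100.0])

def postprocess(probs: np.ndarray, labels: list, k: int = 3) -> list:
    """Turn raw class probabilities into a top-k answer for the app."""
    top = np.argsort(probs)[::-1][:k]
    return [(labels[i], float(probs[i])) for i in top]

x = preprocess({"plan": "pro", "age": 31})   # same steps as the training data
probs = np.array([0.10, 0.70, 0.15, 0.05])   # stand-in for a model's output
print(postprocess(probs, ["news", "sports", "music", "tech"], k=2))
# [('sports', 0.7), ('music', 0.15)]
```

If preprocess changes for a new model, the serving side must change in lockstep — exactly the coupling raised in the question above.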
So this is our system that can deploy and serve a model. But hey, we forgot one thing: what if we train a second model? Here we trained one model and deployed it all the way over here, and now we have machine learning services. But three weeks later we've collected a lot more data, maybe of better quality, and we end up training a completely different model. So what do we do? What's the sequence of actions? Another subsystem is forming, called model management. We have a trained model; we save it somewhere with some name, and hopefully that name also includes a version. If you've built three models consecutively, you should name them appropriately so you can tell when and how each model was built; you version them. That's the model versioning part: the name needs to reflect why and how the models were built. Then we have to somehow lock the models — bear with me, I'll go more extensively into why we need to lock the models a bit later — and then we have to deploy the artifacts. So now, after we built the first system and realized it doesn't work for some use case, we have the next version of our model serving system, one that also contains the management. But what happens when we want to understand whether the model we built today or the one from two weeks ago performs better? Here's an example. Suppose your goal is recommendation. How do you understand whether your recommendation is successful? There are a bunch of different metrics. If you have some kind of app, you probably have monthly active users: that's how you measure churn and how many people are active per month, a very classical metric. But how do you understand how the recommendation system you built correlates with your monthly active users? Essentially, you can build a hierarchy of metrics that correlate with each other. For instance, if you have a page that serves a recommendation, that page can have a metric reflecting whether the recommendation is good or not. The very classical, most obvious one is the click-through rate. But depending on the specifics of your recommendation, maybe what you care about is getting people down the rabbit hole, like I sometimes get on YouTube: you click one recommendation and then it goes one after another. Maybe that's what you need to measure: how deep people go once they get hooked on your recommendations. There are a bunch of metrics you can measure; the idea is that you build metric systems that correlate with the highest-level metric you associate with the success of your website or app.
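As a concrete example of the lowest rung of that hierarchy, here is a minimal sketch — with an entirely hypothetical event log — of computing click-through rate per deployed model version, the kind of number you then try to correlate upward with monthly active users:

```python
import pandas as pd

# Hypothetical event log: each recommendation impression records which
# model version produced it and whether the user clicked.
events = pd.DataFrame({
    "model_version": ["1.0", "1.0", "2.0", "2.0", "2.0"],
    "clicked":       [0,     1,     1,     0,     1],
})

ctr = events.groupby("model_version")["clicked"].mean()
print(ctr)
# model_version
# 1.0    0.500000
# 2.0    0.666667
```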
That example was about recommendation, but a lot of this applies to robots too. If your robot has one or two tasks, you can collect data on the robot, or over the internet, and build those metrics. For instance, I gave the example at the beginning of a robot that takes out weeds in a field: it walks through the field, takes pictures of its surroundings with a couple of cameras, identifies a weed, and pulls it out. One measure would be the count of weeds removed, the frequency, and so on. So there are plenty of metrics for every such system where you deploy machine learning, because as far as I'm concerned, machine learning systems have two kinds of evaluation metrics. One is your accuracy, the mathematical evaluation, where you can say: on this dataset, my model works well; I have a test set and a validation set, I train my model, and it performs well on both. That's the mathematical one, the one researchers care about. But if I'm going to convince my boss that my model is doing well, I also have to convince them on the second kind: how it contributes to your monthly active users, or whatever metric you care about as a business rather than as a mathematician or researcher. So we end up caring about two kinds of metrics. Any questions so far? Please let me know if I'm speaking too fast.

Q: The only question that comes to mind is that these models may need different processing, so we'd need different pipelines within our pipelines, right?

A: What do you mean by the model here — do you include the pre-processing? Well, yes. I've actually omitted that part, but this box has to apply exactly the same pre-processing steps as the ones you used to build your feature vectors. If you're dealing with text and you applied some word2vec model to get your word vectors, you have to apply it here too. And this pre-processing could itself be another such system; there's a bit of recursion here. If the pre-processing is a machine learning system, you end up having a machine learning system inside your machine learning system. That's partly why I didn't go into it, because I'm wary of the time — though maybe I'm actually going too fast. So yes, I omitted lots of details; this is a simplified version, and that part is another nasty one. Now, why was I talking about all these metrics? Coming back to our initial problem: we have a model right now, and we have the model from weeks ago, and we need to figure out which one does better at recommendation. The classic answer is A/B testing. That's easier said than done: you have to implement a whole system that deals with it. You have two models, you deploy both, and you have to distribute traffic between them. We've already seen this serving system before, so we can probably reuse it in some way. But this new machine learning system has to be able to take several such deployed models at the same time and serve them behind some kind of load balancing that can say: if you hit me, 30% of the time I'll serve you this model, and 70% of the time the other one.
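A minimal sketch of that weighted routing, with two hypothetical model endpoints; a real setup would live in the load balancer or service mesh, and would usually also pin each user to one variant so their experience stays consistent:

```python
import random

# Two deployed versions of the model behind one entry point; the weights
# encode the 30/70 traffic split.
VARIANTS = [
    {"name": "model-1.0", "url": "http://model-v1:8080/predict", "weight": 0.3},
    {"name": "model-2.0", "url": "http://model-v2:8080/predict", "weight": 0.7},
]

def pick_variant():
    """Weighted random choice of which deployment serves this request."""
    return random.choices(VARIANTS, weights=[v["weight"] for v in VARIANTS])[0]

# Every incoming request is forwarded to the drawn variant, and the chosen
# name is logged next to the outcome so the A/B metrics can be computed.
print(pick_variant()["name"])
```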
And there are so many hairy parts to this. That's one way of doing it, but there are a bunch of others: there could be no load balancer at all, and you just redirect some of the API traffic to one deployment and the rest to another. There's no single way of doing it, no one truth about how to build these systems. My goal is just to show how unfriendly it actually gets when you do this. So we've made a significant change to our machine learning services, and now we have 2.0. It's a significant change because we completely replaced this part, and, if you notice, we also took the deployed artifacts and fed them back into model management; I'll share why. Any questions so far?

Q: What if we realize that, of the two models we're serving in the A/B test, we only need one of them?

A: Then you need some kind of rollback, some feedback process, where your A/B test informs your model deployer that it should deploy only one of them. So there has to be communication between these systems saying you need only one of those models, because your A/B test has concluded that, statistically, this one is better than that one. And I don't think that's a very easy arrow to implement either. The thing with these diagrams is that I've put everything into boxes, but each box is probably a topic of its own. My goal with this lecture was to really show the complexity these types of systems can reach; opening up each of these boxes is another can of worms. So we've now extended our initial system into something genuinely sophisticated; we have a rather different system over here. As I mentioned at the beginning, the format of this talk was to go to some length showing all these possible building blocks, the systems that interact with each other and together constitute your machine learning system along with the data processing, and only a handful of details, so that we can then have a deeper conversation on how to implement any of these boxes. That, and I probably speak too fast. So this constitutes the formal part of the talk, and I assume there are people here interested in building these things, so let's discuss which parts you think are worth going into in detail. I guess one would be the pre-processing, the case where you have several different types of pre-processing steps. The reason I glossed over it is that it depends on how you serve your model; I've only shown one way of serving it, the API way.
But if you have, for instance, a system with Kafka or some other streaming service, you need something asynchronous, because serving a model behind an API is not asynchronous, not event-based: it's just request and response. Maybe instead you need your model in the stream: a message comes into Kafka, it gets transformed by applying the model to it, and another type of message goes down the system. That's another mode of serving, and that's the reason I didn't really go into the pre-processing. If you're building a system with Kafka, one way is to package the pre-processing as modules, as a library, so the same modules can be applied both in the system where you did your data pre-processing and here at serving time. You just focus on one event at a time — the library or system deals with that one event — and you make it part of the code in all the places that need it. That's one way of doing it, and there are a bunch of problems with it too: managing that library is a problem, managing all the versioning; it's not really uniform versioning if you do it that way. And that's why the name of this talk includes "challenges".
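To ground that event-based serving mode, here is a minimal sketch with the kafka-python client; the topic names are invented, and the preprocess and predict functions are trivial stand-ins for the shared pre-processing module and the loaded model:

```python
import json

from kafka import KafkaConsumer, KafkaProducer  # kafka-python package

def preprocess(event):
    # Stand-in for the shared pre-processing library discussed above.
    return [float(event.get("age", 0)) / 100.0]

def predict(x):
    # Stand-in for the real model.
    return min(1.0, sum(x))

consumer = KafkaConsumer(
    "raw-events",                        # hypothetical input topic
    bootstrap_servers="localhost:9092",
    value_deserializer=lambda b: json.loads(b.decode("utf-8")),
)
producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda d: json.dumps(d).encode("utf-8"),
)

# No REST API: the model sits between two topics and enriches each event.
for message in consumer:
    event = message.value
    score = predict(preprocess(event))
    producer.send("scored-events", {**event, "score": score})
```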
Q: In that case, how do we serialize the model? By "model" I mean all of it: the code, the weights, even the pre-processing, and the libraries in there. How do we serialize all of those things? The easiest answer might be that we serialize by building Docker images, but are there better ideas? One thing you can certainly do is a pipeline of Docker images, and then just call that pipeline.

A: In terms of serializing the model itself, we definitely have a way to serialize it such that you have the weights and the architecture; ONNX does that. ONNX is a standard built by Microsoft, Facebook, Amazon and a couple of other big companies together, and it's a protobuf-based serialization. Who here is familiar with protobuf? Protobuf is a format in which you describe some kind of data, a graph for example. A neural network is a type of graph, so you can describe that graph and its weights in protobuf. What protobuf then does is generate source code in any language you want — they have a list of languages: Python, C++, Golang, Java, anything — based on your description of the data, and that generated code can serialize it. So if you have Python code that needs to serialize a certain object format into binary, you take the protobuf files, compile them into Python source code that contains the serialization functions, and then just hand it your data, as a dictionary or an object, and it serializes it. Why does this matter? ONNX is a format based on exactly this. It's a description of a graph, including operations that are implemented in C++. With neural networks there's a multitude of mathematical operations — hyperbolic tangent, sigmoid, and so on; it's a big list of functions. What ONNX does is keep a list of those functions, implemented by hand, invoked through the protobuf description but with a C++ implementation underneath. And that's how it works in a runtime: you ship your model as a binary described by their file format, and the runtime retrieves it and runs it. The runtime could be anything. NVIDIA, for instance, has TensorRT; TensorRT is NVIDIA's runtime. It probably works best with TensorFlow, but it also compiles ONNX. Compiling means it takes your ONNX file, pulls the graph out of it, and compiles it into their intermediate representation — getting too technical? — an internal representation of your graph inside TensorRT, which it then compiles into an executable you can run on CUDA. That's one path from your model code all the way to something that executes on CUDA. Same with Intel: Intel has a thing called nGraph that does much the same, but for Intel hardware. I think something like this exists for every kind of hardware, but I haven't checked it all myself. But what was the original question — how do we serialize the post-processing, and the versions of the libraries? So: ONNX is a way to serialize the model. There is no well-known way of serializing post- and pre-processing, so you have to deal with it somehow. One way is to ship other, black-box-type executables that you include alongside your model; you can save them as binaries or as some kind of library. That's about the best there is. I've done something like that myself, and it's actually a problem I keep making for myself.

Q: Because when we build a model with Python and PyTorch, if we use ONNX to serialize the model itself, we have to serialize the pre-processing and post-processing some other way. So making the model serialization easier doesn't make the overall thing easier; I can't easily serialize and run my Python code on mobile phones. It's strange that ONNX doesn't provide for that.

A: Maybe one could — and it's something I might try in the future — build exactly that. For a lot of these pre-processing steps, if there were a way to have the operations predefined, you could do something very similar to what ONNX does with the actual neural network operations: pre-implement them, name them, and have some format that saves them into a binary you can call at runtime. That's one idea. Another idea would be to ask you to write code in a certain constrained way; if I ask you to write code in a certain way, maybe I get the chance to take your code and convert it into binaries.

Q: But ONNX is a static graph itself, so can't we simply compile, say, PyTorch code to ONNX?

A: PyTorch does compile, with a bit of brouhaha, via tracing and just-in-time compilation. If you give PyTorch your object, your neural network module — something that descends from nn.Module — it will take that module and turn it into an intermediate representation. This is a PyTorch 1.0 feature, just-in-time compilation; I think it's called torch.jit. It turns the module into an intermediate representation, and then there's also a function you can call that actually exports an ONNX binary, which you can then run easily.
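A minimal sketch of that export path, with a made-up toy module: torch.onnx.export traces the module using an example input and writes the protobuf-based ONNX file that runtimes like ONNX Runtime or TensorRT can then load:

```python
import torch
import torch.nn as nn

class TinyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(4, 2)

    def forward(self, x):
        # Note: tracing records only the operations this input triggers;
        # a data-dependent branch here would get baked in one way only —
        # the limitation discussed just below.
        return torch.sigmoid(self.fc(x))

model = TinyNet().eval()
dummy = torch.randn(1, 4)  # example input used for tracing

torch.onnx.export(model, dummy, "tiny_net.onnx")
```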
Q: If we can serialize arbitrary code with the PyTorch JIT compiler, why can't we serialize any Python code?

A: I don't think it handles truly arbitrary code, because it works by tracing. It's like a classic problem in compiler theory: if you have code for which you need to find some kind of binary representation, the way they do it is to give it an input and record all the operations that input causes — whatever functions get executed while processing it. That's tracing. So the catch with PyTorch is this: if your model has other code paths that depend on the data — I don't know how common it is to build such models, but if your model takes different branches based on some feature of the data — then tracing will only capture the part of your model that your particular input actually triggered. That's the catch with the PyTorch approach, and it's a real limitation; it's actually a very interesting problem to figure out how to solve. I think the reason is that your code is not static: a static TensorFlow graph would be a different kind of problem from being handed an object of a class descended from nn.Module, where there can be arbitrary if-statements inside that screw things up. But maybe there's a way; it's an interesting problem. I hope all these very big questions of ours don't worry you too much; I promised it wouldn't be too technical, but it got there pretty quickly. Any other questions?

Q: One question is about the communication between data scientists and data engineers. In small companies they're the same team, but even with a hundred people there are parts where I'm not sure how it should work. Take the feature banks we talked about: a new model may require different types of features, a different representation, so does the request go to data engineering, and they have to provide it? And it becomes a really different scenario when the feature engineering for the feature bank is a model itself, because then it becomes the data scientists' job to provide it. It gets really unclear.

A: It's a good question, and I can give you an example. I was talking to a friend of mine who works at Facebook, and I asked the exact same questions. They have people whose whole job is to build those feature vector banks — he works, I think, on the news feed — and those people build so many feature vectors that the problem is actually the other way around: the data scientists have to figure out which ones they can use. If it's a small company, you could have the problem you describe, but a lot of the boxes I showed are not mandatory to build; it depends on the size of the company and what you're trying to get done. A lot of them can be squashed into one, and then you have just one system that everybody works with, until you get big and successful, and then you become a bit bigger and more sophisticated. And if you have two people who are both the data scientist and the data engineer, you've simply got to talk to each other; it's a different type of organization at that point, and you'd better have a good relationship with each other, so they do what you ask.

OK, thank you.