Hello everyone, thanks for joining. My name is Eduardo and today I will talk about a subject that's very dear to my heart, which is MLOps, and why DevOps is lacking in the machine learning world. I'm Brazilian, I've been living in the Netherlands for six years already, and I am a Staff Incubation Engineer at GitLab — I'll explain later what that means — and I've been at GitLab for two years now.

A quick overview of what I'm going to talk about. First, just to give names to things, I'll talk about what MLOps is. Then I'll try to start a conversation about some examples of where DevOps solutions, or platform teams, have been failing data scientists, and explain a bit why this difference exists and how we at GitLab are tackling the issue. Finally, a quick word about LLMOps, which is the new kid on the playground.

What is MLOps, and how is it different from DevOps? DevOps is not a technology; DevOps is a culture, a set of processes and techniques that you use to improve the quality of your deliverables, the quality of the software that you put out to users. So it's not only about coding: it starts with the agile part, how you plan that software, how you create it, deploy it, monitor it, and so on. MLOps is pretty much the same idea: it's about creating, together, software that includes parts powered by machine learning features. That is the only difference in the definition between DevOps and MLOps. So MLOps is not just what happens after something is created; it's the whole loop of planning, creating, packaging, deploying, verifying, and securing, which is what I want to talk about.

I was a data scientist and a machine learning engineer before joining GitLab, so I spent a few years working in this realm, and there was always a discussion, or some misunderstanding, whenever we had to deal with the platform team about the tooling we would need. I'm going to share some examples of what happened in the past.

I was working for a large company, with about 300 data scientists, and to configure our data pipelines we had something called Oozie, the Hadoop pipeline orchestrator. It wasn't great in itself, there was nothing special about it, but we had this workflow — find existing code, copy it, change it — to create our new pipelines, and it worked perfectly well for us. At some point the platform team decided they wouldn't support it anymore and would provide a different solution, built on top of their microservices platform, where each pipeline would be a microservice owned by a team. It sounded pretty on paper, and for software engineers it was great, but for data scientists it didn't solve any of their problems — they had no problems with the existing tool — and it added so much to their workflow that, four years later, they're still using Oozie, even though a lot was invested in the change. The platform team solved their own problem, not the data scientists' problem.
Another issue I faced a lot: data scientists use Jupyter notebooks for their flow. For those who don't know, a notebook is a kind of file where you write code and then see the result of that code right there, you iterate on the code in place, and everything is saved into the notebook itself. When you tell this to a software engineer, they go crazy, because you're putting output and input together and pushing it to Git. It sounds like blasphemy to have output in Git, to have the final result of what your code generates checked in — Git should be only for code, or whatever. But Jupyter notebooks are important for the data science flow; they solve a problem for data scientists, because we need to see the output, the graph that was generated. Discussing that graph is as important as discussing the code that created it; the code is not even that important in the whole sense of what we are doing. Yet over and over there's this resistance, this pushing of tooling that brings software engineering practices into the data science platform. Recently a company released notebook diffs, for code reviews, and I saw a tweet that said, "oh, I wish you hadn't done this, because now there's one less argument against notebooks" — they just didn't want notebooks at all, they wanted to abolish them.

Another one, and this happened to me as well: I needed to deploy a model, and the platform team had been asked for an MLOps platform, so they deployed Kubeflow, which is a great platform, but it didn't have a model registry. I came back to them: okay, but I don't have this component that I need. They hadn't taken the time to understand what the components of that tool were; they just said it's an end-to-end platform, so it should have everything and you're good to go — and if it doesn't have what you need, it's probably because you're not using it the right way.

And finally, this is a rite of passage for every data scientist dealing with a platform team for the first time: "we need production data in the development environment." Why? Because models depend on data, and if you only have development data, test data, you cannot build models and you cannot test those models, so you need production data. Every time we say this to a platform team, you can see the cogs turning: why is this happening, why am I in this conversation?

There were two things in these examples. Why are DevOps and platform teams failing the machine learning world? First, they're failing to understand the who: the people involved in developing machine learning are different from the people involved in developing software, with different backgrounds and different ideas. And second, they're failing to understand the what: even though the goal is the same — we are trying to build better software, to deliver more value to the user — developing software that includes machine learning is fundamentally different from developing regular software. I'll show why in a second.

So, why is there this difference? The first part is the who. Data scientists are not software engineers. We keep trying to sell software engineering solutions to data scientists, but they are not software engineers; they are a different crowd.
They have different backgrounds. Most data scientists I've worked with don't come from a computer science background; they come from philosophy, from geology, from chemistry, from econometrics. It's one of the most diverse crowds I've ever worked with, even compared with software engineering. And for those folks, code is not a craft. For software engineers, one of the final deliverables is code, but for data scientists that is not the case. For some of them the deliverable is an analysis — there are data scientists focused on helping the business make decisions. Others work more on machine learning and on products that use data science, and for those the final artifact is a model or something like it; the code, for them, is a way to get there, not the end. I see a lot of software engineers who, as a hobby, try to improve their code base or learn new coding patterns. That is not the case for data scientists. You can see this in how the community behaves: the whole community flocked to a single language — everybody uses Python, some people use R — there isn't the spread of languages you see elsewhere. Not only that, everybody flocks to the same libraries within that language; there are a few dozen libraries that everybody uses, and people aren't trying to create new libraries all the time. The constant creation of new things happens in the models, the algorithms, the ideas — not in the code. So they're not worried about the same things software engineers are; they have different incentives. And they have a low tolerance for setting up new tooling. They don't want to jump through a hundred hoops and a hundred Stack Overflow threads to figure out how to set up one small thing; they just want something that works for them. That's why a lot of data scientists still use Excel, or Jupyter, or whatever fits their use case. And as long as we don't capture this — as long as DevOps solutions, whether point solutions or platforms, don't understand that the people they're developing for are different — we're going to keep failing them.

The second difference is the what. Why is developing software with machine learning different from developing regular software? With regular software — and this is a very rough generalization — you take some input data, you pass it through your code, which encodes your logic, and out comes some behavior: the data is transformed into behavior. That can be showing something on screen, saving something to a database, whatever. The important part is that the logic is explicit in the code. When you write software, you are writing the logic. When you code review software, you're not reviewing the specific letters and words on the page; you're following the logic, trying to trace what the behavior will be. The process from input to output is explicit in the code. For machine learning, that doesn't hold. When you're working with machine learning, you take training data — a collection of data that may or may not contain some logic, where all you can see is its patterns — and you write code to extract those patterns.
Extracting those patterns from the data generates a model that has its own logic inside it, and then you pass input data to that model to get the behavior. The difference is that now the logic is implicit. You don't have the explicit logic you had before; the logic lives in the data, and you don't even know if it is correct. This fundamental difference changes everything about developing machine learning.

One: extracting those patterns can be really expensive. Think not an hour, but training pipelines that run for days, that consume GPUs for days, months even. The code you write extracts patterns — that's what it does — but you're not even sure the pattern is useful until you get to production data. You have some approximations of the effectiveness of the model, but you only know whether it really does what it's supposed to do on production data, with A/B testing, with shadow deployments and other techniques. So you go in the dark for a long time.

The learned patterns are about the data itself. So if the underlying data changes — which happens all the time; users change their preferences, they get tired of whatever TV shows or music they're consuming and want something different — the model gets stale and you have to keep retraining it. You have to deploy new versions even though the code didn't change. The code is just one aspect of the machine learning model; the data is as important as the code itself, even more so.

Like I said, the development environment requires production data. Why? Because the model is a reflection of the input data. We call it garbage in, garbage out: if you have garbage input data, you get a garbage output model. If you only have test data or fake data, you get a model that you can't even really test — there's no point in testing it in the end. There are ways to do this with synthetic data, but even then, to generate synthetic data you need some examples of your production data.

And data, like I said, is as important as the code. So how do you version the data? This is a challenge we've been struggling with forever, and there's no good tooling. There are some tools around, but they manage to be even more complicated than Git — and you can imagine, show that to a group of people with low tolerance for complex tooling and you can guess the adoption you'll get.

Another one: testing the code is really hard. You can test explicit logic, but how do you run unit tests when the output is, by definition, probabilistic? You're trying to get the most probable answer, but if your model has 95% accuracy, 5% of cases will fail just because of the data. So you're going to have a lot of flaky tests. How do you manage those?
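To make that flaky-test problem concrete, here is a minimal sketch — my own illustration, not something from the talk — of one common workaround: instead of asserting exact outputs, train the model and assert an aggregate metric against a threshold on held-out data. The library calls are standard scikit-learn; the dataset and the threshold are made up for the example.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split


def test_model_meets_accuracy_threshold():
    # The logic lives in the learned weights, not in the code, so we can
    # only assert a statistical property of the output, not exact values.
    X, y = make_classification(n_samples=2000, random_state=42)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

    model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

    # Individual predictions are allowed to be wrong; the test only fails
    # if overall accuracy on held-out data drops below the agreed floor.
    assert model.score(X_test, y_test) >= 0.85
```

Pinning the random seed keeps this deterministic in CI; with real or refreshed data, the threshold itself becomes the thing you have to tune, which is exactly the flakiness problem described above.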
Developing is different here, in both the who and the what, and trying to cram existing solutions into this workflow all the time is not really doing what it's supposed to do.

So how are we tackling this? Like I said, I'm an incubation engineer at GitLab. An incubation engineer is GitLab's first investment into a specific area, and the area I'm incubating at GitLab is MLOps. As I mentioned, I was a data scientist, I worked in machine learning, and before that I was a software developer who used to love GitLab. Then I became a data scientist who didn't have much use for it. I wanted to understand why that was, and I joined GitLab to start fixing it.

So, first and foremost: DevOps has gone through a few different eras, or phases. You have a culture, of course, but you need tooling to implement that culture. The first phase of DevOps was a lot of different point solutions for specific problems. But you spent a lot of time implementing each one of those specific solutions, and you couldn't create an ecosystem. Then came the standardized tool chains: each of these point solutions exposed APIs that could connect to each other. That allowed at least communication and some level of integration, but the integration work fell onto the users, so a lot of time was spent writing glue code to make the tools work together — and failing at that, of course, because if one of them changes you have to keep everything in sync, and it's annoying. The third phase is the digital duct tape, or what I call the stitchers: companies that host managed instances of different products and build the connections between them for you. But there's a problem here as well: even though the integration is done for you, you still have to learn n tools to do your job — each of their UXs, each of their vocabularies, what each of them calls each thing.

MLOps is now arriving at that second level. We went through point solutions, we are now getting some standardized tool chains so that tools can communicate with each other, and some vendors are at the stitcher level. But we want to be different. The fourth phase of DevOps is the platform. It's not a stitcher: it still provides all the solutions, everything connected, but with the same user interface and the same language across the whole process of development. You don't need to learn all the different tools; you have one language across the whole team.

This is one slide that I will actually read, because it makes sense to read this one: GitLab is the DevOps platform, and our vision is that it will also become the MLOps platform. GitLab is a single application powered by a cohesive user interface, agnostic of self-managed or SaaS deployment. It is built on a single code base with a unified data store, allowing organizations to resolve the inefficiencies and vulnerabilities of an unreliable DIY toolchain. This is where we're going: we want to create a platform for data scientists, in the same place where they can collaborate with software engineers, that helps them improve their flow from planning their work to creating their models, to packaging, deploying, and monitoring, across the whole stack. And this is what I mean: this is what GitLab is for DevOps, and we want to find, in each of these steps, what is missing for data scientists. How can we improve this, rather than just pushing what is already there? Sometimes we already have the solutions; where we are lacking, we need to know where and how to bridge the gap between what's there and this new user type we want to provide value for.

So, as I mentioned, what we want to build is a GitLab-native experience. We don't want to just offer self-managed installations of other software where you still have to learn another tool's language. We want to give a native experience from beginning to end.
Minimum setup: users need to do the least possible for everything to work. No changes to their code base, as much as possible. No setup — no installing something and creating a bucket or a service just to have it run. If you have access to GitLab, it should work by default. The same language shared by the developers, the product managers, the SREs, and the data scientists, so everybody works together in the same tool, using the same language, which removes communication inefficiencies. Just the fact that you can easily share a notebook with your product manager is something wild for data scientists sometimes. And last but not least, open source. GitLab is fully open source — even our premium features are open source, so you can see what we're building. We build in the open; we are transparent about everything we do along the way.

I've said what we want to be, but that's not enough, right? So I want to show a little of what I have been building in the past few years towards that vision. The first thing I implemented was code reviews for Jupyter notebooks. As I mentioned before, it's a very weird file format: it's actually a JSON file with everything crammed in — base64 for images, HTML, markdown, and the code itself. What I did was implement a way to create diffs for these notebooks and display them alongside regular GitLab diffs. The data scientists don't need to do anything; they can keep using whatever Jupyter setup they were using, they don't need to install any library. It just displays there, and you can now discuss images, code, and markdown in your code review, everything in a single place, using existing GitLab tooling. This was released two years ago, in GitLab 14.5.

The second one, my most recent release, is model experiments. When you're developing a machine learning model, I mentioned two components, code and data, but there's a third one: hyperparameters, which is a pretty word for a configuration file. Since the output of a machine learning model is non-deterministic, we don't know up front which configuration is best, so we test across many possible configurations and pick the best model. The impact of this is that for every change you make, you end up with a huge number of trials, of candidate models, that you need to keep track of: what the performance of each one was, the metrics, the parameters, which commit generated that model. This is what model experiments does: it lets you track this metadata and save the artifact of each model directly into GitLab. It was released in 16.2 and it's native to GitLab — it connects to CI pipelines, MRs, and the package registry.

And this is where it gets interesting. MLflow is the most common open source solution on the market, and the initial idea was to just deploy an MLflow instance that you could access from GitLab. But that wouldn't give the experience we want to give users. So what we did was re-implement this inside GitLab, native to GitLab, with compatibility with the MLflow API. You can still use your MLflow client, you can still use the code you already have; you connect with just a change to the environment variables, and you're connected to GitLab.
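Two quick illustrations. First, going back to the notebook diffs: the reason plain line-based Git diffs work so badly here is that an .ipynb is a single JSON document where prose, code, and rendered outputs (including base64-encoded images) all live side by side. A small sketch — the file name is hypothetical — that just walks that structure:

```python
import json

# A .ipynb file is plain JSON: markdown, code, and outputs in one document.
with open("churn_analysis.ipynb") as f:  # hypothetical notebook file
    notebook = json.load(f)

for cell in notebook["cells"]:
    # Markdown cells have no outputs; code cells may carry text, HTML,
    # or base64-encoded images in their "outputs" list.
    output_types = [out.get("output_type") for out in cell.get("outputs", [])]
    print(cell["cell_type"], "->", output_types or "no outputs")
```

Re-running a single cell can rewrite thousands of characters of base64 for an image that changed by a few pixels, which is why a rendered, notebook-aware diff matters so much more here than a textual one.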
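Second, a minimal sketch of what that environment-variable switch can look like from the data scientist's side. The variable names are the standard MLflow ones; the endpoint pattern, project ID, host, and token are assumptions to verify against the GitLab documentation for your version, and the logged files are placeholders.

```python
import os

import mlflow

# Point the standard MLflow client at a GitLab project instead of an
# MLflow server (endpoint pattern assumed; check the GitLab docs).
os.environ["MLFLOW_TRACKING_URI"] = (
    "https://gitlab.example.com/api/v4/projects/12345/ml/mlflow"
)
os.environ["MLFLOW_TRACKING_TOKEN"] = os.environ["GITLAB_ACCESS_TOKEN"]

mlflow.set_experiment("churn-model")          # appears under the project's experiments
with mlflow.start_run():
    mlflow.log_param("learning_rate", 0.01)   # hyperparameters ("configuration")
    mlflow.log_metric("accuracy", 0.93)       # candidate model performance
    mlflow.log_artifact("model.pkl")          # assumes this file exists locally
```

The rest of the training code stays exactly as it was; that is the zero-setup promise described above.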
And it saves everything into GitLab — GitLab works as a backend to MLflow. The beauty of this is that it's now really easy for us to integrate across the platform. If you run this from a CI pipeline, we identify that, and if there was an MR that triggered the creation of that model, it shows up right there. User management: MLflow doesn't provide it, but on GitLab it comes out of the box, because every project has its own experiments, so only users with access to that project can see those experiments. And again, zero setup: if you have a GitLab installation, you don't need to do anything as a data scientist, it's already there, you just start using it. If you already have model experiments elsewhere, you don't even need to change your code; you just change how it's run, pointing the environment variables to GitLab, and it starts saving into GitLab. It was really fun to build this. We had a lot of data scientists coming in to discuss the solution, and they still do — every bug that someone reports means somebody is using it, so that's a good thing. Every piece of feedback is really welcome on this feature.

What we're building now is the model registry. It's similar to model experiments, but model experiments is more of a scratch pad for models, where you store a lot of trials; the model registry is how you communicate to your application which models should be deployed to production. It still saves everything into the GitLab package registry — it's like a package registry targeted at the machine learning workflow. And there's another thing I find very interesting about how this is being approached. When I started working on it, I thought, okay, this is going to be a package registry, it's just going to have a version and that's it; the logic for managing what gets deployed should live in the application. But that's not how data scientists work. They want to be able to toggle which model is available directly on the screen, and to tag each model version as staging, production, or dev. We need to accommodate that use case. Sometimes we can be a little opinionated, but we need to be weakly opinionated towards our users; it needs to be user driven, and this is what we're going to do. This is in development, and if you have any ideas on how we could do a better job, epic 9423 is where we are discussing and tracking the evolution.

And a final note about LLMOps — which is hard to say; I practiced for a while before this talk: LLMOps, LLMOps, LLMOps. Remember the drawing I showed about where the logic lives in machine learning? It gets worse when you're talking about LLMs. Worse because you still have that model, but that model receives a prompt, and the prompt plus the large language model becomes an app, more or less, and that app receives the input data. So there's a second level of interaction here, a second level of implicitness — also a hard word — in the development. And this brings a whole set of issues that MLOps has not covered so far. The large language model will rarely be built by the company that is using it, because they're really costly to build.
The training data needed is very large, and so on. So for a company, how do they audit this model that's coming in for them to use, for example? How do you compare across the different models that keep appearing? How do you track their changes? What's in the data that was used to train this large language model? The prompt itself is an artifact. Right now we're at a level where we just write prompts in the code — sometimes I even see them in the frontend code — and pray that it's going to work. But a prompt is as much an artifact as a binary, and we need to create an interface where not only software engineers but also UX writers, product managers, and the other folks involved in creating this can edit it and collaborate properly. Right now it lives only in the code, in a lot of places, and we need to move away from that.
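To make that concrete, here is a minimal sketch — entirely my own illustration, not a GitLab feature — of pulling prompts out of the application code into their own versioned files, so they can be reviewed and edited like any other artifact. The file layout and helper function are hypothetical.

```python
from pathlib import Path
from string import Template

# Hypothetical layout: prompts live in their own versioned files, so UX
# writers and product managers can review and edit them in merge requests
# instead of digging through application code.
PROMPT_DIR = Path("prompts")


def load_prompt(name: str, **variables: str) -> str:
    """Load a prompt template from its own file and fill in the variables."""
    template = Template(PROMPT_DIR.joinpath(f"{name}.txt").read_text())
    return template.substitute(variables)


# Usage: prompts/summarize_issue.txt might contain
# "Summarize the following issue for a release note:\n$issue_text"
prompt = load_prompt("summarize_issue", issue_text="...")
```

So this is going to be a lot of fun over the next few years, figuring out how to tackle all of it. We haven't even done MLOps properly yet and we're already being pushed into the next phase — that's going to be a lot of fun.

Quick summary. MLOps is very similar to DevOps: the goal is to create software faster, software that solves users' actual problems, in the way they want, securely and quickly. But DevOps is falling short — it's not enough at this point — for two main reasons. First, the people it is developing for are different, and if DevOps doesn't understand them, it will keep failing them. Second, the what is different: there's a fundamental difference in how you build software with machine learning and in the tasks you need to do to verify that it works, and that also needs to be taken into account. The current landscape of MLOps solutions is mostly point solutions, with some emerging level of interfaces between them — not quite there yet — and a few vendors trying to connect them all. At GitLab we're trying to skip ahead to the next step and become an MLOps platform: a cohesive platform where you can develop software with or without machine learning and collaborate with your peers along the way, reducing inefficiencies.

Well, thank you. Again, my name is Eduardo, and if you want to know more, everything is open: I have a handbook page about MLOps — just search for "MLOps GitLab Eduardo Bonet" and you'll find it; that's the easiest way to find me. I'm also on Twitter and LinkedIn, where I try to post updates every two or three weeks. That's it. Questions?

I think we have a few minutes for questions. Thank you for the talk, Eduardo. My first question would be around data versioning: have you thought about applying Git principles or concepts to how you version data and machine learning artifacts?

I've been thinking about this, and we have an issue discussing that support. The question I have right now is: are Git principles even valid for data? With most versioning problems, the moment somebody says "versioning", somebody on the other side screams "Git". But Git is made to track changes in text, in raw text.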
That's what Git is good for, and in software it also lets you track logic. Is Git enough for data science? It's the tool that we have, but is it the right tool for data science? For data versioning that question is even more important: what would the right tooling be? What we could do, and probably will do, is support existing tooling built on Git, but we're also trying to go further and see whether the underlying approach is the correct one. There is an issue on GitLab where we're discussing this — it's an open question. Thank you.

Other questions? So, in this machine learning ops provided by GitLab, does it provide management of data sets and of models?

For models, that's the model registry that's coming up: the model registry is how you manage models and their versions. Once the model registry is in place, the next step in the loop is to work on deployment of the models you registered, and after that to start implementing verification of machine learning models: what does a supply chain attack mean for machine learning, for example — data injection and things like that. That's also in the scope of what we want to look at next, eventually. Thank you.

I don't know if anyone else has questions, but: have you started working on or designing the serving stage that would come after, I guess, the model catalog?

So, the serving and deployment of models. That will come once the model registry is in place; serving is the next step. We have some ideas already, and I've started discussing with other teams, trying to get them to see this problem and work on it — we have a team specifically for deploy, so I'm discussing with them how deployment of machine learning models should work. What does it mean not only to have them deployed, but also, on merge requests, how do you test them? Like on Hugging Face, where you can see the model running — you could also see that on merge requests. How do you monitor them, do you use Kubernetes or not, how do you deploy? That's all in scope. I think I'm going to create an issue to start collecting ideas and explore that next step. Thank you.

More questions? Another question: on this platform, could I use this machine learning ops to deploy my own large language models, based on the computing resources provided by the platform?

Yes, you could. There is a limitation on the size of the files you can have in the GitLab artifact registry, which is 10 gigabytes. If you're on self-managed that doesn't apply, but if you're on the SaaS version of GitLab, it's 10 gigabytes, and for some of the LLMs that's not enough. On our CI pipelines we also offer GPU-enabled runners, so if you want GPUs for your runners, you can use those as well. If you're on self-managed there are no limitations — you can use whatever storage you have and create your own setup. But we haven't gone anywhere yet on LLMOps itself, so for prompt management and similar things we have no product.
We don't offer anything there yet, but it's also something we are exploring.

Have you considered integrating some out-of-the-box solutions into your machine learning ops, something like LangChain?

Our API is pretty open and pretty powerful, so a lot of what you can do in the GitLab UI you can do through the API, which makes it easy to integrate with other tooling. I don't know if that was the question you asked — I couldn't understand it properly. Does that answer it? Okay. Yeah, so we try to build everything as an API, so that you can use whatever tool you're using and still connect to and make use of GitLab for the parts where we don't offer a solution yet, or where we don't offer the full experience you would expect.

Hello. We work a lot on time series forecasting, and we have a major subject, which is explainability. Is this something you will integrate into GitLab? It is very important for us; today we use many tools, so if GitLab integrated those solutions it would be very useful.

We have not integrated explainability yet. We want to provide a way to display the explainability output of different tools. We're not going to implement a Python library that computes the SHAP values of your time series or of your model, but we want you to be able to see those results on GitLab: to open your MR, see the new model that was generated, the candidates that were generated, and understand why they are behaving the way they are, what the problem is. We want to give you some sort of report on GitLab for the model creation. Yes — on MRs, and on the model itself; model experiments, for example, would be a good place to have the explainability for each of the artifacts you generate. That is the idea. How we're going to implement it is still being discussed, but it is something we want to provide. If you want, you can create an issue for this on GitLab and tag me, and then we can start discussing it; that's the most productive way, so we can also talk about it async and not just leave it here in this conversation. Okay — an issue would be the next step.

Any more questions? Good. All right. Thanks, everyone. Thank you for taking the time.