One of the most exciting things to happen to software development in the last 10 years has been the explosion of machine learning. It has changed the applications we're building, and it has changed the way we're building those applications. Today, I'm excited to have not one, not two, but three GitLab speakers talking about how AI and ML fit into the GitLab platform, including a discussion of a recent acquisition, our plans to incorporate ML into our platform, and our plans to help you incorporate ML into your apps as well. So please help me give a warm Commit welcome to Monmayuri Ray, Taylor McCaslin, and David DeSanto.

My name is David DeSanto. I'm Senior Director of Product Management for the Dev and Sec sections of our portfolio.

Hi, everybody. Welcome. My name is Mon Ray, and I am part of the partner product marketing team at GitLab.

Hi, everyone. My name is Taylor McCaslin, and I'm a Principal Product Manager here at GitLab, working on our ModelOps work.

Thank you, Mon and Taylor. Before we get started, I just wanted to give a brief PSA for everyone. This presentation and all of its linked content relate to our upcoming projects, products, and new functionality. It's important to realize this content is presented for informational purposes only and should not be relied upon for planning or purchasing decisions. As with all projects, this may change. At GitLab, we plan very ambitiously, so please keep in mind as we go through this that we may change any plans at any given time.

Now that we've concluded our public service announcement, let's dive in and talk a little bit about what the ML is up with AI and DevOps.

To kick us off, I want to talk a little about the journey we took to get here. ModelOps is a newer term in the ML/AI market, so it might not be something you're familiar with. To explain what ModelOps is, let's talk about what makes ModelOps happen.

When you talk about the components of ModelOps today, the first is DataOps. DataOps is the processing of data workloads. This includes ELT or ETL, whichever version of the term you're familiar with. In DataOps, you are collecting data from multiple sources, aggregating it into a unified format, and then manipulating, cleaning, and verifying it before you use it.

The next part is MLOps, or machine learning ops. In this, you're taking the data you collected and preparing models for production. This includes model experimentation and training, as well as testing and deploying those models, including partial deployments and partial rollbacks.

There's a newer component that people talk about when discussing ModelOps, and that is applied machine learning. This is where your data, your ML model, and your application all come together to accomplish use cases. These use cases include everything from improved user experience (you see that today when using things like Google Docs, where it makes recommendations on what you should say next and auto-corrects things; that's all powered by machine learning) to financial forecasting, where large organizations that do investments use machine learning to pick where to place their bets. And of course trends and analysis: things like healthcare studies are all uses of applied machine learning, along with lots of other things as well.
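To make that DataOps step concrete, here is a minimal Python sketch of an ETL-style job: collect from multiple sources, unify the format, clean, and verify. This is an illustration only; the file names, fields, and validation rules are hypothetical, not anything GitLab ships.

```python
import csv
import json

def extract(paths):
    """Collect raw records from multiple sources; here, CSV and JSON files."""
    records = []
    for path in paths:
        if path.endswith(".csv"):
            with open(path, newline="") as f:
                records.extend(dict(row) for row in csv.DictReader(f))
        elif path.endswith(".json"):
            with open(path) as f:
                records.extend(json.load(f))
    return records

def transform(records):
    """Unify the format: normalize keys, coerce types, drop incomplete rows."""
    cleaned = []
    for rec in records:
        row = {str(k).strip().lower(): v for k, v in rec.items()}
        if row.get("user_id") and row.get("amount") not in (None, ""):
            row["amount"] = float(row["amount"])  # coerce to a common type
            cleaned.append(row)
    return cleaned

def verify(records):
    """Check the cleaned data before anything downstream uses it."""
    assert records, "no usable records survived cleaning"
    assert all(r["amount"] >= 0 for r in records), "negative amounts found"
    return records

if __name__ == "__main__":
    # Hypothetical source files; in practice these would be pulled from real systems.
    data = verify(transform(extract(["events.csv", "events.json"])))
    print(f"{len(data)} verified records ready for use")
```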
However, with the way ModelOps happens today, there are a couple of common pitfalls, I would say. Teams use "specialized" tools, and I put that in quotes because what ends up happening is they choose tools they're comfortable with that may not integrate with other tools. And when that happens, you end up with teams working in silos: your data scientists are talking to data engineers, but they're not talking to DevOps engineers, software engineers, or security people. Before you know it, everyone's in their specialized tools, and you end up with things like handoff friction and finger pointing: it's not my mistake, it's the other person's mistake. You also end up with things that don't always jump out, like guesswork and unpredictability. Is this going to work? And the sad part is that the outcome is sometimes an application that doesn't know how to use the data well at all, and that big disconnect is going to impact you.

The other part of this, and this is near and dear to my heart, as well as Mon's and Taylor's, is that security becomes an afterthought. Think about how dangerous that could be. You're analyzing a lot of data; that data could have healthcare information or financial information in it, and you could end up with escalated privileges or, worse, a data breach that ties back to how you're handling your data and your machine learning. And there's the age-old thing you want to avoid: you finish the work, it's not working, and you go, "Well, it worked great on my computer, so it clearly wasn't my team; it must have been the other team. They should look at what they did." Of course, that goes back to the finger pointing again.

So what we want to talk to you about today is really the ModelOps of tomorrow, and the exciting thing is that we consider it to be here today. What you need is visibility, efficiency, and collaboration across the teams working on ML and AI within your organization. When you have collaboration, visibility, and efficiency, your teams start working together, and that enables seamless collaboration. Instead of people pointing at the other person, they begin to work closely together. We see that today with traditional DevOps, where security people are now working side by side with developers toward a common goal. You enable teams to feel empowered to act; they no longer feel blocked by someone else and can go ahead and do what they need to do. Security can be baked in from day one: you can actually have conversations about who needs the least amount of privileges and how to implement that, so it's not something happening at the end. This also gives you defensible and repeatable actions, which is great for compliance. And lastly, and maybe most important to a lot of you, your applications will know how to use the data, use it well, and get you the best result.

How we see that ModelOps of tomorrow happening is the combination of ModelOps with DevOps. If you take DevOps and layer it over those three pillars we talked about earlier, you start getting a lot of the benefits from years of development and effort within DevOps. Some examples: version control, where you can now version your models, and even your data sets, using things like Git; that type of version control is going to be powerful for you. And containers, which enable integration of your data science workloads so you can train your models at scale.
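As one hedged illustration of that versioning idea, here is a tiny Python sketch that versions a model artifact alongside Git by committing a small hash pointer instead of the binary itself, similar in spirit to tools like DVC or Git LFS. The store layout and file names are hypothetical.

```python
import hashlib
import json
import shutil
from pathlib import Path

STORE = Path("model_store")        # hypothetical content-addressed artifact store
POINTER = Path("model.lock.json")  # small pointer file you commit to Git

def snapshot(artifact):
    """Hash the artifact, copy it into the store, and write a Git-trackable pointer."""
    data = Path(artifact).read_bytes()
    digest = hashlib.sha256(data).hexdigest()
    STORE.mkdir(exist_ok=True)
    shutil.copy(artifact, STORE / digest)
    POINTER.write_text(json.dumps({"artifact": artifact, "sha256": digest}, indent=2))
    return digest

def restore():
    """Materialize whichever artifact version the committed pointer references."""
    meta = json.loads(POINTER.read_text())
    shutil.copy(STORE / meta["sha256"], meta["artifact"])
    return meta["artifact"]

# Usage: call snapshot("model.pt") after training and commit model.lock.json;
# any checkout of that commit can then restore() the exact same model version.
```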
DevOps is also great for baked-in testing capabilities, and you can use those to validate your ML models to make sure they're ready for deployment. And of course, when you get to the applied ML side, you want continuous delivery of your ML models, so you can do things like partial rollouts and partial rollbacks as you need to.
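As a sketch of what such a baked-in test might look like, here is a validation gate a CI job could run before promoting a model to deployment. The metrics file, its fields, and the thresholds are all hypothetical; the point is that a failing check fails the pipeline.

```python
import json
import sys

METRICS_FILE = "metrics.json"  # hypothetical output of an earlier training job

# Promotion gates the model must pass before deployment (illustrative values).
MIN_ACCURACY = 0.90
MAX_P95_LATENCY_MS = 50.0

def main():
    with open(METRICS_FILE) as f:
        metrics = json.load(f)
    failures = []
    if metrics["accuracy"] < MIN_ACCURACY:
        failures.append(f"accuracy {metrics['accuracy']:.3f} is below {MIN_ACCURACY}")
    if metrics["p95_latency_ms"] > MAX_P95_LATENCY_MS:
        failures.append(
            f"p95 latency {metrics['p95_latency_ms']} ms exceeds {MAX_P95_LATENCY_MS} ms"
        )
    for failure in failures:
        print("FAIL:", failure)
    # A nonzero exit code fails the CI job and blocks the deployment stage.
    return 1 if failures else 0

if __name__ == "__main__":
    sys.exit(main())
```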
And of course, because you're doing it this way, you end up with your developers working side by side with your data scientist and data engineer teams, and because of that, you know your application is going to handle those workloads properly when it hits production. Lastly, and I would definitely not say least important, you end up with good collaboration across your development, operations, data, and security teams, and that collaboration is just going to make everything better for you.

So today we want to talk to you about GitLab's newest stage in our single platform for the entire DevSecOps lifecycle, and that is ModelOps. I know, a very unique name. We tried really hard and we like it. With that, we are going to focus on some things in the near term, which Taylor is going to talk to you about. We're looking at building MLOps functionality into the GitLab platform. This will empower data scientists to run their workloads as part of a traditional CI pipeline. We're also beginning to look at leveraging applied machine learning within GitLab to improve our user experience. As those areas mature, we'll begin to look at DataOps functionality coming to GitLab as well. And in the short term, we've partnered with Meltano, a recent startup focused on DataOps, to explore how GitLab and Meltano can talk to each other for DataOps.

Before I hand it over to Taylor to talk about all the cool things we're doing right now and where we're going, I did want to highlight something. In June, we acquired a company named UnReview; you can see the press release on the right. We did this to begin adding applied machine learning functionality into the DevOps platform we have today. UnReview is focused on developer experience and improving code review, and it does that by leveraging knowledge within Git to assign the right code reviewer. There are a lot of cool things that go with that, which I'll let Taylor share with you. So with that, I'm going to transition over to Taylor to talk about our plans.

Awesome. Thanks, David. That was a great overview of our ModelOps stage. Now I want to take you through what our plans look like today and where we're headed in the future. To start, I want to talk a little bit about the applied ML group that David mentioned. Applied ML for us is really about enriching GitLab features with data science capabilities. We see that as three different areas of our product initially. The first is automated portfolio management, which is really about how we can leverage data science to help you predict and track portfolio changes managed within GitLab. The second is insider threat: we want to leverage machine learning capabilities to help identify anomalous user and CI behavior, to help stop threats before they become a problem within your GitLab instance. And finally, we want to leverage this deep knowledge and understanding of source code management and CI/CD data to really power the future of code security. We offer security scanning capabilities today, and we want to enrich that with data science capabilities to allow us to detect more advanced types of vulnerabilities in your source code. All of these together are the areas where we'll be looking at using data science capabilities to enrich existing and new capabilities within GitLab.

David talked a bit about our acquisition of UnReview. That's one of our first steps as we move towards automated portfolio management. We're thinking about this really in the sense of recommendation engines. We have a lot of data about your epics and issues and how they progress through the various stages of your development cycle, how that ties into your source code, and how that progresses through your CI and CD processes. So we want to leverage the core data we have about your development practice and push it through machine learning models to help do things like suggesting labels or suggesting appropriate code reviewers for specific types of code changes. We also want to enable smart defaults for GitLab based on the type of project you've got and the type of workflow you're using. We want to change the way GitLab works so that you have to do less configuration and can focus on the product features, functionality, and code that you're contributing within the GitLab platform. And finally, we also want to do things like code navigation. When you think about the extensive series of developer tools that GitLab offers, we want to make all of those things smarter.

That is really one of the reasons we acquired UnReview: this is about us starting to think about how we actually leverage data science to deliver very real capabilities that customers like you are looking for. We've seen it in the news and on Twitter, where companies talk about machine learning and artificial intelligence and deep learning, and it doesn't actually add up to any product features; it's just buzzwords on the internet. We really wanted to take a focused look at this problem and ask: what are the biggest challenges facing our customers today? One of those we've heard about is code review. Today, GitLab offers code review functionality where you can choose a reviewer; it will notify them of your code changes, and they can approve them or provide suggestions and feedback to the original author. But today, you have to manually choose who those people are. Part of the acquisition of UnReview was that this is a really great opportunity to leverage the data we have as a source code management platform, the commit graph of a particular repository, and use it to smartly recommend who might be a reasonable and knowledgeable person to review your code. That's exactly what UnReview does: it leverages the source code contribution graph to recommend who is knowledgeable and who may have previously interacted with specific code changes. So we're looking to integrate this directly into our code review functionality today.
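UnReview's actual model isn't detailed here, so as a hedged stand-in, here is a toy Python sketch of the general idea: score candidate reviewers by how often they previously touched the files being changed, using the repository's commit history. Real recommenders use much richer signals (recency, review history, and so on), and the file paths in the example are hypothetical.

```python
import subprocess
from collections import Counter

def recommend_reviewers(changed_files, exclude_author=None, top_n=3):
    """Rank candidate reviewers by past commits touching the changed files.

    A deliberately simple stand-in for commit-graph-based recommendation;
    it only counts historical authorship per file.
    """
    scores = Counter()
    for path in changed_files:
        # Authors of the past commits that touched this file.
        log = subprocess.run(
            ["git", "log", "--follow", "--format=%an", "--", path],
            capture_output=True, text=True, check=True,
        ).stdout
        for author in log.splitlines():
            if author and author != exclude_author:
                scores[author] += 1
    return [name for name, _ in scores.most_common(top_n)]

# Hypothetical usage: suggest reviewers for a two-file change, excluding its author.
print(recommend_reviewers(["app/models/user.rb", "app/services/auth.rb"],
                          exclude_author="Jane Doe"))
```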
This is really about us getting familiar with developing machine learning models and deploying them within GitLab to enrich GitLab features. But it also strategically sets the stage for us to deliver on ModelOps, with the MLOps and DataOps capabilities as well, because at GitLab we dogfood everything we do: we build GitLab with GitLab. So as we build this feature and functionality in, we'll be building out DataOps and machine learning capabilities within the GitLab platform, which will roll into those other stages that David talked about.

I briefly want to talk about two other areas we're going to extend here: insider threat and intelligent code security. We're starting with automated portfolio management, and we'll leverage that as a first use case to really get our hands dirty and understand how to interact with the core data that exists within the GitLab platform, and then expand that to these other use cases. For our insider threat capability, we're really looking at abuse detection. When you think about GitLab, your source code management, your CI/CD, there are a lot of things that can be abused here: spam, malicious users, issues with people spinning up crypto miners, bot activity, especially if you're working in open source repositories. We want to leverage all of that core data within our platform to reduce abuse of the platform and let you focus more on what you're trying to deliver. Eventually, we'll move that toward user and entity behavior analytics (UEBA) and focus on anomaly detection and correlation between events happening inside your GitLab instance.
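As a deliberately simple stand-in for that kind of anomaly detection, here is a Python sketch that flags users whose activity rate is an extreme outlier versus the rest of the population, using a robust modified z-score. A production UEBA system would correlate many more signals; the event counts here are hypothetical.

```python
import statistics

def flag_anomalous_users(event_counts, threshold=5.0):
    """Flag users whose event rate is an extreme outlier for the population.

    event_counts maps a username to, say, pipeline triggers in the last hour.
    Uses a modified z-score based on the median absolute deviation (MAD),
    so a single heavy abuser cannot mask itself by inflating the mean.
    """
    counts = list(event_counts.values())
    if len(counts) < 3:
        return []
    med = statistics.median(counts)
    mad = statistics.median(abs(c - med) for c in counts)
    if mad == 0:
        return []
    # 0.6745 scales the MAD so the score is comparable to a standard z-score.
    return [user for user, n in event_counts.items()
            if 0.6745 * (n - med) / mad > threshold]

# Hypothetical example: one account triggering far more CI jobs than everyone else.
activity = {"alice": 12, "bob": 9, "carol": 11, "dave": 14, "mallory": 480}
print(flag_anomalous_users(activity))  # ['mallory']
```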
Finally, the intelligent code security piece. We want to take the deep knowledge we have about how your code is developed, how it changes over time, and the security vulnerabilities we're detecting in that source code, and leverage that at scale with machine learning to detect deeper security vulnerabilities, detect vulnerabilities at scale across the entire GitLab ecosystem, all code that's contributed to GitLab, and really power the future of our security scanning engines. We want to not only detect security vulnerabilities but also suggest code changes to remediate them. When you think about it, as a source code management and CI/CD platform, when we suggest those code changes, we can actually push them into your repository, run your CI/CD, make sure you've got passing builds, make sure those security vulnerabilities are remediated, and then provide you an MR to merge with those code changes. There are lots of interesting ways we'll be able to enrich our existing capabilities with machine learning techniques to make our platform smarter and more intelligent, so that you can focus on contributing.

That's an overview of our applied ML functionality. As I mentioned, as we're developing all of these capabilities, we're going to be building out DataOps and MLOps functionality. DataOps is really about how you get your data connected to the GitLab CI/CD platform so that you can start running CI jobs that are training and developing models. You'll see a little bit of that in a moment from Mon as she talks about what we've seen from the field and what capabilities our customers are looking for. When I think about DataOps and MLOps, it's really about delivering capabilities within the GitLab platform for data science workloads. On the DataOps side, David mentioned that we're working with a startup called Meltano to allow you to connect disparate data sources and provide access to them within a GitLab CI pipeline, so that you can then build jobs on top of it. Meltano also supports ELT and ETL data pipelines, so you can transform that data, interact with it, and get it into the right shape and form to run data science models on top of it, and then hand that over to the MLOps side, where we really want to enable toolchain integrations with popular data science libraries like TensorFlow, Core ML, PyTorch, and others. We want to provide a really great experience for data scientists who may not be super familiar with CI/CD platforms, so they can quickly get started with CI jobs that run their machine learning workloads: to develop, to explore, to train models, and then deploy those to production. Mon will talk a little bit about the way we see that developing in the future.

I also want to touch briefly on some of the things we're doing to enable this today. As we think about exposing new compute options to GitLab Runner and making those available to the CI platform, we've actually expanded GitLab Runner to support GPUs. So you can run an ML job that requires a GPU with GitLab Runner. We expose the capability to customize and access that GPU so that you can train your models or do other GPU-intensive tasks within GitLab Runner. You can see how we're really trying to bridge the gap between data scientists and DevOps, so that data science can work in the platform where you're already deploying and developing your source code.
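To illustrate the kind of workload a GPU-enabled runner job might execute, here is a short training script sketch, assuming PyTorch (one of the libraries mentioned above). The model and data are synthetic placeholders; it selects the GPU when one is exposed and falls back to CPU otherwise.

```python
import torch
import torch.nn as nn

# Use the GPU a GPU-enabled runner exposes; otherwise fall back to CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"training on {device}")

# A tiny stand-in model and a synthetic batch, just to exercise the device.
model = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 2)).to(device)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

x = torch.randn(256, 32, device=device)
y = torch.randint(0, 2, (256,), device=device)

for step in range(100):
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    optimizer.step()

# Persist the trained weights so a later pipeline stage can pick them up
# as a build artifact.
torch.save(model.state_dict(), "model.pt")
```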
So that's how we're starting today. And now I want to hand it over to Mon to give us a look at what we're hearing from our customers. Mon, take it away.

Thanks, Taylor. And thanks, David, too. So we have a good understanding of our plans for today and tomorrow; now let's turn to the field. These findings are based on what we're seeing from customers who are already using GitLab, or who are considering GitLab as they add the deployment of machine learning models to their DevOps platform. Looking at the MLOps stack, based on customer feedback, we've broken it down into 11 different areas where our customers would like some help or have challenges. It goes all the way from the value proposition, to data sourcing and versioning of the data, into the data scientist workload of analysis and experiment management, and then to retraining, storing features in feature stores, the code repository, an orchestration CI/CD pipeline running from the data to feature engineering to retraining, then model deployment, model registry and versioning, prediction services, and finally monitoring and the metadata store. All in all, these complex areas make up the full MLOps stack, and each of these individual pillars is where our customers are looking for tools to help through the journey of taking a machine learning model from a POC to scale, where it can actually have an impact and deliver business value.

Further, there is still a choice of what kind of platform you would want. It falls into three different areas: cloud, managed, or in-house machine learning implementation. When it comes to cloud, looking into what serverless can mean for MLOps can be quite different from having managed servers where you are implementing an end-to-end platform such as DataRobot or Pachyderm or other AutoML tools, and different again from full in-house management, using all open source models but owning the full orchestration of deploying them.

Now, the first thing customers ask about is really how to enable ML models as products. By that I mean it's not just the data scientists, or the developers, or the data engineers collaborating together; it's beyond that. It's the product team, the scientists, the DevOps and security teams, all coming together, speaking the same language, or being able to translate the workload and the language to enable continuous collaboration. That's the first customer ask we see: they come to GitLab to really understand that full flow of continuous ModelOps without any friction.

The second part goes deeper into the MLOps pillars. There are mainly three areas. The first is data and features, which is part of the DataOps that David and Taylor talked about. The questions that come up are: where is the data coming from, how is the data versioned, and how do I monitor and avoid bad reproducibility practices? The next part is about the models: how to build continuous integration and deployment of models, how to choose the best platform when there are so many different tools for AutoML and feature stores, what choice of tools will take a model from idea to production and lower the cost of prediction, and what are the best practices to retrain the model seamlessly? The third part is monitoring and tracking. We've seen all kinds of manual tracking, done in everything from Google Sheets to Excel files. That needs to be taken to the next level, because going from a POC to building scalable models requires better ways to monitor and track everything, from production to configuration to storage, with a better framework to track it all.
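As a minimal sketch of what replacing that spreadsheet-style tracking could look like, here is a tiny Python experiment logger that records each run's parameters, metrics, and artifacts as structured, queryable metadata. It is a stand-in for a real experiment tracker or metadata store; the directory layout and fields are hypothetical.

```python
import json
import time
import uuid
from pathlib import Path

RUNS_DIR = Path("runs")  # hypothetical metadata store: one JSON record per run

def log_run(params, metrics, artifacts):
    """Record one training run's parameters, metrics, and artifact paths."""
    run_id = uuid.uuid4().hex[:8]
    RUNS_DIR.mkdir(exist_ok=True)
    record = {
        "run_id": run_id,
        "timestamp": time.strftime("%Y-%m-%dT%H:%M:%S"),
        "params": params,
        "metrics": metrics,
        "artifacts": artifacts,
    }
    (RUNS_DIR / f"{run_id}.json").write_text(json.dumps(record, indent=2))
    return run_id

def best_run(metric):
    """Query the logged runs for the one with the highest value of a metric."""
    runs = [json.loads(p.read_text()) for p in RUNS_DIR.glob("*.json")]
    return max(runs, key=lambda r: r["metrics"].get(metric, float("-inf")))

# Hypothetical usage after a training job finishes:
log_run({"lr": 1e-3, "epochs": 100}, {"accuracy": 0.93}, ["model.pt"])
print(best_run("accuracy")["run_id"])
```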
To sum that all up, it comes down to three areas. The first is making the experience more personalized, which means supporting whatever model framework a data scientist wants to work with, whether it's PyTorch or TensorFlow, and whatever language, whether it's Python or Julia, as well as giving a personalized infrastructure experience for the ML lifecycle. The second is making it seamless: ensuring model security and integration, with CI/CD all the way from DataOps to MLOps to applied ML, so the whole journey is seamless. And the last part is really about freedom: data scientists shouldn't have to worry about scanning Python packages, or about version control, or spend their time figuring out how to explain and govern the model. We want to give them the tools to do that, and the freedom to do what they're supposed to do: actually build these models.

With that in mind, we've staged our customers into five different areas, and where a customer sits can depend on the domain or even the geographic location. The first stage is where there is literally no MLOps: most work is done through traditional manual building and training of models, most models exist as black boxes, and the teams are in silos.

Then there are the second and third stages, DevOps but no MLOps, and continuous training, where a lot of our customers are. They're already using GitLab for DevOps, and now the DevOps engineers not only have to deploy Java code but also have this new entry, deploying machine learning models, and are asking how those same DevOps practices can help deploy this new kind of workload in the ecosystem. So there's DevOps but no MLOps; and then there's continuous training, where teams have already started applying those DevOps practices within MLOps: they've looked into the reproducibility of their models, some of the manual work is already taken out, and there's a little bit of continuous training and model management. A lot of our customers are in those two boxes.

Then it's all about taking it to the next level with continuous model deployment, where with every batch of new data coming in, you're able to track how that data may differ from the training data, feed that back, and have full traceability with a post-production continuous feedback loop as well. The final stage is having all of that fully orchestrated, from DataOps to ModelOps, with alerts on data drift or drops in model accuracy, and a sort of zero-downtime system where little time is spent operationalizing models and more time is spent focusing on the business case and building value-based models.

Coming back to what we've already talked about, Taylor described our GitLab Runners and how we've enabled GPUs. Here is a rough architecture of how that can help: GitLab's CI service can trigger the pipeline and continuously integrate changes, all the way from the source code, to the CI service, to using GPUs in GitLab to train the models. We can help with the CI/CD of MLOps, and that can be integrated with your front end, your model serving service, or your object storage services as well. So that's a rough overview of GitLab CI for MLOps.

These are the findings we have from the last year of talking with our customers. With an understanding of our customers' voices, a direction for ModelOps, and the different areas we want to focus on, we are starting to hire as well. We're looking for all sorts of roles, from product managers to engineers to security and QA. So we really recommend, if you are keen and interested in taking GitLab in this next direction with ModelOps, that you go to our career opportunities page, apply for a role, and help us build the ModelOps stage. Thank you, everyone; thanks, David and Taylor; and thank you for listening to us.