Hello, everyone. Welcome to KubeDDR! A Dance of Predictive Scoring with MLOps, Step by Step. This is a very big, fancy room for one of the final sessions of Friday. I'm very excited to have you all here for such a fun topic, hopefully a great way to close out an amazing KubeCon for everyone. Let's see how it goes.

Even though our session is about Dance Dance Revolution, DDR, and all of these fun things, the space of AI, ML, and sports is actually a big one. As you can see from the newspaper clippings we're showing here, there are a lot of different players working in this field with big budgets and big teams. We are obviously just two people doing a fairly simple demo here at KubeCon, but there are articles from everywhere: from game-changing AI applications in the sports industry, to hobbyists predicting rugby matches, to proper academic journal studies of predictive models. And it's not limited to physical sports; there are eSports too, as you can see from the article there. The eSports one is quite interesting, because there's a really big difference between trying to capture data from the real world using sensors and make sense of it, versus eSports, where everything is objective and you can often instrument the game directly. So there's a huge difference there. Exactly. So there are a lot of different people and a lot of different approaches, and it is not just KubeCon that's been taken over by AI; it's coming for the world of sports as well.

So we are here talking about Dance Dance Revolution. Who here had heard of Dance Dance Revolution before they saw this session? I figured we would have a lot of people here who are actually enthusiastic about the game, so that's really great to see from all of your hands. If someone is completely unfamiliar: it's a game that was released in 1998, and it really kicked off the whole gaming-fitness hardware space. It's very popular; it's been featured in South Park, for example. We have the pads here, soft-pad versions as you can see, and it's played simply by putting your foot on the right arrow at the right time. As you can maybe guess from the fact that we have the pads here, we might have a demonstration coming up in this presentation as well, hopefully as the final part.

So who are we, and why are we babbling on up here? Hi, I'm Annie. I'm CMO at VSHN, a CNCF Ambassador, and an Azure MVP. I've been doing a lot of things around communities and cloud native for years now, so I'm very excited to be speaking about MLOps and Dance Dance Revolution here today. Yeah, co-presenting: my name is Leigh Capili. It's been really fun working on this with Annie. I'm a developer advocate and R&D engineer working on Tanzu at VMware by Broadcom, and also a Kubernetes contributor; you might know me from the Flux project. You'll see that I have a little boba next to my picture there, and that's because my friend Kunal and I created an initiative called SIG Boba, a very unserious group of people trying to have fun at parties that maybe aren't so focused around beer at happy hour. Definitely a fun event.
So: Dance Dance Revolution, MLOps, Kubernetes. Why are we talking about these three things together? Well, I think DDR is actually a very good example of the sports AI/ML space. You saw those fancy examples at the beginning, the big teams and big organizations spending money on this. On one side you have the eSports space, and on the other the actual physical sports like football and basketball, and Dance Dance Revolution, as you can see in this video, combines them both. You have the eSports element, because you can get data straight from the video screen, but then you have the actual physical play as well. So I think it's a fun, and at the same time a perfect, example for this space.

We have broken it down into a few parts for a simplified view of how predictive scoring for Dance Dance Revolution could look, and does look, in this presentation. You obviously kick off with computer vision: we need visibility into what is happening. In some cases that means the physical play, the movement of the legs and so forth; in other cases it means computer vision pointed at the screen, the eSports portion of the game, i.e. the scores and everything else rendered there. Predictive scoring is the next part, where you take that data in, derive insights, and make predictions. And then MLOps is the cloud-native, Kubernetes part, where you actually put things into production, whether you're running this as a massive experiment or in production for teams at a larger scale than just one machine of your own.

Yeah, let's continue. On the computer vision aspect: we have one demo coming up, but we don't do a demo of the simple computer vision part, because we wanted the challenge of going straight to the second part. In the simplest view, though, it would be detecting when a dance pad is occupied. You would have a bounding box around the area of the arrow that needs to be hit, and when a foot lands on it, you would see that event in the data; that is when the score is scored, and all is well. There's actually a lot of opportunity in this technique, because you would think that a foot touching a panel is a pretty subtle thing to pick up with a camera. The camera angle can really change how a computer vision model performs; if something is just a couple of degrees off, you might not get the same detection rates. But something we can't really demonstrate with soft pads we bought off the internet and carried here in a backpack is that when you find a Dance Dance Revolution machine at an arcade, it's quite a spectacle: every time you touch your foot to the pad, the panels light up, and that's a pretty clear signal a computer could pick up from a camera. This strategy is pretty viable, I would say. Yeah, it is.
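If you wanted to prototype that lit-panel idea at home, a minimal sketch could look something like the following. To be clear, this is our guess at an approach, not code from the demo; the ROI coordinates and brightness threshold are made-up placeholders you would have to tune for your own camera angle and machine.

```python
# Watch fixed regions of the frame (one per arrow panel) and treat a
# jump in brightness as "this panel just lit up / was stepped on".
import cv2

PANEL_ROIS = {                      # x, y, width, height (hypothetical values)
    "left":  (100, 300, 80, 80),
    "down":  (200, 300, 80, 80),
    "up":    (300, 300, 80, 80),
    "right": (400, 300, 80, 80),
}
BRIGHTNESS_THRESHOLD = 170          # tune for the venue's lighting

cap = cv2.VideoCapture(0)           # webcam pointed at the machine
while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    for name, (x, y, w, h) in PANEL_ROIS.items():
        roi = gray[y:y + h, x:x + w]
        if roi.mean() > BRIGHTNESS_THRESHOLD:
            print(f"{name} panel lit")   # a hit candidate for scoring
    cv2.imshow("feed", frame)
    if cv2.waitKey(1) == ord("q"):
        break
cap.release()
cv2.destroyAllWindows()
```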
And if anyone wants to try it at home, you can use, for example, YOLO and OpenCV and train on your own dataset: take images of your foot on the pad, plus other objects, so the model learns to recognize what the objects are and where they are. In an even grander scheme of things, as Leigh said, you can go for other indicators of what's happening in the game and what is going well. And in sports like football, you would rely on different indicators, say, heaviness of breathing or running speed, whatever signals that a player is performing in a certain way.

Then predictive scoring, which is the part we focused on more; we have an OCR and algorithm demo coming up. OCR, optical character recognition, if you're not familiar, is exactly that: recognizing characters. It's a technology that has changed how we can handle all of this. YOLO and the other computer vision models are for identifying objects, for example a foot on a pad, while OCR is for text: digits, numbers and so forth. It's been around for a while, properly conceived in the mid-20th century, and it's been really helpful here. Our demo today uses OCR with OpenCV and pytesseract, plus an algorithm to determine the winning player in real time. We won't go to the demo immediately, because first we want to show you how we arrived at the iteration we'll actually run on stage.

In general: we're using OpenCV for the image processing and pytesseract as a simple interface to OCR. We capture a video feed, in this demo using an actual camera, which we determined was honestly the simplest way to do this; we tried a lot of streaming with a lot of different options, but none of them worked as well. Then we do some pre-processing: grayscale conversion, Gaussian blur to reduce noise, and thresholding to create a binary image, because otherwise the numbers were not showing up properly. Then we do the number detection, and from the detected numbers we determine who is the winning player at any given time. We can show the different pre-processing layers on the image frames to explain why those techniques were necessary, and we'll pull that up in a moment.

So you can see here one of the first iterations. We used YouTube video footage for this, because it made it really easy to control the number of points a player has; otherwise we would have had to constantly play the game to generate more points. I was pretty floored when you came up with that. Yeah. This is the cool thing about collaborating: Annie went, hey, I kind of need some input data, do you think we could find a video of this thing? And then the problem is: oh, well, we've only got a single player. The nice thing is that Dance Dance Revolution is pretty symmetrical.
To spell it out: Annie was able to point the webcam at the screen, use her window manager to clone the video, put the two copies side by side, and then seek them to different points in the video to make one side lose or win. That was a really great way to synthetically get training data to iterate on the computational technique with OpenCV, without actually having to play the game over and over again, because we tried a bunch of things and they did not work. Exactly.

This is the iteration where you can see I had applied the grayscale; without it, nothing was showing up at all. But as you can see, in addition to the numbers, the scores and points, we also pick up all the text that is on the screen, which is obviously not useful for the predictive scoring side of things. In the next iteration we still have the grayscale so we can clearly see the numbers, and this is exactly the setup with the two cloned videos Leigh described: I could seek between the two videos to control which one had the better numbers, i.e. who was winning. I could see in real time that, okay, player one is currently winning; then I moved the point in the video so that the other player was winning, and it switched to player two. I did that a few times here. It's not perfect, so occasionally we get "no numbers detected" even in this version, but this is not such a serious demo that we need perfect accuracy at all times. You can see it goes player one, player two, then player one, player two; those were the moments when I switched who was winning, because I could control that very precisely.
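As an aside, the window-manager trick just described can also be approximated purely in code. Here's a rough sketch under assumptions: the file name and frame offset are made up, and it presumes you have a single-player gameplay recording saved locally. Opening the same file twice and seeking one capture ahead gives you two "players" whose relative scores you fully control.

```python
import cv2

VIDEO = "ddr_gameplay.mp4"              # hypothetical local recording

p1 = cv2.VideoCapture(VIDEO)
p2 = cv2.VideoCapture(VIDEO)
p2.set(cv2.CAP_PROP_POS_FRAMES, 1500)   # player two is further into the song

while True:
    ok1, f1 = p1.read()
    ok2, f2 = p2.read()
    if not (ok1 and ok2):
        break
    side_by_side = cv2.hconcat([f1, f2])    # frames share dimensions
    cv2.imshow("synthetic two-player feed", side_by_side)
    if cv2.waitKey(30) == ord("q"):
        break

p1.release()
p2.release()
cv2.destroyAllWindows()
```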
Okay, so now we're moving to the actual live demo, using the OCR, and for this we obviously have the pads here. I will be our camera operator, and Whitney has graciously volunteered to join us as our second dancer. Yes, let's clap it up for Whitney Lee, everybody. I do want to give the heads-up that we've had some challenges with the background coloring here, so it's not as accurate as in some of our tests, but we'll see how it goes. You should at least see the data come up; let's see whether it detects the winners as well as it could. Yeah, we have a very noisy background, it literally says "no background image", and it messes with the contrast of the numbers. Cool. To the AV team: maybe we could pull up the feed of the video game now, would that be possible? Sweet. And is there a way to show both at the same time, or do we have to flip between them? Maybe there's no way to show both at the moment. Cool; if we can't show both, then we just want the laptop screen. Awesome, thanks. So now you can see the data starting to be picked up, and we will start the game soon. Yeah, and then you'll need to pull the output window up onto that screen. We're using extended mode, because we are brave presenters: two separate screens. There we go, nice. Great, so we've got a webcam feed here and we've got the game loaded up, so let's step up here. I feel like we've done this before.

No, yeah, Whitney wants everyone on the stream to know that she had no idea she was going to be on stage playing Dance Dance Revolution. Okay, let's pull it up. I think we're on a five or a six right now; it's pretty tame, but it's the default scroll speed, so it's a little quick. Oh, and sorry, one second, let's make sure we get our audio too. Good point. Cool, we should have audio now. Not quite as responsive as our own pads, huh? Yeah, we're kicking them around a bit. What do you think, Annie, do we have some data coming in? There's definitely data coming in. The numbers are not as good as they could be, but I think the audience can see them being picked up. And do you want me to disable that print statement, the whole stream of text that's coming in? It's a bit of visual noise, isn't it? It is a lot, yeah. All right, cool. I love that the audience is dancing along as well. Perfectly done. Thank you, Whitney, amazing dancing; pleasantly surprised, that was really amazing, well done. Okay, then we can get back to the deck.

There we go. Maybe I can take some time to show some of the moving pieces there. Of course, yeah, sure. We're using my laptop; Annie wrote a lot of this project, and I was able to pop over to her Git repo. I like the extra challenge for Leigh of dancing and then immediately showing the code; isn't it extra fun to be out of breath? Cool, all right. So this is the code that's currently running up here. We have a Python file with a preprocess function, an OCR function, a determine-winner function, and then the main logic loop. We have to open up a capture, and you can see there are a couple of different ways to capture things. One of the most interesting, especially during development, is this particular strategy; I'll make this a little larger so we can see. You can just pass in a video file, and that video could be from the real world or from your window manager. But there are also a couple of ways to consume streams of video. SRT is Secure Reliable Transport; I don't remember what RIST stands for, but it's kind of a competing alternative. Both of these protocols are newer and not fully adopted yet. If you stream to Twitch, say, that's RTMP, and even before that, a lot of webcams and IP cams used RTSP; that's Real Time Streaming Protocol, and RTMP is Real-Time Messaging Protocol, messaging, sorry, not media. Another way is to serve your file over HTTP. And FFmpeg is under the hood for all of it: OpenCV leverages FFmpeg to do the codec encoding and decoding transformations so that we can actually get at the frames.

So once you set up the video capture, we can go into a loop where we read frames. Behind this there's actually something managing the buffer; we were using a subclass. And then we call the preprocess function, which, Annie, you implemented over here. What were these pieces again? These are the ones that convert to grayscale and apply the Gaussian blur, because before we added these, it did not work at all. Well, to be honest, you could get a lot of the text out by that point, but the numbers were not standing out enough for the machine to detect them properly.
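For anyone following along at home, here is a minimal sketch of the pieces just walked through, assuming OpenCV and pytesseract are installed (plus the Tesseract binary itself). The function names and parameter values are our illustration, not the repo's exact code.

```python
import cv2
import pytesseract

def preprocess(frame):
    """Grayscale + Gaussian blur + threshold so the digits stand out."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    blurred = cv2.GaussianBlur(gray, (5, 5), 0)        # smooth out noise
    _, binary = cv2.threshold(blurred, 150, 255, cv2.THRESH_BINARY)
    return binary

def ocr(image):
    """Ask Tesseract for text, restricted to digits."""
    return pytesseract.image_to_string(
        image, config="--psm 6 -c tessedit_char_whitelist=0123456789"
    )

# The capture source can be a device index, a file, or a stream URL:
#   cv2.VideoCapture(0)                          # webcam
#   cv2.VideoCapture("gameplay.mp4")             # file (or a cloned window recording)
#   cv2.VideoCapture("srt://host:9000")          # newer protocol; needs FFmpeg built with SRT
#   cv2.VideoCapture("rtsp://cam.local/stream")  # classic IP-cam protocol
cap = cv2.VideoCapture(0)
while True:
    ok, frame = cap.read()
    if not ok:
        break
    print(ocr(preprocess(frame)))
cap.release()
```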
And to be frank, that is also one of the challenges we have right now with the background. If the data coming in is bad, you will have a lot of challenges no matter what; that's a basic principle of ML and AI. With these steps we were able to extract at least some of the numbers, and depending on the background color we would get better or worse data. These three were settings that worked really well with some of the StepMania images and themes. But if you try this at home, and we will give you a link to the code after this, you would obviously have to optimize for the theme and the setup you're using, figuring out which exact pre-processing parameters work best.

Going beyond that, we then want to get the text out of the OCR function. Here we're just doing some quick, hacky Python string manipulation. There's a list comprehension over here grabbing any percentage values; that actually worked on a different theme we were playing with. And then this bigger numbers list is what's being used in this example. That's because it would be quite simple if we had just one number to track, but the great part about Dance Dance Revolution is that you have all those other numbers popping up, like "perfect" counts and different scores. Yeah, your combo counter, and that's not always there. So you have to make sense of the screen space based on what's coming out of OCR. Some bounding box work would definitely eliminate that. We also explored using ROI, regions of interest, to crop to just the relevant area, but in this timeframe getting that working was a huge issue, and I spent way, way too much time debugging it. It would definitely be a good improvement, though, so that the OCR could focus on only a single region.
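To ground the parsing just described: here is a sketch of what pulling numbers out of raw OCR text and comparing the two sides can look like. The regex, ROI helper, and winner rule are illustrative guesses, not the repo's exact logic.

```python
import re

def extract_numbers(ocr_text):
    """Pull plain integers out of raw OCR output. (On themes that show
    percentage scores, a pattern like r"(\d+)%" is the useful one.)"""
    return [int(t) for t in re.findall(r"\d+", ocr_text)]

def crop_roi(frame, x, y, w, h):
    """Crop to a region of interest so OCR only sees the score area."""
    return frame[y:y + h, x:x + w]

def determine_winner(p1_numbers, p2_numbers):
    """Naive rule: compare the biggest number seen on each side. Combo
    counters and 'perfect' judgments pollute this, which is why cropping
    to a score ROI first helps so much."""
    p1, p2 = max(p1_numbers, default=-1), max(p2_numbers, default=-1)
    if p1 == p2:
        return "no winner detected"
    return "player 1 is winning" if p1 > p2 else "player 2 is winning"
```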
So, going cloud native. Because what was that, Annie, some 78% of models? Yeah, most models never make it to production, which means that for MLOps in particular, which has been a theme across KubeCon in general, the bottleneck is the going-to-production phase. Developing models on data scientists' Jupyter notebooks, on your own little Python setup on your local machine, is actually surprisingly fast. But when you have a slightly bigger use case, say more Dance Dance Revolution dancing, you want to actually go to production: multiple people collaborating, maybe data from different sources, globally and so forth. That going-to-production step, with all of those elements, is where things fall through quite often; depending on the estimate, over 50% or even over 70% of models never make it.

Let's look a little at where the pain starts to come from. So I'm a data scientist, well, I'm not actually a data scientist, but I'm pretending to be for this example. I'm in my Python, I've got my Jupyter notebook open, I'm trying to sanitize my inputs and outputs, section my data, run experiments, that sort of thing. And then my DevOps person says: oh gosh, can we try to figure out how to run your app? And I'm like: what is a Dockerfile? And he's like: well, it's how we're going to set up your environment. I'm like: okay, that seems reasonable. And so, if you're lucky, you can just try to do the things that work on your MacBook or your Linux workstation and maybe get them to map onto whatever operating system is underneath. But then you find yourself in a world of pain, you read some Stack Overflow, and suddenly your Dockerfile maybe looks a little bit more like this. And it's not fun at all. I don't know what all of this stuff is. Why am I trying to learn how to build FFmpeg and OpenCV, and what is a codec? At the end of the day, I'm really just trying to get my app in there so I can run my fun computations, but there's all of this infrastructure work involved. And let's not even talk about Kubernetes; we don't have enough time in this slot to spin up a kind cluster, get all of this running in there, and then show you Flux. And why do I need Carvel now? And what is an image SHA, and why do I have to pin it? I guess that must be important for security. Hopefully we don't need signatures and all of that, right? But the reality is that when we try to do real stuff, all of these people are involved in the productionalization of these models, of the execution of the thing that actually produces the valuable answers.

So we do have the ability to build this into a Docker container; just now it was running on the host. You come in here, run docker build, give it an image name, and we'll pretend we don't need to tag and version it. As you can see, I've already done this build, so all the painful things that normally happen have already happened. Then we can run a Docker container and try to run our application. It was trying to read from a webcam, and that's not going to work. Let's exit here, and this time we won't enter bash; I'll just execute, and you can see I'm getting a bunch of gibberish, because, if I pop up here inside our Python program, we cannot just open a webcam from inside the container. What we do have is the ability to load test data. But I can't just load from the local file system either, because this is a container: it came off a container registry, and it's running on a distributed computer I don't control. So one of the things we can do is put our test data, or stream it, on some network-accessible server. Here we're using Tailscale just to have a way to get back to the host from the Linux virtual machine that's running on this Mac. If I come in here and run a quick build to make sure the code is up to date... yeah, I enabled that, and we've got, I think that's a web server log, let's double-check. Yes, I'm running a little Python web server on my machine, nothing complicated. And now, if I try to actually run my code, we are able to read the frames from that stream. And then I can change to a different video capture strategy; I learned all of this from hacking on Annie's code.
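To make that concrete, here's roughly what the test-data trick looks like, sketched under assumptions: the host name and file name are placeholders, and Python's built-in `http.server` stands in for whatever server you prefer. In the demo the route back to the host went over Tailscale.

```python
# On the host (or any machine the container can reach), serve the test
# footage over HTTP, e.g.:
#
#   python -m http.server 8000
#
# Then, inside the container, point OpenCV's capture at the URL instead
# of a webcam or a local file.
import cv2

cap = cv2.VideoCapture("http://my-host.example:8000/ddr_test_footage.mp4")

while True:
    ok, frame = cap.read()
    if not ok:            # end of file or a network hiccup
        break
    # ...same preprocess / OCR / determine-winner loop as before...
```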
So, let's do another build, just so that we're not faking it by mounting volumes or anything. Then we should be able to get a few more frames, sorry. We also have to turn off the display output, since it was complaining about the lack of a display server in the container. There we are, cool. All right, so we're now doing OCR on all of the frames in the video as our capture mode. Cool. Yeah, perfect.

We covered a bit of this already: as we said before, most models never make it into production, and Leigh highlighted really well some of the practical challenges and thought that go into these things. As a side note, there's so much buzz around LLMs and how to make MLOps, cloud native, and all of these things work for them. As a reminder, generative AI and predictive AI have somewhat different needs, and that impacts how you would do MLOps for each, as well as what demos you can actually run in scenarios like this. This is a picture from the CNCF AI White Paper that was published, what, two days ago? A really, really good read, highly recommended. I think it highlights well the different needs of generative AI and predictive AI. What we are doing here is predictive AI, i.e. taking input and predicting or describing something from it, whereas generative AI, the LLMs, the ChatGPTs of the world, actually generates something new based on the data it has ingested. Those are two different things, and as you can see, the scale needs are very, very different across the board.

Generative AI is also not usually used for analysis per se; rather, we produce artifacts with it. In the sports use case we might be trying our best with computer vision, or integrating with APIs that give structured output, like in, say, League of Legends or Dota, to find out either what's happening right now or what happened throughout history. With generative AI, a DDR, Dance Dance Revolution, example might be: what if we looked at every step chart in the world and tried to produce some new ones? Exactly. And another good example of an application could be using historical player data from one of these player-score repositories to determine, before the battle has even started, who would win a Dance Dance Revolution battle; you could get into sports betting on that front as well.

Then just a bit of a recap; we talked about much of this already. Kubernetes and cloud native are a really great fit for MLOps, but you do need to figure out how to make your environments work for the data-centric, model-complexity, and scale needs of your applications. Until we have very specific, industry-wide MLOps best practices, using DevOps best practices is a really, really helpful way to get started. And then Leigh spoke a bit about the hurdles as well; do you want to add something there? You know, there's just a lot of pain for somebody who's just trying to slice data to actually operationalize their code. 70% of models not making it to prod: that's a huge issue.
And so you can see that even just in trying to containerize some computer vision code, we ran into issues: the developer's laptop had a display device, the developer's laptop had webcams, and they were able to debug locally. Now you're thinking about the distributed computer instead. Kubernetes, how does it handle devices? Is that easy to use? Definitely not. The complexity gets to a point where running these things on the distributed computer becomes an interesting operational problem. Exactly.

Then just highlighting a few things, since we're running a bit short on time: I highly recommend checking out Kubeflow and OpenLLMetry, for example, for MLOps projects, and K8sGPT for generative AI for Kubernetes. These are all also highlighted in the new CNCF AI White Paper, so you can read more about them there. And then we have some resources. You can see the link here where you can find the code; we will probably add some more tidbits later, some of the other variations we tried and some of the fixes we made for this one, so there may be a few different versions you can try at home, but there's already one version up. If you're looking for a practical guide to MLOps, I really like Practical MLOps by Noah Gift and Alfredo Deza, so I highly recommend checking that out. The CNCF AI White Paper is linked there as well. And if you're looking for materials specifically on computer vision, OpenCV has both free and paid materials you can check out.

And then we were meant to have a... Yes, Leigh? I think we're getting kicked off. Yes, I think we are getting kicked off. We had an idea of even more Dance Dance Revolution playing here, but we have run out of time, so we probably don't have the time to do that. Thank you so much, everyone, for tuning in.