All right, so I think we can start, and we'll give ourselves a couple of minutes. Next we have Shrinand, and he'll be talking to us about multiplayer machine learning with Metaflow, OpenAI Whisper, and Kubernetes. Thank you.

Thank you. Hi everyone, my name is Shrinand Javadekar. I am an engineer at Outerbounds, one of the main companies behind the open-source Metaflow project. Today we are going to talk about multiplayer machine learning with Metaflow, OpenAI Whisper, and Kubernetes. I know, lots of jargon, lots of terms; we'll try to justify the existence of each of them.

So let's start with Metaflow. What is Metaflow? Metaflow is an open-source project that makes it easy to access the resources needed for any data science or data-intensive application. Typically you would require compute, workflows, data, and versioning, and if you want a Python-based API for any of these, Metaflow is one of the projects you can use. Metaflow began as an open-source project at Netflix, and it is still open-sourced under the Netflix group of projects; you can go to github.com/Netflix/metaflow to learn more about it.

Metaflow makes it very simple to create Python-based data science projects, and Python is one of the most commonly used languages for data science. Here I have a quick example of what a Metaflow flow looks like (a sketch of it follows below). On the left-hand side, on the terminal-based black-and-white screen, is a simple three-step flow: you create a HelloFlow class that extends Metaflow's FlowSpec. The start step, as you can tell, is the first step that begins the flow, and it says the next step is the work step. The work step gets called next, and work says the next one is end, which completes the flow. You can do sequential or parallel execution of these steps, and you can pass data from one step to another, or from one step to multiple steps that run in parallel. Take a look at the open-source documentation for more information about this.

But the best part about Metaflow is that you can run these steps locally as well as remotely. As you go through the process of writing the code and building your flows or your data science code, you can keep running it locally with something like `python hello.py run`. Once you think you're ready to run at scale on a bigger backend like Kubernetes, where you have access to more memory, more CPUs, maybe more GPU resources, or maybe data that is only available in Kubernetes, you can run the same flow with `python hello.py run --with kubernetes`. And if you want to deploy these flows so they run at a particular cadence, or run based on events, you can use Argo Workflows. Those are the two commands; again, all of this is available in the documentation.
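Here is a minimal sketch of the kind of flow described above. The file name `hello.py` and the step bodies are illustrative assumptions, not the exact code shown on the slide:

```python
from metaflow import FlowSpec, step


class HelloFlow(FlowSpec):

    @step
    def start(self):
        # The first step of every Metaflow flow.
        print("starting")
        self.next(self.work)

    @step
    def work(self):
        # Anything assigned to self becomes a versioned artifact that is
        # passed along to downstream steps.
        self.message = "hello from the work step"
        self.next(self.end)

    @step
    def end(self):
        # Reaching 'end' completes the flow.
        print(self.message)


if __name__ == "__main__":
    HelloFlow()
```

With this in a file called `hello.py`, the commands mentioned in the talk are `python hello.py run` for a local run, `python hello.py run --with kubernetes` to run on Kubernetes, and `python hello.py argo-workflows create` to deploy the flow to Argo Workflows.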
Okay, so then let's move to the next of the multiple pieces of jargon that we had: OpenAI Whisper. OpenAI Whisper is a machine learning model created by OpenAI that does speech-to-text transcription. The simple thing that it does is: given an audio file, give me the text output of that audio. It supports multiple languages, and another interesting facet of OpenAI Whisper is that the model, or the model weights rather, are available in multiple sizes. They have t-shirt sizing: as a user you pick whether you use the tiny model, the small model, the medium model, or the large model, and the size of the model decides a few other things. It decides how many resources you will need when you actually use the model for transcription: if you use the tiny model, you will need fewer CPUs and less memory; if you use the large model, you will need the most CPUs and the most memory, just so that the model actually fits into memory. It decides the accuracy of the generated transcript, that is, how accurate the transcript is. And of course it also decides the time required for the actual transcription: given a file, the size of the model has an impact on how long it takes to transcribe it (a minimal usage sketch follows at the end of this section).

So we came up with this fun challenge. Metaflow says you can do all this cool DAG-related stuff using just Python APIs, and Whisper is a model that can do transcription with multiple sizes; can we do something like this? If you read this flow from left to right, we have a start step, and there we want to transcribe three URLs, so three files that are available as URLs. In each of these cases we want to transcribe the same file with both the tiny OpenAI Whisper model and the large model, and then do a join, just so that you get the results together. In theory, at this join step you could do some sort of post-processing to see whether the output of the tiny model or the large model was better, or what the time difference between the two was, or what have you. I have a demo in which this post-processing does not happen, but it is something that could be done. And then you complete the whole flow.

So let's see how we can actually go about doing this. I have the source code for all of this here (a sketch of it also follows below). In the start step we set up three URLs; of course the number of URLs in this case is three, but it could be any number. Then you call transcribe, and transcribe says: when I am at transcribe, call two steps named tiny and small. I know I mentioned the large model on the previous slide, but instead of large I'm using the small model; it's just easier to demo. So you have two steps that happen in parallel for each URL: for each URL you call the tiny and small steps, and at the end they both say join, so they combine the results. Join finally calls the end step, which completes the flow.
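Whisper's Python API makes the t-shirt sizing concrete. A minimal usage sketch (the file name is a placeholder; this assumes `pip install openai-whisper` and ffmpeg available on the machine):

```python
import whisper

# Pick a size: "tiny", "small", "medium", or "large". Bigger models need
# more CPU/memory, produce more accurate transcripts, and take longer.
model = whisper.load_model("tiny")

# transcribe() returns a dict; the "text" key holds the full transcript.
result = model.transcribe("meeting.mp3")
print(result["text"])
```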
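Based on the structure described in the talk, a sketch of the whole flow could look like the following. The URLs, file name, and step bodies are assumptions rather than the speaker's exact code; the `@resources` numbers match the figures quoted in the next section:

```python
from metaflow import FlowSpec, step, resources


def run_whisper(url, size):
    # Download the audio file and transcribe it with the given model size.
    # Assumes ffmpeg and `pip install openai-whisper`.
    import urllib.request
    import whisper

    path, _ = urllib.request.urlretrieve(url)
    model = whisper.load_model(size)
    return model.transcribe(path)["text"]


class MultiAudioTranscriptionFlow(FlowSpec):

    @step
    def start(self):
        # Hypothetical audio URLs; in the demo there were three of them.
        self.urls = [
            "https://example.com/audio1.mp3",
            "https://example.com/audio2.mp3",
            "https://example.com/audio3.mp3",
        ]
        # Fan out: one transcribe branch per URL.
        self.next(self.transcribe, foreach="urls")

    @step
    def transcribe(self):
        self.url = self.input
        # Static branch: both model sizes run in parallel on the same file.
        self.next(self.tiny, self.small)

    @resources(cpu=2, memory=1024)  # ~2 CPUs / 1 GB for the tiny model
    @step
    def tiny(self):
        self.size = "tiny"
        self.text = run_whisper(self.url, self.size)
        self.next(self.join)

    @resources(cpu=4, memory=8192)  # ~4 CPUs / 8 GB for the small model
    @step
    def small(self):
        self.size = "small"
        self.text = run_whisper(self.url, self.size)
        self.next(self.join)

    @step
    def join(self, inputs):
        # Join the tiny/small branch for a single URL.
        self.url = inputs[0].url
        self.results = {inp.size: inp.text for inp in inputs}
        self.next(self.post_join)

    @step
    def post_join(self, inputs):
        # Join the foreach fan-out; post-processing could happen here.
        self.transcripts = [(inp.url, inp.results) for inp in inputs]
        self.next(self.end)

    @step
    def end(self):
        print(f"Transcribed {len(self.transcripts)} files")


if __name__ == "__main__":
    MultiAudioTranscriptionFlow()
```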
So then let's take a look. Oh, by the way, what would be the best way to run this? Obviously, this will be on Kubernetes, right? I mean, that's what brings all of us here. The interesting bit is that you can use Kubernetes decorators in Metaflow and pick and choose how many resources you give to each of these steps, as in the sketch above. For the tiny step, I'm giving it two CPUs and one gig of memory, because that's what the OpenAI documentation suggests, and for the small step I'm giving it four CPUs and eight gigs of memory, because that's the amount of memory needed to actually run that step. This automatically gets taken care of when this runs on Kubernetes: Metaflow will make sure that the corresponding pods that get spawned come up with these resources.

Once that's in place, all you have to do is run this with Kubernetes, and it will spawn each of those steps, some in sequence, some in parallel, depending on how the flow was written, and each of those steps will take its time. We won't have time to wait for this to complete, even though it only takes a couple of minutes, but I had a previous run that you can see here. This MultiAudioTranscriptionFlow is the one that has been running for the last 15-16 seconds, and this one is the run I did just before. What you're looking at is the Metaflow UI, which is also part of the open-source Metaflow project. You can see that the tiny step completed and had its output, where this is the actual transcription of the first audio file it received, and then this is the small model, which also has its own output; it's more or less the same output. If you look at the overall DAG for this run, you can see that the steps started in parallel, how much time they took, and so on. The DAG shows that we had the start step, then transcribe, then multiple branches where the tiny and small steps run in parallel, the join, and then the post-join. So Kubernetes is a great mechanism for running this, and Metaflow makes it super easy to use these kinds of models and these kinds of resources for use cases like model inference.

So that's awesome. This was a cool demo, but what can we do with it, and how can we extend it further? Here we had a fixed set of three URLs, or three files. Imagine that this list of audios you want to transcribe gets populated based on some other input: you go out and look at a YouTube channel, you search for audio files somewhere, or you look at your Zoom call history and find all the Zoom recordings you have. So you can scale out the number of URLs you are transcribing at any point in time. You could use GPUs: in this case we used CPUs, but you can easily use GPUs for this, and the amount of time it takes to transcribe audio to text on GPUs is orders of magnitude less than on CPUs. You can schedule these flows periodically: say you have some YouTube channel that gets updated every day, you can run the flow every night and transcribe the new videos. And you can run these flows based on events. A sketch of these extensions follows below.
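A hedged sketch of those extensions, assuming Metaflow's `@schedule` decorator and per-step GPU resources (the flow name, URL, and model size are illustrative; newer Metaflow releases also offer a `@trigger` decorator for event-based runs on Argo Workflows):

```python
from metaflow import FlowSpec, step, schedule, resources


# @schedule runs the deployed flow on a cadence (here: nightly).
@schedule(daily=True)
class NightlyTranscriptionFlow(FlowSpec):

    @step
    def start(self):
        # In practice this list would be discovered dynamically, e.g. new
        # uploads on a YouTube channel or recent Zoom/Meet recordings.
        self.urls = ["https://example.com/new-recording.mp3"]
        self.next(self.transcribe, foreach="urls")

    @resources(gpu=1)  # GPU transcription is far faster than CPU
    @step
    def transcribe(self):
        import urllib.request
        import whisper

        path, _ = urllib.request.urlretrieve(self.input)
        # load_model picks up the GPU automatically when one is available.
        self.text = whisper.load_model("small").transcribe(path)["text"]
        self.next(self.join)

    @step
    def join(self, inputs):
        self.transcripts = [inp.text for inp in inputs]
        self.next(self.end)

    @step
    def end(self):
        print(f"Transcribed {len(self.transcripts)} new recordings")


if __name__ == "__main__":
    NightlyTranscriptionFlow()
```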
That event-based part is actually the really cool one. I really want to build this: there is a Zoom call, and it gets recorded, or a Google Meet call, and it gets recorded. The moment the recording completes, it triggers a flow that takes the video, transcribes it, passes the entire transcription to another machine learning model, a summarization model, generates a summary, and sends it as a Slack message to the team, saying: this was the summary of the call that you just completed. That would be a fun thing to do, and it would be much, much more fun to do it in Metaflow.

So thanks a lot, that's all I had in mind. There is a Metaflow Slack channel open to everyone, so feel free to join us on the Slack channel if you have more questions about Metaflow. And the company I mentioned that we are behind Metaflow is called Outerbounds. We have a booth, booth 020, so come join us and talk to us at the booth. We also have an event the day after tomorrow in the evening, so if you want to join us for the event, feel free to come talk to us at the booth, or you can scan the QR code and join us there. Thank you.