So Whisper is a set of models which OpenAI released as open source. People sometimes jokingly call OpenAI "Closed AI", but here they actually released the model weights and the corresponding inference code, on September 21, 2022.

So what is Whisper? I asked Bing GPT this question, and it said: "Whisper is a computer program which can listen to people talking and write down what they say." What that basically means is that Whisper is a speech-to-text system, an ASR system; in technical terms, people call it an automatic speech recognition system. Another thing Bing GPT said was: "Whisper can understand people speaking different languages and can even translate what they're saying to English." So Whisper is not trained only on English; it is trained on almost 99 languages. It is a robust ASR model, and the even more interesting part is that it is not just for ASR: it can also do translation, language identification and a few more tasks. You can check the Whisper paper to know more.

So these are the Whisper models they released in open source. Let's look at some of Whisper's features. Whisper performs phenomenally well in English speech recognition. This is a figure from their paper, where they compare against "Company A", "Company B" and so on, because they can't reveal the company names in the paper; NVIDIA STT was previously the state-of-the-art system in the open source world before Whisper. You can see that Whisper performs phenomenally well in English. Another thing is that Whisper is trained on 99 languages, but even so, I won't say it performs well on all 99 of them.
It probably works well in about 57 languages at the moment, because the OpenAI Whisper API they released supports only 57 languages. Another thing is that since they released this in open source, one awesome developer, Georgi Gerganov, created an open source project called whisper.cpp. It supports a lot of platforms, so Whisper, which was a very large model, can now run even on small devices like the Raspberry Pi and many more. And it has a lot of amazing community plugins, because it was released in open source. I'm not going to explain all of this in this talk.

I was assuming the audience would already know what fine-tuning is, but since this is a relatively new audience, I'll try to explain it in a bit more depth. If you want to start learning AI, one of the best courses I always recommend is the fast.ai course, which Jeremy Howard teaches. Most of the time we have some large pre-trained model which has been released to the open source world by companies like Google. If you want good results on your smaller subset of data, you collect that data and train the model further on it. In this example from the course, we have a language model trained on WikiText-103, which is a large text dataset. IMDB is a smaller dataset on which we want the model to perform really well, so we collect the IMDB data and train the language model on it. And then, although it started as a language model, if we want it to work well on a classification task, we can fine-tune it for that too. That is basically what fine-tuning means. And I just wanted to say: fine-tuning is the new training.
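The fine-tuning idea above can be sketched with a toy example: a tiny linear model "pre-trained" on a large generic dataset, then fine-tuned for a few steps on a small target dataset whose distribution is slightly different. Everything here (the data, learning rates, step counts) is invented purely for illustration; real Whisper fine-tuning uses a full training loop such as the Hugging Face Trainer.

```python
import random

def train(w, b, data, lr, steps):
    """Plain gradient descent on mean squared error for y ≈ w*x + b."""
    for _ in range(steps):
        gw = gb = 0.0
        for x, y in data:
            err = (w * x + b) - y
            gw += 2 * err * x / len(data)
            gb += 2 * err / len(data)
        w -= lr * gw
        b -= lr * gb
    return w, b

random.seed(0)
# "Pre-training": a big generic dataset drawn from y = 2x + 1.
big = [(x, 2 * x + 1) for x in [random.uniform(-1, 1) for _ in range(500)]]
w, b = train(0.0, 0.0, big, lr=0.1, steps=200)

# "Fine-tuning": start from the pre-trained weights and take steps on a
# small target dataset from a slightly different task, y = 2x + 3.
small = [(x, 2 * x + 3) for x in [-0.5, 0.0, 0.5]]
w_ft, b_ft = train(w, b, small, lr=0.1, steps=100)

print(round(w, 2), round(b, 2))        # close to the generic task (2, 1)
print(round(w_ft, 2), round(b_ft, 2))  # shifted towards the target task (2, 3)
```

The point of the sketch is that fine-tuning starts from already-good weights, so only a little data and compute are needed to adapt to the new task.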
Because, at the end of the day, most of us are not rich enough to train models at OpenAI or Google scale, so most of the time fine-tuning is all we can do. Fine-tuning has been a very common practice for a long time in image classification, and in the computer vision domain generally. It has become more and more relevant in the NLP world, it is now getting more relevant in the audio domain, and I expect it will become relevant even for big LLMs like ChatGPT in the future.

So you may be wondering: why should we fine-tune Whisper? You should only fine-tune Whisper if you are not getting good results with the Whisper model weights released by the community. If you are already getting good results with the open source weights, there is no need to fine-tune. But if you are getting really bad results on your particular task, you can try fine-tuning, and it may improve the performance.

How do you fine-tune Whisper? I won't be covering that in my talk, but it is well covered in a fantastic article by Sanchit Gandhi of the Hugging Face team, where he explains all the steps. Once that article was released, the Hugging Face team organized an event called the Whisper event. Its goal was to achieve state-of-the-art results in low-resource languages like Malayalam and many other small languages where Whisper currently performs really badly. It was conducted to train new models, and GPU credits, almost 100 hours' worth, were sponsored by a company called Lambda Labs. So yeah, it was great.
And for Malayalam, fortunately, we got a lot of good models; because of the event, quite a few models were released for Malayalam. The winning model in the Whisper event was thennal's Whisper Medium ML, and for FLEURS, which is another dataset, it was parambharat's Whisper Small ML model.

But personally, I was not really convinced by the results of the Whisper event. The reason is that in Malayalam, achieving a 10 percent word error rate would be a "wow" result; if that ever happens, we can use it in production. So I wanted to check this hypothesis: can we really get these results? We also don't have many yardsticks; no one has done such a benchmarking exercise, so nobody knows what the current result for a particular dataset actually is. Another thing is that Malayalam, unlike English, is a highly morphologically complex language; in this link I have included a research paper which discusses that. So even achieving a 30 percent word error rate is a big thing.

Another reason, which I'll show you here, is that the way Hugging Face evaluated the models felt deeply wrong to me. There is a step that automatically generates and reports a word error rate based on the validation dataset you used, but someone can also simply write a word error rate into the model card. If you look here (I'm not able to show that commit), this is the thennal Whisper Medium model: the automatically calculated word error rate was 38.6207, while on the leaderboard his result shows as 11.49. Anyone can edit a README and claim good results; that felt deeply suspicious to me. So I thought of building something new: I created a new GitHub project for benchmarking Malayalam ASR.
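The benchmarking tool described in the talk records, for every model and dataset, a row of results. As a minimal sketch of that idea (the actual project's code and field names may differ; `transcribe` and `score` are placeholders for a real ASR backend and a real WER/CER computation):

```python
import time
from dataclasses import dataclass

@dataclass
class BenchmarkRow:
    # One row of the benchmark table; field names are illustrative.
    model_name: str
    wer: float          # word error rate, in percent
    cer: float          # character error rate, in percent
    model_size: str     # e.g. "tiny", "base", "small", "medium", "large"
    seconds: float      # wall-clock time to transcribe the whole dataset

def run_benchmark(model_name, model_size, transcribe, dataset, score):
    """Time `transcribe` over the dataset and compute error rates.

    `dataset` is a list of (audio, reference_text) pairs; any ASR
    backend can be plugged in as `transcribe`.
    """
    start = time.perf_counter()
    hypotheses = [transcribe(audio) for audio, _ in dataset]
    elapsed = time.perf_counter() - start
    references = [ref for _, ref in dataset]
    wer, cer = score(references, hypotheses)
    return BenchmarkRow(model_name, wer, cer, model_size, elapsed)

# Toy usage with a fake "model" that simply echoes the reference text.
fake_dataset = [("audio1.wav", "hello world"), ("audio2.wav", "good morning")]
row = run_benchmark(
    "openai/whisper-tiny", "tiny",
    transcribe=lambda audio: dict(fake_dataset)[audio],
    dataset=fake_dataset,
    score=lambda refs, hyps: (0.0, 0.0),  # perfect echo, so zero error
)
print(row.model_name, row.wer, row.seconds >= 0.0)
```

Timing the whole dataset per model is what makes the size/speed trade-off in the results table visible: smaller checkpoints finish faster, larger ones usually score better.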
And when I tweeted that I had been working on this, and that I had first benchmarked on a very small dataset, the Common Voice dataset, Kavya Manohar, a speech researcher who has been working on Malayalam and doing her PhD for five years, replied: "No one has done this before, Kurian. Thank you." There are so many ASR papers in Malayalam, but no one had ever tried benchmarking like this to validate the claims.

So in this talk I'll present the results of benchmarking in Malayalam. We took these models and compared them with the six model versions released by OpenAI. This is how the results look in my benchmarking tool: for each run I record the model name, the word error rate, the CER, the model size and the time it took for the model to run on that particular dataset. Obviously, smaller model versions run faster, while larger models take more time.

And these are the results. Looking at OpenAI's Whisper models, note the word error rate, which I forgot to explain: it is a metric that plays a role similar to accuracy in the ASR world, and informally, accuracy equals 100 minus the word error rate. On Malayalam, the OpenAI models show extremely high word error rates, while the thennal model achieved good performance here: around 11 percent. For the Common Voice dataset, on the character error rate, a metric equivalent to word error rate but computed on characters, you can see the numbers drop from 180 percent to 5 percent. Another dataset I benchmarked was the Malayalam Speech Corpus dataset; here are the model names and the corresponding percentages.
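The word error rate mentioned above is the Levenshtein (edit) distance between the reference and hypothesis word sequences, divided by the number of reference words; CER is the same computation on characters. A minimal stdlib sketch (real benchmarks typically use a library such as jiwer):

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences; substitutions,
    insertions and deletions all cost 1."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (r != h)))  # substitution
        prev = cur
    return prev[-1]

def wer(reference: str, hypothesis: str) -> float:
    """Word error rate in percent. Note it can exceed 100 when the
    hypothesis has many insertions, which is how raw Whisper can show
    WER above 100 percent on Malayalam."""
    ref_words = reference.split()
    return 100 * edit_distance(ref_words, hypothesis.split()) / len(ref_words)

def cer(reference: str, hypothesis: str) -> float:
    """Character error rate in percent: same distance, on characters."""
    return 100 * edit_distance(list(reference), list(hypothesis)) / len(reference)

print(wer("the cat sat", "the cat sat on"))  # one insertion over 3 words
print(cer("abc", "abd"))                     # one substitution over 3 chars
```

With this definition, the informal formula from the talk, accuracy = 100 − WER, only makes sense once WER is at or below 100 percent.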
And on the Malayalam Speech Corpus dataset too, the word error rate goes from 139 percent down to 2 percent with fine-tuning, and similarly the character error rate goes from 177-200 percent down to 1 percent.

So I have almost reached the end of my talk. I have shown that fine-tuning in Malayalam can achieve great results. This is just the early stage of the project; I started working on it maybe two or three months back, and obviously we want to do more and more benchmarking. We got very good results in Malayalam: the previous state-of-the-art result with other methods, not just Whisper, was a word error rate of almost 80 percent on the Malayalam Speech Corpus dataset; that was the only number I could find when I tried to research it. It is very hard, because there are not a lot of researchers working on small languages like Malayalam. And since we got phenomenal results in Malayalam, if you are working on a low-resource language, or if you are a speaker of some other small language, you can also get amazing results by fine-tuning.

I think that's all. I want to thank all these people for doing phenomenal work and helping me out. And finally, I want to end my talk by offering tributes to Reep Jamal. That's all. Thank you.