Hello everyone, and welcome to another episode of Code Emporium, as we continue our journey through the wonderful world of building a translator using a transformer neural network. Over the last dozen videos we have built the transformer neural network from scratch, and here is the code to prove it. In this video, though, I want to go through some of the inference results we get after actually training this model, and I also want to offer some insights on what you can do to build your own translator. If you want to see the entire code, along with the datasets I've used, I've uploaded everything to GitHub and the link is in the description below.

Before I get started, I want to give a quick shout-out to the user Tiger07, who helped point out a specific issue in my last training videos for building out the transformer, where I made some errors. They pointed out two lines of code that I needed to change throughout the network, and this is the line itself. It was about reshaping these tensors: in PyTorch I was just calling a plain reshape on the values tensor, which would have completely scrambled my tensors, when I actually needed to permute the dimensions first (there's a small sketch of this fix just before the example translations below). It's a very minor technical issue, but a major one in the sense that it stalled me for a very long time, and I'm really grateful that when I reached out to the community, you all responded so well. I also want to give a shout-out to the account PingNG (sorry if I'm mispronouncing that), who recommended the exact same solution and also detailed why that was the case. Thank you so much as well; it just shows how well this community responds, because neither Slack nor ChatGPT could really help me out, but you all did, so thank you so much.

For now, let's go to the part where we actually instantiate the transformer. The only thing I really want to point out here is that we are using only one encoder layer and one decoder layer, so this is the simplest of simple transformer neural networks; I did that purely so it would train faster and still show some reasonable results. I did this training for about one and a half hours, for 10 epochs (you can see that if I scroll here), on a dataset of around 200,000 English sentences translated into a language called Kannada, and you can see the training progress printed out along the way, 100 lines at a time.

Now let's get to the good part: transformer inference. This is a character-level language model, so we generate one character at a time. Everything sits inside a for loop: generate a token, append it to the sentence, and keep doing this until you generate an end token, which signifies the end of the sentence (a minimal sketch of that loop also follows below). On doing so you'll get a few examples, so let's look at a few of them.
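To make that fix concrete, here's a small sketch of the reshape-versus-permute issue in multi-head attention. The shapes and variable names are illustrative, not the exact lines from the videos:

```python
import torch

batch_size, num_heads, seq_len, head_dim = 2, 8, 10, 64
values = torch.randn(batch_size, num_heads, seq_len, head_dim)

# Buggy version: reshaping straight from (batch, heads, seq, head_dim)
# interleaves the head and sequence dimensions, so each token's vector
# ends up mixed with pieces of other tokens.
wrong = values.reshape(batch_size, seq_len, num_heads * head_dim)

# Fixed version: permute so the sequence dimension comes before the heads,
# then flatten the per-head vectors back into one model dimension.
right = values.permute(0, 2, 1, 3).reshape(batch_size, seq_len, num_heads * head_dim)
```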
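And here's a minimal sketch of that greedy, character-at-a-time decoding loop. The `transformer` call signature, the `index_to_kannada` mapping, and `END_TOKEN` are assumptions standing in for whatever your trained model and vocabulary objects actually look like:

```python
import torch

def translate(transformer, eng_sentence, index_to_kannada, END_TOKEN, max_len=200):
    """Greedily generate one target character at a time until the end token."""
    kn_sentence = ""
    for _ in range(max_len):
        # Assumed interface: the model takes batches (here of size 1) of source and
        # partial target sentences and returns logits of shape (1, seq_len, vocab_size).
        logits = transformer((eng_sentence,), (kn_sentence,))
        next_token_logits = logits[0, len(kn_sentence)]   # scores for the next position
        next_token = index_to_kannada[torch.argmax(next_token_logits).item()]
        if next_token == END_TOKEN:
            break
        kn_sentence += next_token                          # append and keep generating
    return kn_sentence
```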
When I ask it to translate "what should we do when the day starts", the translation it gives is this sentence over here, which roughly means "what do we do about this". It doesn't translate the sentence exactly, but we can see that some commonalities are retained: these last two words over here and these last two words over here are kept in some form. Those two words, ending in "maadbeku", just mean "what should we do", and that corresponds to this part of the input. So it gets part of the sentence right, but it clearly distorts the overall meaning. You can attribute this to a few things. The first is that the model is just too simple: it only has one encoder and one decoder layer, and if you increase the number of encoder and decoder components you can probably pick up on more idiosyncrasies of the language. That's probably the biggest reason, but you can also try increasing your training set or the number of epochs you train for.

Before continuing with the video, I want to tell you about our sponsor, Taro. This is a social platform that helps software engineers grow in their careers. Say you land a software job: then what? It can be really hard to navigate your career, and it's tough to get good career advice. Taro facilitates these discussions; whether you are entry level or senior, you can be part of discussions and get advice from software engineers across many companies. There are many non-technical questions I wish I could have asked someone in the past to advance my career but never really found a good forum for, and I think Taro is that place. I'm a machine learning engineer, which does overlap with software engineering, and while the platform does not have too many machine learning engineering questions at the moment, I'm doing my best to answer the ones that are there whenever I can, and I think this community is a really nice one to be a part of. So if you're looking for a premium community of software engineers, consider signing up for Taro using my link in the description to get 20% off your annual purchase. Thanks for listening, and let's get back to the video.

Now, the second sentence is "how is this the truth". The actual translation is this, whereas this here is the generated translation. The generated output isn't really a meaningful sentence, but you can see some commonalities between them again: this generated part roughly reads "this is how is truth", while the reference translates to "how is this the truth". So it does generate part of it, and just from these two examples you can see that it is definitely learning something, even if it isn't complex enough to pick up everything about these sentences.

The next example is "my name is Ajay", which should translate to "Nanna hesaru Ajay", but it translates to "Nanu kutumbada hesaru". Again there are some commonalities: "hesaru" means "name", and "Nanna" is "my", while "Nanu" is very close to it but means "me". It did not pick up on the name "Ajay" at all, so although the overall translation is off, once again we see some words that are common and correct.

Now this one's interesting: "why do we care about this". The translation it gave was "Yaake kaarana eno", which with punctuation would read like "why, what's the reason?". That is actually very close to the translation of the initial sentence over here, so not bad; it did pretty well there.

The next is "this is the best thing ever". It generates this sentence, whereas the actual translation is this one, and the generated output translates back to something like "this is very unusual". So the meanings are somewhat off, but you can see there are still some commonalities between them.

Now this is probably the most interesting example of the lot. I wanted to translate "I am here". The actual translation is "Naanu illiddene", while the translation the model produced was "Naanu kelidene". These have different meanings: the first means "I am here", whereas the second means "I have heard something". But remember this is a character-by-character generation, and from that perspective the translator is actually doing extremely well: in the translator's eyes, the only thing it got wrong is one or two characters in the middle of that word, and everything else matches. So as far as the model is concerned, this is a pretty good translation.

This made me think more about the fact that this is a character-level translator, and that word-level translation might actually perform better. The caveat of using word-level tokens is that your transformer will need a much larger vocabulary than it has now. If I scroll up to see the set of all possible characters, or tokens, that can be generated, it's only a small set of values, maybe on the order of a hundred to a hundred and fifty tokens. If the tokens were words instead, this list would explode into the tens of thousands, because there are so many possible words that might need to be generated. So there's a trade-off: a larger vocabulary gives you more interpretable units, but you need a more complex translator and many more parameters to account for the words themselves. On the other hand, with words the sentence length effectively decreases: here I've capped the maximum sentence length at 200 characters, but the number of words in a sentence doesn't need to be anywhere near 200; we could cap it at a dozen or so. (I'll show a quick way to check this vocabulary difference on a corpus right after these examples.)

Here's an interesting one too: "click this". The reference translation is "Idannu click maadi", which means "click this", but the translation it actually generates is "click click click click click click click click maadi". It does get the last part, "click maadi", right, but it just loves repeating "click", which is funny; still, it is once again understanding what the task is, at least to an extent.

The same kind of thing happens with "where is the mall": the translation it generated was "elli elli", which is just "where where". So at least it got the "where" part, but it didn't generate anything else. Now, "what should we do": the translation is the "maadbeku" sentence again, and it generates that one correctly. But it absolutely fumbles on this one: "today, what should we do". I have no idea why, but it generates something like "Maadida aamele Maadida Maadida"; it just loves "Maadida".

"Maadi" means "to do", and I guess that's a very common phrase you see everywhere, in both English and Kannada, which is why you see all these forms of it: whenever the input is about doing something, the model reaches for "maadi" everywhere. It's interesting, but it completely fumbles here, even though in English this is essentially the same sentence as before with just one extra word. On the other hand, if I phrase it as "why did they do this", that sentence is actually generated almost perfectly. Again, it's a sentence about doing something, a very common kind of phrase, which is probably why it did so well.

This last one here is also very interesting: "it's the word on the street". The generated translation means something like "what is the topic of this" or "what is this about", which does kind of semantically relate to what this little idiom actually means.
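Before getting to the insights, here's the quick vocabulary-size check I mentioned a moment ago. This is a hypothetical sketch; `kannada_sentences` stands in for whatever list of target-language sentences you're training on:

```python
# Compare character-level vs word-level vocabulary sizes on your own corpus.
char_vocab = set()
word_vocab = set()
for sentence in kannada_sentences:       # placeholder: your training sentences
    char_vocab.update(sentence)          # every distinct character symbol
    word_vocab.update(sentence.split())  # every distinct whitespace-separated word

print(len(char_vocab))  # typically on the order of a hundred or so symbols
print(len(word_vocab))  # can easily reach tens of thousands of distinct words
```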
Let's now go through some insights, where I'll give you some information and tips for building out a transformer on your own, with any language.

First of all, create a translator for a language that you understand, ideally, because it's so much easier to see where the transformer is doing things correctly and where it's doing things incorrectly. Generating that insight for yourself is very important, and you can do it far better if you understand the language itself. People were asking why I used Kannada: it's because I can understand it and properly evaluate the output; otherwise I wouldn't have been able to come up with the insights that I did.

Piggybacking off of that, here's an insight I think is pretty important and that I haven't really seen anywhere else. The English character set is what we call an alphabet, where every character has roughly one phonetic value. A language like Kannada is more of an alphasyllabary, where the individual written units are complete syllables. For example, take the character ಮಾ ("maa"): written out in English it's an "m" plus an "a" with a length mark on top, so multiple characters, but in Kannada it is a single character. The way I'm tokenizing the data here, though, treats it as multiple tokens, the consonant plus the vowel sign, even though in the language it's really one character. What makes more semantic sense is a tokenizer that doesn't divide a Kannada word into these sub-characters, but instead divides it into actual Kannada characters, each of which may be a combination of two or more of those code points (a small sketch of this grapheme-level tokenization follows after these tips). Alphasyllabaries aren't confined to Kannada either; there are many of them out there, so just understanding the writing system may give you a translator that is more meaningful, and I highly recommend you try this out.

Another insight is similar to what I described before. Right now I'm generating one character at a time, which means a smaller vocabulary but a longer sentence length; you can instead play with generating one word at a time, which gives a much larger vocabulary but a shorter sentence length. A good mix of both worlds is to use byte-pair tokenization, or byte-pair encodings, which are subwords. The issue is that it's hard to find a ready-made byte-pair encoding for languages that don't have a big research or online presence; it was hard for me to find one for Kannada, and hence I went with character tokenization for now to illustrate the concepts and ideas. But if you're able to create byte-pair encodings for your input and output languages, I think that's a good starting point; in fact, I believe that's essentially what happens in the original paper and in a lot of the other research around generative models these days (there's a rough sketch of training one after these tips too).

Another tip is to make sure that your training set has a large variety of words. You could see above that a lot of the sentences are about doing things, lots of "maadbeku" and "maadi". In fact, if you look at this dataset, out of the millions of records there are at least 10,000 cases of just the word "maadbeku", "I have to do", which is a very common phrase. So I'd suggest you plot every word and its frequency count to get an idea of what kind of dataset you're dealing with, whether it's catered to news, government articles and politics, or to general, random sentences, which is ideally what you'd want for a general-purpose translator (a quick frequency-count sketch follows as well).
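Here's a minimal sketch of the grapheme-level tokenization idea, using the third-party `regex` package (not the standard-library `re`), whose `\X` pattern matches one user-perceived character. The example sentence is the "I am here" translation from earlier; the function name is just something I made up:

```python
import regex  # pip install regex; supports the \X grapheme-cluster pattern

def grapheme_tokenize(text):
    """Split text into grapheme clusters, so a Kannada consonant plus its
    dependent vowel sign (e.g. ಮ + ಾ = ಮಾ) counts as ONE token, not two."""
    return regex.findall(r"\X", text)

sentence = "ನಾನು ಇಲ್ಲಿದ್ದೇನೆ"          # "Naanu illiddene" -- "I am here"
print(list(sentence))                # naive split: every Unicode code point separately
print(grapheme_tokenize(sentence))   # grapheme split: whole characters as a reader sees them
```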
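And here's a rough sketch of what training your own byte-pair encoding could look like, using the Hugging Face `tokenizers` library. The file names, vocabulary size, and special tokens are placeholders for whatever your own setup uses:

```python
from tokenizers import Tokenizer
from tokenizers.models import BPE
from tokenizers.pre_tokenizers import Whitespace
from tokenizers.trainers import BpeTrainer

# Build and train a subword (BPE) tokenizer on your parallel corpus files.
tokenizer = Tokenizer(BPE(unk_token="[UNK]"))
tokenizer.pre_tokenizer = Whitespace()
trainer = BpeTrainer(
    vocab_size=8000,  # somewhere between ~150 characters and tens of thousands of words
    special_tokens=["[UNK]", "<START>", "<END>", "<PAD>"],
)
tokenizer.train(files=["english_sentences.txt", "kannada_sentences.txt"], trainer=trainer)

# Sentences now come out as subword units instead of single characters.
print(tokenizer.encode("what should we do today").tokens)
```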
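Finally, here's the kind of quick frequency check I'm describing; `kannada_sentences` is again just a placeholder for your list of training sentences:

```python
from collections import Counter

# Count how often each word appears across the target-language corpus.
word_counts = Counter()
for sentence in kannada_sentences:  # placeholder: your training sentences
    word_counts.update(sentence.split())

# The most frequent words tell you what the dataset is really "about"
# (news, politics, everyday phrases like "maadbeku", and so on).
for word, count in word_counts.most_common(20):
    print(f"{count:8d}  {word}")
```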
The last one is more technical: increase the number of encoder and decoder units. I've only used one of each to keep things very simple, but you can try more encoder and decoder layers to pick up more of the complexities and intricacies in your languages.

Overall, yeah, this model has definitely learned something, and you can use the same setup for languages other than Kannada as well. I hope all of this, which is going to be available on GitHub, along with all the videos before this one, illustrates the general concept of how transformers work. I'm also thinking about making one full end-to-end video, from start to finish, explaining transformer neural networks. If you want to see that mega video, please do comment below "oh my gosh, I want to see that mega video". It would be very similar to these twelve videos, but in a much more continuous flow, if I can make it happen. Thank you all so much for watching; we're going to keep continuing our wonderful journey through the world and landscape of artificial intelligence. Thank you all so much, and I'll see you next time.