Hi, Maxime here. First of all, I'd like to thank the Chan Zuckerberg Initiative for helping us do these bytesize talks. Today, Franziska Bonath will present how the transcripts of the bytesize talks happen. So it's a very meta bytesize talk today. As usual, please use Slack for your questions. So now, it's over to you, Fran.

OK, thank you. Welcome, everyone. I'm talking about bytesize talk transcripts. Just very briefly, what we're going to do today: I will handle the question of all questions, why transcribe bytesize talks at all, then briefly go into how we did it and what we are going to do in the future.

So why are we actually going through all this pain? The big answer really is that we want to be more inclusive. This is one of the reasons why we got funding from the Chan Zuckerberg Initiative. But of course, we also have a desire to do this. Not everyone is able to actually hear things. If you rely on the transcripts that are automatically done by YouTube, for example, it can be very difficult to actually get the gist of what the talk is about. Even if you hear perfectly, not everyone understands English well enough to figure out what the talk is about. In addition, we have speakers from all over the world, so there might be accents that are a bit more difficult to follow. Having a really good transcript will help people understand these talks a lot better.

There are other reasons. One is, of course, to improve the subtitles for YouTube. But also, if you have the transcript by itself without the video, you should be able to understand it. It will be a resource for understanding details that are maybe not in the slides. There will be, hopefully at least, the correct names of all the tools that are used, so you can actually look them up, and it will be easier to search for them online. But also, once you have a text, there's a lot of things you can do with that text.
You can translate the text. You can put it into some AI-based thing and have it give you a summary of the text. There are a lot of things that we might start to think of in the future, and they will be text-based. And the better the information you put in, the better what you're going to get out.

So where can I actually find these transcripts? It's, at the moment, a bit difficult. I admit that. So I'm going to quickly show you. What you have to do at the moment is go to... Now I forgot to open it. Well, I'll stop sharing for a second and go there. So I'm going to share this one again. If you're on the website, you go to events, and then you can filter for only bytesize here. These will be the upcoming ones. But if you go to the past ones, for example, let's go to taxprofiler, and you scroll down, what you will find embedded here is the YouTube video. And at the bottom, you will have the transcript. You can go directly to one of those. It will show up there. This is, at the moment, the only way to get the transcript for any talk. But it will be uploaded to YouTube eventually. Then I'll go back to my slides.

So, how did we do this? We did try to use the automated transcripts on YouTube first. It is horrible. Basically, what happens is that you have a lot of these uhs and ahs and ums that are not removed at all. Also, you will have no punctuation whatsoever. So you have to add the punctuation, and the capitalization after every full stop, yourself. It takes forever. It probably would have been quicker to just write it while you hear it. So that did not work.

And that means: in comes a new tool, for which I'm forever grateful to Matthias Zepper, who introduced me to it. It's called Whisper. At the moment that I started these transcripts, Whisper was only available as a tool as is.
But from now on, you can also have a Nextflow pipeline for Whisper. You can find it under this link. And Whisper helped a lot. It does add punctuation. It does, surprisingly, recognize a lot of the tools that we're using. It removes all the ums. It removes a lot of the double mentions. If you're talking normally, you often just stop to think about something and you repeat what you have just said before. These double mentions get edited out automatically, which is super nice. So I can only recommend Whisper if you ever do transcripts of any video yourself.

But even though Whisper is great, it is not perfect. I don't think any automated talk transcript ever will be perfect. So the main things that we have to do are: add timestamps, so that we have nice sections that belong together. But of course also names, specifically names of people, but also of tools, often do not get identified correctly, so you have to check and edit those. Specialized terminology is also often not recognized by Whisper. And sometimes sentences are super long. It might be ellipses, or just that someone had a thought, stopped mid-thought and continued afterwards, which is totally fine if you're listening to a person. But if you just want to read it, it's very difficult to understand. So these kinds of things we have to change manually afterwards.

To give you an idea: our favorite words, which are part of pretty much every bytesize talk, nf-core and Nextflow, are very commonly misspelled. nf-core typically gets misspelled as "NFL" and "NF4". I don't exactly know why, but in every third or so transcript I read those. And of course, you also just get some misspellings of nf-core itself. So sometimes it does pick it up. And in very, very rare cases, it can also write it correctly. Nextflow also has diverse ways of how it can be written.
In the latest one, I had it transcribed as "next floor". But then of course, there are just some random things that don't repeat. Like "elution" being transcribed as "illusion", or "iterations" as "situations". One of my favorites actually was "bioinformaticians" to "bi-partitions". Surprisingly, "bioinformaticians", which is not that uncommon a word I would say, gets transcribed wrong a lot. And you can imagine that if you have "ribosomal RNA" mistranscribed, the sentence will not actually make any sense. "The handy overall summary" can also become "a handy oral summary", which would make sense, but which changes the meaning a bit.

And just one other example. Take a sentence like "these processes take a sort of bam from the samples". If I had just read the sentence, I would not have guessed what specifically it means. Once I listened to the recording, it turned out that it means "these processes take in a sorted BAM from SAMtools". This, I think, shows very clearly that manual work is necessary, and that it's worth going through the text and making these changes rather than just relying on an automated transcript.

So now we're done, right? We are up to date, everything's fine. Not quite, obviously. We have to add these transcripts to the subtitles on YouTube, which will happen in the not too distant future, I hope. And what we also want to try, to see if it works well, is to translate these YouTube transcripts that we generate now, to have them in different languages, which would be super nice. And of course, bytesize talks are not finished yet. In fact, this very bytesize talk is going to be transcribed. So we have this kind of inception: a bytesize talk that talks about bytesize talk transcripts is going to be transcribed.

So anyway, this was all. I would like to thank Matthias for his enormously helpful tip about Whisper. He was also writing a container, I think, for Whisper.
Marcel and Christopher, who had to approve all my pull requests for the transcripts. And of course, all the other reviewers, specifically the speakers who went through the horrendous task of reading their own talks. I'm not looking forward to this. So thank you very much. Now I'm open to any questions. Of course, there's no repository and, of course, no pipeline. I just took the... Anyway, thank you very much, everyone. Off to Maxime.

Good, that was brilliant. Thank you very much. I will try to allow everyone to unmute themselves if you have questions. We haven't done that in a while. Where is this?

Yeah, this is what you get when you use the template. Unnecessary things get included in the talk.

Are there any questions, actually? Yes, Jasmin is asking: when the transcript is added to YouTube, will it be visible as normal subtitles?

So yeah, I did look into that a bit. You can add your own subtitles in YouTube if you are the host or the owner of the YouTube channel. So as nf-core, I can actually add subtitles, and they will be one of the different subtitle options that you can choose. I think it's going to be called... No, I don't recall what it's called. But I think it will be the default option for subtitles.

Do we have one last question? Or are we good for today? I think we are good for today. So thank you again, Fran. That was definitely a question I had about how everything was happening. So thank you very much for enlightening me on that. And see you soon.

Thank you.
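
Note: the recurring mis-transcriptions mentioned in the talk (for example "NFL" or "NF4" for nf-core, and "next floor" for Nextflow) could in principle be caught by a small find-and-replace pass before the manual review. The following is only a minimal sketch under stated assumptions: the correction table and the function name `fix_known_terms` are hypothetical and are not part of Whisper or of any nf-core tooling, and the table entries are simply taken from the examples in the talk.

```python
import re

# Hypothetical correction table built from the examples quoted in the talk.
# Blindly replacing a string such as "NFL" is only safe if the talk never
# uses it in its ordinary sense, so a human still has to review the result.
CORRECTIONS = {
    "NFL": "nf-core",
    "NF4": "nf-core",
    "next flow": "Nextflow",
    "next floor": "Nextflow",
}


def fix_known_terms(text, corrections=CORRECTIONS):
    """Replace known mis-transcriptions, matching whole words case-insensitively."""
    # Try longer keys first so that a longer phrase wins over a shorter one.
    keys = sorted(corrections, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(key) for key in keys) + r")\b",
        re.IGNORECASE,
    )
    # Look up the replacement by the lower-cased match.
    lookup = {key.lower(): value for key, value in corrections.items()}
    return pattern.sub(lambda match: lookup[match.group(1).lower()], text)


if __name__ == "__main__":
    print(fix_known_terms("Welcome to NFL, built on next floor."))
```

A pass like this only handles spellings that recur. One-off errors such as the "sorted BAM from SAMtools" sentence from the talk still need someone to listen to the recording.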