Welcome back, everyone. Today we're going to be talking about Video to OCR: analyzing video for optical character recognition, extracting the text from videos, and then making that text searchable. We are in Tsurugi Linux, and Tsurugi Linux comes with Video to OCR installed; it's one of their custom tools. If we go to Applications > Tsurugi > Picture analysis > OCR, Video to OCR is right there at the top. Video to OCR uses Tesseract OCR, so if you do want to train models to make your results a little bit better, you will use the same training method as Tesseract; Video to OCR doesn't have training built into it. If you click on it, it pops up a command prompt, because it is a command-line tool, and the only options we really have are to run the program, select the language we want to use, and set the frame rate to process the video at. The available languages depend on which Tesseract OCR language packs are installed. I believe you can only choose one language at a time, but I haven't confirmed that. So, for example, video2ocr ENG 5 would try to extract English-language text from a video, and it would create five screenshots per second. I usually do one screenshot per second, which is the lowest you can go; do more if there are a lot of quick changes in the videos you're processing. If we type the command with -h at the command prompt, we get the same help menu again. One thing to observe is that Video to OCR wants you to copy your video files into a directory under home: the 02 computer vision folder, then 04 video to OCR, then 01 video. So instead of pointing Video to OCR at a single video, you put all of your videos into that one directory and it will bulk process all of them; there's no option to select just one video. So how do we do that?
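To get a feel for the frame-rate argument, here's a tiny sketch. The command name and argument order are assumptions based on what's spoken in the video (video2ocr, then language, then frames per second); the helper just shows how the frame rate drives the number of screenshots you'll end up with.

```python
# Assumed invocation from the video: video2ocr ENG 5
# (language code + frames extracted per second; names unverified).

def expected_screenshots(duration_seconds: int, fps: int) -> int:
    """Rough count of frames Video to OCR would extract for a clip."""
    return duration_seconds * fps

# A 60-second clip at 5 fps yields about 300 screenshots to OCR,
# versus 60 at the minimum rate of 1 fps.
print(expected_screenshots(60, 5))  # 300
print(expected_screenshots(60, 1))  # 60
```

This is why sticking to one frame per second is usually enough: every extra frame per second multiplies the number of images Tesseract has to chew through.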
As I've shown in other videos, Tsurugi has several custom folders set up, and one of those is 02 computer vision. Double-click on it, then 04 video to OCR, and then we have the 01 video folder. Inside that video folder, I've put the video file that I want to analyze. The extension doesn't really matter; I think it's using FFmpeg to do the processing, so as long as FFmpeg supports your video type, you should be okay. I'm using the MKV format for a video that I created. So let's go ahead and see what my video looks like. I've just created a PowerPoint-style presentation. At the beginning here, I have LibreOffice Impress open; it has some text at the top and then the file menu. The first slide is completely empty except for some little white dots; I chose a crazy background so you can see the limitations of OCR. The first slide says "This is slide 1" and "DFIR Science rocks", so we should be able to extract that text, hopefully. And that was a very long slide, by the way. The next slide says "This is slide 2" and "Criminals do bad things"; that was a very quick slide. And then slide 3 has more text on white and some Korean text. I've basically set up the backgrounds and alternative text styles to show you the limitations of OCR, and specifically of Tesseract OCR if you're using the out-of-the-box models. Usually you want to retrain Tesseract OCR with any datasets you have to make the results a little bit better. It can work out of the box; for example, I have another video on PDFs where it works okay, but there are a lot of situations where you won't get great results. So let's see what we can do with this video; it has a couple of different problems in it. I'm going to type video2ocr and then the language that I want to analyze, and remember, this is whatever language Tesseract OCR supports. Just to show you real quick: if we go to Applications > Tsurugi > Picture analysis > OCR, we have the Tesseract installed languages.
If I click on that, we have two languages installed, so I would need to install additional Tesseract OCR languages if I want to support them in Video to OCR. Next is the number of frames I want to capture per second. The lowest I can go is one frame per second: for every single second of video, it will create one screenshot. You can do more than that; if there are a lot of very quick changes, you might want more than one frame per second, but it will take longer to process. I usually just do one unless I know there are a lot of quick changes. And that's pretty much it: it's video2ocr, the language you have Tesseract OCR support installed for, and then the number of frames per second you want to process. I'm going to go ahead and hit Enter, and a couple of different things start. This is FFmpeg; yep, like I said, one frame per second. Now it's just going through and processing that video, and it looks like it has probably already extracted all the screenshots because it was a very short video. So let's go look. Notice that my video is inside this folder, but there are zero items in here. I usually just monitor the overall 04 video to OCR folder and hit F5 to refresh. That will show you how many images it has already extracted and roughly where it is in the process, just to make sure it's still running. Since there is no progress bar, the only real way to get any feedback is by refreshing the page and seeing if your image count goes up. Inside the images folder, we have our extractions. I have my first vokoscreen PNG, and this is a color image; it's just a screenshot of whatever that section of the video was. Then we can see there's another file with exactly the same file name, just with a .jpg extension. If I double-click on that, it's exactly the same image, just black and white.
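Instead of hammering F5 in the file manager, the same "is it still running?" check can be scripted. This is a small sketch, assuming the tool drops its screenshots as PNG files in an images folder (the demo below uses a throwaway temp directory, not the real Tsurugi path):

```python
from pathlib import Path
import tempfile

def count_frames(folder: Path, pattern: str = "*.png") -> int:
    """Count extracted screenshots so far -- a scriptable stand-in
    for refreshing the file manager with F5 to watch progress."""
    return sum(1 for _ in folder.glob(pattern))

# Demo against a throwaway directory; in practice you'd point this
# at the tool's images folder and poll it in a loop.
demo = Path(tempfile.mkdtemp())
for i in range(3):
    (demo / f"frame_{i:04d}.png").touch()
print(count_frames(demo))  # 3
```

If the count keeps climbing between checks, extraction is still running; when it stops and .jpg grayscale copies start appearing, the tool has moved on to the next stage.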
Okay, so they convert to grayscale to try to detect the text a little bit better. The process is: first, Video to OCR extracts all of the screenshots at whatever frames per second you specified; then it converts those images to grayscale; and then, from the grayscale images, it runs Tesseract OCR to get the text out. So once it's done producing the grayscale images, it will dump a bunch of grayscale images into this folder and then start running optical character recognition against those. Whenever that's done, I'll come back and you'll see how these folders have changed.

Okay, so you can see that the number of images inside the images folder went down while the grayscale images folder went up, and the grayscale folder is still growing. What's happening is that the OCR output is getting put into those same folders, and everything is moved once it's completely done. You'll notice I didn't click anything, but this browser window opened up. That's because we've completely finished processing; you can see back here we have 1/4, 2/4, 3/4 and so on, so all the tasks are done. Once all of those tasks are done, an index_<language>.html file is created. That's our final report, and it's automatically opened for us. In it we can see the text file containing the extracted text, a color preview of the screenshot that was created, and then the actual OCR'd text. Now, you can see there are a couple of different things going on here. If I double-click on this, we get a little bit better view of it. We have, for example, "Untitled - LibreOffice Impress", then we have the file menu like normal, the slides panel, and a little bit of text in here that you probably can't make out: "Click to add Title" and "Click to add Text".
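The three-stage pipeline just described can be sketched as the external commands it appears to correspond to. The exact flags Video to OCR uses internally are an assumption; these are standard FFmpeg, ImageMagick, and Tesseract invocations that reproduce the same extract → grayscale → OCR flow:

```python
# Sketch of the pipeline stages (assumed, not taken from the tool's
# source): ffmpeg frame extraction, grayscale conversion, Tesseract.

def extract_cmd(video: str, fps: int, out_dir: str) -> list:
    # 1. ffmpeg pulls one frame every 1/fps seconds as numbered PNGs.
    return ["ffmpeg", "-i", video, "-vf", f"fps={fps}",
            f"{out_dir}/frame_%04d.png"]

def grayscale_cmd(png: str, jpg: str) -> list:
    # 2. ImageMagick converts each screenshot to grayscale,
    #    which tends to help Tesseract's text detection.
    return ["convert", png, "-colorspace", "Gray", jpg]

def ocr_cmd(image: str, out_base: str, lang: str) -> list:
    # 3. Tesseract OCRs the grayscale image; writes out_base.txt.
    return ["tesseract", image, out_base, "-l", lang]

print(" ".join(extract_cmd("demo.mkv", 1, "images")))
```

Each list could be handed to subprocess.run() on a box with those tools installed; seeing the stages spelled out also explains the folder behavior above, with PNGs appearing first, then their .jpg grayscale twins, then the text output.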
So we should be able to extract most of that text. Let's go see what the results were. First off, "Slides": that was around here. Then "Untitled - LibreOffice Impress" was at the very top, and then "File Edit View Insert Format", so the file menu was detected properly. We also have a bunch of random nonsense that was detected from somewhere; "English", I think, is probably from the bottom. Notice that it didn't detect "Click to add Title", and it didn't detect "Click to add Text". Now, why is that? One big reason is that there's a lot of messy stuff around here. First off, there's a white border that might throw it off a little, and we have these white dots behind it that could also interfere; that could actually be this weird detected text here. We get pretty much the same results on the next screenshot, because it's still the same slide. Unfortunately, the main text wasn't detected properly, but it looks like some other text that could be interesting to us was. Then we had a scene transition where we basically started the slideshow. There was nothing on screen, so nothing was detected: nothing, nothing. Then a slide that was completely empty, and this was the longest slide, I think ten seconds, and no text was detected. That's good. But then we have a problem with our first real slide, "This is slide 1" and "DFIR Science rocks": as you can see, nothing was detected. That's a bit of a problem, and it's because this background is a little too complicated for Tesseract to detect anything. The same reason we didn't get a good detection on the slide earlier is the same reason we're not detecting any text here: now that the rest of the program chrome is gone, we only have that background and the text. So whenever you have these weird, complicated backgrounds, you're going to get bad results unless you've trained specifically on more complicated text.
Tesseract OCR is basically trained on text that's mostly black on a white background. If you're doing something more complicated, Tesseract OCR might not work very well for you; you'd need to train it up on your own specific cases. Next, we get another example that is more like what you would expect, more of what we've already talked about. We detected "This is slide 2", and it detected it very cleanly. If we look at this, "This is slide 2" is black text on a whitish background, a very light gray, and it was detected very well. Under it, we have white text on that same light gray background, again with some of those dots coming through, and we didn't detect any of it. So again, this shows some limitations of using Tesseract OCR directly without additional training. Then we have a fairly straightforward case: a white background with black text and some stuff at the top. It looks like nothing was really detected before "This is slide 3". So we have "This is slide 3", then a cent sign, which I guess is the bullet point being detected, and then "More text on white". So the first line here was detected properly. Then we have another cent sign, which was probably the other bullet point, and then just random characters. This random text corresponds to the Korean text; notice it did not extract the Korean text properly. So if we know there is Korean text, or another language, or another alphabet I should say, inside the document we are analyzing, we either have to run it twice, once per language, or maybe just use Tesseract OCR directly, because Tesseract does support multiple languages at the same time. Again, I'm not sure whether Video to OCR can take two languages at once, but I don't think it can. So overall, with Video to OCR, there are some limitations whenever we have more complicated text over a difficult background.
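The "just use Tesseract directly" workaround for mixed-language frames looks like this. Tesseract itself accepts several languages joined with "+" on the -l flag; the helper below builds that invocation (the frame filenames are just placeholders):

```python
def tesseract_multilang(image: str, out_base: str, langs: list) -> list:
    """Build a Tesseract command covering several scripts at once,
    e.g. -l eng+kor -- a possible workaround if Video to OCR really
    is limited to one language per run, as suspected in the video."""
    return ["tesseract", image, out_base, "-l", "+".join(langs)]

# OCR a grayscale frame for both English and Korean in one pass;
# output lands in frame.txt.
print(tesseract_multilang("frame.jpg", "frame", ["eng", "kor"]))
```

Both language packs would need to be installed first, which is exactly what the Tesseract installed-languages menu item in Tsurugi shows you.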
It's going to be difficult to do OCR on that, especially if we haven't trained on anything with a similar problem to, for example, this slide. Tesseract OCR works pretty well whenever you have a white background and good contrast with the text; it's going to pull that text out much more easily than with these complicated cases. Does that mean we shouldn't use Video to OCR? No, it's a very useful tool. Just the fact that it makes these screenshots and presents them for us is already extremely useful, because I can scan down really quickly and say, oh, here's where some text changed. Even if it's not detected, I can still see it and preview it. The problem is, if we're trying to OCR a document and then index it and search that text, in some cases there are definite limitations to what's going to be detected; for example, this white text on a light background wasn't detected. Okay, so this is a really useful tool, and I definitely recommend looking at it, but at the same time, think about what the limitations could be. And if you have specific problems that come up over and over again, like very complicated backgrounds, then you should probably start to train Tesseract OCR on your own datasets; that way you can detect these with much more accuracy. Okay, so that's it for today. Thank you very much.