I'm Yuichiro Tachibana, a Python developer involved in web development, machine learning, and computer vision applications, and sometimes the combination of all of them. So I will start this stopwatch timer. Thank you for being in this room, and welcome to my talk, Real-Time Browser-Ready Computer Vision Apps with Streamlit. As we know, Python is nowadays very widely used in many fields, including computer vision, image processing, and of course machine learning. However, in this talk our main focus is not model development or the theory around it; we are going to focus on the video input and video output layer on top of such models. This layer is important because it is what we have to consider when we want to create applications with computer vision or machine learning models, for example when we want to create demos or user interfaces on top of them. By the way, yesterday there was a great talk on a similar topic by a developer from Hugging Face, so please watch the recorded video when it becomes available for some reasons why creating demos of ML models is important. In this talk, I'm going to approach a similar topic from a different perspective and in a different situation, especially for video data. I think one of the most famous, popular, and widely used tools for creating such video demos is OpenCV, with its GUI and video I/O modules. Developers and researchers in the computer vision domain in particular have seen code snippets like this example many, many times, where cv2.VideoCapture is used to get an input real-time video stream from a camera device, and cv2.imshow displays output image frames continuously in a while loop, as an output real-time video stream, in a local desktop GUI window like this example. So now, a show of hands: who has used OpenCV for this kind of purpose?
So, to create some demos or user interfaces. Oh, great, thank you very much. Some of you already know about it, but please don't worry if you don't have experience with it; this talk is of course also for you. Anyway, we know that we can create such desktop GUI-based applications by using OpenCV or similar libraries, but this kind of approach has some limitations. For example, we cannot easily or simply share such applications with our users, because desktop GUI-based applications can run only in your local environment, where you have already installed the necessary packages and set up the necessary environment. So in this talk, I would like to introduce a new way to create shareable, easy-to-use, web-based user interfaces or web applications on top of computer vision or machine learning models, using a Python web framework, Streamlit. I think this new approach can replace the conventional way of creating GUIs with OpenCV, because it has some advantages and benefits. At this point, I'm going to show a demo app of what's possible with this new approach. Now I can use the cloud-based version of this demo app, thanks to the great Wi-Fi; thank you to the EuroPython tech team. Anyway, as you see, this is a web app, and I'm using it in the web front-end page. Oh, it stopped. Right. This app is now consuming the input real-time video stream from my local web camera, and on the backend server a machine learning or computer vision model is running, which in this example is object detection. It detects me as a person, and this object as a bottle: an object that I love, a souvenir I bought in Dublin. I love this town. Anyway, the object detection result is rendered as bounding boxes on the output video frames.
And this real-time output video stream is now being rendered on the web front end as a live video stream. The object detection result metadata is also displayed in table form, including class names and probabilities. This app also has interactive web UI components, like this slider, through which you can change model parameters like the object detection threshold interactively, even during model execution at runtime. You can see that the object detection result changes when I control the threshold with this slider. I think this feature is very convenient for users, because they can control application values or model parameters in a web user interface alongside a live video stream. As I have shown in this example, web apps have some advantages over conventional GUI applications like OpenCV-based ones. For example, web apps are easier to share and update, which means they can be shared with your users just by sending links or URLs, and users can try out these applications just by opening those links in their web browsers. And whenever you make changes or updates to an application, users can access and use the latest version at any time. Web apps can also be used on smartphones, because of course smartphones have web browsers. And as we saw in the previous example, web apps can have interactive, modern, cool-looking, user-friendly UI components that OpenCV-based applications do not have. Until now I have been explaining why this new approach of creating web-based applications is preferable. From now on, I would like to explain and demonstrate how we can create such applications, and how easy it is. To do so, first I'm going to introduce Streamlit. Who knows Streamlit, or has at least heard of it? Thank you, many of you already know about it. That's great. Streamlit is a Python web framework.
Its unique characteristic is that we can create web applications by writing only Python code; it does not require any front-end coding. It provides many predefined, ready-to-use UI components, and there are also many third-party components that we can make use of as building blocks to construct rich web applications with a small amount of code. In addition to Streamlit, we are also going to use the streamlit-webrtc package. It is an extension of Streamlit, or it can also be called a custom component of Streamlit, that enhances Streamlit to deal with real-time media streams through web browsers in Streamlit applications. With the combination of these packages, we can create various kinds of web-based computer vision or machine learning applications with a small amount of code, like these examples. Since Streamlit and streamlit-webrtc are only for UI construction and media stream I/O, we can use arbitrary ML or CV models at the application backend. From now on, I'd like to provide a step-by-step tutorial where we walk through the development and deployment process of a Streamlit application with real-time video streaming capability. The first thing we have to do is install the necessary packages, which of course include streamlit and streamlit-webrtc. In this tutorial we are also going to use the OpenCV Python package, but its headless version, because here we are going to use OpenCV only for its image filter functions. We do not need its GUI module, since we are going to use Streamlit for the purpose of creating GUIs; that's why I selected the headless version. Now, after installing these necessary packages, let's start coding from scratch with an empty file named streamlit_app.py. From this point, I'm going to demonstrate the coding here. In this editor, there is already an empty file opened, named streamlit_app.py.
So I'm going to write some code. For example, importing the streamlit package with the st alias, and calling the st.title function with an argument, for example "My first app". By the way, in the Streamlit world we call these functions components; so here, the st.title component is used. Similarly, I'm going to use the st.markdown component with a string argument that contains some Markdown content, like "Hello EuroPython 2022". Then I save it, move to the shell, and run the streamlit run command with an argument pointing to the input file named streamlit_app.py. With this command, the Streamlit server-side process spins up, and the web app is opened in a browser tab like this. You can see that the content of this web page is based on the source file I have written, and each element is properly decorated according to the component I used: this is the title component, and the next is the markdown component. After that, let's make some changes to the source file, like inserting emojis, for example, and save it. After saving, the Streamlit server-side process detects the file change and shows these two buttons. I'm going to click the right one, the "Always rerun" button. Now you can see that the front-end page is updated to be synchronized with the source file. From now on, whenever I make changes to the source file and save it, the front-end page is automatically hot-reloaded to stay synchronized with the source file. This makes for a convenient and quick development process. This is the basic development workflow of Streamlit applications, where all developers have to do is write Python code, and the streamlit command does all the rest, including serving and hot-reloading the web front-end page.
So now we have learned how developers can create Streamlit applications; let's move on to developing a computer vision application. I've cleaned up the previous example and will start writing new code, where I'm going to use the streamlit-webrtc package, import the webrtc_streamer component from that package, and simply use it here. The webrtc_streamer component requires a key argument as a unique identifier across this script, so please pass some arbitrary string value to this argument. Here, I simply pass the string literal "sample". Then I save it, and as I said, the web front-end page is automatically updated and a new component has appeared. Let's see what happens when I click the start button. You see that? All right. We have successfully embedded a new component that deals with real-time video streams into our web page, just by adding a single line of code. This is quick and easy. But as you see, this is a very basic version of the video streaming component that does not have any image filter, I mean any video effect, so this is a rather boring, trivial example. What we want to do next is add an image filter to this video stream. To do so, I'm going to define a callback function that accepts one input argument, frame. At this point, I will leave the implementation of this callback empty and make it simply return the input frame without any processing. I'm also going to pass this callback function object to the video_frame_callback keyword argument of the webrtc_streamer component, and save it. Then I will also import the av package. Please note here that the input argument and the return value of this callback are instances of the av.VideoFrame class, not NumPy arrays. This is an important point about this callback. But what is the av package in the first place?
The av package is importable here because it is already installed as a dependency of streamlit-webrtc. But what is av? The av package comes from the PyAV library, which is a Pythonic binding for FFmpeg. FFmpeg is software to manipulate not just images but media files and media streams, like video and audio, and the streamlit-webrtc package uses FFmpeg internally. That's why its wrapper library, PyAV, appears on the interface of this callback. What we have to do next is convert this frame variable into a NumPy array by using the to_ndarray method with the format keyword argument set to "bgr24", which represents a three-channel color image in BGR channel order with eight bits per channel, so 24 bits in total. I'm going to assign the return value of this method to a new variable, img. And here, let me show this page. Here, I'm going to create a new instance of the av.VideoFrame class to be returned from this callback function, by using the from_ndarray method that accepts an input NumPy array and, again, the format keyword argument set to "bgr24". So now we have obtained a variable img that is a NumPy array, and we are ready to implement an image filter inside this callback. In this tutorial, I'm going to use the cv2.Canny function, which accepts one input NumPy array and two parameters. At this point, I will just pass two fixed, ad-hoc values for these parameters. I won't explain the details of cv2.Canny, but in short, the Canny filter is a kind of edge extraction filter that is sometimes used in beginner computer vision courses, so I selected it as a sample image filter here.
After that, I also have to use cv2.cvtColor with the color code COLOR_GRAY2BGR, because the return value of cv2.Canny here is a single-channel grayscale image that has to be converted into a three-channel color image by cv2.cvtColor before it can be fed into from_ndarray. Anyway, now I have implemented a simple but fully functional image filter callback here, so let's see what happens with this new code. We have successfully injected the cv2.Canny image filter into the real-time video stream running on our web page, right? This is a quick and easy implementation, don't you think? But it would be better if users could control the underlying parameters, the ones that are now fixed values, from web front-end controllers. To do so, I'm going to use some Streamlit components again, so I import the streamlit package with the st alias and use the st.slider component. The first argument of this slider is the label, "threshold1"; the minimum value is 0, the maximum value is, for example, 1000, and the default value is, for example, 100. I assign the return value of this slider to the variable th1, do the same for "threshold2", and then pass these two variables as the Canny filter parameters; that is, I simply replace the fixed values with these variables. Then I save it. After I save it, as I said, the front-end page is automatically hot-reloaded, and two new sliders have appeared on our web page. Now users can change the parameters of the Canny filter with these sliders interactively. You can see that the Canny filter output changes as I control the parameters with the slider. This is a very interactive flow of development, and also of usage.
In addition to that, what's interesting, surprising, and exciting here is that we could implement a fully functional web-based computer vision application, with real-time video streaming capability and interactive input UI widgets, with only approximately 10 or 20 lines of code. So this is easy, and I think it would be a great deal to switch from the conventional way of creating GUIs with OpenCV to this new Streamlit-based approach, because it does not require additional effort or additional steps, yet provides more advantages and benefits. OK, I'm going back to the slides here. Please note again that you can use any model at the application backend, which means you can put any code inside this callback. I mean, you can replace this simple Canny filter with any model you like, no matter how simple or how complicated it is. So you have the freedom and flexibility to create any kind of web-based computer vision or machine learning application, of course including the examples on this slide: pose estimation with MediaPipe, style transfer, object detection with deep neural networks, or anything else you like. Right, now we have developed the application in our local environment, so what's next is to deploy that application to a cloud environment. Although there are various cloud services where a Python runtime is available, for Streamlit applications especially, Streamlit Cloud is the way to go. As its name implies, Streamlit Cloud is a managed cloud service for deploying Streamlit applications, provided by the official Streamlit team. The deployment process to Streamlit Cloud is very quick and easy, so let's see it.
First, please note that I added a requirements.txt that lists the necessary packages to be installed during the boot process in the cloud environment, although this list does not contain the streamlit package itself, which is installed automatically in the Streamlit Cloud environment by default. Anyway, after adding these necessary packages, I add these files to the Git working tree, create a commit with them, and push that commit to the remote GitHub repository. Then I navigate to the Streamlit Cloud dashboard, click the "New app" button here, select the GitHub repository name here, and click the "Deploy" button here. After a couple of minutes of the boot process, Streamlit Cloud starts serving the application that we developed and pushed, from the cloud environment, with a globally unique URL, so that users can now access and use this application from anywhere with only their web browsers. You can see that our application is now being served from the cloud server and is accessible and usable with just a web browser. Right, now that I have shown this quick deployment demonstration to Streamlit Cloud, please note that there are actually some additional things you have to take care of when you deploy Streamlit applications that use the streamlit-webrtc package to a cloud environment, including Streamlit Cloud. I'm sorry that I cannot explain all of it in this talk because I do not have enough time, so please read the official README of the streamlit-webrtc package, in particular the section that explains remote host deployment. It does not require many additional steps, but there are some necessary things. Right, now we have reached the last two slides.
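For reference, given the packages installed at the start of the tutorial, the requirements.txt would contain something like the following; streamlit itself is omitted because Streamlit Cloud installs it by default:

```
opencv-python-headless
streamlit-webrtc
```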
So my takeaway message from this talk is: please enjoy creating real-time media streaming applications that can be used in web browsers, which means applications that can be easily shared with your friends or users worldwide, and it does not require much effort or much code, as we have seen in the previous demo. And, all right, you can find my username, whitphx, on GitHub, some social networks, and the official Streamlit online forum. Please follow me or contact me with any comments, questions, discussions, feature requests, or anything else; I'm open to all of such things. And, importantly, please put a star on the repository of the streamlit-webrtc package; as its author, I'd be so glad if the number of stars grows. Thank you very much for listening to my talk. Any questions?

First of all, very good presentation. Thank you. I'm just wondering how well the whole thing scales with multiple clients connecting to the same URL?

That's a great point. As you imagine, the machine learning model, I mean the computationally expensive model, is running in a single instance of the server, so this is not scalable. I think it's better to consider this new approach for prototypes and demos, or for production applications that only target a small number of users at a time. So it depends on the server capacity.

Okay, makes sense. Thanks.

Thank you. Hello. Thank you very much for the talk. I actually had the same question, but I also have a second one: can you supply a video file to this as a demo as well, not only live video?

Oh, thank you very much. That's what I forgot to mention. Thank you very much. You can find all of the code in my GitHub repository.
The source code of the demo that I showed in this talk can be found in a repository named something like EuroPython streamlit-webrtc; that repository contains the code I wrote in this talk. I have also shown some demos, for example object detection and style transfer, which are also linked from the official streamlit-webrtc repository. You can find the links to all the examples I demonstrated in this talk, so please check them out. Thank you. There is still some time, so let's see some more demos. I'm not sure it will work. Oh, sorry, I will access it from a private tab because of some cookie issues. This is a demo that I haven't shown in this talk: a real-time speech-to-text app. Audio data is not the main topic of this talk, but please note that streamlit-webrtc can deal with not only video streams but also audio streams. It will probably work; it's now loading. "Hello, hello, hello, Europe." Sorry, I think this is due to my non-native English; the audio model is trained on native English speakers, so I'm sorry. But the interesting thing about this demo is that the server-side process does not depend on any external API, like Google Speech or something like that. The server-side process hosts the speech-to-text model in its own memory, and the model itself is provided by Mozilla's great project, DeepSpeech. You can create such self-contained applications by using these kinds of technologies, right? Please don't mind the transcribed text. Actually, I'm not a professional in the audio field, so I will probably go and ask someone familiar with audio technologies, like the Spotify folks who presented a talk yesterday. Anyway, let's see the other examples. This is the style transfer that I showed in the earlier screenshot.
I hope it will work well. Anyway, there is one minute left, so I thought it would be a good time to show one more demo. Sorry, it doesn't work; I hadn't checked it before this session. Okay, I'm sorry for failing this last demo in this talk session, but I hope you feel some of the potential of this technology.