Hello, everyone. Welcome to my talk on inference on the edge with KubeEdge. Thanks a lot for joining this session.

First of all, who am I and why am I interested in this? My name is Adrian. I'm a machine learning engineer at Seldon. I joined Seldon around a year ago, and my background before that is mostly computer engineering and software. In between, I took a master's in machine learning, which let me learn all of these things I just didn't know about before. You may also be wondering what Seldon is. Seldon is a company that tries to bridge the gap between training models and productionizing them. We are very focused on open source: we collaborate heavily with several open source projects like Kubeflow, and particularly KFServing within Kubeflow, and Seldon Core and a couple of other libraries we have built are also open source. We are a very small team and we are hiring, so if you're interested in these topics, feel free to reach out to me.

What are we going to see today? First we'll look at an overview of edge computing: why it's interesting, why it may be interesting for machine learning inference, what's blocking us right now from taking that step, and how projects like KubeEdge help us move in that direction. We'll also look at how KubeEdge works and how it helps us. Then we'll see how Seldon, on top of KubeEdge, lets you completely bridge the gap and run inference on edge devices. At the end, we'll see a quick demo that brings all of these pieces together. With that, let's get started.

So first of all, why is edge computing interesting for machine learning, and in particular for machine learning inference? What happens after we train our models? Well, we have to consider that inference sometimes requires specialized hardware. This could be specialized chips or processors, like the intelligence processing units (IPUs) built by Graphcore, or GPUs, TPUs, et cetera. Sometimes we also need access to sensors: think of any kind of face recognition example, or of humidity sensors in IoT scenarios on farms, in an agricultural setting. Sometimes we also need access to actuators, and by actuators I mean anything that acts on the world: engines like servo motors, or even lights, LEDs, things like that.

So how is this specialized hardware usually used today, before we bring in the inference-on-the-edge component? The usual loop goes something like this. We've got some sort of controller, which could be any kind of processor: a Raspberry Pi, or something smaller like an Arduino. This controller senses data from the world; essentially, it uses sensors to learn something about its environment. When inference is not performed on the device, this data usually goes to a cloud, and by cloud here I mean any kind of cluster; it could be an on-prem cluster or a remote data center. The controller sends the data there, and inference is performed in that remote service.
The result of the inference then goes back to the edge environment, to our controller device, and based on that prediction it acts on the world through actuators: again, LED lights, servo motors, anything like that. And then the loop starts again. So we have this loop that detours through the cloud and back.

Now, what are the problems with this approach? We have to consider that this data center is usually quite remote; it's far away from where we are. If you think of a farm in an agricultural setting, the data center is usually going to be far away, which means any access to the cloud to perform inference carries a high latency penalty. And not just high latency: it's not unusual for these devices to have poor connectivity. Poor connectivity means not only that sometimes they may just not be able to reach the cloud, but also that each of these requests probably costs us money, because they're likely going over some kind of mobile network. And last but not least, we need to think about all the privacy concerns that arise here. Every time our edge device sends a request to the cloud to perform inference, we're sending private data, and we need to think about what type of private data we're talking about. In a machine learning context this is often face images for face recognition, voice samples for AI assistants, health readings like electrocardiograms: very, very personal data. Once that data moves out to the cloud, the user effectively loses control over it. We don't know what happens with that data afterwards: whether it gets stored, whether it gets analyzed, anything.

So why do we do it this way? Why do we send it to the cloud at all? In theory, we've got everything we need on the edge side: we've got the hardware, obviously, and we've got the data, because the data comes from the sensors. So why go to the cloud for inference at all? What's stopping us? Well, one could argue that inference workloads can be very resource-heavy. However, there are advances in that direction. On one hand, the most mainstream machine learning frameworks are working on it: they're releasing lightweight runtimes like TensorFlow Lite to run inference on low-resource and IoT devices. On the other, we have the specialized processors, IPUs, TPUs, all kinds of PUs, that help us process this load faster.

Another argument against processing all of this on the edge is that the edge environment is notoriously hard to manage. Why is it hard to manage? Well, there's a huge range of edge devices: Raspberry Pis, Arduinos, NVIDIA Jetsons, et cetera. Even within each of these families there are tens of devices, tens of different models, each with its own architecture, its own system, and so on. And in an edge context we usually have a large number of devices.
If you think of a factory, for example, it's going to have hundreds or thousands of edge devices, and again, often with poor connectivity. It gets even worse when you consider that each of them may be connected to hundreds of sensors and actuators, each with its own architecture, its own models, its own way of accessing the data, and so on. And that's just the beginning. How can we add new nodes? How can we check the health of our edge devices? How can we serve new versions of our trained models? How can we support multiple ML frameworks? Bringing in machine learning adds a whole new axis of complexity. And how do we scale all of this? Let's say you want to serve your model across two, three, four devices: how do you manage that efficiently?

Well, it turns out that many of these problems have already been solved in the cloud. In the cloud we've got Kubernetes, which was pretty much built with this goal in mind: orchestrating, managing, and scaling workloads. So why don't we change the run-inference-on-the-edge setup to open up access to the cloud, but just for management? No sending of any personal data, no offloading of any inference to the cloud. This would be great because, through Kubernetes, we would get access to a lot of declarative APIs that help us manage our resources. Resources in the Kubernetes world are essentially abstractions over our services, our nodes, our pods, et cetera: things that let us manage our workloads. Not just that, Kubernetes also lets you define custom resources, which are essentially abstractions that encode architectural patterns of your own. It also has advanced scaling and advanced ingress built in, and it gives us access to a wide range of goodies and tools that help us manage our workloads: things like Seldon Core to deploy machine learning models, or tooling like Helm and Argo CD to manage our resources in Kubernetes, et cetera.

OK, sounds great. How can we do that? As of today, I would say these are the main approaches to Kubernetes on the edge. On one hand we've got KubeEdge, which we're going to talk about in a moment. On the other hand we've got k3s and Fleet, which are built by a company called Rancher, and they take a different approach. Whereas KubeEdge, as we will see, lets you attach edge devices to one large Kubernetes cluster, k3s and Fleet take the opposite approach: k3s lets you run very small Kubernetes clusters on the edge devices themselves, and Fleet lets you manage all of them. They're just different approaches. Today, we're going to focus on KubeEdge.

Now, what is KubeEdge? I've talked a bit about it already, but what is it? KubeEdge is an open source project initially started by Huawei, which is now an incubating project within the CNCF, the Cloud Native Computing Foundation. In their own words, KubeEdge is an open source system extending native containerized application orchestration and device management to hosts at the edge. We're going to dive deeper into this to see what it means. Before doing that, just as a disclaimer: I'm not an expert in KubeEdge.
I say that because I may get some things wrong about KubeEdge. I've approached it as a user, so this is how I have understood that it works; apologies in advance to the KubeEdge team for any mistakes.

KubeEdge essentially allows us to manage our edge devices as regular Kubernetes nodes. Going back to what I said earlier, in Kubernetes, nodes are just an abstraction: they're just places where Kubernetes is going to put load, where it's going to run services, processes, et cetera. Kubernetes doesn't particularly care what these nodes are. So once we bring in these edge devices, Kubernetes just knows that it has more nodes; it doesn't need to know anything else. And once it has nodes, Kubernetes will schedule workloads on them. Because of this transparency, these abstraction concepts in Kubernetes, we get, in theory, almost full access to all of the Kubernetes goodies: if you can run any kind of pod, service, or deployment, any kind of Kubernetes primitive, then you're golden. Just as an example of what that means: here we can see the nodes in our cluster, and we can see that a new node called raspberry, because we're running our edge node on a Raspberry Pi, has been attached to our cluster in the cloud. My cloud here is my laptop, but it could be an actual cloud. And we can see that it has a new role, edge.

Now, how does this work? Well, KubeEdge has the concept of a cloud side and an edge side. The cloud side is managed by a component called CloudCore. CloudCore is responsible for talking to the edge devices to tell them what to do, and also for syncing back to Kubernetes any updates from the edge, like a new node being added. CloudCore communicates with the edge side through ports 10000 and 10002 to set up new devices and to sync any kind of status change. On the edge side, what we've got is EdgeCore. EdgeCore runs on each of our edge devices and syncs with CloudCore. As an example, let's say you deploy a pod, like the one we have here, onto an edge device. What happens under the hood is that Kubernetes creates a pod that is scheduled onto the edge node. CloudCore is watching over pods and detects this, and it tells EdgeCore: I've got this pod with this configuration, this image, et cetera. Because Kubernetes is a declarative API, it can do this relatively easily. EdgeCore receives that, and then internally, within the edge device, it spins up the necessary Docker containers to run that workload. Between them, CloudCore and EdgeCore talk using WebSockets.

It's important to mention that, because of this architecture, EdgeCore needs to be installed on each one of our edge devices. That is the management cost we pay to use KubeEdge: you need to make sure that all of your edge devices have EdgeCore installed and can talk to the cloud cluster, but only for two actions. One is the initial setup, to let the cloud know there's a new edge node; the other is any kind of syncing. And just to clarify: if, let's say, you lose connectivity on your edge device, the workloads there are still going to keep running.
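To make that concrete, here's a minimal sketch of what scheduling a workload onto the edge node could look like: a plain Kubernetes pod pinned to the node we saw earlier. The node name comes from the demo; the image name and the hostname-label pinning are illustrative assumptions.

```yaml
# A plain pod, pinned to the KubeEdge node named "raspberry".
apiVersion: v1
kind: Pod
metadata:
  name: edge-example
spec:
  nodeSelector:
    kubernetes.io/hostname: raspberry   # assumption: pinning via the node's hostname label
  containers:
    - name: app
      image: example/edge-app:arm       # illustrative ARM-compatible image
```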
The problem, when that happens, is that the cloud won't know anything about the health of the different processes running on that edge device.

What else does KubeEdge bring to the table? Remember what I said earlier about how Kubernetes lets you define your own abstractions to encode your own architectural patterns; these are called custom resource definitions. KubeEdge installs a couple of CRDs: one of them is Device, the other is DeviceModel. Here, DeviceModel represents an archetype of a sensor or actuator that you can interact with, and Device is a particular instantiation of that sensor or actuator. For example, we could have a DeviceModel custom resource that specifies a particular type of proximity sensor that we're going to use in our project, and then two instantiations of it: one for the front sensor and one for the back sensor. This gives you a unified interface to access these sensors and actuators, and also to check their health and everything else.

However, you could ask: how can it provide something so generic? Well, it also provides the concept of controllers. Controllers in this case are custom logic, custom code that interacts with a particular sensor, and you're able to define your own. Going back to the previous example: let's say you've got a proximity sensor. You would have a sensor controller, which you have full control over because it's your code, and the controller would read from the sensor. Based on this reading, it would update the state of the custom resource, which again is the instantiation of that particular sensor. And a pod using these sensors could just read that state.

Now, what happens with actuators? This is great for reading, but what if we want to send actions? Well, all the internal traffic between devices within the edge device goes over an MQTT queue; MQTT is a protocol for managing events, usually on low-resource devices at large scale. So let's say the pod wanted to send an action. The pod could just publish an event to the internal MQTT queue (each edge device has its own), and the relevant controller could then pick up that event and act on a device instance, that is, on a particular actuator.

This is great, and it works pretty much out of the box. However, I did run into some quirks with KubeEdge that I thought would be worth sharing in case you hit them too. One is about the Kubernetes logs. Kubernetes has logging facilities out of the box; however, by default, the logs generated on the edge device don't get surfaced back upstream to the cloud cluster. So you lose that nice thing in Kubernetes where you can go into any cluster and easily see the logs of each pod as they're generated. There is a workaround; I didn't dive much into it, but I know it exists. So it's a quirk, not a blocker; just something to keep in mind. Something similar also applies to metrics.
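Going back to the actuator path for a second: a minimal sketch of what publishing an action from a pod onto the edge device's local MQTT broker might look like, using the paho-mqtt client. The broker address, topic name, and payload here are assumptions for illustration; KubeEdge's actual topic conventions should be checked against its documentation.

```python
# Publish an "action" event to the edge device's local MQTT broker.
# Broker address, topic, and payload shape are illustrative assumptions.
import json

import paho.mqtt.client as mqtt

client = mqtt.Client()
client.connect("127.0.0.1", 1883)  # the broker runs locally on the edge device

payload = json.dumps({"actuator": "front-led", "action": "on"})
client.publish("devices/front-led/commands", payload)
client.disconnect()
```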
Now, something else I haven't mentioned is that all the traffic inside the edge device, and by traffic here I mean HTTP traffic between services, et cetera, is managed by a component they developed called EdgeMesh. EdgeMesh is sort of a DNS service or proxy. And I found some issues with it: essentially, I wasn't able to reach any Kubernetes service from within the edge device. I do think there's a possibility this just wasn't working because I'm using a local cluster, kind, which is a Kubernetes distribution meant to run locally, and because it runs locally it has some quirks of its own in how it manages networking. So it could be because of that; it's not necessarily KubeEdge's fault, it could be kind's.

Also, something important to mention, which is probably obvious to anyone with experience in edge environments: you need to keep in mind at all times that edge devices usually will not have the same architecture as your laptop. Your laptop will be x86; edge devices will usually be ARM. You could think, as I naively did at first: that's fine, I've got Docker, I don't care about that. As I realized pretty soon, that's not the case. You need to build your Docker images from a base image compatible with ARM, and build them with that architecture in mind. You can do that locally on your laptop, that's fine, but you need to do it explicitly. Something else I thought was: OK, I've got Docker images for ARM; the rest of my code is Python, and Python just runs on an interpreter, so I don't need to change anything. Wrong again. In Python in particular there are a lot of dependencies that link back to native components, mainly for efficiency reasons, to be faster. One that I found was linking to a Rust binary, and when I tried to build that binary for ARM, it turned out to be impossible: the library was unmaintained and it just didn't work on ARM, so I couldn't use that component. Something important to keep in mind.

Now, we're going to see a very quick demo of how we can deploy a simple workload onto an edge device using KubeEdge. I'm going to switch here to my Jupyter notebook. What we're going to do is add a green LED to our Raspberry Pi and make it blink; something very simple. Some prerequisites: you need a Kubernetes cluster in place (I've got a local cluster for that), you need KubeEdge installed, and you need an edge device already hooked up to KubeEdge. We've added instructions for all of this in the repo for the talk; you can check them out.

So, the first step is implementing our code, which is pretty simple. We're going to use a library called gpiozero, which lets us interact with the GPIO pins of the Raspberry Pi; essentially you can hook up any jumper wire and control the I/O. Here it's very simple: we just say that on pin number 17 there's an LED, and we're going to turn it on and off every second, forever.
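The script just described might look like this; a minimal sketch, with the pin number from the talk:

```python
# blink.py - toggle an LED on GPIO pin 17 once per second, forever.
from time import sleep

from gpiozero import LED

led = LED(17)  # the pin the LED's jumper wire is hooked up to
while True:
    led.on()
    sleep(1)
    led.off()
    sleep(1)
```

And, anticipating the pod manifest we'll walk through in a second, here's a sketch of what it could look like, mounting the Pi's /dev/gpiomem device with privileged access. The image name is illustrative; the image itself would be built for ARM, e.g. with docker build's `--platform` flag:

```yaml
# A pod running the blink script on the edge node, with access to the GPIO device.
apiVersion: v1
kind: Pod
metadata:
  name: led-example
spec:
  containers:
    - name: blink
      image: example/led-example:arm    # illustrative ARM image built from the Dockerfile
      securityContext:
        privileged: true                # required to access the GPIO device
      volumeMounts:
        - name: gpiomem
          mountPath: /dev/gpiomem
  volumes:
    - name: gpiomem
      hostPath:
        path: /dev/gpiomem              # the Pi's interface to the GPIO board
```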
The next step, as I was mentioning earlier, is to build a Docker image that runs this. For that, we just extend from the official ARM Python image, copy our script, install some requirements, which are essentially just that one library, and then call our very simple script. We build that, and here I just want to point out that we specify the platform explicitly.

The next thing we need is the Kubernetes pod resource that is going to deploy this workload, and this is something very simple. We just say that this is a pod, which is essentially the way of running a process in Kubernetes; it has this name, led-example, and it's going to use this image. Also worth mentioning: we need to mount a file that acts, on the Raspberry Pi, as the interface to the GPIO board. This file is located at /dev/gpiomem on the Raspberry Pi itself, and we're just going to mount it into our container. We also need to set a security context: we need privileged access to use this device.

Once we've got that, we deploy it. Switching here to our terminal: this is k9s, which shows the state of our Kubernetes cluster. Here we can see that the raspberry node has been added as a node to our Kubernetes cluster, and if we look at the pods, we can see that we have now deployed a pod. And if I switch my camera... let's see if I can do it... OK, that's fine; this example is very tiny, so it's not important, I'll switch it later for the main demo. Cool. So we can see that our pod is running on our raspberry node, and the LED is blinking; just believe me on that. We can now remove it: we can just delete the pod as we would in any Kubernetes cluster, and we can see that the pod is terminating.

If we now go back to the slides: the next step is to actually deploy an inference workload in our cluster, and for that we bring Seldon Core to the table. Now, in general, deploying models is pretty hard. You've got a lot of machine learning frameworks, each with its own quirks, and it's not just about deploying a model: it's about how to serve it, how to analyze it, how to monitor it. It has concerns of its own that you usually wouldn't find in a regular web server. In fact, there's a famous picture from a Google paper about technical debt in machine learning systems: the machine learning code is this very small piece in the middle, and everything else in the pipeline is about managing configuration, collecting data, processing data, extracting features. Once you have that, you can train your model, but then you need to think about how to monitor that model, how to serve it, how to scale it, et cetera. Seldon Core is essentially a toolkit that bridges the gap at that end: what happens after you've trained your model, how you serve it, how you monitor it in production. Seldon Core is an open source project that Seldon created a while back. In our own words, it's an MLOps framework to package, deploy, monitor, and manage thousands of production machine learning models. What does that mean? How does that work?
Essentially, another way of viewing that concept is that it lets you go very easily from a set of model binaries, the snapshots of your trained model, to a set of Kubernetes resources that expose your model. Sometimes this is as simple as exposing a container with your model. Sometimes it's a bit more complex and requires an inference graph, which can have intermediate steps like processing your input or processing your output. This is relevant, for example, in an NLP setting where text comes in but the model doesn't work with text, it works with numbers; so you first need to tokenize that text, and that can be done in a preprocessing step. You can also have more advanced things like multi-armed bandit routers, which decide which model is behaving better by applying reinforcement learning principles, et cetera.

Among the things it brings to the table are a set of pre-built inference servers for a set of machine learning frameworks, like scikit-learn, TensorFlow, et cetera, plus the ability to write custom ones if you want to. It also covers the things we usually don't think about, like how you run A/B tests and shadow deployments; this is particularly important in machine learning because it turns out to be very hard to compare two versions of the same model and know which one is behaving better. It also integrates with Alibi, another open source library by Seldon, which focuses on explainers that help you explain your predictions, and on outlier detectors that flag data outside of your training distribution, i.e. data your model hasn't seen before, plus other integrations for monitoring, logging, et cetera. As I mentioned, this is built on top of Kubernetes; it's cloud native, so you should be able to run it on any major cloud provider's Kubernetes clusters and also on on-prem Kubernetes clusters, including OpenShift, which is in the end a Kubernetes distribution.

Now, how does it work? Seldon Core implements a new CRD, a new Kubernetes abstraction, which lets you create a custom resource that specifies your model's configuration. That's what you can see here on the left-hand side: we create a custom resource of kind SeldonDeployment, with a name, example-model, and a predictor. This predictor is built out of the inference graph that you can see on the right-hand side, where the different components are defined. The root is going to be a transformer, which just processes and transforms the input, named InputTransformer; it has two children, a model named MyModel and another model named Classifier. It's worth noting that because we know Classifier is going to be a scikit-learn model, it's enough to say that its implementation is the SKLearn server, linking back to the pre-built inference servers, and where its weights are stored. At the same time, even though it gives you this ease of use, it also gives you enough power to say: for MyModel, I don't want any fancy inference server stuff, I just want to run this image, and that's fine; it lets you do that as well.
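As a rough sketch of the kind of resource being described, mirroring the graph from the slide (field names follow the SeldonDeployment CRD as I understand it; the bucket URI and image name are illustrative):

```yaml
apiVersion: machinelearning.seldon.io/v1
kind: SeldonDeployment
metadata:
  name: example-model
spec:
  predictors:
    - name: default
      graph:
        name: input-transformer       # root: pre-processes the input
        type: TRANSFORMER
        children:
          - name: my-model            # custom image, defined in componentSpecs below
            type: MODEL
          - name: classifier          # pre-built scikit-learn inference server
            implementation: SKLEARN_SERVER
            modelUri: gs://example-bucket/classifier   # where the weights are stored
      componentSpecs:
        - spec:
            containers:
              - name: my-model
                image: example/my-model:latest         # the "just run this image" case
```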
Now we can see what happens when we apply this. If we looked into it, we would see that it creates a pod with a bunch of containers. On one hand, there's a container injected into the pod by Seldon, which we refer to as the init container, or storage initializer. This one has the logic to download the model weights you specified in your custom resource, in modelUri for example, and make them available to your model container; it does this using Kubernetes volumes, so that there's some temporary storage your model container can access. Besides that, it creates one container inside the same pod for each step of your inference graph, and it also injects a second, sidecar container, which we call the orchestrator, which receives the input request from the user and decides how it should flow along the inference graph.

The pre-packaged inference servers are essentially a set of Docker images included in Seldon that know how to load a model for a particular framework. So, for example, if you say: I've got a model in a file called model.joblib in Google Cloud Storage, or somewhere else, you can just instantiate a SeldonDeployment custom resource saying it will use the scikit-learn pre-packaged server, and that's it: you're good to go, Seldon will take care of it and deploy your model. The same applies if, let's say, you've now got an XGBoost model: you have your weights stored somewhere, and you just need to point Seldon to them and tell it this model should be deployed using the XGBoost inference server. Out of the box we have support for frameworks like MLflow, which gives you an abstraction layer over your training so that it works with multiple frameworks, plus TensorFlow and a few more. And you still have the ability to define a custom inference server, which is something we'll do in the demo.

Other things Seldon Core brings to the table are integrations with Prometheus for monitoring and with Grafana to visualize that monitoring. Extending this even further, it also lets you track more complex, more advanced machine learning metrics. Because these are usually very heavy to compute, we leverage KNative, which allows us to build an asynchronous pipeline in Kubernetes that does things like outlier detection, detecting input examples that may lie outside your training distribution; drift detection, checking whether the probability distribution of your inference data is shifting away from your training set; more complex adversarial detection; or any kind of custom metrics, which could include things like accuracy, et cetera. For the first two, there's a set of algorithms implemented in the Alibi Detect library, which is also open source. I highly encourage you to check it out; it's definitely a very interesting library that you can use even outside of Seldon. It essentially implements algorithms for these detection settings, for monitoring in machine learning.
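As a small taste of the library, a minimal sketch of standalone drift detection with Alibi Detect, outside of Seldon; the data here is synthetic and purely illustrative:

```python
# Drift detection with alibi-detect's Kolmogorov-Smirnov detector.
import numpy as np
from alibi_detect.cd import KSDrift

X_ref = np.random.randn(500, 4)           # stand-in for the training data
detector = KSDrift(X_ref, p_val=0.05)     # per-feature KS test against the reference set

X_new = np.random.randn(100, 4) + 1.0     # stand-in for (shifted) inference data
preds = detector.predict(X_new)
print(preds["data"]["is_drift"])          # 1 when drift is detected
```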
Now, leveraging the same KNative integration and asynchronous pipelines, Seldon Core also gives you an integration with Elasticsearch to keep track of your inference data, and it links with a second library called Alibi Explain, which essentially allows you to explain the predictions your model is making. As you might have imagined, all of these things can be configured through the same SeldonDeployment custom resource. So this abstraction doesn't just give you the ability to deploy models, but also to configure all of these things that are usually day-two afterthoughts but are important in a production machine learning setting.

Another thing it allows you to do is deploy models in a comparison setting. As I mentioned earlier, in machine learning it's usually very hard to say whether one model is behaving better than another, partly because models usually have a stochastic component, and partly because the inference data can be very different from what we had in the training set. A model may be better on the training set or on the test set, but you never know if it's going to be better on the inference data unless you try it in production. Because of this, Seldon Core has built-in support for advanced deployment patterns, for example A/B tests or shadow deployments.

Here we can see how that's done, again through the same custom resource, the same abstraction. We would have a SeldonDeployment resource, in this case named wines-classifier because it comes from a different example, one that predicts the quality of wines. Here we're going to have two predictors, where before we had one, and both of them are going to be MLflow servers; these are models trained with MLflow. The first one we point to a URI, a folder called model-a in Google Cloud Storage, tell it to use the MLflow inference server, and send 50% of the traffic to it. The second one we point to a second folder in our remote storage, model-b, say again that the implementation is the MLflow server, and send it the other 50% of the traffic. You can check out this example on this link, by the way.
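A sketch of what that two-predictor resource looks like; the bucket paths are placeholders:

```yaml
apiVersion: machinelearning.seldon.io/v1
kind: SeldonDeployment
metadata:
  name: wines-classifier
spec:
  predictors:
    - name: model-a
      traffic: 50                               # half the requests go to model A
      graph:
        name: wines-classifier
        implementation: MLFLOW_SERVER
        modelUri: gs://example-bucket/model-a   # placeholder URI
    - name: model-b
      traffic: 50                               # the other half go to model B
      graph:
        name: wines-classifier
        implementation: MLFLOW_SERVER
        modelUri: gs://example-bucket/model-b   # placeholder URI
```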
Similar to this, you can also have shadow deployments, which are essentially a way of having these same two models but sending all the traffic to both, without returning the response from one of them; that model is just deployed "in the shadow".

So with this, we've seen what Seldon Core is, and we've seen how KubeEdge allows us to run Kubernetes workloads on edge devices. What remains to be seen is whether we can combine both to actually run inference workloads on edge devices, and that's what we're going to see now in the demo.

Before going into it, I just want to highlight what we're going to do: we're going to build a face mask detector. Going back to the inference IoT loop we saw earlier, we're going to have a Raspberry Pi doing all the work. The Raspberry Pi is going to take a snapshot from a camera attached to it, a Raspberry Pi camera, so this camera is going to be our sensor. It's going to run inference inside the device, using Seldon Core to run a model that checks whether we've got a face mask on or not, and based on the output of this inference it's going to turn red and green LEDs on or off. Essentially, if it doesn't detect anyone, it turns both LEDs off; if it detects someone with a mask, it turns on the green LED; and if it detects someone without a mask, it turns on the red LED. Kind of a traffic light without the yellow.

We're going to deploy these workloads onto the Raspberry Pi using KubeEdge. On the other side, the cloud side, we're going to have a Kubernetes cluster, in this case on my laptop, but it could be in the cloud, which is going to run KubeEdge and do the management. We just interact with this Kubernetes cluster, and KubeEdge is what schedules these workloads onto the Raspberry Pi. Because of the low resources of the Raspberry Pi, we're going to use TensorFlow Lite to run this inference workload. Now, as you may have noticed, I haven't mentioned that Seldon Core supports TFLite; so we're going to build an inference server for TensorFlow Lite ourselves.

Another way to view the example is to look at what workloads we're going to run on our edge device. First, we're going to deploy onto our edge device, through KubeEdge, an instance of our SeldonDeployment custom resource, which we'll call face-mask-detector, which points to our custom TensorFlow Lite inference server and loads its weights from a repo. This is actually a model someone else trained, and I need to give a massive shout-out here: a company called AIZOO Tech has open-sourced a repo with face mask detector models, and we're just going to use one of them. With this, we can run the model on our edge device. The second workload is a second pod that essentially implements the IoT loop: it reads from the camera, sends that data to our model, which is also deployed on the edge device, gets the prediction back, and based on that prediction turns a set of LEDs on or off. Again, to re-emphasize: all of this workload, all of the inference, happens on the edge device, which was the goal from the beginning.

Before going into the demo: this model probably hasn't been calibrated, so it may contain biases, and it's definitely not safe to use without further assessment. I say this because ML is super powerful, but we always have to remember that with great power comes great responsibility, and it's our responsibility to use machine learning responsibly and ethically. In general, we need to think about standards and principles to follow. This is quite a complex topic that we're not going to get into in this session, but it's something worth keeping in mind.

Without further ado, let's go to the demo. I'm going to switch back to my Jupyter notebook, and we can see here the schematic for the example. It's very simple, just an extension of the previous one: we attach a camera, and we add a new LED, a red one.
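Jumping ahead a little, the IoT loop just described could look roughly like the sketch below. The pin numbers, resolution, endpoint, and output decoding are all assumptions for illustration; in particular, the Seldon endpoint and the shape of the model's output should be checked against the real deployment.

```python
# camera_reader.py - sketch of the loop: capture frame -> run inference -> set LEDs.
import numpy as np
import requests
from gpiozero import LED
from picamera import PiCamera

green, red = LED(17), LED(27)      # illustrative pin numbers
camera = PiCamera(resolution=(128, 128))

# Assumption: the model is reachable in the same pod via the Seldon protocol.
ENDPOINT = "http://localhost:9000/api/v1.0/predictions"

while True:
    # Capture an RGB frame straight into a numpy buffer.
    frame = np.empty((128, 128, 3), dtype=np.uint8)
    camera.capture(frame, format="rgb")

    payload = {"data": {"ndarray": frame[np.newaxis].tolist()}}
    pred = requests.post(ENDPOINT, json=payload).json()

    # Assumption: the output decodes to per-class confidences [mask, no_mask].
    scores = np.array(pred["data"]["ndarray"]).ravel()
    if scores.max() < 0.5:
        green.off(); red.off()     # nobody detected with enough confidence
    elif scores.argmax() == 0:
        green.on(); red.off()      # mask detected
    else:
        green.off(); red.on()      # no mask detected
```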
Now, the prerequisites are the same; the only difference is that we also need Seldon Core installed in our cluster, and for that you can follow the Seldon Core documentation. Here we're just going to assume it's installed already.

The first step, as we were saying earlier, is to develop an inference server with TensorFlow Lite. How do we do this? It's fairly simple: we just need to extend the SeldonComponent interface of Seldon Core. Here we mainly want to override two key methods: one of them is load, and the second one is predict. If you look at load, it's responsible for taking a set of weights, which will have been downloaded by the storage initializer container, loading our model, and setting up any related logic. So here we just load a TFLite model, and we also read from the model the input index, where we need to put the data, and the output index, the output tensor we're going to read the data from. Besides this, we need to implement the predict method, which as you can see is fairly straightforward: first we set our data on the input tensor, then we invoke the model, and then we just get the output out of it. This is how TFLite works.

The next step is to containerize our model. For that, we just extend our base image and install a couple of system dependencies that are needed by some of our Python deps, and that's it; the command is then just spinning up the Seldon Core microservice with our new model. We build our image, again building it in an ARM-compatible manner, and then we add it to Seldon Core's configuration. This step isn't strictly necessary, there are ways to just run the image directly, but it's a bit cleaner, which is why I decided to include it in the demo. If we assume Seldon Core was installed using Helm, we can just update the installation's configuration parameters through Helm as well: essentially, we say there's a new predictor server, which is going to be called TFLITE_SERVER, and you can find it here.

One more thing: as I mentioned earlier, there are a couple of injected containers, and one of them, the storage initializer, downloads the model weights. That container is not compatible with ARM out of the box, so we need to build a new, ARM-compatible one. I've done that already in the background; you can find it here. Again, to configure Seldon to use it, it's enough to say that the storage initializer should use this image, modifying the Helm installation parameters, and that's it.

Now, to deploy the model, it's enough to define a SeldonDeployment resource, which we'll name face-mask-detector, which uses the implementation we just created, TFLITE_SERVER, and downloads the model weights from here, the AIZOO Tech repo where these models are stored. And last but not least, we specify that this model has to be deployed to the node named raspberry. There are cleaner ways to do this in Kubernetes; you can set taints, tolerations, and affinities. I just took the simple route. Now we can just deploy this.
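Pulling this step together, here's a rough sketch of both pieces just described: the custom TFLite server and the SeldonDeployment that uses it. Method names follow Seldon Core's Python wrapper as I understand it; the weights file name, model URI, and parameter handling are simplified assumptions.

```python
# tflite_server.py - sketch of a custom Seldon inference server for TFLite models.
import numpy as np
from seldon_core.user_model import SeldonComponent
from tflite_runtime.interpreter import Interpreter


class TFLiteServer(SeldonComponent):
    def __init__(self, model_uri: str = "/mnt/models"):
        # Assumption: the storage initializer downloads the weights to model_uri.
        self.model_uri = model_uri

    def load(self):
        # Load the TFLite model and remember the input/output tensor indices.
        self.interpreter = Interpreter(model_path=f"{self.model_uri}/model.tflite")
        self.interpreter.allocate_tensors()
        self.input_index = self.interpreter.get_input_details()[0]["index"]
        self.output_index = self.interpreter.get_output_details()[0]["index"]

    def predict(self, X, names=None, meta=None):
        # Set the input tensor, invoke the model, read the output tensor.
        self.interpreter.set_tensor(self.input_index, X.astype(np.float32))
        self.interpreter.invoke()
        return self.interpreter.get_tensor(self.output_index)
```

```yaml
# The SeldonDeployment pointing at the custom server, pinned to the edge node.
apiVersion: machinelearning.seldon.io/v1
kind: SeldonDeployment
metadata:
  name: face-mask-detector
spec:
  predictors:
    - name: default
      graph:
        name: model
        implementation: TFLITE_SERVER   # the server registered through the Helm values
        modelUri: "https://..."         # the AIZOO Tech weights; real URI elided here
      componentSpecs:
        - spec:
            nodeName: raspberry         # the simple route: pin straight to the edge node
```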
If we switch to our terminal, we can see how our SeldonDeployment is being created, and if we look at the pods, we can see how the storage initializer is currently downloading the weights and the model is getting spun up with the image we just created.

The next step is implementing the camera reader, which is the one running the IoT loop. For that, we're going to use the same gpiozero library plus a library called picamera, which allows us to access the camera. The main blocks of this are, honestly, quite simple: we have a loop continuously capturing from the camera, we run inference on each frame, and we update the LEDs based on each frame. Running inference, in this context, just means sending a prediction request to our model, which is deployed on the same device. We populate the right payload, send it to the endpoint that Seldon Core exposes automatically, read the response, and check which classes have been predicted with high confidence: is there someone with a mask, someone without a mask, or nobody? Based on that, we update our LEDs.

I don't think I'm going to have enough time to run it, but continuing through: we would build the Docker image with the same instructions as before, and then deploy it. On the slides I mentioned that this would be a separate pod; instead, I'm just going to run it as a separate container, a camera-reader container deployed alongside our model in the same pod, which is something Seldon allows you to do. The main difference is that it needs access to the camera libraries. So we deploy that, and if we check our terminal, we can see that it's now updating our model deployment: first the storage initializer container runs and downloads the model weights, and then it spins up the model and the camera reader. These pods are still starting; once they're up, Seldon will tell Kubernetes to shut down the pod we initially deployed. So essentially we're now updating the pod running on our edge device purely through Kubernetes instructions. I don't think I'll have time to show you the result, but essentially the camera gets turned on and the LED switches to green or red. I'll make sure there's a video in the repo by the time the talk is published.

And that's pretty much it. Thanks again for joining the session. Please fire away any questions you may have; I'll be answering questions while the video plays. And again, just to re-emphasize: we are hiring at Seldon, so please give me a shout if you want more information about that. Thanks a lot for joining this session.