Hi, everyone. Welcome to our talk on machine learning at the edge cloud. I'm Vivek Hariran, a machine learning engineer by profession at a top tech company. And I'm Prakash Ramchandran, an interop working group chair and a telco cloud specialist. Let's get started. To start with, we want to say that the views expressed in this presentation are our own and don't necessarily represent our respective companies.

So what is machine learning? The AI pioneer Arthur Samuel coined the term as "the field of study that gives computers the ability to learn without being explicitly programmed." Some classic machine learning use cases that a lot of us have experienced are spam detection, character identification and recognition, line-of-credit approval (some banks use this), and recommendations; a lot of us use Amazon, so we've probably interacted with one of these.

So how does a computer do it? To explain this, I'm going to use a simple example. Let's say we give the computer a bunch of pictures of animals and ask it to find all the ducks in this group. There are two ways for us to teach the computer to learn this particular task. In machine learning, the two classes are supervised learning and unsupervised learning. For the purpose of this talk, we're only going to focus on the supervised learning part.

For supervised learning, we need to give the computer a dataset as well as a set of labels. When I say dataset, it basically means one picture per data point, together with a set of features, which are typically human-defined. In this case, I'm saying that the first image has two feet, has feathers, and has a beak; those are the features. And then there's a label of some sort: we're letting the computer know that the first image is a duck, the second is not, and so on.

Once we've given the computer features and labels, we can feed all of this into one of a few algorithms that will find the relationship between the features and the label. The goal of these algorithms is typically to minimize the mistakes they make: they have to find a way to combine the features so they correctly identify whether the first image is a duck or not. Some classic machine learning algorithms are logistic regression, which allows the computer to find linear relationships between the features and the label; decision trees, which use the classic tree data structure to determine which feature leads to one class or the other; and a simple neural network, which allows for nonlinear relationships between the features and the output by adding a hidden layer between the two.

In supervised learning, the step we just covered is called the training step, and it is the equivalent of writing a program, writing the piece of code. The output of the training step is typically called a model, and that model can be used in the future to determine whether a given animal is a duck or not. The second step is the evaluation step, which is the equivalent of deploying your piece of code into production. In machine learning use cases, the model is wrapped with your web app code and deployed into production.
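To make those two steps concrete, here is a minimal sketch of the duck example with scikit-learn. The toy feature vectors, the labels, and the choice of logistic regression are illustrative assumptions for this transcript, not details from the talk.

```python
# A minimal sketch of the supervised-learning workflow: human-defined
# features plus labels go in, a fitted model comes out.
from sklearn.linear_model import LogisticRegression

# Each row is one animal: [has_two_feet, has_feathers, has_beak]
X = [
    [1, 1, 1],  # duck
    [1, 0, 0],  # not a duck
    [0, 0, 0],  # not a duck
    [1, 1, 1],  # duck
    [0, 0, 1],  # not a duck
]
y = [1, 0, 0, 1, 0]  # the labels: 1 = duck, 0 = not a duck

# Training step: the algorithm combines the features so as to make as
# few mistakes as possible on the labeled data.
model = LogisticRegression().fit(X, y)

# Evaluation step: the trained model classifies a new, unseen animal.
print(model.predict([[1, 1, 1]]))  # -> [1], i.e. a duck
```

The fit call is the training step, and the fitted model object is what later gets wrapped with web app code and deployed.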
Recently, with the advancement of higher processing power through the use of GPUs, and with large volumes of data, there's been a huge leap in neural networks. I think a lot of you might have heard of it: that's deep learning. The major advantage of deep learning is that it takes care of the feature extraction part for you. Classic machine learning needed the data scientists to define a bunch of features by hand by doing analysis of the data. Deep learning automates that as well. It's a lot harder to interpret what the model is doing, but it much more closely mimics how a human brain works. Going back to our training example, instead of defining features, all we had to do was give the picture and give the label, and the deep learning model automatically determines which set of features is useful for predicting whether it's a duck or not. The approach has become so popular that it has led to a rise in popularity of both machine learning and deep learning.

To illustrate how good deep learning really is, I'm going to go over an object detection example. In research, there's a competition called the Large Scale Visual Recognition Challenge, which makes use of a corpus of labeled images called ImageNet, and the task is to identify all the objects in a particular image. Before 2012, the classic approach was hand-made features run through classic models; after that, deep neural networks, deep learning, took over. What we see here is the classification error, the percentage of times the model makes a mistake, and from 2012, when deep learning kicked off in the research field, we see a drastic reduction in that error. Some of the winning model architectures are AlexNet, which has about 60 million parameters, and GoogLeNet, which I think is close to 5 million parameters. Each of the boxes in these diagrams denotes a mathematical function, and combining these mathematical functions allows the model to learn the complex feature combinations that help it be as correct as possible at this task.

With this rise, as well as more computing capability, we're seeing more modern machine learning use cases. Each of our phones has a smart assistant. We're coming up with algorithms that predict whether a particular person is wearing a mask in public. Warehouse automation is rapidly becoming a thing, with everybody moving to e-shopping and ordering from online warehouses. And autonomous driving is also slowly becoming a reality.

So let's talk about the infrastructure required for this. Typically, to create some of these models or machine learning ideas, we need a cloud infrastructure of some sort, and it requires a lot of computing power. Most companies either have a private cloud of their own or make use of a public cloud service like Azure or Amazon Web Services. In the case of AWS, a public cloud environment, training typically happens in high-compute nodes or clusters that are close to the data. Amazon provides a service called EMR, which is managed Hadoop, and it allows data access from a lot of different sources, where both the input data and the labels typically live. The evaluation environment is typically a container-based deployment service.
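As a minimal sketch of what runs inside such a container, here is the trained model from earlier wrapped with web app code, the way the evaluation step was described. Flask, pickle, and the file name duck_model.pkl are assumptions for illustration, not details from the talk.

```python
# A minimal sketch of an inference service: the output of the training
# step wrapped with web-app code, ready to be containerized.
import pickle

from flask import Flask, jsonify, request

app = Flask(__name__)

# Load the trained model once, at startup.
with open("duck_model.pkl", "rb") as f:
    model = pickle.load(f)

@app.route("/predict", methods=["POST"])
def predict():
    # Expects a JSON body like {"features": [1, 1, 1]}.
    features = request.json["features"]
    return jsonify({"is_duck": int(model.predict([features])[0])})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8080)
```

This app is what would be built into a container image, registered, and deployed, which is the next piece.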
Amazon allows you to register containers using a service called Elastic Container Registry, and then you can deploy those containers onto compute nodes, the EC2 instances.

To go to the next section of our talk, we want to focus on two particular use cases. On the left is warehouse automation, and on the right is autonomous driving. For both of these use cases we need an object detection model of some sort. On the left, for warehouse automation, you need an object detection model that detects boxes, forklifts, and other objects in the warehouse. For autonomous driving, you need the model to detect lanes, other cars, traffic signs, pedestrians, and so on. If, let's say, we use one of these state-of-the-art models, a complex deep learning architecture, the infrastructure required to train something like this would still be a cloud resource for sure, with the additional requirement of having GPUs. But the evaluation part of it happens somewhere different, and for this part of the talk, I'm going to hand it off to Mr. Prakash.

So we understood the fundamentals of machine learning here, with use cases for factory and warehouse automation and for autonomous vehicles. What you see on this slide is a diagram that indicates the location of the edge. The location is important because it can be anywhere between the user, or the on-prem site, and the central cloud. The edge cloud resides between the central cloud and the user, or a sensor, or whatever is accessing the use case. You can see here we typically have a 20 to 100 millisecond range of latency to the central cloud. As the proximity to the user increases and you bring it to the edge, the latency reduces to the 5 to 20 millisecond range. And if you come to the IoT edge, which is even closer, you see latencies of 1 to 5 milliseconds.

So given the use case, where is the user, the client, and where is the computing done, and where is the training? You have training and you have inferencing, and those happen at different locations, so the compute power needs to be relevant in each case. There is something called hash rate, which is used for measuring crypto computation requirements; the same idea can be applied here. You can see that if you have training close to the storage, with a large amount of data, more than 50% of the training should be done in the cloud. Whereas at the edge, because of the constrained environment, you would like to reduce the inferencing computation, because a lot of transcoding is required there for media: the media needs to be converted from one form to another, audio, video, text, MP3, MP4, those kinds of things, and everything requires transcoding. A lot of compute power is spent on transcoding, so you are only left with about 30% for inferencing there. Whereas if you go to the IoT edge on the factory floor, where most things have to react immediately, in real time, inferencing is more important, so you need to have trained models there. That is what this slide is depicting. Next slide.
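As a toy sketch of that placement reasoning, here is a function that picks a tier for a workload from its latency budget. The tier latency ranges come from the talk; the selection heuristic itself is an illustrative assumption, not a formula from the presentation.

```python
# A toy sketch of the placement reasoning above: given a use case's
# latency budget, pick the farthest tier that can still meet it.
TIERS = [                        # (name, best-case ms, worst-case ms)
    ("iot-edge", 1, 5),          # real-time inferencing on the floor
    ("edge-cloud", 5, 20),       # inferencing plus media transcoding
    ("central-cloud", 20, 100),  # training, close to bulk storage
]

def place(latency_budget_ms: float) -> str:
    # Prefer the farthest tier (most compute available) whose best-case
    # latency still fits inside the budget.
    for name, best_case, _worst_case in reversed(TIERS):
        if best_case <= latency_budget_ms:
            return name
    return TIERS[0][0]  # nothing fits; fall back to the closest tier

print(place(50))  # central-cloud: training and analytics can live here
print(place(10))  # edge-cloud: inferencing near the user
print(place(2))   # iot-edge: real-time reaction on the factory floor
```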
So what we have next is, yeah, go ahead, the transfer learning stuff. As described, as we go closer to the edge, we're constrained both in the amount of compute power we have and by the high requirement for low latency. In machine learning, there's a technique called knowledge distillation, which allows us to make a lightweight version of the model specifically for these low-latency use cases. The way it's done is that you have your accurate model, which is typically a deep neural network, and that network is treated as a teacher. And you have a student model, a shallow network, which tries to learn from the teacher by mimicking what the teacher does. The advantage of this is that you end up with a lightweight model, which can be transported easily and can run inference much quicker than the original model. The only tradeoff is a slight loss in accuracy. Next slide.

Yeah. So this is what we call overcoming the constraints for the model. A model is basically a DAG, what we call a directed acyclic graph, and that has a format. Whatever you capture in the model gets formatted into the DAG and is transported in the ONNX format, which is the Open Neural Network Exchange format. And it has a runtime, which can be applied at the deployment targets. The training target can be in the cloud, the deployment target can be at the edge, and when you apply it, you can run it with the runtime. So it provides interoperability, it provides portability, and it provides hardware acceleration. You have all three, which is critical for overcoming the constraints of the compute limitation as well as the distance limitation, to get the best possible latency. Next slide.

So here we're just trying to show what the GPU does. Hardware is inflexible: you may have an NVIDIA GPU, an Intel GPU, or any other GPU, and they don't all have similar architectures. And given a constrained environment, you cannot have hundreds of GPUs; at the base you can have one, and maybe that one is sliced into a number of instances. We call it multi-instance GPU. So the requirement is multi-instance GPU, to be able to maximize the use of a given GPU. In these clusters, we see a training cluster, an inference cluster, and an analytics cluster. If your training cluster is in the cloud, it has already created the model and made it available to you, and you can download it. The edge execution side just downloads whatever the training cluster has created at a given time. The training can keep improving, but the edge can dynamically download the model, which is a very small ONNX file, and execute it with the ONNX runtime on the GPU or on a given instance. It can ask for as many instances as it needs to compute, and then, suppose you get four streams: you can execute those four streams in parallel, executing faster than you normally would with just a general-purpose CPU. This removes not only the interoperability issue but also the portability issue, because you are using a standard format, whether you have Azure, AWS, or OpenStack. You can execute it anywhere, and that is the important point: you do what you need to do at a given edge, whether it is the IoT edge, where you are mostly doing inferencing, or, let's say, transcoding plus inferencing, as in the other example we mentioned, versus training, et cetera. So, next slide.
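Here is a minimal sketch in PyTorch of that teacher-student setup, ending with the ONNX export just described. The network sizes, the softening temperature, the dummy data, and the file name are illustrative assumptions, not values from the talk.

```python
# A minimal sketch of knowledge distillation: a shallow student learns
# to mimic a deep teacher, then ships to the edge as a small ONNX file.
import torch
import torch.nn as nn
import torch.nn.functional as F

teacher = nn.Sequential(nn.Linear(3, 128), nn.ReLU(),
                        nn.Linear(128, 128), nn.ReLU(),
                        nn.Linear(128, 2))   # the accurate, heavy model
student = nn.Sequential(nn.Linear(3, 8), nn.ReLU(),
                        nn.Linear(8, 2))     # the lightweight model

opt = torch.optim.Adam(student.parameters())
T = 2.0                 # temperature to soften the teacher's outputs
x = torch.randn(32, 3)  # a dummy batch of inputs

for _ in range(100):
    with torch.no_grad():
        soft_targets = F.softmax(teacher(x) / T, dim=1)
    # The student is trained to match the teacher's output distribution.
    loss = F.kl_div(F.log_softmax(student(x) / T, dim=1),
                    soft_targets, reduction="batchmean")
    opt.zero_grad()
    loss.backward()
    opt.step()

# Export the small student as ONNX so it can be transported to the edge.
torch.onnx.export(student, torch.randn(1, 3), "student.onnx")
```

At the deployment target, the exported file can then be loaded with ONNX Runtime, for example onnxruntime.InferenceSession("student.onnx"), and run on whichever execution provider the edge hardware offers.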
Of course, and I am repeating this, AWS has machine learning infrastructure, and MapReduce, which is EMR. Amazon offers storage of every kind: object storage, file storage, Hadoop for big data, DynamoDB for key-value data, the relational database services, and so on, plus Redshift for analytics and BI and Glacier for archival. Similarly, you've got the evaluation environment there; usually you are more focused on the execution, that is, the deployment side. Here you have the Elastic Container Service, where you register containers and use them. Then you also have the standard Amazon EC2, plus Fargate, which is mostly a serverless abstraction, like Lambda. So overall, Amazon offers you all of these different services.

Now, we spoke about two important ones. One is Wavelength, and I'm going to describe Wavelength, because Wavelength means you have a cell tower. If your autonomous driving is happening, let's say, at some place, and you want low latency applied to it, your highway computation needs to be done somewhere, and that needs to be close to the cell tower. The Wavelength Zone is the offering from Amazon, working with the telco service providers, the mobile service providers, that gives you data center capability there. So you can take that Wavelength Zone.

Now, Outposts is something different. This is like: hey, you want to be part of the Amazon system, yet you want to be closer, and you don't want to build anything of your own. Take a campus environment like CMU, Carnegie Mellon, let's say. If they have AWS Outposts, what they do is size their use case: "I'm going to run an autonomous vehicle close to my campus and do some testing on it, so I will size what is required. Okay, we need, let's say, Wavelength; I need a console I can use, Wavelength with a virtual private cloud. And we identify the AI/ML pieces I need, et cetera; it could be SageMaker or something." So you can size it and order it, and you don't have to order a hundred thousand racks; rather, you order a couple of racks, so you can prepare the site, validate it, get it shipped, start installing one or two, and start building. So it's a rapid deployment of Outposts, which can be tied in with Wavelength so that your clusters can be separated, as we mentioned earlier. If you want something to do with evaluation, that can be closer to your environment, in the Outpost; and if you want something related to storage, it can be put closer to the data, with lower latency. Part of the functionality can live in one place and part in the other. How you design the clusters is another aspect, but basically you can split the functionality across whatever is most optimal; whether you use a Wavelength Zone or Outposts is a question of decision making that you can include in your design.

So what we are seeing is that AWS offers something called SageMaker, which is an ML development tool where you label your data and build in a studio, with all your definitions of what the model is, how you want to capture the features, and the outcomes. The workflow is label, build, train, tune, and deploy.
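Here is a minimal sketch of that workflow with the SageMaker Python SDK; the entry-point script, the IAM role, the S3 path, and the instance types are placeholder assumptions, not values from the talk.

```python
# A minimal sketch of train-and-deploy with the SageMaker Python SDK.
from sagemaker.pytorch import PyTorch

estimator = PyTorch(
    entry_point="train.py",  # your training script (placeholder)
    role="arn:aws:iam::123456789012:role/SageMakerRole",  # placeholder
    instance_count=1,
    instance_type="ml.p3.2xlarge",  # a GPU node for training
    framework_version="1.13",
    py_version="py39",
)

# Train in the cloud, close to the data in S3.
estimator.fit({"training": "s3://my-bucket/duck-images/"})

# Deploy the resulting model behind an endpoint for evaluation.
predictor = estimator.deploy(initial_instance_count=1,
                             instance_type="ml.m5.large")
```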
And how do you go about it outside AWS? The same kind of offering is there from the Kubeflow platform, for Kubernetes. If Kubernetes exists anywhere, you can run your Kubeflow. And what does it do? It does the same kind of thing: extract the data, generate the schema, transform the data, train the model, validate the model, and serve the model. So basically, if you have Kubernetes anywhere, you can apply Kubeflow; a minimal pipeline sketch follows at the end of this transcript. What we are trying to do here is explain how the AWS AI/ML landscape can be applied to open infra; that was the goal. Go ahead, next slide.

So you can see the Kubeflow platform here. It runs anywhere Kubernetes is available, a Kubernetes cluster, and you can see that even on-prem or local environments can use it. Whether it is a cloud or an on-prem environment, all you need to do is provide the scaffolding of all the applications and ML tools that are required, and then start building the pipeline, either using Argo or using Kubeflow itself for the ML flows, and execute them. On the right side, what you see is TensorFlow Serving, PyTorch serving, Istio, which is the service mesh, and Argo, which again is for pipelines. Prometheus is there for collecting all the metrics, so if something goes wrong, you have fault detection, et cetera, along with other similar components. At the end of it, what we have got is that we can replicate this in open infra. How? Let's see. Next.

So you can see here: whatever applications we talked about, and patterns like Outposts and Wavelength for the edge, can be reimagined and applied even in open infra. In open infra we have Airship, which is a major project, and so is StarlingX for the distributed cloud at the distributed edge. We can use these for creating pools of bare metal using Ironic, and then the Cluster API with a Kubernetes control plane can manage the workload clusters, whether or not you want to take a hybrid approach of using the clouds. You can actually have resources from the cloud, like we have with Outposts and Wavelength; we can combine the hybrid cloud approach with OpenShift, or even with VMware or OpenStack platforms. And then it's interoperable and portable; for AI/ML interoperability and portability, we have already given you the ONNX formula. All you need to do is get the scaffolding on top of it, and whether you want to do it in the cloud or at the edge, whatever split you want, you can do it with properly designed clusters. This can also be applied to the 5G core, whether you have a CU/DU for the baseband unit, or the radio unit, or the RAN, or even an IoT gateway. So distributed AI models can be adopted over Airship and StarlingX, which are open infra projects, with the assistance of Kubeflow and other tools, and the models are based on the use case: you can bring them, adopt them, and use them.

So this is our message: open infra is ready, and it's just a matter of how you imagine it and how you deploy it. Our objective was to show that what the cloud can do, you can also do in the open, even on-prem, which can be used by telcos. Thank you very much.
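As promised above, here is a minimal sketch of the Kubeflow pipeline idea using the KFP SDK. The single component, its base image, the pipeline name, and the output path are placeholder assumptions; a real pipeline would have separate steps for extracting, transforming, training, validating, and serving.

```python
# A minimal sketch of a Kubeflow pipeline, compiled to a spec that any
# Kubernetes cluster running Kubeflow Pipelines (cloud, on-prem, or
# edge) can execute.
from kfp import compiler, dsl

@dsl.component(base_image="python:3.11")
def train_model(epochs: int) -> str:
    # Placeholder training step; a real component would extract data,
    # transform it, train, and validate before serving the model.
    return f"trained-for-{epochs}-epochs"

@dsl.pipeline(name="edge-ml-pipeline")
def edge_ml_pipeline(epochs: int = 10):
    train_model(epochs=epochs)

if __name__ == "__main__":
    compiler.Compiler().compile(edge_ml_pipeline, "pipeline.yaml")
```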