My name is Quinton, and I work for a company called Huawei, which you've probably heard of. My colleague Ying has been working steadily for the last while on our serverless platform, so we're here today to tell you a little bit about that. To start with, I'm going to give you a very brief introduction to serverless computing. I'm sure most of you are familiar with the domain already, but it seemed useful to contextualize some of the subsequent discussion; apologies to those of you who are serverless computing experts, for whom the next few minutes might be a little boring. Then Ying is going to take you through the architecture and design of the Huawei serverless platform, which goes by the name of FunctionStage, and she's going to demonstrate it for you. We're going to talk a little bit about some of the challenges that we faced. These systems are designed for very big enterprise deployments, so we run into some interesting challenges, some of which we've solved and some of which we're still in the process of solving satisfactorily. And then we can have some Q&A at the end if anyone has questions. So what exactly is serverless computing?
So what exactly is serverless computing? Function as a Service is the other name often used for it. Essentially, it's a cloud computing model for the execution of functions. The general idea is that you shouldn't have to worry too much about the infrastructure: all you do is upload a piece of code, and you chain these pieces of code together in useful ways. The other key element of serverless computing is that it's all event driven. Typically these functions execute as a result of certain events happening, either inside your infrastructure or inside your application, which is not necessarily a paradigm that all software is written to these days, so it sometimes takes a bit of a mental shift to figure out how to design and deploy these applications.

Another key element is that the charging model, or the resource allocation model, is tied much more closely to usage than something like VMs or even containers. Every time you execute a function, you pay for the resources associated with the execution of that function, as opposed to having some static resource allocation like a virtual machine running permanently. For certain use cases this is very beneficial: if you only occasionally need execution capability, you don't want to have virtual machines running all the time.

The other important aspect is that it is auto-scaling. You shouldn't have to worry about how much capacity you need in order to execute these functions; if you have large spikes up or down in your user demand, the platform should automatically take care of that for you and auto-scale. All you really care about is that you're paying per function execution, so you shouldn't have to plan for any of these huge spikes in traffic. And related to the resource allocation model
I just mentioned, the billing is similarly tied directly to how many invocations of your functions occur. In theory, you shouldn't be required to pay anything at all if your functions are not executing.

One of the challenges that comes up is: how do you do this if you have functions that clearly need to be instantiated in some kind of virtual machine or virtual environment, and you need to wire up a whole bunch of networking to make them all work? How do you do that instantaneously, when the first request comes in and ten milliseconds later you want a response? This is the cold start challenge, which different serverless computing platforms deal with in different ways. If you are shopping around for a serverless platform, that's definitely one of the areas you should look at: you should either be comfortable with fairly substantial delays on your first execution, or with the cost associated with caching these things, to make sure that your applications are going to behave the way you expect.

The other aspect: no maintenance of virtual machines. You shouldn't be worrying about which version of your virtual machine is running, or whether you need to apply security patches, operating system patches, et cetera. And the other, I think, compelling aspect of this is that because there is so little infrastructure management, and because the programming model is so simple, it lends itself very well to high-speed iteration. In fact, in many cases people who are not traditionally seen as developers or engineers are building these serverless applications, because they're much, much simpler to think about and deploy. One of the mental models
I like to think of is this: I don't know if any of you can cast your minds as far back as the 80s or early 90s, when spreadsheets became very popular, and all of a sudden you had accountants and HR people writing macros in Excel, or whatever it happened to be at the time, effectively writing software. They were building applications, although they were not software engineers, because it was so simple for them to do. They didn't have to worry about compilers and debuggers and all that kind of stuff; they could write macros in Excel and get quite a lot done that had traditionally been done by software written on mainframes and so on.

So, the end result of all of this: here is what a typical serverless application usually looks like. In the middle you've got a bunch of functions. These can be invoked in response to events, and those events could be incoming application-level requests, or events in the infrastructure: for example, objects being uploaded to object storage, changes in a relational database, or results from an artificial intelligence service. By chaining these functions together through these sets of events, the end result, whatever the application requirement is, gets fulfilled. As you can imagine, the right-hand side there is a pretty important part of the model. You typically need a bunch of high-level services available to these functions. You don't usually write those within the functions; you tie into much higher-level services in the back end: AI services, databases, storage services, et cetera. And with that, I will hand over to the far more interesting part of this talk, my colleague Ying Xiong.

Thank you. So next we'll talk about how we do that in the Huawei public cloud. Current serverless platform products: there are a whole bunch of those.
So you have a lot of options to choose from. First, obviously, the most used and most successful, whose stock price is very high: Amazon. I personally worked at Amazon for seven years before, so I know they do things very fast, and they design from the customer's perspective; their business model drives their technical decisions. So I would say this is a wonderful company to choose from, but consider others as well, because different companies have different cultures and give you different features. Besides Amazon, we also have other serverless cloud function providers: Microsoft, Google, IBM, and others like Kubeless, Iron.io, and Fission; I'm only mentioning those because they are relatively smaller than the big giants. Alibaba and Tencent are Chinese cloud providers that launched serverless before Huawei. However, when we did our performance analysis, Huawei actually came out much faster than they did. Of course, this is based on an analysis of our product against theirs from a few months ago, so I wouldn't say they haven't sped up since, but Huawei is catching up very fast, and we will talk about how we do that later.

In addition, we have seen many big companies, retailers and others, like iRobot, Expedia, Nordstrom, and Capital One in finance, using serverless. And consider: when we talk about security, finance is the most security-conscious business, yet they are still picking it up and they don't have any problem with it. So let's get to the meat.
Okay, the architecture of Huawei serverless. This is a brief overview of the Huawei serverless platform architecture. The green part is what already existed, and we leveraged those existing resources. The serverless platform, which we call FunctionStage in Huawei, takes the hardware resources provided by Kubernetes and pulls in the software resources saved in the function repository. Combining the hardware and software resources together, we start the function runtime, and the actual user function runs in that function runtime, hosted by a Kubernetes pod. When we have events, in object storage, in the messaging system, or a client API request, the serverless framework gets those requests, starts an instance for the actual customer request, and fulfills the requirement.

To further understand how FunctionStage works, we first need to know how a function is modeled in FunctionStage. The definition of a function has two parts: the customer code, and the dependent libraries. The actual function instance running in Huawei cloud is hosted by a container. The container has three layers of images. There is the OS layer, which in FunctionStage is a Huawei-certified CentOS; it's a variation of Red Hat Linux, publicly available, but Huawei security-certified. Then there is the language-specific runtime library, developed by our serverless platform. And the most important part: the customer function. So how does this function instance run in Huawei cloud? As a serverless application user, I don't know what happens behind the screen; I am outside of the cloud, I only send a request to it, and I don't know the architecture behind it. So what do we do behind the screen?
The customer request gets picked up by the API gateway, and the API gateway forwards the function call to our function dispatcher. If there is no such function instance yet, this is a cold start: we call the Kubernetes API server, which gives us the hardware resources; then the function runtime is started with the software pulled from the function repository; and then the request dispatcher forwards the request to the actual runtime. If instead it is a warm start, meaning the request dispatcher already found the function instance in the instance cache, it directly forwards the request to the function instance, so there is no waiting for anything to start. For an asynchronous call, we put the request into a Kafka message queue. Client HTTP requests are filtered by the HTTP trigger, to see whether they match an actual customer-defined API call, and other events coming from the messaging system or object storage will also trigger functions to run.

Okay, here's a demo. This is the Huawei public cloud. First, let me define my use case. As a person who travels a lot, I like to take pictures, and I want to share my best moments with relatively nice, not crappy, pictures, so I like to process my pictures before I send them out. But I also want to travel light; I don't want to carry my laptop just to process my pictures. So what do I want? A no-cost website, because I don't want to pay a lot, where I can upload a picture and have it processed based on my own image processing logic, because I can do a lot of programming behind the screen, just like any customer-defined application logic in these functions.
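Before the demo, the cold-start versus warm-start path just described can be sketched in a few lines of Python. This is a toy simplification, not FunctionStage code: `FunctionInstance`, `start_pod`, and the plain-dict cache are all illustrative stand-ins for the Kubernetes API call, the repository pull, and the real instance cache.

```python
# Hypothetical sketch of the dispatcher's cold/warm start decision.
# Names here are illustrative, not actual FunctionStage APIs.

class FunctionInstance:
    def __init__(self, name):
        self.name = name
        self.cold_started = False

    def invoke(self, request):
        return f"{self.name} handled {request}"

class Dispatcher:
    def __init__(self):
        self.instance_cache = {}   # warm instances, keyed by function name

    def start_pod(self, name):
        # In the real platform: ask Kubernetes for a pod, pull the code
        # from the function repository, start the language runtime.
        inst = FunctionInstance(name)
        inst.cold_started = True
        return inst

    def dispatch(self, name, request):
        inst = self.instance_cache.get(name)
        if inst is None:                  # cold start: pay the startup cost
            inst = self.start_pod(name)
            self.instance_cache[name] = inst
        return inst.invoke(request)       # warm start: straight to the instance
```

The first `dispatch` call for a given function pays the cold-start cost; subsequent calls hit the cache and go straight to the running instance.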
So here is the solution I came up with. I'm hosting a static website in Huawei object storage (OBS, which is similar to S3). The web page puts pictures into an OBS bucket. There's a function I created in FunctionStage that does the image processing for me, and the object-upload events trigger the function to process the image. Finally, the processed image is saved into another bucket, so that I can distinguish the originals from the processed images.

Let's do the code presentation first. This is the code I came up with for the processing. I have a Python function that deals with three buckets: the image upload bucket, the output bucket, and the bucket that hosts my website. I also put a counter here, because I want to see how many times this web page has been used. First, the event sent to me by OBS: initially I didn't know the structure of the event, so I just dump it here, so that later on I know how it works. At the start of the code, I get my security information, to be used to talk to OBS later. Then, when the function starts, I get the image information from the event: the event data parser gives me the bucket name and the image name. Afterwards, I download the image from OBS, passing the bucket name, my login credentials, and the file name that I'm trying to download, and I save it locally so that my image processor can handle it. Then I do the transformation.
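The flow Ying walks through (parse the OBS event, download, transform, upload, bump a counter) could be skeletoned roughly like this. The event layout, the bucket name, and the `client` object are assumptions for illustration; the real OBS SDK and event payload differ.

```python
import os
import tempfile

# Hypothetical handler skeleton; the bucket name and the event layout
# below are illustrative, not the actual OBS event schema.
OUTPUT_BUCKET = "processed-images"
counter = {"invocations": 0}   # stands in for the demo's usage counter

def parse_event(event):
    """Pull the source bucket and object key out of an upload event."""
    record = event["records"][0]
    return record["bucket"], record["object_key"]

def handler(event, client, transform):
    bucket, key = parse_event(event)
    local_path = os.path.join(tempfile.gettempdir(), os.path.basename(key))
    client.download(bucket, key, local_path)       # fetch the original image
    transform(local_path)                          # in-place image processing
    client.upload(OUTPUT_BUCKET, key, local_path)  # save the processed copy
    counter["invocations"] += 1
    return {"status": "ok", "object": key}
```

Passing the storage client and the transform in as parameters also makes the dumped sample events usable for local testing, which is exactly the trick Ying describes a little later.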
All the image-processing logic is inside this transform_image function. After the image is transformed, I upload the file to OBS to be shared later, and I increment the counter. Inside transform_image, using the built-in functions of the PIL library, I open the image. Many times we get foggy pictures, so I want to de-fog them first, and then adjust them. When I used the de-fogging filter alone, I found the result wasn't that pretty; many times it came out very dull. So I also make the image brighter and give it better contrast. I also have a timer, so I record how long it takes to process the image. And here's the part I wanted to point out: the structure of the events. I use these saved events to do local testing, so I don't need to go to the cloud for all that testing, because every time I would need to package the code up again.

Okay, let's see. Because the actual loading of this function takes a couple of minutes (we are loading from back in China), I would rather save that time. First we go to the object storage. We can see the four buckets associated with my function: the website, the upload bucket, the processed-image bucket, and another one where I put my function code; I'll show you later what's in there. Then we go to the function definition. For this function I define a trigger: the OBS event, the upload of a file, will trigger the image to be processed by this function. So let's see how it works. I pick an image here; let's try a small image first. It's not so good, but at least this is something done by my program, and at a very small cost, right? We can work on improvements; that's exactly what I'm going to do later. Initially, the function here is configured with this first zip file.
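Backing up a moment: the demo uses PIL's enhancement filters for the "brighter, better contrast" step, and those boil down to simple per-pixel arithmetic. Here is that math sketched standalone, without PIL; the factors are invented for illustration.

```python
# Per-pixel arithmetic behind "make it brighter, improve the contrast".
# PIL's ImageEnhance does an equivalent interpolation internally; this
# standalone version just shows the math on a flat list of channel values.

def clamp(v):
    """Keep a channel value inside the valid 0..255 range."""
    return max(0, min(255, int(round(v))))

def brighten(pixels, factor):
    """factor > 1 brightens, factor < 1 darkens."""
    return [clamp(p * factor) for p in pixels]

def contrast(pixels, factor):
    """Stretch pixels away from their mean; factor > 1 adds contrast."""
    mean = sum(pixels) / len(pixels)
    return [clamp(mean + (p - mean) * factor) for p in pixels]
```

A foggy photo has pixel values bunched near the mean; the contrast stretch pushes them apart, which is why it helps after de-fogging leaves the image dull.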
That's why I want to show you inside the bucket where I host my function code. I have two files here. The first one is the original. The second one I came up with later, when I realized that if I'm using my cell phone, the photos are relatively big these days, and I don't want to resize them myself, right? So I use this Python program, and I added this part later: I check the image size, and if it's bigger than a certain amount, I shrink it so that it doesn't use a lot of my computing resources. And remember, a serverless function times out within five minutes, so you've got to hurry; don't use a lot of compute when it's not necessary. I already saved the time of uploading this zip file, so here I just make a change to my function, switching it to the other set of code.

Let's see what it does. This is a big picture; sorry, it's three megabytes. This is one I downloaded from my cell phone, a picture I took when I was on a plane. It looks nice, but it's foggy and very blurry. Let's see whether it can process it; during my tests it sometimes could and sometimes couldn't, which is why I resize it. It's going to take quite a while, because it first has to download those more than three megabytes from OBS and then do the resizing. There it is; it actually behaves well, although it could be a little brighter. So the demo is over.

Okay, thank you. There were a lot of challenges we encountered, because we want to offer this to everybody: we want to be able to host big customers, and for smaller customers who don't have much experience, we want it to be very easy to use. And there are a lot of big players already in the area who started three years before us. So first, challenge one: the fast response requirement.
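As an aside before the challenges: the "shrink big uploads" check Ying added in the second version of the demo code could look something like this. The 1024-pixel threshold and the helper name are made up; the demo used its own limit.

```python
# Illustrative version of the demo's size check: only scale down when the
# image is too big, preserving the aspect ratio. Threshold is invented.

MAX_SIDE = 1024

def target_size(width, height, max_side=MAX_SIDE):
    """Return a new (width, height), scaled down only if needed."""
    longest = max(width, height)
    if longest <= max_side:
        return width, height          # small enough: leave it alone
    scale = max_side / longest
    return max(1, round(width * scale)), max(1, round(height * scale))
```

Doing this check before any filtering keeps the function well inside the five-minute timeout, since every later step then touches far fewer pixels.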
The cold start is required to be within two seconds; that's our hard requirement for serverless. When we started building, we were able to keep the cold start within 1.5 seconds, but that was only for the container to start, without any of the add-ons: VPC networking, code loading, all that kind of stuff. Coming back to the actual production environment, we have many requests, and in our performance tests of Kubernetes, when we need a double-digit number of Kubernetes pods to start at once, performance drops very dramatically; sometimes it takes several minutes to start a pod. So what do we do? We decided that Kubernetes is our hardware provider, but we should not simply rely on its performance; the serverless framework should be creative and invent its own mechanism, while still not reinventing everything, so we use a resource pool built on what Kubernetes provides. We configure a resource pool first. Think about it from the platform's side: for the customer this pool isn't real, and you don't pay for resources before you use them; but we, the provider, are already paying for those pods whether they are used or not. So, based on our statistics, we start a whole bunch of pods in a resource pool behind the scenes: empty pods with the function runtime libraries running, but without the actual customer code. That pool saves us more than two seconds, and I say "more than two seconds" because a Kubernetes pod's startup time is nearly two seconds on an unloaded node, but under load it can be more than a minute. Secondly, we do code injection: because we have this resource pool, we need the real customer function to run in it, so we inject the code, and at the time we inject it we change the
So they have their real name space and Another way we are doing that. So remember JVM have noted notoriously Bad name of starting up very slow. What do we do here? Why we use the open JDK? resources that we have a JVM lab they actually was able to Start Secondary not the first one the secondary JVM on the same we and even though they are not sharing the same container But they are on the same same way it can share in the previous JVM resources and Give you a brand new JVM, but it starts much much faster than original one So second challenges Security with usability. There's always the case that should I be more secure or should I provide the more usability? Here's the causes so Huawei internally we use Internally certified Linux system CentOS So it is a variation of red hat But internally certified it stripped out the many things that was deemed to be could be harmful So many many of the other third-party libraries that are not certified. We are not allowed to build them into the system So it turns out there are many standard one when I trying to compute do my demo I personally encountered those because I went kind of brand new to Python and I don't know many of those It's supposed to be standard because I do not need to download the library and I can use that for just from the scratch However, when I try to run that in the cloud, it doesn't work. It says the module doesn't was not found So we're saying then some of the standard library are not there So I have to be invent for myself. So what's the solution? 
Dogfooding: be your own customer, feel the pain, and solve the issue. There are other challenges, of course; that's why I labeled them three to n, and I'll go over them quickly. We have to catch up with the big players, because they are three-plus years ahead. We have a company VPN that constantly blocks us from the open source code and community. And we have co-workers across four countries and two continents working together; you name it, we have it.

Okay, jokes apart, here are some resources. This is Huawei cloud. The government policy is pretty strict: you have to use your real name to register your account in Huawei cloud; this is a policy for all cloud providers in China. So I'm sorry, if you don't have a Chinese identity you cannot use it. Also, this product introduction page has, I think, been taken down, because this is not generally announced yet; we are still in the public testing phase. But you can see our function examples on GitHub; it's just called FunctionStage. And the demo is my demo, but I personally paid for that account, and I hope they won't shut it down for a while, because I don't have enough Chinese money to pay for it.

Okay, so: questions?

Q: Is this open source?

Sorry, this is not open source.

Q: I was just wondering if you've seen the open source implementations, or if your solution has taken inspiration from any of the open source implementations out there. Can you also explain how your implementation is different from the major open source implementations?

Thank you. Before we started, we actually did a lot of research, especially on open source projects such as Iron.io, Fission, Kubeless, and OpenWhisk.
There's a lot of inspiration there. I would say each project has its own benefits. Many of them are simple, and simplicity is good, but performance-wise they were not so ideal. As I said, the cold start has to be less than two seconds; that's our major concern. Also, because Huawei has to provide an enterprise-grade platform, it has to be able to run thousands, even tens of thousands, of function instances at the same time. So we could not just use or borrow everything from the open source projects; I would say they are still at an early stage and need some improvement. So we borrowed some ideas. And of course we use Kubernetes, which is also open source, and many of these open source projects use Kubernetes as their foundation as well. So thanks to all those earlier open source serverless projects. We do comparisons: we consistently compare what we have achieved and where we are still behind, against all of those, even the non-open-source ones like Amazon and Azure. But I'm not posting that here, because we are still in public testing and haven't generally announced yet; I guess it will come out later. It's not for me to say where we are worse at something, but I promise we're trying to catch up.

The question is: do we take any action to prevent our customers from harming other customers? Yes, we do. We try to isolate customers' actions by isolating their container actions; there is only a limited set of OS-layer operations that can be done there, which is why we use our in-house certified CentOS Linux system. We are also still working to actively scan for and prevent it from happening.
We're not saying we're done there yet, but we do everything we can, including controlling our own OS; this OS is not taken from open source without internal certification and filtering. We also do code scanning of the customer's code, just to make sure no harmful action is going to happen.

Q: This is maybe outside of your area of expertise, but do you know when and if Huawei cloud will be available to non-Chinese customers?

For that I cannot say, but some of Huawei's major customers are in Europe. We work with Deutsche Telekom in Germany, and I think we also work with a major French telecom company. Those customers are picking up our service products, so it should be available there soon.

Okay. Thank you, everybody.