So welcome, everyone, to my talk. I'm Wilson Wang, from ByteDance. I'm a research engineer in the infra lab, and today I'm going to talk about WebAssembly in ByteFaaS. ByteFaaS is our internal FaaS platform for the ByteDance environment. It's very similar to other FaaS platforms, but today I'm going to talk about some details we implemented differently from other FaaS platforms, and especially about the places where we use WebAssembly. The talk will be in three parts. First we'll talk about our legacy FaaS environment; after that we'll talk about how we use WebAssembly in our current FaaS environment; and finally we'll talk about our in-progress research on our next-generation WebAssembly-based FaaS system. At the end I will give some demos. So let's start.

First, the past: what was there before we started using WebAssembly. In our legacy ByteFaaS architecture we have two kinds of functions: native functions and event functions. A native function is container-based: users bring their own customized images, and to run the function, when the container starts, it needs a start script to bring up all the services, and then the code needs to listen on a particular port for incoming requests; for example, HTTP services need to listen on port 80. An event function is a little different. We have a set of common, shared images for Python, JavaScript, and many other runtimes, and users just provide their business-logic code; we then use these common images to start the event function. In this case the user's code is much smaller than for a native function.
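To make the event-function model concrete, here is a minimal sketch of what user-provided business logic might look like. The handler signature and field names are hypothetical, not ByteFaaS's actual interface:

```python
import json

# Hypothetical event-function handler: the platform's shared runtime image
# owns the process, the ports, and the event loop; the user ships only this.
def handler(event, context):
    # Business logic only: no server, no port listening, no start script.
    order = json.loads(event["body"])
    total = sum(item["price"] * item["qty"] for item in order["items"])
    return {"status": 200, "body": json.dumps({"total": total})}

# Local smoke test of the handler above.
print(handler({"body": json.dumps({"items": [{"price": 2.5, "qty": 4}]})}, None))
```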
The pros of these two kinds of functions: they are flexible, because we can support users with different types of workloads, and it's easier to migrate container-based microservice workloads to FaaS. We have a lot of container-based microservices written in Go and other languages, and when those teams want to migrate their workloads to our FaaS system, we need to give them a path so the process can be smooth. The cons: container cold start is well known to be very slow, so it takes some time to get started, and container images are usually large.

Next, the optimizations we applied to the legacy workloads. There are three major container cold-start optimizations you can find in industry and academia. First, we changed Kubernetes pod creation from declarative to imperative. This way we avoid a lot of the back-and-forth of data updates and observations happening in between, so we save time. Second is using pod pools to speed up initialization, which is also in common use. The last one is not very common, but many companies are starting to use it: you snapshot your pod's container processes at a particular initialization stage and later recover from the snapshot, so large containers can start up faster.

These optimizations have limitations. There is always a gap between different isolation levels. For a VM, a VM-enter/VM-exit costs more than ten times as much as a system call, which is what Linux containers commonly rely on; and a system call in turn costs more than ten times as much as a simple user-space call or jump instruction. We cannot avoid this gap, so there is always a distance between VMs, containers, and the other isolation mechanisms. So how do we start FaaS functions with latency at the millisecond or few-millisecond level? We were thinking we probably need a sandbox environment that mainly uses call or jump instructions. We had two choices: unikernels and WebAssembly. We had been thinking about unikernels for quite some time, but there are many limitations that made us hesitate to adopt a unikernel solution. Finally we found WebAssembly, believed it would be much more convenient in our environment, and chose it.

Now we're talking about the present: what we have right now, the lightweight Wasm worker. We introduced a Wasm worker for our ByteFaaS system. What is it? It's a new FaaS worker based on Wasmtime. The other components in our FaaS system, such as the gateway and the other agents, are very similar to common FaaS systems, but we added a FaaS worker based on Wasmtime. What's the difference? It's Rust-based, and we use the Tokio async framework to make better use of the multi-core environment. For each incoming request, a WebAssembly sandbox is either created or reused from a cache, and users' WebAssembly functions are precompiled ahead of time so they start faster and run faster. We also have a service provider framework, which gives advanced features and core capabilities to user code. You can think of the service provider framework as a micro-kernel design, or a message-passing design. Why did we need it? Many features, such as io_uring and eventfd, are used by many of our legacy workloads, but WebAssembly does not provide them directly, so if users need them, we have to find a way to provide them. That's why we added the service provider framework.
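As a rough illustration of that micro-kernel, message-passing idea, here is a toy sketch of a host-side dispatcher, assuming a single dispatch entry point and a registry of backends. All the names are invented for illustration; the real framework exposes these as host calls into the Wasm sandbox:

```python
# Toy sketch of the service-provider idea: the host keeps a registry of
# backends, and the guest reaches them through one dispatch entry point.
class ServiceProvider:
    def __init__(self):
        self._backends = {}

    def register(self, name, backend):
        self._backends[name] = backend

    def host_call(self, service, method, payload):
        # Single entry point exported to the sandbox: the guest names a
        # service and a method, and the host routes to the real backend.
        backend = self._backends[service]
        return getattr(backend, method)(payload)

class KvStore:
    # Stand-in for one of the real backends (KV store, service discovery,
    # HTTP, message queue, io_uring, ...).
    def __init__(self):
        self._data = {}
    def put(self, payload):
        self._data[payload["key"]] = payload["value"]
    def get(self, payload):
        return self._data.get(payload["key"])

provider = ServiceProvider()
provider.register("kv", KvStore())
provider.host_call("kv", "put", {"key": "a", "value": 1})
assert provider.host_call("kv", "get", {"key": "a"}) == 1
```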
The service provider framework provides host calls, with backend support, to user WebAssembly functions. The backends we currently implement include a KV store, service discovery, HTTP request and response, and a message queue; we also provide things like io_uring through this interface, but I'm not going to cover those details in this talk, because there are too many to fit into a single talk. A few things are work in progress. This service provider framework came well before the component model; then the community proposed the component model and other new features for Wasmtime, so we decided we need to investigate a new design for how to adopt the component model and improve our service provider framework. We are also thinking about integrating the Dapr implementation, so there will be pieces from the community we can include directly, again to improve the service provider framework.

But there are pain points with our Wasm worker, two main ones we need to resolve. The first is that we have an increasing number of small WebAssembly modules. They are small, they are divided, they have dependencies on each other, and different Wasm modules have different versions. All of this gets tangled together and we end up with a dependency hairball, and developer productivity is greatly affected. The second is that it's hard to identify and resolve scalability and performance issues. We need to speed up cascading requests: for example, service A calls service B, and how do we speed those requests up? We also need to find application bottlenecks: if application A calls application B and application B runs slowly, it slows down the whole request.

So how do we solve these two problems? We have partial solutions. This is what we currently have, and it doesn't totally solve the problems. There are two pieces: one is the worker service, the other is combined deployments. I'm going to talk about both of these improvements.

First, the worker service. It manages and deploys functions in groups, with manifest files describing the functions. The worker service supports mixing WebAssembly functions with functions on other language runtimes. When the different functions communicate with each other, we use an IDL, or you can use JSON, to describe the worker-function interface for a request; you can think of the IDL as protobuf or something equivalent. We use Kubernetes-like concepts to organize the functions: developers write YAML files containing things like deployment, service, group, and snapshot, and use these to describe the relationships between the functions and how they get deployed (a hypothetical manifest is sketched below).
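Here is a hypothetical worker-service manifest, shown as the Python structure such a YAML file might deserialize into. The kinds and field names are invented to illustrate the concepts named in the talk, not our actual schema:

```python
# Hypothetical parsed manifest for a group of mixed-runtime functions.
manifest = {
    "group": "order-pipeline",
    "deployments": [
        {"name": "validate", "runtime": "wasm",   "module": "validate.wasm"},
        {"name": "persist",  "runtime": "golang", "image": "persist:1.4"},
    ],
    "services": [
        # IDL (protobuf-like) or JSON describing each function's interface.
        {"name": "validate", "interface": "idl/validate.proto"},
    ],
    "snapshot": {"enabled": True, "stage": "post-init"},
}

def modules_for(manifest, runtime):
    """Return the deployments in a group that use the given runtime."""
    return [d["name"] for d in manifest["deployments"] if d["runtime"] == runtime]

print(modules_for(manifest, "wasm"))  # ['validate']
```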
There are pros and cons to the worker service. It's a decent high-level solution for managing a group of functions, but it still needs a lot of extra effort to manage these application deployments: people need to write their YAML files and then organize them somewhere the platform can retrieve them.

The second improvement we made is combined deployments; you can see it in the graph here, and there's a sketch of the placement decision below. When a request comes in, it first goes to function A. In the original design, if function A needs to talk to function B, function A forwards the request to the gateway, the gateway does a reverse proxy and sends the request to another worker, and that worker brings up function B and returns the result. Those two middle steps are not really necessary if we can bring up the destination function on the same host. So with combined deployments, when function A needs to call another function and that function can be brought up on the same host, we simply bring up function B on the same host, process the request, and return the result. The original remote request becomes a local request, and we improve performance.

You can see the results here. In this case we were testing a JavaScript runtime on top of WebAssembly; the engine is SpiderMonkey, whose performance is generally higher than QuickJS. The results show that both the average and the long-tail latency are greatly reduced for small-size traffic: a remote request usually averages around 10 to 20 milliseconds, and with combined deployments we get it down to 2 or 3 milliseconds. As the size of the request increases we don't get as much improvement, but we can still see the difference in the long-tail latency.
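Here is a minimal sketch of that placement decision, assuming a simple can-this-host-run-it check. All names are illustrative, not our actual implementation:

```python
class LocalRuntime:
    def __init__(self, hosted):
        self._hosted = hosted          # functions this host is able to run
        self._instances = {}
    def can_host(self, name):
        return name in self._hosted
    def get_or_start(self, name):
        # Reuse a cached sandbox if present, otherwise cold-start one.
        return self._instances.setdefault(name, self._hosted[name])

class Gateway:
    def forward(self, name, request):
        raise RuntimeError("remote path not modeled in this sketch")

def call_function(name, request, runtime, gateway):
    if runtime.can_host(name):
        # The two gateway steps disappear: the remote request becomes a
        # local call on the same host.
        return runtime.get_or_start(name)(request)
    # Original path: the gateway reverse-proxies to another worker, which
    # brings up function B and returns the result.
    return gateway.forward(name, request)

runtime = LocalRuntime({"func_b": lambda req: req.upper()})
print(call_function("func_b", "hello", runtime, Gateway()))  # HELLO
```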
Okay, now the last part: the upcoming work, our investigation into a next-generation FaaS system. We discussed the solutions we have, but neither combined deployments nor the worker service totally solves the problem. What can we do to further improve, and to give developers a better development experience?

Let's talk about Service Weaver. The Service Weaver paper suggests three things that can improve cloud application development. First, monolithic application development is important: developers should be able to develop a single application binary, which is more convenient for them. Second, we need to distinguish between an application's logical and physical boundaries: people write components, and those components are logical boundaries, but they are not necessarily the physical boundaries in the actual running environment. Third, a distributed, smart runtime is needed for auto-optimization, auto-scaling, and debugging. But Service Weaver is Go-only, and in our case we want different languages to work together, so we need a fourth item: cross-language interoperability.

So how do we achieve all four items? Our answer is Wasm on Ray. How many of you know Ray, or have used Ray before? Okay, I'll give a short introduction to Ray. Ray's official website says that Ray is an open-source, unified compute framework that makes it easy to scale AI and Python workloads, from reinforcement learning to deep learning to tuning and model serving. That's a mouthful; Ion Stoica put it more simply: Ray is a distributed computing ecosystem as a service.

Here is one example of how you write Ray code (reconstructed in the sketch below). In Python you write a function; here the function square returns the square of the input value. You use the @ray.remote decorator to decorate this function, and later, when you call square.remote(), the execution is distributed by the Ray framework somewhere inside the cluster, and you just get back futures for the return values. When you then call ray.get, you get the actual values from those executions. That's what Ray looks like.

Next, the Ray architecture. Ray usually has one head node and multiple worker nodes; the head and worker nodes form a Ray cluster, and you can then submit your job to the Ray cluster and run your logic. There are two important concepts in Ray: tasks and actors. A task is stateless and an actor is stateful. Ray supports Python and also Java; in the Python case, a single function is a stateless task. When it executes inside the Ray cluster, each call's result does not depend on the previous call. A Python class, on the other hand, becomes an actor: when you have an instance and call its methods, that's an actor, and it's stateful, so each method call in Ray depends on the previous state.
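Here is a reconstruction of the square example from the slide, plus a small actor I've added here (it's not on the slide) to illustrate the task/actor distinction just described. This is standard Ray API:

```python
import ray

ray.init()

# Stateless task: each call is independent, and Ray may run it on any
# worker in the cluster.
@ray.remote
def square(x):
    return x * x

future = square.remote(4)   # returns immediately with a future (ObjectRef)
print(ray.get(future))      # 16: ray.get blocks until the actual value

# Stateful actor: the instance lives on one worker, and each method call
# sees the state left behind by previous calls.
@ray.remote
class Counter:
    def __init__(self):
        self.n = 0
    def incr(self):
        self.n += 1
        return self.n

counter = Counter.remote()
print(ray.get([counter.incr.remote() for _ in range(3)]))  # [1, 2, 3]
```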
Okay, now I'm going to talk about our Wasm on Ray architecture. There are two parts: execution and scheduling. For execution, we add a new Ray Wasm worker. What does it do? It's different from the FaaS worker; this one is a Ray worker, and it handles Wasm-specific task and actor execution requests, so when you submit your call, your code actually executes on these Ray Wasm workers. The runtime we use is WasmEdge. The reason we use WasmEdge is that the Ray project builds with Bazel, and we had a really hard time integrating Wasmtime: when cargo builds Wasmtime, there are build scripts that run at compile time, which is very hard to integrate into a Bazel project. That's why we chose WasmEdge. WasmEdge also provides a more featured plugin system, which is very helpful for us; later I will show a WASI-NN example of how we run inference using WasmEdge.

The second part is Ray host calls. We expose Ray operations as host calls, so that different languages, once compiled to WebAssembly, can use these host calls to execute tasks and actors. For scheduling, the Wasm runtime depends on Ray to schedule the task and actor executions. We also use Ray to profile execution results and improve the function-scheduling strategies, and depending on the scheduling results we can scale the tasks and actors to improve throughput.

Here is an example of Wasm on Ray; you can see the code on the right side. We have a function called task_func, which basically does some calculation and returns the result. On lines 18 to 20 we call it locally: we just call the function and get a result. To use Wasm on Ray we need to make some changes, and lines 27 to 32 show how to use Wasm on Ray to execute the function. On line 28 we call Ray, passing a function pointer and also a buffer, and that returns a future; on line 30 we wait for the future's result, populate the buffer with the actual result, and then print it out.
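The guest code in that example is C. As a language-neutral sketch of the same call shape (submit a function reference plus an output buffer, get a future back, then wait on it), here is a hypothetical Python rendering. The names ray_call and ray_get are invented; the real interface is the pair of Ray host calls exposed to the Wasm guest by the Ray Wasm worker:

```python
import concurrent.futures

_pool = concurrent.futures.ThreadPoolExecutor()  # stand-in for the scheduler

def ray_call(func, args):
    """Submit func for remote execution; return a future immediately."""
    return _pool.submit(func, *args)

def ray_get(future, out):
    """Block until the future resolves, then populate the result buffer."""
    out.append(future.result())

def task_func(a, b):
    return a * a + b               # the example's task: some calculation

result = []
fut = ray_call(task_func, (6, 6))  # like line 28: function + buffer -> future
ray_get(fut, result)               # like line 30: wait, then populate
print(result[0])                   # 42
```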
Next I'm going to show two demos: the call we just walked through, and a MobileNet inference demo. Because I can't access the environment inside our enterprise from here, I'm going to show a recording of running the example code. The first step is to compile Ray; this command just compiles Ray, including all of Wasm on Ray. If you build Ray from scratch it takes a long time, but since we had already compiled it before, it's very short here. After that, we build the example: under the example directory we build the example WebAssembly code. Finally comes the execution. Here's the command. It's long, and we'll improve that later, but you can see it invokes the worker, which is what we use to load the WebAssembly code and execute it inside the Ray environment. Here it's connecting to the Ray cluster, which is already running. You can see this first part calls the function locally and directly, and we print the result; then, when we call it remotely, that is, call Ray to execute the function, we get the same result. This is exactly the call I showed earlier; we're executing exactly the same code. After that, it terminates and also shuts down the Ray cluster.

For the second demo, I'm going to show the WasmEdge MobileNet inference demo. This part is not actually executing Ray code; it's here to demo how easy it is to run inference using WasmEdge, and in the future we want to make further improvements so we can run inference in a distributed way. Right now I'm just going to show how we run the inference. You can get this example from the WasmEdge WASI-NN examples. What it does is: you give it an image, and it identifies what is inside the image. These are the commands we use to run the inference; the arguments, shown here, are the model file and the input image. Because it's running on the CPU it's generally a bit slow, so I'll speed the recording up a little. You can see it loads the model and starts executing, and here's the result: it identified the banana inside the image. You can get this from the WasmEdge WASI-NN examples and run it locally; it's very easy to execute with WasmEdge.

Now I'll talk about in-progress and future work. We are planning to run Llama 2 inference based on llama2.c. Because it's written in pure C, we have work in progress to make it run on Wasm on Ray; then, when you run the inference, it can actually run distributed, with the data on different nodes inside the cluster. Second, as you can see, the code is not fully polished yet, so we want to see whether there is any way to further improve Wasm on Ray so it's easier for developers to take advantage of the Ray underneath. We are also planning to provide SDKs for different languages to support Wasm on Ray: right now we mainly support C and C++, and later we want to add JavaScript support and also Rust support. The repo is here, and the proposal has already been accepted by the Ray community. Okay, thank you for attending my talk. Any questions? Yes, please.

Okay, that's a good question: how many of our services are running on Wasm? Right now most of our workloads are still legacy workloads, either Go or other languages such as Java. We do see a portion of users switching to WebAssembly. These are mainly people who need to process message queues, usually message-queue users, and some of them found they want WebAssembly for their services because it can start very, very quickly compared to a container. That's the top reason people switch to WebAssembly.
Yeah, so these are different sizes of request. The first one is 1 KB data size; this is the request from function A to function B, and that's the size of the data. The top-left chart is with 1 KB of data, and as you go toward the bottom the data gets larger, up to a maximum of 4 MB, and we have some data missing there. You can see that with smaller data, because we run the destination function on the same host, it's very quick: there is no network latency, so it takes a few milliseconds to get a result. But if you need to send the request through the gateway and bring up the destination function somewhere else, the latency is generally much longer, and the long-tail latency can reach more than 200 milliseconds.

As I mentioned, for WebAssembly functions we have both cold starts and reuse of cached sandboxes. For us, a cold start is less than 10 milliseconds, and if we reuse a sandbox from the cache, it's less than 5 milliseconds. Compared to a container: when I say milliseconds, that's actual measured data, bringing up everything, and we see a few milliseconds, but for a container it's usually well over 10 milliseconds, definitely way longer than WebAssembly functions. Any other questions? Yes, please.

Rust? Yes. WasmEdge itself is written in C++, but it has a Rust SDK, so you can use Rust code to call WasmEdge. Oh, I see. Okay, so for Wasmtime, they use Rust everywhere; almost every component uses Rust. But in our case, for Wasm on Ray, the Ray project compiles using Bazel, and if you want to compile a Rust project using Bazel there are a lot of headaches: for example, when you use cargo to build Rust code, there are build.rs scripts that execute at compile time, and it's hard to integrate those two concepts; it's not easy. That's one main reason we chose WasmEdge. Another reason is that WasmEdge has plugin support: you can have different plugins, it's easy to write your own plugin, and it's extensible. Any other questions?

That's true, that's true. That's a matter between different teams. The team proposing the different framework, we are actually trying it out, but there are a few small issues we need to resolve. Our future goal will probably be to make it pluggable, so we can use either Tokio or our internal frameworks. Okay, great.
Thank you, everyone, for coming. Have a good day.