My name is Sehyo Chang, and I'm the CTO at InfinyOn. In this talk, we're going to look at how WebAssembly powers our open source project, Fluvio. As the agenda, we'll start with an overview of the Fluvio project, then go into the challenges we encountered while building Fluvio that we are trying to solve; spoiler alert, we think WebAssembly is a really powerful solution to those challenges. Lastly, we'll talk about future directions.

Fluvio is an open source event streaming platform; we open sourced it in 2019. Fluvio is written in Rust and designed as a cloud-native platform from the ground up. What the platform does, basically, is collect events from producers, then store and process them, and dispatch them to consumers and connectors. Producers can be written in many languages, such as JavaScript, Python, and of course Rust. Our connectors can collect data from databases or from a variety of other platforms such as Kafka. The Fluvio platform can persist those events and dispatch them to consumers and other data sinks.

So why build a platform for event streaming? Before we can talk about event streaming, we need to discuss one of the most significant trends in the world right now. The trend that matters to us is what we call the real-time economy: business processes and value chains that are entirely digitized and connected. We have made a lot of progress on digitization, as you can see with edge devices converting paper processes into digital formats. The challenge is the second part, connected. That is where trillions of dollars will be made or lost, and the most important part of being connected is latency. For example, Uber as a business wasn't possible until we figured out how to route ride requests to drivers in real time. It is no longer an option to wait hours or days to connect the data: either you adapt to the real-time economy or perish.

So what are the challenges of trying to connect this data? The biggest issue is that the existing data processing paradigm uses a request and response model. In that model, you persist the data into some kind of storage device or database, and the application issues requests and waits for responses. Most existing backends and microservices are built this way. The problem with the request and response model is that it is very inefficient: it takes minutes or hours to process the data and get it to the app. This paradigm is also known as data at rest, and it assumes the natural state of data is at rest, which makes it very difficult to reduce latency, since it's hard to move things at rest.

So how do we make progress? The better approach is to move to what we call data in motion, which assumes that instead of sitting at rest, the data is always moving. This improves efficiency, and it also maps naturally onto a lot of asynchronous processing. The best approach to data in motion is what we call stream processing. One way to visualize it is as a flowing stream of water, where the data is the water and the processing steps can transform the data in any way we need. The best part of stream processing is that it is composable: you can filter or map a stream, combine it with other streams, split streams, and merge them into many different shapes.
It is also very functional in style, which means a lot of developers find it easier to form a mental model around it.

So now you're convinced that data in motion and stream processing are the best things in the world. However, there's no free lunch: moving data is very expensive. Here's one map visualizing AWS data transfer costs. This is how AWS makes money from you, and your CFO will have a very interesting time digging into these numbers. Anything that goes beyond the VPC will cost you dearly, and it gets worse as you transfer through a NAT gateway, a load balancer, and various other proxies. So it can be very expensive, and if you don't carefully model the cost across your microservices and databases, it can be the difference between a successful project and shutting down your business.

Now let's look at how existing stream processors typically do it. They have some kind of ingest stream that collects the data and feeds it into different processing steps, such as filtering. The ingest step can compress the data into a smaller dataset, but it's still of a similar magnitude. The problem is that this large volume of data is going across the network, and network transfer, as you saw before, can be really expensive. You have to architect your network very carefully so that it is cost efficient and keeps latency low.

Next, there is the security problem. Whenever you move data across a network, you give hackers and other entities the potential to steal it. The zero trust model assumes there is basically no secure or trusted network; you have to operate with the mindset that your network is inherently insecure. And the more data you move, and the faster it moves, the harder it is to secure.

So how do we solve this? The solution is to reduce data movement. One way to do that is, instead of moving the data, to move the compute as close as possible to the store, or to the source of the data, and leverage that compute to shrink the data as much as possible. With Moore's law, compute keeps getting faster, while your network is not getting that much faster. By doing this we save bandwidth and reduce latency, while at the same time saving cost and increasing reliability. And since the data stays small, it's much easier to audit and to enforce security policies.

Now let's go back to the previous example and see how we solve the problems we encountered. Instead of sending the stream data to another node, we can combine everything into a single node, process the data there, and only send out the data that matters to the downstream stages. In this example, we actually gain a 50-to-1 reduction in bandwidth, and I'm sure your CFO will love that. If the network path goes over edge devices, the savings of course increase. We call this smart stream processing: instead of the stream being a dumb pipe, we enhance it with intelligence to make the network more efficient.

With edge devices, smart streaming can be even more powerful. One of the biggest issues in dealing with IoT devices is large data volumes. IoT devices don't usually have reliable or fast connections, so it's not feasible to ship all the raw data to the cloud for processing; it would be too expensive and take too long.
With smart streaming, we can move data processing to the edge and send only the relevant data to the cloud for further analysis.

To enable smart streaming, we need a compute infrastructure with the following properties. First, it needs to be portable, so it can run on different CPUs, from ARM32 up to powerful compute servers like Intel and AMD. Next, it needs to be efficient, to minimize the energy needed to run on small devices. Third, it needs to be sandboxed, so it can run in any environment. And lastly, it needs to support different languages; it's no longer sufficient to support a single language or a single family of languages.

Of course, you know the answer. We believe that Wasm, or WebAssembly, is truly what we call movable compute. Existing compute platforms don't provide the isolation and security guarantees that we require. A virtual machine is basically too heavy, and it doesn't provide portability. Containers are much better, but they don't provide true isolation; a lot of money is being spent on trying to lock down containers. The traditional approach of shipping a JAR offers portability, but it is tightly tied to the Java virtual machine and doesn't provide true isolation either. WebAssembly is a game changer because it provides true isolation along with portability, and it is battle tested because it runs in every browser. With WebAssembly, we now have the opportunity to move compute anywhere; this is a new type of infrastructure we can take advantage of to move everything to data in motion.

The bigger picture is that the portability and isolation WebAssembly provides allow us to reshape our compute infrastructure and data stack by flattening them into what we call a unified stack. Instead of thinking of separate compute and data stacks, we can think of just one unified stack. This stack can go from the edge to the cloud, span continents and different locations, or even reach into space. It allows us to disaggregate a lot of the existing monolithic microservice and data stacks and compose different, more flexible stacks for our needs.

Now let's go into how Fluvio leverages WebAssembly. Fluvio combines stream processing with powerful WebAssembly modules, which we call SmartModules, and it uses SmartModules as a fundamental part of its platform. One part of the platform is the control plane, which distributes SmartModules to the different parts of the platform. It can move a SmartModule into a connector, which can transform very different protocols into the shapes needed further downstream. A SmartModule can also go into the main streaming processing unit, the SPU, where you can transform or merge streams. And lastly, it can power our consumers, which can be edge devices such as IoT devices, drones, or autonomous vehicles.

Now let's go deeper into how Fluvio uses WebAssembly to run compute over the underlying streams. A SmartModule is a very opinionated WebAssembly module optimized for stream processing. We want SmartModules to be as easy as possible for developers to write, so we abstracted them into the simplest possible constructs. The green box is where the SmartModule sits in relation to the rest of the WebAssembly module. The blue part on the left side is the store binding; this is where the SmartModule intersects with the data from the store.
The store can have different implementations: it can be a file, it can be EBS or S3, or it can be a storage array. Regardless of the storage device, the store binding abstracts it into a unified API. On top of that, we layer a Rust API that exposes it within the WebAssembly module, and we are working on different language bindings to expose it to languages such as AssemblyScript or Python, and possibly other languages in the future.

Here is an example of writing a filter in Rust: a simple filter that keeps records based on whether they contain the letter 'a'. The interesting part is that we use one of Rust's language constructs, the procedural macro, to annotate the function and indicate what kind of binding we want to perform; in this case, of course, the filter. Next is the Rust API signature: the filter takes only a single argument, the record. The record is the basic primitive construct of the store bindings. The function can either return a boolean, true or false, depending on whether the record satisfies the criteria, or it can return an error; the Result type is the Rust idiom for indicating that an error can occur. This lets us provide the same API across the store bindings. Next, we convert the binary record into a string; because Fluvio streams can carry arbitrary data, you need to convert it into a basic data type for processing. And lastly, a simple expression checks whether the string contains 'a' or not. Note that the user doesn't have to worry about how to encode and decode: the procedural macro generates a wrapper with all the glue routines, and it's handled by our SmartModule framework. (A minimal sketch of this filter appears at the end of this section.)

Now let's go further down into how the store is actually bound to the SmartModule. By default, Fluvio streams are stored in an append-only file system; they are immutable. The streams could also be stored in different file systems depending on the infrastructure, but regardless of the implementation, streams are treated as immutable and ordered. In the file system implementation, records are grouped into batches, so when we send records to a SmartModule, the whole group of records is read in a single batch. In this case, the batch starts at file offset 1,000 and runs to 1,200: in one single shot, we read that file content and send it over to the SmartModule. First, it copies the block of memory into a Wasm memory block and then increments the last read position.

Copying a binary block into Wasm actually turned out to be quite an interesting challenge. At the time we built the initial implementation, the Wasm reference types and related specs were at a very early stage; there wasn't much in the way of implementations to look at, and the documentation was very skimpy. The approach we used was suggested by someone else who had run into the same problem: you implement alloc and dealloc functions inside the Wasm module, invoke the alloc function to reserve space in the Wasm memory, and then copy the blocks into that space. Of course, when you are done, those blocks must be deallocated. (A sketch of this pattern also appears below.)
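The talk walks through the filter on a slide rather than in the text, so here is a minimal sketch of what such a SmartModule filter looks like, modeled on the style of the Fluvio documentation. The exact crate and type names (`fluvio_smartmodule`, `Record`, the `#[smartmodule(filter)]` attribute) have changed across Fluvio versions, so treat this as illustrative:

```rust
// A minimal SmartModule filter sketch: keep only records whose
// value contains the letter 'a'. Crate and attribute names follow
// current Fluvio docs and may differ in older releases.
use fluvio_smartmodule::{smartmodule, Record, Result};

#[smartmodule(filter)]
pub fn filter(record: &Record) -> Result<bool> {
    // The record value is raw bytes; convert it to a UTF-8 string.
    // A decoding failure propagates as an error via `?`.
    let string = std::str::from_utf8(record.value.as_ref())?;

    // Keep the record only if it contains the letter 'a'.
    Ok(string.contains('a'))
}
```

The `#[smartmodule(filter)]` procedural macro is what generates the encode/decode glue mentioned above, so the function body only deals with plain Rust types.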
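The alloc/dealloc code itself isn't shown in the talk either, so here is a hedged sketch of how this pattern is commonly done. The guest side exports `alloc` and `dealloc` from the Wasm module; the host side is assumed here to use the wasmtime crate, and `instance`, `store`, and `batch` are hypothetical variables standing in for the engine state and the record batch:

```rust
// Guest side (compiled to wasm32): exported allocator hooks.
#[no_mangle]
pub extern "C" fn alloc(len: usize) -> *mut u8 {
    // Reserve `len` bytes and leak the Vec so the buffer stays
    // alive until the host tells us to free it.
    let mut buf: Vec<u8> = Vec::with_capacity(len);
    let ptr = buf.as_mut_ptr();
    std::mem::forget(buf);
    ptr
}

#[no_mangle]
pub unsafe extern "C" fn dealloc(ptr: *mut u8, len: usize) {
    // Rebuild the Vec from its raw parts so Rust frees the buffer.
    drop(Vec::from_raw_parts(ptr, 0, len));
}
```

```rust
// Host side sketch (wasmtime): ask the guest to reserve space,
// then copy the record batch into the guest's linear memory.
// `instance`, `store`, and `batch` are assumed to already exist.
let alloc = instance.get_typed_func::<u32, u32>(&mut store, "alloc")?;
let ptr = alloc.call(&mut store, batch.len() as u32)?;
let memory = instance
    .get_memory(&mut store, "memory")
    .expect("module must export its linear memory");
memory.write(&mut store, ptr as usize, &batch)?;
```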
After we copy these memory blocks into the Wasm memory space, we deserialize them into Rust structures so that the Rust program can perform the filtering function. Strictly speaking, this is not a plain filter: the filter function is really part of a filter-map, because the outer map takes the filter results and copies out the successful records, the ones that satisfy the criteria. This actually makes it easier to implement. Another way would be to pass the results back and forth to the host and do another round of processing there, but this turned out to be much easier, and it also creates a unified pipeline. Then, once we have filtered the records, we send them back to the host SPU, which sends them on to the downstream consumers.

The next binding is map, which is very similar to filter except that it can transform a record from one shape into a different shape. And because it shares the same logic as the filter-map, it's essentially the same process. The binding after that, aggregate, is different from the previous ones: with aggregate, we maintain an interim state and pass that state back into the SmartModule for further processing. With aggregation, you can implement things like sum, average, min, and max. (Sketches of both the map and aggregate bindings follow below.)
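As with the filter, the map code lives on a slide, so here is a minimal sketch in the same docs-style API (again, `fluvio_smartmodule`, `Record`, and `RecordData` are the names used in current Fluvio documentation and may differ by version):

```rust
// A minimal SmartModule map sketch: reshape each record by
// upper-casing its value while passing the key through unchanged.
use fluvio_smartmodule::{smartmodule, Record, RecordData, Result};

#[smartmodule(map)]
pub fn map(record: &Record) -> Result<(Option<RecordData>, RecordData)> {
    // Keys are optional in Fluvio records; keep whatever is there.
    let key = record.key.clone();

    // Decode the raw bytes, transform them, and re-encode.
    let string = std::str::from_utf8(record.value.as_ref())?;
    let upper = string.to_uppercase();

    Ok((key, upper.into()))
}
```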
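And here is a hedged sketch of an aggregate, the binding that threads interim state back into the module. The accumulator arrives as bytes from the previous invocation; this example keeps a running sum. The signature follows the Fluvio docs and, as before, may vary across versions:

```rust
// A minimal SmartModule aggregate sketch: maintain a running sum
// across records. The accumulator is the interim state the host
// passes back into the module on each invocation.
use fluvio_smartmodule::{smartmodule, Record, RecordData, Result};

#[smartmodule(aggregate)]
pub fn aggregate(accumulator: RecordData, current: &Record) -> Result<RecordData> {
    // An empty accumulator (first record) is treated as zero.
    let acc: i64 = std::str::from_utf8(accumulator.as_ref())?
        .parse()
        .unwrap_or(0);

    // Parse the current record's value and add it to the total.
    let value: i64 = std::str::from_utf8(current.value.as_ref())?.parse()?;

    Ok((acc + value).to_string().into())
}
```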
Here is the list of the store bindings we have implemented so far. Besides the ones we covered, there is array-map, which splits one record into multiple records. We have implemented the basic use cases, but there are many more interesting ones that we are working on for the future. A very good example is join, which combines multiple streams based on some criteria. Another is windowing, which lets you process records in time-based functions. Another interesting case is key-value, which lets you process records in terms of keys and values; this is critical for a lot of microservices and business logic. Then there are materialized views, which cache values for subsequent access, and transaction support, which can be combined with external processes to build long-running workflows. If you are interested in supporting other store bindings, or have different ideas, please reach out and participate in the Fluvio project.

As you can see, the current store binding is done in a very simplistic manner, so we are looking into using new WebAssembly capabilities such as reference types, WASI, and of course the component model to process these records and streams much more efficiently. This would allow us to build a sort of zero-cost stream processing and to process this data over the network in a much more efficient way.

Fluvio is still a young project, but we think that with the power of Wasm we can make stream processing easier and usable everywhere. Here's a list of the project docs and where you can get started. Contributions are always welcome. That's it.

Yes, we have a question. The question was, I guess: does this work like Kafka, is it reliable? Yes. Currently we implement at-least-once delivery, and we are working toward exactly-once in the future. We think our stream processing enables a lot more use cases than Kafka does. And being in Rust and enabling WebAssembly, we can spread stream processing to edge devices, where Kafka can't go. Because we are in Rust, our memory footprint is roughly one-tenth of Kafka's: with Rust we don't have a garbage collector, so we don't have to carry all of that extra overhead.