Databases, a database seminar series at Carnegie Mellon University, is recorded in front of a live studio audience. Funding for this program is made possible by OtterTune and Google.

Host: Thanks for coming to another Carnegie Mellon Database Group seminar. Today we're excited to have someone who was actually a visiting PhD researcher with us many years ago. Yingjun Wu is the CEO and co-founder of RisingWave Labs, the company building the RisingWave streaming system he's going to talk about today. Yingjun has been at a lot of places: he was at AWS Redshift, he was at IBM Research, and he did his PhD in databases at the National University of Singapore, which is when he came to visit us at CMU. So we thank Yingjun for being here. As always, if you have a question for Yingjun, please unmute yourself when he pauses, say who you are, and feel free to do this at any time — we want this to be a conversation, not Yingjun talking to himself for an hour on Zoom. So Yingjun, good to see you again, man. Thanks for being here.

Okay, great. Thanks for having me here. Today I'll be talking about RisingWave; the title is "Reinventing Stream Processing in the Cloud Era." I'm Yingjun Wu, and I'm currently the founder of RisingWave Labs, a startup. Before I started this company, I was a software engineer at AWS Redshift and a researcher at IBM Research. I earned my PhD from the National University of Singapore, and I was also a visiting PhD at Carnegie Mellon University.

Host: You were with us for a year. We have a paper together — and you didn't ask me to be on your thesis committee!

Oh yeah, that's right, because of the rule in our area: if you coauthored any of my papers, then you cannot be on my committee, because we want someone at arm's length to review the thesis. That's why I got Ellen, right? Ellen Pickett.

Host: Sure. Yeah, great.

Actually, I was pretty well known at CMU-DB, right? One of my papers — this is a paper we coauthored — used to have a pretty nice title. But unfortunately the VLDB chairs didn't really like it, and they threatened to reject our paper unless we changed the title. Well, I had to save my PhD, right? I had to obtain my PhD, and Andy really wanted to get his tenure, so we had to surrender.

And the second thing I was famous for at CMU-DB was being frequently blamed for the death of Peloton. Peloton, if you don't know this database system, was actually the first self-driving database system, built at the Carnegie Mellon database group. But the project was abandoned three years ago, in 2019. Many people, including myself, believe the reason was that the project shared its name with the popular bike company, Peloton. But Andy insists that it was me who killed the project: he blamed me for not writing any test cases for it. I still insist that I actually wrote a lot of test cases, and all my test cases passed — even though the test cases didn't really test anything, right? Anyway, because I was blamed for the death of Peloton, I knew I probably wouldn't get a good chance to stay in academia, so I left academia and joined industry. And now I'm running a company called RisingWave Labs, a startup founded in early 2021.
And we are currently building a system called RisingWave. It's a cloud-native SQL database for stream processing. In this company, I'm the boss, and I'm super paranoid about database testing: we now have a very strong, dedicated testing team responsible for testing the database. Thanks to that strict testing, RisingWave is now production-ready, and it has already been deployed in several use cases. Nowadays, to be honest, I'm actually quite famous for database testing: a few months back I was invited to the DBTest workshop at the SIGMOD conference to share my experience building industrial-strength database systems with industrial-strength testing.

So what is RisingWave? Well, I have so many keywords I could use to hype it, right? It's an open-source database. It's Postgres-compatible. It's a SQL database. It's cloud-native. It's built from scratch in Rust — we actually have a blog post describing why we wrote the entire system in Rust. But the only thing you need to remember from today's talk is that we are building a cloud-native streaming database: a cloud-native SQL database designed for stream processing. And this is definitely not a pitch talk — I will not show you a demo of how great RisingWave is, but I will describe a lot of the technical side of the system.

Today, let's focus on stream processing. Actually, I don't think I need to say much about what stream processing is. Conventionally, people use batch systems to process data. What they typically do is collect a large amount of data generated over a few hours or a few days, ingest it into batch systems such as Snowflake or Redshift — I used to work on Redshift, and I believe that's where Redshift is actually better than Snowflake — and run queries on top of them. In this setup, the result freshness can lag by a few hours or even a few days, meaning you cannot receive results instantly as the data comes in. Batch systems are very good for a lot of things, such as reporting, data mining, and machine learning — you can always use batch systems for data science. But the problem is that not everyone is happy with batch processing: many people nowadays want to make decisions instantly, based on fresh results. That's why we have stream processing.

Stream processing systems leverage the power of incremental computation to lower the latency from a few hours or even a few days down to a few minutes or even a few seconds. Instead of processing millions or billions of tuples in batches, streaming systems trigger computation as soon as a tuple is ingested into the system. That is, every time a new tuple comes in, the stream processing system refreshes the results, so users always see results generated from the most recent data. And obviously, the lower the latency, the higher the business value. With stream processing we can do dashboarding, monitoring, alerting — we can extract a lot of business insights from real-time data — and we want to ensure that the business value is high enough.
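To make the incremental-computation idea concrete, here is a minimal sketch in Rust — purely illustrative, not RisingWave code, and the names are made up — of a per-key running count that is updated as each tuple arrives, so the result is always fresh:

```rust
use std::collections::HashMap;

/// Minimal sketch of incremental aggregation: a per-key running count
/// updated in place as each tuple arrives. A batch system would instead
/// re-scan all accumulated rows to recompute the same answer.
struct StreamingCount {
    counts: HashMap<String, u64>,
}

impl StreamingCount {
    fn new() -> Self {
        Self { counts: HashMap::new() }
    }

    /// O(1) work per incoming tuple; returns the freshest count.
    fn on_tuple(&mut self, key: &str) -> u64 {
        let count = self.counts.entry(key.to_string()).or_insert(0);
        *count += 1;
        *count
    }
}

fn main() {
    let mut agg = StreamingCount::new();
    for key in ["ads", "ads", "news"] {
        // Each arrival immediately refreshes the result for its key.
        println!("count({key}) = {}", agg.on_tuple(key));
    }
}
```

The same contrast holds for heavier operators like joins and windows; what grows is the state that must be kept around between tuples.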
But the problem is that lowering the latency also means users may have to pay more for it, because it can cost more resources — and people really hate paying more money. That's why we built a system called RisingWave. RisingWave is a system that aims to achieve both low latency and low cost. The goal of building this system is not to be, say, 10 times or 100 times faster than existing streaming systems like Apache Flink or whatever system it is. The goal is to achieve cost efficiency.

Okay, then how can we achieve cost efficiency? You may think: probably we can just put RisingWave in the cloud, right? But before talking about the cloud, let's first discuss what cost means in stream processing. We can think of this question along three dimensions.

The first is normal execution. During normal execution, we want high performance — low latency and high throughput — while consuming as few resources as possible. It doesn't make sense to achieve low latency by consuming, say, 1,000 or even 10,000 machines.

The second dimension is failure recovery. Stream processing essentially means continuous query processing: a streaming system needs to continuously emit results to users, but machines can fail. If a machine failure occurs, we don't want to wait a few hours until the machine recovers, and we don't want to redo all the computation from scratch. What we really want is instant failure recovery: if a machine fails, we can recover from the failure instantly, without delay.

The third dimension is elastic scaling. One of the most challenging scenarios in stream processing is that workloads fluctuate. We may suddenly witness a workload burst, and when we confront a burst, we want to scale out instantly so the system can sustain it. But if only a few tuples are coming in, we want to shut down most of the machines to make sure we don't waste resources.

All three dimensions are related to a key technology called state management. So what is state management, and why do we need to manage state? This is because stream processing is stateful: we need to maintain internal state for different operators, including aggregations, group-bys, joins, windows, and many others. Whenever you process data using those operators, you are doing stateful stream processing.

Let me explain what state means in stream processing using the very classic ad monetization example. Imagine that you have two streams. One is the ad impression stream, containing events recording that an ad was displayed to a user. The other is the ad click stream, which captures when a displayed ad was clicked by the user. To charge for the ad, you have to match which ad impression actually led to a click.
In other words, you have to join these two streams on a common key — in this example, the ad ID, because it is the unique identifier of each ad and is present in the events of both streams. To support such a join operator, you have to maintain two pieces of state inside the streaming operator: a hash table for the impression stream and a hash table for the click stream. Every time a new tuple arrives on the impression stream, you check whether there is a match in the hash table for the click stream; if there is, you emit an output. Similarly, if a tuple arrives on the click stream, you check whether there is a match in the hash table for the impression stream, and emit the result on a match. (A minimal sketch of this join state appears at the end of this section.)

And such a streaming operator must be fully elastic, because, as I mentioned, the workload can fluctuate dynamically, and we have to scale the streaming operator in and out based on the workload. The problem is how we can manage these internal states so that stream processing can scale in and out elastically and make the best use of resources: how we can achieve high performance at very low resource consumption, how we can achieve fast failure recovery, and how we can achieve smooth elastic scaling.

To answer these questions, let's review the evolution of streaming systems. Our friends in Sweden and the Netherlands published a survey two years ago on the evolution of stream processing, and as shown in this figure, there are basically three stages in stream processing history. I call the first stage the single-node era. In this era, researchers and practitioners built streaming systems on a single machine. In this stage we saw systems like NiagaraCQ and the Stanford STREAM project, and on the commercial side, IBM System S, Oracle CQL, and Microsoft SQL Server StreamInsight.

After Google published the MapReduce paper, people shifted their attention from building single-node streaming systems to distributed streaming systems. I call this period the big data era. In this era we witnessed a burst of all kinds of streaming systems, including Storm, Samza, Spark Streaming, Flink, and many other popular streaming systems still used in today's environments. And as the cloud became more and more popular, people started investigating how to build streaming systems in the cloud. We have already seen systems like Ray, Arcon, and Neptune — and actually, RisingWave is one of them.

For today's topic, let's focus on state management. Let's first think about how we can manage state in the single-node era. In the single-node era, all we have is a single machine: if we want to build a streaming system, the only resource we can leverage is that one machine. To support continuous stream processing, the only place we can maintain state is local memory or the local disk. We have to put all the internal state there, and the state size is limited by the memory size as well as the disk size.
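To make the ad-monetization example concrete, here is a minimal sketch of the two-sided hash-join state described above — illustrative only, with hypothetical types, not RisingWave's actual operator:

```rust
use std::collections::HashMap;

/// Sketch of a two-sided streaming hash join: each side keeps a hash
/// table keyed on ad_id; a new tuple is inserted into its own table and
/// probed against the other side's table, emitting any matches.
#[derive(Default)]
struct StreamJoin {
    impressions: HashMap<u64, Vec<String>>, // ad_id -> impression events
    clicks: HashMap<u64, Vec<String>>,      // ad_id -> click events
}

impl StreamJoin {
    fn on_impression(&mut self, ad_id: u64, event: String) -> Vec<(String, String)> {
        let matches = self
            .clicks
            .get(&ad_id)
            .map(|cs| cs.iter().map(|c| (event.clone(), c.clone())).collect())
            .unwrap_or_default();
        self.impressions.entry(ad_id).or_default().push(event);
        matches // (impression, click) pairs emitted downstream
    }

    fn on_click(&mut self, ad_id: u64, event: String) -> Vec<(String, String)> {
        let matches = self
            .impressions
            .get(&ad_id)
            .map(|is| is.iter().map(|i| (i.clone(), event.clone())).collect())
            .unwrap_or_default();
        self.clicks.entry(ad_id).or_default().push(event);
        matches
    }
}

fn main() {
    let mut join = StreamJoin::default();
    assert!(join.on_impression(1, "imp-1".into()).is_empty());
    let out = join.on_click(1, "click-1".into());
    assert_eq!(out, vec![("imp-1".to_string(), "click-1".to_string())]);
}
```

A production operator would also have to bound these tables (for example, by window or watermark), which is exactly why join state can grow large and why state management matters.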
But if we confront a workload burst, then what will happen? The state size will increase, and if it keeps increasing and hits the memory or disk limit, then what happens? The only thing that can happen is that the system crashes — boom, it goes down, and we can no longer process any data. This is quite unfortunate.

But what can we do in the big data era? In the big data era, we have more machines: maybe 10 machines, maybe 100, or even 1,000 machines in a local cluster. And in the big data era, the machine — the node — is the minimum resource unit: if we run out of compute or storage resources, the only thing we need to do is add more machines. Given these characteristics, most of the systems built in this era adopt the so-called coupled compute-and-storage architecture, which means the compute moves along with the storage. Using this architecture, we can do so-called embarrassingly parallel execution. If you are not quite familiar with this term — I believe it was popularized during the big data, MapReduce era — it means we can maximize performance by sharding the data across different machines and processing it locally.

Back to the ad monetization example: if we observe that many more events suddenly come in, what we do is add more machines, shuffle the streams, and then process the streams on different machines. We can definitely scale in and out elastically by sharding the data and the state, putting them on different machines, and processing them there. But the problem is that resource utilization can be super low, because adding one or more machines means adding both storage and compute resources.

Okay, any questions? Ben, do you want to unmute yourself?

Q (Ben): If the data scale changes over time — more data comes in, or data expires out of your retention — how do you change the shard count and the number of nodes in the cluster?

Well, in the big data era, five or ten years ago, what you would do is manually change your Java code: in the code you specify the number of partitions you want, and then you can do the reshuffle. There is also a lot of research — and I believe a lot of industry projects implemented this too — on estimating how many machines you actually need and sharding the data based on that estimated machine count.

Q: Can you change it after the fact, or only at initial stream creation time?

At initial creation time you set the number of partitions; that was the norm in systems like Apache Storm, Flink, and even Spark Streaming. But nowadays I believe most systems support dynamic scaling, so you probably don't need to set it beforehand.
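To illustrate why resharding is painful in this coupled world, here is a tiny sketch — hypothetical, with plain modulo standing in for a real hash function: changing the machine count remaps keys to different machines, so their state must physically migrate before processing can resume.

```rust
/// Sketch of key-based routing in the coupled compute-and-storage world.
/// Each tuple is routed to a machine by hashing its key; changing the
/// machine count forces the state itself to be reshuffled.
fn shard_for(key: u64, num_machines: u64) -> u64 {
    key % num_machines // stand-in for hash(key) % n
}

fn main() {
    // With 3 machines, key 7 lives on machine 1 ...
    assert_eq!(shard_for(7, 3), 1);
    // ... but after scaling to 4 machines it maps to machine 3, so its
    // state must be migrated before processing can resume.
    assert_eq!(shard_for(7, 4), 3);
}
```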
Host: Again, Ben's asking: how do you handle this now?

Okay, nowadays we don't need to handle it manually — that was the big data era. For us, what we do is basically monitor the state size and decide whether we want to split the state or not. Yeah, makes sense? Okay, cool.

So, as I mentioned, in the big data era the problem is that the resource utilization rate can be super low, because every time you add more machines, you are not adding only storage resources or only compute resources — you are adding both. And it is really hard to consume all the resources provided by these machines; hence the resource waste.

Now let's talk about the cloud era. In the cloud, compute and storage resources are managed separately. In AWS, we have both EC2 and S3: EC2 is the compute resource, and S3 is the storage resource. If we run out of compute resources, what can we do? We can just buy more compute instances. And what if we run out of storage? The good thing is that we never run out of storage in the cloud, because in most clouds — AWS, GCP, Microsoft Azure — the storage service scales automatically, so we never need to worry about running out of storage.

Given this kind of infrastructure, we can adopt the so-called decoupled compute-and-storage architecture: we essentially build an execution engine on top of cloud storage, so that compute and storage can scale in and out independently and, effectively, infinitely.

Back to the ad monetization example: where does the naive solution in the cloud era maintain the state? The naive solution is simply to maintain the state in remote cloud storage — just put the state into an S3 bucket. If we run out of compute resources, we just add more EC2 machines, and all these EC2 machines can see the S3 data, so we can quickly distribute the workload across different machines. And what if we run out of storage? As I mentioned, we never need to worry about that, because S3 scales out by itself.

But the problem is that this naive solution is not realistic. The key problem is that if we put all the internal state in remote storage like S3, then every state manipulation becomes a remote access: every time we want to read or write data in the internal state, we have to trigger a remote access from EC2 to S3. And S3 is super slow — too slow to support low-latency processing. If you check S3's documentation, you will find that the latency can be as high as 100 or even 200 milliseconds. That means that if you want to process just 10 tuples in one second, in the worst case you may have to wait two seconds to finish processing those 10 tuples, which can be super expensive.
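The arithmetic behind that worst case, as a tiny sketch — the 200 ms round trip is the S3 latency figure just quoted, and the (pessimistic) assumption is one synchronous remote state access per tuple, with no caching or overlapping:

```rust
/// Worst-case time to process a batch of tuples when every tuple needs
/// one synchronous remote state access.
fn worst_case_secs(tuples: u64, round_trip_ms: u64) -> f64 {
    (tuples * round_trip_ms) as f64 / 1000.0
}

fn main() {
    // 10 tuples x 200 ms = 2 seconds, as in the talk's example.
    assert_eq!(worst_case_secs(10, 200), 2.0);
    println!("worst case: {} s", worst_case_secs(10, 200));
}
```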
And the second problem is that S3 charges on a per-request basis, which means that every time you GET or PUT data between EC2 and S3, you are charged by AWS. And I don't want — I believe nobody wants — to pay AWS that much money.

Then what can we do? The lucky thing is that we can maintain the data across different services, because AWS provides us with different services, such as EC2, EBS, and S3. EC2, with its local disk — which can be an NVMe SSD — can be viewed as volatile storage: it is super fast, but the data gets lost if it is not replicated. S3 is persistent storage: it is very slow — as I mentioned, the latency can be 100 or even 200 milliseconds — but it provides eleven nines of durability. And in the middle, between EC2 and S3, we have EBS. EBS can be considered semi-persistent storage: it is fast enough, but it only provides five nines of durability, which means it is still not truly persistent storage. But EBS is good enough for caching data.

Given these services with their different characteristics, what we can do is use the so-called LSM-tree structure to maintain the internal state across the different storage media. If you are not familiar with the terminology, LSM tree stands for log-structured merge tree; it is a data structure typically used for write-heavy workloads. With this architecture, once streaming data is ingested into the streaming service, it is not directly persisted into S3. Instead, it is first written on EC2, and periodically the data is compacted into the lower-level medium, EBS. As I mentioned, EBS is fast enough to serve as a cache between EC2 and S3. Eventually, the EBS data is further compacted into S3, the truly persistent storage. (A minimal sketch of this tiering follows at the end of this section.)

Well, it is quite straightforward to come up with such an architecture, right? But what is the key difference between the big data solution and the cloud solution for streaming systems — that is, between the coupled compute-storage architecture and the decoupled compute-storage architecture? In the coupled compute-storage architecture, we maintain the internal state on the compute nodes, and to ensure we can recover from failures instantly, we checkpoint the state directly into persistent storage. In the decoupled compute-and-storage architecture, the state itself is persisted in persistent storage, and what we do to optimize performance is add a caching layer at the compute nodes — between the compute nodes and the persistent storage — to make sure the performance is fast enough. In this way, since the state is stored in persistent storage, the state itself can serve as the checkpoint, because we can always leverage multi-versioning to reload the state. So what is the real difference between these two architectures in practice?
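Before comparing them case by case, here is a minimal sketch of the tiered layout just described — purely illustrative: in-process BTreeMaps stand in for the EC2 memtable, the EBS cache, and S3, and the background compaction between tiers is omitted.

```rust
use std::collections::BTreeMap;

/// Sketch of the tiered state layout: writes land in the top tier and
/// trickle down via compaction (not shown); reads probe fastest-first.
#[derive(Default)]
struct TieredState {
    memtable: BTreeMap<String, String>,  // EC2 memory: volatile, fastest
    ebs_cache: BTreeMap<String, String>, // EBS: semi-persistent cache
    s3: BTreeMap<String, String>,        // S3: slow but durable (stand-in)
}

impl TieredState {
    fn put(&mut self, key: String, value: String) {
        // New data is never written straight to S3; it enters the top
        // tier and is compacted downward periodically.
        self.memtable.insert(key, value);
    }

    fn get(&self, key: &str) -> Option<&String> {
        self.memtable
            .get(key)
            .or_else(|| self.ebs_cache.get(key)) // cache hit avoids S3
            .or_else(|| self.s3.get(key))        // cache miss: remote fetch
    }
}

fn main() {
    let mut state = TieredState::default();
    state.put("ad42".into(), "clicked".into());
    assert_eq!(state.get("ad42"), Some(&"clicked".to_string()));
}
```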
Let's start with the case where the internal state is small enough. If the state is small, then with the decoupled compute-and-storage architecture the state can be fully cached on the local disk. In that case, these two figures are essentially equivalent, especially in terms of performance, because all the internal state can be cached locally and there will be no cache misses; we can always read the data from the local disk and achieve low latency during query processing.

But what if we have to handle big state? Here comes the difference. If the state is big, then, as I mentioned, with the decoupled compute-and-storage architecture we don't really need to do anything as the state grows. But with the coupled compute-and-storage architecture, we have to provision many machines to ensure that all the state can be held on the compute nodes — and here comes the resource waste.

What about failure recovery? With the coupled compute-and-storage architecture, failure recovery is kind of straightforward: suppose this machine fails due to some accident; what we do is reload the state from the checkpoint in persistent storage and recover from that checkpoint. And how do we handle failure recovery with the decoupled compute-and-storage architecture? In this case, as I mentioned, the state essentially is the checkpoint. If this machine crashes, we don't need to reload anything from persistent storage before resuming: a new compute node can directly read the latest state from remote storage and continue processing.

And what about elastic scaling? Suppose we want to scale out from one machine to three machines. In the coupled world, as I mentioned, what we do is partition the state, shuffle it onto the three machines, route new tuples to those three machines, and process the state there. But if we adopt the decoupled compute-and-storage architecture, things change: we do not need to do that kind of shuffling, because all the state can be fetched from remote storage, and each machine only maintains a cache. We don't need anything like state migration, because the state can be loaded directly from remote storage.

So what are the challenging problems we confront when implementing this kind of architecture? There are actually several. The first one is LSM-tree compaction. Let's rethink this architecture: we maintain the internal state in an LSM-tree structure, and we need to periodically compact the data from EC2 to EBS, and then from EBS to S3. The problem is that compaction can steal resources from the running jobs. People may argue that we could use something like remote compaction, or Lambda functions, to offload the compaction workload onto remote machines.
But we have already done a lot of experiments and found that it doesn't really work, because either way — whether you ship the data from the local machine to a remote machine or do the compaction locally — you incur a high CPU utilization rate, which hurts stream processing performance. So we still need to fine-tune the compaction mechanism to ensure that compaction does not trigger a huge performance drop.

The second challenging problem we encountered is cache misses. As I mentioned, EC2 and EBS only hold a cache. If we cannot find the data in the cache, we have to go to S3, and S3's latency is super high. How can we mask this high latency from the users? In RisingWave we do two things. The first is out-of-order processing: if we need to process two tuples and it is semantically safe, we can change the order in which we process them. The second is that we overlap the fetching from S3 with computation, so the user is not aware of the S3 latency, because we are always doing some computation in the meantime.

Q: Is this like coroutines, or do you just mean one thread doing computation and one thread blocked waiting on the request?

Yeah, something like that — you have to fetch from S3, from remote, so it's not exactly coroutines. Just to be sure, yeah.

And another challenge we encountered is how to implement the so-called state-as-checkpoint. If the state serves as the checkpoint, it essentially means we have to maintain multiple versions of the state, because you have to ensure that you can recover from a failure, and to do that you have to be able to reload a previous version of the state — the checkpoint.

So, I've listed several questions, right? Karini, do you want to unmute yourself and ask your question?

Q (Karini): My question was: if a computation is split across multiple compute instances, is there a need to share the state? For example, would the hash table need to be shared between different compute instances working on the same join query?

Well, I think it totally depends on how you define sharing. Let me go back a slide — okay, I can use this one. Basically, these two machines do not share the computation: they process different data, but they have the same view of the entire state. They have a global view of the internal state. Say the state contains three numbers: one, two, three. Machine A can see one, two, three, and machine B can also see one, two, three. They have the global view of the state, but they still process different tuples: machine A can process tuple one, and machine B can process tuple two.

Q: Can the one, two, three in the state change to one, two, three, four?

Yeah — basically, if you insert a four, it will be persisted into the remote storage.

Q: So in that case, would the update need to be propagated to every instance that has a local cache?
No, it will not be replicated into the different caches, because, as I mentioned, even if you insert a four, machine A will still process tuple one and machine B will still process tuple two — maybe machine B will also process tuples three and four, but four will not exist in machine A's cache.

Q: So that means the sharding strategy decided before I start my query computation — I have to stick with it, no matter how much data I see? For example, I could suddenly start seeing a lot more fours and threes, so the third machine would want to offload some of its work. Is that a concern?

So your question is what happens if we suddenly see a lot of fours come in, right? In that case, we have to scale out. Let's go back to this slide. Basically, we want to scale out this machine. The strategy is not to split the cache, because all the state can be found in persistent storage. The only thing we need to do is reload the data from persistent storage, instead of shuffling or splitting the cache. We do not split the cache.

Q: I see. Okay, thank you.

Okay — and there are some of our engineers here helping answer questions too. Anyway, back to this slide: how do we implement state-as-checkpoint? It means we have to maintain multiple versions and still be able to find the old versions, so that once a machine fails, we can reload exactly the checkpoint we want from persistent storage. To implement this, we adopt multi-version concurrency control, and we use an epoch to identify versions. I will not go deeper here into how we implement the MVCC. (A rough sketch of the general idea follows at the end of this section.)

All right, so to summarize: the key idea we adopt in building RisingWave is to leverage the decoupled compute-and-storage architecture, with two strategies. The first is to use remote storage to maintain the internal state. The second is to add a caching layer to reduce the latency.

Now, the question is: where is the new idea? To be honest, it is definitely not a new idea — the same idea was already explored ten years ago. Ten years ago, the most popular streaming system in the world was Apache Storm, and if you read the blog posts from Nathan Marz, you will find that Trident essentially implemented this architecture: it adds primitives for doing stateful, incremental processing on top of any database or persistence store. And inside Google, they also had two systems: one is MillWheel, and the other is Dataflow — you can consider Dataflow as built on top of MillWheel, its next generation. Both of these systems use Bigtable as the external store to maintain the internal state. So essentially, they also adopted remote storage to maintain the internal states.
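The talk deliberately skips the MVCC details, so the following is only a rough guess at the general shape of epoch-based versioning — hypothetical names, not RisingWave's actual storage format: every write is tagged with the epoch that produced it, and recovery reads the state as of the last committed checkpoint epoch.

```rust
use std::collections::BTreeMap;

/// Sketch of epoch-based multi-versioning: (key, epoch) -> value, with
/// a BTreeMap keeping all versions of a key adjacent and ordered.
#[derive(Default)]
struct VersionedState {
    versions: BTreeMap<(String, u64), String>,
}

impl VersionedState {
    fn put(&mut self, key: &str, epoch: u64, value: String) {
        self.versions.insert((key.to_string(), epoch), value);
    }

    /// Read the newest version of `key` with epoch <= `at_epoch`, i.e.
    /// the value as of that checkpoint.
    fn get_at(&self, key: &str, at_epoch: u64) -> Option<&String> {
        self.versions
            .range((key.to_string(), 0)..=(key.to_string(), at_epoch))
            .next_back()
            .map(|(_, v)| v)
    }
}

fn main() {
    let mut s = VersionedState::default();
    s.put("k", 1, "v1".into());
    s.put("k", 3, "v3".into());
    // Recovering from the checkpoint at epoch 2 sees v1, not v3.
    assert_eq!(s.get_at("k", 2), Some(&"v1".to_string()));
}
```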
But then, about six years ago, the LinkedIn folks wrote a popular blog post — also quite influential in the streaming world — about how stream processors handle the state-access problem. They called this remote-store idea the traditional model for building streaming applications, and the local-store idea the new one. Afterward, they published the well-known Samza paper, in which they claim that Samza can handle state efficiently, improving latency and throughput by more than 100 times compared to using remote storage.

So what you can see is this: before the big data era — maybe ten or fifteen years ago — people talked about maintaining state in remote stores. Then, five years ago, people started thinking about maintaining state in local stores. And most recently, we have found that local storage may still not be the best place to maintain state: we want to maintain state in remote storage again. Why did this happen? Why didn't the existing popular streaming systems, such as Samza or Flink, adopt this idea? I believe it is all about cost. If you think about the data processing systems we built five or ten years ago, there was no S3, no EBS, no EC2 — or at least they were not that popular at the time. So if you wanted a scalable caching service or a persistent storage service, you had to build those services on your own, and as a company you had to eat the cost of building such complicated systems. That's why they decided to just maintain the state on local machines, so as to achieve high scalability. But in the cloud era, things change. The cloud infrastructure already offers us the services we need to maintain the state across different tiers. And in the cloud, you can imagine that the resources are unlimited. Given this, the objective is no longer just to achieve low latency or high scalability, but to achieve high efficiency — a high resource utilization rate — while still delivering high performance, low latency, and high scalability. That's why we built RisingWave from scratch, and why we didn't build RisingWave on top of any existing system.

Okay, that's all for my talk. Again, RisingWave is a cloud system, so if you want to try out RisingWave Cloud, you can check out risingwave.cloud and enter your invite code to get early access; we do a manual check when you input the invite code. RisingWave is also under the Apache license, so you can definitely try it out on your own machine or your own clusters. Okay, thanks.

Host: Great. We have plenty of time if the audience has any questions for you. Go for it.

Q (Doug): Hey, my name is Doug. I'm not a student or an alumnus, but I have a question.
As more projects embrace decoupling compute and storage — which has come up in a few of these sessions now — I find myself remembering something from Joe Armstrong, along the lines of: why not just send a small program to the data? And I wonder if the pendulum can swing too far in the other direction. When I hear about the compaction challenges you're having, I wonder if there's merit in the storage layer having some computation. I can see that being a lot easier on something like OpenEBS, but virtually impossible with AWS EBS. So do you feel like your choice of cloud technologies constrains your architecture in that sense?

So basically, the question is whether we can leverage so-called computational storage, right? I think it's definitely possible. In Russia there was a project — I forget the name — and if you check that database's documentation, you will find that it essentially leverages FPGAs to accelerate processing. But the problem is that the cloud doesn't really offer that kind of capability to customers, or to vendors like us. So it's not really possible for us to adopt that kind of FPGA-accelerated storage to do processing; we have to use conventional resources such as EC2. I think that's a privilege of the cloud vendors — they have that power. If they have FPGAs, then probably AWS can leverage them, but probably not us. That's the problem: on the hardware side, we cannot.

Host: S3 does expose some filters, but I suspect the filters are too simple for what you're trying to do here. You're not trying to read a bunch of data off S3 the way Redshift or Snowflake are; you're trying to maintain state for your customers.

Yeah. That kind of functionality is the feature called S3 Select. I haven't checked S3 Select for a while, but S3 Select was designed for Parquet and CSV formats. In our system, the internal state is maintained in our own format, because we want to make sure the data processing is fast enough. We do support maintaining data in an open format if it is persisted in our database — that's possible. But S3 Select is still quite slow compared to using your own state format. So, mainly for performance reasons, we don't adopt it.

Q: I have a question about your storage tier. From the slides, it sounds like the backing store for your state in the storage tier is EBS. Did you notice a difference between EBS and using local SSDs for the storage tier nodes?

Okay, good question. Originally we did not have this layer: the first version of RisingWave didn't really have the EBS part.
We only had EC2 and S3, and we leveraged the local disk. Version 0.1 kept all the data in the in-memory cache, and version 0.2 allowed the computation to spill state to the local disk. But the local disk is still not large enough, right? So we adopted EBS — that's the main reason. And EBS is actually fast enough, so I think it's quite reasonable to maintain a tiered storage layer there.

Q: That makes sense. Thank you.

Q: Hi, Yingjun. In the beginning you mentioned that RisingWave is Postgres-compatible. I was wondering: is SQL the language you use to query it, and what portion of SQL do you support right now? In particular, I was curious about multiway joins — do you support more than two-way joins?

Well, it means that it's Postgres wire-protocol compatible: we implement the wire protocol ourselves — a lot of it lives in our pgwire module. It means that as long as your client can talk to Postgres, it can talk to our database system, because we speak the same language: we adopt the same protocol, the Postgres wire protocol, and the semantics are the same.

Q: What is the SQL dialect, though?

Postgres, yeah. So it's the Postgres protocol plus the Postgres SQL dialect and grammar. But we are not built on top of Postgres — I know some databases are built on top of Postgres, for example Redshift — we built from scratch in Rust.

Host: But Amo was also asking: how much of Postgres do you actually support, then? Of their SQL — multiway joins?

Multiway joins — well, in the general case we can never support everything. But first, I'm not sure what kind of joins you mean. Joining more than two tables? We have that. In terms of the detailed implementation — whether you get binary joins or multiway joins — I think it really depends on the optimizer.

Host: Right — SQL doesn't specify that.

I understand. Yeah, that's right.

Host: All right, any other questions? Oh, I think there was a question here: if I want to choose one or two papers to read to understand what is unique about RisingWave relative to the literature, which papers would you pick?

I haven't written a paper for a long time — not since I graduated. But we are thinking about writing one and submitting it to an industry track, possibly early next year.

Host: Victor, it's not specific to RisingWave, but I think a bunch of the papers he listed in the talk — the Dataflow paper, the Samza paper — are enough.

Oh yeah, that's right — basically the theory behind it. There are a bunch of papers that talk about stream processing and how to maintain state. MillWheel and Dataflow are definitely among them.
And there are several other papers — I believe there are several papers from TU Berlin; they wrote really good surveys about stream processing.

Q: For example, timely dataflow, something like that — what is really the main theory behind it?

Timely dataflow — well, we didn't really implement timely dataflow.

Host: So this is related to my last question. The early stream processing work you showed from the 2000s was groundbreaking in setting up the algorithmic methods for doing sliding-window computations and making the stateful operators. And then Naiad in particular was just a different approach to this problem as well — MillWheel to some extent too, but that was more about the architecture. What do you see that's different in what RisingWave is doing in terms of the algorithmic methods for computing these sliding-window functions, or is the unique aspect the decoupling of storage and compute?

Well, we do not adopt any unique computation algorithms. You mentioned timely dataflow and differential dataflow: if you look at the background of those papers, they were mostly designed for unifying batch processing, stream processing, incremental computation, and graph computation — published around 2012, I think. But we are not focusing on those applications. We are more focused on SQL, because we believe the problem is not the expressiveness of the system; the problem is the cost of the system. So, to answer your question: we do not adopt any very unique computation algorithms or computation architectures.

Host: The contribution is the engineering.

The engineering, in the cloud. Yeah, that's the unique part of the system. Okay.