All right, are we ready? Yeah? Okay. Well, hello everyone. Thank you very much for joining us live at DevCon 2020. My name is Bowen Song, and I'm joined by my teammates Fuyao Wang and Yichen Ma to present a project that originated from the cloud computing class at Boston University early this spring and evolved over the summer. This is a project about a distributed database in which the distributed instances are replicas of one another and are kept in sync via a gossip protocol. You can envision each node playing the telephone game, much like the picture on the right-hand side there, so that they stay in sync with each other. And with the gossip protocol, they can adapt to any type of network topology and communicate the least amount of data necessary using state-of-the-art set reconciliation protocols. Next slide, please.

So, our agenda today: we will introduce some traditional databases and the common problems within them, and we will talk about distributed databases in general and the CAP theorem involved there. Then we will focus on our database, the Gossiper DB, and its architecture, and we will present some applications and future work. We did think about doing some kind of live demo session today, but setting up hundreds of database instances and visualizing the cascading data propagation of the gossip protocol is not really possible with our limited resources, so instead we will show performance graphs and diagrams to better characterize our work. Hopefully our presentation will be good enough for you to envision the entire process. Now I would like to hand it off to Yichen to start us off with traditional databases and their common problems.

OK. Hi, this is Yichen. It's my pleasure to share our work with you here. Databases are very common in our lives. For example, your cell phone needs a database to store your contacts, and your bank needs a database to store your information. It seems like we store everything in databases. But when the amount of data we need to store becomes large, for example for a global company whose customers are all over the world, a single server may not be able to bear such a large load. Also, a single database may go down because of a power outage, so customers cannot use the service anymore. These problems can cause considerable losses. In this case, a distributed database can solve these problems well. The definition is a database in which data is stored across different physical locations. In this picture, we can see three databases located in America, Africa, and Asia.

Distributed databases are already popular; these are some common ones today. TiDB, developed by PingCAP, is compatible with MySQL and has strong scalability. HBase is an open-source, non-relational distributed database. And the last one, etcd from CoreOS, is a lightweight distributed key-value database.

Because the databases are distributed, there has to be communication between the different database servers, which may cause problems. For example, say I have $100 in my bank account, and I make a transfer in Seattle, sending $100 to my friend. At the same time, I use another IP in Boston to complete another $100 transfer. If the two transfers occur at the same time, the systems in both Boston and Seattle will allow the transfer.
Because both servers will see that there is indeed $100 in my account, and that I have enough money to make the transfer succeed. But in fact, I have transferred $200 in total to my friend. So you see, something went wrong. Companies like banks are sensitive about data; they always want the latest data, otherwise they may lose a lot of money, because anyone could steal money from the bank the way I just described. So how can we solve this? To avoid this situation, the bank wants to know the latest state of my balance.

Speaking of this, I have to introduce the CAP theorem to you. Any time you mention distributed databases, you have to talk about CAP. CAP stands for Consistency, Availability, and Partition tolerance. Consistency means that we always get the latest data, no matter which node we read from; if the data cannot be guaranteed to be the latest, the server will return an error. In our case, the transfer in Boston would fail because at that moment my account is empty, and the server would return an error. For availability, the server cannot return an error; it must return data even if it is not the latest version. And partition tolerance means the system continues to operate despite some messages being dropped or delayed by the network between nodes. These are all desirable properties of a distributed database, but in reality we can only achieve two of the three at the same time.

So let's come back to the real world. For the bank problem I mentioned before, they want consistency. When client A sends a transfer request, the system will wait until the commit reaches both databases, in Seattle and in Boston. Then my balance becomes zero, which prevents the invalid transfer in Boston. During the waiting period, any other operation on my account is locked, which affects the availability of the system. But in some cases we don't need that consistency; on the contrary, we want availability. So in our project, we sacrifice consistency to get availability. That's the end of my part. I will hand the next part to Fuyao. Thank you.

Okay, hello everyone, I'm Fuyao. Thanks, Yichen. I'm a graduate student from BU, and I'm now working at Sena Health, Inc. Nice to see you all. I'd like to continue introducing our work. Our choice is to use the gossip protocol. In a gossip protocol, every node is equal; they don't need a leader to control them. Instead, they pass messages among themselves, spreading them periodically and only to their neighbors. As you can see in this GIF, messages spread like in a pandemic. That's good in our project, but don't do this in real life, okay? Next slide, please.

Designed this way, our database has its own advantages. First, it is decentralized. Nodes don't need to know everything about the cluster, which removes a large amount of work from each node; a node can fully concentrate on its own business, and as long as it is connected to the network, it is able to send data to the whole cluster. Then we also have scalability: the cluster becomes much easier to scale and can grow naturally. Also, we don't care about the temporary failure of a specific node; the other nodes keep working. And in addition, we have uniform convergence. The data spreads by word of mouth, like I just said, like a pandemic, and the data transfer speedup can be exponential depending on the topology.
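As a minimal sketch of the periodic, neighbors-only push just described, here is what one gossip round might look like. This is illustrative Go with made-up names, not the actual Gossiper code; a real node would exchange digests and reconcile in both directions rather than blindly copy entries.

```go
// A toy gossip round: each node periodically pushes entries to one random neighbor.
package main

import (
	"fmt"
	"math/rand"
	"time"
)

// Node holds a key-value store and a list of neighbor nodes.
type Node struct {
	ID        int
	Store     map[string]string
	Neighbors []*Node
}

// gossipOnce picks one random neighbor and pushes any entries it is missing.
func (n *Node) gossipOnce() {
	if len(n.Neighbors) == 0 {
		return
	}
	peer := n.Neighbors[rand.Intn(len(n.Neighbors))]
	for k, v := range n.Store {
		if _, ok := peer.Store[k]; !ok {
			peer.Store[k] = v
		}
	}
}

func main() {
	rand.Seed(time.Now().UnixNano())
	// Build a small ring: each node talks only to its two neighbors.
	const N = 8
	nodes := make([]*Node, N)
	for i := range nodes {
		nodes[i] = &Node{ID: i, Store: map[string]string{}}
	}
	for i := range nodes {
		nodes[i].Neighbors = []*Node{nodes[(i+1)%N], nodes[(i+N-1)%N]}
	}
	nodes[0].Store["hello"] = "world" // write to a single node

	// Periodic rounds: the entry spreads epidemically until every node has it.
	for round := 1; ; round++ {
		for _, n := range nodes {
			n.gossipOnce()
		}
		have := 0
		for _, n := range nodes {
			if _, ok := n.Store["hello"]; ok {
				have++
			}
		}
		fmt.Printf("round %d: %d/%d nodes have the entry\n", round, have, N)
		if have == N {
			break
		}
	}
}
```

On this ring of eight nodes, a single write at node 0 reaches all of them within a handful of rounds, which is the pandemic-style spread the GIF illustrates.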
So the inconsistency in a cluster becomes consistent very soon. Based on this, we named our project the Gossiper.

Okay, here's the way we implement the gossip protocol. First, peer-to-peer communication: the entire network relies on P2P communication, and we use set reconciliation to efficiently synchronize data between different data sources. Set reconciliation takes the union of the two sets and, in the case of a collision, takes the later version of an entry. The collision handler is parallel to the database design and can be adjusted to specific system settings such as data length, priority, et cetera. Second, we have heartbeats. Heartbeats from a node ensure the node is healthy, responsive, and contributing to the network. They are also used to determine whether it is necessary to reconstruct the network or circumvent a specific node; I'll give more examples about this later. Third, we use LevelDB, an open-source key-value store, as our bottom-level storage, so our database is designed to store key-value pairs. Finally, our neighbor list and membership structure are used for reconstructing the network topology in case any neighboring nodes fail and the network falls into a suboptimal state of communication. Next slide, thank you.

With this kind of design, we make our database as available as possible. For example, say we have four nodes in a cluster and they form a mesh network, all connected to each other, but suddenly two of them crash. In a strongly consistent database, we could not do anything until they came back, because the leader node cannot get heartbeats from a majority of its followers. But in the Gossiper, we can still write data to the two alive nodes, and whenever the two dead nodes come back, they get the data eventually. Also, about our member list structure: in a ring network topology, for example, if one of the nodes crashes, the cluster becomes a line. We know we don't want this, right? But our member list can recursively get the neighbor's neighbors and connect the node to them. That's what we want.

Okay, talking about set reconciliation: it is an efficient way to synchronize data between nodes, and it is our core technology. It determines the symmetric difference between the two sets and transfers only those entries, so both sides end up with the union. This minimizes the total amount of communication, and the reduction in communication reduces bandwidth consumption and the amount of time for data transfer. Like in this example: Alice and Bob both have A, B, and E, so we just need to send the D from Alice and the C from Bob. Bowen will talk more about the performance and details later. Next slide, please.

There are several synchronization protocols already implemented in the repo of my professor from Boston University, Ari Trachtenberg, such as CPISync, IBLT sync, Cuckoo sync, and full sync. That library is written in C++, and there is also a Go version in Bowen's GitHub. They are shown at the top; you're welcome to import them or contribute to them. Thank you, everyone. I'll hand it back to Bowen.

All right, thank you, Fuyao. So those are some examples of state-of-the-art set reconciliation protocols, and we could use any of them for our purpose of synchronizing data between the nodes. In our implementation, we used IBLT.
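To pin down the reconciliation semantics Fuyao just described, union of the two replicas with the later version winning a collision, here is a hedged Go sketch. The Entry type, the version counter, and the naive two-pass merge are illustrative assumptions; the real protocols (CPISync, IBLT) arrive at this same end state while communicating only data proportional to the symmetric difference, rather than scanning both full sets.

```go
// A toy model of set reconciliation's end state: union plus last-writer-wins.
package main

import "fmt"

// Entry is a versioned value; a higher Version means "later".
type Entry struct {
	Value   string
	Version uint64
}

// reconcile merges two replicas in place so both end up with the same state.
func reconcile(a, b map[string]Entry) {
	for k, ea := range a {
		eb, ok := b[k]
		if !ok || ea.Version > eb.Version {
			b[k] = ea // b is missing k, or a has the later version
		}
	}
	for k, eb := range b {
		ea, ok := a[k]
		if !ok || eb.Version > ea.Version {
			a[k] = eb
		}
	}
}

func main() {
	// Alice and Bob both have A, B, E; Alice also has D, Bob also has C.
	alice := map[string]Entry{
		"A": {"1", 1}, "B": {"2", 1}, "D": {"4", 1}, "E": {"5", 1},
	}
	bob := map[string]Entry{
		"A": {"1", 1}, "B": {"2", 1}, "C": {"3", 1}, "E": {"5", 1},
	}
	reconcile(alice, bob)
	fmt.Println(len(alice), len(bob)) // 5 5: both now hold A, B, C, D, E
}
```

Here the symmetric difference is {C, D}, so a real protocol would ship roughly two entries' worth of data, regardless of how many entries Alice and Bob already share.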
But let's take a closer look at their performance. First, look at this graph: we are looking at communication cost and time versus database size. In this experiment, we reconcile the differences between two databases on the same machine, to remove network variability. We fix the number of differing entries between the two databases and only increase the database size from left to right, as indicated by the X axis. The left Y axis is the number of bytes communicated between the two nodes in order to synchronize the two databases, also known as the communication cost. The right Y axis is the time cost of the synchronization operation. We are not counting the time needed to feed each entry from the database into our synchronizing data structure, since we can do that as we add elements to the database.

The solid lines in this graph are the communication costs for each protocol: magenta is the database size, black is IBLT, and blue is one variant of CPISync. We can see a key property of these set reconciliation protocols right away: the database size has zero impact on the communication cost. This is a log-log scale graph, so the database size changes drastically from left to right; with no change in the differences between the two databases, the bandwidth consumption remains almost constant, as you can see in the blue and black lines. The dotted lines are the time costs. For both protocols they should theoretically be correlated with the amount of differences between the two instances, but what you see there is mostly fluctuation. So in terms of speed we lean towards IBLT, which you can see is slightly lower, but for communication cost reduction we would definitely lean towards CPISync. Next slide, please, thank you.

All right, this graph has the same setup, except that we want to show the communication and time cost as we increase the number of differing entries between the two databases; that is now the X axis. We see linear growth in both communication and time cost as the difference grows. In this graph we fix the two database sizes, so at a certain point, the magenta line there, we would be better off sending the entire database over to reconcile the differences than using one of these protocols. So these set reconciliation protocols are best used for reconciling up to a certain percentage of differences, before the scheme of trading computation for communication cost becomes less beneficial, and we use this rule within our Gossiper DB as well. Next slide, please.
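The crossover rule just mentioned, falling back to a full transfer once the differences are a large enough share of the database, could look something like the following sketch. The function name, the per-entry byte sizes, and the per-difference protocol cost are invented for illustration; in practice they would be fitted from measurements like the graphs above.

```go
// A rough sketch of choosing between set reconciliation and full sync.
package main

import "fmt"

const (
	entrySize   = 128 // assumed bytes per key-value entry
	perDiffCost = 300 // assumed bytes of protocol traffic per differing entry
)

// chooseStrategy trades computation for communication only while it pays off.
func chooseStrategy(dbEntries, estimatedDiffs int) string {
	reconcileCost := estimatedDiffs * perDiffCost // grows with the differences
	fullSyncCost := dbEntries * entrySize         // grows with the database
	if reconcileCost >= fullSyncCost {
		return "full sync" // past the magenta line: just ship the database
	}
	return "set reconciliation"
}

func main() {
	for _, diffs := range []int{10, 1000, 60000} {
		fmt.Printf("db=100000 diffs=%d -> %s\n", diffs,
			chooseStrategy(100000, diffs))
	}
}
```

With these made-up constants the crossover sits at roughly 43% of entries differing, mirroring the point in the second graph where the protocol lines cross the database-size line.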
All right, so let's take a look at some of the situations that could best benefit from a distributed system with this design. Our database is best used for systems that can work with weak consistency and require fast write and read operations at any distributed node, especially systems that require fast scaling, allow P2P communication, and have inputs not prone to collision.

One of our top examples would be a CDN database, which stores cached versions of website content in multiple geographical locations around the world for users to access, reducing their latency. This is the type of system that can afford to care very little about consistency, as long as the data gets everywhere eventually. Just for a point of reference, not that long ago the synchronization between different instances of a CDN meta store took about 48 hours, give or take. And each meta entry in this case is unlikely to collide with others, because we're talking about different websites.

For our next example, we have cluster federation. This year we finally have a project from Kubernetes called KubeFed to federate a set of clusters, although it is still under a model that requires a hosting cluster, which is the one that joins the other clusters into the federation; that's the current design. As the project grows, the federated clusters will need a way to keep their data in sync with each other, and given the eventual consistency requirements of a Kubernetes cluster, the Gossiper could perhaps be one of the best-fitting databases for such a federation once it evolves into a fully decentralized system. Because when we talk about a federation, we really aren't thinking about one hosting cluster; we are talking about a federated, fully distributed set of clusters. With that as the main goal, perhaps this is worth a shot.

And at last, one applicable situation would be the smart grid. Smart grids are powered by a bunch of IoT devices connected over ad hoc connections on various network topologies; some people even call this "fog" instead of cloud, though the naming has become a bit derivative. These devices form a huge network spread out across a city, or the world: devices like traffic cameras, or sensors under bridges to watch water levels. They are required to send data back to a central place for aggregated processing, and to do that, the system could rely on the Gossiper to spread the data. Devices in remote places could send data to their neighboring devices, which then propagates back to the cloud for later processing. In this situation, data is individualized per device, so entries are very unlikely to collide, or really don't collide at all, because we can simply infuse the device ID into the entry. This type of system requires writes, if maybe not reads, at any node device, and it mostly cares about eventual consistency rather than a given entry reaching the cloud right away.

Of course, these are just some of the many applications we hope to inspire. This type of database is useful for the age of cloud and distributed systems, and we encourage you to involve such an idea in your future designs; and, hashtag, hopefully KubeFed might consider us.

So speaking of future designs: in our future work, we would like to investigate many more aspects. Next slide, please. We would like to investigate the best ways to build a cluster of fully distributed nodes and construct their neighbor and membership structures to create the most resilient system on different types of networks.
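As a rough illustration of that neighbor and membership reconstruction (the "recursively get the neighbor's neighbor" repair described earlier for a broken ring), here is a hedged Go sketch. The Peer structure, the Alive flag (which would be fed by heartbeats), and the recursive adoption rule are assumptions for illustration, not the actual member list code.

```go
// A toy neighbor-list repair: dead neighbors are replaced by their neighbors.
package main

import "fmt"

type Peer struct {
	ID        string
	Alive     bool // in a real system, derived from heartbeats
	Neighbors []*Peer
}

// repairNeighbors replaces every dead neighbor with that neighbor's live
// neighbors, recursing through chains of failures.
func (p *Peer) repairNeighbors() {
	fixed := make([]*Peer, 0, len(p.Neighbors))
	seen := map[string]bool{p.ID: true}
	var adopt func(q *Peer)
	adopt = func(q *Peer) {
		if seen[q.ID] {
			return
		}
		seen[q.ID] = true
		if q.Alive {
			fixed = append(fixed, q)
			return
		}
		for _, r := range q.Neighbors { // reach through the dead node
			adopt(r)
		}
	}
	for _, q := range p.Neighbors {
		adopt(q)
	}
	p.Neighbors = fixed
}

func main() {
	// Ring a-b-c-d; b crashes, so a should pick up c (b's other neighbor).
	a := &Peer{ID: "a", Alive: true}
	b := &Peer{ID: "b", Alive: false}
	c := &Peer{ID: "c", Alive: true}
	d := &Peer{ID: "d", Alive: true}
	a.Neighbors = []*Peer{b, d}
	b.Neighbors = []*Peer{a, c}
	c.Neighbors = []*Peer{b, d}
	d.Neighbors = []*Peer{c, a}
	a.repairNeighbors()
	for _, q := range a.Neighbors {
		fmt.Println(q.ID) // c and d: the ring closes back up
	}
}
```

In the four-node ring example, when b dies, a reaches through b to adopt c, so the ring stays a ring instead of degenerating into a line.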
As Fuyao previously pointed out, we have a way to make that kind of network resilient, but we definitely want to make more progress there, because as you might have seen, some of it is a fairly greedy approach. Next, we would also like to investigate the impact of network topology on system performance. For example, a ring network means each node can only communicate with two other nodes, whereas a mesh network interconnects all the nodes in the system. The ring network results in a system that takes longer to synchronize the whole network than the mesh topology, but the mesh network requires much more resources to establish the connections and is more likely to require the devices to be geographically co-located. And at last, we would also benefit from some kind of consensus algorithm as a plugin option, to satisfy systems that still require consistency, with some availability trade-offs. That is something we are willing and able to do, and hopefully it will be part of our future work. Thank you, next slide please.

With that, I would like to thank you and welcome you to check out our repositories if you would like to try them out. My gossip repo contains the Gossiper DB, and the CPISync repo is a library of state-of-the-art set reconciliation protocols, implemented directly from the papers themselves. I can see in the chat that Professor Trachtenberg and some of my lab mates are present at this talk; I'm very grateful and very happy that you are here. Hopefully the documentation within these public repositories can walk you through how to use them. Thank you very much. Next slide, please. Thank you. I would also like to acknowledge Ashutom Davis for coaching our presentation; I can see that he's here too. So I'm really happy that everybody who was somehow involved in this project is present at this talk. Thank you very much; we would be happy to take your questions now.

Thanks. Very, very interesting topic, and I was so excited because I have a lot of questions. When I first looked at the slides and video, it reminded me of a Stanford online course I studied previously. The course is presented by a professor; I was trying to remember his name... let me get his full name... I think his full name is... okay. Have you heard of him? He talks a lot about information propagation and how we can utilize graph algorithms to help with node communications. So it's related to the topic today, because that course has a whole series that I think is similar to this talk. Beyond that, the first question I have: in the first several slides, you talked about several transactions happening at the same time, right? So conflicts happen because they are not synchronized, no consistency. Have you thought about rollback, or about how the logs could communicate in that situation?

Right. So for any type of conflict, each system has its own strong suits and weak spots, right? For our system, the strong suits are availability and fault tolerance, and we relax consistency, which causes those, how should I say, conflict resolution issues.
But in our design, we take an eventual consistency approach: we keep the latest version of a given key-value pair. But yes, as we mentioned, we consider this kind of conflict resolution to be parallel to our project, because, as I also said, it could be part of a plugin. There are plenty of ways to resolve conflicts, but they revolve around what the system looks like. For example, in a cryptographic database we might look at the keys' number of trailing zeros, number of leading zeros, or how long a key is; those might be the criteria for choosing which one to keep. There are other examples that I don't have off the top of my head, but a lot of other systems have their own consensus. For example, Paxos and similar algorithms elect, well, they use some kind of majority to vote. We could potentially do that too, but it is not something we focused on this time. Like you said, though, it's important, and we definitely want to include it as part of our future work.

The reason I brought logs into the picture is that I was thinking it's fairly easy to implement with logs, because, I mean, if we inspect the logs, we can see how the data has been propagated, how it has been evolving, right?

Right, right. So the problem with that is: say I write one version to node A, and then my second version of changes I actually write to node B, right? At that point you cannot just look at the logs and know exactly what's going on, because by then you would need multiple logs and have to compare them together to say, hmm, maybe we should go with one of these. And note that there is no guarantee of telling which one is the latest in that sense, because we know the famous Lamport clock, right? We know a lot of famous work, including Google TrueTime, that tries to tell time at each distributed node. By simply looking at the logs and aggregating them based on the information there, even if you have all the logs and combine them, you might still not know which version you want. That's why, depending on the system, you might want a consensus algorithm, meaning whenever I decide, I go with this one, no further discussion; we're done, we just go with that one. We could also use the other approaches I mentioned, like the ones from cryptography, where we compare key values for certain traits and choose one depending on some rule. But in general, yes, you'd feel that if you can see the log, you should know exactly what's going on, but that may not always be the case. Especially once traffic gets directed to different nodes and you then try to aggregate the logs, that's when things get horribly complicated.

Okay, okay. Yeah, thank you. Never mind. Another question: when you talk about the gossip, have you thought about how messages can be backtracked? It's like forming a circle, forming a ring, right? If a node was reached the first time, why should it be reached a second time or a fourth time? That's backtracking. How did you avoid that kind of situation in the implementation?
Right, so this is one of the properties that comes free with set reconciliation. In set reconciliation, we're not literally sending the differences; we are sending an amount of data equivalent in size to the differences between the two databases. Let me take a very simple example with the IBLT case. IBLT in a nutshell is basically: I have a hundred keys, and the other node has a hundred keys. I XOR all of my keys into one entry, and the other node does the same. Now let's pretend they differ by only one key. If you XOR the two results, you get that one difference, right?

Yeah.

And if the sets are the same, the XOR comes out zero.

Right, right.

So that, in a nutshell, is set reconciliation. And the reason it won't backtrack is that if you've already communicated with me and I already have everything you have, then when we check, set reconciliation won't do anything. It will just say: you two have the same data, bye. And we're done.

But, right, you're using the XOR to check; is there any way to avoid that and just skip it directly?

Oh, yes. In our implementation we definitely have a way to calculate a digest of each database. That's basically what happens before we try to sync two databases: we calculate the digests, compare them, and if they're the same, we don't need to do anything.

Yeah, because in a big database, it could add a lot of complexity. I mean, if you don't know which way to go exactly, you just propagate along all the possible ways, but you don't know whether a node has already been updated, right?

Exactly, but that is the fun thing about set reconciliation: if you've already done it, you won't do anything. Basically, think about it this way: if you constantly send another node your XOR and the sets are the same, you're not really doing much damage, right? The other side always says, oh, there's no difference.

Okay, right, right, right.
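To make that XOR intuition concrete, here is a toy Go sketch, deliberately limited to a single differing entry. A real IBLT generalizes this by hashing keys into many cells so that several differences can be peeled out, but the matching-digests shortcut works the same way.

```go
// A toy of the XOR digest idea behind IBLT-style reconciliation.
package main

import (
	"fmt"
	"hash/fnv"
)

func hashKey(k string) uint64 {
	h := fnv.New64a()
	h.Write([]byte(k))
	return h.Sum64()
}

// digest XORs the hashes of every key in the set into one value.
func digest(keys []string) uint64 {
	var d uint64
	for _, k := range keys {
		d ^= hashKey(k)
	}
	return d
}

func main() {
	alice := []string{"A", "B", "D", "E"} // note: a toy with exactly ONE difference
	bob := []string{"A", "B", "E"}

	da, db := digest(alice), digest(bob)
	if da == db {
		fmt.Println("digests equal: nothing to sync") // no backtracking needed
		return
	}
	// With exactly one differing key, the XOR of the digests is its hash.
	want := da ^ db
	for _, k := range alice {
		if hashKey(k) == want {
			fmt.Println("missing on Bob's side:", k) // prints D
		}
	}
}
```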
Another question: do you have any specific graph algorithms used?

Sorry, graph algorithms?

Yeah, for your cases. I mean, in the broader view, it's more about how the information is propagated.

Right, so let me rephrase your question and see if I have it: you're asking whether we have theoretical bounds or expectations for how the data will propagate? Hello? Are you asking whether we have some analysis of how the data is going to propagate within our implementation?

Right, right. Because a long time ago I studied that course, and right now I can't remember the details clearly, but roughly, different algorithms have different trade-offs. So I don't know, for your cases, whether you care more about the workload, the complexity, the consistency, or the real-time responses, right? I don't know which one you prefer, and that's how you would choose which algorithm to use.

Oh, so you're actually asking which set reconciliation algorithm we use, or which best fits the situation?

Yeah, yeah. Which reconciliation algorithm you can best benefit from.

Right, so the essence of set reconciliation algorithms is trading communication cost for computation, right? Each of those algorithms has a strong suit and a weak suit. I mainly compared IBLT and CPISync in our graphs earlier, and you can see that while IBLT has a slightly faster computation time, its communication cost trade-off is not as efficient as CPISync's. With this in mind, you can envision two different types of systems. One type really cares about saving communication cost, the amount of bandwidth between nodes; for example, IoT devices that live on batteries and may have quite a distance over which to propagate their data. All of that contributes to why you might choose CPISync, because one of the more costly forms of energy consumption happens during transmission, moving data from one end to another. For that type of system, maybe CPISync works best. But within, say, Google Cloud, where there is so much bandwidth between the nodes, they might not really care about sending more data to one another, so we could go for IBLT, or even just send the entire dataset. Though essentially they would still care, because Google stores many petabytes of data and more, so the trade-off still applies. And if you dig deeper into set reconciliation algorithms, they all have boundaries, depending on the setup: past a certain number of differences between two instances, the communication cost is no longer reduced by the computation involved, which was also shown in one of our graphs. So essentially, when choosing an algorithm for a system, for each situation, we need to keep in mind the slightly different trade-offs. We would mostly design for the most common case, and perhaps also use some hybrid approach: past certain values, we change to a different protocol, or set reconciliation is not worth it anymore because everything differs and I'm better off just sending the entire database, right?

Right, right. Yeah, makes sense. Last question. Actually, I have all the questions! When I look at the plots, can anybody share the screen? The plots with the two Y axes.

Yeah, we have two plots: one for different database sizes, one for differences between the two instances.

How did you implement that one? Through MATLAB, or your cloud lab?

Oh, heck no. Let me go back to the graph... we can't go back to the graph. Do you mean where the data comes from, or how I graphed it?

How you graphed it.

Oh, the least exciting part. Graphing it is just MATLAB.

Oh, MATLAB. Because, I mean, I think you're using something like plotyy to get the two Y axes? There are a lot of approaches.

This is MATLAB; it cost me like 20 minutes to finish everything, so it's not a big deal. I could potentially share how I graphed it on GitHub if you want, but it's not that hard. If you just look up how to put a Y axis on the right side, it's there.
Yeah, I mean, beyond how you graphed it: back then I also tried to graph something, but the framework I was using just conflicted; when I compiled, the code complained, because that framework didn't allow something like plotyy to have two Y axes. At the time I gave up on putting the data onto the same plot with two Y axes and just split it out. To be honest, I gave up. But today when I saw it, I was very interested in how you did it. Beyond that, can you explain the graph again, this one?

Sure. Sorry about overshooting the time, but I just want to check in; you're the moderator.

I'm okay with hanging around.

Yeah, no problem. So if you take a look at this graph right here: it's a log-log scale, right? So there's a log scale on both axes. On the X axis, we have the database size; from left to right the database increases drastically, from, say, 100 megabytes to a huge amount. On the left Y axis is the communication cost, which is basically how many bytes of those checksums we're sending between the two databases to bring them in sync. The solid lines belong to the left Y axis, and you can see that the blue line and the black line stay constant throughout the increase in database size. The magenta line, which goes up, is the database size itself; I put it there so people have a visual reference.

Which line is your approach? Which dotted line?

Oh, they are the solid lines. The solid magenta is the database size. The blue one is the IBLT, which we used in our database. The black is CPISync, which we didn't use in our implementation, but it's there if we want to switch.

So comparably, as the database size increases, your approach, I mean, the number of bytes transferred stays constant, right? It didn't increase.

Yes, because set reconciliation only cares about the difference between the two instances. As in the XOR example I just mentioned: I don't care how big or small the database is, I just need to send that one XOR to the other side to get the one difference out, right?

Oh, yeah, yeah, because this way, if we can know which nodes we do not need to propagate to again, we can dramatically decrease the time, or decrease the steps, for propagation, right?

Right, exactly. On the first attempt we can see: are they the same? If they're the same, we don't need to do anything. In this graph, we still have a fixed number of differences.

Okay, and the time is also very low, right?

Yes. For the time, we only consider synchronization time; we don't count the time when we add entries into our data structure. We just care about doing the transfer and the other side figuring out the differences. That's the synchronization time, and it doesn't really change; what you see is more or less fluctuation, because it's all within a second or so, and my laptop is kind of old.

Where does the data come from?

Sorry?

Where does the data come from? What's the data for plotting this one?

Oh, for plotting this we actually used randomized data. This is a performance graph, right?
So we don't really go and grab a lot of data from somewhere else; we basically generated a bunch of random data to put inside as entries.

Okay, but it's more like, you have to build the graph first before you test it, right?

I'm sorry?

You have to build the graph for the data first, right? Before testing?

I think you mean build the structure first?

Right, right, the structure first, before testing.

Yeah, yes, of course. We structured it so that we could show these two types of graphs.

Okay, yes. Okay, thank you for the explanations.

No problem, thank you. Thank you for all the questions; those were really nice questions.

Yeah, thank you. I know you have more questions. I am very happy that a lot of very important people who paid a lot of attention to this project were here today.

Yeah, I highly recommend that professor's Stanford courses; just check out his courses. I forget the title, but I think they could be extremely useful for your situation, for the talk you gave today. It's pretty related.

Thank you, yes, I'd be glad to learn from him.

And he's a pioneer, a cutting-edge pioneer in this area, so it's going to be very useful.

Thank you. Yeah, you're welcome. Okay, thanks, let's end today. Thank you, Yichen, for staying.

No problem. Okay, thank you, bye.