Good afternoon. My name is Jun, and this is my colleague, Hugh. We're both from eBay, from the NuData team, which is responsible for developing and maintaining the distributed data infrastructure at eBay. Today we'd like to present NuGraph, a graph database service built on top of JanusGraph and FoundationDB.

Let me give you a very brief introduction to graph databases. A graph consists of vertices and edges. In the diagram on the right-hand side, each airport is a vertex. The connection between two airports is called a route, which is an edge. The edge can carry properties, like the distance attached to a route, and the vertex can carry properties as well, like the name that belongs to the airport vertex. A graph traversal, or graph query, starts with a set of vertices, follows the incoming or outgoing edges to move to the next set of vertices, and then repeats the examination from there. A query that starts from one vertex and moves to its immediate neighbors is called a one-hop query, and similarly you can have two-hop, three-hop, and multi-hop queries.

At eBay we have many applications that want a graph database. These applications include fraud detection, knowledge graphs, IT infrastructure management, and product recommendation. They require real-time query, near-real-time update, batch update, and also bulk loading at the initial time when the graph database is being constructed. Excluding the initial loading phase, most of these applications exhibit a read-heavy traffic pattern.

For the rest of the presentation, we will go over the NuGraph architecture, then the important features that we developed for the FoundationDB storage plugin. We will also identify some features and improvements that we would like to have from FoundationDB, and finally give the conclusion.

This is the overall architecture of NuGraph. You start with the graph application on the left-hand side; the application links with our NuGraph client library, which is developed by our team. The request then goes through gRPC and ends up at the NuGraph service tier, where it lands on a particular node. Inside this node, the request goes to JanusGraph and then down to the FoundationDB storage plugin. It is this storage plugin that translates the graph query into FoundationDB-specific key-value queries. The query then reaches the FoundationDB backend store, which holds both the data table and the index table for the graph database. Separately, we also have a management plane to manage the clusters and the schema for both the service tier and the FoundationDB cluster tier.

JanusGraph inherently supports transactions. However, it is up to the storage plugin to decide whether to implement the transaction logic or not. Today, none of the storage plugins that you can download from the JanusGraph public release support distributed transactions. BerkeleyDB is the one that supports transactions, but it is not horizontally scalable. The distributed transaction provided by FoundationDB is key to addressing the data inconsistency issues that we encountered while developing this graph database service.
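To make the one-hop and multi-hop traversals concrete, here is a minimal Gremlin sketch in Java, assuming the airport and route labels and the name and distance properties from the example above; the starting airport "SFO" and the method wrapper are purely illustrative.

```java
import java.util.List;
import org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.GraphTraversalSource;

public class TraversalSketch {
    // g is the Gremlin traversal source obtained from the graph instance
    public static void airportQueries(GraphTraversalSource g) {
        // One-hop: names of airports reachable from "SFO" by a direct route
        List<Object> oneHop = g.V().has("airport", "name", "SFO")
                .out("route")
                .values("name")
                .toList();

        // Two-hop: names of airports reachable with exactly one connection,
        // following two "route" edges from the starting vertex
        List<Object> twoHop = g.V().has("airport", "name", "SFO")
                .out("route").out("route")
                .values("name")
                .toList();

        System.out.println("one hop: " + oneHop);
        System.out.println("two hops: " + twoHop);
    }
}
```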
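At the JanusGraph API level, the transaction boundary that an application sees looks roughly like the following sketch; whether commit() is actually atomic across shards depends entirely on the storage backend, which is the point of the next few slides. The configuration file path and the distance value are illustrative, not NuGraph's actual settings.

```java
import org.janusgraph.core.JanusGraph;
import org.janusgraph.core.JanusGraphFactory;
import org.janusgraph.core.JanusGraphTransaction;
import org.janusgraph.core.JanusGraphVertex;

public class TransactionSketch {
    public static void main(String[] args) {
        // Illustrative properties file pointing at a FoundationDB-backed configuration
        JanusGraph graph = JanusGraphFactory.open("conf/janusgraph-foundationdb.properties");
        JanusGraphTransaction tx = graph.newTransaction();
        try {
            JanusGraphVertex sfo = tx.addVertex("airport");
            sfo.property("name", "SFO");
            JanusGraphVertex lax = tx.addVertex("airport");
            lax.property("name", "LAX");
            sfo.addEdge("route", lax, "distance", 337);
            tx.commit();   // atomic across shards only if the storage plugin is transactional
        } catch (Exception e) {
            tx.rollback(); // on a non-transactional backend, partially applied mutations may remain
        } finally {
            graph.close();
        }
    }
}
```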
I'm going to spend the next three slides identifying the key data inconsistency issues that we encountered, and to show you that distributed transactions are the way to tackle all of these issues in a very straightforward way.

The first one: you have two vertices and you add an edge between them. This is a very simple graph operation. A graph query usually requires forward traversal, from V1 to V2 in this diagram, and it also has to support backward traversal, from V2 to V1. As a result, the edge information has to be stored in two places: the first row belongs to vertex one, and the second row belongs to vertex two. But because these two vertices are sharded into different shards, the edge update becomes a distributed cross-shard update. In a traditional eventually consistent system, it is possible that the first row update succeeds but the second one does not. As a result, the forward traversal query from V1 to V2 will succeed, but the backward traversal to V1 will fail. If instead both row updates belong to a single distributed transaction, then either all of the updates succeed, and therefore both queries succeed, or both fail and never become visible to the external world, and therefore both queries show nothing. Either way, the result is still consistent.

The second one is the scenario where many different operations hit the graph database together. We have real-time query, near-real-time update, and batch update, and all of these go to a single database. With distributed transactions, all of these queries and updates can run simultaneously, and they all get consistent results. This is very different from the traditional lambda architecture. The lambda architecture proposes that you have a temporary DB to host the recent updates and a master DB, and you merge the data periodically from the temporary DB back into the master DB. That introduces at least two challenges. The first is that a query has to span two databases, the temporary DB and the master DB, and the application has to merge the data, so composing queries is very challenging. The second is the extra complexity in terms of data management: now you have the temporary DB and the master DB both serving online traffic, and in particular both of them can serve write traffic, so how are you going to merge the two in a consistent way? That is very challenging as well.
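As a minimal sketch of the first scenario, here is how the forward and reverse edge rows could be written atomically with the FoundationDB Java binding. The tuple-based key layout and the distance value are purely illustrative, not JanusGraph's actual edge encoding.

```java
import com.apple.foundationdb.Database;
import com.apple.foundationdb.FDB;
import com.apple.foundationdb.tuple.Tuple;

public class EdgeWriteSketch {
    public static void main(String[] args) {
        FDB fdb = FDB.selectAPIVersion(620);
        try (Database db = fdb.open()) {
            // Both the forward row (stored under V1) and the reverse row (stored under V2)
            // are written in one transaction: either both become visible or neither does.
            db.run(tr -> {
                tr.set(Tuple.from("edge", "V1", "out", "route", "V2").pack(),
                       Tuple.from(1250L).pack());   // e.g. a distance property
                tr.set(Tuple.from("edge", "V2", "in", "route", "V1").pack(),
                       Tuple.from(1250L).pack());
                return null;
            });
        }
    }
}
```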
The third slide shows you the failure handling in JanusGraph. Suppose you want to insert a node through JanusGraph. For each new node creation, JanusGraph assigns a new vertex ID and derives from it a set of mutations, where each mutation corresponds to a key-value pair update and each key is derived from that vertex ID. Now suppose that, in an eventually consistent system, some of these mutations fail and some succeed. The mutations that succeeded are persisted to the backend, and the client also sees an exception because some of the mutations failed. When the client receives the exception and retries, the retry gets a new vertex ID, which leads to a new mutation set. Suppose these mutations all succeed: now you have duplicated mutations in the backend database, in this case m1 and m2 alongside m1' and m2', and therefore potentially duplicated vertices. This is an inherent problem in JanusGraph if you do not have a distributed transactional backend. If you do have one, this failure handling is very simple: you just roll back, and the rollback takes care of the partial update failure for you.

So we chose FoundationDB as the new backend because of its distributed cross-shard transactions. In addition, it has horizontal scalability, cross-region high availability, low-latency key-value access, and intelligent shard management. This diagram shows how we deploy our clusters in the Kubernetes environment: we have three data centers, where DC1 and DC2 form the active region and DC3 forms the standby region.

Our FoundationDB storage plugin is based on the one provided by Ted Wilmes, who presented the JanusGraph-on-FoundationDB plugin at this FoundationDB Summit last year as well. We took what he had, fixed bugs, and added new features, including the async iterator, the read-only NuGraph service, and the read-only query optimization across data centers, among others. Now Hugh is going to present the important features that we developed in the FoundationDB storage plugin.

Hello. So Jun just introduced the NuGraph architecture and that we use FoundationDB as our storage backend. In the next slides, I will introduce some important features of the FoundationDB storage plugin for JanusGraph.

The first one is query result fetching using iterators. As you know, FoundationDB can query data in two modes, blocking or non-blocking. With blocking, all the results are returned in one blocking call; with non-blocking, we can use an async iterator to fetch the results one by one, on demand. The high-level JanusGraph query executor relies heavily on iterators to pull data on demand, which is very much in line with the non-blocking mode of FoundationDB. That's why we use the async iterator in our JanusGraph storage plugin. Just to give you an example, we can have a query such as g.V() to get all the vertices in the graph. The JanusGraph executor first gets the iterator, and then it loops through the iterator in a while loop to get each result and process it. With blocking, all the data would be fetched in the first call, which is not good for our service because it increases memory and CPU usage, especially when the JanusGraph executor terminates the iteration as soon as some condition is met. With non-blocking, the first call returns only the FoundationDB async iterator, and while looping through the iterator one by one, it fetches the results as needed. This is a very good solution; it keeps the memory and CPU consumption of our service low.
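Here is a hedged sketch of what consuming a range through the FoundationDB async iterator looks like with the Java binding; the "vertex" key prefix and the early-termination condition are illustrative, not the plugin's actual key layout or logic.

```java
import com.apple.foundationdb.Database;
import com.apple.foundationdb.FDB;
import com.apple.foundationdb.KeyValue;
import com.apple.foundationdb.Range;
import com.apple.foundationdb.async.AsyncIterator;
import com.apple.foundationdb.tuple.Tuple;

public class IteratorSketch {
    public static void main(String[] args) {
        FDB fdb = FDB.selectAPIVersion(620);
        try (Database db = fdb.open()) {
            db.read(tr -> {
                // Non-blocking mode: getRange() returns immediately; data is pulled batch by
                // batch as the iterator is consumed, instead of materializing the full result.
                AsyncIterator<KeyValue> it =
                        tr.getRange(Range.startsWith(Tuple.from("vertex").pack())).iterator();
                while (it.hasNext()) {            // waits only until the next batch is available
                    KeyValue kv = it.next();
                    // ... hand kv.getKey() / kv.getValue() to the query executor ...
                    if (shouldStopEarly(kv)) {    // early termination skips the remaining range
                        break;
                    }
                }
                return null;
            });
        }
    }

    private static boolean shouldStopEarly(KeyValue kv) { return false; } // placeholder condition
}
```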
The second feature is request context propagation. A request coming from the client can carry a request context with metadata such as the request ID, the client address, and the application ID. The service, upon receiving a request, extracts the context and attaches it to a thread-local, and the FoundationDB storage plugin can then read the request context from that thread-local and enforce conditions at that layer to handle the request differently. Note that in this way we bypass JanusGraph: there is no code change at the JanusGraph layer, yet we can still see the request context in the storage plugin.

So how is the context useful? First, with that context we can run the NuGraph service in read-only mode. In this mode, basically every write request from the client is rejected. You may think there is a simpler solution, just enabling read-only mode at the FDB database level. Unfortunately, that is not possible, because JanusGraph underneath requires some administrative write operations to the database, and if we enabled read-only mode at FDB, those writes from JanusGraph would be rejected too. So the solution is that we check for the request context in the FoundationDB storage plugin. If it exists, the request comes from a client, so we do not allow it to go to the FoundationDB cluster; otherwise, the request is from JanusGraph itself, and we allow it to proceed.

The second use case of the request context is that we enable prefetching of the transaction read version. Remember that every FoundationDB transaction needs to get its read version from the primary DC. This is good, it provides strong consistency, but it also causes the request to incur high latency, especially when the read comes from the secondary DC, since there is a cross-DC round trip just to get the read version. So we came up with an optimization where the client can optionally hint that the request is a stale read, meaning the client prefers low latency over strong consistency. The client annotates the request as a stale read, the service puts this into the request context, and the FoundationDB storage plugin picks up the stale-read annotation. Then, instead of going cross-DC to get the read version, it uses a version that was prefetched locally by a background thread. With this solution we get lower latency and higher throughput, which matters especially in our use case, where most of the workload is very read-heavy.
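As a minimal sketch of these two uses of the request context, the following Java classes show the thread-local propagation, the read-only check, and the stale-read version override. The class and method names (RequestContext, RequestContextHolder, StoragePluginHooks, prefetchedVersion) are hypothetical names for illustration, not NuGraph's actual code; only Transaction.setReadVersion() is the real FoundationDB API.

```java
import com.apple.foundationdb.Transaction;

// Metadata carried by a client request (illustrative fields only)
final class RequestContext {
    final String requestId;
    final boolean staleReadAllowed;
    RequestContext(String requestId, boolean staleReadAllowed) {
        this.requestId = requestId;
        this.staleReadAllowed = staleReadAllowed;
    }
}

// The service tier attaches the context here before calling into JanusGraph,
// so the storage plugin can read it without any JanusGraph code changes.
final class RequestContextHolder {
    private static final ThreadLocal<RequestContext> CTX = new ThreadLocal<>();
    static void set(RequestContext ctx) { CTX.set(ctx); }
    static RequestContext get() { return CTX.get(); }
    static void clear() { CTX.remove(); }
}

final class StoragePluginHooks {
    static void beforeWrite(boolean readOnlyMode) {
        // A non-null context means the mutation originated from a client request,
        // not from JanusGraph's own administrative writes, so it can be rejected.
        if (readOnlyMode && RequestContextHolder.get() != null) {
            throw new IllegalStateException("NuGraph service is running in read-only mode");
        }
    }

    static void beforeRead(Transaction tr, long prefetchedVersion) {
        RequestContext ctx = RequestContextHolder.get();
        if (ctx != null && ctx.staleReadAllowed) {
            // Skip the cross-DC call for a fresh read version and reuse one that a
            // background thread prefetched locally, trading consistency for latency.
            tr.setReadVersion(prefetchedVersion);
        }
    }
}
```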
To evaluate this, we took a real-world example and ran some numbers. This is an account-linking graph at eBay, where multiple accounts can be linked together through different linking strategies. In this example, you can see that account one can be linked with accounts two, three, and four via linking strategy one, or account one can be linked to account seven via linking strategy three. There are two types of vertices here, account and linking strategy, and one type of edge, which is the link. Note that after a multi-hop traversal, maybe six or seven hops in this case, all the accounts end up linked together.

In our deployment, as Jun said, we have a three-DC, two-region setup, and the total number of pods is nearly 290, so this is quite large. The database consists of 1.3 billion vertices and 1.8 billion edges, and we run the storage in triple-redundancy mode, meaning there are three copies of the data in each data center, which results in 16.8 terabytes of storage across the three DCs. There are three types of query that we experimented with: one-hop, two-hop, and three-hop.

The major query in our application is the three-hop traversal. Note that in the three-hop query we limit it to return only 50 results, to keep the latency consistent, because there may be some super node with a huge number of vertices that satisfy the condition. We measured the performance from the primary DC and from both DCs simultaneously. Here I will only focus on the three-hop traversal. With two DCs, we can get to 15,000 queries per second, and with that we get a p95 latency of less than 15 milliseconds. We are quite satisfied with this result. Note that we have not pushed FoundationDB to its limit in this case, because we only have 20 service nodes, and if we increase the number of threads, the service CPU would be exhausted.

Next, we will talk about some features that we would like to have. Overall, FoundationDB is great for powering our graph database backend, but we would like some feature improvements. The first one is the five-second transaction limit; maybe it could be extended to 30 seconds or one minute. The reason is that some queries, in particular multi-hop queries that also involve complex pattern matching and fan-out that can span thousands of child nodes, often hit the transaction-too-old error. With a longer transaction limit, we could support much more complex queries.

The second one is that we would like better storage management. We found this problem when we did a week-long resiliency test: we kept adding and deleting storage nodes, and we found that the storage utilization started at 55% full and went up to 75% full, with some nodes even going to 90% full. This also introduced a storage imbalance, and we reported it in the forum.

The system that we have developed so far is OLTP-oriented; that is why we have two regions, active and standby, to support the OLTP transactions. But if we want graph analytics, there are workloads that sometimes require batch retrieval over the entire database, and for that we would need another, third region just to host the graph analytics workload.

Finally, a note on bulk loading. For the example data set that Hugh mentioned, it actually took us more than two days to load the whole data set into our cluster. So the question is whether, for this initial data loading, there is a way to bypass the normal transaction path.

As a conclusion, we developed a graph database service called NuGraph, which is based on FoundationDB and JanusGraph. FoundationDB offers distributed cross-shard transactions, and this is the feature that is key to addressing the data inconsistency issues that we encountered in the graph database. These distributed transactions also greatly simplify client-side application development. With the large data set that we tested, we saw that FoundationDB delivers high performance in terms of both throughput and latency. With that, we conclude our presentation.