So we're going to do this without presenter notes, so I don't know how well this is going to go, but thank you for your patience. My name is Rick. I'm here with my esteemed colleague Ray. We work on the engineering team at Segment, and we're going to talk about how we saved a ton of money migrating from Aurora, which is nice but expensive, to FoundationDB, which is quite efficient and also nice. We're going to cover a few things. First, what Segment does, since that seems to be a question many people have. Then the use case of identity resolution, and the data model we use on top of FoundationDB. Then Ray is going to talk about how FoundationDB saves us money, how we got to production, and some potential future use cases for FoundationDB at Segment.

So what is Segment? Segment is a platform you use to instrument your mobile apps and websites, to collect data about how your customers are using those products, and forward it through Segment, which standardizes it, makes sure it's clean of PII, and forwards it on to downstream analytics tools, A/B testing tools, email campaign management tools, lots of stuff. The idea is that Segment helps you deliver these amazing customer experiences. The common situation is that people are using lots of tools and connecting things all over the place. There's a lot of inconsistency, data is going to the wrong place, and it's full of stuff like PII where it doesn't need to be. You may recognize some of the logos here; I hope I'm not offending anybody.

We call this CDI, customer data infrastructure. The next few slides are from our marketing team, but they help illustrate what we do, because it's a bit complicated. The idea is that we provide a single point of customer understanding and action. We help you collect your data from various endpoints, down to the Roku box sitting on top of your TV, synthesize it, standardize it, so clean it up and make sure it's all in the same format, and then power all your tools from that same data. This slide has nothing to do with the actual architecture; it's just how the thing works conceptually. Data flows in from the sources, we clean and synthesize it, and we push it out to destinations. This timeline is really only here to give you an idea of where we are relative to FoundationDB's open source release in 2018, which is when we started to get seriously interested in using it. Around the time we launched this product called Personas, in 2017, we built this system on top of Aurora, and we started to look at FoundationDB afterwards.

So what is identity resolution? This is the use case for FoundationDB at Segment; it sits right in the center here. What we do is build profiles of the users who are using our customers' products, and those users often have various identities across the product that we want to unify into a single picture. So we build an aggregated graph of data associated with a unique user, which simplifies analytics and provides a unified view in your downstream tools. A common scenario is a user goes to a website, or opens an email and gets redirected to a website, and all we know about them is some random ID we've generated and put in their cookie.
So you can see this anonymous ID. You might have opened an email, and through the data that we collect over time, we're able to build a unified ID. You may log in, you may complete a transaction, and all of those events have different identifiers associated with them. The goal is to provide a unique ID that connects all those other identifiers together. We do this using two different types of data, one we call mappings and one we call merges. A mapping is how we connect what we call an external ID, like an email address, a database ID, various other things, to what we call a segment ID, that universal ID. So in this example, the external IDs would be this user ID, 8Y, et cetera; the email, sloth@segment.com, is another external ID; and we're trying to map those all to this universal ID, use_123. Now this comes with a problem, which is that we generate these new quote-unquote universal IDs all the time, and we want to be able to, at a future time, merge them together when we realize they're actually connected via some other piece of data. That's what we call a merge.

Now let's talk about what this actually looks like on top of FoundationDB. We don't use a mapping layer; we decided to go directly to the KV store. Part of that is because all of our services are written in Go, and there aren't many layer implementations for Go, so we use the KV interface directly. We use subspaces to structure our keys, and we have two types of KV pairs. The first type we call objects, which are basically surrogate keys pointing to JSON-serialized objects, and the second type we call indexes, which index those objects based on the fields inside them. We do mostly point gets on the object KV type; the value is a JSON-serialized object, and when I say surrogate key, I mean a generated key that doesn't mean anything outside of our system. Then we have the index KV type. We use subspaces to segment the compound identifiers, we generally look up by key prefix, so we use get-ranges for these, and the suffix of the key is the actual target, the thing we're mapping to. So one prefix may identify multiple targets in the system, and the value is usually empty. This is roughly what it looks like. Everything is prefixed by tenant, because we're a multi-tenant system. One element might be an identifier type, then an identifier value; that's the range prefix, and it's connected to this mapping ID, j2lk et cetera, garbage.

So let's talk about the first type of data, mapping data. This connects an external ID like rick@segment.com to a unified segment ID. This is what it looks like, and it's very simple: a tenant prefix and a mapping ID point to the JSON object with the information. Sometimes we want to list all of these; we run various reports on them. Say you go to your interface and you want to see all of your mappings. People literally want to do that for some reason. For that we have a big index of all the mappings that can be ranged by tenant. We also have an ID index for the segment ID field, so we can search specifically for a segment ID and point directly at a mapping ID. There may be multiple of one or the other.
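To make that concrete, here is a minimal sketch of what those object and index KV pairs could look like with the FoundationDB Go bindings. The subspace names, the JSON fields, and the IDs here are illustrative assumptions, not our actual schema.

```go
// Illustrative sketch only: subspace names, fields, and IDs are assumptions, not the real schema.
package main

import (
	"encoding/json"
	"fmt"

	"github.com/apple/foundationdb/bindings/go/src/fdb"
	"github.com/apple/foundationdb/bindings/go/src/fdb/subspace"
	"github.com/apple/foundationdb/bindings/go/src/fdb/tuple"
)

// Mapping is the JSON-serialized "object" value stored under a surrogate mapping ID.
type Mapping struct {
	SegmentID string `json:"segment_id"`
	IDType    string `json:"id_type"`
	IDValue   string `json:"id_value"`
}

var (
	objects = subspace.Sub("mappings")          // (tenant, mappingID) -> JSON object
	byExtID = subspace.Sub("mappings_by_ext")   // (tenant, idType, idValue, mappingID) -> ""
	bySegID = subspace.Sub("mappings_by_segid") // (tenant, segmentID, mappingID) -> ""
)

// writeMapping stores the object KV plus its two index KVs in one transaction.
func writeMapping(db fdb.Database, tenant, mappingID string, m Mapping) error {
	_, err := db.Transact(func(tr fdb.Transaction) (interface{}, error) {
		val, err := json.Marshal(m)
		if err != nil {
			return nil, err
		}
		// Object KV: point-gettable by surrogate key.
		tr.Set(objects.Pack(tuple.Tuple{tenant, mappingID}), val)
		// Index KVs: the target (mappingID) is the key suffix, the value is empty.
		tr.Set(byExtID.Pack(tuple.Tuple{tenant, m.IDType, m.IDValue, mappingID}), []byte{})
		tr.Set(bySegID.Pack(tuple.Tuple{tenant, m.SegmentID, mappingID}), []byte{})
		return nil, nil
	})
	return err
}

// lookupByExternalID does a prefix get-range on the index and returns the mapping IDs.
func lookupByExternalID(db fdb.Database, tenant, idType, idValue string) ([]string, error) {
	res, err := db.Transact(func(tr fdb.Transaction) (interface{}, error) {
		kvs := tr.GetRange(byExtID.Sub(tenant, idType, idValue), fdb.RangeOptions{}).GetSliceOrPanic()
		var out []string
		for _, kv := range kvs {
			t, err := byExtID.Unpack(kv.Key)
			if err != nil {
				return nil, err
			}
			out = append(out, t[len(t)-1].(string)) // last tuple element is the mapping ID
		}
		return out, nil
	})
	if err != nil {
		return nil, err
	}
	return res.([]string), nil
}

func main() {
	fdb.MustAPIVersion(610) // use the API version that matches your client library
	db := fdb.MustOpenDefault()

	_ = writeMapping(db, "tenant_1", "j2lk...", Mapping{SegmentID: "use_123", IDType: "email", IDValue: "rick@segment.com"})
	ids, _ := lookupByExternalID(db, "tenant_1", "email", "rick@segment.com")
	fmt.Println(ids)
}
```

The important bit is that the index key ends with the mapping ID, so a single get-range on the prefix returns every target for that external ID.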
Okay, so merge data. This is for when we finally realize, hey, there are actually multiple mappings that connect to the same ID and we need to bring them together. We create what's called a merge entry, and those associate two quote-unquote unified segment IDs. This is what a merge object looks like. Similar to the mapping, it's a tenant prefix with a generated merge ID, a surrogate key. There are generally some other fields in here, some timestamps and so on, but the only fields that really matter make it a graph relationship: a from and a to, and those are the segment IDs. So we have a from index, very simple, the from field to the merge ID, and a to index, the to field to the merge ID. Very straightforward. All right, now I'm going to hand over to my friend Ray, and he's going to explain the rest of the gory details of the system.

Thank you, Rick. Can you all hear me? How about now? Better. Okay, cool. So I'm going to talk a little bit more about the path to production, about operating the system in production, and then some future use cases. These are essentially the needs of this workload. As you can imagine, it's a high-throughput workload. We project the system will need hundreds of terabytes of space on disk, replicated; it's not that large right now, more on the order of tens of terabytes, but it's rapidly growing. And the key piece is we need serializable isolation. It's also a very read-heavy workload; the read volume is much higher than the write volume, as you'll see in a moment.

A little bit about how data flows through the system. The identity resolution system is teed off of a stream processing system that handles the core data ingestion and routes that data to all the destinations we integrate with. Data comes in starting from the left-hand side of this graph: it comes into our API and into our core ingestion pipeline. There are a lot of systems not featured here; we're just focusing on the identity service part. When data first comes into the ingestion pipeline it starts in Kafka, and then we have some systems performing validation, schema enforcement, things like that. For any customers who are using Personas, that data is then teed off into the identity ingress topic, and we have a service called the identity resolver. That service scales in and out horizontally as needed, depending on the volume of traffic, and it queries the identity service write path, on the top left there. That's where, for every message coming in, we go to FoundationDB to see whether we've seen this identity, this piece of data, before, and create the mappings and merges as necessary. On the bottom half we also have a read-only path, which serves read-only requests to FoundationDB for other downstream systems that are looking up identities.

Okay, so as we mentioned, we originally built the system with Aurora, and it actually worked pretty well. There were some nuisances for the developers, and FoundationDB is actually a bit better here. One of the problems is you might do a read from a read replica and determine, oh, I need to go insert a record, then you go to the master to insert it, and another message has already come in and inserted it, so then you have to go back. It was a bit of juggling. Aurora worked, but the primary problem was the cost.
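With FoundationDB's serializable transactions, that read-replica-then-master juggling collapses into a single check-then-insert transaction. Continuing with the illustrative subspaces and Mapping type from the sketch above, and a hypothetical newSurrogateID helper, a get-or-create could look roughly like this:

```go
// Continues the sketch above; newSurrogateID is a hypothetical ID generator.
func getOrCreateMapping(db fdb.Database, tenant, idType, idValue, newSegmentID string) (string, error) {
	segID, err := db.Transact(func(tr fdb.Transaction) (interface{}, error) {
		// Look for an existing mapping for this external ID.
		kvs := tr.GetRange(byExtID.Sub(tenant, idType, idValue), fdb.RangeOptions{Limit: 1}).GetSliceOrPanic()
		if len(kvs) > 0 {
			t, err := byExtID.Unpack(kvs[0].Key)
			if err != nil {
				return nil, err
			}
			mappingID := t[len(t)-1].(string)
			raw := tr.Get(objects.Pack(tuple.Tuple{tenant, mappingID})).MustGet()
			var m Mapping
			if err := json.Unmarshal(raw, &m); err != nil {
				return nil, err
			}
			return m.SegmentID, nil // seen before: reuse the existing segment ID
		}
		// Not seen before: create the mapping and its indexes in the same transaction.
		mappingID := newSurrogateID()
		m := Mapping{SegmentID: newSegmentID, IDType: idType, IDValue: idValue}
		val, err := json.Marshal(m)
		if err != nil {
			return nil, err
		}
		tr.Set(objects.Pack(tuple.Tuple{tenant, mappingID}), val)
		tr.Set(byExtID.Pack(tuple.Tuple{tenant, idType, idValue, mappingID}), []byte{})
		tr.Set(bySegID.Pack(tuple.Tuple{tenant, newSegmentID, mappingID}), []byte{})
		return newSegmentID, nil
	})
	if err != nil {
		return "", err
	}
	return segID.(string), nil
}
```

If two resolvers race on the same external ID, FoundationDB's conflict detection aborts and retries one of them, which is essentially the property we were hand-rolling against Aurora's replica and master.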
So this is just some baseline cost math for Aurora, the back-of-the-napkin math we did when we started to consider FoundationDB. A terabyte and a billion IOPS a month is not too expensive, roughly a thousand dollars with a single instance, but even a medium-sized workload grows to tens of thousands of dollars really quickly, and this is only about 20 terabytes and a hundred billion IOPS a month. We determined that we could run essentially the same workload on FoundationDB for about a fifth or a sixth of that cost, with roughly the resources here: around 20 i3.xlarge instances, plus some stateless and transaction nodes.

Okay, so a little bit about our path from prototype to production. The prototype was pretty quick; we were able to stand it up fast, but we hit a couple of hurdles on the way to production. The first one was everyone's favorite game. We weren't getting the performance we expected based on some of the benchmarks we saw online, and we weren't sure: was it something we did wrong? Was the system under-provisioned? It turned out it was mostly us doing things wrong; we basically weren't using the API correctly, we weren't pipelining requests, things like that. I'll show a small sketch of what pipelining means in a moment. So we quickly improved performance, but some things were confusing. For example, the default role recruitment confused us a bit. We found the tuning cookbook, and that led us down a path of evolving our cluster configuration. When we started, this is what CPU utilization looked like with the default settings, with everything unset, and you can see the CPUs are all over the place; it was a little difficult to reason about what was going on. We went through a bit of an evolution, but we ended up with a heterogeneous, three-tier cluster setup. We run a stateless cluster, a transaction cluster, and a storage cluster, or rather ASGs, auto scaling groups, since we're running all of this in AWS. That allows us to run C5 instances for the higher-compute workloads and i3 instances with NVMe storage for the storage and transaction tiers, and it lets us scale these tiers independently.

A little bit about how we get this all provisioned in production. We use Terraform. After we solved the performance issues and were getting the results we expected, we started templatizing the infrastructure. As I said, we run three auto scaling groups, we run a container orchestration system on top of that, which we'll talk about in a moment, and on top of the instances we also run a bunch of supporting tools that we wrote in-house and hope to open source and share with the community. We run the fdbserver processes directly on the boxes, and I think we run fdbmonitor as well. So just a quick look at the Terraform code: at the top of our installation everything is broken down by environment, and all the FoundationDB config is under there. We have an inputs file and a main file.
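On the pipelining point: the Go bindings return futures from reads, so you can issue a batch of gets up front and only block when you need the values. This is a generic sketch of that pattern, not our resolver code:

```go
// Generic sketch of pipelined reads with the Go bindings, not our actual resolver code.
// Each tr.Get returns a future immediately; blocking is deferred until the values are needed,
// so the reads overlap instead of paying one network round trip per key.
func readManyPipelined(db fdb.Database, keys []fdb.Key) ([][]byte, error) {
	res, err := db.Transact(func(tr fdb.Transaction) (interface{}, error) {
		futures := make([]fdb.FutureByteSlice, len(keys))
		for i, k := range keys {
			futures[i] = tr.Get(k) // issue every read up front
		}
		out := make([][]byte, len(keys))
		for i, f := range futures {
			out[i] = f.MustGet() // only now do we wait for the results
		}
		return out, nil
	})
	if err != nil {
		return nil, err
	}
	return res.([][]byte), nil
}
```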
Back to the Terraform files. Running through this quickly, we can specify the instance type, the number of processes to run, and any tuning we want to do, knobs and things like that, and then we actually provision each individual tier. Here we're looking at the storage tier and the stateless tier, and then the cluster level is where we have our actual implementation. For container orchestration we're not on Kubernetes right now, we're on ECS, which is AWS's implementation, and then we have some of our services down there. So you can stand up a new service by creating a .tf file in your environment, and it will stand up the auto scaling groups, stand up the container orchestration system, and then run the supporting services that we wrote on top of those systems.

A little bit about how we provision the boxes when they come up. We build all of our AMIs with Packer, and then we use the user data configuration to specify what class the instance is and any other properties, like tuning knobs or other parameters we want to pass. When the instance boots up, we have a bootstrap that runs, along with a set of systemd units, that looks at that configuration, determines what class the instance is, and says, okay, it's a storage node, I need to mount the NVMe device, format it, and finally build the foundationdb.conf. We have a script called configure-fdb that gets executed; it sets all our parameters, sets our locality zone ID for replication, sets up the backup, and then we're off.

So that's the core of the Terraform configuration, and as I mentioned, we have a few services supporting the system that run on ECS, or any container orchestration system. The first one is fdb-discovery, and I'm guessing you're all familiar with this problem. It's a small service that runs alongside all the clusters, fronted by a load balancer; you can curl it to get back the fdb.cluster contents, pop that into your fdb.cluster file, and add your node to the cluster. We have a process called fdb-trace that parses all the trace log files, converts them to metrics, and feeds those into Datadog today. Just a look at some of those: this view here shows a chart with master recoveries and commits, it shows the queues for storage servers and transaction nodes, you can see data in flight and moving data that's queued, you can see mean and max bytes per commit, and we also track transaction latency, P90 and max. And then the last process, we actually have a couple more supporting tools, but the last one we'll talk about today is fdb-backup, which just kicks off the backup process. When we started we were on a 6.0 version, I can't recall exactly which, but we moved to .18 for the blob store backup support. It was pretty smooth; we ran into a couple of issues with expiring backup data. I think we were one of the first people to use it, because I was posting on the forum asking questions, and we were pretty early with it.
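For what it's worth, the fdb-backup piece doesn't need to be much more than a wrapper around the stock fdbbackup tool. This is a hypothetical sketch of such a sidecar; the blobstore URL and the once-a-day cadence are placeholders, so check the backup documentation for the exact URL format your version expects:

```go
// Hypothetical fdb-backup-style sidecar: it just shells out to the stock fdbbackup tool
// on a schedule. The destination URL and the daily cadence are placeholders.
package main

import (
	"log"
	"os/exec"
	"time"
)

func main() {
	// Placeholder blobstore destination; see the FoundationDB backup docs for the exact
	// URL format and credentials your version expects.
	dest := "blobstore://<key>:<secret>@<host>/<backup_name>?bucket=<bucket>"

	for {
		out, err := exec.Command("fdbbackup", "start", "-d", dest).CombinedOutput()
		if err != nil {
			log.Printf("fdbbackup start failed: %v: %s", err, out)
		} else {
			log.Printf("fdbbackup started: %s", out)
		}
		time.Sleep(24 * time.Hour)
	}
}
```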
We're considering moving to a hot standby with the DR agent, partly because of the limitations we've seen around how quickly we can recover from backup. I think Evan talked about this earlier, but this is a screenshot of a game day test we ran where we restored the cluster from S3. It looks like we were mostly maxed out on CPU, but even after we pull down batches of data, it takes a while for them to apply. This was a couple of terabytes on disk, and it took several hours.

Okay, so I'm going to talk a little bit about game day and chaos testing, and then we'll wrap up with some future use cases. Before we put this into production, we wanted to run through and try to break it as well as we could. We tried to induce tons of different failures, and the system behaved just as advertised; it was great. We were able to kill pretty much anything in the cluster and it behaved well. We did notice that partitioning storage nodes resulted in high CPU utilization as the cluster healed, for example. We also created a scenario where performance dropped by exhausting a disk, but we did that outside the FoundationDB path, we just allocated a bunch of space on the drive, and when we spoke with engineering they said, yeah, don't do that. This is an example of losing a storage node during game day: the cluster healed pretty quickly, we saw an increase in latency, but in our workload the data is in Kafka, so we can handle a little bit of latency and just burn the backlog down quickly afterwards. And this was the case where we exhausted storage on a node; once the node handed off its data, the system recovered.

Other than that, it's been great, really hands-off as far as operational work goes. We do have some ongoing issues, though, and I think these were mentioned today. One is data distribution unpredictability; it's probably predictable, but I'm not exactly sure how it's behaving, so some more observability or metrics around that would be great. We also need to consider three_data_hall replication internally. We're running across three AZs with triple replication now, and triple replication is really ideal when you have five locality zones, so I think three_data_hall would be better for us when we're running with three AZs. And I'd like to see some ability to throttle things like healing, if I want to trade off latency against recovery or healing time.

Finally, let's talk about some potential future use cases. We're trying to get to the point where FoundationDB is a primitive inside Segment, and new product teams can come along, stand up new FoundationDB instances, get up and running, and just build their stuff on top of it. One of the use cases we're considering is a system we have internally called dedupe. A lot of the messages we get come from mobile devices, and mobile devices are on cellular networks, which are not necessarily reliable.
People are going under bridges, going indoors, losing signal all the time, so we get a lot of duplicate messages. The dedupe system sits towards the end of the pipeline, and the way it works is we hash messages by message ID to a Kafka topic, and then we have a worker per topic partition that consumes it, looks in a local database, RocksDB today, and asks, have I seen this message before? If it has seen it, it suppresses it from going downstream. Otherwise, if it's the first time it sees it, it writes to Kafka and commits to RocksDB. We jump through a bunch of hoops to make that actually work, because writing to Kafka and writing to RocksDB isn't atomic, so when we restart, we read back our own writes. This system is pretty good because it's fast, and since it's an embedded database you don't have to deal with any network partitioning issues, but the problem is it's difficult to scale because it's tied to Kafka. If you want to increase your Kafka partition count to add more processing power to the system, you have to jump through a bunch of hoops to shuffle all that data around. So it would be nice to move this workload to FoundationDB: we could have an elastic tier that scales up and down depending on how many messages are coming in, query FoundationDB to determine whether we've seen a message before and suppress it if needed, and otherwise forward it down the rest of the pipe. And that is it.
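As a closing sketch, this is roughly what that FoundationDB-backed dedupe check could look like. It's one possible design, not something we've built; the key layout is an assumption, and a real version would also need some notion of expiry for old message IDs.

```go
// Possible shape of a FoundationDB-backed dedupe check; the key layout is an assumption.
package main

import (
	"github.com/apple/foundationdb/bindings/go/src/fdb"
	"github.com/apple/foundationdb/bindings/go/src/fdb/subspace"
	"github.com/apple/foundationdb/bindings/go/src/fdb/tuple"
)

var seen = subspace.Sub("dedupe") // (messageID) -> marker byte once a message has been observed

// isDuplicate reports whether the message ID has been seen before, recording it if not.
func isDuplicate(db fdb.Database, messageID string) (bool, error) {
	dup, err := db.Transact(func(tr fdb.Transaction) (interface{}, error) {
		key := seen.Pack(tuple.Tuple{messageID})
		if tr.Get(key).MustGet() != nil {
			return true, nil // already seen: suppress downstream delivery
		}
		tr.Set(key, []byte{1}) // first sighting: store a marker so a nil read means "not seen"
		return false, nil
	})
	if err != nil {
		return false, err
	}
	return dup.(bool), nil
}

func main() {
	fdb.MustAPIVersion(610)
	db := fdb.MustOpenDefault()

	if dup, err := isDuplicate(db, "msg-123"); err == nil && !dup {
		// forward the message down the rest of the pipeline here
	}
}
```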