All right. Hello, everyone. Welcome. I got the inspiration for this talk from Buzz Lightyear from Toy Story: remember the famous catchphrase, "To infinity and beyond." So today, I'm going to talk about how you scale your application with Cassandra. I'm a principal engineer from Intuit Persistence Services. Intuit is the home of TurboTax, Credit Karma, QuickBooks, and Mailchimp. At Intuit Persistence Services, IPS, we provide persistence as a service that's used by many products at Intuit. We offer two flavors of product: what we call team-managed and IPS-managed. Team-managed basically means we give you the code to deploy your infra, while IPS-managed means we manage everything. These are the flavors of our products. For example, for Cassandra, we support both flavors. For AWS OpenSearch, we only support IPS-managed. For PostgreSQL, we support both flavors. But for MySQL, we only support team-managed. So what is our value proposition for IPS? Note that when I talk about customers, these are service developers at Intuit. If the service developers deploy the database themselves, they are responsible for all the activities in the dark blue circle. Of course, they develop the business logic, data modeling, application integration, data lake integration, data governance like CCPA and GDPR, and data lifecycle, like deleting your old entities, something like TTL. Data security means encrypting your sensitive data at rest. And of course there is alerting and monitoring, scaling, HA and redundancy, and multimodal query patterns. What that refers to is duplicating the same data in multiple databases for different access patterns; I'll get into it more deeply later. And for team-managed, where you deploy infra as code, you see the light blue circle there. These are the activities where we provide either libraries or services at Intuit that the service developer can integrate with.
So for example, for scaling, HA, and resiliency, all you need to do is configure it; we provide the hooks for you to enable it. For monitoring, for example, we provide a basic dashboard. That's important because a lot of people don't even know what to monitor for databases. So we provide a basic dashboard, but we are not responsible for the alerting; they are responsible for their own on-call. Now, with IPS-managed, where the value proposition is really the biggest, as a service developer you don't even care how your data is persisted or how your database is managed. We are responsible for all the alerting, the on-call, the scaling, basically everything: persistence as a service. All you're really doing is calling us via REST API. You don't even deal with the physical data model, which I will get into. So today's talk is a deep dive into IPS NoSQL. This is a product within Intuit that's over 10 years old. It's a very mature product that's serving many critical use cases and is used by multiple business units at Intuit. Today we have 10 production customer clusters. All the clusters are active-active in two regions in AWS. Our largest production cluster is 117 nodes per region, and it can support up to 220K TPS of CRUD-plus-List API calls. Our system is multi-tenant; usually we stand up a cluster per business unit. Some bigger business units have multiple namespaces or clusters. Our API is relatively simple: CRUD plus List. For example, for the list-entity API, we only have two flavors: list by owner and list by index. I'll get into the details later. And we have create, update, and delete of entities and relationships, and we have both flavors of those APIs, single and bulk. To read an entity, we have two flavors of API. For the most part, most services just read the latest version of the entity. And for those who need multiple versions, to go through the historical versions of the entity, we have that API.
And for list relationship, this is where, for a given entity, you want to find all the related entities. This is our overall architecture. You see that our REST API is fronted by an API gateway, where we have the CRUD-plus-List API. And everything in IPS NoSQL is basically in terms of entities and relationships. For an entity, we have two parts of the data: attributes and payload. There's a very simple reason for that. The attributes are stored in Cassandra, while the payload can be pretty large, so we store that in S3. A relationship only has the Cassandra part. And for every create, update, and delete of an entity or relationship, a CDC event is generated, and that goes to our staging topic. This is our internal topic. And then we have an IPS domain event processor. What this event processor basically does is transformation. Your raw CDC events may not be the most useful thing for business analysts to understand. So we provide this functionality where you can transform your raw CDC events into something more user-friendly. These so-called business events are then forwarded to a domain topic. From the domain topic, the same events are then consumed by another IPS product; in this case, I give the example of IPS Search. So you can see this provides a very powerful system. What NoSQL is really good at is scaling and high availability, but your ability to do ad hoc queries is very limited. It's like, well, I want to do a more flexible query, and the answer is, sorry, you cannot do that. So what we provide, with just a few clicks in our UI and some configuration, is a mapping between the logical schema and the search schema; I'll go into a little more detail later. Now you have integration with IPS Search, and IPS Search is backed by OpenSearch.
Then you can do group-by, count-by, a lot more flexible queries. The same events consumed by IPS Search can then be consumed by a stream materializer to populate tables in the data lake. Everyone these days has a data lake, where the business analysts do their reporting and so on. This is an example of the logical schema. This is the first thing our clients interacting with our system come to define: their data model. Here you see the type name, which is equivalent to a table name. And you see this particular entity, student, has two attributes, first name and last name, and you see their types as well. And you see the data classification; this is the part that controls the encryption of data at rest. The next thing you see here is the index definition. The index here is a composite index of two attributes, first name and last name. And you have an option to define whether the index is unique. If unique is true, we'll check for duplicates when you create a new entity. Our index is a binary index, meaning we only support exact match. And it's a sparse index, so if the attribute is absent, no index entry is created. So this is an example of an entity schema that we implement. There are many, many logical schemas that users define, but underneath, we use only one table to persist all the entities. You see here the partition key is NS and entity key. NS is just short for namespace, so that each cluster or namespace in our cluster has a unique namespace. The entity key is just a UUID. Here you see the entity version. The ordering here is important: it's in decreasing order, because most of our access patterns really want the latest version of the entity. And then you see the attribute name and value, so an entity can store any arbitrary key-value pairs.
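To make that layout concrete, here is a minimal Python sketch of a single versioned entity table. The class, the names, and the in-memory structure are my own illustration, not the actual IPS implementation; the point is the shape: one physical table for every logical type, partitioned by (namespace, entity key), with versions kept newest-first.

```python
import bisect
from collections import defaultdict

# One physical "table" for all logical entity types: rows are grouped by the
# partition key (namespace, entity_key) and kept sorted by entity_version in
# DECREASING order, mimicking a clustering key with DESC ordering.
class EntityTable:
    def __init__(self):
        # (ns, entity_key) -> [(version, {attr: value}), ...], newest first
        self._partitions = defaultdict(list)

    def write(self, ns, entity_key, version, attributes):
        rows = self._partitions[(ns, entity_key)]
        # Insert so the list stays sorted by version, newest first.
        keys = [-v for v, _ in rows]
        rows.insert(bisect.bisect_left(keys, -version), (version, dict(attributes)))

    def latest(self, ns, entity_key):
        # Reading the newest version is just "first row of the partition".
        rows = self._partitions.get((ns, entity_key))
        return rows[0] if rows else None

    def history(self, ns, entity_key):
        return list(self._partitions.get((ns, entity_key), []))

table = EntityTable()
# Versions are UTC timestamps; the values here are made up for illustration.
table.write("tax", "e-123", 1700000000, {"firstName": "Denson"})
table.write("tax", "e-123", 1700000100, {"firstName": "Denson", "lastName": "Lee"})
print(table.latest("tax", "e-123"))   # newest version comes back first
```

Because the partition is ordered newest-first, "read the latest version" never scans history, which is why the decreasing version order matters for our dominant access pattern.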
In this example, you see that there are two updates. Our entity version, by the way, is a UTC timestamp. In the first update, you have the first name, Denson. And then in the second update, you see that the last name is added. This is our index schema, the binary index that I was talking about. This supports the list-by-index call. In this case, you see in the partition key the entity type, the index name, and the index value. The important part here is the index value. Since this is a composite index, it holds the keys and values for that particular composite index. And this has to be a sorted map, because this is an exact-match comparison: when you construct the partition key, it has to be always in the same order. Your map cannot be in random order. For example, you cannot have last name first in one case and first name first in another; then you could not do an exact match. So our list by index is a two-step process. First, you construct the partition key and look up all the entity keys that are mapped to this particular index value. The second step is to fetch the entities from the entity table given those entity keys. And this is our relationship logical schema. You have the relationship type here, and then you define the from entity type, in this case student, and the to entity type, in this case course. And you define the cardinality. It's the basic cardinality for a relationship: you have one-to-one, one-to-many, or many-to-many. And this is the schema to support the list-relationship API. You see the partition key here is the from-NS and the from-key. When you call the list API, you provide the from-key.
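Before going further, the two-step list-by-index path just described can be sketched in Python. The tables here are plain dicts and the attribute and index names are hypothetical; what the sketch shows is the sorted-map canonicalization and the lookup-then-fetch sequence.

```python
# list-by-index in two steps, with the composite index value canonicalized as
# a sorted map so the partition key is always constructed in the same order.
def index_partition_key(entity_type, index_name, index_value):
    # Sorting the (attribute, value) pairs makes the key deterministic: a map
    # given as {lastName, firstName} and one given as {firstName, lastName}
    # both yield the same partition key, so exact match works.
    canonical = tuple(sorted(index_value.items()))
    return (entity_type, index_name, canonical)

def list_by_index(index_table, entity_table, entity_type, index_name, index_value):
    # Step 1: look up all entity keys stored under this index partition.
    pk = index_partition_key(entity_type, index_name, index_value)
    entity_keys = index_table.get(pk, [])
    # Step 2: fetch the entities themselves from the entity table by key.
    return [entity_table[k] for k in entity_keys if k in entity_table]

index_table = {
    index_partition_key("student", "by_name",
                        {"firstName": "Denson", "lastName": "Lee"}): ["e-123"],
}
entity_table = {"e-123": {"firstName": "Denson", "lastName": "Lee"}}

# Attribute order in the query does not matter; the sorted map fixes it.
print(list_by_index(index_table, entity_table, "student", "by_name",
                    {"lastName": "Lee", "firstName": "Denson"}))
```

Note the cost the talk mentions later: the alternative access pattern costs one extra read (index partition, then entity partition) rather than a duplicated copy of the data.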
And you provide the relationship type. And you see the relationship type, the to entity type, and the to key as the clustering key right here. So in the list-relationship call, you provide the from-key and the relationship type, and then I can look up all the related entity keys, which is what's stored. The client usually then calls bulk get with those entity keys if they want the details of all the entities. One thing to note here: our relationships are bidirectional. This is what the direction type is for. When I create the relationship, in this case from student to course, what I call the forward relationship is the way the user created it, and that's why the direction type is one. But we also create the reverse relationship: instead of student to course, it's the other way around, from course to student. The reason we do this is so you can look up from either side of the node. You can look up from the from side or from the to side, and either way, I always return you a consistent representation, from student to course. Something that is also unique about IPS is that we implement an internal owner relationship. Every entity you create in IPS is owned, for example by a user ID. It helps with security: if one user owns one set of data and another user owns a different set, then the data is secured even at the persistence layer, so one user cannot access another user's data even if the service developer makes a mistake. Having clear ownership really helps with authorization and with securing the data. We also have use cases with more complex authorization schemes that have to integrate with the identity system, and the ownership information becomes useful for them.
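The bidirectional write and the consistent read-back can be sketched as follows. This is an illustrative Python model, not our schema; the direction codes (1 for forward, 2 for reverse) and the key names are my own stand-ins.

```python
# On create, write both a forward row (direction 1, the way the user created
# it) and a reverse row, so the relationship is findable from either side.
def create_relationship(rel_table, rel_type, from_key, to_key):
    rel_table.setdefault(from_key, []).append(
        {"type": rel_type, "other": to_key, "direction": 1})    # forward
    rel_table.setdefault(to_key, []).append(
        {"type": rel_type, "other": from_key, "direction": 2})  # reverse

def list_relationships(rel_table, key, rel_type):
    # Whichever side you query from, normalize back to the forward
    # (from, to) representation before returning.
    out = []
    for row in rel_table.get(key, []):
        if row["type"] != rel_type:
            continue
        if row["direction"] == 1:
            out.append((key, row["other"]))       # queried from the from side
        else:
            out.append((row["other"], key))       # queried from the to side
    return out

rels = {}
create_relationship(rels, "enrolledIn", "student-1", "course-9")
print(list_relationships(rels, "student-1", "enrolledIn"))  # [('student-1', 'course-9')]
print(list_relationships(rels, "course-9", "enrolledIn"))   # [('student-1', 'course-9')]
```

Both lookups return the same student-to-course tuple, which is the "consistent representation from either side" point made above.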
This particular data is also used to support our list-by-owner API. So a user can come in and list: for example, if you are filing your TurboTax return, I can say, give me my TurboTax returns for this year. These are some of the modeling best practices I've learned. Don't store a large blob of text in Cassandra; it's very expensive. Think about it: our replication factor is three, times two regions, so six copies. If you have a large blob and you can't even search on it, it's really expensive to store it in Cassandra. That's why our entity has an S3 payload, and we offload the payload to S3. One thing I've noticed is that people like to use a kitchen-sink pattern. For example, if our entity schema had no logical schema, I'd call that a kitchen-sink pattern: you put in arbitrary key-value pairs and no one knows what's in your entity table. You want to make sure you avoid that. Another thing I've noticed: there are a lot of blogs written saying that for different access patterns, you duplicate your data. You can do that, but it's very, very expensive. In our case, our index scheme, even though it's very simple, just a binary index, provides an alternative access pattern to the data. But in our case it's normalized; it's not duplicating the data. Duplicating data is something you can do, but to us it's super expensive. Of course, you pay with an extra read: instead of doing one read, you're doing two reads. But the latency is tolerable; you're talking about three milliseconds here. And this is from my experience: if you're doing NoSQL for the first time, it's almost guaranteed you'll do it wrong, myself included. But it's something that you learn. This is our cluster setup in AWS. We have two kinds of volumes: a commit log volume, and data volumes, four times one terabyte in gp2. There's a reason we max out at one terabyte.
In AWS, EBS volumes give you the max IOPS and throughput at one terabyte. If you provision anything more than one terabyte, you're just wasting your money. It's not really worth it, so just don't do it. Because we use EBS volumes, our compute and storage are decoupled. We like that: you can control your cost better, and it also helps with auto-recovery. We implement a self-heal mechanism in our infra. When a Cassandra node goes down, and it will go down, it restarts automatically. If a server goes down, a new instance comes up, we assign it the same IP, we reattach the volumes, and we just restart Cassandra. Within half an hour, the node comes up and rejoins the cluster. We don't even have to do anything about it. Our deployment is split evenly across three AZs in AWS. This also reflects our replication factor of three, so we can tolerate one complete AZ failure in AWS without affecting anything. And this is the time when I give a plug for our open-source project: all these things I'm talking about are in an open-source project called DSE Pronto. Many years ago, probably four or five years ago, we had a big effort moving from our own data center to AWS. Before that, for our tax use case, we provisioned a new cluster for each tax year, and that gets expensive pretty quickly. So as part of moving to AWS, we consolidated these multi-year Cassandra clusters into one AWS cluster. As we migrated more and more data into this new cluster, we started seeing Cassandra GC pauses. This is the kind of stuff you don't want to see in Cassandra. So we went heads-down to optimize Cassandra and take care of this GC pause issue. This is the process we went through. Here are samples of the settings that we tested and then took to production.
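Before getting into the tuning details, the self-heal flow above can be sketched as a small orchestration. The cloud calls are abstracted behind a client object with hypothetical method names (this is not a real AWS SDK surface, and not the DSE Pronto code); what it captures is the order of operations: same IP, reattach volumes, restart Cassandra.

```python
# Orchestration of the self-heal flow: same private IP, reattach EBS volumes,
# restart Cassandra. All client method names here are hypothetical stand-ins.
def self_heal(cloud, dead_node):
    # 1. Launch a replacement instance with the SAME private IP, so the
    #    Cassandra ring sees the node come back rather than a new member.
    instance = cloud.launch_instance(private_ip=dead_node["ip"])
    # 2. Reattach the surviving EBS volumes (commit log + data). Because
    #    compute and storage are decoupled, no data is copied or re-streamed.
    for volume in dead_node["volumes"]:
        cloud.attach_volume(instance, volume)
    # 3. Restart Cassandra; the node rejoins the cluster on its own.
    cloud.start_service(instance, "cassandra")
    return instance

# A fake client so the flow can be exercised without AWS.
class FakeCloud:
    def __init__(self):
        self.log = []
    def launch_instance(self, private_ip):
        self.log.append(("launch", private_ip))
        return {"ip": private_ip}
    def attach_volume(self, instance, volume):
        self.log.append(("attach", volume))
    def start_service(self, instance, name):
        self.log.append(("start", name))

cloud = FakeCloud()
self_heal(cloud, {"ip": "10.0.1.17", "volumes": ["vol-commitlog", "vol-data-1"]})
print(cloud.log)
```

The key design choice is step 2: because storage lives on EBS rather than instance storage, recovery is a reattach plus restart, not a multi-hour bootstrap and re-stream of the node's data.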
So we started with the kernel, and we tested each setting, or group of settings, to make sure that what we were putting in actually helped with the issue and didn't make it worse. DataStax has a very good reference article on this. After optimizing the kernel, we repeated the whole process again, testing in each case to make sure it actually improved performance: from the kernel we moved to the JVM, and then to Cassandra itself, optimizing at each level. If you try to test everything at the same time, a lot of the time you cannot tell whether a change is helping or not. That's why we have very methodical testing, to make sure whatever we put in prod is actually tested and helps. The last thing we did was DataStax driver optimization. One of the things we enabled is speculative retry, to minimize any service interruption that can happen. This is pretty standard if you are familiar with the DataStax driver. One thing to call out: we set the remote-region connections to zero, because we don't want any accidental requests crossing regions. And one more thing: NoHostAvailableException requires special handling. If there's a sudden network interruption, or sometimes, as I've seen, even just a few nodes with GC pauses, this can happen. So we have special handling to make sure our app server can recover quickly: basically, we terminate all connections to the Cassandra servers and restart everything again. That seems to make our app servers self-heal. Keeping Cassandra healthy is a complex task. It's not an easy thing to do. This is our experience from running Cassandra: run repair from day one. Whatever you do, just do it. It really will save you.
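Going back to the driver handling for a moment, the "terminate everything and rebuild" recovery can be sketched like this. The session and exception classes here are stand-ins, not the real DataStax driver API; the pattern is what matters: on a no-host-available style failure, shut the session down, reconnect from scratch, and retry.

```python
# Recovery wrapper: on a "no host available" style error, tear down every
# connection and rebuild the session. Types here are illustrative stand-ins.
class NoHostAvailableError(Exception):
    pass

class RecoveringSession:
    def __init__(self, connect):
        self._connect = connect          # factory that builds a fresh session
        self._session = connect()

    def execute(self, query):
        try:
            return self._session.execute(query)
        except NoHostAvailableError:
            # Terminate all connections, start over, then retry once.
            self._session.shutdown()
            self._session = self._connect()
            return self._session.execute(query)

# A fake session that fails once, as if every host were briefly marked down.
class FlakySession:
    fail_next = True
    def execute(self, query):
        if FlakySession.fail_next:
            FlakySession.fail_next = False
            raise NoHostAvailableError("all hosts marked down")
        return f"ok: {query}"
    def shutdown(self):
        pass

session = RecoveringSession(FlakySession)
print(session.execute("SELECT * FROM entity LIMIT 1"))  # recovers transparently
```

In the real driver the rebuild is heavier than this (connection pools, metadata refresh), which is exactly why leaving it to stale per-connection retries was not enough and a full reset recovers the app server faster.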
You don't want to wake up one day to find inconsistency and discover you cannot even run repair anymore; I've seen quite a few cases of that. Another thing to watch is your data model. In our case, for example, with entities, if the client does a lot of updates, a wide partition can develop. Since we don't expect that many updates, we lowered the threshold for the compaction large-partition warning. The default is 100 megabytes; that's already too big. We lowered it to one megabyte so that we get an early read on whether something bad is happening in our clusters. The thing is, a wide partition, once it happens, is very, very hard to clean up, and it affects the performance of your cluster until you remove it. So the earlier you're notified and the earlier you address the problem, the better for cluster health. And since we have a logical model, during every PR merge we also review the data model. For example, we make sure our clients don't create silly indexes on things like Booleans or enums, which will basically just cause wide partitions in your index table. And lastly, use any tools you have to help with your operations, like running repair; in our case, we use OpsCenter. No one does all of this by themselves. I have a very good NoSQL team, very talented engineers, that makes all of this possible. And lastly, we are hiring. Thank you, everyone. Any questions? Welcome to chat.

Five years ago, I think, is when we moved to AWS. That's when we made the huge leap. We paid off our tech debt, because we had to do a schema migration as well. That was when we paid off our tech debt, and that's also why we then felt comfortable expanding, not our use cases so much as our client base. Because if you don't have the right schema or the right framework in place, it's very hard to scale out.
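The data-model review at PR time can be partially automated. Here is a sketch of a lint in that spirit: it flags index definitions over low-cardinality attributes (Boolean/enum), which funnel huge numbers of entity keys into a handful of index partitions. The schema shape and field names are illustrative, not our actual logical-schema format.

```python
# Lint a logical schema for indexes on low-cardinality attributes, which
# would concentrate most entity keys into a few wide index partitions.
LOW_CARDINALITY_TYPES = {"boolean", "enum"}

def review_indexes(logical_schema):
    problems = []
    attr_types = {a["name"]: a["type"].lower()
                  for a in logical_schema.get("attributes", [])}
    for index in logical_schema.get("indexes", []):
        for attr in index["attributes"]:
            if attr_types.get(attr) in LOW_CARDINALITY_TYPES:
                problems.append(
                    f"index '{index['name']}' uses low-cardinality "
                    f"attribute '{attr}': likely wide partition")
    return problems

schema = {
    "typeName": "student",
    "attributes": [{"name": "isActive", "type": "boolean"},
                   {"name": "lastName", "type": "string"}],
    "indexes": [{"name": "by_active", "attributes": ["isActive"]},
                {"name": "by_last_name", "attributes": ["lastName"]}],
}
for problem in review_indexes(schema):
    print(problem)
```

A check like this catches the problem at merge time, before the wide partition exists and while it is still free to fix, which matches the "earlier notification, healthier cluster" point above.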
And once you scale out, now that we have hundreds of services at Intuit using our platform, it's okay, because now we have a stable foundation to build on.

No, we are not using Cassandra's built-in CDC. We don't really like it, so we implemented our own CDC, and we do some pretty cool stuff with it. We use Kafka for our eventing, but for resiliency, to make sure we don't miss any events for any create, update, or delete, we basically have two systems. Kafka is our first choice, and as a backup we use SQS as a buffer. For the most part, when Kafka is up and running, we use Kafka. When Kafka is not available, we post to SQS. So if one system or the other is down, we are not affected. We did have an outage on our managed Kafka within the last two years; it was down for a few hours, and everybody was crying, but our SQS buffer saved us. With the resiliency we built in, we didn't need to do anything: when Kafka came back up, the events buffered in SQS started flowing in. Yeah, go ahead.

No, Kafka is more for posting events. If Kafka is not available, I post to SQS, so it's independent; the transaction always takes priority. If both systems are down, which is unlikely, we have transaction logs that we can go through and replay. That would be the last resort, because it's not automated; it's more of a manual task that we can run. But we do have that option.
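The Kafka-first, buffer-on-failure publish described in the answer above can be sketched as follows. The client classes are fakes, not real Kafka or SQS SDKs, and the topic name is hypothetical; the sketch shows only the fallback decision.

```python
# Kafka-first publish with a queue fallback, so no create/update/delete
# event is lost when the primary eventing system is down.
class PublishError(Exception):
    pass

def publish_event(kafka, fallback_queue, event):
    # First choice: Kafka. If it is unavailable, buffer the event in the
    # fallback queue; it will flow through once Kafka is back.
    try:
        kafka.send("cdc-staging", event)
        return "kafka"
    except PublishError:
        fallback_queue.enqueue(event)
        return "fallback"

class FakeKafka:
    def __init__(self, up=True):
        self.up, self.sent = up, []
    def send(self, topic, event):
        if not self.up:
            raise PublishError("kafka unavailable")
        self.sent.append((topic, event))

class FakeQueue:
    def __init__(self):
        self.buffered = []
    def enqueue(self, event):
        self.buffered.append(event)

kafka, queue = FakeKafka(up=False), FakeQueue()
print(publish_event(kafka, queue, {"op": "CREATE", "key": "e-123"}))  # fallback
kafka.up = True
print(publish_event(kafka, queue, {"op": "UPDATE", "key": "e-123"}))  # kafka
```

A real version also needs a drainer that replays the buffered events into Kafka once it recovers, and, as the answer notes, the transaction log remains the manual last resort if both paths are down.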