Good afternoon, and thank you everyone for joining us here today. We hear a lot about new applications being built at scale on new technologies, but we do not normally hear about how to migrate older, existing applications to newer technologies. There are applications that have been running for the last 50 years, processing profile and billing data for hundreds of millions of customers, and we are here to present the story of how we moved an application like that to new technology. We went through the entire transformation: we took the application off premises and out of relational databases, and moved it to the cloud and into Cassandra. This was an exciting opportunity for us at Verizon Wireless. My colleague and I are going to tell you that story and point out some interesting facts along the way. This is not going to be an in-depth presentation on how we set up and configured Cassandra, but rather on certain interesting data modeling lessons we learned, and I hope you will be able to appreciate and enjoy it.

My name is Sunil Dabre, and I have been working with Cassandra for the past seven years. This is my colleague Chandi Dutta; he has been the SME for everything Cassandra at Verizon. Before we get into how we did it, a quick word about Verizon. Everybody, I hope, knows about Verizon, so I don't need to go through this slide, but it is required of me.

So what were we trying to address? What were our pain points, and why did we embark on this journey? I am going to concentrate specifically on the database side of the story. When we started, Verizon had fragmented regional databases. Even though we are a national company, we had independent databases: North customers were in the North data center, East customers were in the East data center. All the databases were fragmented, and there was no single system of record (SOR). There was a very high cost of ownership because of proprietary licenses, and with all those fragmented databases there was no centralized way to manage them. And of course we had performance and maintenance complexity, because any release required us to repeat the same work in every data center.

So what were our requirements? We wanted something cost-efficient and low-maintenance, because we were already living with a high-maintenance solution and wanted something simpler. We wanted something scalable, because we want more and more customers, right? Something that can grow as we grow as a company. We wanted something cloud-native and hybrid: we did not want to stay on premises and carry those costs, and we wanted to make sure that whatever database we chose could keep pace with the modernization of our applications. At the same time, one of our major pain points was the regional databases, so we definitely wanted cross-data-center replication built in. We did not want to run three separate regional databases in three data centers; we wanted something that would automatically replicate the data across all of them. And of course, performance and availability are always requirements. Most of you already know the answer to all of these questions.
So I am going to turn it over to my colleague Chandi Dutta. He will take you through the challenges and complexities, how we started our journey, and where we are today.

So, as Sunil said, these were not greenfield applications. We started with existing applications, we wanted to build on top of them, and at the same time we wanted to migrate them. If you look at our stack, there were 1,000-plus existing APIs, and underneath it all a mainframe world with a great many batch jobs already running. So that was the big task: how do we migrate all of that to the new stack?

Before I start: our journey began long before Verizon Wireless was even formed. Think back to when we were on a monolithic mainframe stack, where everything moved at a pretty deliberate pace. If you wanted to access or update your data, you had to go through complex hoops, and on top of that came the MIPS costs: if you wanted to add an engine, you had to go through your vendor and bear the corresponding cost. We wanted to change. Then in-memory databases emerged, and we thought an in-memory database would be a good fit for our stack. But it came with its own bells and whistles, and at the end of the day, as Sunil said, it was still a fragmented database when we wanted a national one. It failed to meet our national database requirement. And then came Cassandra, which immediately met that requirement through its out-of-the-box replication. With that, our journey with Cassandra began.

Before we get to how we did it, as Sunil said, we will tell you the story of why and how we did our migration. Our journey started back in 2000, when Verizon Wireless was formed from assets acquired from Bell Atlantic and others; everything came over from our sister companies. At that time everything was born as heads-down mainframe applications, writing to the mainframe and reading from the mainframe. Now, what kind of data are we talking about? A system of record containing profile data, billing data, and order data. These three are the crown jewels of any company, and for a company like Verizon Wireless, if anything goes wrong with them we are in big trouble. So to migrate our existing stack to a new one, we had to make sure we were not touching the rest of the ecosystem. We did not want to change anything existing, so that if anything went wrong in the new stack we could always fall back immediately. And the leeway there is pretty small: we had to be able to fall back within a minute or two, with minimal impact to our customers.

So here is what we did. We started with the read journey first: we kept writing to the mainframe and began replicating mainframe data to Cassandra in real time. It looks like a small box on the slide, but a lot is baked into that real-time replication. Think of the customer experience: suppose you are on verizonwireless.com adding a phone line, and you immediately want to see whether the phone was added and provisioned.
So getting data into the mainframe and immediately serving it from Cassandra in real time: that is a big round trip we had to complete within a certain number of milliseconds. Why did we do it this way? One reason, as I said, is MIPS. MIPS cost on the mainframe means that if you read more, you pay more. So we wanted to get rid of our read MIPS first, as soon as possible, so that we would not have to add an engine at peak times like Black Friday or the holiday period. We copied our data in real time from the mainframe to Cassandra, and on top of that we put all our APIs. Even though we migrated the data, as I said, we did not change any of our existing APIs. In the RDBMS world those APIs interacted with a thousand tables, and in the Cassandra world we had to mimic exactly the same thing. That was one of our biggest challenges, because we were not changing anything on the interface or API side. Once we were done with that, our final target landscape was to get rid of the mainframe and write directly to Cassandra.

Before I move on to how many tables we had and what the data model looked like, let me quickly debunk the myth that Cassandra data modeling is an insurmountable challenge when it comes to brownfield applications. A couple of talks ago, somebody from Intuit said that if you are starting with Cassandra data modeling, please get extra help, because it is not an easy thing. What we felt is that it is easy once you understand the unique way Cassandra stores data. Once you know that, it is pretty straightforward. Even so, a data model that works today may or may not work tomorrow; so how do we change it, how do we adapt it?

Here is a glimpse of our high-level backend. We played by Cassandra's rules: we followed Chebotko diagrams, we built the logical model and the physical model, one table per query, everything by the book. After denormalization we had 800-plus tables, and we ended up with uneven-sized partitions and hotspot nodes. I am sure that if you are familiar with Cassandra you have faced the same thing, right? So did we. With that, I will hand it over to Sunil to walk through our data model use cases and how we tamed them.

Okay, so let's look at a very simple use case, and how we reached this point. For a telecom company, two of our main entities are phones and customers: a customer has phones, and a phone belongs to a customer. Two of our simplest queries would be: give me all the phone numbers, the MTNs (mobile telephone numbers), for this particular customer; and given an MTN, give me the customer number. Say you call customer service: the agent knows the MTN and needs the customer profile, so he needs to get to the customer data. These are our simplest use cases; you cannot get simpler than this in telecom. In the RDBMS world, one table is sufficient: customer number and MTN, and I can query by either one. Obviously, that is thrown out of the window when it comes to Cassandra. So how would the table definitions look in Cassandra? You have one table, MTNs-by-customer, where the customer ID is the partition key and the MTN is a column.
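To make this concrete, here is a minimal CQL sketch of that first lookup table; the table and column names are our illustration, not the production schema, and the real table carries many more attributes:

```cql
-- Illustrative sketch: all of a customer's phone numbers live in one
-- partition, keyed by customer ID.
CREATE TABLE mtn_by_customer (
    customer_id text,
    mtn         text,                    -- mobile telephone number
    PRIMARY KEY ((customer_id), mtn)
);

-- "Give me all the MTNs for this customer":
SELECT mtn FROM mtn_by_customer WHERE customer_id = 'C123';
```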
And the second table is customer-by-MTN, where the MTN is the partition key and the customer ID is a column. Of course there are a lot of other attributes, but I am leaving them out. These are the traditional approaches we used to arrive at these data models. So, do you foresee any problems with them? Partition size; what else? Consistency, of course. Okay, let's see. What happens when a customer has millions of mobile phone numbers? We are very happy and the business is very happy, but what happens to the DBA? The first problem we hit is too much data stored under a single customer ID: large partitions. And in the IoT space, customers sometimes really do have millions of phone numbers, because each device has a phone number behind it. We cannot efficiently read or process that data. At the end of the day, Cassandra is a JVM handling requests, and if a partition grows much beyond 100 MB and you keep reading it, the JVM is going to be in trouble.

The next problem is consistency. This is a very simple use case where we store the same data twice; we have other use cases where we replicate the data even further. How do we make sure all the copies stay in sync? Table by table we can use LOCAL_QUORUM, but if an application path writes into two different tables, how do I ensure they stay in sync when anything can fail? And the nodes become unbalanced, because not all customers have millions of rows. In the RDBMS we could roll back; that is not possible in Cassandra. Yet we had adhered to all the data modeling rules. So what do we do? Adhering to pure data modeling rules, or whatever you read in the blogs, is not the critical factor for success. You need to look at your own use case and work out how to achieve what you want. Now let's look at this through the lens of a transformation project.

Okay, thank you, Sunil. The first problem we are going to address is the large customer, or the large partition. Any blog will tell you, and you may already have done this in your own installation: if you have large partitions, split them into micro-partitions by adding a synthetic column, a bucket number. So we did. But wait: as soon as you add it, you create another problem. Where one query used to retrieve 1,000 rows, you now have to run ten queries to retrieve those same 1,000 rows. Instead of fixing one problem you have created another, and it adds latency. If you have latency-sensitive applications, where you need to serve your customers and your APIs within certain SLAs and SLIs, you cannot just bucket blindly; you have to be mindful. So how do we tame this? We all know Cassandra works well when data is distributed properly: data density across nodes should be even, not skewed, and bucketing takes care of that. But now too many queries and higher read latency are the cons. So how do we solve it?
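Here is a hedged sketch of both the reverse lookup and the bucketing idea just described; the bucket derivation and all names are illustrative assumptions:

```cql
-- Reverse lookup: MTN to customer. One small row per partition, so
-- partition size is never a concern here.
CREATE TABLE customer_by_mtn (
    mtn         text,
    customer_id text,
    PRIMARY KEY ((mtn))
);

-- Micro-partitioning the customer side with a synthetic bucket column,
-- e.g. bucket = hash(mtn) % number_of_buckets, computed by the application.
CREATE TABLE mtn_by_customer_bucketed (
    customer_id text,
    bucket      int,
    mtn         text,
    PRIMARY KEY ((customer_id, bucket), mtn)
);

-- The old single query now fans out to one query per bucket (the driver
-- can run these in parallel, as discussed below):
SELECT mtn FROM mtn_by_customer_bucketed WHERE customer_id = 'C123' AND bucket = 0;
SELECT mtn FROM mtn_by_customer_bucketed WHERE customer_id = 'C123' AND bucket = 1;
-- ... and so on up to number_of_buckets - 1.
```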
In our case, apart from adding a bucket number, we added something like T-shirt sizing. As I said, we have IoT customers, but not all IoT customers are large; we have business customers, but not all business customers are large. So we added a size category: small, medium, and large customers, and we categorized them ahead of time. Now you might be thinking: fine, you know your existing customers, but what about new ones? You don't know; tomorrow they might become a large customer. We will come back to that problem. For now, we went with this T-shirt-sizing concept so that the API side knows where to look: if you are asking for customer ID 123, the API side knows that customer 123 is large, so it only needs to look across the large-customer buckets.

So now the question: do we need to bucket customers ahead of time? Not necessarily. The process we used was to set certain thresholds, and when a customer grew beyond their bucket, we did something called rebucketing to move the customer to the next size. There are always pros and cons. With this approach we got rid of large partitions, and the vast majority of our customers are small: most have two or three MTNs, so for them there is no impact. For the other customers, bucketed into 10 or 15 buckets, we do execute the extra queries, but the beauty of Cassandra and the Cassandra Java driver is that we execute all those queries in parallel, so there is not much of a latency impact. As long as we can scale out, fetching the data from several nodes through bucketing instead of from one node, we achieve what we want.

The next big problem was that, since this was a brownfield application and we did the reads first, we had to get writes into Cassandra somehow. The writes were still happening in DB2, and we ran a real-time sync from DB2 to feed Cassandra. This created a couple of constraints. The way we moved data from DB2 into Cassandra was based not on APIs or functionality but on tables: whenever something changed in a DB2 table, we put a message on a queue, then consumed that queue and wrote into Cassandra. The problem with that approach is that we were strictly tied to the model: whatever model we had in DB2, we had to follow a similar model in Cassandra, because the data arrives table by table. But it was the only way to get the read-first phase done, and read-first was our entire strategy for achieving the cost reduction we wanted and for proving the stability of the product.

One of the key things we used in the read-first phase was Cassandra's USING TIMESTAMP. Everything you write into Cassandra carries a timestamp, and if you write something and afterwards something else arrives with an older timestamp, the older write is effectively shadowed. Everything that happened in DB2 has a timestamp: an insert has a timestamp, an update has a timestamp. So even if we received two messages out of order, the update first and the insert later, because we stored the data with USING TIMESTAMP we could still ensure that the data in Cassandra was functionally correct.
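A hedged sketch of the idea; the table, values, and timestamps are illustrative. The pipeline stamps every Cassandra write with the DB2 change time, so replayed or reordered messages cannot clobber newer data:

```cql
-- The *update* event arrives first, stamped with the newer DB2 change
-- time (timestamps here are in microseconds):
INSERT INTO customer_by_mtn (mtn, customer_id)
VALUES ('5551234567', 'C456')
USING TIMESTAMP 1700000002000000;

-- The original *insert* event arrives later, but carries an older
-- timestamp, so Cassandra's last-write-wins resolution shadows it.
-- A read still returns customer_id = 'C456'.
INSERT INTO customer_by_mtn (mtn, customer_id)
VALUES ('5551234567', 'C123')
USING TIMESTAMP 1700000001000000;
```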
This was one more challenge we had to work through, and you typically do not hear of USING TIMESTAMP being used in such a way.

One more challenge we faced was inconsistency. As we said, one table in the RDBMS corresponds to many tables in the Cassandra world, shaped exactly to serve our APIs; we were not changing our API queries or anything, because we had to be able to fall back. So the same data had multiple renditions, say one MTNs-by-customer table and one customer-by-MTN table. The problem we then faced: sometimes the data was not in sync, and sometimes we were serving wrong data. One API shows a customer with five MTNs while, at the same time, another API shows four. So the question became: how do we trust our Cassandra data? We had to build a reconciliation process. It was Spark-based, running in the background against our systems: daily, weekly, or monthly recons, and for the large tables with heavy transaction volume, daily or even hourly recons, making sure the data flowing from source to destination was in sync before the customer even noticed. If a customer notices, we get a ticket; the reconciliation process helped us catch issues before it got that far.

And another thing we all know: tombstones, everyone's favorite thing in Cassandra. Sometimes my son asks, "Dad, you work on tombstones?", but that is a separate story. We all know tombstones are how Cassandra handles deletes in an eventually consistent system; I will not go into that. We also know tombstones can create problems on the read path. Let me tell you what we did. Think about it: a customer places an order, and the order immediately goes into a pending table. That table becomes hot; it holds the main order data until the order is fulfilled, and once fulfilled, the order moves to a completed-orders table, the cold data. Earlier, the pending-orders table was partitioned by customer. One customer might place, say, a thousand orders, and we stored them all there; as each was completed, we deleted those rows and inserted them into the completed table. When we did that, every subsequent call was hitting those tombstones, and that caused a lot of issues. So on our side we created a secondary lookup table holding just the customer-to-order relationship. Only that relationship is stored there, and the main pending table stores only the order data, keyed by the order alone, so each order is uniquely identified. Once it is fulfilled, we move that order to the completed table.
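A minimal sketch of that split, with assumed table and column names; the point is that fulfillment deletes a whole single-order partition that the hot read path never range-scans:

```cql
-- Relationship only: which orders belong to a customer.
CREATE TABLE orders_by_customer (
    customer_id text,
    order_id    text,
    PRIMARY KEY ((customer_id), order_id)
);

-- Pending order data keyed by the order alone. Deleting a fulfilled
-- order tombstones only this one partition; point reads of other
-- orders never touch it.
CREATE TABLE pending_order_by_id (
    order_id text PRIMARY KEY,
    payload  text                        -- stand-in for the real order columns
);

-- Read path: get the order IDs, then point-read each pending order by
-- its own key instead of scanning a customer-wide partition.
SELECT order_id FROM orders_by_customer WHERE customer_id = 'C123';
SELECT payload FROM pending_order_by_id WHERE order_id = 'O-42';
```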
That way, even though we still create tombstones, our API never reads through them, and we were able to alleviate our tombstone problem.

We had another problem: too many tables. The DataStax site, or anywhere else, clearly says don't go beyond 400 tables. We had more than 1,000. Think about running repairs and everything else across 1,000 tables, on top of a transaction volume of 1.7 million read transactions per second and more than 300K writes per second; we had to live with roughly 800 tables, with all the heap pressure and compaction load that brings. So how did we even get there? This is simply where our modeling led us, because in typical Cassandra guidance you always hear that you need a table per query. We have thousands of APIs, thousands of use cases against the same data, so we ended up with that many tables.

We tried two approaches to mitigate the table count. The first was to combine multiple tables with a column-based approach: take three tables and lay their columns out side by side in a single row, combining them vertically. The problem is that because our data arrives out of sync, when we tried to combine three tables into one row, we sometimes did not have the necessary keys to join them. So the approach we went with combines rows horizontally: we created reference tables with a synthetic key carrying the original table name, so each row says, this is the table name, and these are the values. That is how we combined the tables. I know we are running out of time.

One more important question I want to ask: can we ever model for all the questions we want to ask? In a business like ours, business users want a lot of information: orders, MTNs, customers, and that is just the tip of the iceberg. We have thousands of plans, each plan has thousands of variations, and users want to slice and dice across all of them. Can we ever model for all those questions in Cassandra? Maybe with version 5, where the promised indexes may help, but definitely not with secondary indexes, and definitely not with materialized views. We could not solve this in Cassandra. So we created a CDC pipeline to move the data from Cassandra into an RDBMS, and all our ETL and ad hoc business reporting is served from there. That is where we are today.

So what are the lessons learned? We made every mistake we possibly could. Our business runs at scale, so our APIs require millisecond latency. We all know how customers behave: if they click something, they want a response within a second, and if you cannot deliver that, they move away. Chandi is going to go over the lessons.

Okay, thank you, Sunil. First, let's take a pause and reflect on what we have learned so far. We talked about the unique challenges we faced, and a bit about how we mitigated them.
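A hedged sketch of that horizontal, row-combining idea; the names and the JSON-style payload are our assumptions, not the production design:

```cql
-- Many small reference tables collapsed into one. The original table
-- name becomes part of the partition key, so each old table maps to
-- its own disjoint set of partitions.
CREATE TABLE reference_data (
    table_name text,     -- synthetic key: the original table's name
    ref_key    text,     -- the original table's primary key, as text
    ref_value  text,     -- the original row, e.g. serialized as JSON
    PRIMARY KEY ((table_name, ref_key))
);

-- What used to be "SELECT * FROM plan_codes WHERE code = 'P1'" becomes:
SELECT ref_value FROM reference_data
WHERE table_name = 'plan_codes' AND ref_key = 'P1';
```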
Beyond that, a few more things. Cassandra in general is pretty darn solid. It worked for us out of the box: replication, scaling, all solid. Once you stand it up, it is done; you do not have to do much. But then you start using the other features, the bells and whistles: secondary indexes, materialized views, triggers, even SASI. Since most of us come from an RDBMS background, we felt, okay, let's go ahead and use them. But they did not work for us. They scaled to some extent, but at 10x or 12x they never scaled. So we stopped using those features, including collection data types. We faced many issues with collections; when we started our journey I saw folks left and right using them, but from then on we moved away, because they generate a lot of tombstones.

One lesson is that there are things we had to learn and things we had to unlearn, because prior knowledge can be a great hindrance when moving to a new system. We came from an RDBMS background, and folks like our leadership, who see the blogs at a high level, say: as an enterprise we want everything from Cassandra. But that is not the case. Even though, as Josh said in the earlier talk, ACID transactions are coming to Cassandra, and vector search is coming to Cassandra, those features may or may not scale for a company like ours. With that, I will finish our session and open up for questions. Thank you.

Okay. If you have any questions, we can answer them. If not, thank you, and go forth and unleash the power of Cassandra. Go ahead.

[Audience question.] So we concentrated here on data modeling issues, but of course compaction is a perennial issue that keeps people awake at night. In Cassandra we have tuned compaction to a great extent, down to the table level, and in some cases we even created a separate cluster for a particular set of tables so that we could run our regular repairs, moving the tables with frequent writes somewhere else. What I am trying to say is that you cannot fix compaction; it is part of Cassandra. It is not an issue, it is a feature; it is there for a reason, and it is going to chew your resources. If you are using the size-tiered compaction strategy, fine, but we make sure we are mindful not to store a huge amount of data in those tables. If you are using leveled compaction, we are mindful that it does not go beyond a certain number of SSTables; people may say leveled compaction can handle more, but at our scale we have a background job that periodically tells us, hey, you are breaching your threshold, so either tune it or move your data somewhere else. So that is a very good question.
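As an illustration of the per-table tuning mentioned in that answer, here is what switching compaction strategies looks like in CQL; the table names and thresholds are hypothetical, while the strategy classes and options are standard Cassandra ones:

```cql
-- Size-tiered compaction for a write-heavy table; keep an eye on the
-- total data volume per node.
ALTER TABLE pending_order_by_id
WITH compaction = {
    'class': 'SizeTieredCompactionStrategy',
    'min_threshold': 4,
    'max_threshold': 32
};

-- Leveled compaction for a read-heavy table with frequent overwrites,
-- bounding the number of SSTables a read touches, at the cost of more
-- compaction I/O.
ALTER TABLE reference_data
WITH compaction = {
    'class': 'LeveledCompactionStrategy',
    'sstable_size_in_mb': 160
};
```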
[Audience question.] So we touched on that one. Remember what Sunil said: out of 1,000 APIs, not all are transactional; some are looking up reference data. So we categorized those APIs, found which tables they were hitting, and merged those tables into one. That way, close to 700 tables that were reference tables in the RDBMS were collapsed; from the RDBMS view it is still one-to-one, but in Cassandra it is a single reference table. We also structured it with a key that tells us where each piece of data is stored, backed by metadata: there is another table where the API can look up, by that metadata, which table holds the data. For everything else, we still deal with many tables. At this point, per DataStax, we are the crazy customer with 800 or 900-plus tables; even after merging, we still have more than 800.

In general, they say one API, one query, right? We were not able to reach that, and if we tried, we would run into all kinds of data duplication issues. So the pattern we typically use is to create search tables: instead of duplicating all the data, we keep just some identifiers in the search tables, fetch those identifiers first, and then fetch the data. With this pattern you definitely add a few queries; your API is not going to be one or two queries, it might be four or five, and some of them depend on each other. But that is the reality: it is not always possible to model down to one query.

All right. If you have any further questions, I will answer them after this. Thank you for your time.