Hello and welcome. My name is Shannon Kemp and I'm the Chief Digital Manager at DataVersity. We would like to thank you for joining this DataVersity webinar, Relational to NoSQL Migration, sponsored today by DataStax. Just a couple of points to get us started. Due to the large number of people attending these sessions, you will be muted during the webinar. We will be collecting questions via the Q&A panel in the bottom right-hand corner of your screen, or if you'd like to tweet, we encourage you to share highlights or questions via Twitter using the hashtag #DataVersity. If you'd like to chat with us or with each other, click the chat icon in the bottom middle of your screen for that feature. And if you'd like to continue the conversation after the webinar, you can keep networking at community.dataversity.net. As always, we will send a follow-up email within two business days containing links to the slides, the recording of the session, and any additional information requested throughout the webinar. Now let me introduce our speaker for today, Ankit Patel. Ankit is a principal strategy architect at DataStax with nearly a decade of experience with Apache Cassandra. At DataStax, his initiatives relate to digital modernization and transformation projects leveraging distributed NoSQL data platforms built with Cassandra. In his previous roles, he advised numerous enterprises across many industries on developing distributed software, and he specialized in financial services. Ankit, with that, I will give the floor to you.

Hello, and thank you for the introduction. Welcome everyone to the webinar. My name is Ankit Patel. I'm a principal strategy architect here at DataStax, and I'm more than happy to discuss the topic at hand, which is relational (RDBMS) to NoSQL migration. With that said, I'm going to the next slide here, and I want to pause a bit and set the stage for the next few slides. We want to start from something Albert Einstein said years ago: "We cannot solve our problems with the same thinking we used when we created them." The idea is that our problem space has evolved over the past few decades and years, it will continue to evolve, and it will evolve in ways we may never have thought through. We can't keep solving problems from the same viewpoint we've used in the past few years; we need to evolve the solution space as well, open up our thinking, and create innovative solutions for the new problems. Going into the next slide, think about the digital era: over the past few decades, people have been consuming more and more computing products and technologies, and the internet has really opened up those usages over time. Our need to process the data at hand keeps increasing. Years ago I used to write things in a notebook; then I wrote them in WordPad or Notepad; now I write them in a cloud environment, whether that's Google Sheets or some flavor of Microsoft 365. Things keep transforming over time to be more digital.
On top of that, people have adapted to mobile applications. Before, you used to go to the bank to deposit your check; now you can do it right through your phone. This increase in the digital footprint has led to a new problem space: how do I take all this information, from an organizational perspective, and process it securely and efficiently, so the end customer and user has an experience that is always available and responsive to whatever new data consumption and processing needs they have? That's the digital aspect of the equation. The second is that organizations and users are more data driven now. Take the weather: traditionally you looked at today's and tomorrow's weather in the newspaper, then it switched over to TV, and now people check their mobile apps every hour or every ten minutes. You're more data driven: I can't make that trip in the next half hour because it's going to rain or snow outside, and that's driving your human decisions. It's also driving key business decisions in many enterprises. They want to digest data at high speed, interpret it at high speed, and drive their next process or decision around that data. That's the data-driven aspect of things. And then there's the notion of AI-enabled workflows, which take human functionality and augment it with artificial intelligence or computerized workflows so that less human interaction is needed. Where you used to type something on a keyboard, you can now voice-command the computer or the mobile app to type it for you. That's really powerful, and it's enabling a lot of AI workflows in the new world. You really can't solve these new problems with the traditional technology you chose years ago; you need to think from a modernization perspective. So, having said that: in the modern era, the speed of data matters. Some of these siloed systems were designed in the old world with RDBMSs, and the issue is really the siloed way they were designed, because the problem space has evolved since then. Data access in siloed RDBMS systems can be challenging at times. Yes, RDBMSs do scale, I'm not saying they don't, but there's an upper limit, a threshold beyond which your cost increases exponentially, and you simply can't scale overnight. You can't say I'm expecting 10 users today, 100 tomorrow, and 10 million the day after, and scale to that with ease. On top of that, the old systems carry a certain resistance to change. You can't easily change the data models around, and you can't move data from one flow to another without developing some complex architecture. There's an old mindset of not wanting to change the application stack. The business needs are different today, but because those old systems have been easy to maintain and operate for a very long time, you end up not designing around the new, innovative need you may have tomorrow.
So resistance to change is a big thing we've seen with these siloed systems. There are also a lot of legacy processes involved. You may think that when we redesign in the modern era, we'll never be able to support the same functionality we had in the legacy processes. To some extent that's true, but you need to rethink the solutions you built in the legacy systems and either mimic them in the new world or design those processes in new, innovative ways, so you can really take advantage of the speed of your information processing over time. Then there's the lack of analytical capability in the old world. If you have RDBMS systems, you can absolutely run analytics on them, but there's a lack of scale in running those analytics, so you may need to come up with extra architecture to avoid overburdening the RDBMS systems that carry your operational workflows, or operating at reduced capacity. You want to make sure you're really tackling the analytical needs you have, because that's what drives the data-driven architectures and the AI workloads of the new world. So, what is NoSQL? NoSQL literally means "not only SQL": it's a non-relational database that supports accessing data in forms other than Structured Query Language (SQL). It's designed to be used by cloud applications that need to handle massive amounts of data in real time, and it provides the ability to overcome scale, performance, data storage, data model, and data distribution limitations. On relational scaling: yes, you can scale an RDBMS with sharding, and it scales horizontally to some degree, but there are challenges in scaling it further, taking advantage of your DR strategies may not come easily with those traditional scaling solutions, and you end up having to scale vertically to some degree as well. In the NoSQL world, scaling is simply adding commodity hardware and scaling out horizontally, and you can do it across regions and zones with ease. Having said that, I want to highlight a few key differences between NoSQL and RDBMSs. You can argue that some of this is not completely true in every case, but at a very high level, comparing them at the application stack: use NoSQL when you're thinking about decentralized, scalable, microservices application workflows; use an RDBMS for centralized, monolithic applications that need to run in active-passive mode, where you point at one data center or one instance of the database and you're staying away from this new era of microservices and a decentralized world. From an availability perspective, with NoSQL it's very easy to achieve 100% availability with zero downtime. With an RDBMS you can definitely achieve bits and pieces of that, but you'll often find yourself running active-passive or master-slave workloads.
Yes, you can argue that in an RDBMS you can achieve similar availability, but it's really hard to always maintain it, and the DR strategies you keep planning may suffer in terms of how long it takes the remote data center or remote network to become available when you have a disaster. From a data perspective, with NoSQL it's very easy to sustain low latency on structured, semi-structured, and unstructured data at high velocity. RDBMSs are generally built around structured data, and you really have to think about how much you need to scale vertically, and to some degree shard, when you have high-velocity, low-latency needs. So think prospectively about what you're doing with the data from a read-write perspective with NoSQL versus an RDBMS. From a transactions perspective, NoSQL is generally about simple transactions and queries, while RDBMSs are about complex nested transactions with joins and foreign key constraints; NoSQL does differ in that space, generally speaking, so think through the advantages you're getting in NoSQL against the transactions and queries you want supported. From a scalability perspective, as we discussed on the last slide, NoSQL is horizontally and linearly scalable: if you have 10,000 transactions per second today, it's very easy to get to 1 million transactions per second, depending on the capacity of the hardware and the number of machines backing the NoSQL database you're deploying. Vertical scaling is what RDBMSs are known for, and yes, you can scale them horizontally, but then sharding comes into play again; just think about what each offers from a scalability perspective. So when we think about NoSQL databases, there's Apache Cassandra. Apache Cassandra is our NoSQL database of choice, and the reasons are that it offers 24x7x365 availability with zero downtime, it's active-everywhere, it's masterless in nature, and it scales linearly. It offers zero lock-in, so you're not bound to a specific cloud provider like Amazon, GCP, or Azure; you can design a layer on top, have a NoSQL database that is cloud agnostic, and drive that microservices architecture with ease. And there's global scalability with the database. It's the number one choice of the world's largest consumer internet applications, and I just want to reiterate a quote that was published as part of the Apache Cassandra 4.0 beta release: if you use a website or a smartphone today, you're touching a Cassandra-backed system. So Cassandra is the cloud-native NoSQL database, and the reason is that with Cassandra's masterless architecture you can easily achieve 100% uptime across on-prem, single-cloud, hybrid, or multi-cloud deployment types; that architecture is really ingrained in the technology. If you have a Cassandra cluster that you want to deploy across AWS and Azure, which is a multi-cloud scenario, you can do that with ease.
And if you want to deploy a cluster across on-prem and GCP (Google Cloud), you can do that with ease as well. That's your hybrid deployment model, where your on-premises infrastructure merges with the cloud infrastructure, and it's really there to drive the experiences, the microservices, and the kind of application and user experience you're interested in as part of the deployment model. Going back to some basics of what Cassandra offers: Cassandra offers CQL, the Cassandra Query Language. It has similar syntax to SQL and is the standard way to communicate with Cassandra clusters for reading and writing data. It's a feature-rich language that lets you manage the cluster, manage permissions and roles, and it has rich data type support, user-defined functions, and user-defined aggregates. There are a lot of neat things you can do, from querying a table to putting a WHERE clause in; it's very similar in nature to SQL. An example read query looks like: SELECT * FROM a keyspace and table WHERE the partition key equals a specific value. If you compare this to SQL, you'll find a lot of similarities. An example of writing data is: INSERT INTO a keyspace and table a partition key, a clustering key, and a value column, along with the values associated with them. We mentioned the word keyspace, so what is a keyspace? It's similar to a schema in the RDBMS world. It is a container for multiple tables, and it has a replication strategy associated with it. There are two different strategies you can read about: SimpleStrategy and NetworkTopologyStrategy. There's also a replication factor associated with the keyspace level, and an extra setting for durable writes: if you write data, durable writes ensure it's also written to something called the commit log within Cassandra. An example of creating a keyspace is: CREATE KEYSPACE test WITH replication using NetworkTopologyStrategy, pointed at data center one with replication factor one, and with durable writes set to true. So we mentioned what a keyspace is; what is a table? It's much the same as an RDBMS table. It contains a primary key, and it always has a partition key as part of the primary key. This is very important: the partition key is a must for the primary key to exist. Optionally, it can have a clustering key, which is how the ordering columns are defined. Both the partition key and the clustering key make up the primary key, but the clustering key defines the ordering, whether you want ascending or descending. And, just to reiterate, both the partition key and the clustering key can be composed of multiple columns. A number of parameters can also be adjusted at the table level: compaction, compression, gc_grace_seconds, and time to live (TTL) are some of the settings within a table. You can read about them depending on the strategies you may want to employ table by table, and we have great documentation on our website around this. Then here's an example of how to create a table: a simple CREATE TABLE statement where test is the keyspace and sample_table is the table name.
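To make those examples concrete, here is a rough CQL sketch of that keyspace and table; the data center, column names, and values are just illustrative placeholders, not the exact slide content:

    -- Keyspace pointed at a single data center with replication factor 1;
    -- in a multi-DC cluster you would list each DC, e.g. 'dc_east': 3, 'dc_west': 3
    CREATE KEYSPACE test
      WITH replication = {'class': 'NetworkTopologyStrategy', 'dc1': 1}
      AND durable_writes = true;

    -- Table with a two-column partition key and two clustering keys
    CREATE TABLE test.sample_table (
      partition_key_1  uuid,
      partition_key_2  uuid,
      clustering_key_1 timestamp,
      clustering_key_2 int,
      value_1          text,
      value_2          double,
      PRIMARY KEY ((partition_key_1, partition_key_2), clustering_key_1, clustering_key_2)
    ) WITH CLUSTERING ORDER BY (clustering_key_1 DESC, clustering_key_2 ASC);

    -- Writing and reading look very close to SQL
    INSERT INTO test.sample_table
      (partition_key_1, partition_key_2, clustering_key_1, clustering_key_2, value_1, value_2)
    VALUES
      (5b6962dd-3f90-4c93-8f61-eabfa4a803e2, 0e2276e5-8f4f-4d0e-9e1c-0a9a5b3c1a2b,
       toTimestamp(now()), 1, 'example', 2.5);

    SELECT * FROM test.sample_table
     WHERE partition_key_1 = 5b6962dd-3f90-4c93-8f61-eabfa4a803e2
       AND partition_key_2 = 0e2276e5-8f4f-4d0e-9e1c-0a9a5b3c1a2b;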
You have partition key one, which is a UUID, partition key two, also a UUID, clustering key one, a timestamp, clustering key two, an integer, value one, text, and value two, a double. At the bottom you say you want your primary key to be partition key one and partition key two, making sure the inner parentheses are associated with the partition key, which can be multiple columns just as this example shows. Then clustering key one and clustering key two are the additional components of the primary key, inside the outer parentheses. And as we stated previously, you can easily state how the clustering keys should be ordered: say clustering key one should be ordered descending and clustering key two ascending. I just want to reiterate how much similarity there is here between CQL and SQL, and that Cassandra does drive a lot of its user and application interaction through CQL. Again, what is the replication factor? The replication factor determines how many copies of your data are stored in a Cassandra cluster. Each copy is stored on a different node; a Cassandra cluster contains multiple nodes, and each copy lands on a different node. The replication factor can be defined per data center. So in a multi-data-center scenario, call it DC1 in the East region and DC2 in the West region, you can say: in my East region I want three copies, and in my West region I want three copies. You can easily set the replication factor at the DC level, and it is primarily set at the keyspace level. It is controlled by the Cassandra schema on the server side: the replication factor is something you set when you create the keyspace, and you can alter it as your needs change and you add data centers over time. To reiterate, the replication factor is something you control from the schema when you create the keyspace and the tables. The reason I've said that multiple times, that it lives at the schema level, is that there's also this notion of a consistency level. That is a parameter controlled by the client on individual queries. Think of a read operation or a write operation from the client's perspective: each individual query you make into the cluster can have a different consistency associated with it, depending on what the client wants to achieve with that read or write. This parameter, combined with the replication factor, helps you achieve the consistency requirement a specific use case is looking for. Some simple examples of consistency values: ONE means that when I read or write, at minimum one replica has to acknowledge back to my client that it has actually written (or read) the data, and then the client is satisfied that it achieved that consistency. You can picture this in a cluster where I'm replicating my data in both the east and the west.
With ONE, when I'm writing to the east data center, I only need at least one replica, anywhere, to return an acknowledgment that it has written the data before I reply back to the client. That's the difference between ONE and LOCAL_ONE: LOCAL_ONE ensures that at least one replica in whichever data center you're pointing at acknowledges the write (or read) operation before the response goes back to the client. Yes, the write absolutely still goes to all six locations, based on replication factor three in the east and replication factor three in the west, but the acknowledgment goes back to the client as soon as the first replica within the east data center confirms it has written the data on the write path. QUORUM is a little different. QUORUM means I want a majority of my replicas, across the whole cluster, to tell me they have actually got the data before the client gets an ack back. Think of majority like voting: exactly 50% is no good, you need that extra bit, 50.1%, to say you have a majority. So with replication factor six again (three in the east and three in the west), the majority is not three. It can't be 3.1 either, because an instance of Cassandra is either up or down, there's no partial node, so three, which is only 50%, is not the right answer; it needs to be four. Four replicas need to acknowledge on the write path that they have actually taken the write. Now, QUORUM may work in your favor or against you depending on the network latencies between your east and west, so you may not want QUORUM in some scenarios. Let me go over EACH_QUORUM before we talk about LOCAL_QUORUM. With plain QUORUM, in the previous example, you write to the majority of all replicas, four out of six, and that could be three in the east and only one in the west. EACH_QUORUM says that within each data center you know of, east and west, you must write to a majority: two out of three in the east and two out of three in the west. The total is still four, but it ensures that every data center gets a majority. And then there's LOCAL_QUORUM, which, like LOCAL_ONE, means that if I'm pointing at the east, I only need the majority in the east data center before the client gets the ack back. You want to use these in combination with where your clients are and what the network latencies between your data centers look like. Most of the time, from a DataStax perspective, we advise that clients execute LOCAL_QUORUM operations so they are not affected by the latencies of the physical network between data centers. That does not mean the data doesn't get replicated to the other side; it gets replicated immediately. It's just that the client doesn't wait for that ack. And there are mechanisms in Cassandra to ensure that if a replica missed a blip of data due to a small network outage of a few seconds, it does catch up.
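To show what choosing a consistency level looks like in practice, here's a small cqlsh-style sketch against the hypothetical table from earlier; the drivers expose the same per-request setting through their own APIs:

    -- In cqlsh, CONSISTENCY applies to the queries that follow in the session
    CONSISTENCY LOCAL_QUORUM;

    -- With replication factor 3 per data center, this write is acknowledged once
    -- 2 of the 3 replicas in the local data center confirm it; replication to the
    -- remote data center still happens in the background
    INSERT INTO test.sample_table
      (partition_key_1, partition_key_2, clustering_key_1, clustering_key_2, value_1)
    VALUES
      (5b6962dd-3f90-4c93-8f61-eabfa4a803e2, 0e2276e5-8f4f-4d0e-9e1c-0a9a5b3c1a2b,
       toTimestamp(now()), 1, 'written at LOCAL_QUORUM');

    -- Reads can use a different level, e.g. return as soon as one local replica answers
    CONSISTENCY LOCAL_ONE;
    SELECT * FROM test.sample_table
     WHERE partition_key_1 = 5b6962dd-3f90-4c93-8f61-eabfa4a803e2
       AND partition_key_2 = 0e2276e5-8f4f-4d0e-9e1c-0a9a5b3c1a2b;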
That catch-up mechanism is something you can read about as well; it's a more advanced topic than we'll discuss here. What I want to highlight is that consistency is controllable from the client, and you want to drive your use cases based on the different consistency requirements you have. ALL means, as it sounds, that every single replica you touch on the write path has to have written the data before the client gets an ack back. That can be a very expensive operation, and you may not be able to sustain any nodes being down in that scenario. So, building on the different consistencies we went through on the previous slide, here's what reads and writes look like in action at a high level with an application stack. This is a multi-data-center scenario with on-premises, AWS, and Azure data centers and three replicas per data center. The consistency, again, is per read/write request from the client, and the application runs in an active-active scenario, deployed across the DCs from a read/write perspective. If you have clients pointing to the application instances that point at DC1, they will write to DC1 according to the chosen consistency. If you choose LOCAL_QUORUM on the write path, you write the data points to that data center as your primary, but at the same time, in the background, the data is replicated across the board; it's just that the client application only gets its ack back when the write finishes in the data center it's pointing at, and that holds true across data centers. So if you think from a DR or business continuity perspective, and let's say you lose your on-prem data center, you can very easily point all of your clients, from a load-balancing perspective, to AWS or Azure, and you can run at the same capacity, depending on how you designed the system from the ground up. This really highlights that you can run active-active workloads across the board. Yes, there are some caveats, but generally speaking this is achievable with ease in a multi-data-center scenario. Now let's talk about how your enterprise can get from an RDBMS-based design to a Cassandra-based architecture. One of the key points is that structured data is central to both systems. Yes, in Cassandra you can get away with unstructured data, but structured data is the norm for both RDBMSs and Cassandra. You also really want to re-evaluate your need for ACID transactions. Even with RDBMSs, once you scale across different shards, maintaining ACID across shards, across tables, and across the different rows you're transacting on becomes problematic. So ask yourself: do I really need ACID? Cassandra does not support full ACID; it supports lightweight transactions. Within a specific partition key you can say, for example, only insert this data point if it doesn't already exist, or only update it under a condition, but that only applies within a specific partition/primary key within a table. So make sure you understand the difference between the ACID capabilities of your RDBMS and what Cassandra offers.
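As a rough sketch of what those lightweight transactions look like in CQL, again reusing the hypothetical table from before (and noting that they add an extra coordination round compared to plain writes):

    -- Insert only if no row exists yet for this exact primary key
    INSERT INTO test.sample_table
      (partition_key_1, partition_key_2, clustering_key_1, clustering_key_2, value_1)
    VALUES
      (5b6962dd-3f90-4c93-8f61-eabfa4a803e2, 0e2276e5-8f4f-4d0e-9e1c-0a9a5b3c1a2b,
       '2020-01-01 00:00:00+0000', 1, 'first write')
    IF NOT EXISTS;

    -- Update only if the column still holds the expected value; the condition is
    -- evaluated within this single partition, not across tables or partitions
    UPDATE test.sample_table
       SET value_1 = 'second write'
     WHERE partition_key_1 = 5b6962dd-3f90-4c93-8f61-eabfa4a803e2
       AND partition_key_2 = 0e2276e5-8f4f-4d0e-9e1c-0a9a5b3c1a2b
       AND clustering_key_1 = '2020-01-01 00:00:00+0000'
       AND clustering_key_2 = 1
    IF value_1 = 'first write';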
You want to take advantage of Cassandra's performance. If you have the need to join data sets across different tables, you'll want to move that into the application stack. Think through what a join really buys you given Cassandra's distributed architecture: you can do parallel reads and join the data in the application once it comes back, or do a sequential read where you read from table one and then use those data points to query table two. Joins are not supported out of the box, so that logic moves to the application stack. Denormalization and data duplication are efficient in Cassandra. Ask yourself why you're joining the data in the first place: maybe I'm storing information related to a person in one table and information about what car they drive in another. If that data is really related, and I would never update one without updating the other or retrieve one without retrieving the other, you may want to denormalize it into a single table structure. With Cassandra, whether you denormalize the data or duplicate it so you can retrieve it by different means, that's totally okay; you want to take advantage of what it's good at, which is read/write speed when you hit a specific primary key. You also want to choose your index types wisely if you have latency and TPS concerns. Cassandra does offer some indexing capability, but it comes at a cost: you may not be able to achieve the same latency or TPS with the same number of machines or nodes. So if you do choose an index, choose it wisely, and make sure you're driving most of your use case on the primary key read/write path. And you want to thoroughly plan the data model. The data model is known to be one of the most challenging aspects, so plan it ahead of time, and plan it thoroughly, in the Cassandra world. Just reiterating the point about the data model: in the traditional RDBMS world, the ERD was the norm for design; you had foreign key constraints and indexes in that ERD-based design model, and that flowed down into the schema applied to the RDBMS. In the new world of Cassandra and NoSQL, you want to think in terms of a query-based design model: how am I going to read and write the data so that it's efficient for me to access when I need it? Cassandra is really about hitting that data point, that partition or primary key, on the read path; outside of that, things start to become inefficient very fast. So make sure you're thinking from a query-based design pattern when you redesign from the old world to the new world. You can achieve some level of indexing, like we mentioned on the previous slide, but choose it wisely. To reiterate: the old world is about the ERD-based design model, and the new world is about a query-based design model, which we'll go into on the next slide, right after a quick sketch below of that person-and-car duplication idea.
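As a minimal, hypothetical sketch of that person-and-car duplication idea (these table and column names are made up for illustration), the same data is simply written to two tables, each keyed for the way it will be read:

    -- Query pattern 1: look a person (and their car) up by person id
    CREATE TABLE test.person_by_id (
      person_id uuid,
      name      text,
      car_make  text,
      car_model text,
      PRIMARY KEY (person_id)
    );

    -- Query pattern 2: find people by the car they drive; same data, different key
    CREATE TABLE test.person_by_car (
      car_make  text,
      car_model text,
      person_id uuid,
      name      text,
      PRIMARY KEY ((car_make, car_model), person_id)
    );

    -- The application writes to both tables whenever a person's record changes,
    -- trading extra storage for fast reads on both access patterns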
So what is a query-based design model? At a very high level, you can think of redesigning your application in five steps. First, you look at the application as a whole, which is really about deciding the application access patterns to the various entities needed to deliver the business functionality. A few simple examples from the medical world: if you have medical-related data, you have medical-history queries at the application level, and you have doctor-visit queries at the application level. The next level of translation, step two, is the conceptual model, which is about designing a mental model of those access patterns. If you want to retrieve medical histories, you possibly read surgeries, read allergies, and read health conditions. From a doctor-visit perspective, you read the notes based on some date condition, read the prescriptions by drug type, for example, or read vitals based on whatever parameters you've set, such as a date range or the patient name. Then the conceptual model translates into the logical model, which defines the structure of the data elements based on the query-based design. Going back to the example: if you want to read prescription information, what are the data points associated with a prescription? Patient information, date, drug, dosage, and many other fields you may have. Before you get to the final step, the physical model, you go through an optimization pass, where you make optimizations around data access. For instance, I have a need to read and write information by prescription, but I may want to read not just by patient but by drug type or by doctor, so do I need to duplicate the data, or do I need to create an index on that data point? Finally, everything translates into the physical model: build the Cassandra tables and schema based on the logical model we decided ahead of time and the optimizations we applied to it. In an example scenario, you may end up with a table storing prescription information where the primary key is patient and date, and on top of that you possibly index on the doctor and the drug so you can retrieve it by other means besides the primary key. So, up to now we've talked about Cassandra; I just want to spend a few minutes on what DataStax is. DataStax has a product around Cassandra called DataStax Enterprise, and Cassandra is the key foundational layer of the data platform we offer. If you look at the bottom left here, the foundation layer is Apache Cassandra, our NoSQL database of choice, which offers 100% uptime, zero lock-in, and global scale. However, Apache Cassandra on its own may not meet your organizational or immediate needs, and you may want to make things easier for yourself from an operations and development perspective. That's really what the trusted layer we offer at DataStax is about: operational reliability.
We offer advanced performance, enterprise security, monitoring, and many other things related to our product. The trusted layer is really about making operation of the database very easy for you. The development angle is that we want to offer features and functionality, from a DataStax perspective, so that you can easily use multi-model capabilities on your data and drive operational analytics, whether that's Spark streaming or Spark batch jobs, with ease. We also want to offer advanced search capability: if you want to use Solr-based text search, we offer that type of functionality within our product, so you don't have to duplicate or denormalize your data. You can say I have table one and table two with certain primary keys, and I want to get at the data by different columns stored in each table, and you can do that with ease using our search. We invest a lot of time in that area, and we recently introduced Storage-Attached Indexing, which we've contributed to open source Cassandra as well and which is Lucene-inspired; that takes some of our more traditional search functionality and puts it back into Cassandra. So we're heavily invested in making it easy to read and write the data, and in driving that developer capability, so you don't have to over-design your data models. Then we have a great feature called the graph engine. We are not a graph database from the ground up, but we have this additional graph engine feature, so you can do complex entity resolution: you can go from the patient to the doctor to the drug and traverse that complex relationship with ease. That traversal is possible through the Gremlin API that we offer, from a DataStax perspective, on top of the Cassandra data, combined with the graph capabilities we also have within our product. On top of that, we've invested a lot of effort in extensible integration. You can easily plug in the Kafka connector we offer, which has been open sourced, and we have bulk loading capabilities in and out of the database, so you can take the information and export it in CSV file format, load it back into the database, and use it for whatever audiences you may have. Strategically, we're really about making the developer's and the operator's life easy, with capabilities such as the REST and GraphQL APIs and CQL; we continue to invest time and effort there and want to make them feature-rich. We additionally want to make the cloud story easy, with automation around it, so we have invested a lot in the Kubernetes operator and we will continue to do so. You can easily take our Kubernetes operator and deploy Cassandra or DataStax Enterprise with ease on any cloud provider's Kubernetes engine, or on your on-prem Kubernetes engine. That really comes down to the outcomes you're driving and how fast you want to drive those outcomes, from an AI, scale, and microservices perspective, and the insightful decisions you want to make. So it's great that there's Apache Cassandra and then there's DataStax Enterprise.
But I want to highlight that there's an additional product called DataStax Astra, which is Cassandra made easy in the cloud. You may be thinking: I really don't want to deal with the operations and maintenance of Cassandra. So we offer Cassandra as a service, a cloud-native database-as-a-service built on Apache Cassandra where you literally have to do no operations. We want to eliminate the overhead of installing, operating, and scaling the Cassandra database for you. Again, it's backed by the cloud-native approach: the Kubernetes operators we open sourced, which we actually consume ourselves in the Astra product at DataStax. There's zero lock-in, so you can deploy your Cassandra cluster on AWS or GCP, and we have full feature compatibility with the open source Cassandra project. We also offer a powerful REST API on top of the traditional CQL capabilities of Cassandra, and you may want to take advantage of that from a development perspective, because you don't have to learn CQL if you don't really want to; you can just connect to our REST API, the flexible endpoint we offer as part of the product. Best of all, if you want to try this personally, we have a free 10 GB tier for everybody. You really don't have to wonder whether you need to swipe a credit card: you can launch a database in the cloud with a few clicks, and when you register you don't have to enter a credit card at all. It's a free tier, so please do check it out at astra.datastax.com. So, we've talked about technology, we've talked about RDBMSs and what Cassandra offers, but I just want to go through a simple use case we have in the supply chain world. C&S Wholesale Grocers has been a client of ours for many years, and they've been very happy with their usage of Cassandra. Organizationally, what they do is deliver food, more than 140,000 items, some of them non-food items, to many of their clients, and they have approximately 50 warehouses around the country managing approximately 18 million square feet of storage. They have customers like Safeway, Target, and Stop and Shop. They had a technical need: the traditional solutions were slowing down their distribution efficiency, and that was impeding innovation within the organization. They really wanted technology innovation to drive business growth. So what was their technical challenge? At a very high level, we understand what business they're in. The supply chain process was storing data into RDBMSs local to their warehouses; with 50 warehouses, each warehouse had RDBMSs deployed there, storing information locally. Then there was a business need to consolidate all that data and bring it to the central office, so management would be easy for the business folks sitting there and they could drive it through mobile applications. The transaction volumes were in the thousands every several seconds, and they needed a real-time view: literally, where an item sat within a warehouse and when it was to be shipped out from a warehouse location.
They needed a real-time view of all the moving parts in their operations, and they wanted a data platform capable of operational analytics. If you think about running operational analytics across 50 separate warehouses, that wasn't going to serve their purpose; it would be really inefficient for them. So they wanted a data platform capable of operational analytics. Why did they choose Cassandra? As we've highlighted earlier in the deck, Cassandra offers them scalability: all their volumes fit on a six-node or ten-node cluster today, and they can easily scale that up to 20 nodes to handle higher transaction volumes. It offered low latency to serve their mobile users. It offered high availability: their warehouse operations run 24x7, so they wanted to make sure they invested in a data platform that is also available for them 24x7. They wanted to ease the development process for their microservices and mobile application, and they were looking for a multi-DC deployment capability, which Cassandra offers. And again, it was really about the operational analytics piece. They wanted to digest all the information related to the different warehouses and drive operational analytics to see where an item might be stuck, or what's holding up the process, to drive more business efficiency around it. The operational analytics piece was a very key component of why they chose Cassandra. What were some of the business benefits they saw? By implementing this data platform on top of Cassandra, they saw a five-year ROI projection that saves them multiple millions of dollars. They were able to optimize the management of consolidated warehouse operations: before, they had to reach out to each individual warehouse, and there was time involved in doing that, so consolidating it all let them drive efficiency in that data pipeline. They handled transaction volumes in the thousands every several seconds, supported 300-plus users, and processed over 300,000 records in five minutes with ease; those records might translate into multiple transactions underneath as they hit multiple tables. At a very high level, what does the architecture involve? You can think of the RDBMSs on the top left here as the different warehouses. They pushed that information downstream to Spark streaming, and to some degree Spark batch jobs pulled from those databases. What the Spark jobs did, basically, was transform that data into how they wanted to interpret it in Cassandra. In the primary data center they pushed it down into Cassandra, in some cases reading from it and transforming it further, and then Cassandra itself replicated that data to their second data center, where they drove more microservices on top of it, which served their mobile application. To some degree, the microservices along with the Spark jobs also produced the insightful reports on the data so their management could easily look at it through their mobile applications. We do have this case study on our website, and you can read more about it if you're interested.
I just want to highlight a quote directly from C&S: we needed an application that was entirely reliable and not vulnerable to unplanned outages, because our warehouses run pretty much 24x7. So again, it's really about the 24x7 capability of Cassandra and being able to drive the use cases you can think of, from a business or personal-usage perspective, on top of it. Moving on, I want to go across industries to a different use case, this one in financial services. I won't mention the client at this time, they chose not to be public, but it's a mobile banking use case. It's a very competitive retail banking market; they need to keep up with the growth in demand in the digital banking space, they have high costs, and they need to satisfy their customers at a very high rate because there's a lot of competition in the space. They also wanted to achieve an efficient DR and business continuity plan. Yes, they had one before, but failover took several hours, and they wanted to minimize that and reduce and mitigate the risk around it. So what was their challenge? The RDBMS was not easily scalable to the number of transactions they were handling. DR was not that easy either; again, it took multiple hours, and they wanted to reduce that as much as possible. Achieving their latency metrics got harder as volumes increased: yes, it was absolutely fine to a certain degree, but as the volumes grew, the SLA targets were not easily achievable. And they wanted to get away from any downtime or degraded experience that customers would feel. Think of yourself trying to deposit a check or log into your mobile banking application to pay a bill: if you're not able to do that, if there's an outage associated with it, you may not want to associate yourself with that bank. So think of it from a consumer perspective. Why did they choose Cassandra? They ended up deploying their solution across three data centers, so they have that strategy in place for business continuity and DR. They're driving it on top of a microservices architecture so they can run active-active, and active-passive in some scenarios. They wanted to scale the application stack along with the database: if I have 10 instances of my application running today and I want to scale up to 30 instances, can my database really handle it? The database also needs to be scalable for your applications to be truly scalable. They really wanted to achieve their SLAs, targeting below 20 milliseconds on average per read/write transaction, and they wanted the DR strategy to be solid, with high availability: in Cassandra terms, if a single data center is down in this scenario, or even two data centers are down, they have a third data center to handle their customers. And they wanted to be able to process billions of transactions per month. It really came down to those metrics. These are some other common use cases we've seen across our customer base.
Customer 360 is very popular with our graph engine. We have a lot of use cases in the payments space, where organizations run parts of their payment processing on top of Cassandra and DataStax. Cassandra is also well known for its IoT and time-series use cases: from an e-commerce perspective, think of your shopping cart experience, and in pure IoT, sensor tick data also plays a key role there. Fraud detection is important too. You may think of yourself simply logging in from your mobile application, but what really happens behind the scenes is that the device gets tracked by the organization, and that needs to happen at really high speed so it doesn't interfere with your login experience; the fraud detection needs to happen, for example, this person doesn't usually log in from this device, so I want to reject this login. We went through the online mobile banking use case, and inventory management with C&S. There's also the recommendation use case: if I have certain products and services associated with customers today, can I digest my data with ease through operational analytics and recommend to my customer that they may also be interested in other things my organization offers? You can also drive better regulatory compliance with the data you store in Cassandra, because you can dissect the data at very high speed and go through terabytes of data with ease to ensure you're in compliance. Alerts and monitoring is another great use case; in the credit card transaction world, if you make an online purchase or go into a store and buy something, you get a notification pushed to your mobile application, or a text message, saying you've made a purchase. Those are use cases you can easily drive on top of Cassandra. Again, on the global payments side, there are pieces of the global payment process where you want to be available across many regions, literally across the world, to ensure the system is available 24x7, and Cassandra can drive some key components of that. On portfolio management, if you have a portfolio of products or a portfolio of customers, you can manage them through their life cycle with workflows you put on top of Cassandra. In the lending space, we've also seen the loan authorization process put on top of Cassandra. And again, in the banking space, around authentication, making sure your mobile application login is happening from the right device: those types of use cases we've seen very often in the authentication space. So, having said that, I'm going to pause here. I want to thank you for joining this webinar, and I hope you find Cassandra interesting. Please check out astra.datastax.com; you can deploy for free, so it's something you can use and leverage as a tool to learn Cassandra.
Well, thank you so much for this great presentation. We have lots of questions coming in for you. And just to answer the most commonly asked one: a reminder that I will send a follow-up email by end of day Monday for this presentation with links to the slides and links to the recording as well. So, diving in here: can relational databases be fully replaced by NoSQL? Is NoSQL only good for OLAP, or is it good for OLTP as well?

I want to touch on the OLTP piece. We are known as your operational database that handles high volumes of transactions. So yes, it is good for OLTP, but you have to be careful, because it doesn't have complete ACID capabilities. You need to go back to the drawing board and ask: do I really need my ACID transactions? Think from that perspective, because as you start scaling horizontally, some of the ACID semantics can burden the way you're trying to achieve low latency, or they span multiple tables, and Cassandra out of the box just does not offer that at the moment. So ask yourself whether you really need that ACID capability. If you do, you may want to keep the RDBMS in parallel and build a pipeline into the new world, where you serve on top of the microservices architecture. But if you can get away from the ACID requirements and stick to the lightweight transactions Cassandra offers, then maybe it's the right fit for the use case.

Next question: the way to design for Cassandra appears to be similar at the CDM and LDM levels, but the physical level uses the Chebotko methodology to create query-driven physical models rather than a traditional relational PDM. Could you provide insight into this design? Yes, you're going to see a lot of common patterns there, but you have to be careful. Go back to that slide, when you get a copy of the deck, and think not from the ERD-based design model but from the query-based design model. Yes, in the relational world you think about queries as well, but there you can achieve joins and foreign key constraints very easily. Cassandra does not offer that out of the box, so you may want to denormalize the data in some cases, or ask yourself: is an index going to do the job, or do I really need to design a new table where I duplicate the data to look up the information? Cassandra does offer something called materialized views, which you can read up on, and which handle that duplication for you. So there's going to be some overlap, but you really want to think from a query-based design pattern, because that's how you'll get the most out of your use case with a horizontally scalable database.

And is it possible to know which node the data and its replicas physically reside on, and in which data center? Yes, there are some back-end commands, from an operations perspective, that you can use to do that, but at a very high level it's a predictable hashing algorithm that Cassandra uses underneath to determine which replicas actually own that data. And the client knows it too, through the drivers we offer; to finish that thought, the client also knows predictably which coordinators it is hitting, from a Cassandra perspective, because it has all that information and the hashing algorithm on the client side as well.
So when you're trying to read a partition key, the client knows exactly which coordinator to hit, one that is more likely to be a replica versus not being a replica.

In the world of polyglot storage engines, where is the niche for Cassandra? Is it a specific or a generalized database for all types of applications? I'm sorry, could you just repeat the first half of the question? Sure: in the world of polyglot storage engines, where is the niche for Cassandra within that? Is it specific or generalized for all types of applications? So, we went through the different use cases toward the end. Thinking from a polyglot perspective: if you have a need to go across multiple entities with ease, you want to take advantage of the graph capabilities we have, and ask whether you need complex entity resolution, because Cassandra out of the box does not offer that; you would have to design it in the application stack if you don't want to use the graph capabilities. It's also really about the structured storage aspect of things. Yes, you can store blobs in Cassandra. But I'm going to stop here, and if there are any follow-up questions related to this, we will follow up through email as well. There's a lot of documentation on what you can do with Cassandra across the different use cases, and on the different patterns you can apply compared to the traditional data patterns you've seen.

Perfect. We have time for a couple more questions here, but please keep the questions coming and we'll get any that go unanswered over to DataStax so they can be included in the follow-up email. So: is it one Cassandra instance or multiple instances for all platform apps? From a platform-application perspective, you don't need to put all your use cases in one cluster, right? You want to isolate and mitigate that risk. If you want to drive a set of use cases, you can drive them on one cluster and scale that cluster to hundreds of nodes, but you also want to think from a risk-mitigation perspective, if you're doing the operations yourself: do you really want to put all your eggs in one basket? If you have to do an OS-level patch or an upgrade, or a disk fails on a replica, do you want that to have a wider impact on the end customers you're serving? So you want to drive this through multiple Cassandra clusters, depending on the types of use cases you're serving on top of them.

I think we have time for one more question here, and then I will get the additional questions over to you. What model formats are supported in the multi-model feature? So, on the multi-model capabilities: underneath, what we offer is support for JSON on top of the CQL language; and through our graph engine you can model complex entities, so you can have table one and table two and connect the tables as edges through the graph engine.
And then on top of that there are the search capabilities: if you have a specific table associated with an entity, the multi-model aspect is that you can traverse the other complex relationships you have on a different table and connect them back to table one with ease through the graph engine as well. It's really about taking the high-level entities you want to store, storing them with ease, and then looking at that data with ease through materialized views or from the application stack. On the multi-model design piece, you want to make sure you don't overdo it; do it to a degree where you don't end up reading ten different tables instead of five, because there are operational inefficiencies involved there.

Well, thank you so much for this great presentation. I'm afraid that is all the time we have for today. Again, just a reminder to everybody: I will send a follow-up email by end of day Monday for this presentation with links to the slides and links to the recording, and we'll get some additional information from DataStax for you as well. Thanks to DataStax for sponsoring today, thanks everybody for all the great questions and for engaging so much in everything that we do. We just love it, as always, and hope everyone stays safe out there. Thanks, everyone. All right, thank you.