So good evening all, glad to see all of you here, and thanks for joining us today on this call. The topic we are discussing is our journey with DataStax and Cassandra over the last six or seven years, and, since we didn't want to leave it high level, some tips for users who are actively running Cassandra in their own environments, with a particular focus on our recent DataStax Astra deployment. This is the team who worked on the presentation: myself, a co-founder and chief architect at Alfaori Technologies, and Tim, our data engineering manager; our engineer couldn't join us. So what is Alfaori Technologies, what do we do? It's a company founded in 2017, with 400-plus staff and development centers in the US, Singapore, and India. We work to transform the maritime industry: we built an IoT platform for ships, and we track sailing ships. We have a platform where data from the various machinery onboard ships is captured. There are servers on the ships and servers on the Amazon cloud; data is collected from the machinery, and we transform it, collate it, and send it to the cloud. We use this data to drive various solutions for stakeholders, including modules which help ships reduce emissions. There is a big push on decarbonization because of the directive to reduce emissions and achieve net zero by 2050, so that is one of our objectives. We also have various other modules which help with things like predicting machinery failures or alerting users; we leverage AI and ML to a large extent to drive many of our solutions. As for the tech stack, we are on Node.js, Java, Angular, Python, and the Amazon cloud; we use the Cassandra database, and recently we have moved to Astra. The data warehouse is on Snowflake; for DevOps, Azure DevOps; for observability, New Relic and CloudWatch. That's our stack.
So just a summary of what we have achieved: about 230,000 metric tons of carbon dioxide saved, 75,000 metric tons of fuel saved, and $38 million in savings for our customers. These are our products: SmartShip, Smart Voyager, and ShipPalm. I already explained this in brief, but there is a ship shown in the diagram, and there are multiple ships like this; we currently have about 450 ships using our software. The software onboard is accessible to the crew, and they can use it. The connectivity from ship to shore is over satellite, and it's particularly bad, so we have to use a store-and-forward mechanism to make sure the data syncs correctly. On the ship, we collect about 3,000 to 5,000 data points every 30 seconds from various machinery. There's a lot of challenge in the shipping space because it's an industry with very few standards, so every ship is unique. Even when it's the same machine on two ships, the data follows a different format, and we had to write our own parsers to parse it. So once the data is retrieved from the machinery, we have some parsers sitting on the ship, and a local database so the application can run even when the satellite connection is offline; then the data is synced to shore. On the shore side, all the data is aggregated. A lot of our solutions, like the optimizations and ML algorithms, run on the cloud, because the compute on the ship is not sufficient to perform this analysis; some of the smaller ones run on the ship itself. So why did we go with Cassandra? One of the main aspects is performance. We need high write throughput: hundreds of vessels, each sending roughly 5,000 data points. We need high read throughput: a large number of concurrent users and service requests, including AI/ML algorithms and other services requesting the data. Cassandra is able to handle this with ease. High availability.
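The store-and-forward mechanism mentioned above can be sketched roughly as follows. This is a minimal illustration only, assuming a local JSON-lines buffer file; the class name, tag names, and file layout are all hypothetical, not the actual shipboard implementation (which, per the talk, uses a full local database).

```python
import json
import os
import tempfile

class StoreAndForward:
    """Buffer readings locally; forward to shore only when the satellite link is up."""

    def __init__(self, path):
        self.path = path

    def record(self, reading):
        # Append each reading as one JSON line, so a crash loses at most one record.
        with open(self.path, "a") as f:
            f.write(json.dumps(reading) + "\n")

    def drain(self, send, link_up):
        """Forward buffered readings if the link is up; return how many were sent."""
        if not link_up or not os.path.exists(self.path):
            return 0
        with open(self.path) as f:
            pending = [json.loads(line) for line in f]
        for reading in pending:
            send(reading)
        # All forwarded: truncate the local buffer.
        open(self.path, "w").close()
        return len(pending)

# Simulated round trip: link down first, then up.
path = os.path.join(tempfile.mkdtemp(), "buffer.jsonl")
saf = StoreAndForward(path)
saf.record({"ts": 0, "tag": "main_engine_rpm", "value": 92.4})
received = []
saf.drain(received.append, link_up=False)  # link down: nothing is forwarded
saf.drain(received.append, link_up=True)   # link up: buffered reading goes out
```

The key property is that `record` never depends on connectivity; data accumulates locally and is only cleared once forwarding succeeds.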
Now, of course, this is used by various stakeholders, including ship owners, operators, insurance providers, and a few others who make use of this data, so it needs to be highly available. It needs to work on off-the-shelf rugged hardware; we didn't want to procure very specialized hardware for the ship, because the platform has to scale and it has to be reasonably priced. It also has to scale on the cloud for the future. Even though you may have noticed that the number, 500 ships, is not large, across the whole industry there are roughly 55,000 commercial ships which could use a platform like this, so we already have a decent market size; considering the lack of standardization, it has been a challenge even to scale up to the number we are at currently, and adoption is going up much faster at this time. DB replication: given the ship-and-shore disconnect, we had a requirement to use the same database on the ship and on the shore. This also has the advantage of avoiding duplicate development for a second, lighter database on the ship. Now, which database can be the same on both sides and still scale? It has to be redundant on the ship, and when we say redundancy, there have to be multiple nodes; for some of the other NoSQL databases, you need a separate leader, and the leader has to be highly available as well to have redundancy, which would all increase the cost. Cassandra is a masterless database: a minimum of three nodes is all we need, so Cassandra suits this. Cloud agnostic: we can go to any cloud in the future. Low risk: when we started in 2017, Cassandra was already widely accepted by industry leaders and a mature technology. That's the reason we went with Cassandra, to begin our Cassandra journey.
When we started, we had roughly five engineers. Although we are currently 400, our DevOps and database team is still pretty lean; we have barely five engineers altogether. When we started, with one DB admin, we selected Mongo, and we realized pretty soon that Mongo wouldn't work for us, specifically because of the masterless requirement. So we moved to our own on-prem data center with Cassandra, with custom scripting for backups; repairs and other operations were also managed by our own scripting, with open-source support. Monitoring was quite rudimentary at this time, using CloudWatch. We had one product, roughly 50 vessels, and a single three-node cluster with about 750 GB of data. In 2020, we grew to 200 vessels, roughly 10 clusters of three to six nodes, and several TB of data, still with one DB admin. By this time we had already switched to DataStax Enterprise. The reason was that we were finding it challenging to manage the clusters with our own scripting. There were some risks we identified around security: handwritten scripts, hardcoded keys, scripts that sometimes wouldn't work, things like that. A lot of challenges, so we moved to DataStax Enterprise. Then in 2022, we were already at 1,600-plus vessels across different platforms and three products, so we had 20-plus DB clusters. This is when we decided to move to Astra, because it's serverless and scales according to our need; you don't need to spend many hours creating and managing clusters, et cetera. Now, what were some of the challenges we faced as new users of Cassandra? Everybody knows Cassandra is a very powerful database and easy to get started with, but it becomes painful when you have to maintain it and the data load increases. We made our fair share of mistakes.
The first set of issues we faced was poor schema design and misconfiguration, and performance issues leading to failed queries. An unstable cluster: nodes would go down at times; if a heavy query came in, it could cause a particular node to shut down, and then ops would have to look into why and bring it back up. All of this was impacting our business; we had random outages at times. This was also partially because of our poor monitoring capability. We did have some basic open-source monitoring using Grafana and so on, but with the limited resources we had, it was still not sufficient, and this led to extended resolution times and business impact. Next was complex operations. When you set up a Cassandra cluster, you don't really think about all the other aspects which have to go along with it: you have to repair the cluster; if a node is down for a certain period, you have to bring it up and repair it; you have to make sure all the nodes are in sync. All of this was being done with our custom scripting, and it resulted in wasted man-hours; security, compliance, and governance risks; and reliance on community support, which is somewhat ad hoc in nature. If there was an issue we really couldn't identify, we had to go to the community and ask questions, and sometimes we wouldn't get a proper or timely response. This also had real business impact. So this is where we moved to DataStax Enterprise. We had a session where we decided to refactor our schema to a large extent with DataStax consulting support. They helped us identify some of the issues; they clearly understood our use case, and we refactored some of the major blocks. That led to a healthy cluster, and eventually we moved onto the DataStax Enterprise product.
So OpsCenter is a big part of what comes with the DataStax Enterprise tooling, and it helps you do monitoring and troubleshooting as well as cluster management. Setting up a new cluster, monitoring it, repairs, health status, alerting: all of this is taken care of by the OpsCenter platform. Then there's ad hoc support: 24/7 enterprise support from DataStax was extremely useful for our team. If there was a real hot issue happening, like a production outage, they would come online on video calls and support us through it; we would get immediate support across different time zones, which was extremely useful. Now, recently we moved to Astra. Why Astra, coming from DataStax Enterprise? The slide says pay-per-use, and indeed one of the challenges we had was setting up clusters for demos and sales. There were a large number of clusters which were not really production grade, but Cassandra itself needs a certain minimum size and resourcing to operate properly, so we would have a three-node cluster doing very little, wasting those resources. Out of the 20 clusters, maybe three or four were production grade; the others were wasting resources for nothing. With Astra, you pay only for the amount of reads, writes, storage, and data transfer, so that's a big advantage, and we decided to go there. Rapid time to market: setting up a cluster involves a lot of hours. Even with automation, you have to make sure the cluster is healthy and look into best practices, the security setup, certificates, everything. By the time you provision a cluster and test it, it's at least one to two days spent by one or two engineers before it's available. With Astra, you just type in some configuration, click, and it's provisioned immediately; a few minutes is all it takes. Scalability: Astra scales horizontally, beautifully, and very fast.
So you don't really worry about scale anymore if your schema is optimized appropriately. There were some minor changes we had to make to adapt to Astra, because Astra is used by a large number of users and DataStax has to put in guardrails to avoid misuse or abuse of the system. That is in fact a blessing in disguise, pushing our developers to follow the best practices which actually keep the platform stable for the future: when we have more users, more scale, and more clients connecting and making requests, all these guardrails help our system. Security and compliance: with Astra it's built in, baked in. You just follow their standard practice to connect, and data is encrypted at rest and in transit. You can bring your own custom key if you want, and you can have VPC peering, so the data never goes over the public internet, even while connecting; it goes through a private tunnel. It's cloud native. If you look at on-prem, when you connect a Cassandra cluster to a big data pipeline or some other service, all the other components of the pipeline can scale (say you have Lambda, EKS, Fargate, all of that scales), but the database doesn't. Astra gives us the option to scale the database as well, so it's cloud native in that way. You don't have to think much about sizing machines and capacity planning and spending time on all that year over year. Fully managed: you don't think about many aspects of manual management, repair, or monitoring the nodes. For specific issues we get alerted by the customer support team, and when there is a real issue they are on top of it, supporting us. It has been an excellent decision, in our view, to move; we have in fact saved some cost and also freed up a lot of our resourcing for other tasks. So what's the result and impact?
Less operational overhead, no scaling worries, low latency, high throughput, and availability across workloads; oversized clusters replaced with a serverless database; security best practices on each cluster, with no variability by environment; best-in-class support; and no more tarballs. One aspect we observed is that when our deployment team set up a cluster, if it was a production cluster they would apply a certain standard for security and configuration, but if it was non-production they had a different take on the security needs. All of that is gone; it's all production grade now. Astra guardrails: the difference between Astra and some of the other managed service providers, like Amazon Keyspaces for example, is that we have noticed Astra is a bit more flexible to your needs. Sometimes you need X number of tables, or you need some flexibility in the guardrails themselves, and the DataStax team has been reasonably flexible in accommodating those needs when they are genuine. That has helped our team: even though there are guardrails, they are not rigid; if something is really needed and it's a genuine use case, DataStax is flexible. Innovation: over the period since we started, they have come up with different products, for example CDC (change data capture), Astra Streaming, and vector search for RAG, et cetera. In our case we are still starting to explore vector search for some of our internal use cases, but we are already using CDC and Astra Streaming for a certain use case. For example, we had an application where we needed to move data to Snowflake, and with Astra Streaming we were able to connect directly to Snowflake and move data out without any external components. With a direct connection into Snowflake and a little bit of code, as soon as data comes into a table it is written out and available in Snowflake for you to consume.
So that's our journey, and we wanted to leave you with a few tips for Astra and some of our lessons learned. Tim will walk you through those. Thank you. Thanks, Irving. Yes, so as we evolved from open-source Cassandra to DataStax Enterprise to Astra, there were a few things we came across and learned. Our use case mostly revolves around time series data, so we have a few examples to share here. There are a few options that we tried, shown from the left. The key requirements for our time series are that we want to minimize cluster resource requirements, with a bias for write performance, and we want to make our schema future proof and flexible. With that in mind, we looked at how to arrange the schema data model to support these requirements. The first option, on the left, stores sensor name and value as a pair: each row has one tag name and one tag value. We tried that. The good thing about it is that when you have few tags, selection is pretty quick, and it serves its purpose: when you know which tag you want, it gives you that value directly, because it's right there in the row. The downside is that because each row holds just one tag, and we want to support 3,000 to 5,000 tags every 30 seconds, that's a lot of writes, and that is not what we want. Another downside is that if you want to query across multiple tags, you have to gather all the rows for all those tags, which is not optimized. The next option we considered is a flat model: instead of one tag per row, we put more tags into one row, and we arrange the key by group and source. The sensors are grouped, and the source is the instance that the tag belongs to. When you query, you query by source name for the date you want, and you get back a row with a list of tags.
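The write-amplification argument against the one-row-per-tag model can be made concrete with some back-of-envelope arithmetic using the numbers from the talk (5,000 tags per vessel every 30 seconds, roughly 450 vessels). This is only an illustrative sketch; the constants are taken from the slides, and the comparison ignores row size, which of course also differs between the models.

```python
# Rows written per day for the candidate data models, using the talk's numbers.
TAGS_PER_VESSEL = 5000          # data points per 30-second report
VESSELS = 450                   # fleet size mentioned in the talk
INTERVALS_PER_DAY = 24 * 60 * 2  # one report every 30 seconds

def rows_per_day(rows_per_interval_per_vessel):
    return rows_per_interval_per_vessel * VESSELS * INTERVALS_PER_DAY

# Option 1 (one row per tag): every tag becomes its own row.
tag_per_row = rows_per_day(TAGS_PER_VESSEL)

# Flat / collection / text-string options: one row per report.
one_row = rows_per_day(1)
```

The per-tag model multiplies the daily row count by a factor of 5,000 (billions of rows per day versus around a million), which is why it fails the "minimize cluster resources, bias for write performance" requirement.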
So that is a little bit better than one tag per row; it's more dynamic, and you get various tags in one row. But the downside is that we have a lot of tags for each vessel, so it ends up as a lot of columns on the table, and that's a problem in itself. The next option we tried is collections: instead of a flat row, we use a map collection holding key-value pairs, and you can even put an index on the map. With that you can overcome the wide table: you have only one column holding the map of tag names and values, so when you query a row, you can look up by tag name. The good thing is you don't have the wide-table problem, but then you have to deal with the collection, and when we went and tested it, the performance of the collection didn't really meet what we wanted: 3,000 to 5,000 writes every 30 seconds. Then the last option we tried: instead of a collection, we store a string representation of the map, as text. And that achieves what we really want. It's simple, it's a compact structure, and it's future proof: if you ever want to change the structure inside, the string representation is free-form, so you don't have to change the schema. One thing about Cassandra is that when you have to change a schema, you have to go back through the history and fix all of that data with a migration, and that's a lot of trouble. The downside is that with a free-form JSON string, when you read it, you have to parse the JSON on the client side. But the clients know what they want and what they put in, so they know what they get out; that's okay. Another downside is that if you want to pull some specific tags out of it, that can be a challenge, because you have to parse the JSON to get to them.
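The text-string option described above amounts to serializing the whole tag map to one compact JSON cell on write and parsing it back on the client on read. A minimal sketch, with made-up tag names (the actual tag catalog is not in the talk):

```python
import json

# Writing side: serialize the whole tag map into one compact text cell.
# The cell is schema-free, so adding or renaming tags never changes the table.
reading = {"main_engine_rpm": 92.4, "sw_temp": 301.2, "shaft_power": 7150}
cell = json.dumps(reading, separators=(",", ":"))  # compact: no spaces or newlines

# Reading side: the client parses the blob and picks out the tags it needs.
tags = json.loads(cell)
rpm = tags["main_engine_rpm"]
```

This also shows the stated downside: there is no way to fetch `main_engine_rpm` alone on the server side; the client always receives and parses the whole blob.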
So we chose the last one, and here we did some performance tests on the options we had. The first test is inserts for the flat, collection, and text-string models: we take about 20 tags and measure, and the result is that the long text string takes only 49 milliseconds to write 20 tags. On selections, the long text string is again the winner: it takes only nine milliseconds to read that string of tags. Our lessons learned include some tips on improving performance when moving from traditional open-source Cassandra to Astra. With traditional Cassandra, we handled parallel writes using batch inserts, together with range selection queries like the example below on the slide; batches have limitations that, with Astra, we don't have to deal with. We can send many inserts in parallel and let Astra handle the scaling, and that worked well. The same applies to selections: instead of one big IN statement, you can send multiple selects in parallel. Okay, moving on: some tips on improving cost efficiency. We had a very large number of reads, writes, and updates, and a lot of data, so we took a different approach with caching: instead of repeatedly reading the same table, we precompute the results and write them to a file on S3, and we have an API that simply opens that S3 file and streams it back. We also use compression for large data sizes. Last one, data migration strategy: we tried different approaches using CDM, Spark, and DSBulk, and each is useful for certain cases and each has its limitations. CDM is useful when doing bulk time series migration; you don't have to do a lot of coding or configuration, and it's fast and reliable for data transfer.
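The "many parallel inserts instead of one batch" tip can be sketched as a fan-out pattern. In production this would go through the DataStax Python driver (e.g. executing a prepared single-row INSERT per reading, typically asynchronously); the sketch below is driver-free and uses a plain list as a stand-in sink so it runs anywhere. The function and field names are illustrative only.

```python
from concurrent.futures import ThreadPoolExecutor

def insert_row(row, sink):
    # Stand-in for executing one prepared single-row INSERT against Astra;
    # real code would use the DataStax driver's session.execute/execute_async.
    sink.append(row)

def write_parallel(rows, sink, workers=8):
    # Fan out independent single-row inserts concurrently instead of packing
    # them into one large batch, and let the serverless side absorb the load.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(lambda r: insert_row(r, sink), rows))

buffered = []
write_parallel([{"vessel": "v1", "seq": i} for i in range(100)], buffered)
```

The design point is that unlogged multi-partition batches give the coordinator extra work without atomicity benefits across partitions, whereas independent parallel inserts spread naturally across the cluster.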
But the downside of CDM is that we ran into trouble with table columns that use a UDT (user-defined type) when the destination keyspaces are different. Yeah, I think we're running out of time, so we'll leave this on the slide. You can look at it, and if you have any questions, we'll be here. Okay, thank you.