Hi there, everyone. Thank you for joining my talk today. We're going to talk about all the databases, and we're going to discuss them all. So let's get right into it. I like to have an agenda in my talks. I know some folks don't like them. They think they're too basic, but I do like them. For the type of learner I am, I like to have a high-level overview before I go into the more technical information. It gives my brain a basis for what we're going to talk about. So that's what we're going to do. We're going to talk about the basics and database history, because we're going to talk about all databases, and I love history. We're going to talk about relational databases, using Postgres and MySQL as our examples. We're going to talk about NoSQL databases, using Apache Cassandra and FoundationDB as our examples. And then we're going to talk about graph databases, using Apache TinkerPop as our example. Then we're going to go through some next steps. This is a very introductory talk, so it's going to help guide you on where to take this learning even further. And just one thing before we talk about me, because of course we want to talk about me: for each one of these database types, we're going to explore, like I said, a little bit of history. We're going to show an open source example of that type of database. We're going to talk about when to use that type of database and when not to use it. And we're going to talk a bit about the open source communities around these databases, and ways that you can contribute and continue to learn. And again, this is just to get us thinking and talking about this. So let me introduce myself first, now that we have an idea of what we're going to spend our next 40 minutes talking about.
So, I have a Master's in Computer Science from Santa Clara University. And before that, I got a BS in Biology from the University of Washington. I've worked in Silicon Valley for over eight years now, at many different companies, from very big companies like HP and Teradata to very small startups with only 30 people. In that journey, I've worked on five different databases. I've worked on three different proprietary databases, such as at HP and at Teradata. And I've worked on two different open source databases, Apache Trafodion and Apache Cassandra. And from there, I've worked on two different distributed systems, Apache Spark and Kubernetes. So I'm well versed in the different types of databases and the different types of distributed systems. Also, some of you may know me. Every once in a while, someone comes up to me and says, you're my teacher, and I have no idea what they're talking about. That's because they have taken the Udacity Data Engineering course, and I taught the data modeling portion of that course. So hopefully you now know a little bit about me and my credentials for why I'm here to talk to you about these things today. What I also want to talk about is why you should care about this. First and foremost, for me, the technology around databases, which doesn't sound very cool and interesting, actually really is. Quite a few years ago, I took a Stanford database course by Professor Jennifer Widom. The course is still available. I link it in the slide notes. It's a great course. We're just lightly touching the surface here, and she goes into such depth. It's an amazing class. I'm so glad that I took it, and it kind of started me on this journey. And it was free online. It was one of the first free online courses and it's still free online today. It's just a little bit different than it was before.
I was just checking that out the other day. It's broken up a little bit differently, but the content's all there. But it's really all about the data, right? That's what's important here. Where your data is persisted, where your data is being analyzed, how your data is being quickly served to you: this is all done by the backend and the underlying databases. And then also, your job depends on it. That sounds a little dramatic, but it's very true. For example, if you're a full-stack engineer, you have to know how to interact with these databases. It's just part of the stack. As for the type of database you're interacting with, you may get to choose that, but you probably don't. And so you need to understand some of the basics so that your application can interact with it properly. Also, for DevOps, it's pretty obvious. You're doing the maintenance work. You have to have an understanding of these databases because, most of the time, when your application engineers are filing issues and tickets with you, you're going to have to explain these high-level concepts to them so they understand why they can't do certain things with that particular database. Also, for folks in analytics and machine learning, like machine learning engineers and data scientists, this is important to you as well, because this is where you're going to be grabbing your data from. That has implications for how you're building your models and how you're bringing in that data, via batch, via streaming, et cetera. You're going to have to understand these different databases. Also, at some point, like I said, your job depends on it: you may have to choose one of these databases for your project.
And so you want to make sure you have a solid understanding of the different types so that you can choose the right one for the right use case and for the job that you're trying to do. Okay, so let's talk a little bit about the history of databases. This timeline that I'm showing you is a little different than ones I've shown in the past. I have a photo credit here. I really like it because it makes you stop and think a little bit about what's being represented here, and it gets you really paying attention to it. As you can see, it shows the time of popularity for relational databases starting around the 1980s, but they were researched and developed before that, in the 1970s at IBM. And like I said, you can see they got to be very popular in the 80s and 90s, and they're still very popular. Don't get me wrong: just because they trend down over time here doesn't mean they're not still hugely popular, because they really are. Relational database management systems, otherwise known as RDBMSs, are powerful systems that are generally very large and don't generally have easy ways to scale. They certainly can't scale horizontally. They can scale vertically, but it's difficult. Because of that, you have data modeling concepts that kind of limit your data. Not that the databases are small, but you need to make sure that you don't have a lot of data redundancy, and that affects your data model. And you want to make sure that you have data integrity. When I say data integrity, I mean you want to make sure that your data is only in one place.
And you also want to make sure that you're reducing the actual space used, because you're not able to scale out horizontally, only vertically. So you want to take that into account. With the explosion of the internet in the 2000s, that's when you saw NoSQL databases really starting to get traction. They had been in research and development long before that, but you really started to see them get traction as the internet exploded and data exploded. As you'll see here in this graphic, there's a variety of different types of NoSQL databases that help you scale out horizontally, like I was saying before, and help serve data to your application much quicker, with lower latencies. The different types of NoSQL implementations, as you can see here, are key-value, column-oriented, and document-oriented. And there's also the graph database, which is a NoSQL database too, but a little bit different. We're going to talk about that as well. Like I said before, we're going to focus on Apache Cassandra and FoundationDB. FoundationDB is a key-value database, and Apache Cassandra is actually a partitioned row store, which isn't listed here. Most people call it columnar. That's fine. It's really a partitioned row store, but for the sake of this, we'll just put it in the columnar category because most folks do. Then you can also see the graph databases. They've started to become more popular in recent years, and it's all around analytics, right? It's all about understanding the relationships between your data. We'll dive into that here in a little bit. But now that data is really at the forefront of what we're doing, it's not just about persisting the data or serving some data to a website. It's also about analyzing it. And so that's where graph databases come in.
So now that you know a little bit of database history, we're going to start talking about each one of these databases, and really how each was a reaction to the era it was in, to help shape our understanding. Also, in this talk, we're going to focus on open source. Open source has been proven to have won. I link an article here as well. It's really proven to win in the industry. Companies feel safer investing in a technology that has a strong community behind it. They don't want to be locked in. That's why they care about using open source. And it's easier to hire experts when you're talking about open source technologies. Also, companies like to have a little bit of say over the roadmap of a particular technology, which is what you get with open source. If you contribute to it, you can become part of that community and help shape the way the product goes. Whereas when you're buying a proprietary product, you have to be a very large customer to have that kind of sway, right? Here's just another note about open source and the community. Communities are amazing. It's not just the contributors who are committing the code, which is also incredible. There's so much support there in training, tutorials, documentation, and mailing lists, all for free. In so many of the communities I've been a part of, especially the Apache projects, you can write to the mailing list and post questions to get support from the community. There are no SLAs around that, but you generally get people responding to you within hours, just out of the goodness of their hearts. It's really impressive, and it's really great to be around these types of communities. All right, so let's jump in here to our very first database. We're going to start with the relational database. This is the one that you're most likely to be familiar with.
Like I said, the very first database course I took was, of course, all around relational databases. When it was developed in the 70s at IBM by Codd and his team, they developed not just the technology but the 12 rules of what makes a relational database a relational database. I really focus here on rule one, which is that all information in a relational database is represented explicitly at the logical level in exactly one way, by values in tables. It's a little bit small on my slide, but you can see we have our relation, which is our table. Then we have our attributes, the attributes of whatever we're trying to represent. Those are normally our columns. And then our tuple, which is our row, right? From there, one of the other rules of a relational database is that it must have ACID transactions. ACID stands for atomicity, consistency, isolation, and durability. In just a second, let me break down each one of those for you. Really, ACID transactions are all about data integrity. A good example of data integrity and ACID transactions is a bank transfer, right? If I'm moving money from savings to checking, that all has to happen within a transaction. It has to happen in one fell swoop, because if anything gets messed up along the way and it's not in a single transaction, it may deduct the money from savings but then not put it into my checking, right? And now I've just lost this money. Banks can't have that and we can't have that. We can't handle that kind of stress. That's why you have to have an ACID transaction. An ACID transaction, like I said, makes sure that every step along the way is correct, and it guarantees validity even in the event of errors, power failures, things like that. So it's a set of guarantees to make sure that transaction happens correctly.
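As a rough illustration of the bank transfer described above, here is a minimal Python sketch using the standard library's sqlite3 module with an in-memory database. The table name accounts, the helper names make_bank, transfer, and balances, and the fail_midway flag are all made up for this example. It is only meant to show the all-or-nothing behavior, not how a real bank or any particular database implements it.

```python
import sqlite3


def make_bank():
    """Create a throwaway in-memory database with two accounts (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE accounts ("
        " name TEXT PRIMARY KEY,"
        " balance INTEGER NOT NULL CHECK (balance >= 0))"
    )
    conn.executemany(
        "INSERT INTO accounts VALUES (?, ?)",
        [("savings", 100), ("checking", 0)],
    )
    conn.commit()
    return conn


def transfer(conn, src, dst, amount, fail_midway=False):
    """Move money between accounts inside one transaction; return True on success."""
    try:
        # Using the connection as a context manager commits if the block
        # finishes and rolls back if an exception escapes it.
        with conn:
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
            if fail_midway:
                # Simulate a failure after the debit but before the credit.
                raise RuntimeError("simulated failure between the two updates")
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
        return True
    except (RuntimeError, sqlite3.IntegrityError):
        # The debit was rolled back, so no money is lost.
        return False


def balances(conn):
    """Return the current balances as a dict of name -> balance."""
    return dict(conn.execute("SELECT name, balance FROM accounts"))
```

Calling transfer(conn, "savings", "checking", 10, fail_midway=True) returns False and leaves both balances untouched, which is the "it rolls back" behavior. The CHECK constraint plays the role of a database rule: trying to move more money than savings holds is rejected rather than leaving a negative balance.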
So with atomicity, it's all about all or nothing, right? Either it all happens, or it doesn't and it rolls back. With consistency, the data has to fit the rules of the database and the column. So if I'm trying to insert a string into a boolean column, it's not going to let me do that, right? It's just going to kick that out and say, yeah, no, sorry. With isolation, this is about transactions being done as a single unit and not in any particular order. Just because I submit one query and then a second query, those are completely independent. And durability is really about the transactions being saved, or persisted. After the transaction is complete, I could go ahead and pull the plug and my database could crash and go down. But when I boot it back up, because that's been persisted, it will be there. So not only do relational databases have these two rules, they also use SQL as their primary query language. That's what you're going to see most of the time. And here's just a little bit of SQL that I wrote for you that we're probably all familiar with. Something like select star from my really cool table where the conference equals virtual. And that's going to get me back a result. All right, so let's talk about the two examples of relational databases. First off, we have Postgres. I call it Postgres. I've also heard it called PostgreSQL or Postgres-QL. There's a lot of different names for it. I was looking at the documentation and it sounds like there's some debate on what it's called, but you can call it any one of those. Now I'm going to go into a little bit of comparison between the two, because these two technologies have a lot of similarities, and they also have some differences that I wanted to highlight. Postgres is developed by the PostgreSQL Global Development Group. So there's not just one company behind it.
They're not backed by any one particular company. It's just a lot of developers coming together to collaborate on this system. And it's not just a relational database, it's an object-relational database. We'll dig into what that means in a second. And then MySQL: it was originally developed by MySQL AB, which was acquired by Sun Microsystems, and when Sun Microsystems was bought by Oracle, Oracle took over maintaining MySQL. There were multiple forks of MySQL around the Oracle acquisition. A really good example is MariaDB, which was forked from MySQL right at the time of the acquisition by Oracle. From the research I was doing, MySQL is significantly more popular than Postgres. In this report, which I've linked here, it says 39% of developers use MySQL. That's pretty substantial, especially for developers. There are so many tools out there that it's hard to get us on board with one thing at any one time. So it's pretty impressive. And it does have more third-party tools because it is more popular than Postgres. But let's go back to the idea of Postgres being an object-relational database. It has object-oriented features, but it's still a relational database. Like I said, both are very popular open source databases. From the research I was doing, Postgres for quite a while had significantly better performance than MySQL. But over the last couple of releases, MySQL has really decided to focus on performance, and they've been able to bring up those numbers. So from what I was reading, it seems like the performance is comparable now. Both use SQL as their primary query language. Postgres has less of a user base, as we talked about before, because it's not quite as popular. But it does have different functionality than MySQL because it has this object-oriented ability.
So it allows for more complex data types than the plain rows and columns that MySQL has. I also read that, because of this object-oriented nature, when you have a heavy read workload, Postgres performance can sometimes be a little bit degraded compared to MySQL. So it's something to consider. This isn't really a statement on which one of these to use. I'm just highlighting how, while they do seem very similar, there are a few differences. I've also linked a nice article in the speaker notes that compares them. And since we're talking about relational databases and databases in general, it's good to have a word on OLAP versus OLTP. I always say that wrong. I always want to say Olaf instead of OLAP. Olaf's from Frozen. But anyway, one is online analytical processing and one is online transactional processing. You can have relational databases, or even non-relational databases, that fall into either one of these two larger categories. You use analytical processing for doing analytics, for doing a lot of ad hoc queries, and it's really optimized for reads. You're not really looking to do heavy writes. The data normally is loaded in batches. Not necessarily, but at some places I've worked, that's what it lends itself to. And you're going to have a lot of joins, bringing tables together, bringing information together. Online transactional processing normally has less complex queries, but you're going to have many, many, many queries, with rapid throughput. So lots of reads, inserts, updates, and deletes. Okay, so when should I use a relational database? First and foremost, if you want to use SQL, standard SQL, ANSI SQL. There are some non-relational databases out there that have the ability to put a SQL layer on top, or they have languages that are similar to SQL.
But if your skills really lie with SQL, that's something to consider, along with some of these other things I'm going to list. You don't want to just say, oh, I only know SQL, so even though I have a huge workload that needs to be in multiple regions and have high throughput and all these types of things, I should just go with that. Well, no, but it's something to consider. Also, if you need the ability to do joins, aggregations, and analytics, that's probably when you want to consider using a relational database. We haven't talked about joins yet. If you're familiar with SQL, you're familiar with joins, but it's basically the ability to take two tables and join them together on a common key that exists between the two tables, so you have all that information together. That's actually behind one of my favorite database jokes, and yes, there are jokes about databases. A SQL query walks into a bar, walks over to two tables, and says, may I join you? So, okay, I'll just leave time for everyone to laugh their heads off, right? So funny. But all jokes aside, joins are actually very costly in resources and they do slow down performance. NoSQL databases don't provide joins for that very reason. As we move into NoSQL databases, we're going to talk about why that is, and you're going to have to do your data modeling to work around that. They also generally don't provide aggregations. You can do analytics on NoSQL databases. I don't want to say that you can't, but it's just not really the use case for them. Also, generally, when you don't have big data, you just have smaller data, relational is probably going to fit well. If you need a lot of flexibility in your queries, so you need to be able to do ad hoc queries, you know, your manager comes down and says, I need to know X, Y, and Z right now.
Then you probably want to lean towards a relational database, where you can just fire off that query and wait for the results to come back. As we talk about NoSQL, you'll see that it's not impossible there, but it's a little bit more difficult. If you need ACID transactions, so you need consistent data and can't have eventually consistent data, then you want to stick with relational. And also, as we've seen, it's pretty simple. There's a lot of simplicity there, so you might want to lean towards relational if that's what you need. So when not to use a relational database. If you have large amounts of data, or if you have a need for high availability. With relational databases, a lot of times you can have a single point of failure and need something like a hot swap. You'll have a primary and a secondary. The primary is functioning, and if it goes down, the secondary comes online. But that takes time, so you're going to have a tiny bit of downtime, and depending on your use case, a tiny bit of downtime could be very significant, right? So if you need high availability, where your database never goes down, then with relational, you know, it depends. You could make it work. I don't want to act like you can't, and probably some of you out there are saying, I make this work, I have high availability with my relational database. And that's true, but as a high-level talking point, high availability generally comes with NoSQL. Also, if you need higher read performance: like we talked about, ACID transactions and joins and all that are great, but they will slow you down. And if you need flexibility in your schemas: with relational, as we saw, when you create that table and you create those columns, there's generally not a lot of flexibility.
So with NoSQL, as we'll see, you can actually add columns only for the rows that need them, and you're able to save on space that way. Whereas with relational, you're going to have to provide a value for that column or put in a null value, which in other databases you don't have to do. Also, with NoSQL you have the ability to store different types of data and data formats, which you cannot do with relational. So as I take you through this journey, you're seeing how one set of technologies worked for a period of time, and then as things started to change, the issues that there were with relational were basically solved in some way, with some trade-offs, by NoSQL. But again, don't forget the reasons why you would need relational, or where NoSQL is weaker, right? Like I said, NoSQL was a reaction to the limitations of relational databases. What you get with NoSQL is high scalability: you can easily add nodes horizontally. You don't have to be trying to add memory and CPU vertically, right? You just add another machine, bootstrap it in, and you're good to go. You're going to have high availability. It's made for big data, it has fast performance, and generally there's very easy automatic data replication. Also, with NoSQL, your data is not necessarily in tables, and you're going to see that when we talk about the difference between Cassandra and FoundationDB. You'll hear NoSQL, not only SQL, or non-relational. All those terms are kind of interchangeable, so you may hear any one of those. There are many different database types with different strengths. Like we talked about before, there's columnar, key-value, graph, and each one of those is going to have a particular strength for what you're trying to do.
So with the different data structures, you're going to do different data modeling. We didn't talk much about data modeling when we were talking about relational, but if you're familiar with relational, you know that you basically have to get that data redundancy down, and normally that's done by achieving third normal form. With NoSQL, you actually don't want to do anything like that. You want to have denormalized tables, and you're going to be fitting your data model to your application and to your queries, as opposed to getting it into third normal form. When you do that, it's going to allow for faster operations. Also, a lot of NoSQL databases were built either for the cloud directly or with the cloud in mind, so many of them are cloud native. So let's talk a little bit about each one of these different types. We have the document type of database. A good example is MongoDB. And I'm going to show you a query here in the next slide, showing you the difference between these three. Then key-value, where we talked about FoundationDB. And then, like I said before, for column family I put Apache Cassandra, even though it's really a partitioned row store. This graphic outlines the difference between each one. Okay, so let's take a look, because I mentioned that these different NoSQL databases don't use SQL. NoSQL, right? Generally, each different type of database has its own unique query language. They also have a lot of drivers, and SQL and relational databases have this as well, that wrap around the database so you don't necessarily have to use the query language directly. You can use a driver to interact with it from your application, in Python or C++, et cetera.
So if we just look at this query, select star from my cool table, right? That's just going to give me all the information in my cool table. In MongoDB, you see that the syntax is quite a bit different. It's db dot my cool table, so my cool table that lives in a database named db, and I'm going to do a find. In this particular case, because I'm doing a select star, which means select everything, that's what this syntax here is saying: select everything. Now, if I move over to Apache Cassandra, it uses CQL, which is Cassandra Query Language. I can do that select star from my cool table, but I have to have a where clause. In Cassandra, because of the way the data is partitioned across the cluster and across the various nodes, you have to help it pinpoint where your data resides by a particular value. In this particular case, I'm just pretending that my data in Cassandra is partitioned by state, since this is just an example. And then I have to use that partition value, or partition key, to grab my information. So with a database like Apache Cassandra, you can't just do a select star from and get all your data. The difference goes back to what we talked about earlier, about why relational databases are good for small data. Because you likely just have a small amount of data, you can do the select star and get back all the data in a particular table. It's not really anything to worry about. With NoSQL, we have these potentially very large clusters, thousands and thousands of nodes, and your table is spanning all those thousands of nodes. Your application and your driver would just be flooded with information. And so because of that, Cassandra and other NoSQL databases don't allow for that.
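To make the partition key idea a little more concrete, here is a toy Python sketch of a table spread across several nodes. It is not Cassandra's implementation or its driver API. The class name TinyPartitionedTable, its methods, and the MD5-modulo placement are all assumptions made up for illustration. It only shows why a lookup by partition key touches one node while an unrestricted read has to visit every node.

```python
import hashlib


class TinyPartitionedTable:
    """Toy model of a table whose rows are spread across nodes by partition key."""

    def __init__(self, node_count):
        # Each "node" is just a dict mapping partition key -> list of rows.
        self.nodes = [dict() for _ in range(node_count)]

    def _node_for(self, partition_key):
        # Hash the key so the same key always lands on the same node.
        digest = hashlib.md5(partition_key.encode("utf-8")).hexdigest()
        return int(digest, 16) % len(self.nodes)

    def insert(self, partition_key, row):
        node = self.nodes[self._node_for(partition_key)]
        node.setdefault(partition_key, []).append(row)

    def select_where(self, partition_key):
        """Like a read restricted by partition key: exactly one node is consulted."""
        node = self.nodes[self._node_for(partition_key)]
        nodes_touched = 1
        return list(node.get(partition_key, [])), nodes_touched

    def select_all(self):
        """Like an unrestricted read: every node has to be scanned."""
        rows = [
            row
            for node in self.nodes
            for partition in node.values()
            for row in partition
        ]
        return rows, len(self.nodes)
```

With four nodes, select_where("CA") reports one node touched, while select_all() reports four. Scale that up to thousands of nodes and the cost difference described above becomes the reason to always read by partition key.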
So you're just going to have to take a subset of that. Now let's talk for a second about FoundationDB. FoundationDB doesn't really have a query language. We'll talk about FoundationDB more in a minute, but it has a core, and then layers on top of that where they add functionality. They do have a SQL layer, but I don't think it's highly used, from the research I was doing. I've only just recently started working with FoundationDB. It's pretty cool, but it doesn't have the kind of query language that I'm used to. It uses an API instead. As you can see here, you basically have to initiate a transaction, and then from that transaction, you take your table and you unpack it. Again, you're doing that select star, so you're going to iterate over everything in that table. And remember, it's a key-value store. So if I want everything, which is what I want in a select star, I have to iterate over all the rows and unpack each one to get that key and that value. It's very interesting to see each one of these. These are just three examples that I'm showing you. There are far more NoSQL databases than that, and each has a different language. Remember, when we were talking about relational databases, the nicety of having SQL and having simplicity? Well, with NoSQL, these are a little bit more complicated, but once you learn them and you get to love them, it's very straightforward. So let's talk for a minute about Apache Cassandra. It was donated to the Apache Foundation roughly 10 years ago, so it's 10 years old. It's supported by many different companies, not just one single company. Many different companies contribute to it and use it. Many, many customers. It has a leaderless architecture.
So it has, remember what we were talking about with NoSQL, that high availability, the ease of scaling, the fast reads and writes. It does use CQL, which we talked about. And essentially all the big apps that you have on your phone's home screen, right? Think about it: on your phone you have Netflix, you have Twitter, you have Uber or Lyft, maybe not so much right now in the times that we live in, but you definitely have Netflix and Twitter, and all of those use Apache Cassandra underneath to serve those applications. So let's take a second to talk about FoundationDB. Again, like I said, it's a key-value database. It was open sourced by Apple after an acquisition. I believe the company that was acquired was itself called FoundationDB. Don't quote me on that. They developed this open source database, and then once they were purchased by Apple, Apple continued to open source it as well. So it's open source today. Like I said, it doesn't have a query language. It uses that API instead. And like I was saying about simplicity, it's a little hard to get your head wrapped around at first, but once you get the hang of it, it's very powerful. And like I said, it has a layered architecture. It has its core functionality in what they call the core, and then it layers functionality on top of that. For example, there's a MongoDB connector that is one of the layers, and another layer is a SQL connector, et cetera. What's actually unique about FoundationDB is that it supports ACID transactions. So if the only thing stopping you from using a NoSQL database and getting all the benefits of that is ACID transactions, then something you may want to consider is looking more in depth at FoundationDB. So I think we've already talked about this quite a bit, excuse me.
But when do you use a NoSQL database? So just going over these high-level points again. If you need that high availability. I wish I could spend more time talking about the architectures and how you get high availability with NoSQL databases, because for me that was really the key. When you hear a lot of these high-level concepts, they can sometimes sound like marketing speak, but what's cool about NoSQL is that once you dive into the architecture, you see why a lot of these things are true, and it makes a lot of sense. So maybe that's my next talk, some more architecture on NoSQL databases. But also if you have really big data, if you need linear scalability, if you need low latency and fast reads and writes, and if you need flexibility with your schema. Now on that last one, Apache Cassandra, for example, has some flexibility with its schema, but it's not as flexible as other databases. So it really depends which NoSQL database you're using and how much flexibility you need in your schema. Cassandra is more flexible than relational but less flexible than some of these others. Also if you have distributed users, because you're able to scale out horizontally and also have ease of replication across all those different nodes. So you can have users coming in from a variety of places and getting that low latency, instead of everything going to just one box somewhere here in California. And if you know your queries in advance for your applications. I mentioned that briefly when I was talking about data modeling for NoSQL: you have to know your queries in advance to use a NoSQL database. You can't really do those ad hoc queries. So if you know what you're gonna need to do for your application, then you're in really good shape. If you're not gonna know your queries, if they're gonna be ad hoc, that's where NoSQL really gets difficult to use.
Now also just a quick word on that as well. It sounds almost impossible when I say know your queries in advance. Like, how am I gonna know all the queries for my application in advance? But when you sit down with your team and you really start detailing it out, just on a whiteboard, just having an offsite and talking about it, the queries that you actually need come into focus much quicker than you would think. So it's something to explore. So when not to use NoSQL. If you need to use SQL, though there are ways around this, like I touched on before. If you need those ACID transactions and for some reason FoundationDB isn't gonna fit your needs, then NoSQL may not be right for you. If you need the ability to do joins on those tables. If you just need that extra flexibility and you need to do ad hoc queries. And if you have small data. I've advised multiple folks and customers I've worked with in the past: if you have small data, you don't really need the headache of using a database that's really built for big data. You could just go for something very simple like Postgres or MySQL. You don't really have to deal with all this added complexity. Oh, and just a quick word of warning around NoSQL, because like I'm saying, it's kind of a journey from relational to NoSQL. Be aware when moving from an RDBMS to NoSQL because, like we were talking about with data modeling, you can't just move over your data model and your tables as is. It seems like you can, because you can just do a create table and create the exact same table. It seems like, okay, I migrated all my data, everything's great. But it's more about the queries. The queries you need to run on that data may be different from what's going to be supported. Like, for example, what I was talking about with Apache Cassandra and that WHERE clause. So there is a little bit more of a learning curve.
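Here is a small Python sketch of what "know your queries in advance" can look like in practice. It is a simplified illustration, not how Cassandra is implemented: `QueryTable`, the table names, and the sample talks are all hypothetical, and real Cassandra has more nuance around what a WHERE clause may filter on. The idea it shows is query-first modeling: the same rows are written into one table per query, each keyed by the column that query filters on, and a lookup on any other column is refused rather than answered with a slow scan.

```python
from collections import defaultdict
from typing import Any, Dict, List

Row = Dict[str, Any]


class QueryTable:
    """Toy table that can only be queried by its partition key (hypothetical helper)."""

    def __init__(self, partition_key: str) -> None:
        self.partition_key = partition_key
        self._partitions: Dict[Any, List[Row]] = defaultdict(list)

    def insert(self, row: Row) -> None:
        # Rows are grouped by partition key at write time.
        self._partitions[row[self.partition_key]].append(row)

    def where(self, column: str, value: Any) -> List[Row]:
        # Only the query this table was designed for is supported.
        if column != self.partition_key:
            raise ValueError(f"no table is keyed by {column!r}; design one for this query")
        return list(self._partitions.get(value, []))


# Made-up sample data.
talks = [
    {"conference": "Open Source Summit", "year": 2020, "title": "Talk A"},
    {"conference": "Open Source Summit", "year": 2019, "title": "Talk B"},
    {"conference": "Example Conf", "year": 2020, "title": "Talk C"},
]

# One table per known query: the data is duplicated on purpose.
talks_by_conference = QueryTable("conference")
talks_by_year = QueryTable("year")
for talk in talks:
    talks_by_conference.insert(talk)
    talks_by_year.insert(talk)

oss_titles = [t["title"] for t in talks_by_conference.where("conference", "Open Source Summit")]
titles_2020 = [t["title"] for t in talks_by_year.where("year", 2020)]
print(oss_titles)
print(titles_2020)
```

Asking `talks_by_conference` for rows by `year` raises an error in this sketch, which is the ad hoc query problem in miniature: copying a relational table over as is does not give you the queries that table was never keyed for.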
You have to think about your queries and your application first. It's not meant to scare anyone off, but it's just something to consider. So then lastly, I just want to touch here on graph databases. Here is a really nice definition from Wikipedia: a graph database uses graph structures for semantic queries, with nodes, edges, and properties to represent and store data. And you can see that here. In the graphic, you see the node, and it has data within it. In this case, the name is Peter. And then from there, we have edges that go out from that node, right? In this particular case, we can see that the attribute on the edge is "follows". So obviously this is a graph database around who your followers are on social media, right? So with a graph database, it's really all about the relationships between the data points. The key is the relationship between the data. The dependencies between the data are very clear. So, for example, you know the attributes that hook the data together. The dependencies between the data are not always that clear in a relational or NoSQL database. So when you are trying to do queries to get relationships between the data in relational or NoSQL, those queries can end up being extremely complicated, because your data is not represented as a relationship graph like it is with a graph database. In a graph database, there are really fast ways to query, especially when you're looking at those relationships between the data and retrieving that data. That's going on a traversal, or walking the graph, as they say. And graph databases, I don't know if I would say all of them use Gremlin, they probably don't, but Gremlin is a very common query language out there for traversing graphs.
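To make "walking the graph" a little more tangible, here is a minimal Python sketch of a property graph and a traversal over it. It is not Gremlin and not any graph database's API: the `nodes` and `edges` structures and the `out` and `names` helpers are hypothetical, and apart from Peter and the "follows" edge label from the graphic, the data is made up.

```python
from typing import Dict, List, Tuple

# Nodes carry properties; edges carry a label and point from one node to another.
nodes: Dict[int, Dict[str, str]] = {
    1: {"name": "Peter"},
    2: {"name": "Maria"},  # made-up node
    3: {"name": "Sam"},    # made-up node
}
edges: List[Tuple[int, str, int]] = [
    (1, "follows", 2),
    (2, "follows", 3),
    (3, "follows", 1),
]


def out(start_ids: List[int], label: str) -> List[int]:
    """One traversal step: follow outgoing edges with the given label."""
    return [
        dst
        for start in start_ids
        for (src, edge_label, dst) in edges
        if src == start and edge_label == label
    ]


def names(ids: List[int]) -> List[str]:
    return [nodes[i]["name"] for i in ids]


# Start from a known node, then walk one hop and two hops along "follows".
peter = [i for i, props in nodes.items() if props["name"] == "Peter"]
follows = out(peter, "follows")
follows_of_follows = out(follows, "follows")

print(names(follows))
print(names(follows_of_follows))
```

Each extra hop is just one more call to `out`, whereas in a relational model each hop would typically be another self-join on a followers table. A real graph database would also index the edges instead of scanning a list on every step.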
So here's an example of Gremlin, and you'll see the difference straight away. If I have a SQL query like select star from my cool table where conference equals Open Source Summit, then in Gremlin you're gonna see it's a little bit different. Here I called it my cool graph, but it could be my cool table, the name doesn't matter. You're basically gonna traverse the vertices. You're gonna start at a particular start node, and then you're gonna walk those vertices looking for the ones that have the label Open Source Summit. So like I said before, very complex SQL queries can easily reduce down to very simple Gremlin queries because of the nature of the data. So just a word here on Apache TinkerPop, which is a great open source graph database. It was started in 2009 at Los Alamos National Lab, and it graduated to a top-level Apache project in 2016. It has a very active community, as you can see here in this little graphic I'm showing you from GitHub. And there's a lot of nice training as well, including a lot of nice training on using Gremlin. So when should you use a graph database? Like I said, it's all about relationships. So when you're trying to understand relationships, a graph database may be good for that. Also when you need better performance: when you have very long, very complex joins, you can just walk the graph instead. Now it's not necessarily gonna get you better performance for all queries, but for relationship-driven queries, nodes-and-edges type queries, you're gonna get better performance from the graph database. So if you're thinking, do I need a graph database, or can I just use something else? You should really try the whiteboard test. Get up on the whiteboard and start drawing out your data and the relationships between it. If it starts to look like a graph, then it might fit well in a graph database. So this is a really great talk and article, and I've linked it here.
Actually, it's kind of funny with our open source communities and all of our communities. I clicked on this, and the article and the videos that go along with it are actually by a really good friend of mine. I just happened to pop on it and I was like, oh, there's somebody that I know. And it's nice, it's someone you can trust, right? You know it's good information. All right, so when not to use a graph database. If you have disconnected data, if it's not relationship-driven data, or if the relationships, even if they are there, are just not important to what you're doing, you may not need a graph database. If you have very write-heavy workloads, where you're gonna be writing a lot, then this is probably not right for you. If you're using it as a key-value store, that's really not what it was intended for, so go think about using a key-value store database, right? If you're not starting from a known point. You need to know where to start in order to start walking your graph. If you don't know what that is, or you don't think you ever will, it's probably not gonna work. Also, there is a little bit of overhead in creating the graph and adding the edges. I wouldn't necessarily say that's a reason not to use one, but it's something I thought I'd bring up because it is a little bit different from what we're used to with these other databases. Alrighty, so just to wrap up here. Relational versus NoSQL versus graph, it's really not either-or. Most organizations have all of these, and honestly, most organizations have multiples of each one of these. So they'll have FoundationDB, they'll also have Cassandra, and so on, because they each fit a particular use case. I didn't have enough time here in my talk to go over each one of those in depth, but they all serve a very particular purpose, and most organizations have each one. They each have their own benefits and drawbacks. So you really wanna make sure you get informed before choosing.
So now we've talked at a high level about relational, NoSQL, and graph, but the next step, once you know that, is really diving into the differences between the databases within each one of those types. They're all really easy to explore on your laptop. That's the cool part and the benefit of open source. You can just download it and start exploring. And also just something to consider: you might wanna look at managed platforms that are based on open source. Those are nice because you don't necessarily have to do all the operations behind hosting these databases. Again, make sure you do your research. Also, always consider your use cases when you're choosing which one of these databases to use. Reach out to other groups within your organization, and reach out to the mailing lists and the boards to get some information and say, hey, does my use case fit this database? People love to be able to answer that. So what do you do next? Keep learning, and make sure you get hands-on as well. I've listed some resources here for you so you can start diving in and getting more information. So thank you all for your time. I hope this was helpful, and I really enjoyed pulling all this information together for you. It was a lot of learning for me as well. Thank you.