Welcome to Databases in the Microservices World. I'm Robert Richardson. Here's the part where I tell you, I'm definitely going to post the slides on my site tonight. I'm sorry. Okay. So I'm definitely going to post the slides on my site tonight. Except you're going to go check tonight, you'll check tomorrow, next week, next month, and ultimately you'll get tired of waiting, which is why I've posted my slides on my site right now. You can go to robrich.org slash presentations. Let's take a look at that. Robrich.org slash presentations. Right here, Databases in the Microservices World. These are the slides that we'll see right now. Feel free to get ahead, feel free to heckle me about the things I haven't said yet. But databases in the microservices world, that's what we'll look at today. While you're here on robrich.org, you can click on About Me and see some of the things that I've done recently. I'm a Microsoft MVP. I'm a Friend of Redgate. AZ GiveCamp is wonderful. AZ GiveCamp brings volunteer developers together with charities who otherwise couldn't afford software services. We start building software for them Friday after work; Sunday afternoon, we deliver completed software back to those charities. Sleep is optional; caffeine provided. If you're in Phoenix, come join us for the next AZ GiveCamp. Or if you'd like a GiveCamp in your area, hit me up at rob underscore rich on Twitter, or email me by clicking on Contact Me up here, and let's get a GiveCamp started in your area too. Some of the things I'm particularly proud of: SQL Source Control Basics, minus chapter eight. I worked on Gulp in version two and version three. That was a lot of fun. One of the things that is really cool: I replied to a .NET Rocks podcast episode, they read my comments on the air, and they sent me a mug. Woo-hoo. My claim to fame is my .NET Rocks mug. Let's dig into databases in the microservices world. We talked about this guy.
Let's look at application architecture and move our way from applications into databases. We took this path from spaghetti through lasagna. We started building n-tier architectures where we separated the various tiers, much like lasagna. Then we moved into ravioli, where we have lots of little pieces, lots of little microservices. This is how we got our applications from a monolith, a big ball of spaghetti, to microservices that accomplish simple things in really elegant ways and are strung together. We took this path because deployment is no longer hard. In the monolith experience, deployment was really hard. We wanted to save it up and do it as infrequently as possible. In time, we would build our tiers, but those tiers were still deployed as one piece. Now we've moved towards microservices, where each of these different services communicates by calling the other services. Some services may choose to own data. Other services may just call out to other things and do computations themselves. So we moved from monolith to microservices in our applications. Each microservice does one job. It's easy to replace if ever that code or its purpose changes, and we can scale that microservice independently of other things. We did this because deployments are no longer hard. Ultimately, one of the tenets of microservices is that each service should own its own data. Usually when we're doing microservices, we end up with something like this. We have all of our microservices doing all the things. Our application is well-architected, but notice that it's using one big monolithic data store. Refactoring our monolith into microservices worked out really well, but then why do we default back to our monolith database? Provisioning the database is still hard. We tend to create a database and leave it there forever, whereas in microservices, we can just blow away a container and start again. One of my best answers: the DBA said no.
The DBA's purpose is to guard the environment and reduce the potential impact of change. Creating lots of little databases means that the DBA needs to work extra hard to ensure that that data is secure, that that data is backed up, that it's authenticated correctly. Those are additional tasks that we're now burdening our DBAs with that may not scale as well as other things. Too many little databases. Until we can break up our monolith database, we want mechanisms that let us treat that monolithic database more like microservice databases. One technique that we can use here is a bounded context. A bounded context represents the portion of the data responsible for a particular task. For example, if we have a shopping cart type experience, does our shopping cart own the product catalog? Probably. How about our fulfillment service? Does our fulfillment service own the product catalog? Probably not. The fulfillment service needs to know what the product ID is, but it doesn't need the description, it doesn't need all of the pictures; the shopping cart experience needs those things. We can create a bounded context around the product catalog and maybe a little bit of product availability, and we can use that bounded context in our shopping cart to create this experience of a micro database without it actually being a micro database. Ultimately, our goal is to get to, and I'm going to claim the term, micro databases that mirror our microservices. A micro database would then be owned by the particular service that uses that database. This microservice owns this micro database, and that microservice owns that micro database. If this microservice needs some data from that database, it doesn't go directly; it calls that microservice to go get that data. Micro databases.
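To make the bounded context idea concrete, here's a minimal sketch in Python, using the built-in sqlite3 module as a stand-in for the monolith database. The table layout, the `ProductCatalogContext` class, and the field names are all hypothetical; the point is that the cart talks to a narrow context object rather than to the monolith's tables directly, so the tables can later become a real micro database without touching the cart.

```python
import sqlite3

# Hypothetical monolith table mixing catalog fields with fulfillment fields.
con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE product (
    id INTEGER PRIMARY KEY, name TEXT, description TEXT,
    image_url TEXT, warehouse_bin TEXT)""")
con.execute(
    "INSERT INTO product VALUES (1, 'Widget', 'A fine widget', '/img/1.png', 'A-12')")

class ProductCatalogContext:
    """Bounded context for the shopping cart: exposes only catalog fields.

    Cart code goes through this class, never straight to the table, so the
    monolith table can be split out later without the cart noticing.
    """
    def __init__(self, con):
        self._con = con

    def get_product(self, product_id):
        row = self._con.execute(
            "SELECT id, name, description, image_url FROM product WHERE id = ?",
            (product_id,)).fetchone()
        return dict(zip(("id", "name", "description", "image_url"), row))

catalog = ProductCatalogContext(con)
print(catalog.get_product(1)["name"])  # the cart never sees warehouse_bin
```

The cart's bounded context simply has no way to reach `warehouse_bin`; the fulfillment service would get its own context with the opposite slice.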
Each micro database is owned by the microservice that creates it, and because of that, the microservice can evolve and change that data structure as it needs to. The problem with a monolith database is, well, who owns that table? Can I add a column? Maybe not. With micro databases, I definitely can. I know that nothing else is hitting this database besides this microservice. When this microservice needs replacing, I can replace or upgrade the data store at the same time. Micro databases. Well, how do we do joins? Do I need to call that microservice in a loop asking for the next record? Well, when we want to do this type of thing, we can look towards service buses, perhaps Kafka. In this mechanism, I would create events that happen in my application and publish those events anytime my data changes. Other consumers, other microservices, may choose to subscribe to those events and keep track of that data. An example is our shopping cart experience. As a user checks out of our shopping cart, we'll publish that purchase event. Fulfillment doesn't need to know anything about that purchase event yet, but perhaps billing does. Billing subscribes to that purchase event, and then it goes and calls the payment processing service. The payment processing service tries to charge the card, and if the charge succeeds, that payment service publishes an order status update message that the fulfillment service can subscribe to. Now it's ready to ship. Now the fulfillment service can go look for the products on hand and start to package the box. As we create this service bus, with these events published between services, the way that I join on data is I subscribe to the events that I care about, the ones that create the data that I need to join against. At that point, I'm not actually joining against your table. Rather, I'm subscribing to your events so that I can update my own data store.
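Here's a hedged sketch of that subscribe-instead-of-join pattern. The `Bus` class is a toy in-memory stand-in for a real service bus like Kafka, and the topic name and event fields are made up for illustration.

```python
from collections import defaultdict

class Bus:
    """Tiny in-memory stand-in for a service bus such as Kafka."""
    def __init__(self):
        self._subs = defaultdict(list)

    def subscribe(self, topic, handler):
        self._subs[topic].append(handler)

    def publish(self, topic, event):
        for handler in self._subs[topic]:
            handler(event)

bus = Bus()

# Billing keeps its own copy of the data it needs: instead of joining
# against the cart's tables, it subscribes to the cart's events.
billing_db = {}
def on_purchase(event):
    billing_db[event["order_id"]] = {"total": event["total"], "status": "pending"}
bus.subscribe("purchase", on_purchase)

# The shopping cart publishes an event when the user checks out.
bus.publish("purchase", {"order_id": 42, "total": 19.99})
print(billing_db[42])  # billing now has its own record to query locally
```

Billing never reads the cart's micro database; its copy of the order arrived through the event, which is exactly the "join by subscription" idea above.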
So micro databases. One of the big questions: who owns the customer record? Should the shopping cart own the customer? Should billing own the customer? They may have some sensitive data. Human resources definitely has really sensitive data if it owns that employee or customer record. How about sales? Well, that's probably where the customer came in. They came in as a lead and eventually got converted into a customer. While they're a lead or a prospect, the data validation is much different. I may have a name and a phone number, but I may not have a company name and address. But once they're a customer, I definitely need an address, because I need a place to ship the product. So who owns that customer record? Well, arguably, none of them. We should create a new microservice, a customer microservice, whose job it is to own the account number, maybe the customer name, and probably that's all. That customer service is responsible for creating new account codes and for validating that that primary key is unique. All the other services that we've listed here can have foreign keys into that customer microservice. So human resources will have the data associated with the employee, all that sensitive data that doesn't need to get out. Sales will have the purchase history associated with that customer. Fulfillment probably needs the current shipping address, but it doesn't need any billing information. And so that customer microservice can own that primary key. Then through event sourcing we can get those primary keys into the other services, or maybe at the point of checkout or login we grab that customer ID and propagate it into the shopping cart micro database. So we've talked quite a bit about micro databases and how those play really well with microservices. Now we still have that DBA, who's going to be a little grumpy. How do we get to micro databases when we have so many different data stores?
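As a sketch, under the assumption that the customer service's only job is minting and validating account IDs, and with class and field names invented for the example:

```python
import itertools

class CustomerService:
    """Hypothetical service that owns only the account number and name.

    Every other service stores the customer ID as a foreign key and keeps
    its own slice of data: HR the sensitive data, sales the history, and
    fulfillment the shipping address.
    """
    def __init__(self):
        self._ids = itertools.count(1000)  # the one place IDs are minted
        self._customers = {}

    def create(self, name):
        customer_id = next(self._ids)
        self._customers[customer_id] = {"name": name}
        return customer_id

    def exists(self, customer_id):
        return customer_id in self._customers

customers = CustomerService()
cid = customers.create("Ada")

# Fulfillment's micro database holds only the fields fulfillment needs,
# keyed by the ID the customer service issued.
fulfillment_db = {cid: {"shipping_address": "1 Main St"}}
print(cid, customers.exists(cid))
```

Because only `CustomerService.create` hands out IDs, uniqueness is guaranteed in one place, and every other service's record is just a foreign key plus its own slice of data.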
Well, we need to really leverage automation. We may look to Terraform or to Kubernetes, configuration as code, so that we can get these databases enrolled in all of the things they need to be. As we create a new micro database, we can enroll it in the authentication mechanisms and the backup mechanisms, and only when those are secure and fulfilled do we spin up the micro database. If we have that automation around our micro databases, we can get to the point where we can spin up those micro databases without the DBA hitting the brakes. Until we can get to micro databases, bounded contexts allow us to treat our monolith database as if it were a set of micro databases, and that can start to get us in the habit of dealing with our data stores in this way. So micro databases. What's really interesting, now that we have lots of different databases, is that they don't all have to be the same kind of database. They don't have to be the same database vendor or database type. Because we have that automation, we can spin up whatever database we want. I really like this quote: "They don't all have to be the same database vendor or database type." This was said by me, right now. What's really cool is that because we have lots of different databases, we can specialize those databases to the tasks they need to accomplish. Let's pivot into looking at database types. Any questions about micro databases or microservices before we go there? I don't see any questions coming through, but if you have them, definitely add them to the chat box, or you can tweet me at rob underscore rich, and we'll definitely dig into your questions as they come through. So, the different database types. We have SQL or relational databases; NoSQL or document databases; graph; key-value; time series; search databases; object or blob stores; file store databases; column stores; and a type you've probably not heard of before, NewSQL databases. Let's dig into each database.
We'll take a look at the characteristics, the pros and cons of that database type, the best use cases, and some example vendors. I'm definitely not here to promote a vendor, but these give you search terms that may get you to more details about each type of database. So, the relational database, a SQL database. We can think of this as, well, a spreadsheet. Maybe we have an Excel spreadsheet or another type of spreadsheet. That's a table. Each row represents a unique record in that table, and each column is a specific type of data. So I'll have integers in one column, I'll have strings in another column; everything in that column is that data type. One of the real benefits of this relational database type is we have this SQL mechanism where we can query in a way that's kind of English-like, and that's pretty cool. Each row in the table has all of the fields, and so in time I may choose to split that table into other tables and join on those tables. SQL is really good at joining those tables together; that's the process of normalizing that data into different pieces. For example, if I have a customer and I have all of their addresses, home, work, et cetera, I may have an address table and I may have a customer table. The address will have a foreign key into the customer table so that I can keep track of which customer this address relates to. ACID compliance is one of the big benefits of a SQL database: atomic, consistent, isolated, and durable. Either the transaction will completely succeed or it will completely fail. In no case will I have part of the data but not another part. ACID compliance is a real big feature of SQL databases. By contrast, there's eventual consistency, which is kind of the other methodology of doing this. For eventual consistency, we might think of a bank statement.
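A tiny sketch of that customer-and-address normalization, using Python's built-in sqlite3 as a stand-in SQL engine. The table and column names are invented for the example; the `with con:` block shows the all-or-nothing transaction behavior that ACID promises.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT)")
con.execute("""CREATE TABLE address (
    id INTEGER PRIMARY KEY,
    customer_id INTEGER REFERENCES customer(id),  -- foreign key to customer
    label TEXT, street TEXT)""")

# ACID: inside the transaction, all three inserts commit together or not at all.
with con:
    con.execute("INSERT INTO customer VALUES (1, 'Ada')")
    con.execute("INSERT INTO address VALUES (1, 1, 'home', '1 Main St')")
    con.execute("INSERT INTO address VALUES (2, 1, 'work', '2 Office Way')")

# SQL is good at joining the normalized tables back together.
rows = con.execute("""SELECT c.name, a.label, a.street
                      FROM customer c JOIN address a ON a.customer_id = c.id
                      ORDER BY a.id""").fetchall()
print(rows)  # [('Ada', 'home', '1 Main St'), ('Ada', 'work', '2 Office Way')]
```

Each address row carries only a `customer_id`, not a copy of the customer's name; that's the normalization trade-off the talk describes, with the join putting the pieces back together at query time.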
I can go to my bank right now and they may not have the charge that I just made, or maybe they do but it's pending, or that check that I wrote hasn't shown up yet. But when I get my monthly statement, that is consistent. It shows all of the activity that I did within that date range. It is eventually consistent. So with eventual consistency, I can't go select straight out of my table and get back exactly what I just wrote, but in time, it'll probably be okay. By comparison, ACID compliance means that it is exactly the way that I expect. So ACID compliance is a real big feature of SQL databases. The SQL engine has been tuned for dozens of years and is really, really high-performance. So, relational databases. On the upside, we have a very strong schema, we have table joins, a very optimized query engine, and compatible, familiar syntax. On the downside, it's optimized for storage size. We talk about third normal form, normalizing our database. That was designed when storage was a real big deal. Now I can get a terabyte hard drive; as Richard Campbell says, I can stick a terabyte of data up my nose, and that wasn't always the case. So is database normalization still important? Maybe. One thing in particular: SQL databases were usually designed before clustered computing was a thing, and so most SQL databases assume that they're running on a single machine. So to scale, we must scale up. We must get a beefier and beefier machine to get what we need. That can be hard. So when would I use a SQL database? It's a good default. If you don't have any other preferences, SQL is perfect. If you're in a monolithic database, your monolith database is probably a SQL database, and that's a good choice. Some examples of SQL databases are here for you to continue Googling. Next, a document database, a NoSQL database. Now, there are other things that aren't SQL besides document databases, but when we say NoSQL, typically what we mean is a document database.
We'll typically store a JSON document, and each JSON document has as much or as little data as we need to associate with this record. So we don't have tables anymore; we have collections, and these collections hold JSON documents. What's interesting is that a document can nest inner content. We can have arrays and inner objects within our document. And the schema doesn't need to be consistent across these documents. We don't need to have one standard schema. We can add fields when it makes sense and remove fields when it doesn't make sense. Do I need a big array of addresses for a prospect? Probably not, so that address array may be empty. But as they become a customer, as they change their address a few times, we may have lots of addresses in that address array. Document databases are generally known for eventual consistency, and they're often distributed across many machines. So the way I scale a document database is I just add more machines, more commodity hardware. And that works out really well. So, document databases. On the one hand, they don't have a schema, so I don't need to migrate the schema; I don't need to adjust it when I change my application. On the downside, they don't have a schema, so I need to build code into my application to look at the document that I got back and compensate for missing data, or query against data that may not be there. I often have a SQL-like experience around a NoSQL database, but it's not real SQL. We have to create SQL-like extensions to be able to query into nested objects and do interesting things there. I don't have transactions. I don't have joins. We may have synthetic joins, but for the most part, the way I join is I do a second query against the other collection for documents that contain the keys that I'm interested in. Okay. On the upside, it is denormalized, which makes reading faster.
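Here's what that looks like in miniature, with plain Python dicts standing in for JSON documents; the collection and field names are hypothetical. It shows the flexible schema, the application-side compensation for missing fields, and the two-query "join".

```python
# Two "collections" of JSON-style documents; the schemas need not match.
customers = [
    {"_id": 1, "name": "Ada"},                      # a prospect: no addresses yet
    {"_id": 2, "name": "Grace",
     "addresses": [{"label": "home", "street": "1 Main St"}]},
]
orders = [
    {"_id": 10, "customer_id": 2, "total": 19.99},
]

# No server-side join: query one collection, then run a second query
# against the other collection for the key you found.
order = next(o for o in orders if o["_id"] == 10)
customer = next(c for c in customers if c["_id"] == order["customer_id"])

# Application code must compensate for fields that may be missing.
addresses = customer.get("addresses", [])
print(customer["name"], len(addresses))
```

Note how `customers[0]` simply has no `addresses` field at all, and the `.get(..., [])` default is the compensation code the talk warns you'll have to write.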
So a great use case for a NoSQL database is something like a news site, or maybe a product catalog, where I need one query that will grab all of the pieces associated with that article. I need the article title, the body of the article, the author, the author's name, maybe a link to their profile picture, maybe the first few comments, the category that that news story is in. I need all of that baked into that record so that I can grab it in one shot when you hit the URL. Now, that does mean that if I'm updating the author's name, I need to go update it in all of the pieces that that author has written. But we've chosen here in a NoSQL document database to optimize for reads, and so we accept that that denormalized data will make writes a smidge slower. Next up, graph databases. Graph databases are really great at highlighting not only data, but relationships between data. Graph databases allow us to do things like friends and friends of friends, and they create a really elegant query expression for doing those joins against relations. I can not only create those relations, but I can also store data associated with those relations, and that's really cool. So down here is an example of a query into a graph database, I believe this is Neo4j, and it starts looking at those relationships and the direction of those relationships, which is pretty cool. So how is this different from a SQL table with two foreign keys? Well, I could simulate a graph database with a relational database in this way. But what makes that graph database really elegant is the relationship syntax, the friends and friends-of-friends type things, rather than, I don't know, inner joining to the 27th degree or left outer joining another thing. Where graph databases really shine is highlighting those relationships. So on the upside, it is great for those relationships.
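To sketch the friends-of-friends idea without a real graph engine, here's the traversal in plain Python over an adjacency map. The names are made up; a graph database like Neo4j would express this as a short relationship pattern rather than explicit loops.

```python
# Adjacency sets: each entry is a friendship edge. A graph database stores
# these edges as first-class relationships (and can hang data on each edge).
friends = {
    "ada":    {"grace", "alan"},
    "grace":  {"ada", "edsger"},
    "alan":   {"ada"},
    "edsger": {"grace"},
}

def friends_of_friends(person):
    """People two hops away: friends' friends, minus direct friends and self."""
    direct = friends.get(person, set())
    two_hops = set()
    for friend in direct:
        two_hops |= friends.get(friend, set())
    return two_hops - direct - {person}

print(sorted(friends_of_friends("ada")))
```

In a relational database this would be a self-join on a two-foreign-key friendship table; the graph database's win is that the two-hop pattern is a one-line relationship expression instead of nested joins.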
On the downside, it isn't great for those other things. It may do okay, but it really is optimized for relationship mapping. So great use cases for graph databases: messaging apps, social networks, recommendation engines where I want to find related products. Those can be great for graph databases. Next, a key-value database. A key-value database is great for configuration details and for caching. It's for when I have a key that represents the uniqueness of this thing, and then a big blob of data. This might be a JSON blob, this could be HTML, this could be binary data. Ultimately, the contents of the blob are irrelevant to the key-value database. The key-value database just has a unique key and some data. What's interesting here is that the data doesn't need to be the same shape between keys, so it's great for configuration, where one object might be really heavy and another might just be a name. So, key-value databases. They're really great for get-by-ID queries, but they're really not great if I need to do any "where value contains" type of thing. I can't really query by value. Rather, I would read each key, deserialize the value using whatever technique put that value in place, and then query against that value to do some kind of inner join thing. Bummer. So key-value databases are perfect for configuration. They're great for caching. User session data might be another good use case, where I have a unique key and a bunch of data associated with it that I never need to query directly. Rather, I just query by ID and I get the entire blob. Next, time series databases. Time series databases are great for ingestion of event-based data. Time series databases have windowing functions, where I can grab a window of time and look at the content within it.
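A minimal sketch of both sides of that trade-off, with a plain dict standing in for a key-value store such as Redis. The keys and value shapes are invented for illustration.

```python
# Values need not share a shape: one key holds a string, another a heavy object.
kv = {}
kv["config:site-name"] = "My Shop"
kv["session:u42"] = {"user_id": 42, "cart": [101, 102], "theme": "dark"}

# Great: get-by-ID is a single lookup.
print(kv["session:u42"]["cart"])

# Not great: a "where value contains X" query means scanning every key,
# deserializing every value, and filtering in application code.
sessions_with_item = [k for k, v in kv.items()
                      if isinstance(v, dict) and 101 in v.get("cart", [])]
print(sessions_with_item)
```

The scan at the bottom is exactly the "read each key, deserialize, then query" workaround the talk describes, which is why key-value stores fit caching and sessions but not ad hoc querying.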
What's interesting here is that rather than selecting rows and columns and assuming there's only one value in a cell, as I do this windowing function and I get that column, the value of the cell is all of the data in that time range. So now that I have that bucket of content in this cell, I can do things like min and max and average, things that aggregate data within this time series window. So time series databases are great for those windowing functions when I need to do event-based things. They're not great for other things, because they're really optimized for that time series use case. So, time series databases. Because I have all of the data within that time window, and it can be a rolling window, five minutes, two days, months, I can really easily find highs and lows within it. I can find other data within that time window. So it's great for event-based things: sensor data, stock trades, mobile device analytics, content that streams in and is very time-series-dependent. Next, text search databases. These are really great at focusing on querying. With a text search database, we might think of Elastic as a great example here. My goal is to find where this word appears in that document. So unlike traditional databases, where we want to store the document, in this case we kind of want to separate that document into all of its words and then index on all the words. If we squint real hard, and this really isn't a great analogy, we can think of this kind of like the index over the top of a SQL engine. Still not a great analogy, but that'll kind of get us through this concept. We think of this as an inverted index. So, text search databases. They're great when I have unstructured data and I want to query across it. Some text search databases actually apply the schema at query time rather than ingestion time, and that can be really, really helpful in pulling out the content in this way. One of the downsides is stop words.
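The windowing idea can be sketched like this. The readings and the 60-second window are made-up numbers, and a real time series database would index and roll these windows for you rather than scanning a list.

```python
from statistics import mean

# Timestamped sensor readings: (seconds, value)
readings = [(0, 20.1), (30, 20.4), (61, 21.0), (95, 20.7), (130, 19.9)]

def window(readings, start, length):
    """All values whose timestamp falls in [start, start + length)."""
    return [v for t, v in readings if start <= t < start + length]

# A 60-second window: the "cell" holds every value in the range,
# so min, max, and average aggregate naturally over the bucket.
bucket = window(readings, 60, 60)
print(min(bucket), max(bucket), round(mean(bucket), 2))
```

The bucket is the whole list of values in the range, which is why highs, lows, and averages over a rolling window fall out so easily in this model.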
"It" and "the", these kinds of words don't really have contextual meaning in most cases, but they're very language- and culture-dependent. So if I'm trying to create stop words for all cultures and languages, I can really easily get false positives. For example, "a" doesn't really have a whole lot of meaning: a thing, a shoe, a table. Well, what about Avenue A? Suddenly "a" does have a whole lot of meaning, and so stop words can be really problematic. If I don't use stop words, well, now I've found the word "a" a whole lot more times in my document than perhaps words that I actually care about. So I can inflate my index artificially just because I'm not handling stop words. So, text search databases: they're great for searching through big text documents. They're not great for transactions. Examples of text search databases: we talked about Elastic; Splunk is another good example here. They're great for finding matches within documents. Next, object or blob stores. We can think of these roughly like a file system. The key is the path, and I use "path" loosely, because a great example of an object store is S3, where the key is the URL of the bucket or the URL inside the bucket. The value is a big binary blob. So these object stores are great for storing lots of data: images or videos or database backups. Ultimately, similar to key-value stores, I can't really query by value; rather, I query by ID. And so if I have a thing that is really easily indexable by ID, and I have a big binary blob that I don't really care about what's in it until I grab it, then that's perfect: profile pictures, database backups, big video files. That's what's great about these object and file system stores. If I do want to query by value, what I end up doing is reading each value, deserializing or decompressing it in the way that makes sense for that value, and then searching through that value with another tool. Definitely not fast.
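A toy inverted index makes both the benefit and the stop-word hazard visible. The stop-word list and the two documents here are invented, and real engines add stemming, scoring, and per-language analyzers on top of this idea.

```python
from collections import defaultdict

STOP_WORDS = {"a", "an", "the", "it", "and"}  # very language-dependent!

docs = {
    1: "the cat sat on a mat",
    2: "a dog chased the cat down Avenue A",
}

# Inverted index: word -> set of document IDs containing it.
index = defaultdict(set)
for doc_id, text in docs.items():
    for word in text.lower().split():
        if word not in STOP_WORDS:  # dropping stop words shrinks the index...
            index[word].add(doc_id)

print(sorted(index["cat"]))        # both documents match "cat"
print(sorted(index.get("a", [])))  # ...but now "Avenue A" is unfindable by "a"
```

Searching "cat" is a single set lookup, which is the whole point of the inverted index; but the "a" in "Avenue A" was thrown away along with the articles, which is exactly the stop-word problem described above.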
At that point, I may choose to create an index in another data store that lets me query into this. So object stores are great for big things that need storing, but not random access. S3 is kind of the canonical example here, but Azure Blob Storage works out really well, MinIO is another great example, or the NFS drive that you have on your network could be an object store. Next, column stores. Column stores are really interesting in that they take the methodology of SQL tables and turn it on its head. As we look at relational databases, typically, here on the left, we have rows. Each row represents the data in that record. But if we were to switch it and store it in columns, well, what if we had a column that was, I don't know, the countries in the world or the states in the country? There's a limited number of states, and so we can simplify this list quite a bit. We can compress it a lot. So if we have our data in column stores, we can do analysis really, really quickly. We can quickly seek to that group and then bulk-read all of the data in that column. Column stores are great if we want to do analytics over lots and lots of data, whereas row stores are really geared towards random reads and random writes, updating specific records. Column stores kind of fall down when you look for a specific record, because you have to find that group and then find all the related data within that group. Column stores are really tuned towards bulk analytics. So, column stores: you get much, much better compression, but on the downside, you have slower random reads and writes. Column stores are great for OLAP workloads, analytics, and here are some examples of column stores. Next up, let's talk about NewSQL databases. We can think of these as SQL plus some stuff. In this case, we're going to look at SQL plus partitioning.
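As a sketch of why low-cardinality columns compress so well, here's run-length encoding over a state column. The data is invented, and real column stores use more sophisticated encodings, such as dictionary encoding plus bit-packing, but the intuition is the same.

```python
# Row store: each record carries every field.
rows = [("Ann", "AZ"), ("Bob", "AZ"), ("Cal", "AZ"), ("Dee", "CA")]

# Column store: each column is stored (and compressed) on its own.
states = [state for _, state in rows]

def run_length_encode(column):
    """Collapse repeated values to (value, count) pairs: great compression
    for low-cardinality columns like state or country."""
    runs = []
    for value in column:
        if runs and runs[-1][0] == value:
            runs[-1][1] += 1
        else:
            runs.append([value, 1])
    return runs

encoded = run_length_encode(states)
print(encoded)

# Analytics scan one compressed column without touching the rest of each row.
az_count = sum(count for value, count in encoded if value == "AZ")
print(az_count)
```

Counting Arizona rows never touched the name column at all, which is the column store's analytics win; conversely, reassembling one whole record means visiting every column, which is why random reads get slower.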
Like a lot of NoSQL databases, NewSQL databases use clusters of machines so that they don't have one big system they have to keep scaling vertically. So in this example, we have a MemSQL database: we have a master aggregator and another aggregator, and you can query against those machines, and they'll forward the query off to one or more leaf nodes that store the actual data in partitions. Each of these is a commodity hardware machine, and so we're able to scale our database really easily by adding more machines. We still have all of the benefits of ACID compliance and SQL, but now we have them on a distributed platform. NewSQL databases: we can think of this as SQL with an SLA. I want SQL, but I want it to go fast. So, NewSQL databases. It's cheaper to scale, because I don't need to keep buying a bigger and bigger machine. On the downside, if I'm joining data, perhaps joining data from this partition to that partition, I may have some data moving across the network. Now, they're definitely optimized to ensure that that isn't going across large portions of the network. You definitely don't want to replicate across regions in your cloud provider, for example. You want to keep those partitions close together, but that network latency may become a concern. NewSQL databases are great when you've hit that window where you're waiting for the spinning wheel to go and you need it to go faster, with more users or with more data. When you hit that limit with your regular SQL database, look to a NewSQL database to get that extra performance. Next, hybrid databases. Hybrid databases are a really good methodology of taking SQL and something else, and a lot of databases nowadays are moving in this hybrid direction.
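The aggregator-and-leaf-node layout above can be sketched as simple hash partitioning. The routing function and the node list are toy stand-ins for what a NewSQL engine does internally, and the key names are invented.

```python
# Three "leaf nodes", each standing in for a commodity machine's partition.
NODES = [dict(), dict(), dict()]

def node_for(key):
    """The aggregator routes each key to a partition by hashing it.
    (Python's str hash is salted per process, but routing stays
    consistent within a run, which is all this sketch needs.)"""
    return NODES[hash(key) % len(NODES)]

def put(key, value):
    node_for(key)[key] = value

def get(key):
    return node_for(key).get(key)

for i in range(100):
    put(f"order:{i}", {"total": i})

print(get("order:7")["total"])      # reads route to the same node as writes
print(sum(len(n) for n in NODES))   # all 100 rows, spread across the nodes
```

Adding capacity means adding another dict to `NODES` (plus rebalancing), not buying a bigger machine; and a join between keys on different nodes is exactly the cross-network traffic the talk flags as the trade-off.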
So it's a SQL database, plus it has a JSON column, so now it's also a NoSQL database, and we'll extend our SQL capabilities to be able to query into that JSON data. Or perhaps we need time series windowing functions, and so we'll add some windowing functions, but primarily our database is a SQL database. We're seeing a lot of this even in the NoSQL world. It's NoSQL, but it also has SQL; or it's distributed, but it also has... And so those methodologies, where I want to store some data but I want it to be mostly this other thing, that's where hybrid databases look really great. A lot of database vendors are moving in this hybrid direction to get that key functionality plus a few other capabilities that maybe they're not perfect at, but they can add that capability, and now you don't need to reach out and create a completely separate data store for this data. So here's the money shot. If you've got your print screen ready, here's each of the database types with a very, very short description and what it's best for: SQL; NoSQL as a document data store; graph; key-value; time series; text search; object, blob, or file system; column store; and NewSQL, SQL that's distributed. I'm sure I've left some off, so definitely tweet me or add questions in the chat to highlight the things where I've stumbled here, or other database types that I should add. So we moved from a monolithic database to micro databases, and that was really cool. Once we got to micro databases, we could swap out different database types to match the use case that we needed, and that was really great. Once we have the automation in place to be able to move to micro databases, that is a really, really elegant solution. Until we have that, we might want to use bounded contexts over our monolithic database to treat it more like a suite of micro databases. Micro databases: not all databases need to be the same type.
We can choose the type that is specific to the problem that we're trying to solve and really optimize for that use case. Here's another money shot when you've got your print screen ready, or you can grab these slides on robrich.org right now and walk through all of them. I would recommend: use database-as-a-service whenever possible. Let them handle the tasks of backup and point-in-time restore and ensuring the firewall is just so. Use automation if you've got more than one database server, to ensure that you're subscribed to the security policies that you need: the authentication, the password rotation, the backup schedules. Choose a SQL database by default. It's a great all-around choice. When you have a need that is specific enough, and you're willing to accept the trade-offs that come with other databases, then you can start to wander into other database types. But SQL databases get you ACID compliance, and they get you that maximum compatibility; the SQL engine has been tuned for decades. And you'll choose other databases if you have that exact use case, if you have the concerns that exactly match that other thing, and you're willing to take the trade-offs associated with it, like losing ACID compliance or that SQL syntax. Databases can be really, really fun. As you dig into moving your data stores to a more micro database type approach, I would definitely invite you to find the database that best matches your needs, and not just fall back to the database type that you had before. What are your questions? I don't see a whole lot of questions in the chat, but we can definitely continue this on Twitter: tweet me at rob underscore rich, or you can grab these slides at robrich.org, or you can join us in the open source databases Slack channel and we can continue the conversation there.