Okay, if you Google for diversity, you get pictures like this. And it's a funny kind of diversity, because there is some skin color variation, but there's not that much actual diversity. They look the same, they act the same. Nobody has a fancy hairdo, nobody has anything special. So they are actually very, very similar. I do have a slight technical issue, one moment. Yes, let's start over. A long time ago, I was a young boy. And my parents had a philosophy that it's better to teach children to deal with dangerous tools than to tell them they shouldn't touch dangerous tools. And I guess it worked, because I still have all my fingers. This is the pocket knife I got, and my sister also got one. And if you look carefully, you can see the problem with this pocket knife: it has no corkscrew, no bottle opener. My sister's pocket knife did have that, and I hated it. I was about six years old, so I didn't care about actually opening bottles. But I know a missing feature when I see it. And I knew it was silly, but still. So, if it had been up to me, I'd have gotten this one. This one is hardcore. That one is such a much better piece of equipment, because it can do so much. Just look at it. It can do everything. I can't actually see the bottle opener, but I guess it's there. It must be there, right? So, in a way, the beauty of a Swiss Army knife is that it can do anything. But if you try to actually do anything with it, it's horrible. It can do effectively nothing. If you open a can with it, the can looks like it was attacked by a shark. And if you try to chop something with it, you can't even hold the thing. So they are actually kind of silly. But at the same time, they are a sort of poster child. A poster child for feature bloat. For features you have because you want to have them, not because you actually do the thing. That's, like, boring.
So, they seem to be very much into the feature bloat. And we, as an industry, as software engineers, we make a lot of Swiss Army knives in our field. And it starts with you being really focused. You want to make something specific. You have really tight specifications. You build it, and then the feature requests flow in, and you add one, and you add another. And while every feature is useful, each one dilutes all the other features, because the tool becomes a little bit less suitable for all the other stuff. So, at some point, it's like a Swiss Army knife, in the sense that it is actually good for nothing. A similar way of looking at this is this expression. We all know this expression: if all you have is a hammer, everything looks like a nail. Usually that is used to say that you should use the right tool for the right job. But it actually has a more profound meaning. It means that your thinking influences your choice of tools, but your choice of tools also influences your thinking. And if we move over to "if all you have is a Swiss Army knife", where do we end up? Because it does not really translate. My theory is, well, it's not really a theory, because I've lived it: you end up wanting a bigger Swiss Army knife. You always want more. And single-purpose tools like a hammer don't have that problem. You don't look at your hammer and think: such a pity I can't open a bottle with it. Well, you can, but not in an elegant way. So it's kind of inevitable that you get this feature bloat, and at some point you end up with something that is really silly. Now, I'm of course not really talking about Swiss Army knives. I'm talking about databases, our old friends. They are everywhere, and we grew up with them. If I may start with some history: when we started building applications, it turned out that storing data reliably and quickly is hard.
So, we didn't want to do that over and over again for every application. Instead, we have a generic database that can store data, and we can just take that database and build our app around it. What happened then is that over time, those databases turned into really expensive flagship products, like IBM DB2 and Oracle, all those databases. And they tried to pack in the features, because they want to be the best database. And they still do, they still add more and more features. And the average application does not use even a small fraction of them. So, feature bloat, even there. What I want to argue today is that it's time to let that feature bloat feeling go, and to really focus on what it is you actually want to do. Because in many ways it's so much nicer to have a single-purpose tool. With that, I'll introduce myself. I am Frank Leroux. I'm a bit of a technology hipster, in the sense that I like to use new and shiny technology just before it's actually practical to do so. At least, that's what my colleagues around me always say. And I kind of know it, but it is a strong motivator. Because if I get to do something with something new and shiny, I work hard. If it's boring, I don't. It is a human weakness, I guess. So, if we look at the traditional application stack, it's something like this: a database, code in some kind of language, and some kind of client. There must be millions of those out there, right? And when we talk about those applications and we talk about the code, we talk about the database. We store it in the database, we get it from the database. And I think that is wrong, because that's Swiss Army knife thinking. We need to get rid of the notion that there is one place where we put everything. I mean, it's very compelling to do that, because then you have everything in one place. But there are also reasons not to do that, and that's what I want to talk about here.
So, a stack like this kind of derailed over the years, in the sense that it got more and more difficult to change the code. And for that, we invented the holy microservice. I'm not going to dwell here long; we know what microservices are, right? Okay. So, the idea of a microservice is that we slice off a bit of our application, and we only access that part through a networked API. Within that part, we can choose whatever language and technology stack we like, and we can use any database we like. What's in there is just a technical detail. We slice up our application so that we have all kinds of blobs of microservices, and internally we can swap out different databases. We are much freer in our technology choices, because it's not one big thing anymore. So, the application would look something like this: we have one service that uses Scala and a graph DB, one that uses Node and a document DB, and one that uses Java with a nice old SQL database. That's completely fine, and that's what microservices promise, right? You can always pick the right tool. But if you look at presentations about microservices, mostly they end here. They say: now everything is good. And I believed that for a brief while. Because after that while, I started thinking about my day-to-day problems, and then I noticed that they kind of avoid a certain problem. And that's that different parts of the architecture need the same data. They look at it differently, but it is the same data. We need to share that data between the microservices, but the microservices only offer API calls. So, that's not actually that easy to solve. Let's look at an example. We have an application here that has a SQL database and some code in some language, but it also has an analytics part that does some aggregation over that database. And we have a UI, and we have an analytics UI. Fair enough.
So, suppose we want to go into microservice land and we want to split this. If we use a microservice, we can use a different database. Maybe SQL is not that practical for analytics, so we use a different one. Another advantage is that we now don't burden the SQL database when we run analytics, because analytics has its own database. But there is a problem here, and it's right in the middle: the analytics API. What's an analytics API? I made it up, but it's what you need to get the data over there, right? Because analytics is not something that needs a small piece of data; usually you need everything. So, do you make an API call that just gets all the data, and then do your analytics? Technically that works. In practice, I don't really see that happening, because it's basically batch processing, and it is very much a burden on the source database. And if you look at this, you could say that we should have just left it in the original service. You can also go more granular and query specific pieces of data, which is also fine, but you have the same problem: you are querying a lot of data, and the performance might be even worse. So, can we do this? Is this fair? I mean, it's a nice attempt, but it's not really a microservice, in the sense that we are not abstracting away the database that's inside the original service. We're exposing the guts of the database all the way out. So, this is cheating, even though it will work just fine. But then again, you could also have left it in the original service. What we usually end up doing is that we have the two services and we have some magic: some tool that replicates the data from one into the other. You probably know those tools, like Dbvisit, or GoldenGate from Oracle. Usually expensive. What they do is take the data from one database and put it in the other.
Which is a valid thing to do, but it ignores the fact that we might not want to use a SQL database on the other side. And that's pretty limiting. There's some leeway, you can tune it a little bit, but generally you need to put the data away in the same shape you found it. And that is generally not good for analytics. Another problem, besides the two I just addressed: this doesn't scale. If you have two services, it's fine. Pretty okay. Four? And beyond that, no way. You'll just spend all your time pumping data back and forth. But there is a model that works: event-driven microservices. Who knows them? Good. Good, good, good. Event-driven microservices have gotten a lot more popular. Like anything new in our industry, it's not actually new; it has been around for ages, but now all of a sudden people talk about it. So, what's it about? Instead of having services that ask each other things, we have services that tell things. They push events. They say: this happened. And they listen to topics, and they hear what other services said. They don't care who said it, and they also don't care who listens to what they say. So, it's a bit more decoupled. And it's generally backed by a publish-subscribe bus. How would it look in our example? In our main service, we listen to all the changes in the database, and we push a message onto the bus for each one. And the analytics service just listens to all the changes, and it can build up its own database in whatever form it wants. That's basically what we do here. And this scales very nicely, because you can plug in something else that also listens and does something different, and neither of the original two needs to know about that. So, that scales pretty nicely. For that bus, we generally use Kafka. Who here has heard of Kafka? Everybody. That's good. Wow, nearly everybody. So, really quick: Kafka. A persistent message bus. If you post a message, you'll probably not lose it.
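The pattern just described can be sketched in a few lines. This is a minimal in-memory stand-in, not real Kafka: a plain list plays the role of a persistent topic, and each consumer keeps a private offset, which is what lets fast and slow consumers coexist and lets new listeners be plugged in without the producer knowing. All names and event shapes here are made up for illustration.

```python
# A minimal sketch of the event-driven pattern: services publish change
# events to a topic and other services build their own views from them.

class Topic:
    """A persistent, append-only log: messages are kept, never popped."""
    def __init__(self):
        self.log = []

    def publish(self, event):
        self.log.append(event)

class Consumer:
    """Reads from a topic at its own pace via a private offset."""
    def __init__(self, topic, handler):
        self.topic, self.handler, self.offset = topic, handler, 0

    def poll(self):
        while self.offset < len(self.topic.log):
            self.handler(self.topic.log[self.offset])
            self.offset += 1

# The main service publishes database changes as events.
changes = Topic()
changes.publish({"table": "matches", "op": "insert", "id": 1, "score": "2-1"})
changes.publish({"table": "matches", "op": "insert", "id": 2, "score": "0-0"})

# The analytics service builds its own view; the producer doesn't know it exists.
match_count = {"matches": 0}
analytics = Consumer(changes, lambda e: match_count.update(
    {e["table"]: match_count.get(e["table"], 0) + 1}))
analytics.poll()

# A second, slower consumer can be plugged in later and replays the full log.
seen = []
audit = Consumer(changes, seen.append)
audit.poll()
```

Because every consumer tracks its own offset against the same durable log, one of them falling a day behind costs the others nothing, which is the property the talk leans on for analytics.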
In a nutshell, you can put a lot of messages through it. Thousands, millions per second. An important point, compared to some other buses, is that it's okay if there are some fast clients and some slow clients. Some consume a message quickly, and some might take a week, and that's okay. That's really powerful, because especially with analytics, you don't want to slow down your application just because you want to do analytics. If that analytics part gets behind for a day, well, so be it. It's not that bad. So, we use this for our clients. We're a small company; we work for sports associations in the Netherlands. It's been around for a while, so it's not exactly a greenfield application. We have a lot of regrets in our code already. We've been around ten years, planning competitions and all the stuff around matches. We have about a million players, in mostly football and handball and some other sports, 4,000 clubs, 40,000 matches every week. It's a spiky load, but it's predictable. Our traditional stack was an Oracle database with Java-based application servers and a whole bunch of clients. Nothing special here. We wanted to move from a club-centric model, where the clubs made all the changes in our database, to a personal level. So, instead of the 4,000 clubs, the one million players would be our clients. That implies quite a big jump in the amount of traffic we're going to get. Oracle was not going to scale. Or maybe it would, but we don't have that kind of money. So, we needed to do something like this. What we started with was exactly what we talked about: we have our Oracle database, we have the changes, and we have the Kafka bus. And here we insert those messages into a MongoDB database, because that one scales a bit better. This was really easy and worked really well, until the client applications started using that database. And they informed me that that database really, really sucked.
Because we were using a relational data model, copied one-on-one to a database that can't do joins. And that hurts. Because the assumption of a relational database is that you can join all the data you need. The advantage of a document database is that you can put a lot more in a document, but we didn't do that, because we were copying one-on-one. So, that's also not really great. We had a bit of an issue there. Take this as an example: in SQL you have some persons, and you have some phone numbers and communication records associated with them. In SQL, you would just query both in one query. No problem. In MongoDB, you would want something like this: you put the phone numbers, et cetera, inside the document. But to do that, we need to join the change streams from two different tables, and we don't know when those changes are going to show up. There might be millions here and millions there, and we need to join the right ones. And in the meantime, we need to remember everything that happened before we can join them and continue on. We can use Kafka Streams for that. That's a streaming engine, basically, and it does exactly that. But it's a whole lot harder and heavier than you might think, because basically you need to store everything that happened, and Kafka Streams uses RocksDB for that. And that becomes a huge beast; there's a lot of data in there. So, it's actually a lot harder than it looks. And this is just one join. If you chain more joins behind each other, the documents get bigger and the load gets heavier. So, it's not a trivial thing to do, easy as it sounds. In our specific case, we have about half a billion rows of SQL data, and that turns into some pretty serious amounts of data to move around. If we want to rebuild that entire data set in our replicas, it takes hours or days.
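The join described above can be sketched like this, with plain dicts standing in for the state stores (the real pipeline uses Kafka Streams with RocksDB underneath). The point is the statefulness: a phone-number event may arrive before its person, so every event has to be remembered until its counterpart shows up. All table and field names here are hypothetical.

```python
# A sketch of a stateful stream-stream join: change events for persons and
# phone numbers arrive interleaved and in no guaranteed order, and we want
# to maintain a combined document per person, as MongoDB would store it.

persons = {}    # state store: person_id -> person record
phones = {}     # state store: person_id -> list of phone numbers
documents = {}  # the joined output: person_id -> document

def build_doc(pid):
    """(Re)emit the combined document for one person."""
    documents[pid] = dict(persons[pid], phone_numbers=phones.get(pid, []))

def on_person_change(event):
    persons[event["id"]] = {"id": event["id"], "name": event["name"]}
    build_doc(event["id"])

def on_phone_change(event):
    phones.setdefault(event["person_id"], []).append(event["number"])
    if event["person_id"] in persons:   # counterpart already seen?
        build_doc(event["person_id"])

# Events arrive from different tables, in no particular order.
on_phone_change({"person_id": 7, "number": "+31-6-1234"})  # person not seen yet
on_person_change({"id": 7, "name": "Frank"})               # now the join fires
on_phone_change({"person_id": 7, "number": "+31-6-5678"})  # document is updated
```

Even this toy version shows why the state can't be thrown away: the first phone number would be lost if it weren't buffered until the person record arrived, and in production those buffers hold everything ever seen.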
Once it's warmed up and every pending change is processed, it will just process new messages as they come in, and then it's fine, then it's quick. But before that, it's a lot of work. Also, developing stream transformations is harder than you're used to when you're just writing stateless code. If you're doing something stateless, like a web page that gets something from a database, you can edit your code, press F5, and see if it does what you expected. And you can repeat that. But here, if I change the code that creates a new data set, I should really redo the whole batch, right? So, that's not a trivial thing. You can cut some corners, you can use nice small data sets, but it's a whole lot more work than what you're used to when developing stateless code. We went into production the day before yesterday. So, it was quite an effort to come here prepared to say how great it is before actually knowing. It behaves well. I lost quite a bit of sleep, but it works. Kafka Streams was in a really rough state when we started, and now it's much better. A problem, though, is that the average database nowadays is not used to being a team player. Every database expects: I am all you will ever need, and I will do everything for you. That translates into the fact that if you want to capture all the changes from a database, for example a SQL database, how do you capture them? It's possible. Every database has a way in to capture the changes. You can use triggers, or you can tail the archive log in Oracle, but it's never really an API. You always have to fight for it. And streaming data in is also not something they really help you with. But now we've done the hard part. Once we have this infrastructure in place, we can start using fun databases, because we have all the streams of data. We can grab some random database off the street, plug it in, and see what happens, because that has become pretty easy now.
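As a rough illustration of the trigger-based capture mentioned above, here is a sketch using SQLite purely because it fits in a few lines; with Oracle you would tail the archive log instead, and the schema and names are invented. A trigger copies every insert into a changelog table, and a small poller forwards new rows to the bus, simulated here by a plain list.

```python
# Trigger-based change capture: the application writes to `person` as usual,
# a trigger records each change in `changelog`, and a poller ships new
# changelog rows to the message bus without the application noticing.
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE changelog (
        seq INTEGER PRIMARY KEY AUTOINCREMENT,
        op TEXT, row_id INTEGER, name TEXT);
    CREATE TRIGGER person_ins AFTER INSERT ON person BEGIN
        INSERT INTO changelog (op, row_id, name)
        VALUES ('insert', NEW.id, NEW.name);
    END;
""")

topic = []      # stands in for the Kafka topic
last_seq = 0    # the poller's position in the changelog

def poll_changes():
    """Forward any changelog rows we haven't shipped yet."""
    global last_seq
    rows = db.execute(
        "SELECT seq, op, row_id, name FROM changelog WHERE seq > ? ORDER BY seq",
        (last_seq,))
    for seq, op, row_id, name in rows:
        topic.append({"op": op, "id": row_id, "name": name})
        last_seq = seq

db.execute("INSERT INTO person (name) VALUES ('Frank')")
db.execute("INSERT INTO person (name) VALUES ('Anna')")
poll_changes()
```

This is the "fight for it" the talk mentions: none of this is an API the database offers; you assemble it yourself out of triggers (or log tailing) and a polling loop.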
So, I'll go through a small parade of databases that might make sense. For example, Elasticsearch. We feed a lot of our data into Elasticsearch now, and that gives you the ability to search in an unstructured way. Something like SQL doesn't really like searching in an unstructured way. It can do it if you really force it to, but it doesn't work that smoothly. Also, you reduce the load on your source database. And users nowadays expect to be able to Google for anything, so they want a Google-like interface within an application: you just write down some random words, and it will somehow find what you're talking about without you specifying exactly what it is. Users kind of like that. Neo4j, a graph database. Graphs are amazing. It's a bit of a rabbit hole once you get into graph databases, because most of the graph database people are a bit weird in the head, but they do really powerful things. Some analytics, for example, are really easy to express on graphs and not so much in a relational database. For example, if we want to know how often you played against another person in another team: if you think about it as a graph traversal problem, it's not that hard. If you write down the SQL statement, it's two pages. So, even if it's not exactly something you couldn't do before, it's so much easier. Firebase. Who knows Firebase? Nice. It's sort of a back-end as a service. Its data model looks like a JSON file in the sky: the whole database is one JSON document, and you can insert stuff at different paths inside that document. And there are some really powerful client libraries with which you can subscribe to certain parts of the file. When it changes, your web page or your mobile app changes with it, because all the notifications are done for you. None of these databases makes sense to use as your only database.
Maybe for really simple cases, but generally you don't want to put your whole enterprise business in a JSON file in the sky. But for specific things, if you want to publish something, these databases make it a whole lot easier. And now we have the time to use them. Another way of looking at it, sort of the inverse way: if we take all the infrastructure we have now, with all the databases we've plugged in, you could just draw a box around it and give it a name, and then you have a database. That's called a multi-model database. Microsoft has a nice one that just came out a few months ago: Cosmos DB. It's too early for me to tell if it will go anywhere, but it's a really interesting take on where we are going with databases. And maybe that makes a lot more sense than the stuff I said, but I'll believe it when I see it. So, in closing, I have argued today that databases are better as a team than as individuals. And you need to realize that going from a single database to a team of databases is no joke. There is a lot of pain you will endure on the way, because you move to a more eventually consistent model, and that will hurt. There is a lot of data moving around, and you just need quite a bit more power. But it is worth it. And it is really nice to use the right tool for the job. If you can write the exact Cypher query in Neo4j that specifies precisely which graph traversal you want, and it works, that is so much nicer than trying to make that one database do what you want, struggling and struggling, until at some point you succeed but don't really enjoy the victory anymore. And I would also like all of you to check how much the tools you use influence how you think. You can only do that by trying different tools. If you only ever use SQL, you will think that SQL is all you need. And if you start playing around, you'll get a lot more input on what makes sense and why.
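The Neo4j example from earlier, how often you played against some other person, can be sketched as a graph traversal even without a graph database. In Neo4j this would be a short Cypher pattern; the sketch below just walks a tiny hand-made in-memory graph, with all players, teams, and matches invented for illustration.

```python
# "How often did player A play against player B?" as a graph walk:
# player -> their team -> matches involving that team -> the opposing team.

# Nodes and edges: players belong to teams, matches connect two teams.
plays_for = {"frank": "ajax_h3", "anna": "psv_h2", "bert": "ajax_h3"}
matches = [
    ("ajax_h3", "psv_h2"),
    ("psv_h2", "ajax_h3"),
    ("ajax_h3", "feyenoord_h1"),
]

def times_played_against(player_a, player_b):
    """Count matches where the two players' teams faced each other."""
    team_a, team_b = plays_for[player_a], plays_for[player_b]
    return sum(1 for home, away in matches
               if {home, away} == {team_a, team_b})

print(times_played_against("frank", "anna"))  # their teams met twice -> 2
```

Expressed as a traversal, the whole question is one small function; the two-page SQL version the talk mentions comes from having to encode the same walk as nested joins.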
So that, and finally, the last thought I want to leave you with, is that nobody uses a Swiss Army knife professionally, right? If you bring a car to the mechanic, they won't show up with a Swiss Army knife. A chef in the kitchen won't use a Swiss Army knife. Nobody uses a Swiss Army knife. I am pretty sure that even soldiers in Switzerland don't use a Swiss Army knife. So in that respect, why should we? And that's all I got. Thank you. Thank you, Frank. We have a whole lot of questions and discussions and all sorts of things happening on Slack. I'm going to try and choose a few good ones. First of all, you talked about how you had to use Oracle, your SQL database, in that piece. If you were starting from scratch and you had the choice, what would you use instead? And specifically, somebody asked: for a small-scale web app, what would be your database of choice for that role that's not feature-bloated and performs well? I think I would go with MongoDB now, because it has done everything I wanted from it. You never really know until you've really gotten your hands dirty with a database. They all look great on paper, so you need to feel it a little bit. But in general, you will always be in a situation where you have made choices, and they might be right and might be wrong. But you always have to go on and deal with your choices. Okay. There was a question asking whether you looked at existing solutions before you rolled your own version with Kafka Streams and so on. For example, Datomic. If so, why did you reject them? I don't know all of them, obviously. But we had that solution in place, and the fact remained that we really wanted some more freedom. Also, most of the database replication solutions are point-to-point, and we wanted more than that. Once you get that change stream into a Kafka system, you can do anything you want. And I believed, and I still do, that that investment was worth it. Yes.
So once the database is set up and you've got data streaming in, all of that makes sense. What happens then when you add a new service and load all of that data in? Is that a big ETL job? And also, if you change migrations on those databases and want to redo things, do you put things back into the streams? How does that work? Yeah, so I glossed over a whole lot of things, of course. But generally, Kafka can hold a lot of data. You can hold terabytes just inside the streams. There are some tricks to keep the size of the data from going completely berserk. But apart from that, you can just keep everything in a queue, and you just restart the processing of the queue. And depending on the database, it will refill. It will put some load on Kafka, but not so much on the source database; that one will be unaware of this happening. Okay. How do you, or can you, write integration and end-to-end tests across this whole stack, to check everything's working? That's difficult. That's difficult because of eventual consistency; that makes pretty much everything difficult. But what we generally do is periodically push changes into a certain table, changes that aren't really semantically important, and we check how long that change takes to arrive at the other side. With that, we can monitor whether there is any lag. Okay. This is an interesting question, so I'll read the whole thing; you've answered the first bit. Does Kafka store a log of all the events? Which I think you've said. Yes, it does. I mean, there are some tricks there, but yes. Does that then mean that all of the actual databases are effectively a projection of the Kafka streams, and that your Kafka streams now become your single point of truth? Yeah, you could say that. I don't see many people who are so brave as to remove the original database, though you technically could do that.
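The heartbeat trick just described, pushing semantically unimportant changes through the pipeline and timing them on the other side, can be sketched like this. The names are made up; real code would insert the heartbeat row into the source database and read it back from the replicated one, with the whole Kafka pipeline in between.

```python
# Monitoring replication lag with heartbeats: write a timestamped row on
# the source side, and measure its age when it arrives on the consuming side.
import time

def make_heartbeat():
    """Written into a dedicated heartbeat table on the source database."""
    return {"table": "heartbeat", "sent_at": time.monotonic()}

def replication_lag(heartbeat):
    """Called when the heartbeat shows up on the consuming side."""
    return time.monotonic() - heartbeat["sent_at"]

hb = make_heartbeat()
time.sleep(0.05)            # stands in for the whole pipeline in between
lag = replication_lag(hb)

if lag > 60:                # alert if the pipeline is more than a minute behind
    print("replication lag too high:", lag)
```

The appeal is that it sidesteps eventual consistency entirely: instead of asserting that two databases agree at an instant, which they rarely do, you assert that changes keep flowing and bound how far behind the consumers are.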
But, yeah, basically, this is called event sourcing: the idea that the events are the thing that's really important, and that you can create anything out of them. That's the other approach to this subject; I started from the more practical side, wanting to replicate data. But yes, that's definitely true. Awesome. There are more questions on Slack, but I'm going to leave it there for now. Please do dig in and join in. It's been great. Thank you very much. Thank you.