Well, hello and welcome to another DevNation. We actually have a huge crowd here today, hundreds of you on the line, and we have a great presenter, Gunnar Morling, who's going to walk us through how he uses Kafka and Kubernetes and Debezium and all kinds of amazing things, right? Gunnar's going to wow us with everything under the planet. But more importantly, he has a very practical set of capabilities from a cool open-source project that he's founded and worked on very hard. I think you guys will really find it very interesting, especially if you have databases in your application infrastructure. I'm betting most people here have databases, and they're trying to figure out how to use those databases better in a microservices world. That's really what today's presentation is all about. I'm going to put Gunnar on the spot and say he's going to wow us and blow us out of the park. So Gunnar, I turn it over to you.

All right, hello everybody. Good morning, good afternoon, whatever time it is. I'm really excited to be here. Thank you so much for having me; I'm looking forward to this very much. I've got lots of content, so I would suggest we just get started right away. So let me share my screen so you can see what I have here. I'm doing a bit of talking, and then I'm doing a live demo of all these things. All right, so I hope you can see my slides right now: change data streaming with Debezium, Apache Kafka, and everything else. Let me just talk a bit about myself, so you know who's talking to you. I work as a software engineer here at Red Hat, and I'm leading the Debezium project, so this is the one I'm going to talk about today. Before that, I used to be a member of the Hibernate team. Hibernate is a family of open source, Java, data-centric projects, and as part of that, I used to work on Bean Validation 2.0. I do some stuff on the side, and if you would like to know what I'm up to, just follow me on Twitter or check out my blog. And with that, let's get started right away. Change data capture, so what is it about? The idea is pretty simple. You have data in your database, be it customers, purchase orders, products, whatever your domain is about, and now you would like to have an event, a notification, whenever any data item changes, right? So if a new customer gets created or a purchase order gets updated or something gets deleted, you would like to have an event, and this event describes the change, and you would like to propagate those events. Apache Kafka comes in here because, well, you would like to have some sort of decoupling between the event-producing database and any consumers, and Kafka gives us this in a very nicely scalable, performant way, so we have asynchronous communication with any consumers. So we push change events into Apache Kafka, and then we can enable all sorts of interesting use cases, right, once we have those change events, and we can keep them in Kafka essentially for as long as we want. For instance, we could think about data replication. So you could just think about replicating data from your primary database into another database, and you might wonder, why am I not just using the replication tools which come with my database? One reason might be that such a CDC pipeline (change data capture, CDC) allows you to replicate data across vendor boundaries.
So you could have a database from vendor A running in production, and now you would like to push the data into, let's say, a free open source database which you run on the side, and such a CDC pipeline would let you do this. Or maybe you would like to push your data changes to a Hadoop cluster, your analytics system, Apache Spark, or your data warehouse, whatever, right? You would like to have the data in those other systems so you can run all sorts of interesting queries and gain insights into your data which you might not get using your primary database, and CDC helps you with that. You could think about having a data feed set up for another team. So maybe your marketing folks have their own database which they use for running marketing-related queries, and you could use CDC to push the data from your main database to just this team's database. All right, so this is replication, and there are many, many more interesting use cases for CDC. One would be auditing, right? Maybe you're obligated to keep a history of all the changes to your data, and well, if you have those change events, you can keep them in Kafka, and you could even keep them indefinitely if that's what you wanted to do. Maybe you enrich those events with a bit of metadata, like the user who changed a specific item and a timestamp, and then you have this audit log, and it allows you to go back and see how your data changed over time. You could use those change events to invalidate caches. We just did a very nice blog post about this on our blog, debezium.io, so if cache invalidation is something you're interested in, just check this out. You could use CDC events to push data towards full-text search indexes, and we will actually see this in the demo. There are many more. You could think about updating read models: if you have a CQRS architecture where you have one canonical write data model and multiple read models, those read models must be kept in sync with your primary model, and having this change data stream allows you to keep those other models in sync. So there are many opportunities, and I've worked on this stuff for two years now and I still come across new use cases for CDC. So this is really a powerful tool to have in the box. With that being said, let's see how Debezium fits into this. Debezium is an open source change data capture platform. It essentially does all the heavy lifting for you. It taps into the transaction log of your database, be it Postgres, MySQL, whatever, and it gets the change events out of the transaction log. And this is very good because it's efficient; it's not some trigger-based solution, so it doesn't really add much overhead. It also is fully transparent to any upstream applications, right? They don't have to be adjusted. Debezium just goes to the transaction log and gets the changes out of it, and any applications that write to the database can just remain as they are. And then the idea is that Debezium emits a rather abstract change event format into Kafka, so consumers don't really have to care too much about which specific database a specific event originated from. So there's an abstract event representation. And there are tons of features in there; I cannot really describe them all. There's snapshotting: you don't only have the possibility to stream changes beginning right now, you also can have an initial snapshot of your data.
So you start with one consistent representation of your data. There's a very active community, which is something I'm very proud of. As it happens, just this week we released Debezium 0.9, which added a connector for SQL Server, and there were contributions from more than 30 community members in there. So this is really a cool thing to see, that many people contribute to Debezium. And it's already deployed at very many companies, so we see huge deployments; just the other day I learned about one company who used Debezium to stream changes out of 35,000 MySQL databases. So I would say that's quite something. In terms of connectors, there's support for the leading open source relational databases, MySQL and Postgres. We have the connector for MongoDB, which is used quite a bit. And as I said, just in 0.9 we added support for SQL Server. Then Oracle support is very often asked for, so we are working on this. There is currently a tech preview which is based on the XStream API, and at this point we're also looking into other alternatives for making this work with Oracle, because obviously it's a very popular database. We might add more connectors down the road, but let's see what those could be when we get there. So now let's see how such a CDC pipeline with Debezium and Kafka would look. As already mentioned, we have our data and our change topics in Kafka. By default, that would be one topic per table we capture. So you would have, say, one topic with the order changes and one topic with the customer changes. And then there's another component, which is called Kafka Connect. Kafka Connect is a framework as well as a runtime environment which allows you to develop connectors and which allows you to run connectors. And now you might already have guessed: Debezium essentially is a set of those connectors. You have two kinds of connectors: there are source connectors and sink connectors. Source connectors get data from whatever source you have into Apache Kafka, and then you have sink connectors, which get data out of Kafka and into whatever sink system. So there is a rich ecosystem of connectors; you have something for Elasticsearch, relational databases, Hadoop, BigQuery. You can essentially push your data, once it's in Kafka, into any third-party system you could think of. So you would deploy Debezium connectors into Kafka Connect, one maybe connected to MySQL, one connected to Postgres. You would have the data in those Kafka topics, and then you could, for instance, deploy the Elasticsearch sink connector if you wanted to have full-text search on your data. This sink connector would take the data from the topics and propagate it to corresponding documents in Elasticsearch. And this is all by means of configuration, setting things up in Kafka and Kafka Connect, without any code you would implement within your application. So let's talk a bit about microservices, because this is also a very interesting use case. If you have heard the talks or read the blog posts by my former colleague, Christian Posta, you would remember where he said, well, the hardest part about microservices is data. And this is certainly true, because microservices don't exist in isolation; they very often need data from one another. So let's say you have those three services here: order, item, and stock. Very likely the order service might need data from the other two services in order to provide its functionality. And now the idea in microservices is they shouldn't share a database.
They should have their own local database. And well, if the order service needs data from the other two services, it shouldn't do something like synchronous requests, because this gives you a very tight coupling which you don't want. You would like to have some more loose coupling, and change data capture can help you with that. The idea being, you set up change streams using Debezium so that you have a topic with item changes and a topic with stock changes in Kafka, and then the order system could subscribe to those topics and create a local version of this data in its own local database, in its own locally optimized representation. And then it can perform its functionality just on its own; it doesn't have to do any synchronous requests to those two systems. Now you might think, oh, maybe this is a bit too low-level; I would rather have a bit more control over the events I send to those other event consumers. And this is where the very interesting and powerful outbox pattern comes into play. The idea there is: I have a writing service, it could be the order service, and it would like to change its own local database, but it also would like to produce events to other consumers. Now, you cannot have one transaction which spans the local database and Kafka; that's just not doable. So you want to avoid what's called dual writes, and Debezium, or change data capture, can nicely help you with that. The idea being that you have an events table in the source database, and instead of writing directly to Kafka, this order system writes to its own local business tables and it writes an event into this local events table. This all happens as part of one local transaction. And then you have Debezium, which just streams changes out of this events table and propagates those events, which might have, you know, information about the type of the event, the category it belongs to, and probably a payload like a JSON structure. Debezium will stream those changes, and then consumers can subscribe to these topics, and you have a very reliable pipeline between those services without having this dual-writes issue. And the last thing about microservices I want to mention quickly is microservice extraction, right? Very often you don't start on a green field; you have existing monolithic applications and you would like to go to a world of microservices, so you would like to extract microservices out of the monolith. And again, CDC can help you with that, and you could do something like this: you begin to extract a new microservice with some part of the functionality, and for some time you still keep running write requests against the old monolith, and you use CDC to stream the changes from the old monolith to this newly extracted microservice. So it has its own version of the data, and then it can run on the side; you can see how it performs, what its calculations look like, and so on. And at some point you're happy with the functionality of the new service, which so far has been fed with data from the old productive monolith, and then you just switch over, you do the writes against the new microservice, and you have finished this transition period. During this transition period you have used CDC just to build this new microservice on the side.
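To make the outbox idea from a moment ago a bit more concrete, here is a minimal sketch of what the writing side could look like, assuming a plain JDBC setup; the table names, columns, and connection details are purely illustrative, not something prescribed by Debezium.

```java
// Minimal outbox-pattern sketch on the writing side (illustrative names only):
// the business table and the outbox table are written in ONE local transaction,
// and Debezium later streams the outbox rows into Kafka, so no dual write is needed.
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.util.UUID;

public class PlaceOrder {

    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection(
                "jdbc:mysql://localhost:3306/inventory", "appuser", "secret")) {
            conn.setAutoCommit(false);

            // 1) Write to the service's own business table
            try (PreparedStatement ps = conn.prepareStatement(
                    "INSERT INTO purchase_order (id, customer_id, total) VALUES (?, ?, ?)")) {
                ps.setLong(1, 10042L);
                ps.setLong(2, 7L);
                ps.setBigDecimal(3, new java.math.BigDecimal("149.99"));
                ps.executeUpdate();
            }

            // 2) Write an event describing the change into the outbox table,
            //    as part of the same transaction
            try (PreparedStatement ps = conn.prepareStatement(
                    "INSERT INTO outbox (id, aggregate_type, aggregate_id, event_type, payload) "
                    + "VALUES (?, ?, ?, ?, ?)")) {
                ps.setString(1, UUID.randomUUID().toString());
                ps.setString(2, "order");          // routing hint for consumers
                ps.setString(3, "10042");          // id of the affected aggregate
                ps.setString(4, "OrderCreated");
                ps.setString(5, "{\"orderId\":10042,\"customerId\":7,\"total\":149.99}");
                ps.executeUpdate();
            }

            conn.commit();  // either both rows are written or neither is
        }
    }
}
```

Debezium would then capture the inserts on the outbox table and propagate them to a Kafka topic, so downstream consumers never read the business tables directly and there is no dual write.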
So many more things could be said, but I want to show you a bit of all this in a short demo. And the idea for the demo is that I have an order management application, so it has purchase orders, and purchase orders belong to certain categories, and now we are interested in the accumulated revenue per category in a time window. So I would like to know what's the aggregated order value of the tools category in the last minute or the last five seconds or whatever. That's the basic idea. So let's go to my other browser here. I have this running on OpenShift, which is Red Hat's distribution of Kubernetes, and I've already prepared a couple of things, so let me quickly run you through them. I have Kafka running, so I have a Kafka cluster with three nodes, and to set this up I used a project which is called Strimzi; I will talk a bit about it in a minute. It was just very helpful for me for setting up this cluster. I have ZooKeeper, which I essentially need for maintaining state and managing the cluster. I have Kafka Connect, which already contains the Debezium connectors for me. I have this cluster operator, which is a component of Strimzi that deals with managing the cluster; it keeps the cluster in the desired state. I have a MySQL database, and this MySQL database contains those two tables, orders and categories. So this is my source of data. I have this event source application, which just randomly inserts purchase orders; it's like a randomized function which just goes and inserts random orders. And I have two more components here which I will talk about a bit later on. So let me fire up this event source. It's not running yet, so let me scale it up to one pod. Now this launches, and let's take a look at the logs. This should insert random orders, right? So it inserts 50 orders, 100 orders. All right, so that's cool. Now let me go to my command line here; this is a shell where I'm connected to the OpenShift cluster. And the first thing I need to do is deploy the Debezium connector, so I get changes out of those two tables. For that, let me copy this request and I will explain it to you. So I take this request and I submit it, and it's using the REST API of Kafka Connect. Kafka Connect comes with a REST API which I can use to deploy connectors, configure them, start them, stop them, and so on. In this case, I'm deploying an instance of Debezium's MySQL connector. I am saying, okay, this is the MySQL host, please get the changes from there; use this port, use these credentials. And I'm just interested in those two tables, orders and categories, so I'm giving it this table whitelist. And now this connector should get changes out of those two tables and write them into corresponding Kafka topics. So let's see what we have there. I'm using a tool which is called kafkacat, and let's first take a look at the categories topic, let's see what's in there. So I have those events. I didn't mention the structure yet: essentially they have this before part and an after part. The before part tells you the old state of the row in case of an update; in case of an insert it's empty. Then I have the after state; for my categories, I just have the name and an average price. And I have this source block, which is some metadata, like which database this event is coming from, a timestamp, the position in the log file, and some more information. And now I essentially have those entries in this topic.
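To give a rough idea of the shape of these events, here is a simplified sketch of what one change event on the categories topic might look like; the field names and values are illustrative, and the real Debezium envelope carries more source metadata (and, depending on the converter settings, a schema part as well).

```json
{
  "before": null,
  "after": {
    "id": 2,
    "name": "tools",
    "average_price": 39.99
  },
  "source": {
    "connector": "mysql",
    "db": "inventory",
    "table": "categories",
    "file": "mysql-bin.000003",
    "pos": 4711,
    "ts_ms": 1550000000000
  },
  "op": "r",
  "ts_ms": 1550000000123
}
```

The op field tells you whether an event stems from the initial snapshot (r), an insert (c), an update (u), or a delete (d); for an update, the before block would carry the previous row state.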
No new events are added, because, well, those categories just exist; this is just the data from the initial snapshot. So I can take a look at this other topic, the orders topic; let's see what's in there. Here there is some movement: if you look, new events are being added. This is because this event source application produces more and more orders. So let me stop it. Again, I have the source block, and I have the before state, which is empty because these are new entries in the orders table. And they have some information, like a purchaser ID, a quantity, a sales price, and they have the category ID. All right, so I have those two topics. And now I said I want to have the aggregated order value per category in a time window. So how can I do that? For that, I'm using an API which is called Kafka Streams, and this is a very powerful, very mighty API which is part of Kafka. It's a Java API, so I need to have an environment in which to run it, and for that I'm using Thorntail. Thorntail is a project from Red Hat which allows me to build microservices based on the Java EE, or Jakarta EE APIs I should say, and the MicroProfile APIs, and it creates a fat JAR for me, which makes it nicely runnable, let's say, in OpenShift. So that's what I'm using to deploy this Kafka Streams application. And I can run you a little bit through this pipeline here. I'm essentially getting a KTable of the categories topic, so this is just the current state of the categories topic. I have a KStream of the orders topic, and this stream will produce new elements whenever something is written to this orders topic. And now I would like to join them, so I can use the join method from the Kafka Streams API. For joining, I need to have the same key on both topics which I would like to join, so I'm first selecting the category ID as the key for the orders stream. Then I can join them; I join them on the key, and I also put the category name into this joined result. I group them, because I would like to have the revenue per category in a time window, so I'm using this group-by function; this groups by the key, which is the category name at this point. And I would like to have windowed values now, in this case time windows of five seconds. Now I have the data grouped, and I need to aggregate, so I'm using aggregate, and essentially I'm just adding up the sales prices of the orders within one category and within one time window. I'm doing a bit of string representation, not too exciting. And now I could take this data and just produce the entries from this stream to another topic, and in fact I'm doing this down here, so this is written to another topic. But to make it a bit more interesting, I'm also taking the output from the stream and writing it to all connected WebSocket clients. This Thorntail application also runs a WebSocket endpoint, and to all the connected clients, for each element contained in this stream, I'm just emitting a small JSON with the category name and the accumulated sales value, right? So I have this WebSocket endpoint.
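As a rough illustration of that pipeline, here is a condensed Kafka Streams sketch; it is not the actual demo code, and the topic names, the string-encoded values, the serde choices, and the bootstrap address are simplifying assumptions (the real application works on the JSON change events with proper types).

```java
import java.time.Duration;
import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.KeyValue;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.Grouped;
import org.apache.kafka.streams.kstream.Joined;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.KTable;
import org.apache.kafka.streams.kstream.Materialized;
import org.apache.kafka.streams.kstream.Produced;
import org.apache.kafka.streams.kstream.TimeWindows;

public class OrderAggregationTopology {

    public static void main(String[] args) {
        StreamsBuilder builder = new StreamsBuilder();

        // Current state of the categories: key = category id, value = category name
        KTable<Long, String> categories = builder.table("categories",
                Consumed.with(Serdes.Long(), Serdes.String()));

        // Stream of orders: key = order id, value = "<categoryId>;<salesPrice>" (simplified encoding)
        KStream<Long, String> orders = builder.stream("orders",
                Consumed.with(Serdes.Long(), Serdes.String()));

        orders
            // re-key each order by its category id so it can be joined with the categories table
            .selectKey((orderId, value) -> Long.valueOf(value.split(";")[0]))
            // enrich the order with the category name: value becomes "<categoryName>;<salesPrice>"
            .join(categories,
                    (order, categoryName) -> categoryName + ";" + order.split(";")[1],
                    Joined.with(Serdes.Long(), Serdes.String(), Serdes.String()))
            // group by category id and bucket the orders into 5-second time windows
            .groupByKey(Grouped.with(Serdes.Long(), Serdes.String()))
            .windowedBy(TimeWindows.of(Duration.ofSeconds(5)))
            // sum up the sales prices per category and window; result is "<categoryName>;<total>"
            .aggregate(
                    () -> "",
                    (categoryId, order, agg) -> {
                        String name = order.split(";")[0];
                        double previous = agg.isEmpty() ? 0.0 : Double.parseDouble(agg.split(";")[1]);
                        double total = previous + Double.parseDouble(order.split(";")[1]);
                        return name + ";" + total;
                    },
                    Materialized.with(Serdes.Long(), Serdes.String()))
            // turn the windowed aggregate back into a stream and write it to an output topic
            .toStream()
            .map((windowedCategoryId, agg) -> KeyValue.pair(windowedCategoryId.key(), agg))
            .to("sales-per-category", Produced.with(Serdes.Long(), Serdes.String()));

        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "order-aggregator");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "my-cluster-kafka-bootstrap:9092");
        new KafkaStreams(builder.build(), props).start();
    }
}
```

The output topic at the end corresponds to the "writing it to another topic" step mentioned above; in the actual demo the same stream additionally feeds the WebSocket endpoint.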
So let me go to the browser again, and here I have the aggregator application, which I was showing you in the console before. So this is right here; it exposes this endpoint, so let me go there. And now this has a little chart which I quickly built, and whenever a new JSON entry with the sales value for one category, like computers, tools, whatever, is pushed towards this client, the chart is redrawn, and I have this live update of my data. And I find this pretty, pretty powerful. Now the cool thing about having the data in Kafka is that I can enable more and more use cases, which I maybe didn't even think of when I originally started to set up such a topic with change events. So let's say now I would like to have full-text search on my order data. All I need to do is deploy another Kafka Connect connector, and this will make it happen. So let me take this request here; it's too much to type, so I'm copying it, but I will tell you what I'm doing. In this case, this deploys an instance of the Elasticsearch sink connector, which takes data out of Kafka and into Elasticsearch. I'm saying, okay, please subscribe to the orders topic; where is it here, subscribe to this topic, please. I would like to have an index name of just "orders", which is a bit shorter. I write data to this Elasticsearch endpoint, and I'm applying those transformations. The reason is that I have those complex events which I get from Debezium; they have this before state, after state, source meta information, and some more, whereas this connector, and many other sink connectors, would like to have some simpler representation of just the current state of the data item this is about. And this unwrap-from-envelope transformation which I'm applying here does just that: it takes the after state, the new state of this row, and propagates only that to the sink connector. And I'm applying this key transformation, which is there to make sure that the message key of my change events, which is the primary key from the captured table, is also used as the document ID in Elasticsearch. So that's what I'm using this key transformation for. Now I have deployed this connector, and I should have data in Elasticsearch. Let me go to OpenShift again and go to Elasticsearch. I have this running here, and I just happen to know that there is this orders index there, so I'm just taking a look at its contents, but I could obviously now use Elasticsearch's full-text search query API, or its query DSL, and run more meaningful queries. The interesting part is, whenever I refresh this, you see the total number of documents in the index increases as more and more items are added. So this is really nice, because when I set up this change stream, maybe, yeah, I didn't even know I wanted to do full-text search, and just by adding new connectors I get support for more and more use cases. So this is really quite useful.
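For reference, registering such a sink connector looks roughly like the following request body; this is a hedged sketch, the hostnames and topic name are illustrative, and the transformation class names shown here match the Debezium 0.9 era (the unwrap SMT was later renamed), so check the documentation for your versions.

```json
{
  "name": "elastic-sink",
  "config": {
    "connector.class": "io.confluent.connect.elasticsearch.ElasticsearchSinkConnector",
    "tasks.max": "1",
    "topics": "dbserver1.inventory.orders",
    "connection.url": "http://elasticsearch:9200",
    "type.name": "order",
    "key.ignore": "false",
    "transforms": "unwrap,key",
    "transforms.unwrap.type": "io.debezium.transforms.UnwrapFromEnvelope",
    "transforms.key.type": "org.apache.kafka.connect.transforms.ExtractField$Key",
    "transforms.key.field": "id"
  }
}
```

This would be POSTed to the Kafka Connect REST API (typically http://<connect-host>:8083/connectors); the shortened index name mentioned above would be handled by an additional renaming or routing option, which is left out here.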
All right, so let me go back to my slides for just a few words on this Strimzi project. As I mentioned, I used it to set up Kafka on OpenShift in this case, and Strimzi makes this very easy for me. What it provides for me is, A, container images of Kafka, Kafka Connect, ZooKeeper, and everything else, so I have nice images. But more interestingly, I have Kubernetes operators, and those operators take what are called custom resource definitions, which essentially are YAML files which, for instance, describe a Kafka cluster. So you can say, I would like to have a Kafka cluster with three nodes, I would like to have a topic with this number of partitions and this replication factor, and this operator will come, take the YAML file, and make a Kafka cluster or make a topic which looks like what your CRD, your resource definition, describes. So this is a very convenient, very practical way to set up Kafka. Strimzi is the community project, but there's also a supported version, which is called AMQ Streams. So this is definitely something which I recommend you check out. And with that I am actually done; just to wrap it up quickly, I hope I could convince you that change data capture enables all sorts of interesting use cases, be it replication, data synchronization between microservices, full-text search as we have seen, a live updated UI as we have seen, and many things more. And Debezium is an open source implementation of CDC with support for MySQL, Postgres, Mongo, and so on. And this all comes to you transparently and is very easy to set up; you don't have to alter your upstream applications, you just tap into the transaction log, and Debezium is doing the heavy lifting for you. So yeah, as a last thing I've got a couple of resources. If you're interested, check out the website. It's all open source under the Apache license, so the source code is on GitHub. And if you would like to get in touch, just join our Google group and we can have a discussion there. And with that I would hand back over to Burr, and I think we can take a couple of questions.

And we do have some questions, more of them than I can keep up with here, and these are good questions. So one key question that we're going to get a lot is: does this work with my data source? Does it work with Bigtable on GCP? Does it work with an RDS database on Amazon? Does it work with my Sybase? So can you quickly run through which databases are supported today and what you hope to have in the future? All right, so currently we have MySQL, we have Postgres, we have MongoDB, we have SQL Server; those are what we consider readily usable. We have Oracle, which is kind of a tech preview. It runs on RDS; we definitely have users who use Postgres, who use MySQL on RDS. There are some restrictions on what you can and cannot do there, but this is working on RDS. We don't have support for Sybase; I think it's the first time I've heard about that one. What was the other one? The Google one, Bigtable, we don't have that as well. What I can say is, a couple of times we have heard about Cassandra, so this might be something which could come. A couple of times I've heard about MariaDB; this could come, but we will have to see what the most requested ones are. Okay, another great question. And really, every database was asked about; think of any obscure database you've heard of, and someone asked a question about it: Cosmos DB on Azure, I think Db2 as well, right? So, all kinds. I mean, I would love to support them all. The thing is, we need to implement against those APIs, and that takes some effort. But yeah, definitely give me those names and then we can take a look. Yeah, fantastic. Another question, one is just: can you give us the link to your demo code base on GitHub? We think you are using the debezium-examples repo on GitHub, but people wanted that project. Exactly, so there's the debezium-examples repo on GitHub, and in there it's called kstreams-live-update; that's the name of this demo. Okay, kstreams-live-update, fantastic. Right, a couple of other, again, more database questions.
Hey, what about Apache Lucene? Oh, I mean, that's the first time I've heard that one. So there's definitely no support for getting changes out of Lucene; I'm not even sure whether there's an API which would enable this. I mean, keep in mind, we don't want to do polling, right? We want some way of tapping into a log, and I'm not sure what's there for Lucene. You could definitely think about getting data into Lucene, though. I mean, there's this connector which is provided for Elasticsearch; you could think about something similar for Lucene, but it's not supported as a source at this point. Okay, and that's the key thing to understand, right? It's streaming changes from the database commit log, right? Is that the right way to say it? Absolutely, right. So there is no polling, there is no trigger, so it doesn't really have any impact on your application; it just needs some means of configuring, let's say, the log in the right way. But then this is coming from the log, and this gives you near real-time updates, I would say, and it's super efficient. Okay, well, we're nearly out of time, but a couple of key housekeeping things. The slides are available via the handouts button at the bottom of the screen here in this platform; you can get the slides now and download them. Make sure to follow Gunnar on Twitter, that's another key option. I put the Debezium URL in the chat, and I put the Strimzi URL, so you can go to those websites and get more information. There was a question on Strimzi: is it GA or not GA? Well, we do support it with customers, right? It is part of the AMQ Streams platform, so customers can feel free to use that. That's how we think of GA, right? Do we support it, and can you get a subscription for it? That's the easiest way to think of it. But when will Debezium be part of a subscription? Will it be part of something of that nature? Do you have any thoughts on that? So, I have thoughts; I'm not sure whether I can talk about it. But that question did come up; it's like, hey, when do we think Debezium is production ready? Absolutely. So, I mean, I totally get this, and at this point I would really love it if you come to our Google group. Just shoot us an email saying that you have interest in this, maybe describe your use case, or send me a private email or a DM on Twitter, because I'm really very keen to hear about those requests. Okay, that's a good point; the Google group is a great place for other folks to come to. Let me see if I can... looks like you use Gitter as well, though, for your group. Exactly, right. So if you are more of a chat kind of person, just come to the Gitter chat. We have a user chat, we have a dev chat, and we can discuss in more real time there as well. And I added the Google group link to the chat, so people have that too. Don't forget about that free ebook I mentioned, written by Edson Yanaga; it does mention Debezium and gives you other techniques for dealing with monolithic databases in a microservices world. Awesome demo, by the way, I loved it, great stuff there. All right, thank you. Seeing Debezium in use, seeing Kafka in use, seeing those database changes fly through, I love that. So this is really fantastic technology; you should be very proud of the work you've done in the open source community there. But we have to sign off, we're out of time for today. Thank you all so much for attending. There will be a recording available; look for an email coming out about that recording.
You can also follow my playlist on YouTube; I basically publish these sessions out to my playlist on YouTube too, so I'll add the link to the playlist. You'll see all the previous DevNation sessions there, including this one once we get it published out to YouTube. Thank you so much for your time today. Gunnar, thank you so much. Thank you for having me.