Hello and welcome to another DevNation deep dive. I'm really excited to be here today to talk about a pretty cool open source project called Debezium. We're going to learn all about change data capture, what it is about, and how Debezium can help us implement change data capture scenarios across our landscape. A few housekeeping things up front before we get going. Usually there is a colleague assisting in the chat and keeping an eye on the stream, but today I'm streaming all by myself, so I kindly ask for some patience in terms of me not being able to watch the chat at all times. Nevertheless, fear not: we have some time left towards the end where I'll try to address all your questions, or at least those we can take in the time we have left. Before we start, let me share some links in the chat, because as usual — those of you who have joined other deep dive sessions in the past know this — all our material is out there in public for you to inspect at any time. So here, I just posted the link for the deck that I'm going to use, and there is another link for the tutorial. The tutorial contains step-by-step instructions for everything I'm going to show you during the three simple demo cases we'll look at in this session. Feel free to revisit those materials any time so that you can work through them at your own pace. For those who might join in late: again, feel free to ask any questions in the YouTube chat; towards the end of the session I'll address them one by one, and hopefully we can answer most of them today. Without further ado, let me dive right into today's material. I hope everyone is able to see the screen by now; I verified it, so it should be good.

Let's look at the agenda. In this session we want to learn, in general, what change data capture is all about and how it works, and then specifically we'll try to understand how this fantastic open source project called Debezium implements it and how we can use it in our own projects. We're going to learn that Debezium is a so-called log-based change data capture solution, and I'll briefly highlight some of the major benefits that come along with that. Then we're going to see the databases that are currently supported by Debezium. We'll also take a glimpse at the change event payload structure in order to understand what these change events that are extracted from databases look like and how we can process them further. And finally, I'll explain three different ways you can run Debezium — there is not just one way of using it, and this is also something you should take away from this session. Then, as usual for these deep dives, a good half of the session is dedicated to the tutorial I pasted into the chat already, where I'm going to show you three demos based on Debezium.

A few words about myself: my name is Hans-Peter Grahsl. I'm a developer advocate at Red Hat, where I've been working for about one and a half years, and I work remotely from Graz, Austria. I've been an open source enthusiast for quite a few years now, and I'm lucky and humbled that some of the things I have done in two different developer communities have received some community recognition. I'm always happy to have conversations.
I think it's easiest to stay in touch on social media, on Twitter — or X, as it's called now. Feel free to follow me there and contact me any time you want to discuss anything related to change data capture, data streaming, Kafka, and topics like that. Without further ado, let's dive into today's content.

When you go to the web page of the Debezium project — this is basically a screenshot of the landing page — you immediately read that Debezium is a so-called change data capture platform, and it's fully open source. What this means, at a very high level, is that we can take Debezium, point it to any of the supported databases, and then start to react to any changes that are happening in those databases. Debezium captures those changes and exposes them, allowing us to send them to downstream consumers for further processing. So any inserts, updates, or deletes become something you can act upon from various services or applications that you build around, say, an already existing system that uses one of the databases you are capturing with Debezium.

Debezium is a very healthy open source project. You can look at some of the high-level stats on the GitHub project directly, but more important than metrics such as stars or forks is the fact that a project has a vibrant community, and I think Debezium can confidently say that this community is there. Last time I checked, there were 480-something contributors, so they'll reach the 500 milestone rather soon.

Debezium is a change data capture platform, as I mentioned, and specifically it uses the transaction logs that underpin the databases you can use Debezium with. Every typical database — all of those popular databases you may already be running in production — has some notion of a transaction log: the binlog in MySQL, the write-ahead log in Postgres, the redo log in Oracle, the oplog in MongoDB, you name it. All of these essentially help the database fulfill certain guarantees around the ACID properties. Now, those transaction log implementations are very different in nature; there is no common transaction log standard across all those databases. And this is a pretty nice Debezium characteristic: it abstracts away all the gory details and peculiarities you would find when working directly with a transaction log. There is no need for that — Debezium does it under the covers and exposes the change events it captures from those transaction logs in a standardized format. You're going to see that later in the session as well.

Now, those transaction logs are usually not kept around indefinitely, so you might wonder: how can you make sure you capture all the rows in a particular database table if the transaction log entries for some of the changes made to that table are no longer available? This is exactly where a pretty cool feature called snapshotting comes into play. It allows you to create a snapshot, and that's what the name implies.
You can say: hey, please make a snapshot — essentially take all the rows currently residing in a specific database table and send those as change events to whatever system we want to use for propagating those messages. Very often this is Apache Kafka, but it doesn't have to be Kafka, as we're going to learn during this session. And then, after doing that snapshot, Debezium seamlessly switches into the continuous mode of capturing all the inserts, updates, and deletes that are happening in the database.

It also has many more features — you'll find them in the documentation. You can apply filtering to those change events, you can route them, and you can flatten their somewhat nested structure: there is quite a lot of information in those change events, so if you're only interested in the actual data parts, you can do that as well. Another nice feature — we're going to see a short demo of it, too — is the Debezium UI, which allows us to manage the connectors we are running: we can inspect the running ones, create new ones, get insights into the state of running connectors, and things like that. There is also a dedicated page on the Debezium project's website that shows user and customer testimonials. There are very large-scale deployments of Debezium out there; one famous use case, just to name one, is in the e-commerce space: Shopify. They are capturing thousands of databases and tables using Debezium and this notion of change data capture.

I mentioned that Debezium uses transaction logs. Let me briefly walk you through some of the major benefits this brings, in contrast to the other approach, generically referred to as query-based CDC, where you poll the database to understand what changed in, say, the last 20 or 30 seconds. First of all — and we're going to see this in the demo as well — change events are usually captured and exposed with very low latency, meaning you can react to changes in near real time. Another cool aspect of the transaction log tailing that happens behind the scenes is that it adds very little overhead to the database compared to query-based polling. Also, no change is missed: every change will be reflected in the transaction log of the database at some point, which means all of those changes can be captured. When you poll a database at regular intervals and multiple changes happen to one and the same row within one interval, you miss the intermediate ones — you would only get the last change to any of those rows with query-based polling. Another nice property of log-based CDC is that it is data-model agnostic: you don't need any specific data model for this to work, and you don't have to prepare your existing one by introducing auto-incrementing columns or dedicated timestamps to understand when rows were last updated. It works with whatever data model you are currently using. Also, it can capture deletes — a delete is just another database operation that is reflected in the transaction log, so it's quite natural.
So, similar to an insert or an update, you capture deletes and can propagate them as well. Then, depending on the database, a change event can give you not only the current state at the moment it was captured but also the previous state. For an update operation, that gives you both representations of a specific table row: the row before the change and the row after the change. You could then understand what actually changed by calculating a diff on those fragments in the CDC event. And finally, you get additional metadata: on the one hand, Debezium-related metadata about the connector itself, and on the other hand, database-specific metadata such as the transaction log position, the transaction ID, a timestamp, the operation type, and things like that. We're going to see all this when we take a glimpse at the event payload structure very soon.

Now, which databases are currently supported? On this slide I try to lay that out for you. You see lots of the usual suspects — all the popular databases are there. The top five according to the continuously updated DB-Engines ranking are covered: Oracle, Postgres, SQL Server, MySQL, and MongoDB, and the others are, I think, somewhere within the top 15 of that ranking. What we also see is that it doesn't only work for relational databases but for non-relational ones as well. MongoDB is a very good example; in fact, it was one of the first databases the Debezium project supported after it was published on GitHub. Eight of those nine connectors are deemed production ready in terms of what the upstream Debezium open source project considers production ready, and one of them, Cloud Spanner, is still incubating as of Debezium version 2.4.

Historically, Debezium — and this is what we've been talking about so far — was always about getting data changes out of those databases and propagating them further for processing in various ways. Not too long ago, the project added a sink connector: basically the opposite part that you need when you want to feed those change events into a target system of your choice. There have been other sink connectors supporting JDBC out there, but it was a pretty good idea to add one to the Debezium project itself, because the Debezium JDBC sink connector makes our lives easier: it is a better fit for understanding the change data capture format and payload of Debezium source connectors, meaning it's just easier to work with. You can then address any target system accessible by means of JDBC in order to feed those change events into it. And when I talk about this, I'm always talking about configuration-based approaches: there is no need to write any code for an end-to-end streaming pipeline that, for instance, reads change events from a Postgres database and feeds them into, let's say, a MySQL database using this sink connector. All you need is proper configuration and a way to run Debezium.
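To give you a feel for that configuration-only approach, here is a rough sketch of what such a Debezium JDBC sink connector configuration could look like — the topic name, connection URL, and credentials are placeholders, and the exact property set should be checked against the documentation:

```json
{
  "name": "jdbc-sink",
  "config": {
    "connector.class": "io.debezium.connector.jdbc.JdbcSinkConnector",
    "topics": "dbserver1.inventory.customers",
    "connection.url": "jdbc:mysql://mysql:3306/inventory",
    "connection.username": "mysqluser",
    "connection.password": "mysqlpw",
    "insert.mode": "upsert",
    "primary.key.mode": "record_key",
    "delete.enabled": "true"
  }
}
```

Because the sink connector understands Debezium's event format, an upsert mode keyed on the record key is enough to mirror inserts, updates, and deletes into the target table.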
The database support is steadily growing, and it is extended in various ways. There are core connectors written by core maintainers. There are folks in the community who step up and lead certain additions, either features for existing connectors or entirely new connectors for databases that aren't supported yet. And another category is a third-party company writing a new Debezium connector for its own database product. That also happens, meaning that instead of re-implementing everything from scratch, Debezium also serves quite nicely as a framework: it provides building blocks on top of which you can implement new connectors, and of course it's a good idea to build on top of the existing change data capture format and event payload that Debezium uses.

Talking about this event payload — we've been talking about change events for several minutes now — here is one example of how such a change event payload is structured. Every CDC event is composed of two parts, a key and a value. The key usually simply contains the primary key of a specific database table row, and the value is where all the other interesting information is found. On the right we see a JSON snippet for one of those change events, reflecting an update operation in the database, and we see two sub-documents: a before and an after sub-document. The before, as the name implies, shows us the state of a particular table row before the update was applied, and the after sub-document shows us the current state — the state after the update has been applied. For an insert there is of course no before state, so before would be null and we only have an after state; conversely, for a delete operation there would only be a before state — the state of the table row before it was deleted — and the after sub-document would be null. Then we have a source block and additional top-level fields, which are generally referred to as metadata. In that source sub-document we see, for instance, specifics about the respective Debezium connector — the one for Postgres in this case — and its version. We also see some database-related metadata: the database server's name, a timestamp, whether this event comes from a snapshot or not, how the schema and the table from which this change event was captured are named, a transaction ID, and things like that.

Finally, it's important to understand that the way those change events are serialized is also configurable: several serialization formats are available and supported, and JSON is just one of them. I chose JSON here because it's friendly to the human eye and easy to put on slides. You can of course work with binary serializations such as Avro or Protobuf as well, and if you prefer — and have your reasons — you can rely on the CloudEvents specification, in which case those change events will be exposed as CloudEvents.
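To make that structure concrete, here is a sketch of such an update event for the Postgres connector — the field values are illustrative and a few metadata fields are omitted; the corresponding key would simply be something like {"id": 1004}:

```json
{
  "before": { "id": 1004, "first_name": "Anne", "last_name": "Kretchmar", "email": "annek@noanswer.org" },
  "after":  { "id": 1004, "first_name": "Anne-Marie", "last_name": "Kretchmar", "email": "annek@noanswer.org" },
  "source": {
    "version": "2.4.0.Final",
    "connector": "postgresql",
    "name": "dbserver1",
    "snapshot": "false",
    "db": "postgres",
    "schema": "inventory",
    "table": "customers",
    "txId": 601,
    "lsn": 34078720
  },
  "op": "u",
  "ts_ms": 1700000000123
}
```

The op field encodes the operation type ("c" for create, "u" for update, "d" for delete, "r" for snapshot reads), which is how consumers tell the event kinds apart.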
Before we look into the demos we want to walk through together today, just very briefly: how can you run Debezium, and how do you actually make use of it in your project context? I have three deployment modes I want to walk you through. Mode one — in my experience still the most widely deployed way to use Debezium in production — is in the context of Apache Kafka. That means you have some Kafka infrastructure somewhere, and that Kafka infrastructure is used to store and propagate those change events in Kafka topics. In addition to Kafka itself, you have a Kafka Connect cluster, and that Kafka Connect cluster runs specific instances of the different source connector plug-ins that the Debezium project provides. On the illustration we see two such connector instances in this Kafka Connect cluster, depicted to the left of Kafka itself: a Debezium MySQL source connector, obviously capturing changes from a MySQL database, and another one doing the same thing for Postgres. And if you think about an end-to-end data pipeline, on the right side of the illustration we see various sink systems we could feed those change events into. Again, if it's a proper fit for what we need, we can use sink connectors to feed the change events into these target systems — for example Elasticsearch, to maintain something like a full-text search index based on those change events, or caching infrastructure such as Infinispan, or you just want to send them further into a data warehouse that you might operate.

Now, Kafka is very popular, but Kafka is not everywhere — so can you make use of change data capture with Debezium without Kafka? Yes, you can, and this is mode two, which is called Debezium Server. Debezium Server is a turnkey-ready Java application written with Quarkus, and it uses the connector plug-ins based on the embedded engine that is also part of the Debezium project — essentially a library dependency of the Debezium Server app. You can use it purely by means of configuration: you start the Debezium Server app with the proper config, and you can capture changes from any of the supported databases. Another cool thing is that when you run Debezium Server, you don't need Kafka to propagate those change events to other systems: you can directly feed other messaging infrastructure, and three examples are listed here — Amazon Kinesis, Apache Pulsar, or Google Pub/Sub. And in our demo, I'm going to show you an even simpler way: maybe you just want to send the change events you capture to an HTTP endpoint, using a simple POST request against some REST API of a target system. Also that is possible when you use Debezium Server.

And finally, the third way is to use the Debezium embedded engine directly. You would have your own custom Java app, for instance; you add the Debezium embedded engine as a library to your Maven or Gradle project, or whatever build tool you use, and then you can programmatically access those change events. Essentially, every time the embedded engine captures a change from the database log, you are notified via a listener function, and you can do whatever you want in code with those change events.
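As a minimal sketch of that third mode — assuming the debezium-api, debezium-embedded, and Postgres connector dependencies are on the classpath, and with placeholder connection settings — a custom Java app could look roughly like this:

```java
import io.debezium.engine.ChangeEvent;
import io.debezium.engine.DebeziumEngine;
import io.debezium.engine.format.Json;

import java.util.Properties;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class EmbeddedCdcApp {
    public static void main(String[] args) throws Exception {
        // Connector configuration; host, credentials, and paths are placeholders.
        Properties props = new Properties();
        props.setProperty("name", "embedded-postgres");
        props.setProperty("connector.class", "io.debezium.connector.postgresql.PostgresConnector");
        props.setProperty("database.hostname", "localhost");
        props.setProperty("database.port", "5432");
        props.setProperty("database.user", "postgres");
        props.setProperty("database.password", "postgres");
        props.setProperty("database.dbname", "postgres");
        props.setProperty("topic.prefix", "tutorial");
        props.setProperty("table.include.list", "inventory.customers");
        // The engine needs somewhere to remember its offsets between restarts.
        props.setProperty("offset.storage", "org.apache.kafka.connect.storage.FileOffsetBackingStore");
        props.setProperty("offset.storage.file.filename", "/tmp/offsets.dat");

        // The listener passed to notifying() is invoked for every captured change event.
        try (DebeziumEngine<ChangeEvent<String, String>> engine =
                DebeziumEngine.create(Json.class)
                        .using(props)
                        .notifying(event -> System.out.println(event.value()))
                        .build()) {
            ExecutorService executor = Executors.newSingleThreadExecutor();
            executor.execute(engine);
            // Keep capturing for a while; closing the engine stops it gracefully.
            TimeUnit.MINUTES.sleep(5);
        }
    }
}
```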
Enough talk — let me come to the demo part. This is where I shared the link; you should see it in the YouTube chat a little further up. It's the tutorial where everything I'm going to show you in the next 25 minutes is laid out with detailed step-by-step instructions, so that you can walk through exactly the same steps at your own pace. With that, let me go out of the slides — or actually, let me first go to the tutorial and show you that once you go there, the first demo is the Apache Kafka one. It's always helpful to have some illustrations: in the first demo we use a Postgres database, and we have Kafka and Kafka Connect infrastructure. In this case it will all be based on local containers running on my machine, so all you need is a way to run containers, either Podman or Docker — the tutorial contains the commands for both, and either one is fine for working through these demos at your own pace on your own machine. We have the Kafka cluster that we use to propagate those change events, we run Debezium in the mode one that I explained, within Kafka Connect, and we're going to extract changes from a Postgres database, all of it containerized.

So with that, let me switch to my terminal and spin up the infrastructure components. I'm going to use Podman here: I'm going to create a pod, and once that pod is there, I'm going to run ZooKeeper and Kafka. I'm then going to run my Postgres database in a container. And again, like I said, no worries, all the commands are in the tutorial, so if this looks too fast, you can read it all up and work through it on your own later. With that, we have the infrastructure, and the next step is to verify that everything is running properly. I'm going to go into that database container, connect with the Postgres CLI, and show you the sample data this database already contains. Here we have just four records in that inventory.customers table — that's not much, but it's definitely enough to illustrate the concept and understand how all of this comes together, hopefully nicely.

Once we have everything set up — the database with some data and the infrastructure we need — we can instruct Kafka Connect to run a Debezium source connector for Postgres for us. We do that by interacting with the REST API of Kafka Connect: I'm sending a POST request using curl, specifying only the configuration for that connector. I'm showing it to you in the browser because it's easier to look at. This is the configuration snippet for the Postgres connector that I just launched within Kafka Connect: we specify that it's a Postgres connector, we say we want JSON serialization for those change events without explicit schema information, then we specify some database credentials in order to access this Postgres database, and finally we configure which schema or catalog, and which table in that catalog, we want to capture.
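For reference, registering the connector is a single POST against Kafka Connect's REST API; a sketch along the lines of the tutorial, with illustrative credentials and names:

```sh
curl -X POST -H "Content-Type: application/json" http://localhost:8083/connectors -d '{
  "name": "inventory-connector",
  "config": {
    "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
    "database.hostname": "localhost",
    "database.port": "5432",
    "database.user": "postgres",
    "database.password": "postgres",
    "database.dbname": "postgres",
    "topic.prefix": "dbserver1",
    "table.include.list": "inventory.customers",
    "key.converter": "org.apache.kafka.connect.json.JsonConverter",
    "key.converter.schemas.enable": "false",
    "value.converter": "org.apache.kafka.connect.json.JsonConverter",
    "value.converter.schemas.enable": "false"
  }
}'
```

Setting schemas.enable to false on the JSON converters is what gives us the schema-less JSON serialization I just mentioned.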
As I explained earlier, what happens then is that Debezium goes to the Postgres database, creates an initial snapshot, and afterwards switches to the continuous capturing mode. So at this point we should already have the change events from that initial snapshot in our Kafka topic. Let's verify that: I'm going to run the command to consume from the Kafka topic that is supposed to store those change events. And there we have it — we see the key part followed by the value. This one represents the first database table row up here: we see the primary key as the key part, and in the value we see the whole structure we looked at on the slides. The before part is null because it's an initial snapshot; we only see the after state, and that after state reflects exactly that row up here. And this means we have four such events.

Now let's try to understand what happens when we introduce changes. I'm going to update customer four — this row — and specify a new first name right here: we're going to rename this customer to Anne-Marie. Take a look at the window down here, which constantly listens for new Kafka records on that topic. Once I update the database, we immediately see that change reflected as a change event in our Kafka topic: we now have an update event, reflected by this update operation here, and we now have a before state — we see what the database row looked like before applying the update — and also how it looks right now, after the update. Then let's take a brief look at what happens when we execute a delete operation. We do that, and immediately afterwards we see a deletion: again the key, and then the whole value being a delete operation, meaning we only have the before state and the after state is now null — which is fine, because the database table row is gone by now. When we select from that table, we see exactly that state: the fourth row is gone because we successfully deleted it.

Now you might wonder about resiliency — how reliable is all of this? Let's do something to understand at least one failure scenario in this context. The failure we'll simulate is a downtime of our Kafka Connect infrastructure: if Kafka Connect is down, in this way of running Debezium, it also means that the Postgres source connector is not operating and will not capture changes. Let me do that on purpose by stopping the Kafka Connect container — and let me verify it: the Kafka Connect container is stopped, meaning we don't capture changes right now, but the other containers are fine: the database is running, Kafka is running. So we can of course have new changes happening in the database; let me insert two new rows to show that. Now we just inserted two new records — if I select, we have five customers, and these two are new. And obviously we are missing out on those changes here, which is okay, because we have a downtime for our connector. The good part — and we're going to see this very soon — is that Debezium is clever about this: it remembers where in the database transaction log it left off, that is, which events from the transaction log have already been successfully processed and propagated. It remembers that point — by default, in this deployment mode, it saves this offset in Apache Kafka itself — and once we bring it up again, it seamlessly continues where it left off. So we're going to restart the Kafka Connect container now, our connector will start running again, Debezium will figure out where it left off, and soon afterwards we should see those change events coming in — and there they are, in our Kafka topic. So this is just one way to see that the process is quite resilient.
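For reference, the statements I just ran look roughly like this — the IDs follow Debezium's sample inventory data, and the exact values are illustrative:

```sql
-- rename customer four, producing an update event with before and after state
UPDATE inventory.customers SET first_name = 'Anne-Marie' WHERE id = 1004;

-- delete customer four, producing a delete event with only a before state
DELETE FROM inventory.customers WHERE id = 1004;

-- rows inserted while Kafka Connect was down; captured once the connector resumes
INSERT INTO inventory.customers (first_name, last_name, email)
VALUES ('John', 'Doe', 'john.doe@example.com'),
       ('Jane', 'Roe', 'jane.roe@example.com');
```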
That already concludes the first demo. Let me just bring down the infrastructure I'm using here so that we have a fresh environment for the second demo. It's now shutting all those containers down; we'll remove the pod, and then we should be good to go for the next demo.

The second demo is about Debezium Server. Let me again go to the tutorial page and show you the scenario before we dive in. The idea is to do pretty much the same thing as before — it will all feel very familiar, we have our Postgres database — but there are two major differences. First, there is no Kafka: we use Debezium Server to capture the changes directly from Postgres. And second, instead of producing those change events to a Kafka topic, we use one of the supported sinks — there are many; on the slides they were referred to as other messaging infrastructure. To keep things even simpler, I'm just going to send them to a web API endpoint: each and every change event will be sent as an HTTP POST request towards a web API. That is what we're going to look at in this second demo scenario.

Let me bring up my terminal again and show that to you. First of all, we need some configuration. Remember, I said Debezium Server is just a Java application. Usually you would run the Debezium Server app itself in a container; for simplicity, I just downloaded the archive and run the Java app directly on my machine, basically on bare metal if you want. But of course, if you think about containerized deployments on Kubernetes or anything like that, you would containerize the Debezium Server app as well. This is an application.properties file, quite common for Java applications: we specify the Postgres connector, and we have our database credentials again — strikingly similar, in fact, to what we saw earlier when we configured the connector in the context of Kafka Connect. There we had a JSON configuration snippet; here we have an application.properties file.

For the endpoint, I'm going to use webhook.site directly in my browser. It gives me an individual web endpoint — I'm just clearing the events from last time — and this unique URL lets me inspect the requests that the Debezium Server app will send to that endpoint based on the change events it captures from Postgres. Let me take that URL and put it into my configuration file, because right here we specify the sink type — the target system or destination for those change events — to be HTTP, and then we specify the URL to send the HTTP requests to. And this is the webhook.site URL I just copied from my browser window.
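Put together, the relevant parts of that application.properties look roughly like this — the webhook URL and credentials are placeholders:

```properties
debezium.sink.type=http
debezium.sink.http.url=https://webhook.site/<your-unique-id>
debezium.source.connector.class=io.debezium.connector.postgresql.PostgresConnector
debezium.source.database.hostname=localhost
debezium.source.database.port=5432
debezium.source.database.user=postgres
debezium.source.database.password=postgres
debezium.source.database.dbname=postgres
debezium.source.topic.prefix=tutorial
debezium.source.table.include.list=inventory.customers
```

Everything under the debezium.source.* prefix is passed through to the source connector, so the options mirror the Kafka Connect JSON configuration from the first demo.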
Once that configuration is in place, all we need for this demo is, again, a containerized Postgres database. I'm going to start that right here, and once the database is up and ready to accept connections, I should be good to go with my configuration and can use the run script for Debezium Server. It spins up this Quarkus application, goes to the Postgres database, and, according to my configuration, does that initial snapshot again. And instead of going through Kafka, we should see the change events being sent over HTTP as POST requests against this API. And in fact, here they are: all four change events — one, two, three, four — reflecting the four table rows from my Postgres database. Similar to what we saw earlier, these are the read events representing the initial snapshot of those four rows, so those four customers. Now, just to verify, similar to earlier: if we make changes to that database, we should see those changes being captured and propagated in the same way as before.

So again, let me go into that Postgres container real quick and select — here we have the original state again, with the sample data we started with. If I run an update on customer one and a delete on customer four — updating one, deleting one — we should see two new change events captured after the initial snapshot. And indeed: this is the update event, again changing a first name, and then a deletion event, again for customer four.

Now you might wonder: whether this Java app runs in a container on Kubernetes or wherever, what happens if it crashes, or if the pod needs to be rescheduled, and things like that? Again, Debezium Server has some resilience built in, and the way it works is as follows — I skipped this part when I showed you the configuration of the app, so let me show it now. There is an offset storage configuration for the Debezium Server app. In this case it uses local file storage on my machine: we specify a file, and this file is used to keep track of how far the transaction log tailing has progressed. It remembers those offsets in that file. If you are in a containerized setting or on Kubernetes, you need to make sure this goes to a persistent volume — some persistently mounted storage — because with container-local storage you would lose it when the container crashes. So if I now forcefully quit the application, we should see that we have an offsets file right here — it's this binary file — and this is how Debezium Server determines where to continue when it is started up again.

With that, let me make a couple of changes while we have this downtime: two updates and one insert is what we're running here, three statements in total, which gives us the following table state — one new record and a couple of updates. Now, we don't see any change events right now, because remember, Debezium Server is currently down; I forcefully quit it. So let's just spin it up again and see how it continues to capture from where it left off. And this is it: we see those three events — the two updates and the one insertion, exactly what we see right here. An update changing George to Georgia, an update setting Sally back to Sally, and a new customer being inserted, John Doe.
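For reference, the offset storage lines in that same application.properties are along these lines — the file path is illustrative, and in a containerized setup it should point to persistent storage:

```properties
debezium.source.offset.storage.file.filename=data/offsets.dat
debezium.source.offset.flush.interval.ms=0
```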
That concludes the second demo and brings us to the third one already. Remember, when I introduced Debezium in one of the slides at the beginning of the deck, I mentioned there is a UI — and this is the final part I want to show you. It will bring us back to the Kafka Connect context we were in for the first demo. Let me shut everything down so that we again have a clean environment for demo three — let me stop the Postgres container right here. And once that's done, let me switch over to the Debezium UI part; the tutorial explains it just like the other two.

Here we try out the web application that we can use to get an overview of the running connectors and that helps us spin up new connectors in the Kafka Connect deployment mode — without needing to know the REST API specifics, remember all the endpoints, or manually assemble the JSON configuration. There is a nice UI that helps us, and I think it's particularly helpful when you are new to the project, because it lets you experiment quickly: you can get going more or less immediately, without first learning the specifics of the underlying REST APIs. For that we again need some infrastructure components, and what we offer in the tutorial is a compose file that lets you spin up the whole infrastructure for this demo with a single command — podman-compose or docker compose is what we can use for that. Let me do that here: podman-compose up — I'm using Podman, but again, Docker is of course fine — and let's give it some time to spin up all the containers.
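For reference, and assuming the compose file from the tutorial sits in the current directory, spinning up and checking the environment boils down to something like:

```sh
podman-compose up -d    # or: docker compose up -d
podman ps               # verify that all five containers are up
```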
I think it should be something like five containers; let me verify that real quick. We have MySQL, Kafka, ZooKeeper, Kafka Connect, and the Debezium UI — so yes, that looks good. With that, I should be able to go to my browser and open localhost:8080. Let's give it a moment. This is the Debezium web UI; it talks to the Kafka Connect REST API behind the scenes to understand which connectors are running there. And this is okay: there are currently no connectors running, which makes sense, because we just brought up all the infrastructure from scratch.

Let's create a connector — this is where this nice UI helps you get going quickly. In contrast to earlier, we use MySQL here instead of Postgres, so let's try that. Another nice property is that the UI understands which connector plug-ins are currently loaded in your Kafka Connect infrastructure — there's a specific REST API endpoint to figure out what your Kafka Connect worker nodes are configured with — and we see that those five connectors were loaded while bootstrapping the environment, and we can use them. Now let's say next and enter the configuration properties as we go along: we give it a host name where to reach MySQL, we specify credentials for accessing that MySQL database, we give it an address for reaching Kafka — again, we're back in the Kafka context, meaning the change events are sent to Kafka topics — and we specify a schema history topic name. That's about it for the basic settings; we can validate them to check that everything is correct. We can then say next and specify more: a whole MySQL database server might have many databases with lots of tables, so we can say we are only interested in a specific database and only in some of its tables — the addresses table and the customers table here. So two tables out of the possible six found in the database. We apply that and see that we've narrowed it down to the two tables I just specified. Then we could set up column filters and many more configuration options that are described in great detail in the Debezium documentation, so feel free to check out the documentation if you want to modify change events on the fly with transformations, or route them with certain single message transforms (SMTs) — you can do all of that here.

I am fine with those basic settings for this simple demo, so I can just say review, and what you then see is the automatically assembled JSON configuration of what we just entered in the form wizard. I say finish, and ideally, after some time — let's hope for the best — the UI talks to the REST API behind the scenes with that configuration and spins up a new MySQL source connector. And indeed, we now see that this connector is running. It's also a nice tabular dashboard, if you will, showing which connectors are currently running in your Kafka Connect cluster. Imagine you have tens or dozens of connectors here: you can search and filter, see whether they are running or having problems, pause them, resume them, completely restart them, and look at the tabular or JSON configuration of a specific connector. That's quite handy to have, because Kafka Connect itself doesn't ship any web-based UI, so there is definitely a need for something like this, and the Debezium project provides us with this nice UI.

That essentially concludes the third demo I wanted to show you in this session about Debezium. Let me go back to the slides once more to quickly wrap up, and then I'll look into the chat to address any questions some of you might have. So far we have looked at this basically just on the technical level and tried to understand how to make use of Debezium: how to start capturing changes and how to propagate them, either to Kafka or elsewhere. But the question now is: what type of use cases can you implement on top of that? And the answer is: many. There are lots of possible use cases you can implement using change data capture with Debezium, and there is a nice, short and compact tweet by the former Debezium project lead, and a good friend of mine, Gunnar Morling, summarizing a lot of the typical use cases you might want to address with change data capture.

One of them is simply data replication: you can use this mechanism to replicate data between different types of data stores — maybe you want to replicate data from Postgres to MongoDB. A totally different case: maybe you want to keep caching infrastructure up to date. Say you have Redis or Infinispan; change data capture with Debezium solves two of the fundamental challenges you face with caches, namely the warm-up of the cache and, after that, cache invalidation. You can see the snapshotting phase as a way to directly warm up your cache based on the existing data in a data store, and all the change events that are continuously streamed out of the database can then be used to update and invalidate the cache: if a deletion happens, the item is removed from the cache; if an update happens, the cache gets the latest information — you get the idea. Another use case is using Debezium and change data capture to synchronize data between microservices. Microservices are supposed to have their own backing data stores, and sometimes you need to communicate changes happening in one microservice to another one. There are various ways to do that, and change data capture is one way to address it at the persistence level, among a bunch of others.
So I think once you start to embrace this idea of being able to hook into any changes happening in the database, you'll have lots of ideas in your specific landscape and environment where you can use change data capture and Debezium for.

That concludes the session content; just a few further reading materials. Again, the slides are out there — I shared them already. You can read about how to use change data capture to modernize a stack, and there is an interesting use case in the health-related domain, also featuring some machine learning scenarios, that you may want to read up on right here. And finally, there are tons more resources on Red Hat Developer, which you can find at that link: free e-books and lots more tutorials. All you need is to register an account for free, and then you can consume tons of great Red Hat Developer resources around many of the topics depicted on the e-book covers in that screenshot of the page.

That concludes the session. Thank you a lot for taking the time to tune in today and listen to this stream. I hope it was interesting, and I hope you learned something new about change data capture and this really, really brilliant project called Debezium. I'm now going to take a look at the chat and try to address the questions you might have had during the session — let me do that right now and make sure I don't miss any.

First of all, here's one from Andrea Claro — thanks, Andrea, for joining: what happens when the followed tables are too big to make snapshots on? Let me try to understand the question: you're saying you have very large tables, and it might then take a very long time to do snapshots — and that's true. I can't, however, imagine a scenario where a table is too big to even try making a snapshot; yes, it may take a long time, but regarding snapshots the project has made some great additions. First of all, there is the notion of incremental snapshotting: Debezium is able to make a snapshot incrementally while still being able to incorporate ongoing changes. That is based on a published algorithm — the watermark-based approach described in the DBLog paper — which the project implemented. In the old days of Debezium, this was indeed a problem, because the snapshot needed to complete before any further progress could be made, and if it took a very long time, many hours or maybe even a day or more, or an error occurred somewhere in between, you basically needed to restart the whole snapshot. These days it's not so much of an issue anymore, thanks to those recent advances around incremental snapshotting. There is a blog post and further material about how this works under the covers, but I hope that at least addresses the question. Another thing: you don't have to use a snapshot at all — a snapshot is not always needed. Maybe you say, hey, the use case I have in mind is totally fine without making a snapshot; then you can configure the connector not to do one and simply hook into the changes of a specific table as of right now. Snapshotting then isn't done at all, and also that is something you can simply configure if it fits your use case.
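Skipping the snapshot is, again, just a configuration setting; for the Postgres connector a sketch would be the following — note that the available mode names vary per connector and Debezium version, so check the documentation:

```properties
# stream changes from the current log position only, without an initial snapshot
snapshot.mode=never
```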
Another question — thanks, Andrea — about Debezium's scalability: what do you suggest, is it better to assign each table connection to its own task on a different worker, or is it better to assign all of them to one? That's indeed a great question. The way this is usually done is that one Debezium connector pointing to a single database server runs just one task, mainly because you want to keep the change events that are read and published in order. So the standard setup for most databases Debezium supports is a single task for a single database. What you can do, if you want to capture individual tables in parallel, is spin up multiple connectors, each running one task and each referring to, say, a different schema or catalog in the database. So yes, if you want to scale out and capture in parallel, you do that by running multiple connectors, each with a single task. For the SQL Server connector, if I'm not mistaken, at one point they introduced a way to capture changes using multiple tasks with one and the same connector instance. But it depends a bit on whether the specific Debezium connector for a database supports that; as far as I know, most of them run with a single task, and there are some exceptions where you can run multiple tasks.

Let me address another question: is it possible to attach more than one task to a single table, or would there be log readability problems? Well, that's a good point. You need to be aware of what I mentioned at the beginning: the transaction logs, and the way Debezium can make use of them, vary a bit between the different databases. That said, in Postgres, for instance, there is the notion of replication slots, and as long as you make sure those connectors use individual replication slots, I don't think it should give you trouble to do what you're suggesting in your question, provided I understood it properly. So I think the answer is: it depends, and you need to verify it in the documentation. Honestly, I have never tried capturing the same table from the same database with multiple connectors or tasks myself, so I can't guarantee it will work, but I would assume that for databases where you can configure separate replication slots, as in Postgres, it should work. I'm not 100% sure, but I'm pretty sure the documentation gives you some hints in that direction, and you are always welcome to join the mailing list or reach out with your question directly to one of the engineers or contributors to the project.
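To make that concrete for Postgres: each connector can be given its own replication slot via the slot.name property, so two connectors capturing independently would be configured roughly like this — the slot names are illustrative, and each line belongs to a separate connector configuration:

```properties
# connector A
slot.name=cdc_slot_a
# connector B, reading the log independently of connector A
slot.name=cdc_slot_b
```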
I think that's about it — we are more or less at the top of the hour and running out of time. Thank you very much for tuning in; it was a lot of fun to walk you through this session. I hope it was interesting, that you learned something new and can take something away from it. Again, feel free to reach out to me or to the project maintainers, and visit some of the materials I shared in the slides or that you find on the Debezium blog, which is also a great resource to stay up to date on any recent additions and advances in the project. With that, I thank you again for joining us today. Watch out for upcoming DevNation deep dives — there is at least a couple of things coming up towards the end of the year, still in December: tech talks and maybe even another deep dive, so check that out. Thank you so much again for joining us today. Have a great rest of the day, and all the best with your CDC efforts. Bye.