Yeah. Thank you. It's good to be here. So, yeah, you can think of Simple like this: as far as I'm aware, we are the first bank born in the cloud. The company was founded in 2009, and we have been 100 percent on AWS from the beginning. So I have some bias there; our thinking is totally about how to operate in AWS. Simple is a consumer bank. It's interesting to see what Simple does compared to what we offload. We are not technically ourselves a bank. We're actually owned by BBVA, a large European bank, and we offload a lot of that: the actual customer accounts are held by a separate bank. But as a Simple customer, with all the branding, it looks like Simple is your bank. We're the ones that you interact with in terms of customer support. And it lets us really be the best user experience for banking that we can be without having to worry about all of the details of what a bank needs to do. But we still end up having transaction data in our infrastructure, and we're subject to PCI compliance and a lot of the concerns around that.

So, I work on the data platform at Simple. That group is responsible for both operational and analytical data stores: Postgres, Kafka, and Amazon Redshift, which is where we do a lot of our analytics. I'm hoping that not all of you are experts in logical decoding, and I'm hoping that not all of you are experts in Kafka. This will be a chance for you to learn some basics about what those things are as we go through this.

Before we launch into specifics, I want to give you a broad overview of what I'm going to talk about. We're going to follow the journey of a single transaction as it flows through our system and hits different things, and see some of the use cases for this pipeline as well as what the pipeline itself does.

So, Simple is based in Portland, Oregon. A lot of our engineering team is remote; I personally live in Columbus, Ohio, but I get out to Portland every once in a while. So say you visit Byways Cafe in the lovely Pearl District of Portland. You have a nice breakfast and you pay with your Simple card. When that gets swiped, there's this whole arcane process that it goes through. There's a payment processor involved. It bounces around to a couple of data centers, takes a little visit to Mexico, and comes back to BBVA's AWS account. And we end up hearing about the transaction over a message queue in AWS.

We are a service-oriented architecture, so you'll be seeing a lot of this general pattern of a compute instance that's attached to a Postgres instance. We have dozens of these in our infrastructure. So over that message queue, that transaction finally makes it to our transaction accounting service. We are so excited to have that transaction that we celebrate by writing a row to Postgres. So it's persisted in our infrastructure, fantastic. But it doesn't end there. There's more fun to be had, because we have a cooperating service to this, which we call PG Kafka, that is listening on a logical replication slot in Postgres and getting a representation of that insert that happened in the database. And then it's publishing that to Kafka. All you need to know about Kafka right now is that it's kind of this message-queue-ish data store, and things are organized into units called topics. So it's being published to this topic in Kafka. Another service is consuming from that same topic. This is our demultiplexing service, which reads in those records.
All it does is publish back out to some other Kafka topics, one per table, for the changes to specific tables happening in Postgres. A third service is consuming from one of those output topics. This is our user activity service. When you open up the Simple app, it tells you about your recent transactions, and that's being powered by this user activity service. That service accepts the transaction, does some cleaning on it, and saves it to its own database. That database, that instance of Postgres, is also hooked up to a PG Kafka service that's listening on a logical replication slot, pulling that change over and publishing it to Kafka on a separate topic.

While all that is going on, we also have our analytics pipeline. Consuming from those Kafka topics, it's grouping the changes up by table and batching them to S3, which is Amazon's object store for long-term storage. We get a full audit log there of what's happening in the database. And then we also persist those changes into Amazon Redshift, and that's where most of our analytics happens.

So that's the high-level view. There's a lot of stuff going on. The rest of this talk is going to be drilling down into the bits and pieces of that to focus on what's going on. The very core of this is this change data capture pipeline, which is about how we get those changes out of Postgres and into Kafka. Before I talk about the specifics of logical decoding, the mechanism Postgres has for getting those changes out, I want to talk a little bit about Postgres, what's going on under the hood, and a little bit about Kafka.

Here's a little excerpt about the WAL, the write-ahead log, from the Postgres docs. I'll let you read that for a moment. So the write-ahead log is central to Postgres durability. If something terrible happens to your Postgres instance while it's updating some tables, it's okay. Those tables might be in some weird corrupted state, but when Postgres starts back up, it realizes what happened. It goes back in the WAL, reads through everything that was committed, and it's able to get those tables back to a consistent state. So it's an important thing for the operation of Postgres, and it kind of looks like this. The WAL ends up being these files on disk, and it'll rotate through segments.

But some elements of this: there's a defined order here. There's no ambiguity about the order in which things happened. It's immutable. So if one of these records is about an insert into your transactions table, and you later update that same row, that's not mutating anything in here. Instead, there's just a new record being appended to the end of the log that says, hey, this row got updated. So you could reconstruct the entire state of the database by replaying the WAL from the beginning of time, or by starting from some snapshot and then reading the section of the WAL after that snapshot was taken. And it's durable. This is ending up on disk. It's something that's going to persist through restarts of the application.

So it's ordered, it's immutable, it's durable. It turns out that this concept of a log, or a commit log, is not unique to Postgres. It's not unique to write-ahead logging, which is used by other databases as well. It's just a useful abstraction of a type of data that you can store. And this is also central to Kafka. If you looked at the Kafka documentation about a year ago, what it would have said right at the top is: Kafka, a distributed commit log.
So Kafka is a data store that puts this abstraction of a commit log front and center. In Kafka jargon, an individual commit log is called a topic partition, or more simply just a partition. You interact with it via producer clients and consumer clients. A producer connects to Kafka and sends messages to a particular topic partition, and Kafka will just always append those messages onto the end of the log. Again, this is immutable. Once a record is written, it sticks around. There are offsets that are sequential for all this, so when record 12 gets written, the next record that gets written is going to have offset 13 associated with it.

When you consume from Kafka, there is no concept of querying. In Kafka, you don't say, hey, I want all of the records that have such and such an attribute. Instead, you connect from some certain offset, and then you get all of the records after that. If you want to do filtering, that's your application's problem. It's going to get all of that data back, and you can filter it once you've consumed it. A consumer can attach to the end of a partition and only get the new things that are being written as they get written, or it can decide to attach to the beginning, or to some particular offset in between. And in practice, this is how applications maintain their state. Occasionally, they commit their offsets to Kafka. And then if the application crashes and needs to connect again, Kafka knows the last known state that you read from was offset 13; I'm going to start sending you records beginning with offset 13.

So a topic partition is a single log in Kafka. In practice, you usually don't interact with topic partitions. You interact with a slight abstraction over that, which is called a topic. A topic is a collection of partitions. Again, Kafka is a distributed commit log, so you have a Kafka cluster containing many brokers, and a topic will have partitions spread across those brokers. If you have some new data source you want to write, you say, I want to create a topic with such and such a name, and I want it to have three partitions. Kafka will spread those partitions across the brokers, and then you start writing to it. By default, Kafka will just round-robin those writes to different partitions. But you also have the option of setting a key on your Kafka messages, and Kafka will take that key, hash it, and use that to determine what partition it goes to. So you have an option for control. If you set that key to be the user ID, for example, and you send a record for Alice that ends up on partition zero, you can be guaranteed that every subsequent record you send for user Alice will end up on that same partition, and that can be useful for applications.

So that was a little bit about the WAL inside Postgres, the idea that that is a commit log. We talked about Kafka and basically what Kafka is: it's this distributed commit log. And this change data capture pipeline is about how we get that log of changes out of Postgres and into Kafka. That's a feature in Postgres since 9.4 called logical decoding. So I'll give you two more excerpts from Postgres documentation to read through. One is about logical decoding itself. This is about extracting changes out of Postgres, and there's this bit at the end about putting it into some other format that you can understand besides the binary format that is the WAL. The way that decoding into some other format happens is through an output plugin.
This is the last time you will read Postgres documentation in this talk. So an output plugin is a Postgres extension, something that you install in your Postgres instance. And then when an application connects to a replication slot, it can consume whatever output format you've decided to define in your output plugin. So logical replication at Simple consists of two components. The first is an output plugin written in C; it's a fork of an open source project. The second part is the application that is interacting with the replication slot in Postgres and then producing those messages to Kafka.

I want to give you kind of a hands-on look at what this looks like. I'm going to show you an example of an insert, an update, and a delete. I'll show you what that insert looks like in Postgres and then what the message is that actually gets onto Kafka in our pipeline. So let's imagine a very simple table, transactions. This is happening in the Postgres instance associated with our transaction processing service. The table, called transactions, has these two fields: the transaction ID, which is its primary key, and then a dollar amount.

So we're going to do an insert into this table. Up on the top in blue is what's happening in Postgres. What's in white on the bottom is what ends up in Kafka. So we're inserting a single row into our transactions table, and we get this big verbose blob of JSON. Let me explain briefly what's in there. The first five entries in that are all what I'll call metadata about the context of what was happening in Postgres. The very first thing is LSN, the log sequence number. That is an offset in the WAL as to where this change was stored. That seems kind of superfluous here; it becomes very important later on when we talk about analytics and what we do in systems further down the pipeline. The second thing here is XID, the transaction ID from Postgres. All of the changes that happen within the same transaction will have the same transaction ID. This is an insert into the transactions table in the public schema, and the timestamp is the transaction commit timestamp from the Postgres perspective. Finally, the last three entries there are describing the actual data associated with this row: what are the names of the columns, what are the types of those columns, and what are the values. Again, a pretty verbose format, but this gives us a lot of flexibility down the line.

The next slide is going to be an update. This is very similar to the previous slide. I highlighted in purple the things that are different. So we're updating that row we just inserted, changing the dollar amount associated with it. You can see the LSN advanced, and the XID advanced because this is happening in a separate transaction. This is an update rather than an insert. The timestamp has advanced, and we have the new value listed in values. The other really important thing is that this last entry is new, this identity entry. What is that? This gets to an important concept in logical decoding, which is that every table is assigned a replica identity, which you can change. By default, it is the primary key of the table, and that's what you see here. So that identity is telling you about the previous version of this row, what values are associated with it, and it's just giving you the information about the primary key. That's how you know what you're updating: it's that blob that's in the identity field.
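If you want to poke at logical decoding yourself, here's a minimal sketch in plain SQL. It assumes a Postgres 9.4 or later instance with wal_level set to logical, and it uses the test_decoding plugin that ships with Postgres. Our actual output plugin is a fork of an open source project and emits the JSON you saw on the slides, but the mechanics of creating a slot and pulling changes out of it are the same; the slot name and the values here are made up for illustration.

```sql
-- Requires wal_level = logical and a free slot (max_replication_slots > 0).

-- Create a logical replication slot using the built-in test_decoding plugin.
SELECT * FROM pg_create_logical_replication_slot('pg_kafka_demo', 'test_decoding');

-- Make a change for the slot to pick up (values are illustrative).
INSERT INTO transactions (transaction_id, amount) VALUES (1001, 4.50);

-- Peek at the decoded changes without consuming them.
SELECT lsn, xid, data FROM pg_logical_slot_peek_changes('pg_kafka_demo', NULL, NULL);

-- Consume the changes, advancing the slot the way a client like PG Kafka would.
SELECT lsn, xid, data FROM pg_logical_slot_get_changes('pg_kafka_demo', NULL, NULL);

-- Optional: include all old column values on updates and deletes,
-- instead of just the primary key that the default replica identity gives you.
-- ALTER TABLE transactions REPLICA IDENTITY FULL;

-- Clean up when you're done, so Postgres stops retaining WAL for the slot.
SELECT pg_drop_replication_slot('pg_kafka_demo');
```

In production you wouldn't poll like this; PG Kafka holds a long-lived replication connection to the slot and streams changes out. But the SQL functions are a handy way to see exactly what a given output plugin emits.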
You can change the replica identity to be full, in which case it would tell you all of the values for all of the columns. You can turn it off, or you can set it to be an index, but usually the primary key is the most useful, and that's what we're using. Finally, I'll show you a delete. Again, highlighted in purple are the things that are different. You can see that the columns, types, and values have disappeared completely. There's no new data associated with this; we're deleting data. But we still have that identity field, so we know what the thing is that we're deleting.

And that is the pipeline. So we end up in Kafka having a topic with each insert, update, and delete as a separate message, and that becomes a rich source of information that other systems can use. The rest of this talk will be all about use cases for that data.

The first thing I'm going to touch on is very brief. This gives us a mechanism for doing asynchronous messaging between services. There's nothing particularly special about Kafka in this use case; you could use any message queue here. Simple has historically used RabbitMQ as a message broker between services. But the way that we were using RabbitMQ has some problems, which I will talk about right now. This image comes from a blog post by Martin Kleppmann. I will talk more about this blog post later; it's about Bottled Water, another Postgres logical decoding to Kafka pipeline. But he talks about the dual-write problem, which is that it's a bad thing to have your application do this. And this is what we were doing at Simple a lot: your service has some data that it's processing, it wants to persist that to the database, and at the same time it wants to update some other things. So we would often have code where we open a database transaction, we insert some information, and then the very last thing before we close the database transaction is that we send a message to RabbitMQ so that other services could know about it and take action on it.

It's a mess. There are all sorts of ways that can go wrong. You're relying on your application developer every time to get that right. There are always going to be cases in which you don't close that transaction correctly. You might have a message that ends up in RabbitMQ, and then there's an exception and it doesn't actually get into the database, and you end up with skew between what's in these various data stores and what happens in your message queue. So that's bad. One thing that's great about this logical decoding pipeline is that, using it, our services now just write to the database. They just write to Postgres and their job is done. They don't have to worry about separately notifying other services. They don't have to worry about separately interacting with a message queue. They just write to the database, and now it's the job of a separate application to handle correctly getting those changes out of Postgres and putting them into Kafka. So what's happening here is that some other service is reacting to what happened in that transaction service just by that transaction service writing to the database and then letting the rest of the pipeline handle it. That's powerful, and that puts us in a better spot even without any of the other use cases that I'm talking about here. This is a better situation than our traditional talking to message queues.

Next I'm going to focus on analytics. This is where I have the most expertise and where I've spent the most time.
I have spent an awful lot of time dealing with Redshift and getting it to where it is today in terms of how Simple uses it. A couple of notes first: there's a lot of value we get just from putting stuff into S3 here. I want to remind you that this change data capture pipeline is a full audit of what's going on in database tables, and in many cases what's coming out of it is information that may no longer exist in Postgres. If you have tables that you're mutating, doing updates and deletes, Postgres is no longer the source of truth for that. The WAL eventually rolls off, and you'd have to go back to some snapshot or whatever to get at that data. It's incredibly freeing to know that the whole history of changes that have happened in the database ends up in S3, and we can use that as the source of truth for what happened in the database.

We used to often have a cycle where risk and compliance would realize, hey, to comply with whatever rule, we actually need to keep a whole history of what's happening in this particular database table. We'd have to make a request to back-end developers, they'd have to add a trigger and some history table, and now we're bloating our Postgres instance with information that the service is never going to use. Not a great situation. This frees us from that completely. We have this pipeline set up so that we can get those database changes into S3. There are also a lot of tools these days that can talk to S3 directly and do analytics. If you've heard of Presto, Presto is a way of putting a SQL interface on top of files like this, and it can interact with S3. Amazon has actually wrapped that now in a product they call Athena, so there's nothing for you to manage. S3 is just this object store that is infinitely scalable; you just throw information there and forget about it. If you want to do analytics on it at some point, you can go to the AWS console and run some queries on it. So that's cool. We're only kind of dipping our toes into that. Largely we're just archiving that data off, and it feels good to know that we have some place we can go back to and see that audit history.

All the action happens in Redshift. So we take all of this information, batch it up, send it to S3, and we also put it into Redshift. I'm going to spend a good chunk of time here talking about what Redshift even is and what this PG Kafka pipeline data looks like in Redshift. So first of all, what is Redshift? It bills itself as a data warehouse. It shares history with Postgres. The history as I understand it is that there was a company called ParAccel. They forked Postgres 8.0.2, did some ungodly things to it, and turned it into a columnar, distributed data store. But it still talks like Postgres. Then ParAccel was bought by Amazon, rebranded as Redshift, and integrated with the rest of Amazon Web Services. So you can connect to Redshift using psql or Postgres drivers, which is kind of convenient if you're already used to Postgres. You're connecting to a leader node and you see all these tables. But the evil truth is that all of the data in those tables is actually distributed across all of these compute nodes.

A very similar concept is involved to what I talked about in Kafka. There you have this log that's distributed across all these partitions, and you can set a key on the message that determines where data goes. You get that same opportunity in Redshift. When you create a table in Redshift, you choose a column to be the distribution key.
And so again, if you choose that column to be the user ID, you're guaranteed that every record for Alice is going to end up on the same compute node, and then you can take advantage of that data locality. When you're doing joins between tables, hopefully they're distributed on the same key, and if that's true, then joins happen locally on the compute node. And then you're generally doing these big aggregate queries. Each node can do its own thing and then just push up results to the leader node, and the leader node just has to stitch those together. So Redshift is distributed. It's also columnar: instead of each row being stored together, each row gets split up so that columns are stored together, and they're compressed. And generally you're doing these huge aggregate queries where you have to pull a bunch of information, but typically you may only be looking at two or three columns out of some big table with 50 columns. So those are ways that it's operationally very different from Postgres and optimized for doing aggregates, for looking over huge amounts of data. Just for context, at Simple we have 48 nodes in our Redshift cluster, and it's about five terabytes of data. Redshift can scale up to petabytes if you have need of such a thing.

So let's talk about what it looks like when we replicate our PG Kafka data into Redshift. First of all, this is a relational database system, so you need to have a table to put information into. This is what the table in Redshift associated with our back-end transactions table in Postgres looks like; there's a rough SQL sketch of it below. Of the columns here, the first four are Postgres metadata. And this is a chance for you to earn extra credit by being a very attentive listener: I just said there are four metadata columns here, but if you remember, in that JSON format there were five metadata fields. Where did that fifth metadata field go? It's the name of the table, which is actually encoded in the name of the table in Redshift. The name of the table here is pg_kafka, underscore, name of the source service, underscore, name of the source table. So we have our metadata fields. Then we have an ingestion timestamp that we add: the service that's pulling the stuff out of Kafka and batching it up adds its own timestamp, which is useful for debugging. And then we finally have those two columns that are actually the data from the service that we care about.

You'll notice some embellishments here compared to what a create table in Postgres would look like. At the bottom you see the dist key and the sort key. The dist key is the distribution key I talked about. We're saying here that all of the rows, all of the updates, for a given transaction ID will end up on the same compute node. And then the sort key is kind of a compromise in place of indexes. You don't get indexes in Redshift; instead, you get to choose one sort key, which determines how the data is stored on disk.

I just want to harp again on the fact that this is a full audit log of what happened in that table. That opens up a whole category of analytics use cases that you wouldn't be able to do just based on the current state. There are things you can do with this that you wouldn't be able to do if you logged into Postgres itself and looked at the back-end data, because you'd be missing all of those changes. Okay, I promised you that there was a reason we have all this metadata. And the LSN in particular: it is a unique ID for every change that happens, which means we can deduplicate based on it.
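To make that concrete, here's a rough sketch of what a landing table like this, and the current-state view described in a moment, can look like in Redshift. The column names, types, and the sort key choice are illustrative rather than Simple's exact schema; the real table mirrors the metadata fields from the JSON plus the service's own columns.

```sql
-- Illustrative landing table for changes to the back-end transactions table.
-- Naming convention from the talk: pg_kafka_<source service>_<source table>.
CREATE TABLE pg_kafka_transaction_service_transactions (
    pg_lsn          BIGINT        NOT NULL,  -- WAL log sequence number, unique per change
    pg_xid          BIGINT,                  -- Postgres transaction id
    pg_operation    VARCHAR(8),              -- insert / update / delete
    pg_committed_at TIMESTAMP,               -- commit timestamp from Postgres
    ingested_at     TIMESTAMP,               -- added by the Kafka-to-Redshift loader
    transaction_id  BIGINT,                  -- the service's own data starts here
    amount          DECIMAL(12,2)
)
DISTSTYLE KEY
DISTKEY (transaction_id)    -- every change for a given transaction lands on the same node
SORTKEY (pg_committed_at);  -- the one sort key you get in place of indexes

-- A "current state" view over the audit log: keep only the newest change per
-- primary key and drop rows whose latest change was a delete. The nightly
-- dedup job uses the same window-function idea, partitioning by pg_lsn instead.
CREATE VIEW transactions_current AS
SELECT transaction_id, amount
FROM (
    SELECT *,
           ROW_NUMBER() OVER (PARTITION BY transaction_id
                              ORDER BY pg_lsn DESC) AS rn
    FROM pg_kafka_transaction_service_transactions
) ranked
WHERE rn = 1
  AND pg_operation <> 'delete';
```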
So for this pipeline, we can't guarantee exactly-once delivery. In general, we have at-least-once delivery, and duplicates do crop up in the system. By the time data gets to Redshift, it might have been written several times, and it's nice to be able to get rid of that. So we have a nightly process that partitions by this log sequence number and is able to get rid of duplicate rows based on that. Similarly, many analyses don't care about this whole change history; they just want to know the current state. And we're able to create a current view on top of this audit log, again by doing a window function partitioning by, in this case, the primary key, which is the transaction ID. Then we only keep the most recent row for a given transaction ID and get rid of deletes, and this view tells you the current state of that table in the back-end.

So that was what things look like in Redshift. We do a lot of analyses based on that. I want to note that this is a very raw-data kind of thing. In order to do analytics in Redshift with this system, you need to know all the terrible details of what the schemas look like in our back-end Postgres databases. Those schemas are optimized for the operation of the services. They are not optimized for analytics and understanding what's going on. So there's a lot of duct tape that you end up having to put over this to make it actually accessible for analytics. We run a bunch of nightly jobs where we create tables that are derived data based on what's coming in through PG Kafka. But we're doing loads throughout the day. In general, there's about a 15-minute latency between something happening in Postgres and that record getting into Redshift during the day.

All right. The very last topic I want to talk about is stream processing. We're running low on time, so I'll try to keep this brief. If you don't know much about Kafka, but you've heard about Kafka and you're like, what is this thing, why do people care about it, it sounds like it's just a message queue: this is where Kafka really distinguishes itself from traditional message queues. A couple of things make Kafka different. First of all, there's the ordering. With things like RabbitMQ that use the AMQP protocol, there is no guarantee about the order you get messages back in. Kafka is able to guarantee ordering in this abstraction of topics, this commit log thing. That's important. Also, traditional message queues do acknowledgments per message. That requires a lot of coordination between the clients and the server, and it limits throughput. Kafka takes that idea of acknowledging messages and just uses the concept of the offset, where you are in the log, as a much lighter-weight version of doing acknowledgments. That's a big piece of what lets Kafka scale to much higher throughput, larger data, and more brokers than a traditional message queue would be able to do. It has some operational characteristics that are nice, and there's an additional feature which is cool and which I will try to motivate right here.

Bottled Water is this other logical replication to Kafka pipeline. Martin Kleppmann wrote about it; he was funded by Confluent. Confluent are the main maintainers of Kafka, kind of a company built by the original writers of Kafka. That blog post is great and tells you a lot more about this; I encourage you to read it. It's very similar to our pipeline. In Bottled Water, this is what things look like.
It's a simpler message format, and in particular it takes advantage of this idea of a key on Kafka messages to partition stuff. The key here is just representing the primary key of the table. Here's an insert; an update has that same primary key as the message key; and then a delete is again that same primary key, and the value is null.

Getting back to the idea of a topic partition, I didn't really talk about retention. By default, Kafka topics have seven days of retention; old records will roll off after those seven days. You can configure that per topic, and you can also configure compaction rather than a time-based retention policy. Compaction is a process where a topic is only required to keep the most recent message associated with a given key. In this example, key K1 occurs at offsets 0, 2, and 3. After the compaction process runs, it throws out offsets 0 and 2 because it knows that it has a newer message with that same key. This is perfect for modeling a mutable database table: the entire current state of a table, in a Kafka topic. This is something that you definitely cannot do in a traditional message queue. Kafka is a durable data store. In this case, when you have log compaction turned on for a topic, it sticks around forever; it's only cleaning up old rows. If you have some message with a key and no newer message comes in with that same key, that message will never get evicted from Kafka. That actually allows you to do a lot of cool stream processing things. A compacted topic is really modeling a table, so you can do joins between streams and tables. Kafka now has a library called Kafka Streams, which is kind of an abstraction over its consumers and producers, that gives you some nice semantics for doing those sorts of joins between tables and streams, all modeled as Kafka topics. This is something we're just scratching the surface of. There's a ton that you can do with this. Stream processing is a huge topic in and of itself.

Some closing thoughts. Things are not all sunshine and rainbows with any piece of technology, and there are a couple of lingering issues we have. One is bursts of activity due to large transaction commits. Logical replication does the right thing in that it won't emit records until a transaction actually commits and that data is in Postgres. That means if your application is poorly behaved and keeps a transaction open for several hours and does millions of records in that time, you're going to get this huge flood of information into your replication slot when that commits. We weren't anticipating that; we've had to adapt to it. Another thing is the performance of our current views in Redshift. As tables have gotten very large, this doesn't perform well at all, because it has to scan through the entire table to figure out what the current state for a given row is. We may be revisiting that: instead, when we're loading data in, we may materialize the current state of tables so that those are available for analysis and are more performant. Finally, schema management and propagation. Redshift has a schema, a rigid schema, and it's up to you to make sure that stays in sync with what's happening in Postgres. We initially didn't really have a pipeline to do that in an automated way. We were losing data; we were having issues. We built a pipeline to try to grab those schema changes as they happen. It's not perfect; there are some issues with that. It would be great if you think about this before you implement the pipeline.
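On that last point, here's one low-tech way to keep an eye on schema drift. This is just a sketch of the idea, not the schema pipeline we actually built: since both the source databases and Redshift speak the Postgres dialect, you can dump column definitions from information_schema on each side and diff them, ignoring the metadata and ingestion columns on the Redshift side. The schema and table names are illustrative.

```sql
-- Run against the source Postgres database, then (with names adjusted) against
-- Redshift, and compare the output.
SELECT column_name, data_type, is_nullable
FROM information_schema.columns
WHERE table_schema = 'public'
  AND table_name   = 'transactions'
ORDER BY ordinal_position;
```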
Some future enhancements, just quickly. We would love to be able to verify end-to-end completeness, that what we get in Redshift exactly matches what is in Postgres. We don't have a full view of that right now. It's great that the log sequence number is a unique identifier and we can deduplicate based on it, but we don't have a way of detecting that some record was lost in that pipeline. In general, we're just relying on the guarantees we think we have, at-least-once delivery everywhere in this pipeline, but we can't verify that at this point. Then we're looking at rewriting PG Kafka, which is currently in C, in Scala. Why would you do such a thing? Most of our infrastructure is written in Scala, so it's been hard to maintain this weird one-off project that's written in C, and this has been enabled because, in the newest versions of the Postgres driver, there's actually an API for the replication protocol.

Thank you for listening to me rant. A couple of minutes for questions. Before I take questions, I just want to mention that if you look on Twitter, my most recent tweet is a copy of these slides, and we also have a blog post that I was hoping to have published before today. It is not published yet, but I linked to a draft of the blog post, which gives you the same content in words and will hopefully be a better reference than disembodied slides. Questions?

Regarding the retention, have you yet had a situation where one of the services fell too far behind and you had to recover, or where you wanted to spin up a new service that needed data older than what was in one of the topics or the WAL? So we have not had a situation where something fell so far behind that we had already gotten rid of stuff. That seven-day retention period has been perfectly sufficient. We don't have a great story, particularly for analytics. I want to have a better story for doing a batch job that is able to stitch together what's in S3, that persisted whole history of what's happened in the log, with what's currently in Kafka, and get you up to that same point. We haven't really tackled that problem yet. We don't have audit history until we actually turn on this pipeline and start persisting that to S3.

Question? On the left side of your slide there, what happens if that PG Kafka service falls behind or goes down? Postgres is very kind to you in that situation. Postgres will hold on to WAL segments until all replication clients have consumed them. Postgres is keeping track of: hey, I know this client is connected to this slot, and it hasn't read this stuff yet. It will make sure to hold on to WAL segments until that application comes back online. What was that? For better or worse. You can end up in a situation where you're filling up disk because of that. If that PG Kafka goes down, we just aren't going to have records flowing into Kafka until that service gets restored. We do monitoring around whether things are falling behind.

Are you able to replicate these groupings or units? Which part of it? The transaction service and its transaction Postgres, setting up a whole other one in case it goes down. If we needed higher throughput for getting data into Kafka, you're saying. We're a long way from hitting that limit. So it is just a single instance that's pulling stuff off of Postgres, off of that replication slot. I actually don't know what it looks like if you can have a replication slot going to multiple consumers in order to improve throughput, if that's the question you're asking.
No, it would be that you would add another transaction service and double the cost when you had that. We aren't set up to do that at this point, but it's something that we could do.

It sort of relates to that question: how would you handle the failure of the Kafka cluster? Several things to note about Kafka. Within your cluster, Kafka has a concept of replication. But if the entire cluster goes away, we don't have a great HA story for that. You can mirror Kafka clusters, like to some other data center or whatever. We aren't doing that at this point. So if Kafka went away completely, if we lost that entire data center, we don't really have a story. We would have to wait until that data center comes back online, or, if we are taking backups of what's in Kafka, we could potentially...

Are you using RDS for Postgres? We are not at this point. That is something we are investigating this year. This is Postgres on EC2. Right now we manage that platform. We would love to not manage that platform.

No, we don't. That's basically manual. How do you know that? We do not have guarantees around that. We are very early in the stages of relying on that, except for the fact that that same data is getting into Redshift. People are doing analysis on it all day, and when there are discrepancies, people complain. So we do kind of have a good pulse on the general reliability of it. When we have missing data in Redshift, there are pretty explainable reasons, and it's usually pretty late in the pipeline that those things come up. We've had very good reliability with that central pipeline of getting stuff into Kafka.

So it can't direct... Sorry, you're saying, can you consume directly from WAL segments? No, I wouldn't try to go there. You could potentially write some application that reads WAL files in their original format, but that's... Logical decoding is handling that, getting it into a format that's readable.

One last question? Alright, one last question. Why does it show a customers table? It doesn't go anywhere. Yeah, so this is something... We don't actually... This is for the sake of argument here. I put that up there primarily for talking about a possible use of stream processing. So this is like a heavily normalized schema in the original database. We could use stream processing to denormalize this. The customers table here could be a compacted topic, so we have the full current state of all of our customers in there. The transactions table is a stream. And then you could do a join between these two to create some output topic, a stream of transactions with all the customer information attached. And that might be useful. You might want to ingest that into Redshift so that you don't have to do the joins in Redshift; those are already done in the stream processing layer. That's the kind of thing that you can do. Thank you everyone. Thank you very much.