All right, good afternoon. Who wants to start early and finish early? This is the last stretch of the conference, right? Good. So we're going to move pretty fast. We'll start with the introduction. I'm Henri Venenburg. This is a talk with Charles Schwab. I was actually with Charles Schwab until a couple of weeks ago. Jago asked me, hey, we need to submit this paper, do you want to co-present? I said sure. Then I switched companies and now I work for Pivotal, but it's still about Schwab. Jago, do you want to introduce yourself?

Yeah, he keeps calling me Jago. For the last 20 years of my career everybody has called me Jago, but it is Jason Go. I'm a technical director at Charles Schwab, in one of their larger organizations called CTS, Core Technology Solutions. I primarily function as a solutions architect within our client solutions reporting technology, mostly focused on our Schwab Charitable platform.

Great. So thanks for coming. I know it's toward the end, so we'll try to make it interesting. Quick show of hands: who has seen distributed systems talks around the conference this week? Yeah, there's Kubernetes here, Cloud Foundry, CF Summit. Anybody see a talk about data? A little bit, right? So we've talked about the easy stuff; let's talk a little bit about the hard stuff. Because this part is easy: cf push, you've got a new application, easy, scaled up.

But before we start, here's some inspiration. If you haven't looked at these books, I highly recommend them. The one I'll point out specifically is Designing Data-Intensive Applications. If you really want to know the fundamentals of distributed systems, specifically around data, read this book. It's fantastic; it lays everything out very well.

So what we're going to talk about today is specifically the data side of distributed systems. I love distributed systems, I've got a background in them, but we always forget about the data side. So we're going to talk about a number of principles and concerns you have to address. We're not going to give you prescribed solutions here, partly because we submitted this for an hour and got a half hour, so I didn't want to compress all of it. But I'm going to give you a set of things you need to consider when you design these systems. Don't just walk into it thinking, hey, I want to create a distributed system, active-active, it's all cool. There are things you need to make sure you handle in your system.

So let's get started. First, definitions, at a very high level. We've all seen centralized systems, right? They tend to have a single database; everything is together. We might even have some microservices around it, but still with one database. They tend to be optimized primarily for central management and central control. Everything is close together, or, classically, because of the cost of storage, we wanted to keep the data together with close affinity. On the other side, we want to move to a decentralized model. Now, why do we want to move to a decentralized model? Well, we've all talked about scaling things up and creating more resiliency. Those things are all good. But you also have to decentralize the data side of the house, and so we'll be talking about some of those properties.
But it's really about separation of concerns, because those concerns are the ones that address consistency and scalability. Now, a couple of key points to keep in mind as we go through this. Latency, the speed of light, is still a problem. It doesn't matter how you spin it; you have multiple data centers to deal with. In the centralized model it was actually fairly easy to deal with: you really only had to worry about the latency between the consumer of the data and the data store itself. In a distributed model we're pivoting where the latency sits, because we now need replication, synchronization, those kinds of constructs. The latency now lives in a different place. That's why we have things like eventual consistency. The other part is that we're partitioning the data more as well, so we need to treat that as a concern. And another key point, which we'll come back to, is that the read and write patterns, the access patterns, are different across your systems.

So let me give you a couple of specific concerns you're going to have to address. First and foremost, your access patterns are different. If you have one system with a single access pattern, I'd love to talk to you about it, but there aren't many. There are edge cases, obviously. For example, in the trading world your read and write concerns are different: you might be doing a lot more reading than writing. Or reference-data systems, which are mostly reads. But in general, in a distributed system, your read and write access patterns are different, and the transaction volumes you're pushing through are different.

The next concern is the functional aspect. Functional here spans multiple dimensions: you have single-stack deployments, from the UI all the way down, or horizontals where you split out the data, the logic, and the UI. However you spin it, you want some functional decomposition. And the reason this is a concern is complexity: if a system has to deal with multiple functions all at the same time, it becomes a brittle system.

The next one is storage. We can now do more polyglot storage, to optimize for the problem you're trying to solve. If you're building basic tabular systems, you might still use relational; if you need social-network or fraud-detection capabilities, you might use a graph storage engine. So that's another trade-off you have to make.

State is becoming more and more important as well, not just the state inside your systems but also how you persist it. Classically we've done tables and updates, but more and more we're moving toward true event-store mechanisms, where you have ledgers and you capture the state transitions. You're turning your database upside down: instead of change data capture being an afterthought, it becomes a first-class citizen. Think of Kafka and the event stores of the world.
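To make that tables-versus-ledgers distinction concrete, here is a minimal Python sketch; the names (BalanceChanged, current_balance, and so on) are purely illustrative and not from the talk or any real system. Update-in-place keeps only the latest value, while the ledger appends state transitions and derives current state as a projection over them.

```python
from dataclasses import dataclass
import time

# Classic update-in-place: the table row only ever holds the latest value.
account_balance = {"acct-42": 150.0}
account_balance["acct-42"] = 125.0          # the previous state is simply gone

# Ledger / event-store style: persist the state transitions themselves.
@dataclass
class BalanceChanged:                        # illustrative domain event
    account_id: str
    delta: float
    at: float

ledger: list[BalanceChanged] = []            # append-only; entries are never updated
ledger.append(BalanceChanged("acct-42", +150.00, time.time()))
ledger.append(BalanceChanged("acct-42", -25.00, time.time()))

def current_balance(account_id: str) -> float:
    """Current state is just a projection (a fold) over the transitions."""
    return sum(e.delta for e in ledger if e.account_id == account_id)

print(current_balance("acct-42"))            # 125.0, and the full history is retained
```

The trade is explicit in the sketch: the ledger keeps every transition available for replay and auditing, at the cost of computing or maintaining projections for reads.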
Another key concern you have to address is scaling. Different parts of your system scale differently. For example, in the trading world, during market hours your order-entry side might scale differently than your fulfillment side. Those are different dimensions going on in your system. Last but not least is fault resiliency. There are some great patterns in this area; everybody has heard of circuit breakers, bulkheads, fill in the blanks. But how do you make your system more resilient? Those are the key concerns we'll be talking about.

But before we go there, you have to understand how we got here. What did we do in the past to get our systems to address these specific concerns? That's what led us to certain architectures as well. So with that, Jason is going to go through the history.

All right. So when we think about the decentralized system Henri was showing, and the separation of concerns it achieves, it might be helpful to understand that this is not something that happens overnight, as Henri alluded to. It is a journey to get there, and when we finally get there, it may turn out that the journey isn't actually over just yet.

We're going to start this journey with our monolith. It's a system where any or all of your architectural layers are encapsulated within one single component or tier. Here we have it represented as just one large rock. In fact, the word monolith comes from the Greek monolithos, where monos means single or one and lithos means stone. Now, to tell the story a little better, let's imagine this system manages food items. We have red apples, yellow bananas, and some kind of leafy green. Generally speaking, we can now start to see some of those layers I was talking about: at the top we have a presentation layer, the business layer right in the middle, and then a data layer.

But why would we build a system that looks like this? Well, it's simple, right? Low complexity, not a lot of moving parts. All those layers I was talking about are encapsulated in one single tier, and it's really easy to build.

But then we got bored. We took our axe and started breaking down this monolith, and we ended up with two tiers. Now, technically this could be a distributed system, but take a look at this client tier. It's so fat and loaded and thick: we have the presentation layer in there and also our business rules. We didn't really like that, so over time we started to transition to thinner clients.

So what would have caused us to build a system that looked like this? Well, the network happened. I still remember, back around '93 I think, I was shopping around with my dad. I was probably 14 or 15. We bought some Gateway computers from a garage sale. I had one in my room and one downstairs, and I still remember running a Cat 5 cable from one to the other. In fact, it had to be a crossover cable, if you remember those. So that's one reason. Another reason is the pricing on those machines, which had come way down around that time. Do you remember owning Pentiums, or even Celerons? I still remember owning one of those machines. Dating us? Ha ha, a little bit. Comparatively, though, if you look at the database side, those things ran on much more powerful hardware, and it stands to reason why the business rules actually started to migrate down into those database tiers.
Just to avoid confusion, where I have the cylinders down there, that's a logical representation of the persistence. It's not to suggest those are individual databases; the whole rock is the database.

But then we said, you know what, that two-tier system, we can do better than that. Let's do three tiers. We decided that, as a principle, it was much better to separate those different layers. So we invented this middle tier, took all those business rules, and plopped them right on top of there. Here we're still sharing the same storage engine, which is a little less bloated because it doesn't have all those business rules anymore.

But why would we do this? Well, again, it's the network, but this time it got better. We got faster networks, higher capacities, lower latencies, and all of a sudden it was okay to slap an HTTP interface on top of those business rules, or maybe sometimes TCP-based ones. That was becoming the norm around that time. We can also start to identify some of those concerns Henri was talking about, in the top right there. It was very common, if you had a system that looked like this back then, that you had a very stateful system. It could have been session state management, it could have been some kind of affinity to your servers, whatever; you had this stateful system. And with that, you lose some of those resiliency patterns Henri had mentioned. But we did achieve something here: a little more functional alignment. Not quite there yet, but as you can see, we separated all those different layers in this architecture. So raise your hand if you've seen or built something that looks like this. Three-tier systems, right? Everybody's built them. They were cool. Ha ha.

So again, we love our axe. We love breaking things down. This time we decomposed our system around that middle tier. We took all those business rules for our domain and started partitioning them. That made us feel happy; that made us feel pretty cool. Now that we have separated our individual fruits into these different components, we gained a little better functional alignment. And now, for the first time, if we experience any kind of load or pressure on one of those domains, say the apple, we can just scale that thing up.

So what would drive us to design a system that looked like this? Well, around that time, thought leaders like Eric Evans suggested that building a system using domain-driven design is one way to solve a big problem. In other words, if you take a big problem and decompose it into smaller ones, that's a good way to solve it. So, has anyone built a system that looked like this one, with domain services? There's also an M word for it that people love. Microservices? Ha ha.

I'm guessing we're all starting to see a theme, right? We keep breaking these things down. This time we took our axe and went to town on that middle tier. But we didn't stop there. We also took that bottom tier and started slicing it up as well. So we can start looking at more concerns there in the upper right. With this system we still have the same properties as the last one, functional alignment and scalability, but now we've gained better fault resiliency as well.
The blast radius is a little smaller if one of those components or nodes goes down. We also feel pretty cool about the polyglot aspect of our database tier, because now we can make decisions. If we decide our apple would be better represented in a relational database, we can do that. Bananas could be object databases or document databases, and our leafy green could be a graph if we wanted. And here we're starting to see evidence of a CQRS-based system as well. As we design a system that looks like this, we begin to see patterns and differences in how we do reads and writes, specifically in the behavior under load and in the shape of the data that comes out. For example, we might see that we have more reads than writes, and that the way we persist the data is different from how we read it out.

So again, we break up our system. We just love that axe. This time we broke up that database tier, that bottom tier, into many more separate storage engines. Looking at this very complicated rock structure over here, we now see evidence of queries and commands and ledgers, a.k.a. event sourcing for the bigger buzzword, and projections of that state. And we may have actually solved some of those previous concerns we were talking about earlier, namely the read and write concerns and the storage. But as we've progressed through this journey and all the different distributed systems we've seen, one thing we've noticed is that we always come up with new concerns every time we try to solve problems. The concern here is consistency, something Henri alluded to earlier. So we've reached the end of our journey to the decentralized model. But have we really? It just continues. We've seen this pattern where every time we solve problems, we introduce new ones. Acknowledging the fact that there are always going to be concerns may be just as important as the journey itself. So with that, I'm going to turn it back to Henri.

Great, thanks, Jason. So as you can see, the journey has been great, right? You took the rock, you chiseled away, you got some nice horizontals, and you addressed certain concerns. Then we reached the limit of chipping away on that axis, and we pivoted, no pun intended, to more vertical changes, chipping away at the rock. And then we say, great, we've now addressed all the concerns and we have this beautiful micro-lith architecture. The challenge is, we've actually reintroduced one of the biggest problems: consistency. We haven't solved consistency. Because fundamentally, and we've done this in the data world for a long time, you have to address consistency, classically with XA transactions and two-phase commit, but we want to move away from that because it creates scaling issues. Which is interesting, because if you start thinking about replication, or creating consistency between the different polyglot stores, you still have to address consistency, meaning that at some point everything has to see the data. We solve this by saying it's eventually consistent. But we've seen that a lot of projects fail to accomplish this. And why do they fail? Because, from a development or engineering perspective, people don't realize what's going on or what they have to do to compensate in this area.
At this point, from a failure perspective, hopefully you remember these. These are the eight fallacies of distributed computing. You should never forget them. I've actually got a plaque hanging up to remind people. It's beautiful to decompose things, but never forget these, because they're always going to be true.

So let's walk through an example of where you can address this. Jason kind of alluded to two key patterns, CQRS and event sourcing. Anybody familiar with these? Anybody implemented them successfully, and everybody understood how they work? Too shy. So what I'm going to do is highlight a couple of things here that you need to be aware of. Fundamentally it uses an event sourcing pattern, so you're trying, similar to databases, to replicate data, like change data capture, but now you're raising it to a higher level in your architecture. You're going from the fundamentals to a higher level of creating actual events, domain events. That's fantastic, right? You've got aggregates coming in, I've separated the writes from the reads, I make events out of these, and then I project them somewhere else. Simple enough, right? The problem is it's a complex system: if you have n items to deal with, the complexity goes up, and if you also factor in the number of interactions between them, the complexity goes up again. So we've actually created a more complex system. And I love architecture, because it's easy: you draw boxes and you walk away. But it's the instantiations of this that matter.

So take a look at the left. Take a simple example: placing orders online. Orders are coming in; you might store them, very easily, in a transactional or ledger-type mechanism, no problem. But now you need to create events as well that then move over somewhere else. So you have n events that can then formulate an order status: where is it at, has it been filled in a marketplace or not? And you now need to create n projections. That sounds easy again, but we've forgotten the fundamentals of latency, which I'm going to talk about shortly. So what do you have to keep in mind to make sure your system stays consistent through all the different projections you have?

Let's break it down. Think about consistency this way: we're trying to keep the left side and the right side consistent with each other. What do I need to address to make that happen? The first thing you have to recognize is the lag, the delta-t: how long it takes to replicate the data. We've all known this in databases, but you now have to deal with the same thing in your system. If you use RabbitMQ, or any other queuing system, there's a lag that happens. You need to factor it in, because the developers writing the right-hand side will say, hey, I never see the data, or there's a slight delay that creates a weird user experience. Now, if you're Amazon and you look at a shopping cart, fine, it shows up in the shopping cart eventually. If you're a trading system and you don't see the order, that's a little different. You actually get calls, because you're literally taking people's money. So you need to be aware, as a developer on the right-hand side, that this lag is happening.
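To make that delta-t concrete, here is a minimal Python sketch of the write-then-project split; the names (OrderPlaced, run_projector, the poll interval) are purely illustrative and not Schwab's implementation. The write side only appends an event; a separate projector folds events into the read model on its own schedule, so a read issued right after the write can still be stale.

```python
import threading
import time
from dataclasses import dataclass, field

@dataclass
class OrderPlaced:                     # illustrative domain event
    order_id: str
    symbol: str
    amount: float
    placed_at: float = field(default_factory=time.time)

event_store: list[OrderPlaced] = []    # write side: append-only ledger of events
order_status: dict[str, str] = {}      # read side: projected order-status view

def place_order(order_id: str, symbol: str, amount: float) -> None:
    """Write side only records the state transition; it never touches the read model."""
    event_store.append(OrderPlaced(order_id, symbol, amount))

def run_projector(poll_interval: float = 0.5) -> None:
    """Read side folds new events into the projection on its own schedule.
    The poll interval stands in for the replication lag (the delta-t above):
    a read issued right after place_order() may not see the order yet."""
    applied = 0
    while True:
        for event in event_store[applied:]:
            order_status[event.order_id] = "ACCEPTED"
            applied += 1
        time.sleep(poll_interval)

threading.Thread(target=run_projector, daemon=True).start()
place_order("ord-1", "BRK.A", 1000.0)
print(order_status.get("ord-1", "NOT VISIBLE YET"))   # very likely still stale here
time.sleep(1.0)
print(order_status.get("ord-1"))                      # eventually consistent: ACCEPTED
```

The second read succeeds only because the caller waited out the lag; that waiting, or compensating for not waiting, is exactly the application-level work described next.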
Now, there are ways you can compensate for this. Either you make users aware that it's eventually consistent, which is the easy one, or you build in mechanisms that ensure the status you're reflecting is not that critical; and if you need very high consistency, you don't allow the reads to happen until the projection has caught up. There are different patterns for addressing these things. But this is why a lot of projects fail: this particular complexity gets forgotten.

Another piece to keep in mind is the throughput, or bandwidth: the number of transactions you're pushing through. Do you think that by going to event sourcing we get fewer events to process? We get more, right? So now you need components that can deal with this bandwidth of events coming down the pipe. With function as a service, yes, you can be idempotent, and that helps you scale things up in this area. That sounds good, right? You think, hey, just throw more at it, let the auto-scaler kick in and pick up more. But wait a minute, what's the other thing we have to keep in mind with consistency? Sequential ordering. Darn, I thought I'd just solved the problem. Ordering is one of the hardest problems in distributed systems. You need to make sure that one thing comes before another. Now, do we have systems in the world that can do this? Well, yeah. Think about TCP/IP: it solves this problem. There are buffering capabilities, there's partitioning. So there are patterns in this area you can rely on.

The other thing I want to say about this: when you start creating events, another pattern I've seen very often is, you know what, I'll do the write and at the same time I'll just send out an event, and it's all going to be hunky-dory. Well, then failures happen in between. You might have done the write and never emitted the event. This is where you want to use systems like Kafka as an event store, where the events become first-class citizens, so that on your write path you don't have to be too concerned with the event sourcing part of the house. It's all about creating simpler frameworks in this area.

So I've listed out a couple of key enablers to walk through. You have to think about enablers like single consumers. I talked about sequential ordering; you want a single consumer over there. You can still split things up, you might partition by account or whatever it is, but you need to design for this in your architecture. A durable event store, as I mentioned. Data partitioning is another trick in the toolbox: you can avoid a lot of problems by just partitioning your data, so this region only deals with these clients and that region only with those clients; then you can avoid some of these constructs. Idempotency keys, anybody heard of these before? Okay, they're a great trick in the toolbox, because a lot of the time failures show up as timeouts, and what's the first thing a developer does after hitting a timeout? Retry. So if I create an order to buy $1,000 of Berkshire Hathaway and it fails and I place another one, what's going to happen? I get an order for $2,000 now. With a stock like that, that's a lot of money. So idempotency keys allow you to indicate that you've seen a request before. And you use the same trick with event sourcing: as events flow through your system, you tag them with an idempotency key, so that if you've seen one before, or if your processors crash, restart, and rehydrate, they can see, okay, I've already handled this event, I can keep moving forward. That gives you that scalability.
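Here is a hedged sketch of two of those enablers together, partitioning by account for per-account ordering and an idempotency key for safe retries; the names (publish, OrderProjector, NUM_PARTITIONS) are purely illustrative and not from the talk or Schwab's systems.

```python
import hashlib

NUM_PARTITIONS = 4
partitions: list[list[dict]] = [[] for _ in range(NUM_PARTITIONS)]  # one ordered log each

def publish(account_id: str, idempotency_key: str, amount: float) -> None:
    """Route every event for the same account to the same partition, so a single
    consumer per partition sees that account's events in sequential order."""
    p = int(hashlib.sha256(account_id.encode()).hexdigest(), 16) % NUM_PARTITIONS
    partitions[p].append({"key": idempotency_key, "account": account_id, "amount": amount})

class OrderProjector:
    """Single consumer for a partition; applies each event at most once."""
    def __init__(self) -> None:
        self.seen_keys: set[str] = set()       # would be durable storage in real life
        self.ordered_total: dict[str, float] = {}

    def consume(self, log: list[dict]) -> None:
        for event in log:
            if event["key"] in self.seen_keys:
                continue                        # a retry or replay: skip, don't double-book
            self.seen_keys.add(event["key"])
            self.ordered_total[event["account"]] = (
                self.ordered_total.get(event["account"], 0.0) + event["amount"]
            )

# A timeout-driven retry re-sends the same order with the same idempotency key...
publish("acct-42", "order-1001", 1000.0)
publish("acct-42", "order-1001", 1000.0)        # duplicate request, not a second order
projector = OrderProjector()
for log in partitions:
    projector.consume(log)
print(projector.ordered_total)                  # {'acct-42': 1000.0}, not 2000.0
```

The point of the sketch is that the retry becomes harmless: the duplicate key is recognized and skipped, and because all of an account's events land in one partition, a single consumer can also rely on their order.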
Last but not least, this is one of the hardest semantics in distributed computing. Anybody implemented exactly-once semantics before? Yeah, it was easy, right? It's one of the hardest things to accomplish. Basically, it says that you deliver and process the event exactly once. In the data world that's one of the hardest things to accomplish. Even Kafka, if you read Confluent's material, only fairly recently implemented exactly-once semantics. And, eyes wide open, if you really want to use this, it's a big price to pay in your architecture, because think about distributing that and trying to make sure that everybody sees the same thing. So go in with your eyes wide open; it's a challenging proposition.

With that, a couple of words of wisdom, or things to keep in mind. Even if you go down the event sourcing route, there are a couple of things you want to make sure you also do. Version your domain events: schemas change, semantics change, so bake that in; you have to make sure you deal with it. Also be aware that there may be multiple domain events going on in your system that you now need to build projections from. Again, they can arrive at different times, and there's probably another talk, at another venue, where we can go into more detail; there are patterns for addressing this particular piece. And again, there are a lot of moving parts here. Make sure you do this for the right reasons. If somebody walks up to you and says, hey, we need to do CQRS and event sourcing, ask why. Do it for the right reasons; there are other ways of solving these problems. And there's no magical consistency solution. A database vendor might say, hey, we can give you consistency. The problem, as I mentioned, is that you also have to deal with it at the application layer. You need to be aware that you're eventually consistent and compensate for when you're not consistent, when you have lag. It has to be a balance between the two. So with that, we've got about two minutes left, so we're glad to answer some questions. Thank you for your time.