I see some more folks filing in. So in the interest of time, let's start. We only have 30 minutes, and this topic can go way beyond that; it could take days. So, hi, my name is Alia Borker. I'm a software engineer and I've been a developer for 20 years. I'm currently a design authority for the Feast platform at JPMorgan Chase, in our asset and wealth management line of business. Over the course of the years I've worked on many enterprise systems, monolithic apps, and also service-oriented architecture applications, in industries such as telecommunications, media, and now, for more than five years, the financial industry. In the last three years I've moved to microservices. I've been creating microservices and deploying them to a custom Pivotal Cloud Foundry platform called Gaia. We always have to give things cool names; it's a Greek goddess. And we're having a lot of fun with it. Just out of interest, how many of us here have worked with enterprise systems versus traditional web apps? OK, great, so it's more than 50%. The reason I'm here is to talk about data integrity concerns in enterprise systems, which still tend to be a bit different and more complicated than web apps, even ones like Netflix. So let me start. I posed this question: microservices architecture means that you have zero concerns for data consistency now, right? Is that true? No, not true. And that's what I'm going to focus on. In the last three or four years, I've seen new adopters of microservices struggle to put enough time, focus, and upfront design into managing data integrity in distributed systems. There's a rush to move to microservices infrastructure, and there are many very good reasons for moving to cloud infrastructure, but we still have to put due process and thought into how we manage data integrity. All the distributed-systems concerns of yesterday are still present, and they still need a solution. So that's why I want to bring a little more clarity and refocus on this need, which really hasn't gone away. I threw in a slide to show what our monolithic and service-oriented architecture systems used to look like; the shapes represent components. With a monolithic application, all the components were always deployed as a unit, as one application instance, in different environments on predefined infrastructure. And the predefined infrastructure didn't go away with service-oriented architecture, also known as distributed component architecture. Now you had service providers coming in and developing individual services called components. We also like to give things new names in this industry, right? We began with components; component-based programming was the fad for a while. And now we're all just talking about services. It's exciting, it's fun, but we have to remember that some of the same concepts still apply. So we were component-based, and then we were service-oriented, but there was still a lot of tight coupling, and all of it on predefined infrastructure. And the usual pain points in monolithic applications, as all of you who've worked with enterprise applications can imagine, were a slow time to market, driven by the lengthy regression test cycle and the lengthy user acceptance test cycle
that every single small, minute change to any component had to go through. And then, lo and behold, even after a lengthy regression test and user testing process, you could still have catastrophic failures. I should not have done that; I'm almost blinded, but hey. Because one specific use case was not sufficiently tested. Now, how many of you have faced that problem? You had a great regression test cycle, you spent three weeks or a month with your operations users or business users testing, and you still had a massive failure that caused a rollback. OK, great, we're not the only ones then; JPMorgan is not doing anything wrong. Excuse me. The pain points of monolithic applications didn't quite go away with service-oriented architecture either. We still had lengthy regression test cycles and lengthy user acceptance cycles, and also, again, failures after production releases. So let's see what's happening. When we talk about transactions, I wanted to revisit how transactions were handled in legacy systems. And before we even do that, I should level-set: what is a transaction? We need a basic definition of a transaction that I can build upon. I'll say the most basic definition of a transaction is a sequence of changes, a sequence of events, that occur as a single operation. Let that sink in a bit: you have a set of changes, and they all have to go in; it's all or nothing. Otherwise, in case of a single point of failure, the entire set is reverted back to its original state. That is the basic definition of a transaction. In purple, if you follow this diagram, I've tried to show how a monolithic application would manage transactions. That big drum you see there was a big helper: the RDBMS databases, the Oracles and the SQL Servers, really the workhorses of the monolithic application development world. And that gave us a gatekeeper, which was our database. The database would handle the consistency of transactional flows by ensuring that a set of changes either went in entirely or not at all, with a rollback implemented. So databases — a shared database — were key to managing consistency in a monolithic application. For example, take a customer coming into a simple online bookselling application — not Amazon, that's way more complicated, with a lot more features — a simple, rudimentary online bookseller application that allows a user to find a book. Maybe they're interested in Harry Potter and the Order of the Phoenix; I was recently ordering it for my daughter, so it's fresh in my mind. Say they see a count of 10 books available. The customer decides they're going to buy that book, so they click the Buy button and submit the payment information, which makes a call to a payment service. This could be a component within the monolithic application, or a distributed component running on other infrastructure provided by a service provider. In either case, the expectation is that if the payment fails for any reason — perhaps the credit card information was invalid, or the expiration date was not correctly provided — then another customer who logs in and searches for Harry Potter and the Order of the Phoenix should see the original count of 10, and not 9, because the first customer's transaction did not succeed.
Hence, the count of books should not decrease. That is what we mean by data integrity and consistency in a transaction, and we still have to take care of these scenarios in our new paradigm. But I did want to touch again on two-phase commit. With service-oriented architecture, we entered the more complicated environment of service providers having to interact with one another to ensure this consistency across boundaries. So there were these entities, transaction managers, which then became the gatekeepers. Every transaction manager had a registry of registered resources that participated in specific transactions — in this case, it would be the buy-a-book transaction. And then, either synchronously or asynchronously, every single change was checked and verified before the transaction manager declared the entire transaction a success. So the complexity came in, and we still have to manage this cross-boundary transactional flow. In the old days, we did it with our relational databases and with transaction managers, but we want to do something better now. All the boxes in this diagram are still very tightly coupled and dependent on predefined infrastructure. And for good reasons, the industry has decided that we no longer want to design upfront for the scalability needs of 10 years from now, or even three years from now. We want to be able to scale up and down and be fast to market. This type of rigid infrastructure, which leads to tight coupling, wasn't working well anyway. So where are we now? I've thrown in a little diagram of what our components — now called microservices — look like in the environments of today. Microservices can be scaled as needed; that's a great benefit of the architecture. In fact, at JPMorgan Chase, depending on the volume and the traffic, we scale our services on demand, and on our Gaia platform we actually pay per instance. So our infrastructure cost is on demand, not preset, and we see a real cost reduction by spinning up infrastructure only when demand requires a new instance. As you can see here, microservices no longer mean that the same set of services will be deployed in one infrastructure environment; you could have a complete mixed bag. This reduces the dependencies between services and decouples them, so that if you're a service provider, your time to market for your specific service is not impeded by another service provider's delivery cycles, for example. All right, so we still have transactions; they didn't go away, right? Nope. So we're back to the simple online bookseller service and a customer trying, again, to buy the book. Now you see there are even more connection points with microservices architecture. With service-oriented architecture we already had more connection points than the monolith, and with microservices we have expanded the number of connection points further. So the points of failure have increased, and we still need to handle transactional integrity. Again, the count of books should not be decremented if the payment service has failed — a simple example to keep in mind. And at any point you could have multiple instances of any service running on the cloud.
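As an aside, here is roughly what that all-or-nothing gatekeeping looked like in code in the monolithic world — a minimal sketch using plain JDBC against a hypothetical bookstore schema (the connection URL, table names, and columns are made up for illustration):

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;

public class BuyBookTransaction {

    // Decrement stock and record the payment as one unit of work.
    // If anything fails (e.g. the payment insert), roll back so a
    // second customer still sees the original count of 10.
    public void buyBook(String isbn, String customerId, String paymentRef) throws SQLException {
        try (Connection conn = DriverManager.getConnection("jdbc:postgresql://localhost/bookstore")) {
            conn.setAutoCommit(false); // take over transaction demarcation from the driver
            try (PreparedStatement dec = conn.prepareStatement(
                     "UPDATE book SET available_count = available_count - 1 WHERE isbn = ?");
                 PreparedStatement pay = conn.prepareStatement(
                     "INSERT INTO payment (customer_id, payment_ref) VALUES (?, ?)")) {
                dec.setString(1, isbn);
                dec.executeUpdate();
                pay.setString(1, customerId);
                pay.setString(2, paymentRef);
                pay.executeUpdate(); // a failure here...
                conn.commit();       // ...means we never reach the commit
            } catch (SQLException e) {
                conn.rollback();     // all or nothing: the count stays at 10
                throw e;
            }
        }
    }
}
```

The point is that the database enforces the boundary: either both statements take effect, or neither does.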
I just like cloudy pictures, so I put everything in a cloud here. So that's about it. And by the way, transactions are not always user-initiated. Coming from the enterprise, I have in mind the most complex data management systems: financial systems presently, and before that, insurance. Transactions can be initiated by other services as well. I've shown a user here, but you could have an admin service initiating a grooming process, which might refresh items in the books catalog. So transaction initiation can come from a user or from other services, and in both cases data integrity concerns are valid and need to be handled. I threw in this quote; it seemed apt: "Life is a series of natural and spontaneous changes. Don't resist them; that only creates sorrow. Let reality be reality. Let things flow naturally forward in whatever way they like." This is from Laozi, and it sums it up for me. Just think: if we can design our new systems, built with microservices, to expect failure at every connection point, then we'll build great systems. We should not hope that there are no failures; we should plan for and expect failures, and design our systems such that data integrity is maintained regardless of a failure at any point. That's the goal, so keep it in mind. Okay. In order to walk through some concepts, I wanted to bring in a simple application with a few simple use cases. That will help us cover some important concepts before we get to the two approaches I want to present for managing consistency and data integrity in microservices applications. Let me also check my time. Okay. So imagine you have a simple tour operator online application. I'm not thinking Orbitz and Expedia — though we've all used Kayak, Orbitz, and Expedia, hopefully — so you get the general idea. In such an application, at a very basic level, customers should be able to search available tours, customers should be able to book a tour, and an admin can add a tour. Simple use cases. And these will help me talk about an important aspect, which is: what are the domain and the context here? We need to talk about domain-driven design and bounded contexts, and I'm going to come back to these use cases when I do. I threw in some diagrams here. When we think about context, we have, for example, the book-a-tour use case: a customer trying to book a tour. When a customer is trying to book a tour, they will probably be dealing with the concepts of a tour, vehicle, customer reservation, hotel reservation, and hotel — some general conceptual entities you can imagine interacting to allow a customer to book themselves on a tour offered on November 1st. Maybe this tour is hiking in the Appalachians. And on the right-hand side, you see a tour search context. In the tour search context, you also see the tour conceptual entity, along with hotel, tour catalog, and top-10 list. But I ask you to think with me here: tour, in this case, may not mean exactly the same thing — and actually it does not mean exactly the same thing — as a tour in the booking context.
A tour in the search context could mean that a unique name — for example, Hiking in the Appalachians — denotes a unique tour. The dates may not be relevant in the search context at all, because the search context may just be trying to categorize tours, perhaps into a top-10 list: top 10 tours in North America, if you want to think of it that way. So the tour conceptual entity means different things in different contexts. This is what we mean by a bounded context. And when we further develop this idea — that your use case flows have a specific context and your entities are relevant within that context — you design your domain model, which leads to your physical data model. So let's follow that through. Now I'm going to pick one use case. Of the three I listed — customer searches available tours, customer books a tour, and admin adds tours — I'm going to choose customer books a tour. So I'm sticking to one bounded context, and we still have some decisions to make when we think about domain-driven design. There are options. You could model your domain model first, which then lends itself to a data model, with either a large aggregate model or a small aggregate model. The key difference is that in the large aggregate model, you have a compositional model where the tour is composed of the other entities. The tour is composed of vehicles: perhaps there's a boat involved in the first leg of the tour, then a bus that everyone boards, and lastly another boat that brings you back — a list of vehicles. And a tour — Hiking in the Appalachians on November 1st — could have a list of customers already reserved on that tour, and a list of hotel reservations as well. So it's a compositional model. For anyone who remembers UML — I still use it these days — the black diamond denotes composition. On the right-hand side, you could instead model the domain as a small aggregate model, where the tour, vehicle, customer reservation, and hotel reservation are all disconnected, holding on only to a value object called tour ID to make the connection. I should backtrack and explain exactly what an aggregate is, for anyone who hasn't come across the term in domain-driven design. An aggregate is a cluster of interrelated objects, and any update or change to that cluster always goes through one root: within the cluster, one object is nominated as the aggregate root, and that aggregate root is responsible for making any changes to the entire cluster. That is what aggregate root means in these diagrams. As you can imagine, with the large aggregate model you'll have a few extra blocking scenarios: think of two customers trying to book the November 1st Hiking in the Appalachians tour. With the aggregate root being the tour, every time there are changes to be made to any part — say a customer decides they want to switch their hotel reservation, or they decide to add another family member — the entire cluster of objects is locked and unavailable for updates until the first transaction has completed.
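To make the contrast concrete, here is a minimal Java sketch of the two models; all class and field names are hypothetical:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.UUID;

// --- Large aggregate: Tour is the aggregate root and owns its parts. ---
// Any change to any part goes through the root, so two customers
// touching the same tour contend for the same lock/version.
class Tour {
    private final UUID id = UUID.randomUUID();
    private long version;                                 // one version guards the whole cluster
    private final List<Vehicle> vehicles = new ArrayList<>();
    private final List<Reservation> reservations = new ArrayList<>();

    void addReservation(Reservation r) {                  // the root mediates every update
        reservations.add(r);
    }
}

class Vehicle { }
class Reservation { }

// --- Small aggregates: each entity is its own root and refers to the ---
// --- tour only by a tourId value object, so a customer reservation   ---
// --- and a hotel reservation can be updated independently.           ---
class CustomerReservation {
    private final UUID id = UUID.randomUUID();
    private final UUID tourId;                            // identity reference, not composition

    CustomerReservation(UUID tourId) { this.tourId = tourId; }
}

class HotelReservation {
    private final UUID id = UUID.randomUUID();
    private final UUID tourId;

    HotelReservation(UUID tourId) { this.tourId = tourId; }
}
```

With the composition at the top, the single version field guards the whole cluster; with the small aggregates, each root carries only the tour ID value object and manages just its own boundary.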
Right, so I'm just doing a time check. With the smaller aggregate model, you have fewer dependencies. You have decoupled your domain model in a way where multiple updates can occur — it's totally kosher, because every aggregate root is responsible only for the set of changes inside its own boundary. But there's more to discuss on this. Again, I tried to consolidate the pros and cons, and I'm going to bring it back to transaction boundaries. The large aggregate model is a compositional object model, and it's familiar to UML users, from legacy systems to the present; there's continuity, which is reassuring sometimes. But it does result in a large number of user transaction failures: for example, if you have two users logged in — and you're going to see more of this — and someone is holding on to an old instance of the aggregate, their update request will be rejected if someone else wins. We'll see a bit more on that. So you generally get a poorer user experience with a large aggregate model, but there are good scenarios for using it, because it does ensure data integrity. Going back to your monolithic ways, you keep everything under one umbrella: one transaction manages all changes. The smaller aggregate model reflects the purer sense of multiple microservices interacting to complete a use case. A microservice, by the way, maps almost one-to-one to an aggregate. So when you think about microservices and domain-driven design, you should automatically equate one aggregate to one microservice. There are a lot of articles written on this, and I've added some links to the last slide; I believe these slides are available on the schedule. With small aggregates, we do improve user transaction success: if someone is changing the customer reservations list, they don't necessarily block another customer who's trying to pick a hotel reservation through its own root. But small aggregates still have to handle cross-transaction-boundary communication — and we're going to see this a lot more, so let me not jump ahead. I wanted to get into two design approaches. The first approach is optimistic concurrency with a shared database, which everyone who's worked with enterprise systems in the past should be quite familiar with. When you have a shared database and a large aggregate model, optimistic concurrency still works pretty well, because the entire paradigm is to allow most updates to go in and only reject an update when there's evidence of overriding behavior. We'll see this in a minute. And the second option is, of course, eventual consistency across distributed systems. This goes back to the small aggregate model: we said it's purer, it lends itself more to microservices architecture, because one microservice really equates to one aggregate when you look at domain-driven design. So we cannot rule out eventual consistency across distributed systems, where, for example, a tour booking application really doesn't need to know the moment a payment application finishes processing a customer's payment. That would be cross-transactional communication across bounded contexts, and we'll look into that as well.
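First, though, here is roughly what that optimistic rejection looks like in practice — a minimal sketch assuming JPA, with a hypothetical entity; the same pattern can be hand-rolled in SQL with an explicit WHERE version = ? clause:

```java
import javax.persistence.Entity;
import javax.persistence.Id;
import javax.persistence.Version;

@Entity
public class TourEntity {

    @Id
    private Long id;

    // The JPA provider adds "WHERE version = ?" to every UPDATE and
    // increments the value on a successful commit. If another user's
    // transaction committed first, the update matches zero rows and an
    // OptimisticLockException is thrown: the request holding the stale
    // instance is rejected, while non-conflicting updates go in freely.
    @Version
    private long version;

    private int seatsAvailable;

    public void reserveSeat() {
        if (seatsAvailable <= 0) {
            throw new IllegalStateException("Tour is fully booked");
        }
        seatsAvailable--;
    }
}
```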
So, to finish up on the optimistic concurrency model: with optimistic concurrency we get optimistic locking, provided by our shared database. In the tour operator online application, this is how it would work: the tour aggregate, nominated as the aggregate root, would have a version number, and each time any change is made to any of its composite entities, the database version would be checked for that tour instance. If the version on the thread making the call is different from the version found in the database, the entire transaction is rejected. This is the old way of ensuring transactional data integrity, and if you design a large aggregate with microservices, it still works very well. The pros: data consistency needs are met fully, because you are in charge of all-or-nothing within one microservice — one large microservice that you've designed — and it's easier to debug, because all the related entities live inside that one microservice, and you have no dependencies across transactional boundaries at all. The biggest problem is that all the parts of the tour booking application have to roll out together, like a monolith. So this would be the first cut of migrating a monolithic application to a microservices architecture, and as long as the bounded context is not huge, it's still a viable option. We don't want to throw away optimistic locking and all the benefits our databases provide unless there's good reason. Some of the good reasons: you may have hit an infrastructure wall — maybe you're not able to scale anymore, there's just no space left in your database, and you must bring in a new database server. The minute you bring in a different persistent store, you may as well change your design paradigm and think about smaller aggregates in multiple microservices, and about how to handle data consistency across transactional boundaries. So that brings us to the event sourcing paradigm — and I'm keeping a time check, but I think I'm good. With event sourcing, things become a lot more fun; as you can tell, I've been enjoying myself exploring this. Now you have more freedom to communicate across transactional boundaries. Across bounded contexts, your tour booking application has its APIs, its event handlers and data handlers, and its own persistent store; it's completely separate from, say, a payment service. You could be on Chase Pay, Apple Pay, any payment service. In an enterprise, you would actually have multiple payment services being leveraged, with the user allowed to choose their payment service. That's how we like things to be: we want to give users choice, and we do not want tight coupling at the infrastructure layer. You see these bubbles; they denote contexts. Each microservice — tour booking service, payment service, and admin service — runs in its own bounded context, with its own domain model and physical data model, and all the interaction between these contexts goes through a distributed event log. That's the general paradigm, and we're going to see how it works with our simple online tour booking use case. But before we do that, I want to talk about one framework that allows us to implement such an event streaming, event sourcing paradigm, and that is Kafka.
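The core idea behind event sourcing itself can be sketched in a few lines — a simplified, hypothetical illustration where state is derived by replaying an append-only log of immutable events rather than by updating rows in place:

```java
import java.util.List;

// Hypothetical domain events for the tour booking context.
interface TourEvent { }

record SeatReserved(String tourId, String customerId) implements TourEvent { }
record SeatReleased(String tourId, String customerId) implements TourEvent { }

class TourBookingView {
    private int seatsTaken;

    // The append-only log, not a mutable row, is the source of truth;
    // replaying it from the beginning always reproduces the same state.
    static TourBookingView replay(List<TourEvent> log) {
        TourBookingView view = new TourBookingView();
        for (TourEvent event : log) {
            view.apply(event);
        }
        return view;
    }

    void apply(TourEvent event) {
        if (event instanceof SeatReserved) {
            seatsTaken++;
        } else if (event instanceof SeatReleased) {
            seatsTaken--;
        }
    }

    int seatsTaken() { return seatsTaken; }
}
```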
So I've become quite a big fan of Kafka recently. Kafka provides the key APIs: a producer API to publish a stream of records to one or more topics, and, out of the box, a consumer API to subscribe to one or more topics — we'll get into what these things mean in Kafka in a bit. Then there's the Streams API, which you can think of as the ETL layer of the olden days, where you used to transform data coming out of different databases or arriving through message calls from other services. There are many cases where you need to transform data before you can consume it; even with REST APIs and your JSON, Jersey, and Jackson stack, there are always transformation needs in interactions between systems. The Streams API became part of Kafka in version 0.10, I believe. And there's also the Connector API, your basic way of connecting external systems to Kafka topics. There's a lot more information on the Kafka platform at kafka.apache.org; it was started by LinkedIn and is now under the Apache umbrella. A Kafka cluster can have multiple producers, multiple consumers, multiple connectors, and multiple stream processors. It scales really well and provides high availability and performance out of the box; it's pretty impressive. A bit more on the Kafka concepts: each record gets a sequential ID number called the offset; producers publish data to the topic of their choice; and consumers label themselves with a consumer group, where each record published to a topic is delivered to one consumer instance within each subscribing consumer group. We're going to touch on consumer groups a little more, because with streaming, event-sourced applications, the order of message consumption is still very important, and we need to consider it if we're really planning to achieve data consistency. So, coming back to our simple tour booking application: what if we implemented it with Kafka? How would things look? Again, you have your tour booking service. Perhaps it has an API called make reservation, and it has a Kafka stream client using the streaming APIs that come with Kafka. It could be a registered producer, maybe labeled producer A with its own name, and there could also be a consumer group A, so it could be consuming messages meant for the tour booking service — for example, acknowledgements from the payment service: yes, I received customer XYZ's payment request and processed it successfully, so their reservation is good to go. Asynchronous communication between bounded contexts is what we're trying to achieve. The payment service could likewise have its own APIs — process customer payment — again with a Kafka stream client, registered as a producer and also participating in a consumer group B. The admin service likewise has its own APIs — change tour route, perhaps: the itinerary is changing; maybe there was a hurricane and you can no longer dock at a specific Caribbean island and need to be diverted — hopefully not to Texas, but if you're starting off in Boston, possible. And then you have this concept of topics in Kafka: you could have a tour application topic, an admin topic, and a payments topic, and each Kafka producer can choose which topics to write to.
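Here is a minimal sketch of that payment-acknowledgement flow using the plain Kafka producer and consumer clients — the topic name, group id, and JSON payload are invented for illustration, and poll(Duration) assumes a reasonably recent client version:

```java
import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class PaymentEventsDemo {

    // Payment service side: publish an acknowledgement to the payments topic.
    static void publishAck(String reservationId) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Keying by reservation id keeps all events for one reservation
            // in the same partition, which preserves their relative order.
            producer.send(new ProducerRecord<>("payments-topic", reservationId,
                    "{\"reservationId\":\"" + reservationId + "\",\"status\":\"PAYMENT_OK\"}"));
        }
    }

    // Tour booking service side: every instance registers under the same
    // group.id, so each acknowledgement is delivered to exactly one
    // instance of the logical subscriber.
    static void consumeAcks() {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "tour-booking-service"); // consumer group A
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("payments-topic"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                for (ConsumerRecord<String, String> record : records) {
                    // Mark the reservation confirmed, ideally idempotently.
                    System.out.printf("offset=%d ack=%s%n", record.offset(), record.value());
                }
            }
        }
    }
}
```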
So there's a lot of freedom, and you can also choose to consume from multiple topics — flexibility both ways. And I should highlight the important concept of consumer groups: only one instance in a consumer group gets to consume a given message on a topic the group is subscribed to. Generally, a consumer group is a logical subscriber — for example, the tour booking service and its multiple instances would make up one consumer group. That's pretty important, because you need the tour booking service to get all acknowledgements or rejections from the payment service. You wouldn't want to miss a message by mixing the tour booking service, the payment service, and the admin service into one consumer group, which would cause some of the messages never to reach a tour booking service instance. And you have flexibility of scale here, because depending on your scalability needs, the tour booking service could have N instances running, all registered as consumers in their own consumer group. Very achievable, but something to think about. Right, so I think I'm reaching the end. Just to sum it up: don't forget your data. Data integrity within and across multi-user transactions still needs to be handled in a microservices architecture, and in this presentation we've discussed two design approaches to manage data integrity within and across transactional boundaries. So thanks for listening, everyone. Any questions? Thank you.