Good afternoon, everybody. Thank you for joining me. My name is Daniel Jones, and I'm CTO of a consultancy called Engineer Better. We help people with Cloud Foundry, and we help people use Cloud Foundry and get the most out of it. This talk is about service-oriented design as applied to microservices. We've worked with customers in the past who have been trying to move from monoliths to microservices. Sometimes that's gone really well, because they had well-structured monoliths in the first place. Sometimes they've really struggled, because they had too many blended concerns in their monoliths. And this made me think back to before microservices were the new hotness: 15 years ago, when I first entered the IT industry, I worked for a financial company whose trading system was built using service-oriented architecture, on message-oriented middleware. All of the design patterns that I learned then apply now to microservices. So this session is about trying to share some of those ideas, and to act as a bit of a reminder that a lot of the hard problems with microservices are not about using the latest Spring Cloud implementation of a Netflix pattern; it's just old-fashioned distributed computing. We'll be going through some ideas and some patterns, at a very high level, that you can apply when starting a new project or building a monolith, and that will then make running it as a distributed microservice architecture easier.

So, the executive summary: microservices are a deployment concern. How you do things with circuit breakers and service discovery doesn't impact your business logic; it shouldn't do. They're a deployment concern, and you're probably deploying that way because you either want resiliency or you want scalability. And that means that you end up with a distributed system. Distributed systems are hard. They're hard to reason about. All the things that worked when you were working with one process, with a lot of certainty, and that you took for granted, you can't do any more. But luckily, distributed systems have been around for a while. In fact, I was talking to Dr. Jules, who PMs the Garden project, about this the other night in the bar, and he pointed out Lamport clocks, the paper on which was written in the 1970s. So this stuff has been around for a while. If you practice service-oriented design from the start, you'll end up with an app that's easier to split into distributable chunks, and your code will be easier to reason about, which has immediate benefits even if you choose not to split it out and distribute it into microservices.

One of the things about this (the original abstract talked about working through an example using a Spring Java app) is that a lot of frameworks, for the sake of productivity, make it very easy to get things done quickly, but they do so by munging together concerns and putting too many things in one place. The Cloud Controller is a good example of this. It's a Ruby on Rails app, which meant it could be written really, really quickly. But it also means that it's hard to evolve, because some of the JSON representations of objects are exactly the same as the database representation: there's a one-to-one mapping and lots of metaprogramming magic going on. So the thrust of this talk is: think about some architectural concerns to start off with, don't necessarily do the easiest thing the framework puts in front of you, and you'll have an easier life in the future.

So, the world before microservices were fashionable and trendy.
It looked a lot like this. If you had things that needed to talk to each other, they did so via the enterprise service bus, which was normally manifested as message-oriented middleware. You would send a message, you didn't know who was going to receive it, and the message broker would make sure it got to the right place. Why did this fall out of fashion? A number of reasons, but one is that, as you can see in this diagram, the service bus becomes a linchpin. It becomes a logical linchpin. It becomes a scaling linchpin. When you get to the scale of Netflix or Google or Amazon, this isn't going to work: having one central thing that knows where all of the services are, so it can route messages to them. The only way to get around that problem is something that looks like this, which I'm not sure is necessarily simpler or more elegant. We take the logic from that service bus, and we distribute it, and we push it down into each app. So all of that functionality about retries, and temporal decoupling, and routing, and finding out where things should go, that goes into every single app.

If you're going to operate in this way, and you want microservices, and you want to have more than one instance of a thing, you need to be prepared to optimize for availability over consistency. We do get people who say: I want to do microservices, I hear they're cool, can we put some distributed transactions in? The answer is, you really don't want to do that. Fortuitously, last night in the bar, probably about half midnight, a friend of mine who works for a software vendor came out with this classic quote: transactions are for lazy architects. Transactions are something we put in systems to stop bad things happening to our data; they're a preventative measure. When we have to deal with eventual consistency, which we do have to do as soon as we have two instances of a thing running, we instead move to a world where we have to remediate bad things that have happened. We don't try to prevent data getting inconsistent. We allow it to become inconsistent, but we make sure we deal with it at a later date: we do something to detect, and then act on, that divergence of history.

So in this whistle-stop guide to future-proofing monoliths, there are three broad strokes of things that we can do, going from the quite easy and quite straightforward to the "this is getting a bit more complex now, this is adding a bit of overhead to our code". First up, separation of concerns. Really, when we get involved in moving apps and breaking them apart, that's normally the number one thing that causes a problem: fairly straightforward stuff like "that class is doing too many things", or "this knows about the database and it shouldn't". Then we start talking about communicating via events rather than direct RPC, direct method invocation. That in itself isn't necessarily going to help with moving to microservices, but it gives us some benefits that we'll see in a moment, and it's also what later allows us to handle eventual consistency.

So what do I mean when I say services? The opposite of a rich domain object, a fat object. If you read an object-oriented programming book from the 90s, it probably would have said: you've got a user object which has a save method and an update method, and it contains all of its state and knows about all the behavior of things that can be done with it.
I think we've all moved away from that now, because it ends up with these objects that know about all the things that can happen to them. So my definition of a service is a thing that does a thing with a thing. It's really quite succinct, that. And they should be stateless. They receive some kind of communication in, they do something with it, and they chuck something out the other end, or they have side effects and change the state of the world. They don't hold state themselves, until maybe you start event sourcing, but we'll get to that in a minute. Now, somewhat embarrassingly, this is still in my slide deck. If you can imagine a large object called User, or something like that, with a save method and a getName and a setName and all that kind of stuff, that's what I'm talking about when I talk about rich domain objects. And this is the sort of thing I'm talking about when I talk about services. I was trying to come up with some really trivial examples to demonstrate the principles here, but this is the kind of thing: we've got a domain service that knows about accounts that can have transactions acted on them. It gets set up, it doesn't hold any state, and when something needs to change, it gets told via arguments, it does the business logic, and then it asks someone else to do the persistence for it. That's the kind of thing I'm referring to when I talk about services.

Now, if this is new to you, the thing that I really hope you take away from this talk is this. If all of the rest of it gets confusing and quite complicated later, this is the main bit. This pattern of four layers, I think, applies to pretty much every system I've ever worked with. At the top, we've got gateway services. The gateway's job is to translate from the outside world into our domain. Be it HTTP, AMQP, or even if you write a CLI tool, the entry point into your application that parses command-line arguments: that's your gateway layer. It doesn't want to have any logic in it. It just wants to be able to translate from the outside world into something meaningful that we then understand as: oh, right, somebody's trying to achieve a thing. The orchestration layer is below that. The gateway doesn't do any thinking whatsoever; it's just translation. Orchestration knows about workflow. So maybe in a software service, when we register a new user, the orchestration layer gets one request from the gateway layer saying: register a user, please. The orchestration tier is the bit that knows: right, first I've got to go and write some records into the auth system, then I've got to put something into the users database, and then maybe invoke some email-sending service to let them know they've got to confirm their email address. Hands up if you are familiar with the difference between orchestration and choreography. OK, so orchestration is a particularly meaningful term here, because it means that someone knows what's going to happen and it knows the workflow. There's one thing that's responsible for that.
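To make those top two tiers concrete, here's a minimal sketch in Java. All of the names (UserGateway, RegistrationOrchestrator, AuthService, and so on) are hypothetical, invented for illustration rather than taken from the slides; the point is only where the responsibilities sit.

```java
// Domain and data services the orchestrator will coordinate (hypothetical).
interface AuthService { void createCredentials(String email); }
interface UserDataService { void saveUser(String email, String displayName); }
interface EmailService { void sendConfirmation(String email); }

// Immutable request object the gateway produces from the outside world.
final class RegisterUserRequest {
    public final String email;
    public final String displayName;

    RegisterUserRequest(String email, String displayName) {
        this.email = email;
        this.displayName = displayName;
    }
}

// Orchestration layer: the one place that knows the registration workflow.
final class RegistrationOrchestrator {
    private final AuthService auth;
    private final UserDataService users;
    private final EmailService email;

    RegistrationOrchestrator(AuthService auth, UserDataService users, EmailService email) {
        this.auth = auth;
        this.users = users;
        this.email = email;
    }

    void registerUser(RegisterUserRequest request) {
        auth.createCredentials(request.email);               // 1. write records into the auth system
        users.saveUser(request.email, request.displayName);  // 2. put something into the users database
        email.sendConfirmation(request.email);                // 3. ask them to confirm their address
    }
}

// Gateway layer: translates the outside world (HTTP, AMQP, CLI args) into a
// domain request and delegates. No business logic, no workflow, no thinking.
final class UserGateway {
    private final RegistrationOrchestrator orchestrator;

    UserGateway(RegistrationOrchestrator orchestrator) {
        this.orchestrator = orchestrator;
    }

    void handleRegister(String email, String displayName) {
        orchestrator.registerUser(new RegisterUserRequest(email, displayName));
    }
}
```

The orchestrator is the single place you can look at to see the whole workflow; the gateway only translates.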
The alternative to orchestration is choreography. In a distributed system, where we have lots of microservices, that means the services are responsible for emitting events, and then another thing will pick that up, and then maybe another thing will pick that up. They're all decoupled, but there's no one central place where you can look and go: right, the system is going to do that, that, and that, in that order, because the behavior becomes emergent. Now, emergent behavior is awesome in people. We like emergent behavior. Diversity leads to complexity, which leads to emergence. That's where innovation and creativity come from. But it's also unpredictable, and therefore not a desirable feature in one of our systems. So orchestration knows workflow. Domain services, hopefully these should make sense to everyone: business logic, the boring stuff that we normally do. And below that, data services, whose sole job is to closely guard a data store. All they do is IO: no business logic, no validation. That should have been done by someone higher up the stack. These are responsible for abstracting away whether you're using MongoDB, Cassandra, MySQL, or whatever. Every app you write, I'm willing to bet you a drink, will fit into that pattern. Maybe it won't have the data layer, depending on what it's doing, but pretty much everything you write will end up fitting into those four categories. Just knowing that, and just separating your code into these four tiers, will make your life easier in future.

So here's an example (I don't know whether you can see that at the back; hopefully not, actually, because it's a rubbish example) of the kind of thing that frameworks like Spring lead you into doing. This is a controller, and it's doing some business logic, and it knows about a repository, so it knows how to save things. It's got far too many concerns. If you had to get something up and running quickly, this is probably what you'd do, but it's not a good idea in the long run, because you have all those concerns munged together. As it evolves, you'll end up with more and more in this one class, and it will be hard to migrate away from. Resist the temptation to do this.

So we've now written our application, and we've got it into those four different strata of services. The next thing we need to do is make this portable, make it migratable. CQRS-lite: hands up if you know what CQRS is. Cool, quite a few people. For those that don't, it's command query responsibility segregation, and there are various interpretations of what CQRS means. The simple bit is: separate your reads from your writes. You have a separate write path through your application that can have side effects and change state. Reads must not do that. This makes your system very easy to reason about. You know that when you call something called getAccount, you're not going to change the state of the world; you're just going to get some data back. And that allows us to add quick reads to this architecture. Because the reads are always side-effect free, we can do them anywhere and it doesn't matter; they're safe. It doesn't matter if any layer talks down to the data layer for a read. So we've got an application in these four tiers, and we've got our write path separate from our read path. If you want to go really hardcore with this, you could split them into different interfaces.
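For example, a minimal sketch of that split, with hypothetical interface names:

```java
import java.math.BigDecimal;

// Write side: operations that may have side effects and change state.
interface AccountCommands {
    void createAccount(String accountId);
    void debit(String accountId, BigDecimal amount);
}

// Read side: side-effect free, so it is safe for any tier to call directly,
// and safe to cache or serve from a replica.
interface AccountQueries {
    BigDecimal getBalance(String accountId);
}
```

Because nothing on the query side can change the state of the world, you can reason about reads in isolation and optimize them (caching, read replicas) without touching the write path.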
Now what we need to do is start preparing for an event-driven architecture. The event-driven stuff is going to lead to our ability to handle eventual consistency; it's a means to an end for us here and now. To prepare for this, we can take some steps that you can do in any old app right now, and it won't hurt you. You have to write some more code, but you'll have code that's easier to reason about. The first thing is to communicate with immutable value objects. What do I mean by that? Something that looks a bit like this: it's basically a parameter object. We've taken the two parameters on our "do a transaction on a bank account, please" service, and we've wrapped them up into one object. It's got public finals. I love this, and I love it when people get upset that I'm making member variables public in Java. They're immutable, so why not make them public? We don't need all that getter and setter nonsense. If we communicate with these parameter objects that are immutable, we have code that's easier to reason about. I know that the thing downstream of me is not going to fiddle with the data I've given it. I've asked it to do a thing; I expect it to do exactly what I've said, not start changing the state of stuff. If your code works on the assumption that you're going to have shared memory, and that you'll be able to change the state of an object after it's been sent to somebody else, you can have a really bad time when it's distributed and you don't have shared memory, because you're going over a serialization hop.

And then after that, we get into the first really quite tricky issue, which is dealing with idempotency. When you've got a distributed system, when you've got any degree of uncertainty, you can either have at-most-once messaging or at-least-once messaging. No matter what a salesperson from a very large vendor might tell you, you cannot have exactly-once messaging. If I send a message and I don't hear anything back, what do I do? All right, well, I can't send it twice, because something bad would happen, so I'm just not going to try again; hopefully it got there. That's at-most-once. If I send a message and get no response, I'll try again. That's at-least-once. We can't have exactly-once messaging when there is some uncertainty as to whether a message is going to be received or not. So we have to deal with idempotency. We have to know that when an operation is invoked, it could be invoked several times, because the network might fail on the acknowledgment. And tied into that issue is also the idea of being deterministic. Idempotency: I can do the same thing twice and I get the same result. Determinism: there are no random processes in this.

Now, where's Sushant? Ah, there he is. Did you hear me from outside? Cool. I was working with a customer last week, and we were working through an example where we were basically emulating a multi-DC approach to eventual consistency. We realized that when we had multiple databases, multiple DCs, we couldn't do things like have auto-incrementing IDs, because you've got two different databases: that's one bit of state that would need to be consistent between the two, and we need availability. So one of the ways we can deal with idempotency and with deterministic behavior is to identify a transaction as close to the client as possible. To take a trivial example, when a client asks you to create an account, if the client decides the ID, or tags that request with some kind of unique identifier, a UUID, then you can tell if you've received it twice: have I seen this message before? Yes, I have, so I won't do it again. If you don't have any way of doing that, how do you know they don't want to create two accounts? If we tag things as close as possible to the client, that will make it much easier to deduplicate these messages as they go through the system.
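The slide itself isn't reproduced in this transcript, but the object is roughly this shape: a minimal sketch with hypothetical names (AccountDebitRequested, DebitHandler), showing the public final fields and a client-supplied UUID that makes deduplication possible.

```java
import java.math.BigDecimal;
import java.util.Set;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

// Immutable parameter object: public finals, no setters, nothing to fiddle with downstream.
final class AccountDebitRequested {
    public final UUID requestId;    // chosen as close to the client as possible
    public final String accountId;
    public final BigDecimal amount;

    AccountDebitRequested(UUID requestId, String accountId, BigDecimal amount) {
        this.requestId = requestId;
        this.accountId = accountId;
        this.amount = amount;
    }
}

// Naive in-memory deduplication: have I seen this message before?
final class DebitHandler {
    private final Set<UUID> seen = ConcurrentHashMap.newKeySet();

    void handle(AccountDebitRequested request) {
        if (!seen.add(request.requestId)) {
            // At-least-once delivery means the same request may arrive twice;
            // because the client tagged it, we can safely ignore the duplicate.
            return;
        }
        // ...do the business logic exactly once per requestId...
    }
}
```

A real system would keep the seen IDs somewhere durable rather than in memory, but the principle is the same: the identity is decided by the client, not generated downstream.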
So in this example, which is very colorful, we've changed from our original controller, the one creating an account. Here, the client is specifying the ID. Also, you can see that we're now using a parameter object: rather than passing through just the ID, we've wrapped it up in a parameter object. That's going to make it easier to turn into an event-driven system later on.

So, speaking of events: in an event-driven system, we communicate with events, immutable things that have happened in the past and are therefore incontrovertible. Our parameter objects were immutable, and this is one of the reasons why. We can say that a thing happened, and no one's going to fiddle with it later. You'll notice in the previous example that the parameter object was called account-create-requested. That's an event. It's a thing that happened. We've given it a name, so it makes more sense when we replay these in history. So we communicate with events, but that also means that we're not directly invoking a downstream system. When our domain service wants to save something, instead of calling the data service and saying "data service, please save this record", we instead emit an event to whoever might be listening, saying that somebody asked for an account to be saved. So we decouple our systems in that way. One of the downsides to this, and the thing you'll end up having to change loads of code because of, is that you can't have return values. Event emission is fire and forget. If we're in a distributed system and I send an event saying somebody wanted to save an account, and then the data service sends back an event saying "I created an account", I don't know if I'm going to be the one receiving that. It could be another instance of the domain service. So we can't expect return values; we'd have to block, and all sorts of other things. So our code ends up looking more like this. We've got an example here where we're just doing a post to some event bus. We don't know whether it's going to work; that's for somebody else to figure out. They will emit an event when that has happened.

When we do this, and we can't get a return value immediately because we're asynchronous, and we don't know if we're going to be the instance that receives the response, what if I need something? What if, when somebody asks me, the domain service, to create an account or update an account, I need to emit another event saying: yep, it was done, I debited 30 pounds, the resulting balance was 100 pounds, and this is the account number? What if I need to emit something out to the rest of the world saying that I did my job? Well, if I'm not the one to receive the reply, how is another domain service going to know how to do that? We then get into the realms of event-carried state transfer. The idea here is that anything somebody might need to know in the causal chain of events we want to trigger has to be carried through with those events. We can't rely on me holding some state, as in: I fired off an event to save this record in the database, and when I hear back, I'll tell everyone else that that particular event was completed. We can't do that, because I might not be the one to receive it.
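As a rough sketch of what this looks like in code (the EventBus abstraction and the class names here are hypothetical, not a particular library): the domain service emits an event instead of calling the data service, returns nothing, and carries along the extra state that downstream listeners will need later.

```java
import java.math.BigDecimal;

// Something that can publish an event to whoever might be listening: fire and forget.
interface EventBus {
    void publish(Object event);
}

// The event carries more than the data tier strictly needs, so that whichever
// instance hears the follow-up can still report what happened.
final class BalanceUpdateRequested {
    public final String accountId;
    public final BigDecimal newBalance;
    public final BigDecimal amountDebited; // carried along purely so it can be echoed back later

    BalanceUpdateRequested(String accountId, BigDecimal newBalance, BigDecimal amountDebited) {
        this.accountId = accountId;
        this.newBalance = newBalance;
        this.amountDebited = amountDebited;
    }
}

final class AccountDomainService {
    private final EventBus bus;

    AccountDomainService(EventBus bus) {
        this.bus = bus;
    }

    // No return value: we emit the event and somebody else picks it up.
    // currentBalance arrived on the event that triggered this call, so we hold no state ourselves.
    void debit(String accountId, BigDecimal currentBalance, BigDecimal amount) {
        BigDecimal newBalance = currentBalance.subtract(amount);
        if (newBalance.signum() < 0) {
            throw new IllegalStateException("insufficient funds"); // validation lives here, not in the data tier
        }
        bus.publish(new BalanceUpdateRequested(accountId, newBalance, amount));
    }
}
```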
So an example of this would be where we're doing a debit on an account. The domain service receives a request saying: can we take 30 pounds, or euros, or Swiss francs, or dollars, or bitcoins, out of account one? The domain service checks the balance and goes: yep, they've got enough money, I'm happy for that to happen. It does the business logic, the validation, and then tells the data service: set the balance of account one to 70 pounds. The data service responds, going: yep, I've updated the balance of account one. But the domain service that's listening to the "data has been written" events doesn't know how to respond. It hasn't got the information it needs, because in that flow of events, data was lost. So instead, we need to pass through more state. We need to pass through the whole chain of stuff. Here, even though the data service doesn't need to know the amount by which the balance changed (all it needs to know is the final result, which is 70 pounds), we pass through the amount it was changed by, so it can communicate it back up the stack, and then whichever domain service receives that event can report correctly: yes, an account was debited, and here's all the information you need. That's event-carried state transfer. It allows different actors in the system to respond to whichever messages they receive, without having to hold or share state directly.

So we've now kind of prepared ourselves for distribution. And these things aren't necessarily massive changes. The separation into those four tiers: not a big deal, and I would recommend you do that anyway. Immutable parameter objects: definitely do that. More immutability always means code that's easier to reason about. Event-driven: OK, maybe this is getting a little bit inconvenient now. This isn't how I would choose to write systems quickly, but it will future-proof us. We're now going to get into the realms of things that are necessary for dealing with eventual consistency in a distributed system, and that are more work. I said that using events and an event-driven architecture was, for this discussion, a means to an end. It's a means to the end of being able to replay events. Event sourcing is the idea that we keep track of every single one of those events emitted by any actor in the system. We store them, and then when we want to build our view of the world, we replay them from beginning to end. This matters for us because it gives us a means to deal with eventual consistency, which we'll get to in a bit, but it also gives you some other benefits as well.

So our event list might look something like this. We've created an account. Somebody asked to save it. The data service said: yeah, I've saved that. And then the domain service above it says: yes, an account was created. So on and so forth. We get this log of all the events, of everything that has changed. This gives us benefits even before we worry about distribution and eventual consistency, because if there's a bug, we can replay everything that happened during that day. At the financial company that I was talking about at the beginning of the talk, we used to do this. If there was weird behavior in one of the systems, because every single message had been recorded, we could replay a whole day's trading activity and see where the problem lay. It's also useful for modeling and for testing. If we want to change our business logic and see, I don't know, if we're a bank and we charge someone 10% when they go overdrawn, what would happen if we charged them 20%? We can change our implementation, run the event log through it, and see how things would have changed. So it gives you modeling capability, as well as the ability to debug what would otherwise appear to be one-off issues.
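A minimal sketch of what replaying looks like, with hypothetical event types; the current balance is never stored directly, it's just a fold over the history.

```java
import java.math.BigDecimal;
import java.util.List;

// Recorded events: immutable things that happened in the past (hypothetical types).
abstract class AccountEvent {
    public final String accountId;

    AccountEvent(String accountId) {
        this.accountId = accountId;
    }
}

final class AccountCredited extends AccountEvent {
    public final BigDecimal amount;

    AccountCredited(String accountId, BigDecimal amount) {
        super(accountId);
        this.amount = amount;
    }
}

final class AccountDebited extends AccountEvent {
    public final BigDecimal amount;

    AccountDebited(String accountId, BigDecimal amount) {
        super(accountId);
        this.amount = amount;
    }
}

final class AccountProjection {
    // Build the current view of the world by replaying history, oldest event first.
    static BigDecimal balanceOf(String accountId, List<AccountEvent> history) {
        BigDecimal balance = BigDecimal.ZERO;
        for (AccountEvent event : history) {
            if (!event.accountId.equals(accountId)) {
                continue;
            }
            if (event instanceof AccountCredited) {
                balance = balance.add(((AccountCredited) event).amount);
            } else if (event instanceof AccountDebited) {
                balance = balance.subtract(((AccountDebited) event).amount);
            }
        }
        return balance;
    }
}
```

The modeling benefit described above is just a different projection: swap in one that applies a 20% overdraft charge instead of 10%, replay the same log, and compare the results.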
Once we've got that list of events, we can start thinking about how we might merge those histories of events together from different databases. So here's an example that I was working through with Sushant last week, so he definitely deserves some of the credit for this. We've got two data centers with two copies of our app stack; whether it's a monolith or those apps are actually a whole bunch of microservices doesn't matter for this example. We get a request in on the left-hand data center saying: let's add 100 pounds to account number one. That gets stored into the left database, and by some means those two databases can talk to each other, so that information gets shared. As for the actual implementation of that, pretend it's Cassandra, I don't know; it doesn't really matter. But conceptually, that's what's happening when there is no network partition.

Now let's introduce a network partition. We've added 100 pounds into our account, and now the user wants to take some out, but there's a network partition and our apps and our databases can't talk to each other. So on the left side, we think we've got 100 pounds. On the right side, we think we've got 100 pounds. On the left side, the user asks to take out 70 pounds. We go to the left database: yep, you've got enough money for that, great. I'm going to take out 70 pounds and set your balance to be 30 pounds, great. That state can't be shared, because we've got a network partition, so it's isolated in one DC. The same user, maybe a naughty hacker who realizes there's a network partition, then runs a second request, routing through to the second data center: ah, I'd like to take out 60 pounds now, please. The right-hand database still thinks we have 100 pounds, because it hasn't heard any different. So it does its validation checks, sets the balance to 40, and it can't share that information yet. Then the network partition gets resolved, and these two databases start talking to each other again. What happens? Well, we've got a situation like this, where we've got some common history up at the top, and then we've got these events down here and those events over there. How are we going to merge those together? How do we even know what order they happened in? When we're recording these, we could use timestamps, but time is entirely relative. Time does not flow the same for two observers in different parts of the universe, depending on what speed they're travelling at, all those kinds of things, let alone when you introduce NTP and clock drift and leap seconds. So how are we going to order those?

The answer is vector clocks. Hands up if you understand vector clocks. So Sushant is putting his hand up, after we muddled our way through this last week. There are two possible answers to this, really. One solution is vector clocks. The other is that we could tag each event with the last event that we saw: if every event gets a unique identifier, then we could always chain our events and say, well, this one was caused by that one, which was caused by that one. Vector clocks are basically a more efficient way of doing that. Each actor in our system, every time it sees or acts on an event relating to a particular account, tags that event with its identifier and a count. So this is the left data center, and it's saying: this is the first time I've had an event that acted on this account. The next event happens; this is also the left data center. Our right data center hasn't got involved, hasn't seen any of this yet. And then our third event happens.
So we've ticked up three times that we've acted on this account in the left data center. Then we get our network partition, and we get one branch of history over here and another branch of history over there. We can see the numbers tick up for the left data center over here, but over here, L gets stuck on three while R increments. Vector clocks can determine whether things are causally related: whether an event is a descendant of another event, caused by another event. An event is a descendant of another if every element in its clock is greater than or equal to the corresponding element in the ancestor's clock. So for example, we can tell that this is not a descendant of that, because three is not greater than four, even though one is greater than zero; so we can tell here that we've got a conflict. We can tell that this is a descendant of that, because four is greater than three and zero is equal to zero. And we can tell that this is a descendant of that, because three is the same as three and one is greater than zero. This allows us to build a causal history. We can track, through all of the events in our system, what happened, why, and in which logical order. We don't know what time they happened at, and we can't tell whether this happened before or after that, but we can tell that history has diverged and that we've got two parallel branches of transactions.

So if we can track those histories, then we can detect when they diverge. That means that when we're doing event sourcing and replaying state (remember, with event sourcing we build our view of the world by replaying each one of those events), when we detect a branch, we can do something about it. If you've just got two databases that say the account balance is 30 and the account balance is 40, you've got no way of intelligently merging those together; those are just conflicting bits of information. When you've got conflicting event histories, you can do something about it. You can see why they diverged, and you can see when they first started diverging. That means you can do something particular to that type of event: you know what event it was that caused the branching, and you can implement your own logic there. You can have an event-specific divergence handler that will take remedial action. Remember at the beginning, I was saying that transactions are about preventing bad things from happening. In eventually-consistent land, you have to take action to correct the bad things that have happened. Maybe in this instance, we attempt to do both branches of history. Maybe they did have enough money: if there were two conflicting withdrawals and they did have enough money for both, that would be fine. Maybe we cancel both. Maybe we do one and not the other. Maybe we do something completely different. If, in processing that event, we had talked to an external system, we could send an opposite message; we could do the equivalent of a git revert. Only if we have that causal history and a list of all the events can we do something sensible when we detect a conflict. If you're going to have a distributed system, and you're going to have microservices in different places for resiliency or scaling, you're going to have to deal with this.
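A minimal sketch of that comparison rule, with hypothetical names; each clock holds one counter per actor (say, the left and the right data center), and a conflict is when neither clock dominates the other.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

final class VectorClock {
    // e.g. {"left-dc": 3, "right-dc": 1}: one counter per actor that has touched this account
    public final Map<String, Integer> counters;

    VectorClock(Map<String, Integer> counters) {
        this.counters = new HashMap<>(counters);
    }

    // Record that this actor has acted on the account one more time.
    VectorClock tick(String actor) {
        Map<String, Integer> next = new HashMap<>(counters);
        next.merge(actor, 1, Integer::sum);
        return new VectorClock(next);
    }

    // True if this clock descends from (or equals) the other: every component
    // here is greater than or equal to the corresponding component there.
    boolean descendsFrom(VectorClock other) {
        Set<String> actors = new HashSet<>(counters.keySet());
        actors.addAll(other.counters.keySet());
        for (String actor : actors) {
            if (counters.getOrDefault(actor, 0) < other.counters.getOrDefault(actor, 0)) {
                return false;
            }
        }
        return true;
    }

    // Neither descends from the other: the histories have branched and need remediation.
    static boolean isConflict(VectorClock a, VectorClock b) {
        return !a.descendsFrom(b) && !b.descendsFrom(a);
    }
}
```

With the example above, a clock of left=4, right=0 and a clock of left=3, right=1 fail the check in both directions, so isConflict reports a branch; left=4, right=0 against left=3, right=0 passes one way, so it's an ordinary descendant.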
So that was a lot of information for half an hour. The Engineer Better guide to future-proofing your monoliths: I would have loved it if there'd been fewer things on this slide, but unfortunately it's not necessarily that simple. You can do that separation into the four tiers of gateway, orchestration, domain, and data; start doing that now if you're not already doing it. Separate your writes from your reads, and make sure that read operations don't have any side effects, so you can reason about them, so you can do performance optimization and caching. If you then move towards using immutable parameter objects, you're a step closer towards being able to event source, and you also have code that's easier to reason about, because your classes are immutable. You can then move to an asynchronous, event-driven architecture, and once you've got those events flowing through the system, you can start recording them. If you tag them with vector clocks, then you can assert a causal history, detect when history has diverged, and take remedial steps when you realize that there is a conflict. And that's it. So hopefully that wasn't completely baffling. Any questions? Yes?

You lost me somewhere when you started to talk about the microservices communicating through events, and I wasn't totally clear why that is. If I have microservices, they can talk over HTTP, and that's request-response, so they get a response. So why should I use events?

So the events allow you to do that event sourcing and to build up a meaningful, semantic history of what's changed in your system. They can still do that over HTTP and REST; HTTP is just a transport mechanism. But something in your app needs to be able to tell that a thing has happened, and we're going to record that thing. And then, in order to tell the rest of the world, in the old world you might do that with a message broker: you emit an event, you do some TCP, which is acknowledged synchronously, the message broker receives it, you know it's received it, and then it works out where it needs to go. In a microservices world, it's similar, but the thing that's working out where your message needs to go is probably inside your app, because you're doing some service discovery or something like that. So a layer between your application logic and the knowledge of where to call can perform that same job. The important part about the events is being able to record that history of what has changed and why. Does that make sense?

So it goes in the direction of preparing for resilience in a really widely distributed system. But if I have a user sitting there in front of the screen, and the user presses a button and wants to save, I would reply with: well, the save didn't work. That is probably the very naive, easy case.

So the question was: what would you do if somebody is trying to save something, a user clicks the save button, and it fails? Failed in which way?

Yeah, it's failing because it can't talk to the database, or two of the microservices can't communicate at this very moment. Just propagate the error up, bubble it up, and show the user: well, that didn't work.

Yeah, so the client can time out, or you can emit failure events. If a handler has failed, if the receiver of a message has failed to process it correctly, you would then emit an event that says this thing failed, and it would probably include the original event, the causing event, in there. Then somebody can listen to that and report it back.

Any more? Or are we... I think we're over time, aren't we? Anyone know when the next talk starts? Is it now? Sorry? OK, yes, we are out of time. It's four minutes past four. Great, thank you very much for that.
I hope it wasn't completely confusing, and I'm happy to answer questions later.