Let's get started. Thanks for coming. We're so excited to be here. I hope you're enjoying this conference as much as we are right now. My name is Michele, and this is Tim. We are with SAP Cloud Platform, and we are part of the Cloud Platform Performance team. Our team has the pretty ambitious mission of taking care of performance throughout our whole Cloud Platform, from the hardware basically all the way up to the applications. Today, we're here to share a little of the knowledge with you that we built over the past year, building reference applications as well as helping others at our company make their applications increasingly awesome. So, how many of you have seen the keynote from Björn Goerke? Hands up. Nice. So by now you understand SAP people are lovely dorks. And bonus points if you understand what the reference is from.

Today, we are taking you on a journey from a monolith to a nicely groomed Cloud Foundry application, modular, microservice-based, that leverages the platform in the right way. And before we take on the journey, we need to share something with you. We may have slightly misrepresented what the talk would be about, because we have only 30 minutes, and so we will not be able to show you as much code as we would like. Moreover, most of these things happened in one particular story, but for the sake of narrative, and of giving you as much information as possible, we are also going to merge in some learnings from other projects. So take it as a journey. All these things have happened to someone, somewhere. Most of these things have happened to a particular someone, probably you, or us, anyway.

Now, with applications, as they say, if you're smart, you start with a monolith, of course. And there are so many good reasons to do so: it's just super easy to start coding your application, you have a shorter time to market, it's easier to debug. You basically don't have to deal with all the intricacies of distributed systems. And of course there are a thousand more good reasons we could mention here. However, we want to make an addition to that statement, which is: monitor your runtime in order to find out when to splinter out to microservices, before it's too late. Now, we have a sample application for this session, which is just a super simple web application that deals with time series data. We have a POST endpoint where you can upload time series data, and of course a GET endpoint where you can retrieve time series data by ID. Right now, this application is implemented as just one single Cloud Foundry application. And that kind of architecture already gives us the possibility to scale it horizontally, which is awesome. And it can be scaled vertically.

Now, I already hear some of you thinking: well, this doesn't really look like a fancy microservice architecture at all. Turns out you're actually right. This is something that you see sometimes, where people assume that just by the power of cf push, your code, maybe your legacy application, immediately becomes a microservice. That's not how it works. It actually takes quite some effort. And today we're going to discuss exactly that: how to get your application to a state where it can proudly be called a microservice. By the way, if this doesn't end up on t-shirts by the end of the day, we're doing something wrong around here. But, Michele, one also has to mention that sometimes a monolith is just fine.
I mean, not every application has to be cloud scale, called a thousand times per second by every customer somewhere. Sometimes a monolith is just good enough. So don't be ashamed of your monoliths, as long as they work. But you need to understand when they work. And with cloud software, monitoring is the key. Have you ever heard that story about the tree falling in the forest? If there is no one there to listen, did it make a sound? With your transactions in the cloud, that's exactly what happens. If you don't monitor them, for you they practically did not happen. And when people start thinking in terms of monitoring, we all immediately think about how much CPU we are using, how much memory we are using, what the 95th percentile of the response time is. And that's all very good. But that's not necessarily the way your users actually evaluate your service, right? It's a little bit like: hey, the response was an error, but hey, it was blazingly fast and super memory efficient. Actually finding out what to monitor is surprisingly hard. But there's something to the rescue, which is the service level agreement. You might say: this is boring, corporate stuff, what are we doing here? But that's only half of the story.

Now, keeping it super simple, let's look at an example service level agreement for one of our APIs, the POST API. It comes with a quota per customer, a max total requests per second, and things like that. But what does it really give us? If you look, for example, at the quota per customer as well as the max total requests per second, you most probably start to think about topics like rate limiting, and about how to enforce that, because you eventually have to enforce it in code. And if we start thinking about the maximum amount of data that customers are going to send us with one request, and that we are going to send back, then we start thinking about what kind of data we are supposed to use to test the application. And then we start wondering, for example: how fast does it have to be? And that gives us the targets against which we need to certify our application in load tests. And remember: under-promise and over-deliver, that's how you make customers happy. That's key.

Now, when you look at topics like failure rate and target availability, you are most probably thinking about your architecture in general. You eventually start to ask yourself: if I want to hit that target availability and max failure rate, is my monolithic application still enough? Is it still what we're looking for? And by the way, since we're talking about target availability, let's throw another thing out there, Michele. So, how many of you have deployed applications on Cloud Foundry with just one instance? Hands up. We all have, right? It's the monolith. It's the one true monolith. Now, when you're going after the magic four nines of availability, 99.99%, that's the holy grail, right? That boils down to fundamentally 8.6 seconds of downtime per day. And it's little. It's really little. It is so little that it takes me longer to tell you how little it is than what it actually is. And containers crash. We have heard it over and over again in the talks at this conference. Containers crash. Cloud Foundry is very good at bringing them back. There are many reasons why containers crash, and later we'll discuss a couple of them.
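Speaking of the four nines: a quick back-of-the-envelope sketch of that downtime budget, just for illustration.

```java
// Back-of-the-envelope downtime budget for "four nines" availability.
public class DowntimeBudget {
    public static void main(String[] args) {
        double availability = 0.9999;               // 99.99%
        int secondsPerDay = 24 * 60 * 60;           // 86,400 seconds
        double budgetPerDay = secondsPerDay * (1 - availability);
        System.out.printf("Downtime budget: %.2f seconds per day%n", budgetPerDay);
        // Prints: Downtime budget: 8.64 seconds per day
        // A single container restart can easily eat a whole day's budget.
    }
}
```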
But it takes time to resurrect containers. Cloud Foundry has to notice, your software has to spin up, and maybe it's doing some initialization. So don't run singletons. Design your application for horizontal scalability, and make sure that there is always something there to answer the requests of your clients. And since we are talking about containers that crash: one other thing that we all do immediately when we start a monolith is that we hide state in it. Maybe we're not thinking about it, but imagine the case where you get a request from an HTTP client and you send back a 202 Accepted, which is a promise that says: eventually you'll get something out of this, not right now, but eventually. And then you take that request, you put it on an executor, and the executor works on it. Well, if the container goes down like a brick, the task that you took on on behalf of the customer is gone. And you broke a promise. And that is the kind of trust that is very hard to regain once you lose someone else's jobs. And it's not only about asynchronous tasks. There's much more that we tend not to think much about. At any rate, whenever we are thinking in terms of state: in Cloud Foundry, state goes into backing services. It doesn't go in containers. You cannot trust them. Your state is forfeit when the container goes. In particular, your session data is gone with the container. People tend to put those things in Redis, for example, because it makes sense: you don't want to pin a customer to one particular application server, and it gets ugly when that server is not there anymore. And do not use the file system of the container. Don't trust it. It's ephemeral. Your container crashes, the file system goes, unless it's backed by a volume service, meaning that there is actually a service behind it that will persist the state somewhere else.

I mean, that's all fun and games, but let's finally go back to the title of this talk: splitting that monolith. Now, when it comes to splitting an application, let's just stay here for another second. People smarter than us have been saying for ages, basically at least for two years, that splitting an application has to be very well controlled. Apparently, splitting an application is as hard as splitting the atom, and the outcome can be equally devastating if done wrong. By the way, we are from the Cloud Platform Performance team. We may be a little biased towards thinking in terms of making stuff fast and reliable. So you may have heard other criteria for splitting microservices from other people. We're going to give you a set of criteria and a mindset focused on making stuff work well. There are also other reasons why you would split, maybe at different granularities, but take these into account as well.

Now, let's finally look at asynchronous processing, for example, in order to split out a microservice. As you have seen already, our web application comes with a POST endpoint to upload time series data. What we do in order to split that API a bit is, rather than accepting a request and applying algorithms to it right away, we just put that request into a queue, for example Redis or a RabbitMQ queue, whatever you feel comfortable with. So we queue the task in a backing service. Eventually, a dedicated worker node starts fetching and actually processing those requests from the queue.
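A minimal sketch of that split, with an in-memory queue standing in for Redis or RabbitMQ; all class and method names here are ours, purely for illustration. In production the queue must be a backing service, or the tasks die with the container.

```java
import java.util.UUID;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Sketch only: the in-process queue stands in for a real backing service.
public class IngestionSketch {
    record Task(String id, String payload) {}

    static final BlockingQueue<Task> QUEUE = new LinkedBlockingQueue<>();

    // Web API side: accept the upload, enqueue it, return immediately.
    static String handlePost(String timeSeriesPayload) throws InterruptedException {
        String id = UUID.randomUUID().toString();
        QUEUE.put(new Task(id, timeSeriesPayload));
        // Respond 202 Accepted with "Location: /timeseries/" + id
        return id;
    }

    // Worker side: runs as a separate process type, scaled independently.
    static void workerLoop() throws InterruptedException {
        while (true) {
            Task task = QUEUE.take();   // blocks until a task is available
            process(task);              // do the heavy lifting
            // store the result in the database for the GET endpoint
        }
    }

    static void process(Task task) { /* crunch the time series */ }
}
```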
From a client perspective, this, of course, is a breaking change, because it's now a 202 Accepted and you just get a Location header. And by its nature, by the way, the worker node comes with no web API and, of course, no route. Now, once the worker is done processing the request, it stores the result in the database, as it used to. The client, of course, can use the same GET endpoint in order to fetch the time series data, using polling or server-sent events. And when we look at that now, we have already gained some awesome advantages. For example, we got rid of the whole state in the web API, and you can be pretty confident that once you return a 202 Accepted to a client, the task is in the queue and eventually the worker fires up and works on it. So you can be pretty sure that you will have no loss after you accept it. Another awesome advantage is that there's not much going on in the web API anymore, since it's quite dumb now, which is very good. Also, since we have a dedicated worker node, we can scale it individually by its process type, which is worker.

Now, we have the worker that is taking jobs from the queue and processing them. And what is that? It's state. If the worker deletes the task from the queue when it starts processing it, and the worker goes down, the task is gone. So remember, when you're using this type of architectural pattern, please think in terms of reservations and optimistic locking; there are several different mechanisms you can use to make sure that jobs don't disappear until they're really done. Think of them a little bit like small transactions, because you do not want to lose jobs.

And since we're talking about making stuff nicer and nicer: something that we tend to see quite a lot is that people get a bit old-fashioned with HTTP APIs. For example, when you upload something, you send a 201 Created, that's very nice. When you want to retrieve something, you get a 200 OK with the entity in the body of the response. But today we have a lot of interesting toys. We have WebSockets. We have server-sent events, which by the way are really nice for making APIs for web applications that need to be updated as the data comes in, where you can populate, for example, charts as you retrieve tons of data from the back end. Something that happens when you go a little less traditional in the way you do HTTP is that you move away from the normal responses and into HTTP chunking. With HTTP chunking, what you're saying is that you start streaming the response out before you know exactly what it will contain. You're going to build the response, very likely, as you stream it out. What does that mean? It means that your status line, which says, for example, 200 OK, goes out before you have done all the work to ensure that the response will actually be successful. So it can be the case that, for example, when you're streaming stuff out of a database, you hit an error halfway through. And error handling with HTTP chunking is not really so nice, but modern libraries can actually handle that. They will notice that there is an error; they will not know exactly what went wrong, but they will know that there was an error. And it gets very efficient.
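Going back to the worker for a second: a minimal sketch of that reservation idea. The ReliableQueue interface and its method names are ours, purely for illustration; with Redis you would typically build the reserve step on RPOPLPUSH, which atomically moves the task from a pending list to an in-progress list.

```java
// Sketch of "reserve, then acknowledge", so a crashed worker cannot lose a job.
public class ReservationSketch {
    record Task(String id) {}

    interface ReliableQueue {
        Task reserve();             // move to in-progress; do NOT delete yet
        void acknowledge(Task t);   // delete only once processing succeeded
        void requeueExpired();      // return timed-out reservations to pending
    }

    static void workerLoop(ReliableQueue queue) {
        while (true) {
            Task task = queue.reserve();
            try {
                process(task);
                queue.acknowledge(task);   // the "commit" of the small transaction
            } catch (Exception e) {
                // No acknowledge: the reservation times out and the task
                // becomes visible to another worker instead of vanishing.
            }
        }
    }

    static void process(Task task) { /* crunch the time series */ }
}
```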
And we're going to come back later to why streaming is key. Finally, we can of course show the compulsory 12-factor app slide, because there's no real microservice talk without at least one pointer to that. And pretty sure it was in the call for papers. True, true. So, as you've seen, we're treating our backing services as attached resources. Multiple instances can connect to them, which is good. We execute our application as one or more stateless processes; now we even got rid of the state in the JVM for the requests coming in. And we can scale out our web API, as well as the worker nodes, individually by their process type, which is awesome. However, we have gone from a very traditional monolith to something that's asynchronous. We had a change in the REST API, because now we return a 202 Accepted instead of a 201 Created, which means that we need to touch up the service level agreement. We are executing jobs asynchronously, which means we have to give a promise to the customers about how long that is going to take. And it's a promise to them as much as it is to yourself, because you should actually test for this under circumstances as realistic as possible, and potentially beyond that.

Now that's cool. So now we have an even more scalable API, which is great. And eventually it gets too much again. Containers come and go, and, for example, your web API may be overloaded again, or you have a bug. I mean, yeah, sometimes developers create bugs. You may be losing requests for customers because containers crash too much, and those requests go down with them before you manage to put the tasks into the queue. Or you face memory issues in your web API; that might happen from time to time. By the way, I think it's the second time now we're talking about memory issues. I think it's time. In the face of disaster, we have a couple of toys we want to share with you, which we open sourced, and which we hope will make your life troubleshooting your applications much easier.

The first one is something I really like. It's a Java plugin for the cf CLI, which is super useful. It's just a plugin that you install as you're used to, and it enables you to run cf java heap-dump with the application name, for example, to fetch a heap dump out of your container. The same applies to thread dumps, and that's awesome. Moreover, you are not always there when you want a heap dump; sometimes you need a heap dump when you're not there to watch. And from Ben Hale, we have learned that there is the jvmkill mechanism coming with the Java buildpack, where, when the JVM runs out of memory, a heap dump is taken and saved for you, for example, on a volume. But sometimes you actually want to start troubleshooting before things get too dire. So we have open sourced a small Java agent, based on the java.lang.instrument API, that will monitor how the memory usage of your application is doing and, depending on conditions that you configure, for example "the old generation has increased by more than 50% in the last two minutes", take heap dumps for you using the same mechanisms as jvmkill. And we are working to integrate it into the community Java buildpack in a way where you will not notice the difference between the two mechanisms. You have heap dumps when you want them, you have heap dumps when it crashes, and you have heap dumps when you think you will need them, because you specify the conditions in advance.
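The kind of condition such an agent watches can be sketched with plain JMX; this is a simplified illustration of the mechanism, not the agent's actual code.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;

// Watch the old generation and react when usage crosses a threshold.
public class OldGenWatcher {
    public static void main(String[] args) throws InterruptedException {
        MemoryPoolMXBean oldGen = ManagementFactory.getMemoryPoolMXBeans().stream()
                .filter(p -> p.getType() == MemoryType.HEAP)
                .filter(p -> p.getName().contains("Old"))   // pool name varies by GC
                .findFirst()
                .orElseThrow();
        while (true) {
            long used = oldGen.getUsage().getUsed();
            long max = oldGen.getUsage().getMax();
            if (max > 0 && (double) used / max > 0.9) {
                // In the real agent, this is where a heap dump would be triggered.
                System.out.println("Old gen above 90%: time for a heap dump");
            }
            Thread.sleep(2_000);
        }
    }
}
```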
Now, back to our REST API. We have split, in our monolith, the web API away from the processing tasks by putting a backing service in between. Is it good enough? No. Or maybe yes, for a certain amount of time, but at some point the load is too much. So what's the problem with our web API? It's another monolith. It's actually doing very different things. They are so different, in fact, that they are different types of workloads: take the data and put it in the queue for processing, or go to the database, fetch the data, and give it to the client. So we are actually using different dependencies; we are consuming different services. And when you start running these APIs, in particular under load, you'll see that, necessarily, they behave differently in terms of how much CPU they need to process requests, how much memory, how long the client connections stay open, how vulnerable they are to dependencies going down. So what do we do? We split again. Of course we do. So now we split the web API into several different parts. In the picture, we now have a dedicated ingestion API, which just deals with the uploading part, and we have a query API. And to illustrate it even more, we split up the query API once again in order to distinguish between small queries and large queries. And since we're talking about large queries, there is something really important that you should always consider when it comes to querying data, which is streaming. Tim, have you ever seen what happens when you do an OData query asking for the top 50,000 resources of a type against a JVM, and you're not streaming? Gory. It gets Hellraiser kind of gory. It's really disgusting. Now, when you look at that picture, you will most probably notice that even when, for example, our query API goes down like a brick for various reasons, we still have our ingestion API up and running, which is good. So you can even distinguish between business-critical endpoints and other endpoints, which Sam Newman refers to as splitting workloads, and that pretty much hits the point.

Now, by splitting our API into different Cloud Foundry applications, we need to hide this change from our users. How do we do that? With an edge router. What is an edge router? It's fundamentally a proxy. There is a bit of confusion in terms of terminology; imagine it as a sort of very flexible proxy that can split load really well. And depending on what you're going to do in your application and how you want to split it, you'll need different capabilities in the edge router. For example, let's talk about REST APIs. We have a /timeseries path, and depending on the HTTP verb, GET or POST, you actually want to hit different applications. So we already know we have to split by path and by verb. Maybe we are doing server-sent events, and we are also doing more traditional HTTP APIs, and those behave very differently, so maybe we want to put the server-sent events onto different nodes that scale separately. Now our edge router also needs to be able to split by media type. And there are other things, for example query parameters, when you're using protocols like OData and you're putting a lot of logic into the queries.
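A minimal sketch of what such routing rules amount to; the request and backend names here are ours, purely for illustration. Real edge routers express this in configuration rather than code.

```java
import java.util.List;
import java.util.function.Predicate;

// Conceptual edge router: match a request on path, verb, and media type,
// and forward it to the right backend application.
public class EdgeRouterSketch {
    record Request(String method, String path, String accept) {}
    record Route(Predicate<Request> matches, String backend) {}

    static final List<Route> ROUTES = List.of(
        new Route(r -> r.method().equals("POST") && r.path().startsWith("/timeseries"),
                  "ingestion-api"),
        new Route(r -> r.method().equals("GET") && "text/event-stream".equals(r.accept()),
                  "query-api-streaming"),   // server-sent events scale separately
        new Route(r -> r.method().equals("GET") && r.path().startsWith("/timeseries"),
                  "query-api")
    );

    static String routeFor(Request r) {
        return ROUTES.stream()
                .filter(route -> route.matches().test(r))
                .map(Route::backend)
                .findFirst()
                .orElse("default-app");
    }
}
```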
And this is something very important for the cloud, and that's why we fundamentally have a lot of proxies. So much so that, in the old times, we used to say that every web programmer writes their own text editor; today, in the cloud, everyone has their own proxy. Or at least it feels like that. We even have our own inside SAP. We call it the app router, it does this type of splitting, and it works particularly well for our SAP Cloud Platform. But there are many possibilities out there, and you should really pick the edge router that is best for you. And since we are talking about splitting APIs, let's recap the kinds of things we should consider when we want to split a web API so that the parts work really well. It's about what we need to invest in terms of resources to serve requests. It's about which dependencies we need to consume, which services we need to consume, to serve those requests. It's about what we want to allow to fail independently. It's about how long we are going to hold the connections for the clients, for example the gorouter and keep-alive settings, or simply how many threads your application server needs to have to keep everything going. Which brings us more or less to the end of the monolith part. Yeah, that's basically it for the splitting part.

And since we're here already and have some minutes left, I would say we hijack the session a bit in order to share some pitfalls that we've seen around. And I personally really love pitfalls; I love seeing others tell about their pitfalls. The first one that we would like to share with you is something we usually refer to as the thread snare. I mean, every pitfall needs an awesome name. Thread snare. That's cool. So when you look at the picture, we just have a simple container, and you have a JVM running in your container. Of course, you have all the different memory areas in your JVM; Ben Hale yesterday did an awesome job of explaining that in detail. Thanks for that. Now, the problem that we see very frequently is that people get careless about the number of threads. Wait, but Java does magic with threads, right? Yeah, but the problem with threads is that they cost memory. When you look at a super simple example, when you just create a cached thread pool, it comes with an uncontrolled number of threads; there's no upper limit there. And as said, every thread costs memory, about a megabyte by default right now. That is unrestrained memory consumption, and you might end up with something like that picture, which is not very beautiful in production. So what can you do to avoid that? It's usually a very good practice in this area to really know the upper limit on the threads in your application. So think about fixed thread pools, for example. Think about the app server threads; Tomcat, for example, uses up to 200 threads by default. Look at libraries; there are some very thread-hungry libraries out there. Fork-join pools, if you use them. And what is very important is that, once you know your potential upper limit, you really let the memory calculator know about it, to give Cloud Foundry the chance to consider it when calculating your memory settings during staging. That's super important. I think there is even some documentation out there; we contributed it to the open source documentation in order to really explain this in detail and help you out. So we have established a trend by now: it's about memory efficiency. Too many threads eat up memory, and it's not even from the heap; it's outside, and that is what eventually causes Garden to kill your containers.
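In code, the snare and the fix are each a one-liner; a minimal illustration, where the pool size of 16 is just an example you would size for your own workload.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// The thread snare in two lines. Each thread reserves stack space outside
// the heap (about 1 MB by default, tunable with -Xss).
public class ThreadSnare {
    // Unbounded: creates as many threads as there are concurrent tasks.
    // Under load this grows without limit and eats native memory.
    ExecutorService risky = Executors.newCachedThreadPool();

    // Bounded: a known upper limit you can report to the memory
    // calculator when sizing your container.
    ExecutorService bounded = Executors.newFixedThreadPool(16);
}
```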
But you have to be memory efficient also, and especially, in the heap. And there is a little something that we see distressingly often that we like to call death by list. Now, Java is a very convenient language. We have a lot of libraries; for example, we have the Java Persistence API. Those libraries are very convenient, and sometimes we get sloppy with them. For example, what we see over and over and over is that people just query the database and take whatever comes back. They take all those rows from the database, put them all in Java objects, slap them in a huge list, which cannot be garbage collected easily, actually not at all, then transform it into JSON all at once by handing it over to Gson or Jackson or whatever you're using, and then send it over to the client. And until you're done marshaling everything to JSON, your list stays in the heap. And maybe it's huge. Maybe you think it's not, but it's huge. That's the reason, for example, why Josh Long, the ridiculously talented person we have seen opening this track, sometimes says in his talks that we do JPA because we make bad life choices. There are ways to do sane JPA, using max-results settings and paging, but there are better ways to do this. For example, streams and observables: reactive programming.

What is the idea? The idea is that instead of getting all the data from the database, putting it all in a huge list, and then turning the huge list into a huge JSON, which is finally a huge string, we're going to process the data little by little. Maybe we fetch it in small chunks from the database using setFetchSize, instead of downloading half of MySQL into memory. And then we process it out. And if you remember, we were talking about HTTP chunking and how it enables you to do streaming. That's exactly the kind of thing you start doing here. As soon as a piece of data is ready to be put in front of the client, you send it out. You don't know in advance how many items there are or how big the response is going to be, but they will all get out there, and it is very efficient. At the moment, in Java, we have pretty good tools. In Java 8, we have Java streams, which allow us to process large amounts of data in, fundamentally, an event-based fashion. We have even better: we have reactive, where we have much more flexibility. If you're stuck in a pre-Java 8 world, which sometimes happens, you can, for example, use EclipseLink with scrollable cursors, where fundamentally you make your own small streams. And remember, setFetchSize is your friend.
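Here is a minimal sketch of that streaming idea in plain JDBC; the table, columns, and the writeRowAsJson helper are ours for illustration, and note that the exact effect of setFetchSize depends on your JDBC driver (MySQL, for example, needs special treatment to really stream).

```java
import java.io.Writer;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import javax.sql.DataSource;

// Stream rows out as they arrive, instead of building a giant list:
// the heap never holds the whole result set.
public class StreamingQuery {
    void streamTimeSeries(DataSource ds, Writer out) throws Exception {
        try (Connection conn = ds.getConnection();
             PreparedStatement stmt = conn.prepareStatement(
                     "SELECT id, ts, value FROM measurements")) {
            stmt.setFetchSize(100);   // fetch in small chunks, not all at once
            try (ResultSet rows = stmt.executeQuery()) {
                out.write("[");
                boolean first = true;
                while (rows.next()) {
                    if (!first) out.write(",");
                    writeRowAsJson(rows, out);   // marshal and flush row by row
                    first = false;
                }
                out.write("]");
            }
        }
    }

    void writeRowAsJson(ResultSet row, Writer out) throws Exception {
        out.write("{\"id\":\"" + row.getString("id")
                + "\",\"value\":" + row.getDouble("value") + "}");
    }
}
```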
Now, when we start comparing reactive programming and Java streams: how many of you have heard of reactive programming? RxJava, Reactor, awesome stuff, right? Nice. Very nice. So we have two fundamental differences. Streams are push-based: you subscribe to a stream, and the stream will run to the end, unless you do strange stuff like throwing exceptions at some point just to make it break. Reactive is based on a pull model: the consumer is asking, give me more, give me another item, give me one more. And the consumer can stop. And that's very important when you write, for example, your web APIs, because sometimes the client disconnects even when you're halfway through the response, and you want to just stop. Something also very important in terms of memory efficiency and using your resources well, which is fundamentally the basis of the cloud, is the idea of backpressure. Backpressure is the mechanism in reactive by which the consumer can tell the producer, for example the piece of code that gets data from the database and hands it over to be marshaled into JSON, that it is overwhelmed: it's too much, we are not keeping up, please slow down. And that means that you're using fewer resources to serve the same amount of requests. In terms of parallelism, we are getting there with reactive. Reactor is doing a pretty good job, and so is RxJava, too. Parallelism is already available for streams, but I cannot get my head around fork-join pools, so I don't really like them very much. Now, there is one thing to say about reactive programming: it has a learning curve. It has a learning curve that goes up like this, and then you fall off. Because if you're not used to thinking in terms of functional programming, it's really hard. Nevertheless, when you get up there in the stars, it's absolutely worth it. The kind of things you can do are really good.

So let's wrap it up, because, in SAP tradition, we're over time. Of course. Now, let's just briefly recap what we've seen in this session. You start with the monolith, because it's about smart choices. Then you start working on your SLAs in order to know what to monitor. And you think about, for example, async processing in order to splinter out, or about API splitting and the edge router, in order to fulfill your SLAs. And you work on the topics we were talking about; memory management for Java in the cloud is key, it has to work. Absolutely. And then you will eventually end up with awesomeness, which is what we're here for. So thank you for your time. If you liked this, press that like button, which makes me feel like a YouTuber. And if you like what you hear, come talk to us. We are hiring. And we wish you a wonderful day. Cheers. Thank you.