My name — well, as you can see. So let's start with the topic. That's my topic for today; I hope you enjoy it. Basically, the audience for this talk is any developer who's looking to build a high-throughput system, wants to monitor their apps on Cloud Foundry, or is interested in serious load testing of those apps. And of course, anybody who ate the pancake breakfast this morning and is looking for a recovery area, please feel free to attend as well. My name is Dale Robinson. I am the anchor for the A6 project. The team I work on is an XP team consisting of eight developers. I work for Arati, a company that does the publish-and-subscribe processing for a number of companies you may have heard of, such as Allstate and Esurance. The application itself is basically responsible for ingesting all of the sensor data for those companies. My previous history, as it were, before I started out on Cloud Foundry and moved across onto Pivotal's implementation of it, was Java Enterprise — Java EE in particular — working on, as Justin mentioned previously, the rapidly-less-useful WebSphere. So in terms of the application we were writing, called A6, one of the very first requests we were given — certainly when I joined the application team — was to test whether the application was actually capable of ingesting the number of messages being sent by the IoT devices. To that end, the first thing we had to do was obviously find some way to actually test the application. So in short, we had an app. And we had A6. But what we really wanted was a tent. We wanted to know that the performance of the application was suitable. Thanks for that laugh. I'm actually in two minds about this slide because, quite simply put, when I first showed it to the marketing team, I suggested, quite jokingly, that maybe we should swap Chris Hemsworth and myself: I should be the tent, and he could be the six. All I can say is that the laugh I got from that was longer than the one we just heard. So, OK, moving on from that. In terms of the application itself, what we wanted it to be able to do was deal with a significant number of concurrent requests — specified, certainly by our customers, as being somewhere in the region of 7,000 concurrent requests. That was the biggest issue we faced. And obviously, the only way you're going to know you can do that is if you can actually generate that number of requests into your application and see what it does. So we looked at various options internal to the company. Arati, as a company moving from a waterfall environment into a more agile process, had its own internal testing teams, and one of those teams was a load testing team. But — excuse me — the problem you always have with that scenario is that when you have a load testing team, you have to go through the usual processes: obtaining resourcing to help you with your testing, writing scripts, sitting down with those resources and explaining exactly what you need them to do, and obviously the time you have to book on the actual load testing environment and tools to make sure you can actually do this processing. So we wanted to avoid all of that, make our lives a little bit easier, and hopefully turn this around a little bit quicker. So what we decided to do was make use of the actual Cloud Foundry infrastructure that was available to us.
Now, in previous environments, if you ever wanted to use a server or anything like that, you always had to go through requisitions, et cetera, et cetera. And that was always a pain — it would take an uncertain amount of time. The big benefit Cloud Foundry provides out of the box is the ability to spin these things up automatically. So what we decided to do was simply embed the actual JMeter tool inside a Spring Boot application, deploy that into Cloud Foundry, and then use an endpoint to kick off the test. We weren't looking for a complete or detailed solution; we just wanted some way to push enough load into our application to prove that it could deal with it. So the screen on the left-hand side here is the very basic program in terms of embedding JMeter into Spring. You really only need three classes: the application class, the property loader, and the JMX processor. The Java processor just happened to be there when I was doing the screenshot, so unfortunately it got included. On the right-hand side you can see the actual information held inside the JMeter YAML file; it really contains just these three values, which are then collected by Spring itself and injected into these fields. Finally, the dependencies for this project are also quite small — you can see there are really only the four. The only thing to be aware of is the exclude module over here, which prevents Spring's logging causing any carnage with JMeter's internal logging. In terms of getting JMeter itself to run, all you need is these two libraries, the ApacheJMeter_http and ApacheJMeter_java JARs. Once you have those in place, you can kick off the actual JMeter tool itself, and what we decided to do was use a JMX file, which you can obviously create quite easily using JMeter's own UI. So we read that in and used it to kick off the test, which we then triggered simply with a curl command. This, then, is the endpoint as defined inside the Spring Boot application; you can see it's simply going to listen on /jmx as the URL. As I said before, we kick it off quite easily with a POST call using curl. The internals of the program are quite simple. All of the information — if I step back one — all of the information you see over here with regard to these property files, the internal configuration files for JMeter itself, you can get from either a local installation of JMeter, or alternatively, in the same way you import the ApacheJMeter_http library, you can look up the ApacheJMeter_config artifact and retrieve those files from there. So as long as we make those available on the classpath as a resource, the application will load and will then kick off the JMX file we've also specified within the application. This, then, was a nice simple way for us to kick off any load tests. So once we'd got to this point, the good news was that we were able to kick off the load tests. The bad news was that we didn't have sufficient load — even though we were using Cloud Foundry, we didn't have sufficient load running from that single application. The reason why — and the reason we couldn't just have used our local machines anyway — was that my team is based in the UK and the data centers for our company are based in the US, which meant there was horrible latency between those two continents whenever you tried to run your load tests.
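To make that concrete, here's a minimal sketch of what such an endpoint might look like, assuming JMeter 3.x, a test plan called loadtest.jmx, and JMeter's property files sitting alongside the app — the class and file names here are mine, not the ones from the slide:

```java
import java.io.File;

import org.apache.jmeter.engine.StandardJMeterEngine;
import org.apache.jmeter.save.SaveService;
import org.apache.jmeter.util.JMeterUtils;
import org.apache.jorphan.collections.HashTree;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RestController;

@RestController
public class JmxController {

    @PostMapping("/jmx")
    public String runLoadTest() throws Exception {
        // Point JMeter at the property files we ship with the app. Here they
        // are assumed to be unpacked into the working directory; in the real
        // app they were made available on the classpath as resources.
        JMeterUtils.loadJMeterProperties("jmeter.properties");
        JMeterUtils.setJMeterHome(".");
        JMeterUtils.initLocale();

        // Load the .jmx test plan created in JMeter's own UI.
        HashTree testPlan = SaveService.loadTree(new File("loadtest.jmx"));

        // runTest() starts the test on a background thread, so this HTTP
        // request returns immediately while the load keeps running.
        StandardJMeterEngine engine = new StandardJMeterEngine();
        engine.configure(testPlan);
        engine.runTest();
        return "load test started";
    }
}
```

Kicking it off is then just the POST call mentioned above, something like `curl -X POST https://your-jmeter-app.example.com/jmx`.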
So the easiest way was to simply deploy the application into one of the data centers and get real information that way. Unfortunately, deploying just the one application, what we found — and this is obviously going to differ based on your Cloud Foundry implementation or platform — was that, certainly at that time, we couldn't increase the number of so-called virtual users that JMeter allows you to use beyond, say, 500 per instance. The minute we went past that, JMeter would basically crash. So to hit our target of 7,000 simulated users, we needed to have 14 instances of the application deployed. And the minute you go down the route of having 14 instances deployed for this type of problem, you run into one very significant issue very quickly, and that is orchestration. The reason orchestration becomes such a big issue is that the instances themselves are load balanced. So there's no easy way for you to send the message to tell each application to kick off without some sort of trickery, really, because the load balancer will simply send the message to just one of the instances. So how did we get around that issue? Oh — I think I've jumped too far again. At the time, we were sitting on Pivotal Cloud Foundry version 1.6, and at that point there was no way to directly invoke an instance. So we had to use a rather horrible hack, which was simply to create a script that iterated through a name, appending a number, so that we would have a list of routes into the applications, and we could then call those routes using another script. Keep in mind that we weren't interested in kicking off the tests immediately, or at a specific configured time — something you could do by simply updating your JMX file and writing it the way you want. In our particular case, we were intending to run these tests for a sustained amount of time, and the easiest way to do that was to simply kick them off and let them run. We were running for hours and hours and hours, so the milliseconds between each test suite kicking off didn't matter to us that much. Thankfully, there are now much neater ways to achieve the same thing. As of PCF 1.7 — and I believe PCF 1.9 has taken it a step further — you can use a cf curl command to retrieve information about the IP addresses of the instances. You can then pull those IP addresses out and do exactly the same thing as we were doing, in a far more sensible way. At least that allows you to scale the instances themselves within Cloud Foundry, rather than having to push up 5, 10, 20 apps under different names. An even neater possible solution — assuming the option is available inside your company, if you're using Cloud Foundry — is to embed the actual JMeter application inside a Docker image, push that up, and then do the necessary steps to orchestrate that. I personally haven't done that, so I can't comment on it, but it certainly looks like a neater solution or an easier way to do it. So the great news was that we'd finally managed to increase the load on our application, and we got to the point where — yes — the application itself was running really, really poorly, unfortunately. That's the only way to put it.
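For what it's worth, a rough shell sketch of both approaches might look like this — the app name jmeter-loadgen and the domain are made up, and the /v2/apps/:guid/stats endpoint is the one the cf curl approach relies on:

```sh
#!/bin/sh
# The PCF 1.6-era hack: 14 separately named apps, each with its own route,
# kicked off one after the other. Milliseconds of skew don't matter for a
# test that runs for hours.
for i in $(seq 1 14); do
  curl -X POST "https://jmeter-loadgen-$i.example.com/jmx"
done

# The PCF 1.7+ alternative: one app scaled to 14 instances, with the
# per-instance IPs and ports pulled out of the stats endpoint.
GUID=$(cf app jmeter-loadgen --guid)
cf curl "/v2/apps/$GUID/stats" | grep -E '"host"|"port"'
```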
We found out that the application could not deal with the load we had. Why was that a problem — what was the issue? Did we have really poorly written code? Was there some underlying cause? Well, the application was developed, it worked, it did everything correctly, but it obviously followed the maxim of no premature optimization. So we were quite happy, because it meant, from our perspective at least, that we had a lot of opportunity to tweak it further. So we started to investigate the options for finding where the actual problem was inside our application. Now, at this point it's probably very useful for me to describe the architecture and the reasons why architecture becomes so important in load testing. The normal tests we had in our application — things like our acceptance tests and our ordinary unit tests — are really only interested in the result you're getting. So if you make the acceptance test call, you get a result back, and basically it says that, yes, your code is working exactly as you want, but it's not really telling you very much about how the code executes in a live environment. And again, as I said, my team was UK-based. So if we tried to run this application from the UK, going across the water, any monitoring we did using a tool like VisualVM would immediately highlight that the application was database-constrained — which may have been the case, but because it was reaching across the Atlantic, that was quite often an artificial representation. So what we really, really wanted was to be able to monitor the application inside Cloud Foundry itself. And to that end, these are the steps you would need to take if you wanted to do that yourself. It's actually quite easy. I've given the example here using Pivotal's Cloud Foundry, but in reality — I did a quick check and had a look at the Bluemix setup — the instructions are universal to any Cloud Foundry platform, so you should be able to do this if you're using Bluemix. I have included a link at the end of the presentation; you can go to the slide deck if you're inclined to look at a website I found that covers it. But anyway, let me press on with the Pivotal version. Using the Pivotal version, there are only three real steps you need to take. The first step is to enable the ability to perform JMX calls into the application. The second step is to create an SSH tunnel so that VisualVM, or YourKit, or JProfiler — whichever tool you want to use — can reach into the container, connect to it, and start to monitor it. And obviously the final step is to actually establish that connection from the tool into the application. So, looking at the steps defined over here — just to cover it off in case you're curious — in the cf ssh step in the middle there, the -N, -T and -L flags are really just telling the container that it doesn't need to open this up in any form other than headless mode. In other words, you're not going to get a presentation of the screen, so no TTY comes up, and the container shouldn't expect an incoming command — it's literally just a pipe into it. The port number 5000 is something that's defined in the Java buildpack.
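As a sketch, the three steps look something like this on the command line — the app name is made up, and the JBP_CONFIG_JMX variable and port 5000 are the Java buildpack's JMX support as I understand it, so check your buildpack's documentation:

```sh
# 1. Enable the Java buildpack's JMX support (it listens on port 5000 by default)
cf set-env my-app JBP_CONFIG_JMX '{enabled: true}'
cf restage my-app

# 2. Tunnel local port 5000 to port 5000 inside the container.
#    -N: don't run a remote command, -T: don't allocate a TTY,
#    -L: forward the local port -- it's literally just a pipe in.
cf ssh -N -T -L 5000:localhost:5000 my-app

# 3. In VisualVM: File > Add JMX Connection... and point it at localhost:5000
```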
So, basically, all you're trying to do is link your local port to the port in your actual container. Here's the next step of the process: you're now trying to add the JMX connection into the container. Handily, we used VisualVM because, again, we were trying to avoid the pain of having to go through a procurement process, which is inevitable if you want to use something like JProfiler or YourKit. We decided it would be much simpler to make use of the free tool, and if we needed to, we could fall back on the process of trying to get the necessary resources to buy a product. As it turned out, we didn't need to go much further — VisualVM was actually quite useful and quite capable. So, on screen you can see the initial step to open up a JMX connection. All you need to do is specify the localhost port you want to use, after you've established the associated connection from the command line using the cf ssh command shown previously. And once you click on the connection in VisualVM, you should be presented with a screen very similar to this. Now, you'll notice that on the slide I've highlighted the heap dump button. The reason I've highlighted it is that, unfortunately, when you monitor things inside the container, the biggest problem you have is that not all of the features are available as they would normally be if you were running VisualVM locally. Specifically — and probably the biggest one that's missing — is heap dump. Basically, if you want to make use of heap dumps in Cloud Foundry, there are a couple of extra hoops you need to jump through to get an actual heap dump out of the container. I'll cover off why heap dumps are such a pain, but first I'm going to talk through the two options we used to get around the problem, and then — well, four slides from now — I'll explain exactly what the heap dump process is doing and why it causes such issues. So the first way you can access the heap dump files is simply to use the MBean method in VisualVM. If I step back, you'll see in the middle of the screen over here there's a tab called MBeans. Simply click on that, and you should be presented with a screen similar to this. If you navigate to the com.sun.management section of the list shown on the left-hand side here and go to the HotSpotDiagnostic class, there's an exposed operation called dumpHeap. The first parameter of this operation is the name of the file, so all you need to do is specify the file name you want to use. The second one is a flag which indicates whether you want to dump only live objects, or everything, including objects that are potentially unreachable and about to be garbage collected. Another operation — although this one actually is available in VisualVM; for some reason I managed to confuse myself when I looked at it initially — is the thread dump. You can run this from VisualVM itself, but I'm highlighting it here anyway, because the slide deck was built before I found out you could run it directly. So this shows the threadPrint operation. Again, if you just take out the string array argument on the right-hand side over here and run the operation directly, you'll be able to produce a thread dump.
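That same dumpHeap operation can also be driven from code rather than through VisualVM's MBeans tab. Here's a minimal sketch using the standard platform MBean proxy — the file path is just an example:

```java
import java.lang.management.ManagementFactory;

import com.sun.management.HotSpotDiagnosticMXBean;

public class HeapDumper {

    public static void main(String[] args) throws Exception {
        // Proxy for the same com.sun.management:type=HotSpotDiagnostic MBean
        // you navigate to in VisualVM's MBeans tab.
        HotSpotDiagnosticMXBean diagnostic = ManagementFactory.newPlatformMXBeanProxy(
                ManagementFactory.getPlatformMBeanServer(),
                "com.sun.management:type=HotSpotDiagnostic",
                HotSpotDiagnosticMXBean.class);

        // First argument: the file to write (it lands on the container's disk,
        // which is exactly the problem discussed in a moment).
        // Second argument: true = dump only live objects, false = dump everything.
        diagnostic.dumpHeap("/tmp/a6-heap.hprof", true);
    }
}
```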
The second mechanism — and this obviously assumes you're using Spring — is to use the Spring Boot actuators. First, you need to confirm that they're enabled and not marked as sensitive. The reason I mention that is because different versions behave differently: I think it was version 1.4 of Spring Boot that defaulted these to be unavailable, whereas previous versions had made them available out of the box. So if you want to make them available, you actually have to go into either your application properties or your application YAML and set the associated flags. Once you've done that, you should be able to simply call into your application, complete the associated URL you see here, and when you hit enter in the browser, the actuator endpoint will do its work. I will say this: at the end of this process — I'm hoping I've got enough time to do it — I will actually load an application and show you how these work. If you need more information, the Spring documentation is very, very useful. There are plenty of endpoints not described here, but these are the four we found most useful in trying to solve problems. Right — so, to come back to the issue around heap dumps, the real problem is that when you request a heap dump, the heap dump is saved into the container, OK? So you actually have to increase the disk space available on your container, or else your heap dump doesn't work. Not only that, when you use the MBean mechanism, you have to log into the container to find the file and then copy it out. The way I did that was to use SCP. It works perfectly well, but unfortunately it's just a little bit more clunky, a little bit more difficult to do. I found the Spring actuator mechanism much, much easier to use, because what it does is perform the dump for you, then copy it out, gzip the file, and drop it onto your local hard drive, making it all that little bit easier, that little bit more useful. Reasons for using one over the other? Well, the only real reason I can see for using the MBean mechanism is that maybe you've ported a Java EE application into Cloud Foundry, or you're not running Spring Boot specifically. So, as far as the journey of the A6 application goes: we now had a situation where we could create the load we needed to prove whether our application was able to achieve its objectives, and we'd figured out a way to look inside those containers and identify the problems we found. Once we'd found those, we obviously started to look at what we could do to fix them. So, as a person coming from a Java EE enterprise background, the big lesson we learned is just how useful and interesting microservices are in solving particular problems, especially in Cloud Foundry. The reason I'm highlighting that is — if you've ever tried to connect to an application — one of our microservices is very useful for highlighting this point: it had only four classes in it, but if you connected to it with VisualVM, you'd see there were over 10,000 classes loaded.
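For the Spring Boot 1.x line this talk is based on, enabling the endpoints might look something like this — whether you need both flags depends on your exact Boot version, so treat it as a sketch:

```properties
# application.properties -- expose the actuator endpoints without authentication.
# Only do this in a test environment; these endpoints leak a lot of detail.
endpoints.sensitive=false
management.security.enabled=false
```

With that in place, something like `curl -O https://my-app.example.com/heapdump` pulls the gzipped heap dump straight down to your local disk, which is exactly the convenience described above.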
Now, one of the biggest issues you tend to have is that when you've got these large — shall we say monolithic — applications, the problem is all those edge cases: the bits where you're interacting with either the Java EE container or, in this case, Tomcat, because Spring Boot obviously has Tomcat embedded in it. You run into all of these issues where something weird happens between your application and that code. And the big benefit we found with microservices — apart from surprising performance benefits — was just that it allowed us to think of the function the code was performing in much, much simpler terms. Not only that, it gave us more fine-grained control over the number of instances we needed to perform the work each service was providing. So the A6 application initially wasn't particularly complex in terms of the code itself, but a lot of the endpoints we had were all embedded in one application. When we started to investigate where the performance blockers were, what we found was that if we broke certain of these components out into their own microservices and put them behind RabbitMQ queues, we were able to spin up instances and apply that fine-grained control I mentioned earlier more effectively, as each service read off its queues and pushed the data downstream to either our customers or onto databases, et cetera. So that led to a more distributed architecture for our entire application. Basically, we had the endpoints defined at the entry point of the system — the actual API — which would then dump information onto various queues. Each of the blocks listed here is essentially separated by queues. So where we wanted to save data to the database, for example, the API would write onto a RabbitMQ queue. The benefit of writing onto the queue is that RabbitMQ queues are not indexed, so there are no constraint issues, and they tend to perform a bit faster than relational databases — you can just throw a lot of data at them, and then, using a listener reading off the queue, you can write the data to the database in your own time, giving yourself the fine-grained control you might need in terms of scaling up the requisite number of instances, either during peak periods or as needed. The other big benefit of putting RabbitMQ queues in between these components was simply that they provided a buffer in situations where something went wrong or went down. Things would keep writing into the queue, and you could always recover from the situation a little bit later by applying whatever fix was necessary, bringing up more instances, et cetera. So, finally, we basically had a situation where most of our performance problems were resolved by using these microservices. However, the two biggest remaining issues were specifically our persistence mechanism and the pushes downstream to our customers. From that perspective, we had to investigate a few more options to get their speed up to scratch, because ultimately they weren't able to meet the target we had been set, which was the 7,000 concurrent messages a second. So, to that end, we went back to one of the more useful Spring annotations, the @Async annotation. Some warnings around using the @Async annotation.
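As an illustration of the listener side of that architecture, a minimal sketch with Spring AMQP might look like this — the queue name, message type, and repository are all made up for the example:

```java
import org.springframework.amqp.rabbit.annotation.RabbitListener;
import org.springframework.stereotype.Component;

@Component
public class SensorDataListener {

    // Hypothetical Spring Data repository for the sensor readings.
    private final SensorReadingRepository repository;

    public SensorDataListener(SensorReadingRepository repository) {
        this.repository = repository;
    }

    // The API writes sensor readings onto this queue; this service drains it
    // into the database at its own pace, and can be scaled to N instances
    // independently of the API when the queue starts backing up.
    @RabbitListener(queues = "sensor-readings")
    public void onSensorReading(SensorReading reading) {
        repository.save(reading);
    }
}
```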
In our particular case, because we're processing a lot of IoT sensor data — at the moment I think we're pushing well over 100 million messages a day — the law of large numbers comes into play. This is really data that the data scientists in the company use for analysis, for whatever it is data scientists do, basically. And quite simply, if we lose one or two of those sensor messages, it doesn't really matter that much: there's so much data coming in that one or two lost messages can't cause any problems. So we didn't have to be too concerned about the issues around using the @Async annotation — but your mileage will obviously vary. So these are some of the things you'd need to keep in mind if you do decide to go down this route. Firstly, the thread pool, and the queue it uses, is essentially unconstrained. If you don't define limits for the @Async processing by defining your own executor — there's a sketch of one below — you will definitely run out of memory, and it will explode. So that's definitely something to keep in mind when you go down this route. Another thing: look at the applications we'd defined. When we'd written the RabbitMQ listener services, what we had was, on the one hand, an application listening to an incoming I/O device, which was obviously going to be relatively slow, and on the other side of the application — taking the persistence process as the example — it was writing to a database. In both scenarios, what you had before we put this in was a synchronous application. And because it was synchronous, it was very, very slow: it was waiting for the read to happen, and waiting for the write to happen, before it would send the acknowledgement back to RabbitMQ to say, look, this message has now been processed, we can get rid of it. So what we did was insert the async processing literally slap bang in the middle of the application. The listeners were then able to read much faster: they would offload the message that needed to be saved or processed onto the second part of the process, the asynchronous thread running independently of the listener. And by doing that — the performance improvement we saw in one particular instance, for example: a test that before was taking in the vicinity of 130 seconds to complete dropped to below 15 seconds. So definitely a very useful performance improvement. But the biggest issue you obviously have when you decouple your reading and your writing, in the scenario I've just described, is that the message is acknowledged as received before it has actually been processed. So if you're writing something similar, you need to be aware that you have to take certain precautions to save that data, or do something about it, if there's a problem on your application's side. The other issue, of course, is error handling. One of the biggest problems is that if you're throwing errors in the second thread, there's no link between thread A and thread B — no easy way for the two to communicate with each other — so you need to do something to manage that side of the process. We ended up using Spring's controller-advice error handling mechanism (@ControllerAdvice). It worked perfectly well for our needs, but how well it fits will depend on what you guys are doing.
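Here's a minimal sketch of what bounding that executor might look like — the pool sizes and bean name are arbitrary, and the service method is a stand-in for the persistence step described above:

```java
import java.util.concurrent.Executor;

import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.scheduling.annotation.Async;
import org.springframework.scheduling.annotation.EnableAsync;
import org.springframework.scheduling.concurrent.ThreadPoolTaskExecutor;
import org.springframework.stereotype.Service;

@Configuration
@EnableAsync
class AsyncConfig {

    // Without a bean like this, @Async falls back to defaults that are
    // effectively unbounded -- under sustained load that's the out-of-memory
    // explosion described above.
    @Bean(name = "persistenceExecutor")
    public Executor persistenceExecutor() {
        ThreadPoolTaskExecutor executor = new ThreadPoolTaskExecutor();
        executor.setCorePoolSize(8);
        executor.setMaxPoolSize(16);
        executor.setQueueCapacity(1_000); // bound the queue explicitly
        executor.setThreadNamePrefix("persist-");
        return executor;
    }
}

@Service
class PersistenceService {

    // Runs on the bounded pool; the Rabbit listener returns (and the message
    // is acknowledged) before this write completes -- hence the caveats above.
    @Async("persistenceExecutor")
    public void persist(String payload) {
        // ... write to the database here ...
    }
}
```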
And I think I'm running a little bit close on time here, so I need to pick up the pace. One of the things we thought we'd never have to look at — and when I say that, I suppose I'm really blaming myself here, because it's inevitable that as developers we think that those guys from Spring or Hibernate or whatever may not know what they're doing, and that I can do a better job, really. So obviously I bumped my head on this one quite extensively, and I would say that the general defaults you're going to find for things like Tomcat — and pretty much any other framework — are probably going to be a pretty good fit for your application. I can tell you that after we tinkered with these settings, we actually reverted back to the standard defaults, because they worked for us. There definitely were things you could tweak and improve in certain areas, but ultimately they were unnecessary. So the defaults here are: 200 worker threads — these are the threads inside Tomcat that actually process the incoming messages; 10,000 connections, which Tomcat itself will accept; an additional OS component, where the OS is asked to keep a backlog queue of connections, which defaults to 100 — of course this queue is not guaranteed, it's really up to the OS whether or not it does it the way Tomcat wants; and then, finally, 100 keep-alive requests. If you want to tinker with it, you have some sample code. Again, I have included the links to the application that runs this code, so if you want to download it — it's just on GitHub — you can play around with it, see what it does, and monitor it through VisualVM yourself. So, in terms of what you can see over here: this is literally the nuclear option for configuring the Tomcat server in the underlying Spring Boot application. The first four values you see there are all around setting up the server defaults I just described: the max threads value of 200, the acceptor, the max connections — in this case set down to 1,000 from 10,000 — and the backlog value, which equates to Tomcat's accept count and sets that queue inside the OS. At the bottom you can see the keep-alive requests. The reason that might be useful to you is that if you've got a lot of concurrent requests and none of them are actually maintaining their connections, it's quite easy to just shut them down. You obviously pay the higher initial connection cost for each incoming message going into Tomcat, but you get the benefit of not sitting around waiting for the connection to time out if the client isn't closing it properly. One last area of configuration, because we used Rabbit, was the configuration of the RabbitMQ connections and channels. This one actually did give us a benefit, simply because Rabbit's Spring support only supplies one connection, and 25 channels on that connection, by default. The reason that caused a problem in our application was, again, our application's own fault: it was reusing this one connection over and over and over, and once we broke the application out into the microservices described before, the need for this configuration actually went away. So, quite simply: now you know how to configure it — and, as I've already alluded, you probably shouldn't.
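For reference, that nuclear option on a Spring Boot 1.x embedded Tomcat might look roughly like this — the values are the ones discussed above, but the bean wiring is my sketch, not the code from the slide:

```java
import org.apache.coyote.http11.AbstractHttp11Protocol;
import org.springframework.boot.context.embedded.EmbeddedServletContainerCustomizer;
import org.springframework.boot.context.embedded.tomcat.TomcatEmbeddedServletContainerFactory;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
class TomcatTuning {

    @Bean
    public EmbeddedServletContainerCustomizer tomcatCustomizer() {
        return container -> {
            if (container instanceof TomcatEmbeddedServletContainerFactory) {
                ((TomcatEmbeddedServletContainerFactory) container)
                        .addConnectorCustomizers(connector -> {
                            AbstractHttp11Protocol<?> protocol =
                                    (AbstractHttp11Protocol<?>) connector.getProtocolHandler();
                            protocol.setMaxThreads(200);          // Tomcat's worker threads
                            protocol.setMaxConnections(1_000);    // down from the 10,000 default
                            protocol.setBacklog(100);             // maps to acceptCount, the OS queue
                            protocol.setMaxKeepAliveRequests(1);  // 1 effectively disables keep-alive
                        });
            }
        };
    }
}
```

The Rabbit side of the tuning is essentially Spring AMQP's `CachingConnectionFactory#setChannelCacheSize`, but — as the talk says — the defaults turned out to be fine once the monolith was broken up.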
Sorry — the actual defaults, as I said before, are really, really good. Again, I reiterate: after tinkering around with these things, we reverted back to the default settings, and they worked perfectly well. The last bit of pain we experienced was around security-driven development, which I can only say caused us some entertainment — I was very popular on this day. We had basically tuned the application to the point where it was running really, really well, and what we decided to do was run a full test, but in this particular case passing in bad data. So that killed Cloud Foundry. Or rather, it didn't kill Cloud Foundry; it ran afoul of a security policy in our system, which said that if it got too many requests that led to certain errors, the security policy in the company would assume that what was happening was a denial-of-service attack, and as a protection mechanism it would block any further incoming requests. Unfortunately, it blocked all of the incoming requests for everybody who wanted to use Cloud Foundry. So, as I say, I wasn't very popular that day. Finally — almost the last slide — these are the commands I was referring to earlier. If you're on anything above PCF version 1.7, you can use this cf curl command to get the information about the IP addresses of the underlying instances, if you want to hit those instances directly for any reason. Additionally, just for your own awareness, it is absolutely super simple to swap out Tomcat for Jetty or Undertow. Again, the Spring documentation deals with it quite extensively; it's literally four lines of configuration, and you can switch to an alternate server container. A couple of other things to be aware of: the AsyncRabbitTemplate and the AsyncRestTemplate, also Spring constructs, may make your life easier. I had a look at these against some of the work we actually ended up doing ourselves — once these came out, we could have used them instead, or in preference to our own code. And the last one, which we've actually found quite useful just from a learning perspective, is PCF Dev. Again, available on the Pivotal website, it means you can deploy stuff locally on your own machine. It reacts in very much the same way as a real Cloud Foundry, and it gives you the option of deploying an application and just running it and tinkering around with it for yourself. These are the links I referred to. The actual slide deck is posted up onto sched.com, which means that if you just log in there, you can pull it down, and you'll have access to these links if you want to look at any of this further, including the code I was referring to. Thank you very much.