Welcome to another OptaPlanner video. This time we won't be talking about a use case; instead I'll be talking about the OptaPlanner benchmarker. The benchmarker is a toolkit you get with OptaPlanner, but you don't put it on your production classpath: it's a jar you only use during development, to find out which optimization algorithm is best for your use case. For example, suppose we have a number of datasets: A, B, C and D. In this case they're for the cloud balancing use case: 100 computers and 300 processes in dataset A, twice as many computers and processes in dataset B, and so forth. We then benchmark those against a number of different solver configurations: tabu search, simulated annealing, first fit and so forth — some of which are metaheuristics and some of which are not. What the benchmarker does is take every dataset against every solver configuration and run each combination for the amount of time you specified — say five minutes, or maybe just one minute, depending on what you prefer. It then tells you the score results, tells you which configuration is the best one to use in production, and gives you a lot more information besides; that's just the tip of the iceberg. When you run a benchmark, it produces a nice benchmark report with all kinds of information: some graphs and, as you can see, a couple of tables. With this information you actually get to understand your use case better, and most importantly, you can easily decide which algorithm you want to use in production, because it also tells you which algorithm is actually the best.

Okay, so how does this work? Well, first of all you need to configure it, of course. Normally you have a solver configuration, as you see here on the left: the solver config, which has a solver root element. Instead of that, you create a new XML file which has a plannerBenchmark root element. That plannerBenchmark root element contains one or more solverBenchmark elements, and such a solverBenchmark element is basically just a name plus a solver configuration — you can simply copy-paste your solver configuration inside the solverBenchmark element.

Because you'll have multiple solverBenchmarks — here's one, and here's another, as you can see — you don't want to repeat the same information over and over for each of them, such as the solution class, the score director factory and so forth. So you can have an inheritedSolverBenchmark, as shown here, which holds the general information that applies to every single solverBenchmark. In this case I'm setting the score director factory and so forth, and, very importantly, I'm setting the time limit to five minutes, so every solverBenchmark inherits a five-minute termination. Furthermore, we also need to specify our datasets — where can it get those? That's done in the problem benchmarks: as you can see here, I have five datasets, listed as input solution files. In my case they come from an XStream XML file, but you can easily read them from your own format or from a database by implementing the ProblemIO interface.
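To make that structure concrete, here is a minimal sketch of what such a benchmark configuration could look like. The file paths, class names and values are illustrative placeholders, not the exact configuration shown in the video, and element details can vary between OptaPlanner versions.

```xml
<plannerBenchmark>
  <benchmarkDirectory>local/data/cloudbalancing</benchmarkDirectory>

  <inheritedSolverBenchmark>
    <!-- Shared settings inherited by every solverBenchmark below -->
    <problemBenchmarks>
      <inputSolutionFile>data/cloudbalancing/unsolved/100computers-300processes.xml</inputSolutionFile>
      <inputSolutionFile>data/cloudbalancing/unsolved/200computers-600processes.xml</inputSolutionFile>
      <!-- ... more datasets ... -->
    </problemBenchmarks>
    <solver>
      <solutionClass>org.example.cloudbalancing.domain.CloudBalance</solutionClass>
      <!-- entity classes, score director factory, ... -->
      <termination>
        <minutesSpentLimit>5</minutesSpentLimit>
      </termination>
    </solver>
  </inheritedSolverBenchmark>

  <solverBenchmark>
    <name>Tabu Search</name>
    <solver>
      <!-- copy-paste of your tabu search solver configuration -->
    </solver>
  </solverBenchmark>
  <solverBenchmark>
    <name>Simulated Annealing</name>
    <solver>
      <!-- copy-paste of your simulated annealing solver configuration -->
    </solver>
  </solverBenchmark>
</plannerBenchmark>
```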
The ProblemIO interface just has a read method and a write method, but you only need the read method here, to read the datasets into memory. Furthermore, I can also enable and disable certain statistics. The summary statistics are always there, but the problem-specific statistics can influence the result of the benchmark — they create some overhead — and that's why they are disabled by default. As you can see, there are many of them, and you can enable them all if you want to get more information.

When you run this, it writes the benchmark report and more information into the benchmark directory, which you specify right here: you can see I'm writing it to local/data and then the use case. So when we go there — first of all, this is a local directory, because I've run many benchmarks and I don't want to check them into our Subversion repository; they would take up too much space. Here are all the different use cases, all the different examples I have, and each one has its own benchmark directory. If you take a look at, for example, the cloud balancing benchmark directory, then every time I run a benchmark, the benchmarking framework automatically creates a new timestamped directory, as you can see here, and writes its information in there. Sometimes, as you can see, I add a suffix just to remember what exactly that run was about. If you then open one of these benchmark runs, you can see there's an index.html file in there, which is the benchmark report — the one I showed you earlier and will show you in a minute in detail. Furthermore, for every dataset and every algorithm we ran, if the problem statistics were enabled, we also get a CSV file, so you can post-process those, and we get the graphs which I'll be showing in the report too. So there's lots of information in there.

Okay, let's take a look at the benchmark report. The first thing we have is the best score summary. This is by far the most interesting graph. What it shows, for every dataset — you can see the datasets along the bottom here, all five of them — is the result of every algorithm: every color here is a different algorithm, and higher is better. You can easily see, if you compare these algorithms, that the red one is terrible, the blue one is less terrible, and the green, yellow and pink ones are competitive — they're pretty much the same, but of course one is slightly better on average than the others. Here we also get the same information in table format. This is dataset one, and each line is one of the solver configurations: dataset one against, let's say, tabu search gives this score, and against simulated annealing it gives that score. You can see that this score is actually better. This is the cloud balancing use case, so the score represents the maintenance cost we have to pay to our cloud provider: the fewer computers we use, the less we have to pay. So this one is of course more interesting — here we have to pay 930 less than over there. That's interesting, but the big differences, of course, are with those two other algorithms.
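Coming back to the ProblemIO interface and the statistics toggles from a moment ago: as a rough sketch, a custom IO class and the optional statistics could be plugged into the problemBenchmarks element along these lines. The class name is a hypothetical placeholder, and the element for custom IO has changed name across OptaPlanner versions (older releases used problemIOClass with the ProblemIO interface, later ones solutionFileIOClass with SolutionFileIO), so check the documentation for your version.

```xml
<problemBenchmarks>
  <!-- Hypothetical custom reader: loads datasets from your own format or a database -->
  <problemIOClass>org.example.cloudbalancing.persistence.CloudBalanceProblemIO</problemIOClass>
  <inputSolutionFile>data/cloudbalancing/unsolved/100computers-300processes.xml</inputSolutionFile>
  <!-- Problem statistics are disabled by default because they add overhead;
       enable only the ones you actually need -->
  <problemStatisticType>BEST_SCORE</problemStatisticType>
  <problemStatisticType>MEMORY_USE</problemStatisticType>
</problemBenchmarks>
```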
Now, to quickly explain the difference between those algorithms: these three are the metaheuristic algorithms — the really interesting algorithms in OptaPlanner — and these two are the construction heuristics, first fit and first fit decreasing. What you will find in most companies that don't use OptaPlanner yet, or that have human planners, is that their results are basically similar to first fit or first fit decreasing. So yes, these algorithms play in a different ballpark. What the benchmarker tells you here is: if it had to choose, it would recommend using simulated annealing in production, because if you look at the average score, that's actually the best one. Do notice that it's not always the best on every single dataset: as you can see, it's the best on four out of five datasets, while on this particular dataset, in those five minutes, tabu search is the best. Still, I would use simulated annealing in production, because we cannot predict this in advance: the algorithms take some randomness into account — that's just the way metaheuristics work — so there may be slight differences, and a configuration might be a little better or a little worse on a particular dataset. What you do have to know is that the metaheuristics always start from the result of the first fit decreasing algorithm, so they will always be at least as good as that. You will never see these metaheuristics do worse than the blue one: the last three colors will always be at least as good as the blue one, and in 99.9% of cases, if not more, they will be seriously better.

We have many other statistics, and an interesting one shows this relatively. We take the worst algorithm, which is first fit, and compare the other algorithms against it: how many percent better are they — in other words, how many percent are we saving on maintenance fees for our cloud provider? If we do that, we can see that over here the green one on the first dataset is 90% better and the yellow one is 20% better. We can then average that out again, and we see that simulated annealing is on average 21% better than first fit. So if your company is currently using first fit — or, more likely, first fit decreasing, since most companies actually go for that — you can easily see that simulated annealing is about 21% minus 4%, so roughly 17% better than first fit decreasing. That's a nice gain: if you can save 17% on your cloud costs and you're a big company, that can be a lot of money. So it's definitely worth investing in these algorithms and in a solver such as OptaPlanner.

On top of that, we also have performance summaries, performance graphs. What they show is how fast it's solving: how many score calculations per second it can do. Every time OptaPlanner changes something, it calculates the score, and if you have a good score calculation — if you're using the Drools rules or the incremental Java calculator — you will see hardly any degradation as the problem scales out. On the X axis we have the problem scale, which is the number of entities times the number of values, so as the dataset gets bigger, we move further to the right over here. Let me just show you that.
For the dataset with 100 computers and 300 processes, the problem scale is basically the multiplication of those: the number of computers times the number of processes, so we get 30,000 here. For the bigger datasets we get a much bigger number, of course. That's what we see on the X axis. On the Y axis we then see how fast it goes: how many score calculations per second we can do. The interesting thing to note here is that as the problem scales out, the degradation in score calculation speed is actually not that big: as you can see, it doesn't get a lot slower as our problem becomes a lot bigger. And that's a good thing, because if we were using, for example, the easy Java score calculation, which doesn't do incremental calculation — which doesn't do deltas — you would see that it's terrible: it basically drops towards zero as the problem becomes bigger. And that's not a good thing, because it means we cannot scale. So this is very important: if you want to be able to scale, this graph needs to look good. If you use Drools rule score calculation, it should look good, roughly in the shape of this one. If you use incremental Java score calculation, you can actually go a little faster — scalability is about the same, but performance-wise you can go faster — but there is a lot more maintenance work to make that happen and to keep it up to date. So we definitely recommend sticking with the rules, which is usually the best option.

Now, what you can also see is that some of the algorithms appear to be a lot faster. That comparison is unfair, because first fit and first fit decreasing over here are faster because they calculate the score of partial solutions instead of entire solutions. So they're sort of cheating — well, not really cheating, that's just the way they work: they construct partial solutions — so it's not a fair comparison against the others. In this particular case the tabu search one is also faster, and that's an interesting detail: it's because we enabled all of the problem statistics, which have some overhead, and that overhead hits simulated annealing and late acceptance much harder than tabu search, because tabu search is a slow-stepping algorithm and the others are fast-stepping algorithms. Again, that's a detail, but normally — as I can show you in another benchmark I ran — tabu search and late acceptance have about the same speed, as you can see here, if you don't enable all those extra problem statistics.

Now, what are those problem statistics? Well, we have a couple of other summary statistics too, but they are not that interesting. One shows how much time we spent: for example, all algorithms got five minutes, and you can see that the construction heuristics finish early — when they are finished, they are finished, and they cannot use the extra time out of those five minutes — while all the metaheuristics nicely stop at five minutes. What I spoke about earlier are the problem statistics: for one particular dataset, like for example this one, we can also generate statistics. As I showed before, we actually have to enable those, right?
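Before moving on to those statistics, a quick aside on the score calculation options mentioned above. The scoreDirectorFactory in the solver configuration is where you pick between a Drools rule file, an easy Java calculator and an incremental Java calculator; the file path and class names below are hypothetical placeholders, so this is just a sketch of the general shape of that choice.

```xml
<scoreDirectorFactory>
  <!-- Option 1: Drools rule-based score calculation (the recommended default;
       it calculates incrementally, so it scales well) -->
  <scoreDrl>org/example/cloudbalancing/solver/cloudBalancingScoreRules.drl</scoreDrl>

  <!-- Option 2: easy Java calculation; simple to write, but it recalculates the
       full score on every move, so it degrades badly as the problem scales out -->
  <!-- <easyScoreCalculatorClass>org.example.CloudBalancingEasyScoreCalculator</easyScoreCalculatorClass> -->

  <!-- Option 3: incremental Java calculation; the fastest, but much more
       maintenance work to write and to keep correct -->
  <!-- <incrementalScoreCalculatorClass>org.example.CloudBalancingIncrementalScoreCalculator</incrementalScoreCalculatorClass> -->
</scoreDirectorFactory>
```

Now, back to the statistics.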
You can see here I've only enabled the best score statistic for this one, but you can enable all of them. In this case they're actually all enabled, as you can see, while in this other benchmark only one of them is enabled. Important to note: if you enable them, they will influence the benchmark. The best score one won't influence it much, but some, like the memory use one, will definitely influence the benchmark and basically make it slower. That's also what you saw in the performance graph earlier, and it affects some algorithms more than others. That's the main reason why they're not enabled by default.

So, what does the best score statistic show? It shows us how the best score evolves over time: as we give it more time, how much does the best score improve? And this is how this graph should look: in the first few seconds the score improves a lot, gets a lot better, and then it flattens out as we get to a near-optimal, or eventually the optimal, solution. It's interesting to note that if you would give it, say, 20 minutes — this is the score you would get with tabu search, and this is the score you would get with late acceptance or simulated annealing — you might actually get an even better score, because simulated annealing adapts itself to the amount of time it is given, whereas with the others, what you see here is pretty much what you get. So you could say: I'll just run it for 20 minutes in production. And indeed, if you have that extra time available anyway, why not use it and get a slightly better score?

Another interesting thing: if your graph doesn't look like this — if it looks like a line that's still going up — then there's plenty of room for optimization; you haven't even started flattening out yet. And as we go to bigger and bigger datasets, you will see that we flatten out less — the curve is less sharp — which is an interesting observation. That gives you a clue: okay, maybe I need to spend more solving time on this, or I need to find ways to speed this up — and we have a couple of ways to do that — and then we will flatten out earlier. Once you've started flattening out, you basically know that it's not really worth investing more time to improve this further, because this is roughly what you'll get.

Okay, furthermore, we have many other statistics that I won't go into. Some show a particular constraint type: if you have multiple constraint types — all the soft constraints, for example — you can show how their scores relate to each other and how they evolve, and so forth. We have many more statistics. On top of that — those were the problem statistics for the problem datasets — for each solver configuration that you configured, we also print out how it was configured. That allows you to easily copy-paste it into your production solver configuration. So in this case it's telling you: you should be using simulated annealing in production, so you can copy-paste this one and put it into your solver configuration, right?
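As a sketch of that copy-paste step, the winning configuration ends up inside a regular solver configuration with a solver root element, roughly along these lines. The class name, the simulated annealing starting temperature and the other values are illustrative placeholders rather than the actual values from the report.

```xml
<!-- solverConfig.xml used in production, assembled from the winning solverBenchmark -->
<solver>
  <solutionClass>org.example.cloudbalancing.domain.CloudBalance</solutionClass>
  <!-- entity classes, score director factory, ... (the inherited settings) -->
  <termination>
    <minutesSpentLimit>5</minutesSpentLimit>
  </termination>
  <constructionHeuristic>
    <constructionHeuristicType>FIRST_FIT_DECREASING</constructionHeuristicType>
  </constructionHeuristic>
  <localSearch>
    <!-- copy-pasted from the simulated annealing solverBenchmark that won -->
    <acceptor>
      <simulatedAnnealingStartingTemperature>0hard/400soft</simulatedAnnealingStartingTemperature>
    </acceptor>
  </localSearch>
</solver>
```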
And you can ignore the other ones. Furthermore, we have some benchmark information which tells you on what kind of machine it ran, which Java version, which OptaPlanner version, and so forth. This allows you to easily check, if you ever have two benchmarks, whether they are a fair comparison or not. Okay, so that's what the basic benchmarker gives you. But on top of that, we can do a couple of other things.

Maybe you've seen it already, or maybe you haven't — no, you haven't — so let me show you. This is our data directory, where I have the curriculum course example. As you can see, at some point I ran the curriculum course benchmark on JDK 6 and JDK 7, and also on JDK 8, although I didn't add a suffix for that one here. And I want to see how they compare against each other. The problem is that one particular benchmark run uses one particular JVM and one particular code base. You can configure different solver configurations, but you cannot tell them to use different code bases or different JVMs or anything like that. So how do we solve this? Quite easily: you just run both of them, and after you've run them, you take the benchmark aggregator. It's just a small local app you can start up: you select two benchmark runs — two different benchmarks — and it generates a report of how they compare against each other. So I'm now taking these two benchmarks — regrettably both of them have only one solver configuration — and generating the report. Once that's done, I can open it in the browser, and it shows me, as you can see, how they compare against each other. Apparently, in this particular case, it's quite clear which one is better and which one is not; I don't think they ran long in this case. So that's one thing you can do.

The second thing you can do beyond the plain benchmarker is use the template support. How does that work? Well, let's say you want to tweak a particular value — say you really want to tweak the entity tabu size. You're saying: okay, the default is good, but if I can tweak it and get a 1% improvement — well, my company is quite big, so spending 1% less on a particular planning problem will save a lot of money — then it's worth spending that extra time to tweak it a little further. So for the power users, you can tweak these particular values instead of using the defaults. With the benchmark template, you configure a planner benchmark just as you did with the XML file, but this time in a FreeMarker XML file. You then configure just one solverBenchmark, as you can see here, but you put it in a for loop — actually two for loops in this case: one loop over different entity tabu sizes and one over different accepted count limit values. For every combination of those two lists, it adds another solverBenchmark, so we end up with a lot of solverBenchmarks, and each of them is then run against each of the input solution files. That's a lot of benchmarks, so you definitely want to run this during the night.
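A minimal sketch of such a FreeMarker benchmark template, assuming hypothetical candidate values for the entity tabu size and the accepted count limit, could look roughly like this; the exact element names and the way the template gets loaded can differ per OptaPlanner version.

```xml
<plannerBenchmark>
  <benchmarkDirectory>local/data/cloudbalancing/template</benchmarkDirectory>
  <inheritedSolverBenchmark>
    <!-- datasets, solution class, five-minute termination, ... as before -->
  </inheritedSolverBenchmark>

<#list [5, 7, 11, 13] as entityTabuSize>
<#list [500, 1000, 4000] as acceptedCountLimit>
  <solverBenchmark>
    <name>Tabu Search entityTabuSize ${entityTabuSize} acceptedCountLimit ${acceptedCountLimit}</name>
    <solver>
      <localSearch>
        <acceptor>
          <entityTabuSize>${entityTabuSize}</entityTabuSize>
        </acceptor>
        <forager>
          <acceptedCountLimit>${acceptedCountLimit}</acceptedCountLimit>
        </forager>
      </localSearch>
    </solver>
  </solverBenchmark>
</#list>
</#list>
</plannerBenchmark>
```

With four tabu sizes and three accepted count limits, this single template expands to twelve solverBenchmarks, each of which is run against every input solution file — which is exactly why an overnight run makes sense.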
Then, when you come back in the morning, you get a large report with lots of information, and it might look a little bit like this one. As you can see, here there were only five datasets, but we ran a lot of different solver configurations. You can see that we were only varying this one value — the 10, the 20, the 30 — which was filled in by a FreeMarker variable, basically. And you can then see which one is the best. Now, luckily we have the table next to the graph, because in the graph there's just too much information and nothing is readable anymore; in the table we can easily see: okay, this one is the best one, this is the one I'll keep. Furthermore, with the aggregator we can filter out the non-interesting ones, so we get a graph which is readable again, instead of this one. So the benchmark template is very powerful for doing power tweaking. Just make sure you don't overfit — don't over-tweak — just like you shouldn't be over-tweaking your JVM garbage collector. To avoid doing that, the best remedy is simply to add more problem benchmarks, more datasets: they make sure that you don't over-tweak, okay? So that's my video for today. I hope you enjoyed watching, and if you want more information about OptaPlanner, just go to the website, optaplanner.org.