Hi, everyone. You may have seen me earlier today talking about some deep learning and how we might apply it to the verification of OpenJDK implementations. Now I'm going to talk about how we are verifying OpenJDK implementations at AdoptOpenJDK. And I'm calling this the open and transparent quality kit. This is sort of a repeat of the slide from earlier, but just so everyone who wasn't there in the earlier talk knows: at AdoptOpenJDK, we are testing multiple OpenJDK implementations. In fact, we have tested four different ones. We have multiple Java versions, very many platforms. And we've logically divided the tests up into groups so that it makes it a little easier for us to automate, and a little easier for us to give developers a chance to run them.

A lot of these test cases that we're using, these test suites, underneath it all are actually wildly different. They use different test frameworks. They have different test outputs. So we've laid a thin veneer over top of these tests to make them look and feel the same to a developer, or to someone who wants to run this testing. Because otherwise I could ask the development team, OK, now you have to learn all about OpenJDK regression tests and how to run those. And then you have to learn about these system tests and how to run those. Oh, and then you have to learn about these performance benchmarks and how to run and read those. And so on. So we're trying to standardize a little bit across all of those very diverse sets of tests, just enough to make it a little bit easier to run them.

In fact, we're sort of curators of test suites. Essentially, our goal is to pull all of the best tests from open communities and bring them to run at AdoptOpenJDK. So, as I mentioned, many of you may be familiar with the suite that comes with OpenJDK itself, with the class library itself; that's the regression suite. There's a whole whack of functional tests that come from OpenJ9. For performance tests, right now we only have a couple of benchmarks running at the project, but the plan this year is to pull a whole lot more benchmarks in and start to run them. For system tests, we have a giant donation; that category of tests includes load and stress tests. But again, this isn't a firm set that we'll never change. If the community has stress tests, for example, that they want to donate, I want to talk to you after. And then finally, the suite that we call external tests. These are actually pulled from large Java applications, their functional suites. And this is very easy for us to add to. Right now, I think we have 14 different applications running, and those include the MicroProfile TCK suites, so Scala, Solr, Lucene, Elasticsearch, all of those functional suites. And that adds up to a lot of rigorous testing for these implementations.

I guess I should also mention, for these I'm just doing a quick overview, and then we'll get into some more details here. With those different logical groups of tests, no one in any open source community has tons of machines to run things on. We are getting more, thank you very much. But at any rate, we want to be able to logically separate these a little. And so we did: we broke them out into three levels, sanity, extended, and special. And that was me playing around with animations; it really gets sick later on.
But anyway, what I should mention is that at AdoptOpenJDK, the sanity suite runs nightly and on releases. The extended suite runs weekly and on releases. And special is special: those are run on custom schedules. Those are typically tests that have to run on special hardware, or tests where we want to bring something to its knees, so we need to set aside special time for that. So that's kind of the overview of how we manage to run so many tests at the project.

Now, I said the talk today is actually about, and I think I tricked all of you with this, what do enterprises require? When we go and interact with enterprise customers, what are they asking us for from a verification point of view? And they're asking for some pretty basic stuff. They're asking that your implementation is functional, which means you're running the apps they care about, on the platforms they care about, and all of the features that they expect actually work as they're meant to. They want it to be secure, which means patches applied and tested. They want it to be performant, so some sort of performance bar depending on their workload and the types of operations they use heavily; they want a certain level of performance. And then finally, scalable. And this is an interesting one for us lately, obviously, in this world of containers. I come out of IBM Runtimes, and we have a lot of customers running on big mainframe machines, but we now also have a lot of customers who want to be running in the cloud. So the workloads are quite varied as well, and testing has to address that.

But this isn't really just what enterprises require. This is also what developers require. And I don't forget that, because I sit right next to our development team and I hear right away what they want. They want all of this, but they also want ease of being able to run all the tests. And I want that too, because to be honest, if I can make it dead easy for anyone to run tests, they'll run more tests. So we'll have more people testing, and that's pretty awesome.

So what is this quality kit from AdoptOpenJDK, as I'm calling it at the moment? It probably doesn't roll off the tongue, but I'm just saying it. These are the principles that guide us in our work at that project. I'm going to talk about each one independently, but essentially: open and transparent, the number one principle, and I'll say why later. Diverse and robust. Evolution, which has a whole bunch of points underneath it. That one is so important to me, and I assume it's important to the community as well. You can't let tests go stale. They become useless as a product evolves if the tests aren't evolving with it; and if the approaches we have for verification aren't evolving with it either, then what are we doing? We're possibly not doing the right thing if we don't evolve and continuously invest. We also wanted everything that we're doing at AdoptOpenJDK to be portable. So it's not just that we can run it in our automation systems there, in the Jenkins server there; the goal for testing was also for any implementer to easily run it within their shop, and any developer to easily run it on their laptop. So it's kind of a lot to ask of the test kit. And also tag and publish, and I'll explain what I mean there; I really mean reporting. But let's get into some of this. So, open and transparent. I think everyone knows why: open languages deserve open tests. That means test source is open.
That means they're executed in the open, and results are openly available. Why? That's going to let the community be confident in what we're doing, because they can see it. They can go in, they can scrutinize the tests, they can provide fixes. Tests get loved when they're seen. And it really drives innovation; we've seen it on our own teams. When things can be quickly fixed, you don't have to jump through hoops to be able to make progress, to add new tests. You're able to work faster. And most importantly to me, it helps build community.

The diverse and robust test part: the why there is to fill that set of requirements that I mentioned earlier. So, diverse, covering different categories of testing. Those logical groups really help us there: functional, regression, performance, and the load and stress tests are all captured in those logical categories. We also, of course, cover all the versions. In fact, when I say versions, that can really include all of the different experimental branches. Nothing stops us from running testing there; you can plug any one of those branches in, you just have to show the test kit where to look. And then finally, as many applications as we can feasibly run on the machines we have, and we've made it dead easy for someone to add a new one. So if there's an application whose functional suite you want run, you can actually just copy the example, replace a few lines, and away we go, we'll be running those in containers. All right, and so the call-out is: if you have suites that you think are compelling or useful, come talk to me after, because I'd like to see them get run.

Okay, so grouping and granularity. I do just want to quickly talk about that. This is the ease-of-use thing. I'm going to use the OpenJDK group as the example here. When you actually set up to run testing, you can run it at almost any granularity. So once I configure and compile my test material and tell it where the SDK that I want to test is, then I can actually just go make openjdk and it's going to run the whole suite. You might say, oh, you could already do that. And I guess the compelling thing is, could you do that for the system tests the same way? So if I'm talking about the system group, make system and it's going to run that whole suite. And then within each suite, you can run it by those levels. So if I just want the sanity tests, it's sanity dot and then the suite you're talking about. And then within that suite, a sanity suite, there's a bunch of sub-targets tagged with sanity, and you could just run one of the sub-targets. And then within the sub-target, if you wanted to just run one of the test classes, you could. So you say, I know these tests, I could do that anyway. But you may not have been able to do that before with a make Scala test, or a make AcmeAir to run a benchmark. So that's the kind of groundwork we've laid out.

And I guess I'm going to try to show you a couple of examples of that now, just to get into it. I won't show every permutation or combination of that, because we're already going to be short on time. But what I'll first try to show here is: if I'm actually just on a command line, I run my get.sh script that goes and gets me my test material. I've already pre-configured, so I've run make -f against the configuration file, and I've already run make compile, so the test material is compiled. Can you increase the font size please? Maybe. This isn't going to last long anyway.
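Just to make that setup and granularity concrete in one place, here is a rough sketch of the command sequence I'm describing. The environment variable, makefile, and target names here are illustrative assumptions (for example TEST_JDK_HOME, run_configure.mk, and SomeTestTarget are placeholders), so check the test repo documentation for the exact spellings we use at AdoptOpenJDK.

```sh
# Fetch the test material (the get.sh script I mentioned)
./get.sh

# Tell the test kit which SDK to test; the variable name is illustrative
export TEST_JDK_HOME=/path/to/jdk-under-test

# Pre-configure and compile the test material
make -f run_configure.mk   # the "make -f <configuration file>" step
make compile

# Then run at whatever granularity you want:
make openjdk          # the whole OpenJDK regression group
make system           # the whole system test group
make sanity.system    # just the sanity-level targets within a group
make SomeTestTarget   # a single target from a playlist
```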
All I'm really doing here is going and taking one of these targets that's in a playlist and running it. And all I really wanted to show was that I can set up, and this is the one I actually ran, I can set up and run any of the targets in any of the playlists. The power of that is you can quickly add targets, you can actually quickly plunk in entire directories of tests and set up really quickly. The other power of that is you can add variants. So in this you see that I have a variation. Is that still too small, I guess, hey? Try that. So in this case I have this test running with no options, but in my branch I might want to turn on a whole bunch of other options. So it just repeats the test case with whatever set of options you want, at any level. So that is that in a nutshell.

It has actually run that test. It's given me very, very minimal amounts of information here, but it also dumps a bunch of output and logs into a test output folder. Why we've done this and used TAP, the Test Anything Protocol, is because we have that diverse set of suites and they don't all produce JUnit or some other standard output. So the lowest common denominator is that we make all the tests we plug in generate TAP output, so that you can have a summary at the end, and so that we can run in automation, like over here.

So if I'm running, this is the Grinder, and this is doing essentially the same thing that I've done on the command line. I guess I have to log in again. So if I want to rebuild this Grinder that I've run, which is over here, again, all I'm really doing is providing the platform that I want to run on, the version, the implementation that I want to run against, what target I want to run, and then essentially just launching the build. The other thing I should say here is this is a tool for developers, in that they may want to rerun several times, they may want to run on a specific machine rather than have us select one out of the farm, and they may also want to archive results so they can download them somewhere else. So we can rebuild that thing. We can actually append extra command-line options at the time of running as well. Essentially, these Grinders exemplify what's happening at AdoptOpenJDK through the weekly builds as well, as I mentioned. Those are run on a Jenkins schedule, but they also cover all of the actual nightly and release runs. So if I was going to look at that and say, well, what's going on? It's actually running the same suite. These are nightly tests that kick off, triggered from the build pipelines: if the build succeeds, it triggers off a whole set of tests into automation.

All right, I won't spend too much longer there. I want to make sure to hit that. Just to mention that there's a lot you can do; these are ways we tag things. In this particular example, we've had to exclude on JDK 11 because of a defect in the test material. So right now we have to explicitly say all of the versions we want to run on. If this was left out, it would just run on all the versions. If we wanted to do just nine and up, we would just go nine plus. So there are ways to get in and nudge each one of these tests, or curate them. Okay, I won't do more examples. Time isn't permitting.
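For anyone who couldn't read the screen, a playlist entry along these lines is roughly what I'm describing: one target, a set of variations that rerun the same test with different options, and version tags that include or exclude it on particular JDK versions. The element names and test name here are a sketch rather than the exact schema, so treat it as illustrative and look at the playlists in the test repo for the real format.

```xml
<test>
  <testCaseName>ExampleLoadTest</testCaseName>
  <!-- repeat the same test case with different option sets -->
  <variations>
    <variation>NoOptions</variation>
    <variation>-Xmx256m</variation>
  </variations>
  <!-- explicitly list the versions to run on, which is how a defect on JDK 11 gets excluded;
       omitting this block would run on all versions, and "9+" would mean nine and up -->
  <versions>
    <version>8</version>
    <version>9</version>
    <version>10</version>
    <version>12</version>
  </versions>
  <levels>
    <level>sanity</level>
  </levels>
</test>
```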
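And since TAP came up, this is roughly what that lowest-common-denominator summary looks like at the end of a run. TAP itself is a simple line-oriented format with a plan line and one ok or not ok line per test; the test names below are made up for illustration.

```
1..4
ok 1 - SanityTestClassA
ok 2 - SanityTestClassB
not ok 3 - SystemLoadTest
ok 4 - ExternalAppSuite # skip excluded on this version
```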
I won't talk about the test results summary service that we're offering there, except to say that's something we're working on so that the community can come and see the aggregated test results that we have across all of the implementations, platforms, and versions. So they can say, ah, at AdoptOpenJDK they ran 65,000 tests; this many ran and passed, this many didn't pass, this many were excluded, and you can drill down and actually get to some of that information from that summarized view.

Okay, back to that list, that guiding set of principles: continuous investment. I would really like to see AdoptOpenJDK be the center of excellence for testing, because we are in a very unique position there. Having the opportunity to run tests across all of the implementations means we have differential testing as an actual opportunity. It actually helps drive how we triage, because we can say: on implementations A, B, and C the test didn't pass, but on D it did. So we can go, ah, and that actually helps guide how we investigate, and it speeds up the work that we have to do there. I guess also we're refining, as I mentioned, the automation and tools. There are a lot of performance benchmark tools that we're bringing in as well, so we can actually easily see some of the results and do comparisons and diffs across the different variations that we test.

Okay, so just under that continual investment, the process to modify: there's nothing stopping us from adding more testing at AdoptOpenJDK. There's nothing stopping us from pulling out testing that we have there, or editing or improving testing. The main thing here, I think, would be community awareness. So now that we have a technical steering committee, I guess one of the things we'll do is just define how we decide when we add new tests and that kind of thing, so people are aware. The other thing we have started on are guides for implementers who want to run this same suite in their own shops. How can they set up? It's very straightforward. We have some doc, but if you really want to get started with running some of these tests in your own Jenkins server, just shoot us a note in the Slack channel.

Okay, metrics. Obviously that's a way you can tell how you're doing, and I'm thinking of that from the test perspective. So I want to make sure we're selecting the best set of tests. I don't want to just run as many tests as I can; I actually want to choose the most effective tests. We're short on resources, so you want to be as effective and efficient as possible. So we'll use code coverage at least to set a bar, but code coverage doesn't really give you a whole lot; it doesn't tell you functional coverage. We'll be applying some bug prediction tooling, and we'll be using that comparative analysis story. As I mentioned, we're very fortunate to be able to go side by side and say this is behaving this way on one implementation but not another. It helps us in terms of cleansing the test material as well. Some test material that we bring in is only specific to certain implementations, so we have to make sure it's excluded against the ones it's not valid for.

Okay, portable. I think we covered that. We have lots of guidance for how you can run any of the types of testing on your own laptop, or on newly added platforms, which are coming almost every day; we're turning on against new platforms all the time. And in any implementer's Jenkins farm. And finally, I guess the tag and publish part: this is the audit trail, if you will.
We don't have this in place yet, but it's in plan. So when you go and download a binary from the website, I want you to be able to download, or at least link to, here were all the tests that were run, that aggregated view of everything that ran, here's what was excluded, and here are the artifacts if you care to look more closely, so that you know how it was verified.

All right, so that's just the review again of all of those points. Essentially, I'd like to say that, from our conversations with enterprise customers, what they care about is captured in this manifesto. I guess what you should also know is what this means for what we do at AdoptOpenJDK, what are we doing? We're guided, we've always been guided, by that set of principles, and we will continue to be. We're going to continue to sanitize and grow the test suites that we have. We'll continue to actively seek more people to come and help us. I'm also calling out for contributions of test material: if you have something that's compelling, that you really like and you think we don't have, I'd like to at least have the opportunity to look at it and see if it complements what we have. Essentially, build this center of excellence for testing, because what I've learned from moving all of our stuff into the open is that it's such a breath of fresh air. It means you can collaborate with anyone, and we are; we're collaborating with universities, we have a lot of people coming in from all directions. It's invigorating, it's energizing, and it's the way to go. Obviously everyone here knows that. It makes it fun to come to work every day, I'll tell you that. So, community building: this is where I hang out, where I spend most of my time. Come join me in that, and I guess I'll pause now if we have one minute for questions. I know there's a question out there. There was one on Twitter. Sure. Yep, it's working.

So with all this testing, I'm just curious, is there anyone actually triaging the failures? For example, you showed you're running the OpenJDK tests. I'm just curious whether someone's triaging, looking at the failures and feeding the bug reports back to OpenJDK.

Yes, we are. Not all of us have access rights to be able to go to the bug database and actually add a defect, but we have people on a Slack channel that are committers, who would be able to raise a defect against that test suite. So the idea is, yeah, we want to make the community, the broad community, better. That's a goal, and we have, I think it's in our test repo, we keep track of those particular defect reports that are in OpenJDK, under an issue 347 or something. What I need is more help: once those issues get fixed in OpenJDK for that particular test suite, we need to have someone go and check that we have the fix in our branch, or that we can actually turn those tests back on. We have only a handful of people working on this right now. So if you're looking for a little 10 or 15 minute project in an afternoon, come and retest so we can re-include stuff more quickly. Otherwise, we get to it every couple of weeks. Okay, thank you.

Can I ask a second question? So you showed different types of test suites, stress tests, functional tests. Yeah. I'm curious how you keep those up to date. So currently the JDK is built, jdk/jdk, which is the mainline most of us here are working on, I guess, is for JDK 13. And I'm just curious how you keep some of those test suites up to date, because tools bit rot, they have to be updated and so on.
Yeah, so for the external tests, for example, we have some of the functional suites, and I think it was mentioned in some earlier talks today, some of them don't work for 11, right? So the good thing is we made it dead easy to disable those for a particular version. That veneer I was talking about allows me to say this test is only valid for a particular version at this moment. Remember, we're curators of tests, so we're going and looking for test material from the applications themselves, right? The functional suites. If that application isn't running on that version yet, we just disable it in that case. But to answer your question, it's going to be a constant effort to keep things alive. That's why we have this notion of continual investment. We're constantly going and saying, oh, what's happened here? Especially if you're pulling material from other repos: we don't watch their Slack channels every day to see what changed, but we'll see it right away in the build. Yeah, anyone?

So thank you for the talk. The first question is very small. Is my understanding correct that you plan to create some environment to verify different OpenJDK implementations, correct?

Yeah, we have that environment.

And this is very interesting, because one of the main problems that I see is how to verify the tests themselves. Yes. Because for me, it seems like the biggest problem here.

Yeah, it's a juicy problem. And we definitely want to; in my earlier talk on deep learning, one of the outcomes from a model that I want to see is how useful is this test, how good is this test, right? It's a very big topic and I'm actively interested in it. Yeah.

So my question goes in the same direction. How many false positives do we have? Because we also do a lot of testing, and the problem is that you have so many errors, and it's a little bit related to your previous talk. We get a lot of errors, but it's infrastructure, it's moving the tests. That's right. And you need, I think, a lot more than five people. Yeah. Just to check all the... I see a lot more than five people right here. Just to check only the nightly errors you get from this amount of tests you're doing.

I think this is a continual problem. If we run 87 million tests a night and you have 10 eyes watching them, first of all, you hope most things are running green, right, because that eliminates a bunch of them. But say we have 500 failures overnight. That's where some of those deep learning tools and that kind of stuff will help us, but we're not there yet. What do we do? Right now, we've split ownership up across the five people, to say you're watching this logical group, and then if there is a big flourish of failures, then we all shift and try to focus on it. But yeah, there's... The good thing is we've become very quick at spotting them. The bad thing is it still takes time to go through each one. So we're behind, I would say, in terms of triaging. The nightlies don't get as instantly triaged as a release build, for example, at the moment, but release builds also run a whole slew more testing. Okay, thanks a lot. Thank you. By the way, thank you for your talk this morning. You hit some points that answered some questions that other people had about some types of tests. Yeah, anything else? No? All right, thanks everyone. Thank you.