Hello, everyone. I guess I can start now, at 14:05. My name is Jie Kang. I'm a software engineer at Red Hat, working primarily on Mission Control and Flight Recorder. Along with me — though he's not actually here — is Marcus Hirt, director of engineering at Datadog and the lead of Mission Control. Originally we were planning to do this talk together, but unfortunately he couldn't make it. A lot of the content here was contributed by Marcus, though, so I've included him. So today's talk is JMC and JFR: 2020 vision, looking ahead.

On the agenda: I'll give a brief introduction to JMC and JFR and what they are, talk about development updates from the past year or two as well as the roadmap for the coming year, and then finish with a demo of Mission Control in the cloud.

JMC and JFR, for those of you who haven't heard of them before: JFR is the JDK Flight Recorder. It's event-based JVM technology for production-time profiling and diagnostics. JMC stands for JDK Mission Control, and it's a desktop application for JMX browsing as well as JFR visualization and analysis. JDK Flight Recorder is a low-overhead recorder that produces compact, self-describing data in the JFR file format. It's extensible, and you can think of it as analogous to the flight recorder in an airplane. Mission Control, again, is a desktop application that provides two main features: you can connect to JVMs and browse JMX-related data, and — more importantly — it has a huge amount of tooling for visualization and analysis of flight recording files. So JDK Mission Control really depends heavily on JFR. The third piece alongside that is that the JMC project has core libraries for parsing and analyzing JFR data, and they're available for third-party use. So if you're using JDK Flight Recorder and you want to analyze or visualize the data in ways that JMC doesn't currently support, you can develop your own tools on top of the core libraries.

Looking at the toolchain for Flight Recorder in general, there are three main parts. First is controlling JFR. You can do that through the command line, through JMX, and so on: you can start and stop flight recordings of a JVM application, ask for a recording to be dumped on exit, set the maximum amount of memory JFR will use, et cetera. Second, you can add data to JFR. There's an API for that, JMC provides an agent — which I'll touch on briefly — for dynamic injection of JFR events, and of course you can build your own integrations as well. JFR comes with a lot of pre-built information from inside the JVM itself, but your application may have specific application data that you also want to record, and you can use the flight recorder system in the JVM to do that. And then, at the end of the day, the whole reason we're doing this is to use the data to find problems in our applications, to solve performance bottlenecks, et cetera. That's where Mission Control comes in. There's also the jfr utility tool provided by the JDK that you can use on the command line, and you can build your own third-party applications to analyze flight recordings.

So if we look at a general workflow: you run your JVM with JFR enabled and various configuration options; that outputs a JFR file that you need to store somewhere, probably not on the production system; and eventually you get that JFR file onto, say, your local desktop and analyze the data with Mission Control or with other tools.
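To make the "add your own data" part concrete, here is a minimal sketch of a custom JFR event using the jdk.jfr API that ships with modern JDKs. The event name, fields, and surrounding class are made up for illustration; they are not from the talk's demo code.

```java
import jdk.jfr.Category;
import jdk.jfr.Event;
import jdk.jfr.Label;
import jdk.jfr.Name;

// Hypothetical application event: records how long handling one order took.
@Name("com.example.OrderHandled")   // illustrative event name, not from the demo
@Label("Order Handled")
@Category("Robot Shop")
class OrderHandledEvent extends Event {
    @Label("Robot Type")
    String robotType;

    @Label("Quantity")
    int quantity;
}

// Somewhere in the application code (also hypothetical):
class OrderHandler {
    void handleOrder(String robotType, int quantity) {
        OrderHandledEvent event = new OrderHandledEvent();
        event.begin();                 // start timing the event
        // ... do the actual work of handling the order here ...
        event.robotType = robotType;
        event.quantity = quantity;
        event.commit();                // stop timing and record it, if the event type is enabled
    }
}
```

If the event type is enabled in the active recording, these events end up in the JFR file alongside the built-in JVM events and can be browsed in Mission Control like any other event.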
So, a mini demo, just quickly showing Mission Control in use. On the command line I've started Mission Control, built from the GitHub repo, targeting JDK 11. That opens up this interface — it's a desktop application. On the left side we have the JVM browser. Currently the only JVM running on my laptop is Mission Control itself. We open that up and connect to its MBean server. This shows live data for various basics — CPU usage, heap memory, et cetera — and it has triggers, so you could say, for example, if CPU usage goes too high, I want a flight recording dumped, things like that. It's got some neat tools in there, like running diagnostic commands. But the most interesting part, I think, is the flight recorder section. Over here we can see that Mission Control already has a continuous recording, which started as soon as the JVM launched. If I double-click that, I can choose to dump, for example, the last five minutes. This outputs a JFR file, and then we get all of the tools that JMC comes with for analyzing that data. Just to cover a few: the automated analysis results page is pretty neat. It analyzes all of the events in the JFR file, and there are specific rules written to read specific events, score them, and provide useful information — maybe there's an issue here you should look at, et cetera. That's being constantly developed. There are various other pages, like the threads page, where you can look at all the threads that were running and their states over time, which is pretty neat. There's a huge number of pages; I won't go through all of them, but that's basically Mission Control, demoed.

So, development updates. Before I go into this: hopefully you've all heard that Oracle open sourced Mission Control and Flight Recorder about two years ago now, I think around May 2018. That's actually the reason I'm here working on Mission Control and Flight Recorder. Since that time, one of the biggest updates — just a couple of months ago now, time flies — is that the JMC project moved to GitHub via Skara. So the contribution process can be started with a simple pull request, which is pretty neat, and there's a lot of tooling and bots around that too. The development process of Mission Control was already open before, but this makes it even more so.

Going into the actual development updates: Oracle open sourced Mission Control in 2018, and then in June 2019 the very first open source release of JMC was tagged as 7.0.0. Following that, in December — which is a month or two ago now, it's February — 7.1.0 was tagged. So if anyone is wondering what happened to Mission Control after Oracle open sourced it: it has an active community, we are developing new features for it, and we're looking to keep improving it into the future.

Some of the features in 7.1.0: more optimizations of the rules — the automated analysis I showed a little earlier has some issues when dealing with multi-gigabyte JFR files, trying to analyze all the events and so on, so that's an ongoing effort. The JOverflow view — the view JMC comes with for heap dump analysis — was converted from JavaFX to SWT, so you no longer need JavaFX on your system to use it. And we also have two new views: a flame graph view, which is pretty cool actually, and an HDR histogram view.
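As an aside, you don't strictly need JMC to read a dump like the one above: the JDK ships its own consumer API. Here is a small sketch — the file path and the choice of event type are just examples — that prints the Java monitor-blocked events from a dumped recording.

```java
import java.nio.file.Path;
import jdk.jfr.consumer.RecordedEvent;
import jdk.jfr.consumer.RecordingFile;

public class ReadDump {
    public static void main(String[] args) throws Exception {
        // Path to a dumped recording, e.g. the five-minute dump from the demo
        Path dump = Path.of(args.length > 0 ? args[0] : "dump.jfr");

        for (RecordedEvent event : RecordingFile.readAllEvents(dump)) {
            // jdk.JavaMonitorEnter is emitted when a thread blocks on a monitor
            if ("jdk.JavaMonitorEnter".equals(event.getEventType().getName())) {
                System.out.println(event.getStartTime() + "  blocked for "
                        + event.getDuration().toMillis() + " ms on thread "
                        + event.getThread().getJavaName());
            }
        }
    }
}
```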
Of course, there's also a huge number of bug fixes and things like that, as you'll see. Just to — oh, I just realized I probably should have clicked present, so it's full screen. Anyway, project commits: from the initial open sourcing to 7.0.0 there were 128 commits; from 7.0.0 to 7.1.0 there were 99; and the GitHub project itself, when I created this slide, had 238 commits. So it's definitely being actively worked on, and there are a lot of features we want to add as well.

Distribution-wise, AdoptOpenJDK provides binaries for download. For the time being these are snapshot versions — the latest from the repository — but I've heard from the developers there that they're going to do release binaries as well in the future. Oracle, of course, has binaries too, as well as an Eclipse update site. Unfortunately they have yet to do the 7.1.0 release, but hopefully we'll see that in the next few months. And Red Hat has RPMs for Fedora and RHEL, so if you're using Fedora or RHEL you can dnf or yum install those. There are a few other distributions too — Azul has their Mission Control build, for example — but at the moment they're all basically the same, which I think is good.

A quick slide on contributing: the repository is on GitHub, there's a mailing list, we have Slack, which is pretty active, and our bug tracker is the OpenJDK JIRA instance. And a little plug for Marcus's JMC tutorial: if you're looking to learn how to use JMC to analyze various problems, Marcus created a quite decent set of exercises — dealing with memory leaks, hot methods, et cetera — and how you would see those using JFR and JMC. So you can always try that out.

So, moving on to the roadmap — what's coming in the future. Before I talk about JMC's roadmap, there is one thing I want to share regarding JFR. Oracle open sourced JFR and JMC for OpenJDK 11 and up, but the community has been working on bringing JFR to OpenJDK 8. The CSR was finalized and closed in December; the remaining to-do is just to merge it into mainline, and Mario actually started the thread for that in January, maybe a week ago. Once that's accepted, we're going to see JFR in OpenJDK 8, which will be awesome.

Back to JMC, though. The roadmap is Mission Control 8 — this is our early access splash screen. It's going to be a major release, so there will be breaking changes if anyone uses the core API, for example, and it's going to target OpenJDK 15. OpenJDK 15 releases, if there are no delays, in September 2020, and you'll see JMC 8 a few months afterwards, once we've made sure it actually works on OpenJDK 15.

So what are we adding in 8? Three key things: the JMC agent, a rules update, and core library updates. I'll go into each of these. The JMC agent is actually a quite neat tool: it's for dynamic insertion of JFR events at runtime. As I said earlier, you can add your own events to the JFR system, but that requires writing annotations and so on in the code, and then obviously recompiling and redeploying. Of course you can do bytecode instrumentation with other tools as well, but the JMC agent is designed from the ground up for insertion of JFR events at runtime. So if you want to instrument your application without having to rewrite the code, recompile, and redeploy, you can do that — it's optimized for the job. Hopefully we'll see it as part of the JMC distribution in 8.0.
And technically the agent code is already in the upstream repository and being developed there, so if you're interested, there are a lot of open issues for that too.

Then rules 2.0 — again, this is mainly about the automated analysis section. There are going to be performance improvements, and a redesign of the rules in general so that they can be reused. The idea from the developer is that the results of rule A and rule B could feed into rule C, and so on, so you can build up rules in a more organized fashion than they are now. There will also be more typed information in the rule results themselves, so we can visualize them more easily — for whatever reason some of them currently contain HTML, and they won't in the future.

And then the core API. This is the API that JMC provides to third-party applications, mainly for reading JFR files, managing the flight recorder system of JVMs, and discovering JVMs in general. A lot of these features are extremely useful for third-party applications. In JMC 8 we're updating it to use JDK 8 language features — previously it was 7. This has nothing to do with compatibility between JFR and your OpenJDK version; it just means the API code itself will compile and run on 8 and up, because we're using JDK 8 language features. And more things from the application will be moved into core, to be reused by third-party applications, because they're useful.
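To give a flavour of that core API — as opposed to the JDK's built-in consumer API shown earlier — here is a rough sketch of loading a recording and counting monitor-blocked events with the JMC core libraries. The package and helper names are from memory and may differ slightly between JMC versions, so treat this as illustrative rather than definitive.

```java
import java.io.File;

import org.openjdk.jmc.common.item.Aggregators;
import org.openjdk.jmc.common.item.IItemCollection;
import org.openjdk.jmc.common.item.ItemFilters;
import org.openjdk.jmc.common.unit.IQuantity;
import org.openjdk.jmc.flightrecorder.JfrLoaderToolkit;

public class CoreApiSketch {
    public static void main(String[] args) throws Exception {
        // Parse the recording with the JMC core parser
        IItemCollection events = JfrLoaderToolkit.loadEvents(new File(args[0]));

        // Filter down to Java monitor-enter (blocked) events and count them,
        // reusing the same kind of building blocks the automated-analysis rules use
        IQuantity blockedCount = events
                .apply(ItemFilters.type("jdk.JavaMonitorEnter"))
                .getAggregate(Aggregators.count());

        System.out.println("Monitor-blocked events: " + blockedCount);
    }
}
```

These are the same kinds of building blocks the automated analysis rules are written against, which is part of why moving more of them from the application into core is useful for third parties.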
Then there are some more features on the visual side of JMC. There will be new stack trace visualizations, improvements to the thread graph, and the flame graph is being updated as we speak, with a large amount of improvements coming there. And then — this is something Marcus has a prototype for on his own repo — visualizations of stack traces with various graphing tools. The image here is just an example of method profiling events rendered in a graph format.

So, moving on to the demo portion: JFR management in OpenShift with Container JFR. Before I jump into the demo, I want to give a little bit of context. This is a big diagram explaining how the demo is set up and what's going on, which I'll show again, but the basis behind it is a new project called Container JFR that my team has been working on. It lives under the rh-jmc-team GitHub organization, and the whole idea is to make it easier for users to control Flight Recorder on JVMs that are in containers, whether that's Docker, Kubernetes, OpenShift, et cetera. The features are pretty simple: you can start and stop recordings, archive recordings, download them, view automated analysis immediately without opening your desktop application, and view the events available for a specific JVM — all of it driven from a web UI or a command line tool. We're also looking into various integrations with things like Grafana, Prometheus, et cetera, but those are more experimental, so they might disappear. The point is that all of this is stuff you can do today, but doing it with JVMs in containers, or JVMs running on orchestration platforms like OpenShift, is not necessarily easy to set up — and by running this project alongside your deployment, you can do it without any extra work.

Container JFR is made up of four core components, each in its own repository. There's the main Container JFR repository, which contains the management service, including the API for Kubernetes and OpenShift. There's the core repository, which has core libraries for JFR management — the idea being that if you wanted to write your own tooling, you could build on our core library, which has useful pieces like JVM discovery in OpenShift or Kubernetes. There's the web front end, which provides a web UI to do things easily. And then there's our operator project. I'm not sure how familiar everyone is with how Kubernetes and OpenShift work, but they have this operator concept to help with deploying things, and we provide an operator that makes it basically a one-click operation: you click subscribe, and then everything is started, automated, and managed by OpenShift or Kubernetes. Along with that, there's a side project, the JFR data source, which provides a Grafana data source for JFR files: you can upload a JFR file into the data source and then view any time-series data from it in Grafana. And finally, for the demo, there's also Jaeger, which provides end-to-end distributed tracing for your applications. If you're deploying four or five different HTTP-based microservices that make requests to each other across the network, you can build a map of what's happening using distributed tracing.

So, going back to the diagram, the setup is a microservice application called the robot shop, which consists of an order service, a customer service, and a factory service. A user comes in and requests an order for a robot; that goes to the order service, which tells the factory that it needs to build a robot, and tells the customer service that there's a customer whose order needs to be recorded in the database, et cetera. Eventually a robot gets built and sent back to the user. These are three different Java applications, basically HTTP servers, that will be deployed in OpenShift. On the side we have the Jaeger operator, which manages an instance of Jaeger, and the services are all implemented to work with Jaeger for distributed tracing, so they'll be sending traces to the Jaeger instance. We'll also have the Container JFR operator deployed — there's one of those per project — and it manages a single Container JFR instance, which contains the web UI, Container JFR itself, the data source, and the Grafana instance. That instance can connect to any of the applications running in the project to retrieve flight recordings from them. You'll also see — it's more of an implementation detail — that the operator creates FlightRecorder resources per application in the project, which are used for management purposes; it's not too important.

So, we can jump into the demo. There is a portion that's internet-dependent and the internet is a bit spotty, so I might skip it. But in the OpenShift cluster on my local machine I created a project — I called it FOSDEM 2020. If we go into the operators, I've installed the Jaeger operator as well as the Container JFR operator, and the Jaeger operator itself installs the Elasticsearch operator, though I don't technically use it. If we look at the deployments, I have the Container JFR operator, Jaeger, Elasticsearch, and this Jaeger instance.
So when it comes to setting up Container JFR, the basics are: you install the operator through the Operator Hub, then you go into the project where you want an instance, click create instance, and in the spec you set minimal to either true or false. That decides whether the web UI and related tools get deployed — if you want the web UI, you put minimal false — and then you just hit create. From there everything else is set up automatically by the system, and you're done.

So once you have that up and running — I already have an instance here called container-jfr, which I just created from the example — if we go into the routes section, this stuff is created automatically: there are exposed HTTP endpoints for Container JFR, so I can visit one now. This opens up our work-in-progress web UI, which is where you can look, through the UI, at the JVMs running in this OpenShift project. Going back to the cluster, if we look at the deployment configs, this is where I've deployed the robot shop, with some poorly chosen names: RCS is the customer service, RFS is the factory service, and RLS is the order service. These are the deployments that correspond to the microservices I showed earlier.

So here we can connect to the target JVMs. At the moment it basically just shows their host name and port — down below are the RCS, RFS, and RLS ports for the microservices. On these applications I've exposed port 9091 for remote JMX connections, so, for example, I can connect to RCS on 9091. The basic UI lets you create recordings or take a snapshot — you set the duration, which events, the name, et cetera. And once you have a recording, for example this one which is running continuously, you can view a summary, which loads the automated analysis report in the web UI; you can download it; or you can save it into the archive, which works with persistent storage on OpenShift. And that's basically getting your JFR files out of your cluster.

Okay. So as far as the demo is concerned, we have three microservices running, and I'm going to submit an order to them from my local machine with this load application, which I'm just going to run. The load application targets those three services, which expose HTTP routes. Hopefully this works. You can see it has pretty extensive logging, but in general it's ordering three robots to be created and then delivered to the user. If we go back here and open up Jaeger, we can see we do have the services and their traces, so you can find that. With distributed tracing we get an overview of exactly what happened across the HTTP network: a request was made to order, the user was validated, and then there are three build-robot requests — it builds a robot three times — then there's a pickup, et cetera. We can also go into dependencies and look at the directed acyclic graph to see the layout of our services: when the request came in, the order service made ten requests to the factory service and also one request to the customer service. So it just shows you the connections between your microservices.

And then back to the trace. Given a trace, there's quite a bit of information. You can see, for example, that in this create-chassis span the robot was of type T800 and it took 469 milliseconds.
Whereas when it was creating this Wally robot, it took 670 milliseconds, and you can see across a whole request how long it took — 2.4 seconds here, 3 seconds there, et cetera. So you do get a neat overview of what's going on at a slightly higher level in your application. But when you're trying to diagnose issues or potential performance problems, you need a little more, and that's where, in combination with something like JFR, you can solve a lot.

So in this scenario — this is obviously a dummy project — there are a few problems coded into it. For example, in the create-chassis spans you can see that when it was creating the Wally robot, for whatever reason, it took 670 milliseconds, whereas the coffee one took only 272. That's a huge difference. If you know a little bit about your own application, maybe this is an unexpected result: these two robots should take a similar amount of time. From here, though, you can't get much more than the information it shows, right? Why did it take more time? That's where I'm going to go in now with JFR.

So this was in the factory service, so I'm going to get a recording of that. Back in the web UI, I target the factory service on its port; it has a continuous recording enabled, so I'm just going to download that to my system. With a click, it's now on my own machine, and going back to Mission Control, I can open it up from my downloads — factory-service.jfr.

So right off the bat, for the factory service there isn't too much in the automated analysis results, unfortunately, so you do have to go digging a little deeper. One thing I can look at: in this case I know it's the chassis creation, and in this process, when it creates a robot, it uses a thread pool. So I can look at that set of threads — in this case, the factory-line threads — and see what's going on. If I zoom in, I can see the execution of these threads and what their states were. Actually, in my demo I sent the request quite a while ago, so some of that information isn't around anymore. The reason is that this is a continuous recording which I've configured to keep only 128 megabytes of data, so the recording won't contain events from the whole lifetime of the JVM if it's been running for a long time — but that's a configuration option. In any case, we can still see some interesting information here: in the scope of building a robot, a huge portion of time was spent blocked. So there's obviously a concurrency problem going on, because I can make the — hopefully decent — assumption that it was blocked because of something else, like the other factory line, right? The blue means the thread is sleeping, here it's blocked, and this, I think, is when it's active. If I click into the blocked section and look at the stack trace view, I can see that in its blocked state it was in the paint-robot code, which was in turn in the logger. So now I have a little bit of code to go back to in my project: maybe I'll look into the logging in paint robot, or the logging in create chassis, and see what's happening there.

So if I open up my code editor now with the project, I'm just going to search for create chassis. In the chassis code there's a span builder for tracing purposes, then a logger call right here, and then a sleep that's supposed to simulate building a chassis — which is supposed to take constant time.
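For reference, here is a rough reconstruction of what that chassis code looks like. The class, method, and constant names (ChassisBuilder, createChassis, Logger.message, CHASSIS_TIME_MS) are hypothetical, and the tracing calls assume an OpenTracing-style API — the actual demo project may differ.

```java
import io.opentracing.Span;
import io.opentracing.Tracer;

// Hypothetical reconstruction of the factory-service chassis code described above.
public class ChassisBuilder {

    private static final long CHASSIS_TIME_MS = 250; // constant-time "work" (illustrative value)

    private final Tracer tracer;

    public ChassisBuilder(Tracer tracer) {
        this.tracer = tracer;
    }

    public void createChassis(String robotType) throws InterruptedException {
        Span span = tracer.buildSpan("create-chassis").start(); // the span builder seen in Jaeger
        try {
            Logger.message("creating chassis for " + robotType); // the logger call in question
            Thread.sleep(CHASSIS_TIME_MS); // simulates building the chassis; should take the same time every run
        } finally {
            span.finish();
        }
    }

    // Placeholder so the sketch compiles; the demo's real Logger is what we look at next.
    static class Logger {
        static void message(String msg) {
            System.out.println(msg);
        }
    }
}
```

The point is simply that the only thing between the span and the constant-time sleep is that logging call, which is why the blocked stack traces pointing into the logger are so telling.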
So, because the stack trace showed it was blocked in the logger, I'm going to check the logger out now. Going into Logger.java, we see that Marcus decided to make this a synchronized static void message method, which is terrible for logging purposes, because we have four factory lines trying to make robots, and they're all going to hit this synchronized logging call and get blocked. So we'd have to sort that out. This is just a quick example of how you can take distributed tracing and JFR together to get a deeper look at problems in your code.

And going back to the OpenShift setup, just for some clarity: for your JVM application, apart from deploying Container JFR, the only thing you need to do is expose a connection for Container JFR — in this case we support remote JMX. If we look at our deployment configs and the environment variables, we'll see there's a JAVA_OPTS section that I set up — let me make it bigger; okay, that was too big. I exposed 9091, obviously on a non-production system, without authentication, and then these are the settings for Flight Recorder to run continuously. So in general, the idea of Container JFR is to make getting JFR files out of JVMs in containers as easy as possible.

The other neat thing to mention while we're in here: in an orchestration system there's the concept of internal and external networks. Normally when you open a JMX connection you want it to be secured, because if you open a port on a system, anyone might be able to access it and do something malicious. But with internal and external networks — in the setup I have — the port is only on an internal OpenShift network, so only OpenShift services with the correct privileges are able to access it. So you get that security for free; theoretically, this stuff isn't exposed, as long as OpenShift wrote their code correctly.

So, going back to the slides: that was the demo of JFR management in OpenShift with Container JFR. And finally, I'd like to end again with contributing to JMC. We have the repo, the mailing list, et cetera, and we're pretty active. There's a huge number of issues in a variety of areas that you can work on. And yeah, thank you. So I guess I have some time if anyone has questions — sure, in the center.

So the question was whether the JMC core libraries will ever be published on Maven Central. That one actually became a little complicated, because Oracle had plans to publish them via Sonatype: they started the process, they got the permissions and the admin rights — and then they stopped. I don't know where that went, but they hold the admin rights to the names for the core libraries and everything, so we have to discuss that. In general, we do want them on Maven Central so people can easily pull them into their Maven projects. For the time being, AdoptOpenJDK actually publishes the core libraries to a repository, so you can pull them from your Maven projects; you just need to add an external repo.

Any other questions? Sure, in the back. Thanks, Andrew. Thank you. Regarding Container JFR, what's the deployment model like? Is it like a sidecar that's injected into every JVM instance it sees, or is it like a DaemonSet — how does it scale? Okay, so the question was: what's the deployment model of Container JFR, and how does it interact with the JVMs it's trying to monitor?
It is not an agent that you inject into the JVM itself or into the pod that's running the JVM. At the moment it relies solely on connection protocols — for example remote JMX or JDP — to connect to the JVMs that are running. So the general process is: as an application developer, you deploy your JVM, you expose remote JMX, and you're done. Container JFR runs on the side, and as long as it has access to the same network, it will see that a 9091 port is exposed, connect to it, and go from there. The discovery at the moment is pretty dumb — it tries to discover everything — but we're trying to fine-tune it. For example, you could add an environment variable or something — this is specific to Kubernetes or OpenShift — to specify that this is a JVM exposing a connection for us, and then we'd connect directly there.

So the next question was how this compares to Prometheus's discovery protocol. Prometheus, for those who don't know, works on a primarily pull-based system: the running application exposes a basic web endpoint that Prometheus connects to and scrapes repeatedly, and Prometheus discovers these through service tags and things like that. We will have a similar system for Kubernetes and OpenShift. OpenShift, for example, has this concept of labels, which you can apply to an application deployment, and we can read those labels and see that, oh, that's our label, so we should check that out. But otherwise, at the moment, the discovery works within the OpenShift or Kubernetes networking system: as a deployment in the same project, we have access to the network and can see everything running on it, so we don't actually need your application to specify anything, technically. Does that help? Sure.

Any other questions? Yeah, I think you should probably take those outside; we need time to turn the room around for the next speaker. Thank you very much. Thank you.