Welcome, everybody. I am Mario Torre, and this is Marcus Hirt. We're here today to give a short introduction to Mission Control and Flight Recorder: what they are, how they can be useful, and how they can be useful especially in a production system.

That's the agenda for today: as I said, a short introduction, and then we will see one specific use case, which should be taken as just an example of what you can do with the framework. The example is about OpenTracing, and we will see how to do analysis of a distributed application with load balancing and everything.

To begin with, I would say Mission Control and Flight Recorder can be seen as some kind of framework or platform. It's actually two big pieces. One lives in the JDK, and that's Flight Recorder. The other is a standalone application with a very comprehensive set of APIs, and that is Mission Control. Flight Recorder was open sourced last year with OpenJDK 11. Mission Control followed pretty soon after, and both reside as official projects under the OpenJDK umbrella. Since then, in basically less than a year, there has been an awesome number of contributions. This has been a very, very good project from the beginning. I want to take this opportunity to thank Marcus, because he's running the project in a really, really open way. And of course, while Oracle and Red Hat are the main contributors at this point, there's been a lot of discussion and contribution from the rest of the community. So it's a very healthy open source project.

So, JDK Flight Recorder. A good analogy for it actually comes from the name itself. The flight recorder, the data recorder in an airplane, is something that gathers information about events that happen during the flight. It keeps recording even in case of catastrophic events. And then there is a way to store all those events and, at some point afterward, analyze them with a forensic tool.
This is the relationship between Mission Control and Flight Recorder. Mission Control is the forensic tool, while Flight Recorder provides the actual data. Again, it was open sourced in JDK 11 and is very actively developed, so the HotSpot team is still adding features in every release.

Here is an example of how you can use it. In addition to the usual metrics that are there for the JVM itself, there is also a very nice and neat API that you can use to extend your own application. It's possible to instrument your application by adding events, and those events use the same infrastructure that the built-in Flight Recorder events use and can then be analyzed by Mission Control. It's very easy to add events. This is the most trivial example: you create a class that extends Event, and then you can start adding information about what you are logging. In this trivial example we are just adding labels, but there are all sorts of metadata that can be useful to analyze and structure the event, such as the frequency, or giving meaning to specific fields. And that is basically everything you do to define an event. Then, whenever necessary, you decide whether the event should be committed. At that point some magic happens, and it ends up in the Flight Recorder file. As you see, this is extremely easy.

The next part is Mission Control. Mission Control is an application, mostly based on Eclipse, that you run as a standalone desktop application. However, it also has an API that you can use to build your own application using the power of this framework. We will see that on the next slide. The API is very similar, I would say, to streams: you don't actually process the data one event at a time. Instead, you ask Mission Control to give you some statistical analysis. This is very powerful. The next slide shows an example.
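The slides are not reproduced in this transcript. As a rough stand-in for the custom event workflow described above (define an event class, fill it in, commit it, find it in the recording file), here is a minimal sketch, assuming JDK 11 or later. The event name `demo.Work`, its fields, and the helper methods are all hypothetical, chosen only for illustration. To stay self-contained, the sketch reads the recording back with the JDK's own `jdk.jfr.consumer` API, one event at a time, rather than with the Mission Control API mentioned in the talk, which is a separate library.

```java
import jdk.jfr.Event;
import jdk.jfr.Label;
import jdk.jfr.Name;
import jdk.jfr.Recording;
import jdk.jfr.consumer.RecordedEvent;
import jdk.jfr.consumer.RecordingFile;

import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

public class CustomEventSketch {

    // Hypothetical custom event: the name, labels, and fields are illustrative only.
    @Name("demo.Work")
    @Label("Work Item")
    static class WorkEvent extends Event {
        @Label("Message")
        String message;

        @Label("Units")
        int units;
    }

    // Starts a recording, commits `count` events, and dumps the result to a temp file.
    public static Path recordAndDump(int count) {
        try (Recording recording = new Recording()) {
            Path file = Files.createTempFile("sketch", ".jfr");
            recording.enable(WorkEvent.class);
            recording.start();
            for (int i = 0; i < count; i++) {
                WorkEvent event = new WorkEvent();
                event.begin();            // start timing the event
                event.message = "item " + i;
                event.units = 10 + i;
                event.commit();           // ends the event and writes it to the recorder
            }
            recording.stop();
            recording.dump(file);
            return file;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Reads the file back and averages the `units` field of the custom events.
    public static double averageUnits(Path file) {
        try {
            List<RecordedEvent> events = RecordingFile.readAllEvents(file);
            return events.stream()
                    .filter(e -> e.getEventType().getName().equals("demo.Work"))
                    .mapToInt(e -> e.getInt("units"))
                    .average()
                    .orElse(Double.NaN);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Calling `CustomEventSketch.averageUnits(CustomEventSketch.recordAndDump(3))` averages the three committed values. The per-event loop here is exactly what the Mission Control API is said to spare you; it is used only because it ships with the JDK.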
The method on this slide uses only the API, without the full application, to create an HTML report from the rules. It basically loads the events. This is the event file that was created by the JVM when dumping the Flight Recorder data.

Another example here is a little more complex, and by complexity I really mean just five more lines of code, which shows how powerful the API is. You're still collecting the events, you're still analyzing them, and at some point you get a count and an average. As you see, you never deal with single events. You basically ask the framework to give you an average and a standard deviation, and then you can print the result. In so few lines of code it's possible to get a lot of information out of it.

Well, actually, there is another example. We were creating some examples for this demonstration, and we went for a more complex one that Marcus is going to show you. It's an example you can download. It is very nice because it is basically two Java applications running in two separate Docker containers, talking to each other. One is controlling Flight Recorder, and the other is the VM that is being targeted for analysis. The target dumps the Flight Recorder data, and the first application analyzes it using the Mission Control API. Which brings us to another use case: OpenTracing.

This one has... Right. Yes. Okay. Yes. So I actually have a fever, so I might be delirious. Anyway, I'm impressed that I'm on. So, OpenTracing. I have an example here consisting of a few microservices. It's a robot ordering and building application. These are the three microservices. We have an order service that just takes the orders; when we get an order, we verify that the customer is an actual customer in the system, and then we talk to the factory to build the robots. And that's pretty much it.
We also have a load generator that we can use for full system testing. So this is how it might look. We get a request, we talk to, say, the order system, and the order system in turn verifies that the customer exists by talking to the customer service. Then we maybe talk to a few factories to start building these robots. But what if something goes wrong?

This is where OpenTracing, or distributed tracing systems in general, come in. In 2010 Google released a paper called Dapper, which many of you probably already know, about their distributed tracing infrastructure, and that has inspired a host of other systems like Zipkin and Jaeger. How many here know about Zipkin? Almost everybody. So it's a very commonly used tool.

Of course, this is nothing new. Distributed tracing is something people have been doing for quite some time. All the APM vendors do it in one way or another, and they all do pretty much the same thing if we talk about Java land: everybody uses bytecode instrumentation to get data from various systems. So the differentiation is really about what to instrument and what data to get, and the other part of the differentiation is how they present the data to the user. The value add is knowing what to actually do with the data once you have it.

Previously, if you had your own application and wanted to put some contextual information into the distributed trace, you needed to work with a vendor-specific, proprietary API. And if you, as a library vendor, wanted to add some special information, again you pretty much had to work with one of the tracer implementations. Then came OpenTracing, which is a vendor-independent API, so that you can support multiple tracers with one API. You can add contextual information without worrying about vendor lock-in.
And there is a spec on GitHub if you are interested in looking at it. I am just going to quickly go through some basic parts of the API, because you will need to know them when I do the demo.

The core concept, which is not actually part of the API, is the distributed trace. It is something that can span multiple processes, and it's really a directed acyclic graph of spans. A span has an operation name and some data, which are sort of key-value stores. It has a start time and an end time, and that's really the key piece of information: you want to know how long these spans took, because that's probably one of the first things you are going to use to find out that something is going badly. Then there is a span context, which is propagated across process boundaries. I am just going to skip ahead.

In our robot shop, this is what the graph could look like. We have the load generator, which does a full system test. It creates a random customer, then posts an order, which might be for multiple robots. We start by validating the customer, then create however many robots were ordered, pick up the robots and fulfill the order, and at the very end delete the customer.

So I am going to do a quick demo of what this might look like. I am going to start a bunch of services. You are going to see an exception because I am running JDK 11 from the Scanners. Well, you don't see anything? Is this going to help? Do I need to do something else? Do I need to exit here? Hello. Okay, mirror. Where do I do that? Okay, so display. Okay, so you are seeing something at least.
Okay, so I have started a bunch of services, and now I am going to start a little load generator, a single increment load generator. It is now creating the customer, and it is going to fulfill an order, and after a little while we are going to be done. I am just going to start the Jaeger UI so that we can look at our traces, and here we go.

This is basically the full cycle. You can see that we first check which types the factory can build, then we check which paints are available, then we create a random user, and at some point here we start building robots in a factory. What we can see when we start building these robots is that there is a lot of variance in how long it takes to create a chassis. Here we are creating an EVE type of robot, then we are painting it red; here we are creating some other type of robot. I mean, it would be an amazing factory that could create robots in seconds, but still, it is a lot of variance. Why is that?

This is typically where distributed traces break down. You don't get the information you need to actually do something about it. You know that there is a problem, probably in the factory, but you don't know what to do about it. So, what if we... Okay, I am going to just do it like this. Exit, here. I don't care, I've got a fever. So, here we go.

The idea, then, is: what happens if you start marrying Flight Recorder with the distributed tracer? What if you could get the low-level information you need to solve this kind of problem and marry it with the tracer? What you would need is some way to take the contextual information you have from the tracer and push it into Flight Recorder, so that you can start correlating them, right?
You would be able to record the trace ID, the span ID, all these kinds of identities that allow you to find out what was actually going on. One way to do that is to create an OpenTracing tracer that emits Flight Recorder events. There is a concept I didn't talk about, which is called a scope, and that is a thread-local activation of a span. That rhymes very well with the thread-local recording of events in Flight Recorder, right? So what if we could start recording this contextual information? Then we could probably do something good.

So I built a prototype, and I've donated it to OpenTracing. There are still some things in OpenTracing that should be fixed to minimize object allocations, but it works. It's built as a multi-release (MR) JAR, so it will work on Oracle JDK 7 and OpenJDK 11+. There is a link towards the end; I think the link might be to my GitHub repo from before I donated it.

So I'm going to do a quick demo of what might happen if we... Okay, I'm just going to go ahead and push some more data into the tracer, and the thing is, I am actually running with Flight Recorder already. Let's take a look at the Jaeger UI. Now we should have another trace, right? So what do we want to do? We want to check out what the factory was actually doing. Let's go back to my Eclipse and look at Flight Recorder. All these guys actually have a flight recording running, and it's the factory that I'm interested in, so I'm just going to go ahead and dump that whole recording.

I've dumped the recording, and I've actually done a little special UI here for traces. We can see that there is a bunch of traces here. Low resolution. And we can see that we have one trace that took 1.3 seconds. So let's look at what was actually going on there, and let's go to the Java application...
well, Java threads actually, and I want to see all of them, and I only want to see things that were happening in the same thread during the same time. So here we can see what was actually going on. I'm going to zoom in a little bit. We see our thread-local activations of the spans: there are scope events associated with that span ID. You would get that ID from Jaeger if you really wanted to home in on a specific one.

You can see that what's happening here is that we're actually blocked on entering a Java monitor. So we have a synchronization problem. We can go look at the actual lock instances very specifically, and we see that it's this monitor, and this is the stack trace. It's this logger that is being used, and this logger is the thing that all these threads are synchronizing on. That's why we have this skew, or variance, in the time it takes to invoke this.

So we now have stack traces, and we have low-level events like Java Monitor Enter. It's not just Java Monitor Enter; we have parks, we have all these. If you haven't looked at Flight Recorder, you'll see that there is a very rich set of events. There is a method profiler, there is allocation profiling. In this case it's thread holds: what is actually going on when we aren't executing, what holds the thread. In this case it was Java Monitor Enter, and that's why we're not executing.

So, how are we time-wise?
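The tracer idea described before the demo, a thread-local scope that emits a Flight Recorder event carrying the trace and span IDs, could look roughly like the following. This is a minimal sketch under stated assumptions, not the donated tracer: it does not use the OpenTracing API at all, and `ScopeEvent`, `Scope`, and all field names are hypothetical stand-ins. It assumes JDK 11 or later.

```java
import jdk.jfr.Category;
import jdk.jfr.Event;
import jdk.jfr.Label;
import jdk.jfr.Name;

public class ScopeEventSketch {

    // Hypothetical event recording the tracer's identifiers for one scope activation.
    @Name("demo.Scope")
    @Label("Scope")
    @Category("Tracing Sketch")
    static class ScopeEvent extends Event {
        @Label("Trace Id")
        String traceId;

        @Label("Span Id")
        String spanId;

        @Label("Operation Name")
        String operationName;
    }

    // Hypothetical stand-in for a tracer scope: a thread-local activation of a span.
    public static final class Scope implements AutoCloseable {
        private static final ThreadLocal<Scope> ACTIVE = new ThreadLocal<>();

        private final Scope previous;                  // scope to restore on close
        private final ScopeEvent event = new ScopeEvent();

        private Scope(String traceId, String spanId, String operationName) {
            previous = ACTIVE.get();
            event.traceId = traceId;
            event.spanId = spanId;
            event.operationName = operationName;
            event.begin();                             // the event covers the activation
            ACTIVE.set(this);
        }

        public static Scope activate(String traceId, String spanId, String operationName) {
            return new Scope(traceId, spanId, operationName);
        }

        public static Scope active() {
            return ACTIVE.get();
        }

        public String spanId() {
            return event.spanId;
        }

        @Override
        public void close() {
            event.end();
            event.commit();                            // only recorded if the event is enabled
            if (previous == null) {
                ACTIVE.remove();
            } else {
                ACTIVE.set(previous);
            }
        }
    }
}
```

Because the event is begun and committed on the thread that activates the scope, it lines up in time with whatever else Flight Recorder captured on that thread, such as the monitor-enter events seen in the demo. Wrapping work in `try (Scope s = Scope.activate(traceId, spanId, "buildChassis")) { ... }` is the intended usage; the operation name there is only an example.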
I think we have a few minutes for questions as well. Okay, so I'll just go back to... Five minutes more? Okay. So that we can have some questions, I'm going to sum this up.

JDK Flight Recorder has been open source since JDK 11, so it's something that you can already use. There are even ARM builds of OpenJDK with Flight Recorder working, so those of you who are doing embedded systems might find it very useful for recording sensor data. I have an example that uses laser rangefinders, records the rangefinder information, and uses Mission Control to render what we're actually seeing. So that's kind of cool.

Mission Control has been open sourced. I'm sorry it hasn't been released yet; it's process stuff. It should be released within, I don't know, maybe a couple of weeks. We'll see. OpenTracing is open source, always has been, and it's the vendor-neutral initiative for distributed tracing. And since I'm from Oracle, and since I actually said something about the future, here is this slide: everything, especially regarding the release, might change at the sole discretion of Oracle.

So here are some resources. The project for Mission Control is up there. My blog is here; I usually write about cool things you can do with Flight Recorder. And then my GitHub, if you want to look at some of the examples, or the tracer, or whatever. There are also some serviceability examples for the Java serviceability APIs and such.

Okay, so I'm going to sit down, I think. Do you have any questions for us?
Yeah. Questions. I think I have to look around there and then come...

So, I've been doing a similar kind of thing, inserting events into my application, and at the moment I'm actually using something called Micrometer from Spring. How does that metrics sort of event differ from OpenTracing? Or can I basically instrument just once and then get all the events?

So what you mean is how this is different from OpenTracing? Oh, okay. OpenTracing is just a vendor-neutral API to push things into tracers, right? You can use it with Zipkin, you can use it with Jaeger, you can use it with a lot of different tracers. So what you get with OpenTracing is the ability to use just one API, and then the end user can choose which tracer they want to use. I'm not sure about Micrometer or how it's implemented.

If you're talking about Flight Recorder and stuffing things into Flight Recorder: well, Flight Recorder is a highly efficient recording engine built into the JVM, so the overhead when you're using it is low, and there is a ton of cool things we're doing. On some platforms we're using the invariant TSC for time stamping, which gives really fast time stamps. It's a binary format. We're using integer compression to make sure that we use as little memory as possible. And there's a whole host of cool things we're doing inside the JVM to make sure that it's really, really fast. If you turn off an event and it's on the hot path, you can be pretty sure that it's going to be optimized away. If you look at the disassembly for the method, you're basically not going to see a trace of anything that has to do with events.

I think another important point is the knowledge around those events, because it's all tied to the runtime of the JVM. So when you add custom events for your application, you still have a way to contextualize them with the rest of
the environment. And that's where the example Marcus showed before is important and critical. It's not that OpenTracing can't tell you for some specific reason; it's only because it doesn't have knowledge of the runtime it's running on, whereas the virtual machine does have this knowledge. So it's easy for you to correlate the various events in the same application, and this is the power of Flight Recorder.

I think another good point... Oh, this is a long answer to your question. Now we're going down the rabbit hole. But another good point is that it's built into the JVM, so there is certain information that we can get from the JVM. Heap composition, for example: when we garbage collect, we can gather data as part of, say, the mark phase of the garbage collector. You don't need to explicitly say, "now I want to go through the heap to see what's there." We can piggyback on already existing systems to get the data. So some data we pretty much get for free, just as part of running the program.

Well, that is a good question. I don't think you want to do that, but you can. The problem with Prometheus is that it is very much about harvesting the data, so you will end up getting one event at a time. You can absolutely do that with Prometheus, say, if you write a simple application that reads the Flight Recorder events and then makes them available to Prometheus one by one. But I think this would kind of defeat the point that you have access to all the statistical methods in the first place, and there's all the overhead.

To take just a random example: if you take a flight recording file, which is a binary recording with constant pools for the stack traces (I mean, stack traces are literally sets of integers pointing into the constant pool), and you transform it to JSON, that thing is going to blow up by several orders of magnitude. And then if you take all that data and you try to
push it into an event system, that event system is going to be unhappy. So it's better to get the data when you need it, as in the example that I did with the tracer. You can leave Flight Recorder running, because you're not going to see more than, say, a percent of overhead; you can actually have it running all the time. Then, when you see something that is actually interesting to you in the contextual data you get from the tracer, that's when you can dump the recording and look at the detailed data. And as Mario was saying, in Mission Control there is the core API, which is basically a set of bundles that you can use to do automated analysis of flight recordings. So you can push the data through that pipeline if you want to.

So this is what I wanted to ask: the best approach in this case would be to have some kind of embedding application that does the processing using the API, and then outputs the results to Prometheus, if you want to, in some already automated form?

Yeah.

Yes, it's over. It's over. Well, thank you. Thank you.