So my name is Nathan Scott, and I've worked for Red Hat for the last couple of years. I have a keen interest in performance analysis, particularly in what we call system-level performance analysis, but I'll talk about different kinds of performance analysis in a minute. I work with the performance tools group at Red Hat. We look after tools like PCP, which I'll be talking about today, and other tools like Valgrind and SystemTap and a few others of that ilk. A lot of people don't know much about PCP. I like to think of it as a hidden gem of the open source landscape. It's been around for many, many years, over 20 years now, and for the last 11 or 12 years or so it's been an open source project, having started out as a proprietary system from SGI.

So in my talk today, I'm going to talk a little bit about PCP itself, because a lot of people need an introduction to it: what it's for, how it's put together, and some of the basics. That way I can go on to talk about containers, what they mean for PCP, and how we're attempting to marry the two concepts, because containers don't fit particularly well with traditional system-level tools like PCP. Also in this talk I want to cover kernel instrumentation for containers and the metrics that are available to you if you have an interest in performance analysis and you're using containers. So I'm going to talk about the concepts in the kernel that back containers, how they fit together, and what that means from a performance analysis point of view.

Right up at the top level, let's talk about what PCP actually is. PCP is a toolkit for performance analysis. It follows the UNIX design philosophy of having many small tools that work together, rather than a single big thing that does everything. Like I mentioned before, it's for system-level performance analysis. So these are tools of the ilk of Ganglia and collectd, that kind of tool, and less like profiling tools such as OProfile or perf. It's a different category of tool; we're in the system-level analysis gang. In PCP we focus on being able to do live analysis of a system, analyzing what's happening while it's running and while it's failing, and also on being able to use those same tools for historical analysis: comparing what's happening right now on a system to what was happening yesterday or two weeks ago. PCP is inherently extensible in several different directions, and we have these core concepts of monitors and collectors in PCP. It's extensible in both of those ways, and I'll describe that division on the next slide. The last point I want to make in this overview is that PCP is a fundamentally distributed system. Baked into PCP is the ability to analyze multiple systems, multiple computers at once, that together form a larger complex system. If you have a web application, for example, with a database service, storage, the application itself and a firewall in front, those all form a complex system together and they all need to be analyzed together. It's difficult to get a good idea of the overall performance of the system if you have to analyze the different machines separately. The little picture is just trying to show that we have the ability to focus on what's happening right now, under that sort of light pyramid.
We can look backwards in time, and we also try to aid future analysis work like capacity planning, things like that. We don't really do that directly in PCP, but we have the ability to store data so that you can project forward using other tools.

So this is the architecture diagram for the live mode of PCP. If you install PCP and it's running on your system, these are some of the fundamental processes or components that go into a typical deployment. On the left-hand side, the green side, we have what we call the collector system. That involves a central daemon which pretty much runs the show and acts as a multiplexer: that's pmcd, the Performance Metrics Collector Daemon. The client tools, or monitoring tools, that want to report performance data will typically connect to that daemon, ask for the data they're interested in, and then report it. So there's a very clear separation in PCP between the components that extract data from whatever domain they're interested in, and the tools that report it, record it, or take some action based on that data. The collector architecture is fundamentally pluggable. In the diagram here, we have application plugins that might know about performance data for an application: response times, how many requests are happening per second within the application. You might have an instrumented mail queue. You might have a database that's instrumented. And the kernel is always highly instrumented. All of this data is available on any collector system, and you can plug in new pieces of your own. If you have metrics in your own domain that you're interested in, you can make those available through PCP for the monitoring tools as well.

And there are many, many monitoring tools. Here on the blue side, we have just a few examples of some of the commonly used ones, but there are many different tools that are part of PCP. The pmlogger process is the one used to record performance data. You can record any performance data, and it is then available for replay in the other client tools through the same sort of interfaces as live mode. There's a tool for displaying strip charts: the pmchart tool lets you plot graphs of your choosing for arbitrary metrics. There's a tool called pmie that lets you make decisions: you can feed it a bunch of rules about performance, it will evaluate those rules on whatever interval you ask for, and then take actions based on those evaluations.

So those are the high-level concepts of PCP. Later in the talk we'll look at how we can make this kind of architecture work in the container world. You can probably see fairly quickly that these system-level tools don't necessarily fit so well there, because they're designed for a single machine exporting data about that one machine. With containers, we might have many containers, all of which are pretending to be smaller, very lightweight versions of machines, and they all have their own metrics that we want to export through this system as well. So I'll come back later to how we've started to tackle that.

So, the last slide in terms of introducing PCP concepts. One of the core concepts of PCP is this idea of a performance metric, which is basically anything that you can measure on your system. Anything that you can measure, we need to represent in some way.
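To make that separation a bit more concrete, here's a minimal sketch of driving a couple of those monitor tools from a shell; the metric name is a real kernel metric, but the intervals and the archive name are purely illustrative:

    pmval -t 5sec -s 3 disk.dev.read         # sample live from the local pmcd
    pmval -a 20150130 -t 5sec disk.dev.read  # replay the same metric from a pmlogger archive

And a tiny pmie rule file, with a made-up threshold, might contain something like:

    delta = 10 sec;
    some_inst ( disk.dev.read > 100 ) -> print "busy disk";

which you would evaluate periodically with pmie -c rulefile. The same metric names work in every tool, live or against an archive, which is really the point of the shared metadata model.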
In PCP, for every metric that you might make available, there's a series of pieces of metadata that go along with that performance metric and define what it is, so the monitoring tools are able to interpret it. Every metric is represented in the performance metric namespace. In my example here, I've got an example command, pminfo, which is just another one of the client or reporting tools. It's looking at a particular metric, the number of read operations across all disk devices, and that metric has the name disk.dev.read. So the metric name is metadata associated with the metric. In this example I'm asking for the metric descriptor, which is the information about the metric, and for the values, which is what the fetch does, and the -tT options ask for the various forms of help text associated with that metric. So we can see all of the metadata associated with this particular metric, and that gives us a good example of the kinds of things that must be associated with every metric.

We need to know its data type: this particular metric is exported from the kernel as a 32-bit unsigned integer. I'll come back to the InDom, the instance domain, in a minute. The semantics of disk.dev.read are that it's a monotonically increasing counter. There are several different types of metrics that can be exported. We have counters that always increase, and we have what we call in PCP instantaneous metrics, which are called gauges in some other systems; they're metrics whose values may change in arbitrary directions every time you sample them, things like the number of users logged into the system. We also have the units. For this particular metric, the units are simply a count. For other metrics, the units might be kilobytes, or milliseconds, or nanoseconds. That's more metadata that needs to be associated with every metric so that the client tools can correctly interpret and report it. You can optionally associate help text with every metric, and for the kernel metrics we go to great lengths to provide help text explaining, as best we can, what every metric the kernel exports means. If you're writing new components that plug in metrics, you get to add the help text that's relevant for your component. And finally, of course, we have the values associated with every metric. A metric might have a singleton value, like the example I gave before, the number of users that are logged in: that's just a single number for any one system. But you could also have a set of values, as with this example of read operations across all devices, where the set of values is expanded across all the devices in the system. In my example here I have two devices, sda and sdb, with separate values associated with each of them, but they share all the other metadata, so we don't need to repeat that metadata for every instance.

So that's a high-level overview of PCP for anyone who hasn't come across it before; those are the core concepts. PCP is a huge project, it's been running for many years, we're talking hundreds of thousands of lines of code, so that's just a very brief overview. I want to take a segue now and talk about containers for a while.
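For anyone who hasn't seen it, the output of that command looks roughly like this; the instance domain number, the counter values and the help text here are made up or paraphrased, but the shape is what you'd get on a machine with two disks:

    $ pminfo -dftT disk.dev.read
    disk.dev.read
        Data Type: 32-bit unsigned int  InDom: 60.1
        Semantics: counter  Units: count
    Help:
    Count of read operations, per disk device
        inst [0 or "sda"] value 3415827
        inst [1 or "sdb"] value 92411

Everything above the instance lines is the shared metadata; the two inst lines are the per-device values from that instance domain.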
So even if you're not really interested in PCP, hopefully you can get something out of this section of the talk, where I talk about the instrumentation the kernel maintains for the concepts that sit behind containers. The kernel doesn't actually know anything about containers, as you may know; it has other concepts that containers are built upon. Containers are a user space concept, not a kernel concept. So in the kernel we have... Was that five minutes, John? Yeah. Okay. So we have cgroups and namespaces in the kernel, and those are the two concepts that all of containers are built on.

Cgroups are a way of associating groups of processes and controlling them together. There's a set of statistics associated with each cgroup that's created within the kernel, and things like Docker will create cgroups for each container. Associated with each cgroup are all these metrics that are available to you to analyze what's happening within that particular container. There are several different subsystems: there's a block IO cgroup, a CPU accounting cgroup, a memory cgroup, and one or two or three others as well, but these are the core ones that have statistics available at a controller level, which is a cgroup level, which ultimately becomes a container level. And these are some of the sorts of things we can see at a cgroup level. For block IO you get IOPS and throughput per device, IOs that have been issued from within a cgroup, which is ultimately from a container, and service and wait times. They can be aggregate or per device, so for the IOs submitted from within a container you can see which devices they went to. Those are exported by the kernel in a way that's relatively user-unfriendly, keyed by device major and minor numbers, so user space tools then get to decode that and present it in a useful way. That's one of the things we do in PCP. Because I'm running out of time, I'll skip through some of those metrics. If you're interested in them, the place to look is at the cgroup mount points; usually these days that's /sys/fs/cgroup, and then the cgroup name, and that naming is dependent on the system you're using. If you use Docker, it will name its cgroups a certain way; if you use other systems, they get named differently. That's one of the problems we have to deal with in PCP as we try to map containers to the kernel concepts below them.

The other concept the kernel has is this idea of namespaces, which, if you know anything about containers, you'll have come across, and they influence the behaviour of the processes running within a container. For example, if you're in a shell inside a container and you cat /proc/net/dev, which is where the network device statistics come from, like the IO traffic for each network device, that looks different on the inside of a container from how it looks on the host that's running the container, and from other containers. As you can see, these are all problems that have to be dealt with in system-level tools like PCP. So I'll skip through that; I've talked about that. Namespaces are inherited across fork and clone, and processes within a container share a common view using this namespace concept.
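As a concrete sketch of poking at that raw instrumentation by hand, assuming a cgroup v1 layout with Docker's systemd-style cgroup naming (the exact paths vary between distributions and runtimes, and the container ID and name here are placeholders), you might do something like:

    # per-container CPU time, memory usage and block IO counts from the cgroup filesystem
    cat /sys/fs/cgroup/cpuacct/system.slice/docker-<id>.scope/cpuacct.usage
    cat /sys/fs/cgroup/memory/system.slice/docker-<id>.scope/memory.usage_in_bytes
    cat /sys/fs/cgroup/blkio/system.slice/docker-<id>.scope/blkio.throttle.io_serviced

    # /proc/net/dev as seen from inside the container's network namespace
    PID=$(docker inspect --format '{{.State.Pid}}' mycontainer)
    nsenter --target "$PID" --net cat /proc/net/dev

The blkio file is one of the ones reporting per-device counts keyed by major:minor numbers, which is exactly the user-unfriendly form I mentioned.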
So I'll talk now, in the last slide or two, about the work that we've been doing. Oops, that went back. So I'll just use the same example again. Some of the goals we wanted to tackle in PCP were to allow the tools to start targeting individual containers. Historically, you've been looking at a full system; now we want to be able to focus in on individual containers. In my example before, I did a pminfo requesting values for all of the network devices in the system. We want to be able to say: not for the whole system, but for a named container, tell me the traffic that's been generated from that container rather than from the entire system. So we've added... actually, I'll talk about what we've added on the next slide, but effectively we want to be able to say: pminfo, fetch the network values just for a container named crank, and that should come back with the network interfaces from that container rather than all of the interfaces on the system.

Another important goal was to not require people to install PCP and have the pmcd daemon running inside every container. That's the obvious solution, because then pmcd has the correct namespaces when it's running, but it requires a lot more software installation and maintenance inside every individual container that you might want to manage. So we want to be able to have either a pmcd running on the host on which the containers are running, or a privileged container which can run pmcd and is able to switch into and out of the namespaces of the other containers. I talked a little bit about the device major/minor thing; one of our little side goals is just to make it easier to do analysis of containers and solve little problems like that for you, so you get names like sda and sdb rather than device major/minor numbers. And the final goal, data reduction, is one of the key goals. We're trying to look at just individually named containers rather than all of the containers on the system at once. That affects things like the process set: if you run a tool like top on the base system, you're going to see all of the processes. If you just want to see the processes running in a container, we can now use PCP to do that kind of thing: restrict it to the set of processes running in that container, and see which ones within it are generating CPU load, or whatever load you're interested in.

The last one. I'll probably have to skip over most of this slide, but you can probably get the gist of where we were going from the last slide. The critical concept we've been introducing recently is this ability to switch namespaces and provide information about containers of interest throughout the infrastructure that is PCP. That starts from the client tools like pminfo here: you might be connecting to a particular host and saying, on that host, I'm interested in the container named crank, and I'm interested in the network metrics.
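As a sketch of what that ends up looking like from the monitoring side, assuming the new client option is spelled --container (the host name here is made up, and crank is just the example container name from the slide):

    # all network interfaces visible on the whole host
    pminfo -f -h acme.example.com network.interface.in.bytes

    # only the interfaces visible inside the container named "crank"
    pminfo -f -h acme.example.com --container crank network.interface.in.bytes

The second command should report just the container's own interfaces, typically eth0 and lo, rather than every bridge and veth device on the host.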
So there was a series of wire protocol extensions to allow a container identifier to be passed over to the remote system. The pmcd on that remote system is then able to say: you're interested in this container; it looks on its system, figures out which cgroups that maps to and which namespaces are associated with that container, and allows the collector system to switch in and out of those namespaces as it needs to, so that it reports the correct network devices for that container. Excuse me. So, a quick look at the status we're up to. The next PCP release will include this code, and this will be within the next couple of weeks; it's the first release that will have any of these concepts in it. There's a lot of work still going on within PCP to improve this and make it slicker as we go forward, and I expect that down the track we'll want to write new monitor tools that are specific to containers, tools that know about cgroups in particular. Like vmstat, iostat and other tools like that today, we'll be building tools that are container-specific and report the activity within the containers. I didn't really get much time to talk about the details of those kernel metrics that are available for cgroups, but they're actually not the same as the metrics available for the raw host, so you can't just run vmstat on a container and expect it to report something useful about that container; there are whole new tools that will need to be written. And those are some links about PCP, and then I'm done. Sorry for going over time. We have time for taking one question. Make it a good one. Thank you. No problem. Thank you.