Hi, I'm Josh MacDonald, a member of the OpenTelemetry Technical Committee and an engineer at LightStep. I'll be talking about metrics, instruments, and the requirements that gave us the up-down counter, a new kind of instrument. Let's review how we got here. Metrics systems have been around a long time, but I'm thinking about what happened around 10 years ago. Before then, open source systems had just one kind of metric instrument, which I'll call the number instrument. We used it to report numbers with timestamps, and then we plotted those numbers as a function of time. The result is a line of numbers. The nice thing about numbers is that they support math. You can do a lot with numbers, but first you should know what they mean.
Around 10 years ago, the interfaces we used to report metrics began to change. Why? Because a metrics system can do more to help the user when the numbers have meaning. There were a couple of changes in metrics interfaces back then. First, the counter instrument was introduced, with an interface dedicated to counting. This is a convenience for the programmer, who no longer has to track a running total, and it also made it easier for the metrics system to compute rates correctly. The semantics of a counter instrument are that resets are not meaningful: when a counter's value drops, for example because the process restarted, the system treats that as a reset rather than as a decrease in the count.
Second, metrics APIs changed to support attributes. Now you could decorate metric events with a list of attribute values, and every distinct combination of attributes would produce a separate line of numbers. In the diagram pictured, there are three dimensions, A, B, and C, that categorize the data. We can picture the metric instrument in this case producing a cube of number lines, with coordinates equal to the category values for attributes A, B, and C. Still talking about progress from around 10 years ago, these two innovations, the counter and attributes, work really well together.
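To make the "cube of number lines" concrete, here is a minimal Python sketch of a counter that keeps one running total per distinct attribute combination. The `MonotonicCounter` class and its `add` method are hypothetical, written only for illustration; they are not the OpenTelemetry API, and rejecting negative increments is an assumption standing in for the counter's count-only semantics.

```python
from collections import defaultdict


class MonotonicCounter:
    """Hypothetical counter: one running total per distinct attribute set."""

    def __init__(self, name):
        self.name = name
        # Key: frozenset of (attribute, value) pairs; value: running total.
        self.series = defaultdict(int)

    def add(self, value, **attributes):
        # A counter only counts up, so negative increments are rejected.
        if value < 0:
            raise ValueError("counter increments must be non-negative")
        self.series[frozenset(attributes.items())] += value


requests = MonotonicCounter("requests")
requests.add(1, a="a1", b="b1", c="c1")
requests.add(2, a="a1", b="b1", c="c1")  # same combination, same line
requests.add(5, a="a1", b="b2", c="c1")  # new combination, new line

print(len(requests.series))  # 2
```

The programmer only reports increments; the running total for each attribute combination is kept by the instrument, which is the convenience described above.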
When we add a new attribute to a counter, the result is a new way to subdivide the count, which gives new ways to group and aggregate the same data. Counts for specific attribute combinations can be compared with other combinations, or divided by the total to form ratios and fractions. Two diagrams and two snippets of instrumentation code are shown. Both examples are CPU usage counters, but one tracks usage per CPU and the other tracks total usage. This shows how we can remove an attribute from a counter, either at the source or when displaying data, without a change of meaning. To remove a CPU label, or any label, from a counter, sum the individual counts.
Because the counter was defined to support resets, it couldn't be used for counting up and down. For counts that go up and down, a gauge instrument was recommended instead. Here's the Prometheus documentation for gauge: a gauge is a metric instrument that represents a single numerical value that can arbitrarily go up and down. Gauges are typically used for measured values like temperatures or current memory usage, but also for counts that can go up and down, like the number of concurrent requests. As a consequence, we have one instrument with two interfaces: use set to record the current value, and use add, subtract, increment, and decrement to change the current value.
And here we are. OpenTelemetry has separated these two interfaces. The requirement that led to separate interfaces, up-down counters for things you count and gauges for things you measure, is best stated as follows: OpenTelemetry metrics processors must be able to remove attributes from metrics data without a change of meaning. We want to be sure that when we remove an attribute, the result is the same as if the attribute had never been recorded in the first place.
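The attribute-removal rule for counters can be sketched in a few lines of Python. This sketch assumes series are stored as a mapping from a frozenset of (attribute, value) pairs to a running total; `remove_attribute` is a hypothetical helper, and the CPU numbers are made up for illustration, not measurements.

```python
from collections import defaultdict


def remove_attribute(series, name):
    """Drop one attribute from every series key, summing series that collide.

    Summing is the natural aggregation for counters, so the merged series
    mean the same thing as if the attribute had never been recorded.
    """
    merged = defaultdict(float)
    for key, total in series.items():
        reduced = frozenset((k, v) for k, v in key if k != name)
        merged[reduced] += total
    return dict(merged)


# Per-CPU usage in seconds (illustrative numbers).
per_cpu = {
    frozenset({("cpu", "0"), ("mode", "user")}): 12.0,
    frozenset({("cpu", "1"), ("mode", "user")}): 8.0,
    frozenset({("cpu", "0"), ("mode", "system")}): 3.0,
    frozenset({("cpu", "1"), ("mode", "system")}): 2.0,
}

# The same usage, recorded without a CPU attribute in the first place.
total = {
    frozenset({("mode", "user")}): 20.0,
    frozenset({("mode", "system")}): 5.0,
}

# Removing "cpu" after the fact matches never having recorded it.
assert remove_attribute(per_cpu, "cpu") == total
```

The final assertion is the requirement stated above, expressed as a check: removing an attribute by summing gives the same result as omitting it at the source.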
When an instrument is used for counting, removing an attribute means computing the sum, which is the same count we would have gotten had the attribute never been used. The point of all this is that, as long as the metrics system knows what kind of instrument was used, it can automatically transform metrics data, both as a form of cost control and to produce meaningful visualizations, without knowing how the instrument is defined. To view any metric at the cluster level, the job level, or the rack level, simply remove the irrelevant attributes. However, this would not be possible with just two kinds of number instrument. We needed a third instrument, the up-down counter.
An up-down counter is something like a counter and something like a gauge. It acts like a counter in the sense that attributes give new ways to subdivide the metric, defining meaningful ratios and fractions, and in the sense that its natural aggregation is a sum. It acts like a gauge in the sense that there is no reset operation, and in the sense that it defines a current value and is not primarily used to compute a rate. These instruments exist in the real world. They generally report the difference between two counter values, and we find them used where the two counter values would not be considered useful on their own. One real-world example is a parking lot counter, a device that counts how many cars enter and exit a parking lot. This instrument can be used, for example, to display how many spaces are available on each floor of a parking structure. Another real-world example is the energy meter on a home with a solar array attached to the electricity grid. When the solar array feeds energy to the grid, the count goes down; when the home draws power from the grid, the count goes up. Let's review. OpenTelemetry offers three instruments for reporting numbers: the counter, the up-down counter, and the gauge.
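As a sketch of the parking lot example, the hypothetical `UpDownCounter` class below accepts both positive and negative increments and keeps one series per floor. The capacity of 50 spaces per floor and the `available` helper are assumptions for illustration; this is not the OpenTelemetry API.

```python
from collections import defaultdict


class UpDownCounter:
    """Hypothetical up-down counter: accepts positive and negative increments."""

    def __init__(self, name):
        self.name = name
        # Key: frozenset of (attribute, value) pairs; value: current total.
        self.series = defaultdict(int)

    def add(self, value, **attributes):
        # Unlike a counter, negative increments are allowed; there is no set or reset.
        self.series[frozenset(attributes.items())] += value

    def total(self):
        # Removing every attribute: the natural aggregation is a sum.
        return sum(self.series.values())


CAPACITY_PER_FLOOR = 50  # assumed capacity, for illustration only
occupancy = UpDownCounter("parking.occupied")

occupancy.add(+1, floor="1")  # a car enters floor 1
occupancy.add(+1, floor="1")  # another car enters floor 1
occupancy.add(+1, floor="2")  # a car enters floor 2
occupancy.add(-1, floor="1")  # a car leaves floor 1


def available(floor):
    # .get avoids inserting an empty series for a floor with no traffic.
    return CAPACITY_PER_FLOOR - occupancy.series.get(frozenset({("floor", floor)}), 0)


print(available("1"), available("2"), occupancy.total())  # 49 49 2
```

The per-floor values are current totals rather than rates, and dropping the floor attribute by summing still yields a meaningful number: the occupancy of the whole structure.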
These three instruments have distinct use cases: counters for monitoring rates, up-down counters for monitoring totals, and gauges for monitoring measurements and other derived quantities. When we aggregate metrics data, particularly when we remove attributes to save cost or reduce cardinality, a natural aggregation applies. For counters and up-down counters, the natural aggregation is the sum. For gauges, it is the mean. Now, you may think that if your instrumentation is doing the right thing, you won't need to remove attributes in your metrics pipeline. That may be true, but when it comes to inspecting and visualizing metrics data, it is easy to create metrics with too many number lines to view effectively at once, say per-host metrics in a large cluster. When there is too much cardinality to view metrics data effectively, we need to remove attributes, and that's all there is to it. The up-down counter is distinct from the counter and gauge instruments in ways that help a metrics system control costs, because whether writing metrics data or reading it, it is useful to be able to remove attributes without a change of meaning. Thank you.
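The natural aggregations named in the talk can be written as a small dispatch on instrument kind. This Python sketch assumes the values being combined come from per-host series that merge when the host attribute is removed; `Kind` and `natural_aggregate` are hypothetical names, and the per-host numbers are illustrative.

```python
from enum import Enum
from statistics import fmean


class Kind(Enum):
    COUNTER = "counter"
    UP_DOWN_COUNTER = "up_down_counter"
    GAUGE = "gauge"


def natural_aggregate(kind, values):
    """Combine values from series that merge when an attribute is removed."""
    if kind in (Kind.COUNTER, Kind.UP_DOWN_COUNTER):
        return sum(values)
    if kind is Kind.GAUGE:
        return fmean(values)
    raise ValueError(f"unknown instrument kind: {kind}")


# Per-host values about to lose their "host" attribute (illustrative numbers).
requests_served = [120, 80, 100]        # counter: requests served
active_requests = [4, 0, 2]             # up-down counter: in flight right now
cpu_temperature_c = [61.0, 55.0, 58.0]  # gauge: a measurement

print(natural_aggregate(Kind.COUNTER, requests_served))          # 300
print(natural_aggregate(Kind.UP_DOWN_COUNTER, active_requests))  # 6
print(natural_aggregate(Kind.GAUGE, cpu_temperature_c))          # 58.0
```

Summing the temperatures would give 174, which is not a meaningful temperature; that is why the kind of instrument, not the data alone, decides which aggregation applies.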